In this post I will give you a tip on how long it would be good to keep transaction logs. To understand why, basics of point-in-time recovery from backups are explained. If you are in a hurry: keep transaction log backups up to the second latest full backup, so you can recover if your last full backup is corrupted.
Restore from backup – basics
Full backup of our database might be several hundred GBs, and to write it to the backup location can take hours. So full backup is the widest box on the the time axis. If we restore from the full backup, it can bring the database only into one point-in-time: time when backup ENDED. That is a bit surprising, because one might think that the „snapshot“ is taken of the state as the file was at the START, and it is true, but also part of the log file is included in backup file so it can roll forward to the point when full backup nearly finished. You cant use that log portion to STOPAT some other point in time, it is possible to restore only to a single LSN of that full backup. We can restore to a point where the full backup ends, and nothing else.
Differential backup is my favourite. I like it very much because it is so much smaller and so much faster than full backup. It contains only the difference from the last FULL backup. Not from the last differential backup, but from the last FULL backup. That means for a restore, we need a full backup, and only ONE differential backup. We do not need differential backups that are in between. As with full backup, it will bring the database to only one point in time, approximately at the time when differential backup finished.
Here is the downside: If the full backup is not accessible (e.g. corrupted), all the differential backups based on that full backup become useless. Because we need a full backup from which differential backup is a difference. If somebody manually makes a full backup, puts it on his laptop and runs away with it so we cannot access it, all the differential backups created after the full backup are useless. Until we make a new full backup so it becomes a new base for the differential backups that follow. You might want to create an alert if someone makes a full backup outside of supposed location, and does a full backup as a countermeasure.
Transaction log backups are very different from both full and differential backups. They are like shooting a movie, capturing all the movements on the scene, all the changes of the data as they happen. Each one continue where the last one ended. Not where full backup ended, but where kast transaction log backup ended. That means we have a chain of transaction log backups that are all one next to each other. Unbroken chain, without any gaps.
If we have a database restored to a certain point in time already, e.g. with restoring full backup, if that point falls inside tlog backup file you have, you can use that tlog backup file to roll forward into any future point in time, using the chain of gap-less transaction log backups, as they are a railway for the train. You can stop at any point you like. But if you come across the gap, that gap cannot be crossed. You can jump over that gap only with full or diff backup restored to a point after that gap.
SQL in FULL recovery model has ability to restore to ANY point in time, not just to a point when transaction log backup was taken – it is much more powerful than that. It is like a continuous backup.
For example, if someone accidentally deletes the data, you can restore the transaction log backups and STOP AT a microsecond before that delete command, and save the data. That works even if log backup period is very long, e.g. few hours – you can restore to literally ANY point in time in the tlog backup chain. Very powerful.
That is also the disadvantage. If one of the tlog backup files is corrupted, that is a gap. You cannot restore after that point using the log backups. You must do a full or diff backup to jump across that gap.
So, back to the main question of this video:
How long should we keep the transaction log files?
The short answer is: to a SECOND latest full backup. If you do a full backup once per week, differential once per day, and transaction once per hour, you should keep tlog backups two weeks at least.
Why? The reason is, if the last full backup is corrupted, all the differential backups are useless, and your log backup chain is too short to recover from the last diff backup to the current time.
But, if your log backup chain is long enough to reach the second full backup, you can restore that second full backup, and use the log backup chain to bring the database all the way to the present time, not losing any data. And that is awesome!
Take care and see you later!
There is no general recepy for specific situation. For example, you might have a huge log generation rate, so you can’t afford to keep so much tlog backups. You will choose other methods to assure you can recover, e.g. multiple copies of the backups, so the chance of corruption is smaller. But, if you can, keep togs at least that long (to reach the second latest full backup). That is a minimum I would recommend. Sometimes the business owners will demand (RPO) to keep it even longer.