I have the following scenario:
Our system runs a SQL Server Express 2005 database locally (on each user's desktop, if you will). The system stores a lot of production data from a machine. There are high demands on the safety of the data, and doing a backup each night, or even each hour, is not enough. We need a backup strategy that ensures almost instantaneous/continuous backup of the database.
Has anyone out there successfully implemented a system similar to this, and/or does anyone have ideas on how to accomplish it? The only thing I can think of right now is to have mirrored drives (RAID) holding the data, but that would be complicated and expensive.
I would appreciate any and all thoughts on this, since it is a real issue for me and my company. Thanks in advance!
Update:
I was not clear enough in my description of the scenario. The system stores data in a vehicle that has no connection to anything, so a centralized database is therefore not possible. Nor can we use a Standard/Enterprise edition of SQL Server, since that would be too expensive (each vehicle would need a license). Thanks for your input!
Switch your database into the FULL recovery model. Do a full backup every night and a differential or transaction log backup after each major user action. These smaller backups can be written to flash memory or to a different hard drive, and all the data can be synchronized with a server when the vehicle is online.
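As a rough T-SQL sketch (the database name and paths below are placeholders, not from the question):

    -- Switch to the FULL recovery model so the log chain can be backed up.
    ALTER DATABASE ProductionData SET RECOVERY FULL;

    -- Nightly full backup, written to a second drive or flash storage.
    BACKUP DATABASE ProductionData
        TO DISK = N'E:\Backups\ProductionData_full.bak' WITH INIT;

    -- After each major user action, capture only what has changed since then:
    BACKUP LOG ProductionData
        TO DISK = N'E:\Backups\ProductionData_log.trn';       -- log backup
    -- or
    BACKUP DATABASE ProductionData
        TO DISK = N'E:\Backups\ProductionData_diff.bak' WITH DIFFERENTIAL;

Note that Express has no SQL Agent, so something external (the application itself or a scheduler) has to fire these statements.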
Another simple approach is to trace all user changes and important data in a text file stored on a separate drive. If the SQL database crashes, the user or another operator can replay those steps to restore the data.
One way I've seen this done is by using DoubleTake.
I will assume that a central database on a server is not feasible because your systems are running standalone and are not connected to anything. So this is what I would do
Set up RAID on the computer. This insures you against simple disk failure.
Any SQL Server database can be recovered to the point of the last committed transaction if you have a full database backup and a set of transaction log backups available. Basically, you restore the last full backup and then apply the transaction logs going forward. See these links.
http://www.enterpriseitplanet.com/storage/features/article.php/11318_3776361_3
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=132
So what you need to do is set up a periodic full backup of both the database and the transaction log, plus more frequent transaction log backups (and ensure that your transaction log can never run out of space).
In the event of failure you restore the last full backup, then apply the transaction logs going forward.
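A minimal sketch of that recovery sequence in T-SQL (file names are placeholders):

    -- Restore the last full backup, but keep the database in the restoring state.
    RESTORE DATABASE ProductionData
        FROM DISK = N'E:\Backups\ProductionData_full.bak' WITH NORECOVERY;

    -- Apply each transaction log backup in order.
    RESTORE LOG ProductionData
        FROM DISK = N'E:\Backups\ProductionData_log_0800.trn' WITH NORECOVERY;

    -- The last log in the chain brings the database back online.
    RESTORE LOG ProductionData
        FROM DISK = N'E:\Backups\ProductionData_log_0900.trn' WITH RECOVERY;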
Myself, if these are critical systems, I would be inclined to add an additional drive to the system and make sure that the backups are copied over to that. This is because, as good as RAID is, it does sometimes have issues: RAID controllers fail, disks get wiped accidentally in parallel, disk failures go unnoticed so you're actually running on one disk, etc. If you ensure backups are copied to a separate disk, then you can always recover to the last transaction log backup. You should also take tape backups, of course, but they are generally a last resort in the event of trouble.
If for some reason you cannot set up RAID, then you should still install a second disk, but place the database file on one drive and the transaction log on the other, and copy backups to both disks. In the event of a failure of the C drive, or some other software issue crashing the database, you can still recover to the last committed transaction. A failure of the D drive limits you to the last transaction log backup. (Oracle used to let you mirror the transaction log from the database, which again would completely cover you, but I don't think this facility exists in SQL Server.)
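For illustration, splitting the files like that is just a matter of pointing them at different drives when the database is created (the drive letters and paths here are assumptions):

    CREATE DATABASE ProductionData
    ON PRIMARY
        (NAME = ProductionData_data, FILENAME = N'C:\SQLData\ProductionData.mdf')
    LOG ON
        (NAME = ProductionData_log,  FILENAME = N'D:\SQLLogs\ProductionData_log.ldf');

For an existing database, ALTER DATABASE ... MODIFY FILE plus taking the database offline and moving the physical file achieves the same layout.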
If you are looking for a scheduler for SQL Server Express (which doesn't come with one) then I've been using SQLScheduler quite happily without problems, and it's free.
The most obvious answer would be to ditch SQL Server Express running locally and use a single source for your data (such as a standard SQL Server install in a central location), unless your system requires individual backups of every single person's own instance of SQL Server Express.
If your requirements are so stringent as to call for instantaneous backups on every operation, you should definitely think about a different method of storage than local instances of SQL Server Express.
Wouldn't it be easier to just use one centralized SQL Server and back that up every hour or so? If you truly need instantaneous backup, your company (which seems to be avoiding licensing costs by installing the free Express edition on each machine) will need to spring for two servers and two SQL Server Enterprise licenses to implement mirroring.
RAID isn't that expensive, but it is also not the best option. If you really want highly available data, you should upgrade to SQL Server Standard on a remote server that each user connects to, and use transactional replication to a SQL Server (Express) instance on another machine. RAID doesn't always protect you from data loss. If the data is that important to you, then the cost should not be that much of an issue.
Update in response to the question update.
If you can't use remote servers, then there are a couple of options:
You write a trigger which initiates a backup script on each insert or update and stores the result on a separate hard drive (see the sketch after this list).
You use RAID. But beware that if the RAID controller fails, you still have a problem.
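Firing an actual backup from a trigger on every insert is usually too heavy; a hedged sketch of the closely related idea (copying each change into an audit table whose filegroup sits on the second drive) might look like this, where dbo.Readings and the filegroup name are hypothetical:

    -- Audit table on a filegroup that was created on the separate drive.
    CREATE TABLE dbo.Readings_Audit (
        AuditId   int IDENTITY(1,1) PRIMARY KEY,
        ReadingId int           NOT NULL,
        Payload   nvarchar(max) NULL,
        ChangedAt datetime      NOT NULL DEFAULT (GETDATE())
    ) ON [SecondDriveFG];
    GO
    CREATE TRIGGER trg_Readings_Audit
    ON dbo.Readings
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Copy the inserted/updated rows so they survive a crash of the main data file.
        INSERT INTO dbo.Readings_Audit (ReadingId, Payload)
        SELECT i.ReadingId, i.Payload FROM inserted AS i;
    END;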
RAID is not expensive. Use RAID to protect against hard drive failure. You also need monitoring though. No point in having this if you let both drives fail.
Also, implement hourly incremental backups, then daily incremental backups and finally weekly full backups.
You need all of these strategies working together because they protect against different things. RAID does not protect against human or coding errors destroying data. Hourly and weekly backups don't protect against hard drive failure.
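One way to read that schedule in SQL Server terms (hourly log backups, daily differentials, weekly fulls; names and paths are illustrative):

    -- Weekly: full backup.
    BACKUP DATABASE ProductionData
        TO DISK = N'E:\Backups\Prod_full.bak' WITH INIT;

    -- Daily: differential backup (everything changed since the last full).
    BACKUP DATABASE ProductionData
        TO DISK = N'E:\Backups\Prod_diff.bak' WITH DIFFERENTIAL;

    -- Hourly: transaction log backup (requires the FULL recovery model).
    BACKUP LOG ProductionData
        TO DISK = N'E:\Backups\Prod_log.trn';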
We have some machines storing experimental data on Win7 machines. The experimental software is only compatible with Windows 7, and as such our IT department does not want to connect to this isolated machine anymore. Is there a way to completely "mirror" this Win7 PC's SQL Server, or completely copy it and keep it relatively up to date, on a Win 10 PC? And by server I mean multiple databases, which may grow in number. Thank you.
Is there a way to completely "mirror" this Win7 PCs SQL server or
completely copy it, and keep it relatively up to date
What you seem to want is LOG SHIPPING. The basic steps are:
Back Up your DB
Restore it on N number of other servers that you want to keep relatively up to date
Configure log shipping, either through the wizard or manually. This will ship the transaction log backups to your other servers
Restore the transaction logs
Of course, this is all automated once you set it up. The "relatively up to date" part depends on how often you do the transaction log backups, how long they take to travel, and how long they take to restore. If you only want it updated daily, or weekly, or on some other schedule, you could just copy your DB backups to each of the other servers via PowerShell or whatever method you want, and restore the full backup instead of all of the transaction logs. This is a lot simpler, and the only downtime would be during the restore on your secondary servers (however long the restore takes).
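For reference, the manual equivalent of one shipping cycle looks roughly like this (server names and paths are placeholders):

    -- 1. On the primary: the initial full backup.
    BACKUP DATABASE MainDB TO DISK = N'\\secondary\ship\MainDB_full.bak';

    -- 2. On each secondary: restore it, leaving the database ready for more logs.
    RESTORE DATABASE MainDB
        FROM DISK = N'\\secondary\ship\MainDB_full.bak'
        WITH NORECOVERY;   -- or WITH STANDBY for read-only access between restores

    -- 3. On the primary, on a schedule: back up the transaction log.
    BACKUP LOG MainDB TO DISK = N'\\secondary\ship\MainDB_1200.trn';

    -- 4. On each secondary: restore the shipped log, staying in the restoring state.
    RESTORE LOG MainDB
        FROM DISK = N'\\secondary\ship\MainDB_1200.trn'
        WITH NORECOVERY;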
There are other options which include Database Mirroring and Availability Groups. Each have their own pros and cons, and come at a price.
I'm using PostgreSQL v9.1 for my organization. The database is hosted on Amazon Web Services (an EC2 instance) beneath a Django web framework which performs tasks on the database (reading/writing data). The problem is how to back up this database periodically in a specified format (see Requirements).
Requirements:
A standby server is available for backup purposes.
The master-db is to be backed up every hour. At the top of each hour, the db is quickly backed up in its entirety and then copied to the slave as a file-system archive.
Along with hourly backups, I need to perform a daily backup of the database at midnight and a weekly backup on midnight of every Sunday.
Weekly backups will be the final backups of the db. All weekly backups will be kept; only the daily backups from the last week and only the hourly backups from the last day will be kept.
But I have the following constraints too.
Live data comes into the server every day (roughly one insertion every 2 seconds).
The database now hosts critical customer data, which means it cannot be turned off.
Usually, data stops coming into the db during the night, but there's a good chance that data might be coming into the master-db during some nights, and I have no way to stop those insertions (customer data would be lost).
If I use traditional backup mechanisms/software (for example, barman), I have to configure archive mode in postgresql.conf and authenticate users in pg_hba.conf, which implies I need a server restart to turn it on, which again stops the incoming data for some minutes. This is not permitted (see the above constraint).
Is there a clever way to backup the master-db for my needs? Is there a tool which can automate this job for me?
This is a very crucial requirement, as data began to appear in the master-db a few days ago and I need to make sure there's a replica of the master-db on some standby server at all times.
Use EBS snapshots
If, and only if, your entire database including pg_xlog, data, pg_clog, etc is on a single EBS volume, you can use EBS snapshots to do what you describe because they are (or claim to be) atomic. You can't do this if you stripe across multiple EBS volumes.
The general idea is:
Take an EBS snapshot using the EBS APIs using command line AWS tools or a scripting interface like the wonderful boto Python library.
Once the snapshot completes, use AWS API commands to create a volume from it and attach the volume to your instance, or preferably to a separate instance, and then mount it.
On the EBS snapshot you will find a read-only copy of your database from the point in time you took the snapshot, as if your server had crashed at that moment. PostgreSQL is crash-safe, so that's fine (unless you did something really stupid like set fsync=off in postgresql.conf). Copy the entire database structure to your final backup, e.g. archive it to S3 or whatever.
Unmount, unlink, and destroy the volume containing the snapshot.
This is a terribly inefficient way to do what you want, but it will work.
It is vitally important that you regularly test your backups by restoring them to a temporary server and making sure they're accessible and contain the expected information. Automate this, then check manually anyway.
Can't use EBS snapshots?
If your volume is mapped via LVM, you can do the same thing at the LVM level in your Linux system. This works for the lvm-on-md-on-striped-ebs configuration. You use lvm snapshots instead of EBS, and can only do it on the main machine, but it's otherwise the same.
You can only do this if your entire DB is on one file system.
No LVM, can't use EBS?
You're going to have to restart the database. You do not need to restart it to change pg_hba.conf (a simple reload with pg_ctl reload, or a SIGHUP to the postmaster, is sufficient), but you do indeed have to restart to change the archive mode.
This is one of the many reasons why backups are not an optional extra, they're part of the setup you should be doing before you go live.
If you don't change the archive mode, you can't use PITR, pg_basebackup, WAL archiving, pgbarman, etc. You can use database dumps, and only database dumps.
So you've got to find a time to restart. Sorry. If your client applications aren't entirely stupid (i.e. they can handle waiting on a blocked tcp/ip connection), here's how I'd try to do it after doing lots of testing on a replica of my production setup:
Set up a PgBouncer instance
Start directing new connections to the PgBouncer instead of the main server
Once all connections are via pgbouncer, change postgresql.conf to set the desired archive mode. Make any other desired restart-only changes at the same time, see the configuration documentation for restart-only parameters.
Wait until there are no active connections (see the query sketch after this list)
SIGSTOP pgbouncer, so it doesn't respond to new connection attempts
Check again and make sure nobody made a connection in the interim. If they did, SIGCONT pgbouncer, wait for it to finish, and repeat.
Restart PostgreSQL
Make sure I can connect manually with psql
SIGCONT pgbouncer
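For the "wait until there are no active connections" step, something like this works on 9.1 (note that pg_stat_activity uses procpid there; it was renamed to pid in 9.2):

    -- Count client backends other than this monitoring session itself.
    SELECT count(*) AS other_connections
    FROM pg_stat_activity
    WHERE procpid <> pg_backend_pid();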
I'd rather explicitly set pgbouncer to a "hold all connections" mode, but I'm not sure it has one, and don't have time to look into it right now. I'm not at all certain that SIGSTOPing pgbouncer will achieve the desired effect, either; you must experiment on a replica of your production setup to ensure that this is the case.
Once you've restarted
Use WAL archiving and PITR, plus periodic pg_dump backups for extra assurance.
See:
WAL-E
PgBarman
... and of course, the backup chapter of the user manual, which explains your options in detail. Pay particular attention to the "SQL Dump" and "Continuous Archiving and Point-in-Time Recovery (PITR)" chapters.
PgBarman automates the PITR option for you, including scheduling, and supports hooks for storing WAL and base backups in S3 instead of local storage. Alternatively, WAL-E is a bit less automated, but comes pre-integrated with S3. You can implement your retention policies with S3, or via barman.
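For context, the low-level API those tools drive on 9.1 is just SQL plus a filesystem copy; a minimal hand-rolled base backup (assuming archive_mode is on and archive_command works after the restart) looks like:

    -- Mark the start of the base backup in the WAL stream.
    SELECT pg_start_backup('nightly_base_backup');

    -- ...copy the whole data directory at the filesystem level (rsync/tar) here...

    -- Force a WAL segment switch so the backup is self-contained, then archive it.
    SELECT pg_stop_backup();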
(Remember that you can use retention policies in S3 to shove old backups into Glacier, too).
Reducing future pain
Outages happen.
Outages of single-machine setups on something as unreliable as Amazon EC2 happen a lot.
You must get failover and replication in place. This means that you must restart the server. If you do not do this, you will eventually have a major outage, and it will happen at the worst possible time. Get your HA setup sorted out now, not later, it's only going to get harder.
You should also ensure that your client applications can buffer writes without losing them. Relying on a remote database on an Internet host to be available all the time is stupid, and again, it will bite you unless you fix it.
I have installed a SQL Server database (mainServer) in one instance and a SQL Server database for ReportServer in another. What is the best way to replicate data from mainServer to the report server? Data in mainServer changes frequently, and up-to-date information in ReportServer is very important.
There are many ways to do this:
mirroring
log shipping
transactional replication
merge replication
snapshot replication
Are there any best practices for this?
Thanks
You need Transactional Replication for your case. Here is why you would not need the other four options:
Mirroring
This is generally used to increase the availability of a database server and provides for automatic failover in case of a disaster.
Typically, even though you have more than a single copy of the database (recommended to be on different server instances), only one of them is active at a time, called the principal server.
Every operation on this server instance is mirrored on the others continuously (as soon as possible), so this doesn't fit your use case.
Log Shipping
In this case, apart from the production database servers, you have extra failover servers such that backups of the production server's database, plus differential and transaction log backups, are automatically shipped (copied) to the failovers and restored there.
The replication here is typically scheduled at a longer interval than the other mechanisms, usually ranging from an hour to a couple of hours.
This also allows the failover servers to be readied manually in case of a disaster at the production site.
This also doesn't fit your use case.
Merge Replication
The key difference between this and the others is that the replicated database instances can communicate to the different client applications independent of the changes being made to each other.
For example a database server in North America being updated by clients across Americas & Europe and another one in Australia being updated by clients across the Asia-Pacific region, and then the changes being merged to one another.
Again, it doesn't fit your use case.
Snapshot Replication
The whole snapshot of the database is published to be replicated to the secondary database (different from just the log files being shipped for replication.)
Initially, however, for each type of replication a snapshot is generated to initialize the subscribing database, i.e. only once.
Why you should use Transactional Replication?
You can choose the objects (tables, views, etc.) to be replicated continuously, so if only a subset of the tables is used for reporting, it would save a lot of bandwidth. This is not possible with Mirroring or Log Shipping.
You can redirect traffic from your application to the reporting server for all the reads and reports (which you can also do in others too, btw).
You can have independent batch jobs generating some of the more used reports running on the reporting server, reducing the load on the main server if it has quite frequent Inserts, Updates or Deletes.
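A heavily simplified sketch of what setting that up looks like in T-SQL; the object names are placeholders, and a real setup also needs a configured distributor plus the snapshot/distribution agents (sp_addpublication_snapshot, sp_addpushsubscription_agent) or simply the New Publication wizard:

    -- Enable the database for publishing.
    EXEC sp_replicationdboption @dbname = N'MainDB', @optname = N'publish', @value = N'true';

    -- Create a transactional publication and add an article to it.
    EXEC sp_addpublication @publication = N'ReportingPub', @status = N'active';
    EXEC sp_addarticle @publication = N'ReportingPub',
                       @article = N'Orders',
                       @source_owner = N'dbo',
                       @source_object = N'Orders';

    -- Push the publication to the reporting server.
    EXEC sp_addsubscription @publication = N'ReportingPub',
                            @subscriber = N'REPORTSERVER',
                            @destination_db = N'ReportDB',
                            @subscription_type = N'push';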
Going through your list from top to bottom.
Mirroring: If you mirror your data from your mainServer to your reportServer you will not be able to access your reportServer. Mirroring puts the mirrored database into a continuous restoring state. Mirroring is a High Availability solution. In your case the reportServer will only be available to query if you do a fail over. The mirrored server is never operational till fail over. This is not what you want as you cannot use the reportServer till it is operational.
Log Shipping: Log shipping will allow you to apply transaction log backups to the reportServer on a schedule. If you back up the transaction log every 15 minutes and apply it to the reportServer, you will have a delay of 15+ minutes between your mainServer and the log-shipped server. Mirroring is effectively real-time log shipping. Depending on how you set up log shipping, your clients will have to disconnect while the database is busy restoring the log files, so during a long restore it might be impossible to use reporting. Log shipping is also a high-availability feature and not really useful for reporting. See this link for a description of trying to access a database while it is restoring: http://social.msdn.microsoft.com/forums/en-US/sqldisasterrecovery/thread/c6931747-9dcb-41f6-bdf4-ae0f4569fda7
Replication: I am lumping all of the replication types together here. Replication, especially transactional replication, can help you scale out your reporting needs. It is generally much easier to implement, and you can report on the data all of the time, whereas with mirroring you can't report on the data at all and with transaction log shipping you will have gaps. So in your case replication makes much more sense. Snapshot replication would be useful if your reports could be, say, a day old: you can make a snapshot every morning of the data you need from mainServer and publish it to the subscriber, reportServer. However, if the database is extremely large, then snapshots are going to be problematic to deal with on a daily basis. Merge replication is only useful when you want to update the replicated data; in your case you want a read-only copy of the data to report on, so merge replication is not going to help. Transactional replication would allow you to send changes across the wire continuously. In your case, where you need frequently updated information on your reportServer, this would be extremely useful, and I would probably suggest this route for you.
Just remember that by implementing replication/mirroring/log shipping you are creating more maintenance work. Replication CAN fail. So can mirroring, and so can transaction log shipping. You will need to monitor these solutions to make sure they are running smoothly. So the question is: do you really need to scale out your reports to another server, or should you spend time identifying why you can't report on the production server?
Hope that helps!
Warm Standby SQL Server/Web Server
This question might fall into the IT category but as the lead developer I must come up with a solution to integration as well as pure software issues.
We just moved our web server off site and now I would like to keep a warm standby of both the website and database (SQL 2005) on site.
What are the best practices for going about doing this with the following environment?
Offsite: our website and database (SQL 2005) are on one Windows 2003 server. There is a firewall in front of the server which makes replication or database mirroring not an option. A VPN is also not an option.
My thought as a developer was to write a service which runs at the remote site to zip up the database backup and FTP it to an FTP server on site. Then another process would unzip the backup and restore it to the warm standby database here.
I assume I could do this with the web site as well.
I would be open to all ideas including third party solutions.
If you want a remote standby you probably want to look into a log shipping solution.
This article may help you out. In the past I had to develop one of these solutions for the exact same problem, writing it from scratch is not too hard. The advantage you get with log shipping is that you have the ability to restore to any point in time and you avoid shipping these big heavy full backups around and instead ship light transaction log backups, and occasionally a big backup.
You have to keep in mind that transaction log backups are useless without having both the entire sequence of transaction log backups and a full backup.
You have exactly the right idea. You could maybe write a script that would insert the names of the files that you moved into a table that your warm server could read. Your job could then just poll this table at intervals and not worry about timing.
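A minimal sketch of what that tracking table and the polling query might look like (the names here are made up): the copy process inserts a row for each file it has finished transferring, and the restore job on the warm server polls it.

    CREATE TABLE dbo.ShippedLogBackups (
        FileName    nvarchar(260) NOT NULL PRIMARY KEY,
        CopiedAtUtc datetime      NOT NULL DEFAULT (GETUTCDATE()),
        Restored    bit           NOT NULL DEFAULT (0)
    );

    -- Restore job: pick up files that have arrived but not yet been restored.
    SELECT FileName
    FROM dbo.ShippedLogBackups
    WHERE Restored = 0
    ORDER BY CopiedAtUtc;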
Forget about that - just found this. Sounds like what you are setting out to do.
http://www.sqlservercentral.com/articles/Administering/customlogshipping/1201/
I've heard good things about Syncback:
http://www.2brightsparks.com/syncback/sbpro-features.html
Thanks for the link to the article, sambo99. Transaction log shipping was my original idea, but I dismissed it because the servers are not in the same domain, not even in the same time zone. My only method of moving the files from point A to point B is via FTP. I am going to experiment with just shipping the transaction logs and see if there is a way to fire off a restore job at given intervals.
www.FolderShare.com is a good tool from Microsoft. You could log ship to a local directory and then synchronize the directory to any other machine. You could also synchronize the website folders as well.
"Set it and forget it" type solution. Setup your SQL jobs to clear older files and you'll never have to edit anything after the initial install.
FolderShare (free, in beta) is currently limited to 10,000 files per library.
For all interested the following question also ties into my overall plan to implement log shipping:
SQL Server sp_cmdshell
The backup and restore process of a large database or collection of databases on SQL Server is very important for disaster recovery purposes. However, I have not found a robust solution that guarantees the whole process is as efficient as possible, 100% reliable, and easily maintainable and configurable across multiple servers.
Microsoft's Maintenance Plans don't seem to be sufficient. The best solution I have used is one that I created manually, using many jobs with many steps per database, running on the source server (backup) and the destination server (restore). The jobs use stored procedures to do the backup, copying and restoring. This runs once a day (full backup/restore) and intraday every 5 minutes (transaction log shipping).
Although my current process works and reports any job failures via email, I know the whole process isn't very reliable and cannot be easily maintained/configured on all our servers by a non-DBA without having in-depth knowledge of the process.
I would like to know if others have this same backup/restore process and how others overcome this issue.
I've used a similar setup to keep dev/test/QA databases 'zero-stepped' on a nightly basis for developers and QA folks to use.
Documentation is the key - if you want to remove what Scott Hanselman calls 'bus factor' (i.e. the danger that the creator of the system will get hit by a bus and everything starts to suck).
That said, for normal database backups and disaster recovery plans, I've found that SQL Server Maintenance Plans work out pretty well. As long as you include:
1) Decent documentation
2) Routine testing.
I've outlined some of the ways to go about doing that (for anyone drawn to this question looking for an example of how to go about creating a disaster recovery plan):
SQL Server Backup Best Practices (Free Tutorial/Video)
The key part of your question is the ability for the backup solution to be managed by a non-DBA. Any native SQL Server answer like backup scripts isn't going to meet that need, because backup scripts require T-SQL knowledge.
Because of that, you want to look toward third-party solutions like the ones Mitch Wheat mentioned. I work for Quest (the makers of LiteSpeed) so of course I'm partial to that one - it's easy to show to non-DBAs. Before I left my last company, I had a ten minute session to show the sysadmins and developers how the LiteSpeed console worked, and that was that. They haven't called since.
Another approach is using the same backup software that the rest of your shop uses. TSM, Veritas, Backup Exec and Microsoft DPM all have SQL Server agents that let your Windows admins manage the backup process with varying degrees of ease-of-use. If you really want a non-DBA to manage it, this is probably the most dead-easy way to do it, although you sacrifice a lot of performance that the SQL-specific backup tools give you.
I am doing precisely the same thing and have various issues semi-regularly even with this process.
How do you handle the gap between copying the file from Server A to Server B and restoring the transaction log backup on Server B?
Every once in a while the transaction backup is larger than normal and takes a longer time to copy. The restore job then gets an operating system error that the file is in use.
This is not such a big deal, since the file is automatically applied the next time around; however, it would be nicer to have a more elegant solution in general, and one that specifically fixes this issue.