We have a database on SQL Server 2000 which should be truncated from time to time. It looks like the easiest solution would be to create a duplicate database and copy the primary database there. Then the primary database may be safely truncated by specially tailored stored procedures.
One-way replication would guarantee that the backup database contains all updates from the primary one.
We plan to use the backup database for reporting and the primary for operational data.
The primary database will be truncated at night, once every two days.
The database is several gigabytes. Only a few tables are quite large (1-2 million rows).
What are possible pitfalls? How reliable would such a solution be? Will it slow down the primary database?
Update: The variant with DTS for doing the copying sounds good but has its own disadvantages. It requires a fairly robust script which would run for about an hour to copy the updated rows. There is also the issue of integrity constraints in the primary database, which would make truncating it a non-trivial task. Because of this, replication could simplify things considerably.
Using a UNION view is also a possible, but not particularly good, variant, because the system works mostly in unattended mode without dedicated support personnel. That is a related issue, though not a technical one.
While replication is usually robust, there are times when it can break and require a refresh. Managing and maintaining replication can become complicated. Once the primary database is truncated, you'll have to make sure that action is not replicated. You may also need an improved system of row identification, because after you've truncated the primary database tables a couple of times, you'll still have the complete history in your secondary database.
There is a performance hit on the publisher (primary) as extra threads have to run to read the transaction log. Unless you're under heavy load at the moment, you likely won't notice this effect. Transaction log management can become more important also.
Instead, I'd look at a different solution for your problem. For example, before truncating, you can take a backup of the database, and restore it as a new database name. You then have a copy of the database as it was before the truncation, and you can query both at once using three-part names.
You've mentioned that the purpose of the secondary database is to report off of. In this case you can create a view like SELECT * FROM Primary.dbo.Table UNION ALL SELECT * FROM SecondaryDBJune2008.dbo.Table UNION ALL SELECT * FROM SecondaryDBOctober2008.dbo.Table. You would then need to keep this view up to date whenever you perform a truncate.
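For example, a minimal sketch of such a view, using the illustrative database names above (the view name is made up, and Primary/Table need brackets only because they happen to be reserved words):

    CREATE VIEW dbo.vwTableHistory
    AS
    SELECT * FROM [Primary].dbo.[Table]
    UNION ALL
    SELECT * FROM SecondaryDBJune2008.dbo.[Table]
    UNION ALL
    SELECT * FROM SecondaryDBOctober2008.dbo.[Table];
    -- add another UNION ALL branch each time a new secondary database is created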
The other alternative would be to take a snapshot of the current data before truncation and insert it into a single reporting database. Then you'd just have the Primary and the Historical databases - no need to modify views once they're created.
How much data are we talking about in GB?
As you're planning to perform the truncation once every two days, I'd recommend the second alternative, snapshotting the data before truncation into a single Historical database. This can be easily done with a SQL Agent job, without having to worry about replication keeping the two sets of data in synch.
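A rough sketch of what that Agent job step could run, per table (the Historical database and table names here are placeholders; list the columns explicitly if identity columns are involved). Since this is SQL Server 2000, @@ERROR is used instead of TRY/CATCH:

    BEGIN TRAN;

    -- snapshot the rows into the Historical database
    INSERT INTO Historical.dbo.BigTable
    SELECT * FROM [Primary].dbo.BigTable;

    -- only clear the primary copy if the snapshot succeeded
    IF @@ERROR = 0
    BEGIN
        DELETE FROM [Primary].dbo.BigTable;
        COMMIT;
    END
    ELSE
        ROLLBACK;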
I would not use replication for this. We have a fairly complex replication setup running with 80+ branches replicating a few tables to one central database. When connectivity goes down for a few days, the data management issues are hair raising.
If you want to archive older data, rather use DTS. You can then build the copying and truncation/deletion of data into the same DTS package, setting it so that the deletion only happens if the copy was successful.
My main application has nightly SSIS jobs that move some (not all) of the data from the production servers to various test and development servers.
These jobs do nothing fancy. They delete the destination table and repopulate it from the source. There are many dozens of tables and 4 or 5 servers in the mix, with plenty of foreign keys, but everything is SQL to SQL and there is no merging or lookups.
Using SSIS to do this has proven painfully brittle. When a new application release changes the schema, more than half the time, jobs begin failing. Why? The updates are done by the developers, and the packages have been tweaked and changed dozens of times by different developers and often the SSIS changes happen during crunch time of the development cycle.
It has occurred to me that SSIS may or may not be the right tool for this. SSIS is pretty bloated for simple table copies. (Column Mapping, etc.) Is there a better way? Ideally, it would simply take as input (preferably from a central source):
an unordered list of tables (only exact-match column naming needs to be supported)
a source server
a destination server
It would then simply:
- Begin execution on a schedule at the designated server.
- Sort the list referentially (so as not to violate FK constraints).
- Delete/truncate all the destination tables in referential order.
- Copy the tables from source to destination in referential order.
- Report back success or failure.
The only challenging things on the list (I think) would be a good low-maintenance way to schedule the jobs and for the jobs to report back success or failure.
[Note: I'm not looking for technical details. I am simply looking for a lower maintenance, less brittle way to make these data moves happen. I'll post my initial idea below as a possible solution, but I'll be quite happy if there is a simpler solution out there.]
Instead of truncating tables and copying data, you could DROP the tables, and use the Transfer SQL Server Objects Task to copy the table, which would include DDL changes.
Caveat being that you would have to handle foreign keys accordingly.
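If you stay with delete-and-reload instead, one way to get the referential ordering the question asks for is to derive it from the catalog views. A rough sketch, not a complete solution (circular foreign key chains other than self-references would still need manual handling):

    WITH deps AS (
        -- anchor: tables with no outgoing foreign keys (self-references ignored)
        SELECT t.object_id, lvl = 0
        FROM sys.tables t
        WHERE NOT EXISTS (SELECT 1 FROM sys.foreign_keys fk
                          WHERE fk.parent_object_id = t.object_id
                            AND fk.referenced_object_id <> t.object_id)
        UNION ALL
        -- a referencing table sits one level above the table it points to
        SELECT fk.parent_object_id, d.lvl + 1
        FROM sys.foreign_keys fk
        JOIN deps d ON d.object_id = fk.referenced_object_id
        WHERE fk.parent_object_id <> fk.referenced_object_id
    )
    SELECT TableName  = OBJECT_NAME(object_id),
           load_order = MAX(lvl)
    FROM deps
    GROUP BY object_id
    ORDER BY load_order;   -- load in this order; delete/truncate in reverse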
I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too tightly coupled to that database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new empty database, and I only added the tables and columns I need (with different names than the production ones).
Now I need some way to update the new database from the production database regularly (ideally almost immediately).
I hesitated to ask this question on http://dba.stackexchange.com, but I'm not necessarily limited to using only SQL Server for the job (I can develop and run some custom application if needed).
I have already done some searching and found these partial solutions:
Using transactional replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table/column names will be a problem. So I could use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate that database into my new one (it might at least avoid putting too much stress on my production database?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL job or some Windows service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can tell that a row has been updated since my last replication); a rough sketch of this is below
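A rough sketch of what such a job could run for one table; all names here are placeholders, the SELECT is where renamed columns get mapped, and deletes on the source are not handled:

    DECLARE @LastSync datetime;

    SELECT @LastSync = LastSyncDate
    FROM ReportingDb.dbo.SyncLog
    WHERE TableName = 'Customer';

    MERGE ReportingDb.dbo.Customer AS tgt
    USING (SELECT CustomerId   = cust_id,
                  DisplayName  = cust_name,
                  LastEditDate = last_edit_dt
           FROM ProductionDb.dbo.tblCustomer
           WHERE last_edit_dt > @LastSync) AS src
          ON tgt.CustomerId = src.CustomerId
    WHEN MATCHED THEN
        UPDATE SET tgt.DisplayName  = src.DisplayName,
                   tgt.LastEditDate = src.LastEditDate
    WHEN NOT MATCHED THEN
        INSERT (CustomerId, DisplayName, LastEditDate)
        VALUES (src.CustomerId, src.DisplayName, src.LastEditDate);

    -- in practice, record the max LastEditDate actually copied rather than GETDATE()
    UPDATE ReportingDb.dbo.SyncLog
    SET LastSyncDate = GETDATE()
    WHERE TableName = 'Customer';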
Do you have any advice, or maybe some other solutions that I didn't foresee?
Thanks
I think that transactional replication is better than using triggers.
Too many resources would be used on the source server/database, because the triggers fire for each DML transaction.
Transactional replication could be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is...
There is one more thing that you could try - DB mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
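For example, a view along these lines keeps the renaming in one visible place (the table and column names are made up):

    CREATE VIEW dbo.ReportCustomer
    AS
    SELECT CustomerId   = cust_id,
           DisplayName  = cust_name,
           LastEditDate = last_edit_dt
    FROM dbo.Customer;   -- the replicated table, keeping its original production name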
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
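A rough sketch of that daily restore (paths, database names, and logical file names are placeholders):

    BACKUP DATABASE ProductionDb
    TO DISK = N'D:\Backups\ProductionDb.bak'
    WITH INIT;

    RESTORE DATABASE ReportingDb
    FROM DISK = N'D:\Backups\ProductionDb.bak'
    WITH MOVE 'ProductionDb'     TO N'D:\Data\ReportingDb.mdf',
         MOVE 'ProductionDb_log' TO N'D:\Data\ReportingDb_log.ldf',
         REPLACE;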
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1,000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.
What's the best way to track/log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
The short answer is that there is no single solution that fits all cases. It depends on the system and the requirements, but here are a couple of different approaches.
DML Triggers
Relatively easy to implement, because you only have to write one that works well for one table and then apply it to the other tables.
Downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (insert, update and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
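A minimal sketch of one such trigger, writing the before/after images as XML into a single hypothetical audit table (dbo.AuditLog) so the same pattern can be stamped out per table:

    CREATE TRIGGER trg_Customer_Audit
    ON dbo.Customer
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- nothing to log if the statement touched no rows
        IF NOT EXISTS (SELECT 1 FROM inserted) AND NOT EXISTS (SELECT 1 FROM deleted)
            RETURN;

        -- dbo.AuditLog is assumed: (TableName, ChangedAt, ChangedBy, DeletedRows xml, InsertedRows xml)
        INSERT INTO dbo.AuditLog (TableName, ChangedAt, ChangedBy, DeletedRows, InsertedRows)
        SELECT 'dbo.Customer',
               GETDATE(),
               SUSER_SNAME(),
               (SELECT * FROM deleted  FOR XML RAW('row'), ROOT('deleted')),
               (SELECT * FROM inserted FOR XML RAW('row'), ROOT('inserted'));
    END;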
Creating audit triggers in SQL Server
Log changes to database table with trigger
Change Data Capture
Very easy to implement, natively supported, but only in Enterprise edition, which can cost a lot of $ ;). Another disadvantage is that CDC is still not as evolved as it should be. For example, if you change your schema, history data is lost.
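Enabling it looks roughly like this (database and table names are placeholders):

    USE MyDatabase;
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Customer',
         @role_name     = NULL;   -- no gating role

    -- changes are then exposed through the generated cdc change table and the
    -- cdc.fn_cdc_get_all_changes_dbo_Customer function for this capture instance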
Transaction log analysis
The biggest advantage of this is that all you need to do is put the database in full recovery mode, and all the info will be stored in the transaction log.
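For example (the database name and backup path are placeholders):

    ALTER DATABASE MyDatabase SET RECOVERY FULL;
    -- after an initial full backup, keep taking log backups so the log does not grow unbounded
    BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';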
However, if you want to do this correctly you’ll need a third party log reader because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
If you want to implement this, I'd recommend you try out some of the third-party tools that exist out there. I worked with a couple of tools from ApexSQL, but there are also good tools from Idera and Netwrix.
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background and then parses those traces and stores results in central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only store the data in capture tables for a couple of days by default, so you may need an SSIS package to pull it out and store for longer periods.
I don't remember whether there is already some tool for this, but you could always use triggers (inside a trigger you have access to the special inserted and deleted tables that hold the changed rows). Unfortunately, it could be quite a lot of work if you want to track all tables. I believe there should be some simpler solution, but I don't remember one, as I said.
EDIT.
Maybe this could be helpful:
Change tracking:
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't make all that much sense without the logic to glue them together. For instance, knowing that user x inserted a record into the "time_booked" table with a foreign key to the "projects", "users", "time_status" tables may not make all that much sense without the SQL query to glue those 4 tables together.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The SQL Server logs cannot be analyzed just like that. There are some 3rd-party tools available to read the logs, but as far as I know you can't query them for statistics and such. If you need that kind of info, you'll have to create some sort of auditing to capture all these events in separate tables. You can use "DDL triggers".
I am new to designing an ETL process. Currently I have two databases: one is the live database, which the application uses for its everyday transactions. The other one is the data warehouse.
I have a table in the live database that regularly has new data inserted into it. The goal is that every night the ETL process will transfer the data in the live database to the data warehouse, followed by deleting the data in the live database.
Due to my lack of knowledge, the solution that I came up with is to implement something called a rolling table. Basically, on the live database I have two tables with the same structure; I call them tblLive1 and tblLive2. I also have a synonym called tblLive. All inserts are done through the synonym, and the synonym points at one of the two tables.
When I run the ETL process, I have a stored procedure that drops and recreates the synonym so that it points to tblLive2. This allows the ETL process to transform the data from tblLive1 without affecting the application. The assumption is that the ETL process takes an hour to run, and I don't want the ETL process to lock the table, preventing the application from inserting new data into it.
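The swap itself is roughly just (using the names above):

    -- repoint the synonym at the other table so the application keeps inserting
    -- into tblLive2 while the ETL reads the previously active tblLive1
    BEGIN TRAN;
    IF OBJECT_ID('dbo.tblLive', 'SN') IS NOT NULL
        DROP SYNONYM dbo.tblLive;
    CREATE SYNONYM dbo.tblLive FOR dbo.tblLive2;
    COMMIT;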
This solution should theoretically work, but it is not elegant.
I am sure this problem is a common problem, are there any other solutions out there?
To add to Bob's answer (above): it is usual in DWH/BI applications that all necessary tables are copied into a "staging" database or a "staging" schema on your DWH database (depending on the number of tables, size, etc.). These would ordinarily be on a different server from your OLTP system, at least for a DWH implementation of any size.
To answer the question on performance impact, it depends on your server spec/io configuration.
Is data being inserted into the OLTP system 24hours/day? or are there downtimes? or low traffic times?
It might be worthwhile using database compression as IO is going to be your biggest enemy and this will help considerably.
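For example, page compression on the large staging table, if you are on SQL Server 2008 Enterprise (the table name is a placeholder):

    ALTER TABLE dbo.StagingBigTable REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);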
Read the table into a staging area and process the staging table. You usually want to spend as little time on the production system as you have to, especially if it is in use.
You may also want to look into using tables loaded by a trigger. Or Change Data Capture if you are on SQL 2008
I have a very large (100+ gigs) SQL Server 2005 database that receives a large number of inserts and updates, with less frequent selects. The selects require a lot of indexes to keep them functioning well, but it appears the number of indexes is affecting the efficiency of the inserts and updates.
Question: Is there a method for keeping two copies of a database where one is used for the inserts and updates while the second is used for the selects? The second copy wouldn't need to be real-time updated, but shouldn't be more than an hour old. Is it possible to do this kind of replication while keeping different indexes on each database copy? Perhaps you have other solutions?
You're looking to set up a master/child database topology using replication. With SQL Server you'll need to set up replication between two databases (preferably on separate hardware). The master DB should be used for inserts and updates. The child will service all your select queries. You'll also want to optimize both databases' configuration settings for the type of work they will be performing. If you have heavy select queries on the child database, you may also want to set up views that will make the queries perform better than complex joins on tables.
Some reference material on replication:
http://technet.microsoft.com/en-us/library/ms151198.aspx
Just google it and you'll find plenty of information on how to set it up and configure it:
http://search.aim.com/search/search?&query=sql+server+2005+replication&invocationType=tb50fftrab
Transactional replication can do this, as the subscriber can have a number of additional indexes compared with the publisher. But you have to bear in mind a simple fact: all inserts/updates/deletes are going to be replicated to the reporting copy (the subscriber), and the additional indexes will... slow down replication. It is actually possible to slow replication down to a rate at which it is unable to keep up, causing the distribution DB to swell. But this only happens when you have a constantly high rate of updates. If the problems only occur during spikes, then the distribution DB will act as a queue that absorbs the spikes and levels them off during off-peak hours.
I would not take on this endeavour without absolute, 100% proof that it is the additional indexes that are slowing down the inserts/updates/deletes, and without testing that the inserts/updates/deletes actually perform significantly better without the extra indexes. Specifically, ensure that the culprit is not the other usual suspect: lock contention.
Generally, all set-based operations (including updating indexes) are faster than non-set-based ones:
1,000 single-row inserts will most probably be slower than one insert of 1,000 records.
You can batch the updates to the second database. This will, first, make the index updating faster and, second, smooth out the peaks.
You could schedule a bcp script to copy the data to the other DB.
You could also try transaction log shipping to update the read only db.
Don't forget to adjust the fill factor when you create your two databases. It should be low(er) on the database with frequent updates, and 100 on your "data warehouse"/read only database.
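For example (the index and table names are made up; run one statement on each copy):

    -- on the frequently updated copy: leave free space on the index pages
    CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId)
    WITH (FILLFACTOR = 80);

    -- on the read-only/reporting copy: pack the pages full
    CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId)
    WITH (FILLFACTOR = 100);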