Experiencing a 4-minute slowdown when running reports against a SQL Server replication database that was created specifically for reporting.
Prod runs fine; the replicated database used to respond in under a minute and now takes 4 minutes.
We did two things prior to the slowdown:
Truncated the log file from 400 GB to 100 MB
Re-created the replication job after new data stopped arriving on Monday
Everything was working on Friday. From what I can see, the replicated database is smaller than prod, as we don't use all of the prod data for reports. I suspect it's related to execution plans being recreated when the new replication job was created, but that seems very odd. Any ideas?
It's likely that your replicated database doesn't have the same indexes as the primary database. Check that primary key constraints are being replicated (in the article properties), and check that indexes are being replicated.
Take a look at all the indexes and keys in the replicated database and compare them to the source database. It sounds highly likely that they're different.
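A quick way to do that comparison is to run a query along these lines in both databases and diff the results (a generic sketch, not specific to your schema):

```sql
-- List every index with its key columns; run in both databases and compare.
SELECT t.name AS table_name,
       i.name AS index_name,
       i.type_desc,
       i.is_primary_key,
       i.is_unique,
       STUFF((SELECT ', ' + c.name
              FROM sys.index_columns ic
              JOIN sys.columns c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id
                AND ic.index_id = i.index_id
                AND ic.is_included_column = 0
              ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, '') AS key_columns
FROM sys.indexes i
JOIN sys.tables t ON t.object_id = i.object_id
WHERE i.type > 0  -- exclude heaps
ORDER BY t.name, i.name;
```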
I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
To avoid being too tightly coupled to the existing database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new, empty database to which I only added the tables and columns I need (with different names than the production ones).
Now I need some way to update the new database from the production database regularly (ideally in near real time).
I hesitated to ask this question on http://dba.stackexchange.com, but I'm not necessarily limited to SQL Server for the job (I can develop and run a custom application if needed).
I have already done some research and found these partial solutions:
Using transactional replication to create a smaller database (with only the tables/columns I need). As far as I can see, though, my different table/column names will be problematic. So I could use it to create a smaller database that SQL Server replicates automatically, but I would still need to replicate that database into my new one (although it might keep the load off my production database).
Using triggers to insert/update/delete the rows
Creating a custom job (either a SQL Agent job or a Windows service that runs every X minutes) that updates the necessary tables. I have a LastEditDate column that is updated by a trigger on my tables, so I can tell which rows have changed since the last sync; a sketch of this approach is below.
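For illustration, the polling job could run something like this per table (the table and control-table names here are hypothetical; the last sync time would be persisted between runs):

```sql
-- Hypothetical incremental sync: pull rows changed since the last run.
DECLARE @LastSync datetime2 =
    (SELECT LastSyncTime FROM ReportingDb.dbo.SyncControl
     WHERE TableName = N'Customer');
DECLARE @Now datetime2 = SYSDATETIME();

MERGE ReportingDb.dbo.Customer AS tgt
USING (SELECT Id, Name, Email
       FROM ProdDb.dbo.Customer
       WHERE LastEditDate > @LastSync) AS src
ON tgt.Id = src.Id
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name, tgt.Email = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name, Email) VALUES (src.Id, src.Name, src.Email);

UPDATE ReportingDb.dbo.SyncControl
SET LastSyncTime = @Now
WHERE TableName = N'Customer';
```

Note that LastEditDate alone cannot detect deletes; those would still need a trigger or a soft-delete flag.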
Do you have any advice, or maybe other solutions that I didn't foresee?
Thanks
I think transactional replication is better than using triggers.
Triggers fire on every DML statement, so they would consume too many resources on the source server/database.
Transactional replication can be scheduled as a SQL Agent job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source database is...
One more thing you could try is database mirroring; whether it's an option depends on your SQL Server version.
If it were me, I'd use transactional replication but keep the table/column names the same. If you have a real reason why they need to change (I honestly can't think of any good ones, and a lot of bad ones), wrap each table in a view; at least that way the view documents where the data is coming from.
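For example, a thin mapping view might look like this (the names are made up for illustration):

```sql
-- The replicated table keeps its production name; the view exposes
-- the names the new application expects.
CREATE VIEW dbo.Clients
AS
SELECT CustomerId   AS ClientId,
       CustomerName AS ClientName,
       CreatedOn    AS RegisteredOn
FROM dbo.Customer;
```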
I'm going to throw this out there and say that I'd use transaction log shipping. You can even set the secondary databases to read-only. There is some setup involved (full recovery model and transaction log backups), but after that you can automatically restore the transaction logs to the secondary database and be hands-off with it, and the secondary will be as current as your last transaction log backup.
Depending on how current the data needs to be, if daily is enough you can set up something that takes your daily backups and simply restores them to the secondary.
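Mechanically, each cycle of that restore boils down to something like the following (paths and database names are placeholders; WITH STANDBY is what leaves the secondary readable between restores):

```sql
-- On the primary: back up the transaction log.
BACKUP LOG ProdDb
TO DISK = N'\\backupshare\ProdDb_log.trn';

-- On the reporting server: apply it, leaving the database read-only.
-- WITH NORECOVERY would instead keep the database inaccessible.
RESTORE LOG ProdDb_Report
FROM DISK = N'\\backupshare\ProdDb_log.trn'
WITH STANDBY = N'D:\standby\ProdDb_undo.dat';
```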
In the end, we went with the trigger solution. We don't have that many changes per day (maybe 500, 1,000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.
We've got a project site where we have to replicate a legacy database system into SQL Server 2008 on a nightly basis.
We are using the SQL DataWizard tool from Maestro to do the job, and because we cannot get an accurate delta every night, it was decided that we would dump the previous SQL Server database and take a fresh snapshot every night. Several million rows in about 10 different tables. The snapshot takes about 2 hours to run.
Now, we also need to create some custom indexes on the snapshot copy of the data, so that certain BI tools can query the data quickly.
My question is: is it more efficient to create the tables AND the indexes before the snapshot copy is run, or to create just the table structures first, run the snapshot copy, and then create the indexes after the tables are populated?
Is there a performance difference between SQL Server building the indexes WHILE adding rows versus adding all the rows first and then creating the indexes on the final data set?
Just trying to work out which way will result in less database server CPU overhead.
When you perform snapshot replication, the first task is to bulk copy the data. Only after the data has been copied are the primary and secondary indexes added; the indexes don't exist until that second step completes. So no, there is nothing to be gained by applying the indexes yourself after the snapshot; that's already how it works.
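If you end up rolling your own load (as with the DataWizard dump) rather than using built-in snapshot replication, the same load-first, index-after pattern is straightforward to script; a sketch with illustrative names:

```sql
-- Drop yesterday's index, load the fresh snapshot, then rebuild it.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = N'IX_SalesSnapshot_OrderDate'
             AND object_id = OBJECT_ID(N'dbo.SalesSnapshot'))
    DROP INDEX IX_SalesSnapshot_OrderDate ON dbo.SalesSnapshot;

TRUNCATE TABLE dbo.SalesSnapshot;

INSERT INTO dbo.SalesSnapshot (OrderId, CustomerId, OrderDate, Amount)
SELECT OrderId, CustomerId, OrderDate, Amount
FROM LegacyStaging.dbo.Sales;

-- Building the index once over the full data set is generally cheaper
-- than maintaining it row by row during a multi-million-row load.
CREATE NONCLUSTERED INDEX IX_SalesSnapshot_OrderDate
    ON dbo.SalesSnapshot (OrderDate) INCLUDE (Amount);
```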
We have a table in our SQL Server database which holds "today's" data and is updated around the clock by many scheduled jobs. Each job deletes the rows it previously inserted and inserts new rows. The table data is also exposed via a web site which runs many queries against it. The problem is that the indexes are constantly fragmented; although the table only has 1.5M rows, queries are generally very slow and the website frequently times out.
So I would like to know if anyone else has experienced a similar scenario and if so how did you deal with it.
You need to reorganize (or rebuild) the indexes on a daily basis; this can be set up as a scheduled maintenance job from SSMS or run as a T-SQL script.
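As a sketch (the table name is a placeholder; the 30% threshold is the commonly cited rule of thumb for choosing REBUILD over REORGANIZE):

```sql
-- Check fragmentation on the hot table and defragment accordingly.
DECLARE @frag float;

SELECT @frag = MAX(avg_fragmentation_in_percent)
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.TodayData'), NULL, NULL, 'LIMITED');

IF @frag >= 30
    ALTER INDEX ALL ON dbo.TodayData REBUILD;
ELSE
    ALTER INDEX ALL ON dbo.TodayData REORGANIZE;
```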
I have transactional replication with updatable subscriptions going between a few SQL 2008 R2 servers (publisher is Enterprise, subscribers are Express).
I need to add another subscriber, only to discover that my database has outgrown the 10GB limit for Express. My current subscribers are under the 10GB limit; however, the publishing database is 13GB.
So I deleted some large unused columns and data from the largest tables, ran DBCC CLEANTABLE and UPDATE STATISTICS on them, and the tables went down in size a bit, so I thought I was good to go!
However, the publishing database is still a good 11.5GB while the subscribers all went down to 8GB.
I compared table sizes between the publisher and the subscribers, and the few largest tables I had deleted data from are larger in the publishing database than in the subscribing databases - by a couple of gigs.
I compared table structures and used RedGate's Data Compare - the tables are identical between the publisher and subscribers, so I am at a loss. I don't know what is causing the discrepancy, let alone how to resolve it so I can add another subscriber (without having to buy SQL Standard licenses for it). I have a feeling it has to do with being the publisher, and that the row count has grown significantly within the last year.
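A query along these lines (not the exact one used, but equivalent) gives the per-table sizes on each side:

```sql
-- Per-table space usage; run on both publisher and subscriber and compare.
SELECT t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
       SUM(ps.used_page_count)     * 8 / 1024 AS used_mb,
       SUM(CASE WHEN ps.index_id IN (0, 1)
                THEN ps.row_count ELSE 0 END)  AS total_rows
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
GROUP BY t.name
ORDER BY reserved_mb DESC;
```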
As a side note - I do also have a couple of SQL Standard 2008 licenses; however, they're 2008, not 2008 R2, and therefore incompatible for initializing the subscriber from a backup. The sites have slow connections, so I have always initialized replication from backups.
Would it be possible to drop the replication and recreate it? Replication always seems to be finicky, and it might still have remnants of the dropped columns out there (where you can't see them).
Remember you can script out the replication so you don't have to start from scratch.
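If you try that, the teardown side on the publisher is roughly the following (the publication name is a placeholder; script the publication out from SSMS first so it can be recreated):

```sql
-- Remove the subscriptions and the publication (placeholder names).
USE PubDb;
EXEC sp_dropsubscription @publication = N'MyPublication',
                         @article = N'all',
                         @subscriber = N'all';
EXEC sp_droppublication @publication = N'MyPublication';

-- Optionally disable publishing on the database entirely.
EXEC sp_replicationdboption @dbname = N'PubDb',
                            @optname = N'publish',
                            @value = N'false';
```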
We have 70+ SQL Server 2008 databases that need to be copied from an OLTP environment to a separate reporting server. Once the DBs are copied, we will do some partial data transformation: de-normalization, row-level security, etc.
SSRS Reports will be written based on these static denormalized tables and views.
We have a small nightly window for copying and transforming all 70 databases (3 hours).
Currently databases average about 10GB.
Options:
1. Transactional replication:
We would need to create 100+ static denormalized tables on each reporting database.
Doing this for all 70 databases almost reaches our nightly time limit.
As the databases grow we will exceed the time limit. We thought of mixing denormalized tables with views to speed up the transformation, but then some of the data would be dynamic and some static, which is not a solution we can use.
Also with 70 databases using transactional replication we are concerned about bandwidth usage.
2. Snapshot replication:
Copy the entire database each night.
This means we could have a mixture of denormalized tables and views so the data transformation process is quicker.
But the snapshot is a full data copy, so as the DBs grow, we will exceed our time limit for completing the copy and transformation.
3. Log shipping:
In our nightly window, we could use the log shipping to update the reporting databases, then truncate and repopulate the denormalized tables and use some views.
However, I understand that with log shipping, extra tables and views cannot be added to the subscribing database.
4. Mirroring:
Mirroring is being deprecated, and the mirrored database is not available for reporting until failover.
5. SQL Server 2012 AlwaysOn.
We don't have SQL Server 2012 yet. Can this be configured to update once a day instead of in real time?
And can extra tables and views be created on the subscribing database (our reporting databases)?
6. Merge replication:
This is meant to be for combining multiple data sources into one database.
But it looks like it allows for a scheduled update (once per day) and only sends the subscriber the latest changes rather than a full snapshot.
It requires adding a rowversion column to every table, but we could handle this. Also, with this solution, could additional tables be created on the subscriber database without the updates getting out of sync?
The final option is to use SSIS to select only the data we need from the OLTP databases. I think this option creates more risk, as we would have to handle inserts/updates/deletes against our denormalized tables rather than just dropping and recreating them daily (a sketch of that nightly rebuild follows below).
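For clarity, the drop-and-recreate refresh we have in mind for each denormalized table is essentially this (illustrative names):

```sql
-- Nightly rebuild of one denormalized table from the copied OLTP data;
-- no need to track individual inserts/updates/deletes.
TRUNCATE TABLE Reporting.dbo.OrderFlat;

INSERT INTO Reporting.dbo.OrderFlat (OrderId, CustomerName, ProductName, Amount)
SELECT o.OrderId, c.Name, p.Name, o.Amount
FROM Oltp.dbo.Orders o
JOIN Oltp.dbo.Customers c ON c.CustomerId = o.CustomerId
JOIN Oltp.dbo.Products  p ON p.ProductId  = o.ProductId;
```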
Any help on our options would be greatly appreciated.
If I've made any incorrect assumptions, please say.
If it were me, I'd go with transactional replication that runs continuously and have views (possibly indexed) at the subscriber. This has the advantage of not having to wait for the data to come over since it's always coming over.