Replicating a SQL Server database for read access - sql-server

I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
To avoid coupling the new application too tightly to the production database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new empty database and added only the tables and columns I need (with different names than in production).
Now I need some way to keep the new database updated from the production database regularly (ideally almost immediately).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I have already done some research and found these (partial) solutions:
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, having different table and column names will be problematic. So I can use it to create a smaller database that SQL Server replicates automatically, but I would still need to replicate that database to my new one (although it might at least reduce the load on my production database?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
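A minimal sketch of the LastEditDate-based job from the last option, written in Python with SQLite standing in for SQL Server so it runs anywhere. The table names (Customer, report_customer), the column mapping and the sync_changes helper are all hypothetical; only the LastEditDate column comes from the question.

```python
import sqlite3

# Hypothetical mapping: production column name -> reporting column name.
# LastEditDate is kept last so the watermark can be read from the row tail.
COLUMN_MAP = {"CustId": "customer_id", "CustName": "name", "LastEditDate": "last_edit_date"}


def sync_changes(src, dst, last_sync):
    """Copy rows edited after last_sync and return the new high-water mark."""
    src_cols = ", ".join(COLUMN_MAP)
    dst_cols = ", ".join(COLUMN_MAP.values())
    marks = ", ".join("?" for _ in COLUMN_MAP)
    rows = src.execute(
        f"SELECT {src_cols} FROM Customer WHERE LastEditDate > ? ORDER BY LastEditDate",
        (last_sync,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key.
    dst.executemany(
        f"INSERT OR REPLACE INTO report_customer ({dst_cols}) VALUES ({marks})", rows
    )
    dst.commit()
    return rows[-1][-1] if rows else last_sync


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE Customer (CustId INTEGER PRIMARY KEY, CustName TEXT, LastEditDate TEXT)")
dst.execute("CREATE TABLE report_customer (customer_id INTEGER PRIMARY KEY, name TEXT, last_edit_date TEXT)")
src.execute("INSERT INTO Customer VALUES (1, 'Ada', '2024-01-01T10:00:00')")
src.execute("INSERT INTO Customer VALUES (2, 'Bob', '2024-01-01T11:00:00')")

mark = sync_changes(src, dst, "")  # initial full copy
src.execute("UPDATE Customer SET CustName = 'Bobby', LastEditDate = '2024-01-02T09:00:00' WHERE CustId = 2")
mark = sync_changes(src, dst, mark)  # only the edited row is copied
```

Note that a date-based watermark like this never sees deleted rows, and rows sharing the exact watermark timestamp could be missed, so deletes would need separate handling.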
Do you have any advice, or maybe some other solutions that I didn't foresee?
Thanks

I think transactional replication is better than using triggers.
Triggers would consume too many resources on the source server/database, because they fire on every DML transaction.
Transactional replication could be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is...
There is one more thing that you could try: DB mirroring. It depends on your SQL Server version.

If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
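A small sketch of the "wrap each table in a view" idea, using Python and SQLite as a stand-in for SQL Server. The names tblCust, CustNm and Customer are made up for illustration; the point is only that the replicated table keeps its production names while the view exposes the new ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The replicated table keeps the production names (hypothetical here).
conn.execute("CREATE TABLE tblCust (CustId INTEGER PRIMARY KEY, CustNm TEXT, Obsolete INTEGER)")
conn.execute("INSERT INTO tblCust VALUES (1, 'Ada', 0)")

# The view exposes only the needed columns under the new names,
# and doubles as documentation of where each column comes from.
conn.execute("""
    CREATE VIEW Customer AS
    SELECT CustId AS Id, CustNm AS Name
    FROM tblCust
""")

rows = conn.execute("SELECT Id, Name FROM Customer").fetchall()
print(rows)
```

The reporting application would then query only the view, so the mapping lives in one place.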

I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.

In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1000 at most), and it didn't put too much pressure on the current database. Thanks for your advice.
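For reference, a trigger-based mirror with renamed tables and columns could look roughly like this. It is a sketch in Python with SQLite, so both tables sit in one database and the syntax differs from T-SQL; tblCust, Customer and the trigger names are hypothetical and not taken from the actual solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCust (CustId INTEGER PRIMARY KEY, CustNm TEXT);  -- "production" table
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);       -- reporting copy

CREATE TRIGGER trg_cust_insert AFTER INSERT ON tblCust BEGIN
    INSERT INTO Customer (Id, Name) VALUES (NEW.CustId, NEW.CustNm);
END;
CREATE TRIGGER trg_cust_update AFTER UPDATE ON tblCust BEGIN
    UPDATE Customer SET Id = NEW.CustId, Name = NEW.CustNm WHERE Id = OLD.CustId;
END;
CREATE TRIGGER trg_cust_delete AFTER DELETE ON tblCust BEGIN
    DELETE FROM Customer WHERE Id = OLD.CustId;
END;
""")

conn.execute("INSERT INTO tblCust VALUES (1, 'Ada')")
conn.execute("INSERT INTO tblCust VALUES (2, 'Bob')")
conn.execute("UPDATE tblCust SET CustNm = 'Bobby' WHERE CustId = 2")
conn.execute("DELETE FROM tblCust WHERE CustId = 1")

mirror = conn.execute("SELECT Id, Name FROM Customer").fetchall()
```

Each change to the source table is applied to the copy inside the same transaction, which is why the cost scales with the number of DML statements.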

Related

Detect Table Changes In A Database Without Modifications

I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free rein to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an extended events session on databaseB that watches for all the procedures on databaseA that can insert into the table or any sql statements that run against the table on databaseA (using a LIKE '%your table name here%').
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic that functionality by writing the XE events to a custom user table every minute or so using a SQL Server Agent job.
Your session would monitor databaseA and write the XE output to databaseB. You would then write a trigger that, on each XE output write, compares the two tables and writes any differences to your log table. This would be a nonstop running process, but it is still kind of a periodic scan in a way: the XE only writes when the event happens, but a check is still running every couple of seconds.
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We recently started using StreamSets Data Collector for Oracle CDC, but it also has SQL Server CDC. There are many competing technologies, including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build realtime data pipelines between databases at the schema level. Until now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can copy all the tables in a schema in near real time in one StreamSets pipeline, provided I have already deployed the DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided by the time of writing this answer (refer Dan Flippo and dfundaka) but found that the first - using Change Data Capture - required modification to the database and the second - using Extended Events - wasn't really a complete answer, though it got me thinking of other options.
The option that seems cleanest, and doesn't require any database modification, is to use SQL Server Dynamic Management Views. This set of system views and functions includes several that expose server process history (in this case INSERTs and UPDATEs), such as sys.dm_exec_sql_text and sys.dm_exec_query_stats, which contain records of executed statements (and are, in fact, what Extended Events seems to be based on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.

Viewing database records realtime in WPF application

disclaimer: I must use a microsoft access database and I cannot connect my app to a server to subscribe to any service.
I am using VB.net to create a WPF application. I am populating a listview based on records from an access database which I query one time when the application loads and I fill a dataset. I then use LINQ to dataset to display data to the user depending on filters and whatnot.
However.. the access table is modified many times throughout the day which means the user will have "old data" as the day progresses if they do not reload the application. Is there a way to connect the access database to the VB.net application such that it can raise an event when a record is added, removed, or modified in the database? I am fine with any code required IN the event handler.. I just need to figure out a way to trigger a vb.net application event from the access table.
Think of what I am trying to do as viewing real-time edits to a database table, but within the application.. any help is MUCH appreciated and let me know if you require any clarification - I just need a general direction and I am happy to research more.
My solution idea:
Create an audit table for MS Access changes
Create a separate worker thread within the user's application to query the audit table for changes every 60 seconds
If changes are found, it will modify the affected dataset records
Raise an event on dataset record update to refresh any affected objects/properties
Couple of ways to do what you want, but you are basically right in your process.
As far as I know, there is no direct way to get events from the database drivers to let you know that something changed, so polling is the only solution.
If the MS Access database is an Access 2010 ACCDB database, and you are using the ACE drivers for it (if Access is not installed on the machine where the app is running), you can use the new data macro triggers to automatically record changes to the tables in an audit table that would record new inserts, updates, deletes, etc. as needed.
This approach is the best since these happen at the ACE database driver level, so they will be as efficient as possible and transparent.
If you are using older versions of Access, then you will have to implement the auditing yourself. Allen Browne has a good article on that. A bit of search will bring other solutions as well.
You can also just run some query on the tables you need to monitor
In any case, you will need to monitor your audit or data table as you mentioned.
You can monitor for changes much more frequently than every 60s. Depending on the load on the database, the number of clients, etc., you could easily check every few seconds.
I would recommend though that you:
Keep a permanent connection to the database while your app is running: open a dummy table for reading, and don't close it until you shut down your app. This has no performance cost to anyone, but it will ensure that the expensive lock file creation is done only once, and not for every query you run. This can have a huge performance impact. See this article for more information on why.
Make it easy for your audit table (or for your data table) to be monitored: include a timestamp column that records when a record was created and last modified. This makes checking for changes very quick and efficient: you just need to check if the most recent record modified date matches the last one you read.
With Access 2010, it's easy to add the trigger to do that. With older versions, you'll need to do that at the level of the form.
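To illustrate the timestamp-based check, here is a rough polling sketch. It uses Python and SQLite rather than VB.NET and Access, and the audit table layout (id, action, modified_at) and the ChangePoller class are assumptions made for the example. In a real application, poll() would be called from a timer or worker thread every few seconds.

```python
import sqlite3


class ChangePoller:
    """Detects changes by remembering the newest modified_at value seen so far."""

    def __init__(self, conn, on_change):
        self.conn = conn
        self.on_change = on_change  # callback that receives the changed rows
        self.last_seen = ""         # ISO timestamps sort correctly as text

    def poll(self):
        """Fire the callback and return True if rows newer than last_seen exist."""
        rows = self.conn.execute(
            "SELECT id, action, modified_at FROM audit "
            "WHERE modified_at > ? ORDER BY modified_at",
            (self.last_seen,),
        ).fetchall()
        if not rows:
            return False
        self.last_seen = rows[-1][2]
        self.on_change(rows)
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (id INTEGER, action TEXT, modified_at TEXT)")
seen = []
poller = ChangePoller(conn, seen.extend)

conn.execute("INSERT INTO audit VALUES (7, 'update', '2024-01-01T10:00:00')")
first = poller.poll()   # picks up the new audit row
second = poller.poll()  # nothing newer than last_seen, so no callback
```

The callback is where the application would refresh the affected dataset records and raise its own update event.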
If you are using SQL Server
Up to SQL 2005 you could use Notification Services
Since SQL Server 2008 R2 it has been replaced by StreamInsight
Other database management systems and alternatives
Oracle
Handle changes in a middle tier and signal the client
Or poll. This requires you to configure the interval so you do not miss out on a change too long.
In general
When a server has to be able to send messages to clients, it needs to keep a channel/socket open to each client, and this can become very expensive when there are a lot of clients. I would advise against a server push and would try intelligent polling instead. Intelligent polling means an interval that is as big as possible, plus appropriate caching on the server to prevent hitting the database too many times for the same data.
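The caching half of "intelligent polling" could be sketched like this in Python: many clients polling within the same interval share one database read. TtlCache and load_from_db are hypothetical names, and the injectable clock is only there so the example can be run without waiting.

```python
import time


class TtlCache:
    """Serves a cached value until it is older than ttl_seconds, then reloads it."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # function that actually hits the database
        self.ttl = ttl_seconds
        self.clock = clock
        self.value = None
        self.expires_at = None    # None means nothing has been loaded yet

    def get(self):
        now = self.clock()
        if self.expires_at is None or now >= self.expires_at:
            self.value = self.loader()
            self.expires_at = now + self.ttl
        return self.value


calls = []


def load_from_db():
    # Stand-in for the expensive query; counts how often it runs.
    calls.append(1)
    return f"snapshot {len(calls)}"


fake_now = [0.0]
cache = TtlCache(load_from_db, ttl_seconds=30, clock=lambda: fake_now[0])
a = cache.get()      # first call runs the loader
b = cache.get()      # served from the cache
fake_now[0] = 31.0   # pretend 31 seconds have passed
c = cache.get()      # entry expired, loader runs again
```

The TTL would normally be matched to the polling interval, so each interval costs at most one query regardless of the number of clients.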

Log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008

What's the best way to track/log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
The short answer is that there is no single solution that fits all. It depends on the system and the requirements, but here are a couple of different approaches.
DML Triggers
Relatively easy to implement, because you only have to write one that works well for one table and then apply it to other tables.
Downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (insert, update and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
Creating audit triggers in SQL Server
Log changes to database table with trigger
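As a rough illustration of "write one trigger and apply it to other tables", the triggers can be generated from a template. This sketch uses Python with SQLite, so the trigger syntax differs from T-SQL. The audit_log layout, the table names and the assumption that every table has an id column are invented for the example.

```python
import sqlite3

# One template, stamped out per table and per operation.
TRIGGER_TEMPLATE = """
CREATE TRIGGER audit_{table}_{op_l} AFTER {op} ON {table} BEGIN
    INSERT INTO audit_log (table_name, operation, row_id)
    VALUES ('{table}', '{op}', {ref}.id);
END;
"""


def install_audit_triggers(conn, tables):
    """Create insert/update/delete audit triggers for each table name given.

    Table names are interpolated into SQL, so pass only trusted names.
    """
    for table in tables:
        for op, ref in (("INSERT", "NEW"), ("UPDATE", "NEW"), ("DELETE", "OLD")):
            conn.executescript(
                TRIGGER_TEMPLATE.format(table=table, op=op, op_l=op.lower(), ref=ref)
            )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (table_name TEXT, operation TEXT, row_id INTEGER);
CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
""")
install_audit_triggers(conn, ["projects", "users"])

conn.execute("INSERT INTO projects VALUES (1, 'Alpha')")
conn.execute("UPDATE projects SET title = 'Beta' WHERE id = 1")
conn.execute("INSERT INTO users VALUES (5, 'Ada')")
conn.execute("DELETE FROM users WHERE id = 5")

log = conn.execute(
    "SELECT table_name, operation, row_id FROM audit_log ORDER BY rowid"
).fetchall()
```

Generating the triggers from a script like this is one way to keep hundreds of them consistent, although it does not remove the performance cost mentioned above.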
Change Data Capture
Very easy to implement, natively supported but only in enterprise edition which can cost a lot of $ ;). Another disadvantage is that CDC is still not as evolved as it should be. For example, if you change your schema, history data is lost.
Transaction log analysis
Biggest advantage of this is that all you need to do is to put the database in full recovery mode and all info will be stored in transaction log
However, if you want to do this correctly you’ll need a third party log reader because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
If you want to implement this I’d recommend you try out some of the third party tools that exist out there. I worked with couple tools from ApexSQL but there are also good tools from Idera and Netwrix
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background and then parses those traces and stores results in central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only store the data in capture tables for a couple of days by default, so you may need an SSIS package to pull it out and store for longer periods.
I don't remember whether there is already a tool for this, but you could always use triggers (you will then have access to the special tables holding the changed rows, INSERTED and DELETED). Unfortunately, it could be quite a lot of work if you want to track all tables. I believe there should be a simpler solution, but as I said, I don't remember one.
EDIT.
Maybe this could be helpful:
--Change tracking
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't make all that much sense without the logic to glue them together. For instance, knowing that user x inserted a record into the "time_booked" table with a foreign key to the "projects", "users", "time_status" tables may not make all that much sense without the SQL query to glue those 4 tables together.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The SQL Server logs cannot be analyzed directly. There are some third-party tools available to read the logs, but as far as I know you can't query them for statistics and such. If you need this kind of info, you'll have to create some sort of auditing to capture all these events in separate tables. You can use "DDL triggers".

Efficient way to delete records every 10 mins

Problem at hand
Need to delete a few thousand records every 10 minutes from a SQL Server database table. This is part of a cleanup of older records.
Solutions under consideration
There's .Net Service running for some other functionality. Same service can be used with a timer to execute SQL delete command on db.
SQL server job
Trigger
Key consideration for providing solution
Ours is a web product which gets deployed at different client locations. We want minimal operational overhead, as the people doing deployments have very limited technical skills, and we also want to make sure that our product requires little to no configuration.
Performance is very important, as this runs on a live transactional database.
This sounds like exactly the sort of work that a SQL Server job was intended to provide; database maintenance.
A scheduled job can execute a basic T-SQL statement that will delete the records you don't want any more, on whatever schedule you want it to run on. The job creation can be scripted to be part of your standard deployment scripts, which should negate the deployment costs.
Additionally, by utilizing an established part of SQL Server, you capitalize on the knowledge of other database administrators that will understand SQL jobs and be able to manage them.
I would not use a trigger...and stick with SQL Server DTS or SSIS. Obviously you will need some kind of identifier so I would use a timestamp column with an index...if that's not required just fire off a TRUNCATE once nightly.
The efficiency of the delete comes from indexes and has nothing to do with how the timer is triggered. It is very important that the 'old' records be easily identifiable by a range scan. If the DELETE has to scan the whole table to find these 'old' records, it will block all other activity. Usually in such cases the table is clustered by the datetime value first, and unique primary keys are delegated to a non-clustered index, if needed.
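A hedged sketch of an index-friendly cleanup, in Python with SQLite standing in for SQL Server. The events table, the created_at column and purge_old_rows are hypothetical names. Deleting in small batches is an extra assumption on my part and is not something the answer prescribes (in T-SQL the usual equivalent would be DELETE TOP (n) in a loop).

```python
import sqlite3


def purge_old_rows(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff in small batches; return the total deleted."""
    total = 0
    while True:
        # The inner SELECT is a range scan on the created_at index.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            " SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # short transactions keep locks brief
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX ix_events_created_at ON events (created_at)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [(f"2024-01-{day:02d}T00:00:00",) for day in range(1, 31)],
)

deleted = purge_old_rows(conn, "2024-01-21T00:00:00", batch_size=7)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Whatever fires the timer, the statement it runs stays the same, which is the point made above: the index determines the cost, not the scheduler.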
Now how to pop the timer, you really have three alternatives:
SQL Agent job
Conversation Timers
Application timer
SQL Agent job is the best option for 10 minute intervals. Only drawback is that it does not work on SQL Express deployments. If that is a concern, then conversation timers and activated procedures are a viable alternative.
The last option has the disadvantage that the application must be running for the timer to trigger deletion. If this is not a concern (i.e. if the application is not running, it doesn't matter that the records are not deleted), then it is OK. Note that ASP.NET applications are very bad hosts for such timers, because of the way IIS and ASP.NET may choose to recycle app pools and put them to sleep.

sql server replication algorithm

Anyone know how the underlying replication model in sql server works? Do they essentially depend on UTC datetime values to determine if something is new or do they keep a table of all the changes (like a table of tableID+rowid that have changed).
I am building my own "replication" system and was planning on using the dates to know what to replicate. Then I started wondering what would happen if the date on the computer got out of sync for some reason. The obvious alternative is to keep a log of the changes as you go and, once you have replicated those changes, remove them from the log. But that's a lot of extra work compared with just checking dates.
I figure if sql server replication works by just checking the dates, then that should be good enough for me.
Any wisdom here?
thanks
As a transaction occurs in SQL Server, it is written to the transaction log along with information pertinent to the transaction.
SQL Server replication uses this transaction log to determine which transactions have not yet been processed and to move them to the subscriber. There is a lot more going on under the hood to keep track of the intersection between transactions, publications, subscriptions, etc. but I will leave that to MSDN documentation about SQL Server replication http://msdn.microsoft.com/en-us/library/ms151198.aspx
Moving on to your point about building your own replication system:
Do not build your own replication system. There are too many complications involved that will cause you to spend many many days working. You will be much better off using the items that are shipped with SQL Server.
SQL Server replication methods are pretty impressive out of the box.
If you outline what causes you to think in terms of building your own replication system, we can help you figure out how to use existing items to provision what you need.
Also, read up as much as you can here to get an idea of what it can do for you http://msdn.microsoft.com/en-us/library/ms151198.aspx
SQL Server has a LogReader job that is aptly named. Replication reads the transaction log and applies appropriate transactions to the subscribing databases.
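Purely as a toy illustration of why a change log is more robust than comparing dates (and not a description of how SQL Server implements it), here is a Python/SQLite sketch in which triggers append to a log with a strictly increasing sequence number and a reader replays everything after its last position. The item and change_log tables and the replay function are hypothetical.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
-- AUTOINCREMENT gives a strictly increasing sequence, independent of clocks.
CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, item_id INTEGER, op TEXT);
CREATE TRIGGER item_ins AFTER INSERT ON item BEGIN
    INSERT INTO change_log (item_id, op) VALUES (NEW.id, 'upsert');
END;
CREATE TRIGGER item_upd AFTER UPDATE ON item BEGIN
    INSERT INTO change_log (item_id, op) VALUES (NEW.id, 'upsert');
END;
CREATE TRIGGER item_del AFTER DELETE ON item BEGIN
    INSERT INTO change_log (item_id, op) VALUES (OLD.id, 'delete');
END;
""")


def replay(src, replica, last_seq):
    """Apply logged changes after last_seq to the replica dict; return the new position."""
    entries = src.execute(
        "SELECT seq, item_id, op FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, item_id, op in entries:
        if op == "delete":
            replica.pop(item_id, None)
        else:
            row = src.execute("SELECT name FROM item WHERE id = ?", (item_id,)).fetchone()
            if row is not None:  # the row may have been deleted by a later change
                replica[item_id] = row[0]
        last_seq = seq
    return last_seq


replica = {}
src.execute("INSERT INTO item VALUES (1, 'a')")
src.execute("INSERT INTO item VALUES (2, 'b')")
pos = replay(src, replica, 0)

src.execute("UPDATE item SET name = 'b2' WHERE id = 2")
src.execute("DELETE FROM item WHERE id = 1")
pos = replay(src, replica, pos)
```

Because the position is a sequence number rather than a timestamp, a wrong system clock cannot cause changes to be skipped, and deletes are captured too. The advice above still stands: the built-in replication handles far more cases than a sketch like this.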
For one thing SQLServer (and it's not the only one) supports multiple replication algorithms.
You can find here details about the ones implemented in SQLServer 2008. Read first the X Replication Overview then follow the How X Replication works for more details.

Resources