I'd like to have some degree of fault tolerance / redundancy with my SQL Server Express database. I know that if I upgrade to a pricier version of SQL Server, I can get "Replication" built in. But I'm wondering if anyone has experience in managing replication on the client side. As in, from my application:
Every time I need to create, update or delete records from the database -- issue the statement to all n servers directly from the client side
Every time I need to read, I can do so from one representative server (other schemes seem possible here, too).
It seems like this logic could potentially be added directly to my Linq-To-SQL Data Context.
Any thoughts?
Every time I need to create, update or
delete records from the database --
issue the statement to all n servers
directly from the client side
Recipe for disaster.
Are you going to have a distributed transaction or just let some of the servers fail? If you have a distributed transaction, what do you do if a server goes offline for a while.
This type of thing can only work if you do it at a server-side data-portal layer where application servers take in your requests and are aware of your database farm. At that point, you're better off just using a higher grade of SQL Server.
I have managed replication from an in-house client. My database model worked on an insert-only mode for all transactions, and insert-update for lookup data. Deletes were not allowed.
I had a central table that everything was related to. I added a field to this table for a date-time stamp which defaulted to NULL. I took data from this table and all related tables into a staging area, did BCP out, cleaned up staging tables on the receiver side, did a BCP IN to staging tables, performed data validation and then inserted the data.
For some basic Fault Tolerance, you can scheduling a regular backup.
Related
I'm managing an online book store website. For sake of high availability I've setup two Tomcat instances for running the website application, they are the exactly same program, and they are sharing the same database which located in another server.
My question is that how can I avoid conflicts or dirty data when the two applications do the same updates/inserts at the same time to the database.
For example: update t_sale set total='${num}' where category='cs', if there are two processes execute the above sql simultaneously would cause data lost.
If by "database" you are talking about a well designed schema that is running on an RDBMS such as Oracle, DB2, or SQL Server, then the database itself will prevent what you call "conflicts" by locking parts of the database during each update transaction.
You can prevent "dirty data" from getting into the database by adding features such as check clauses and primary-foreign key structures in the database itself.
I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free-reign to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an extended events session on databaseB that watches for all the procedures on databaseA that can insert into the table or any sql statements that run against the table on databaseA (using a LIKE '%your table name here%').
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic functionality by writing the XE events table to a custom user table every 1 minute or so using the SQL job agent.
Your session would monitor databaseA, write the XE output to databaseB, you write a trigger that upon each XE output write, it would compare the two tables and if there are differences, write the differences to your log table. This would be a nonstop running process, but it is still kind of a period scan in a way. The XE only writes when the event happens, but it is still running a check every couple of seconds.
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We are recently using StreamSets Data Collector for Oracle CDC but it also has SQL Server CDC. There are many other competing technologies including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build realtime data pipelines between DB at the schema level. Till now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can near real-time copy all the tables in a schema in one StreamSets pipeline provided I already deployed DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided by the time of writing this answer (refer Dan Flippo and dfundaka) but found that the first - using Change Data Capture - required modification to the database and the second - using Extended Events - wasn't really a complete answer, though it got me thinking of other options.
And the option that seems cleanest, and doesn't require any database modification - is to use SQL Server Dynamic Management Views. Within this library residing, in the System database, are various procedures to view server process history - in this case INSERTs and UPDATEs - such as sys.dm_exec_sql_text and sys.dm_exec_query_stats which contain records of database transactions (and are, in fact, what Extended Events seems to be based on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.
THE SETUP
Two Databases at different locations
Local Server(Oracle): Used for in-house data entry and
processing.
Live Server(Postgres): Used as the DB for a public website.
THE SCENARIO
Daily data insertion/updations/deletions are performed on the Local
DB through out the day.
Later after the end of the day the entire data of the current day is
pushed to the Live DB Server using CSV files and Sql merge.
This updates the Live DB server with the latest updations and new
data inserted.
THE PROBLEM
As the Live server is updated using running batch at the end of the day, the deletion operations do not get applied on the Live server.
Due to this unwanted data also remains at the Live Server causing discrepancy in the data on both servers.
How can the delete operation on local DB server be applied on Live Server along with the Updations and Insertions?
P.S. The entire Live DB is to be restructured so any solution that requires breaking down and restructuring the DB server can also be looked into.
Oracle GoldenGate supports replication from Oracle to PostgreSQL. It would certainly be faster and less error prone than your manual approach since it is all handled at a much lower level by the database.
If for some reason you don't want to do that then you are back to triggers tracking deletes in a table with the PK for the deleted records.
Or you could just switch out the PostgreSQL with Oracle :-)
I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too much linked to the database and to be able to use newer DAL (Entity Framework 6 Code First) I decided to start from a new empty database, and I only added the tables and columns I need (different names than the production one).
Now I need some way to update the new database with the production database regularly (would be best if it is -almost- immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I already made some searches and had those (part-of) solutions :
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / columns names will be problematic. So I can use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate this database to my new one (it may avoid my production database to be too much stressed?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
Do you some advice or maybe some other solutions that I didn't foresee?
Thanks
I think that the Transactional replication is the better than using triggers.
Too much resources would be used in source server/database due to the trigger fires by each DML transaction.
Transactional rep could be scheduled as a SQL job and run it few times a day/night or as a part of nightly scheduled job. IT really depends on how busy the source db is...
There is one more thing that you could try - DB mirroring. it depends on your sql server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
In the end, we went for the Trigger solution. We don't have that much changes a day (maybe 500, 1000 top), and it didn't put too much pressure on the current database. Thanks for your advices.
I am working on a database auditing solution and was thinking of having SQL Server triggers take care of changes and inserting them into an auditing table. Since this is a SQL Azure Database and will be fairly large I am concerned about the cost of a growing database due to auditing.
In order to cut down on the costs needed for auditing purposes, I am considering storing the audit table (or tables) in Azure Tables instead of Azure SQL databases. So the question becomes, how to get the SQL Server trigger to get the changed data into Azure Tables?
The only thing I can come up with is to have an audit table (or tables) in SQL Databases so the trigger can insert the rows locally, and then have a Worker Role every X seconds pull any rows from that and move them to Azure Tables and delete from the SQL Database table so it doesn't grow large.
Is there a better way to do this integration? Can I somehow put a message in a queue from a trigger?
Azure SQL Database (formerly SQL Azure) doesn't support CLR (hence no EXTERNAL NAME trigger parameter) so there's no way for your triggers to do anything outside of T-SQL. If you want audit content to go to a table, you could take the approach you came up with (temporarily write to SQL table, then move content periodically to Table). There are other approaches you could take (and this would be opinion/subjective, frowned upon here), but going with the queue concept for a minute, since you asked about queues, and illustrating what you could do with Azure Queues:
You could use an Azure queue to specify an item to insert/update in your SQL database. The queue processing code could then be responsible for performing the update and writing to the Azure table. Since the queue messages must be explicitly deleted after processing, you could simply repeat the queue message processing if something failed during execution (e.g. you write to SQL but fail before writing to table storage). The message eventually becomes visible for reading again, if you don't delete it before its timeout value. As long as your operations are idempotent, you'd be ok with this pattern.
A cheaper solution than using worker roles would be to use a combination of Azure Scheduled Tasks (you can enable them for free to run every 15 min within Mobile Apps) and Azure Web Sites. Basically the way it would work is to run this scheduled job every 15 min which would make an HTTP call to some code you have running within your Azure Web Site. This code would do the same work you had outlined for your worker role.
Alternatively, use SQL Server System-Versioned temporal tables to automatically handle the writing of audited record (i.e., changes) to corresponding history tables.