Comparing 4 Databases to Distributed Database

Comparing 4 Databases to Distributed Database - sql-server

So I have a rather unique task. I have a customer who needs to take 4 similar databases and combine into a single distributed database. Keep in mind that schemas on production DB’s change frequently (couple times a month). I have the distributed database ready and utilized Red Gate SQL Compare.
I also need to send out an email when either an error occurs such as triggers or stored procs change and need manual intervention or an actual error occurs.
However, I need to continually monitor changes and compare those schema changes regularly against the distributed database. This is due to the customer having over 100 applications that will need to have connection strings adjusted and possibly moved to a monolith.
Then comes the task of moving data to the distributed database. My boss wants to write a full .NET Win Forms app to do this. So, I suppose I’m needing to know what some options would be and if there’s a way to utilize SSDT with a .NET App to do comparisons?

Related

Synchronize data b/w two data stores

I have two different databases, one's an old legacy one which I'll be decommissioning due to the old service not being used anymore. The other one's is a new service and will eventually replace the old system. Before that happens we need both services running for a while.
Both have two tables for users for storing the email address, password and the other table is for simple user related data (addresses.)
I need to synchronize data between these two databases. The old one is a MS SQL Server DB and the new one's a NoSQL DB, (DynamoDB.)
My strategy would be that before going live, copy all the users from the old DB to the new one and then once the new system is running then synchronize the users between each DB.
I'll do this by having a tool run periodically to check any users added after last run by querying the users table something like this WHERE CreationDate >= LastRunTime and then for each user query it if it exists in the other database. I'll do this two way i.e. from old DB -> new DB and from new DB -> old DB.
Is this a good way of doing this? Any other better, fast solutions to achieve this?
How can I detect changes to existing user's data? Is there any better solution than checking & matching every user's record in both systems' tables and then taking the one that's last modified (by checking at the LastModifiedDate timestamp for each record) and updating it in the other system's table?

Solution 1 (My Recommended): Whenever system insert/update a record in either of the databases you add/update a record data in the database and add that information in a Queue.
A sperate reader will read from the queue and replicate the data to respective database periodically this way your data will get sync between the databases.
Note: Another advantage of using the queue would be that you don't have to set very high throughput in your DynamoDB table.
Solution 2: What you had suggested in your question, you can add a CRON job that will replicate the databases by checking the record based on timestamp.

I've executed several table migrations from Oracle / MySQL to DynamoDB with no downtime and the approach I used was a little different than what you described. This approach ends up requiring more coding but I would consider it a lower risk approach than the hard cutover you described.
This approach requires multiple phases as described below:
Phase 1
Create the new DynamoDB table(s) for the data in your legacy system.
Phase 2
Update your application to write/update data in both the legacy database and in DynamoDB. Your application will still read and write to the legacy system so this should be a low risk change.
Immediately before deploying this code load DynamoDB up with all of the old data.
Immediately after deploying audit the database to make sure they are in sync.
Phase 3
Update your application to start reading from DynamoDB. This should be low risk because your application will have been maintaining data in DynamoDB for some time.
Keep your application writing to the legacy database so you can cut back if you identify any problems in the new implementation. This ensures the cutover is low risk and you can easily roll back.
Phase 4
Remove the code from your application that reads and writes to the legacy database and deploy this to production.
You can now decommission the legacy database!
This is definitely more steps and will take more time than just taking the application down, migrating all of the data, and then deploying a new version of the application to read/write from DynamoDB. However, the main benefit to this approach is that it not only requires no downtime but is lower risk as it tests the change in phases and allows for easy rollback if any issues are encountered.

On high level, a sync job could be 1> cron job based or 2> notification based.
The cron job could do sync as well as auditing if you have "creation time" and "last_updated_by time". In this case the master DB (from where the data should be synced from) is normally a SQL Db since it's much easier to do table scan in SQL than in NoSQL (like in DynamoDB you need to use its scan function and it's limited by the table's hash key).
The second option is to build a notification machenism and this could be based on DynamoDB's stream http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html. It's a mature feature for DynamoDB, it guarantees event order and could achieve near real time event deliver. What you need to do is to build a listen for those events.
Lastly, you could take a look at AWS Database Migration Service https://aws.amazon.com/dms/ to see if it satisfies your requirement.

Rebuilding an unstable tool from scratch (Currently Access based - can go anywhere)

I have inherited a custom built tool that is poorly designed and unstable, and I have a great opportunity to rebuild it from scratch. This is an internal tool only that works almost entirely in Access, and its purpose is to provide higher detail on parts that cost the company over a certain dollar amount.
How it works:
1) The raw data (new part numbers) gets pulled nightly from the EDW via macros in Access.
2) The same macros then join two tables (part numbers from one, names from another). Any part under a certain dollar amount is removed, and the new data is appended to the existing Access database.
3) During the day employees can then open a custom Access form to add more details about the part. Different questions are asked depending on the part category.
4) The completed form is forwarded to management, and the information entered is retained in the Access database – it does not write back to the EDW.
5) Managers can also pull some basic reports from the database, based on overall costs.
The problems:
1) Currently everyone has to have Access installed on their work stations, and whenever there is an update the new database gets pushed to their stations. This is not considered an ideal situation by management or IT.
2) If anyone has left the tool open accidentally at the end of the day the database is locked out, therefore the macros cannot run and the tool cannot be updated with new part numbers.
3) If the tool cannot update for a few days in a row the database can become corrupted. We can restore from the last good backup, in the past this has resulted in the loss of multiple days of work.
Ideally we want to take the tool completely out of Access. I am building a SharePoint site that can host the tool, which (if I can get it right) will eliminate the need of Access on end-user stations with a database push. However the SharePoint form would need read/write ability.
The big question is: How do I build this?
I have a completely open path of possibilities – I can design it work any way I want, using any tools or platform I want, as long as it works. It does not have to update automatically, as I already run a number of SQL scripts at the start of my day and adding one more is inconsequential.
The resources I have at my immediate disposal are: SharePoint (with designer), Access, Toad, and SQL Server. The database can be hosted on a shared network drive.
I am a recent college graduate with basic SQL knowledge. I have about a year to produce a final product, but would like to get it up and running far sooner if possible.
Any advice on what direction to pursue would be very helpful, thank you.

Caveat: I've never worked with SQL Server, so I don't know all of it's capabilities (I'm an Oracle developer).
What I'd do in your situation is something like the following (although not necessarily in this exact order):
Get a SQL Server database set up to host your tables.
Create the tables etc
Migrate test data across (I'm assuming you have a dev/uat/test environment for your current system! If you haven't, make sure you set up at least a separate test environment to prod for your new db!)
Write stored procs to do the work for adding new parts, updating existing data, etc etc
Set up an automated job on the db (I'm assuming SQL Server can do this!) to do the overnight processing.
Create a separate db user with the necessary permissions to call the stored procedures
Get your frontend to call the stored procs with the relevant parameters using the db user you created in step 6 to connect to the db.
You'd also have to think about transaction control to try and mitigate the case where users go home at the end of the day without committing their work - Does the db handle the commits/rollbacks or does Sharepoint?
Once you've worked out everything in your test environment, it's then a case of creating the prod db, users and objects, and then working out the best way of migrating the prod data across.
Good luck.
Don't forget to get backups for the new db set up as well.

Viewing database records realtime in WPF application

disclaimer: I must use a microsoft access database and I cannot connect my app to a server to subscribe to any service.
I am using VB.net to create a WPF application. I am populating a listview based on records from an access database which I query one time when the application loads and I fill a dataset. I then use LINQ to dataset to display data to the user depending on filters and whatnot.
However.. the access table is modified many times throughout the day which means the user will have "old data" as the day progresses if they do not reload the application. Is there a way to connect the access database to the VB.net application such that it can raise an event when a record is added, removed, or modified in the database? I am fine with any code required IN the event handler.. I just need to figure out a way to trigger a vb.net application event from the access table.
Think of what I am trying to do as viewing real-time edits to a database table, but within the application.. any help is MUCH appreciated and let me know if you require any clarification - I just need a general direction and I am happy to research more.
My solution idea:
Create audit table for ms access change
Create separate worker thread within the users application to query
the audit table for changes every 60 seconds
if changes are found it will modify the affected dataset records
Raise event on dataset record update to refresh any affected
objects/properties

Couple of ways to do what you want, but you are basically right in your process.
As far as I know, there is no direct way to get events from the database drivers to let you know that something changed, so polling is the only solution.
I the MS Access database is an Access 2010 ACCDB database, and you are using the ACE drivers for it (if Access is not installed on the machine where the app is running) you can use the new data macro triggers to record changes to the tables in the database automatically to an audit table that would record new inserts of updates, deletes, etc as needed.
This approach is the best since these happen at the ACE database driver level, so they will be as efficient as possible and transparent.
If you are using older versions of Access, then you will have to implement the auditing yourself. Allen Browne has a good article on that. A bit of search will bring other solutions as well.
You can also just run some query on the tables you need to monitor
In any case, you will need to monitor your audit or data table as you mentioned.
You can monitor for changes much frequently than 60s, depending on the load on the database, number of clients, etc, you could easily check ever few seconds.
I would recommend though that you:
Keep a permanent connection to the database while your app is running: open a dummy table for reading, and don't close it until you shutdown your app. This has no performance cost to anyone, but it will ensure that the expensive lock file creation is done only once, and not for every query you run. This can have a huge performance import. See this article for more information on why.
Make it easy for your audit table (or for your data table) to be monitored: include a timestamp column that records when a record was created and last modified. This makes checking for changes very quick and efficient: you just need to check if the most recent record modified date matches the last one you read.
With Access 2010, it's easy to add the trigger to do that. With older versions, you'll need to do that at the level of the form.

If you are using SQL Server
Up to SQL 2005 you could use Notification Services
Since SQL Server 2008 R2 it has been replaced by StreamInsight
Other database management systems and alternatives
Oracle
Handle changes in a middle tier and signal the client
Or poll. This requires you to configure the interval so you do not miss out on a change too long.
In general
When a server has to be able to send messages to clients it needs to keep a channel/socket open to the clients this can become very expensive when there are a lot of clients. I would advise against a server push and try to do intelligent polling. Intelligent polling means an interval that is as big as possible and appropriate caching on the server to prevent hitting the database to many times for the same data.

Reliable alternative to replication for continous data sync between two databases

I have one central database and 25 client databases and all have same schema.
I want that whenever some changes are done in some tables of the central database then these changes flow down to the client database.
The databases used is SQL Express so I cannot use replication.
The solution that I have today is to make keep track of the changes in the central database and then a program makes a text file with these changes and sends them down to the client databases.Another program reads these text files and updates the client database.
There are three problems with this:-
1. The files get lost or arrive in jumbled order which messes up the client data
2. the process is slow
3. the programs are sometimes shutdown so the whole sync flow gets stopped.
Is there a reliable alternative that is fast and secure ?
I wonder how banking software are made ...they never lose transactions and they are fast.

Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use merge or insert and update to merge data on the client. That's a very reliable way of doing homebrew replication. To keep track of deleted elements you would either want to mark them as deleted or have another table to keep track of entity kind and its reference, again combined with UpdateDate for replication.
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.

I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) is really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.

If you have one central database and 25 clients, you can easily do it with one (yes only one) SQL server licence for the main database. Subscribers to this database can run SQL express. As long as users access the the client databases, you are not even obliged to buy SQL CALs.
Back to banking software, be sure that they are paying good money for their server licenses! So don't be surprised if these are reliable and fast ...

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a clients locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of actual data required to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possibly options, none of them yet stand out as being the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will vary completely on how often records are updated. But the basic idea is that if a record is changed then the user can publish it to the live database. Alternatively we'll record the changes and send them across in a batch on a configurable frequency.
The amount of records we're talking are around 4000 rows per table for the core tables (product catalog) at the moment, but this is completely variable dependent on the client we deploy this to as each would have their own product catalog, ranging from 100's to 1000's of products. To clarify, each client is on a separate local/hosted database combination, they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.

Well, I'm biased. I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases It will also do it from the command line once you have got a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.

How much data are you talking about? how many 'client' dbs are there? and how often does it need to happen? The answers to those questions will make a big difference on the path you should take.

There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.

I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.

In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight