Copying data from a local database to a remote one - sql-server

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL Server database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of data that actually needs to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, but none of them stands out as the prime candidate:
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL we use on the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Web services to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level; it can be done from .NET or other facilities.
More information
How often updates happen will depend entirely on how often records are changed. The basic idea is that when a record is changed the user can publish it to the live database; alternatively we'll record the changes and send them across in a batch at a configurable frequency.
The number of records we're talking about is around 4,000 rows per table for the core tables (product catalog) at the moment, but this is entirely dependent on the client we deploy to, as each would have their own product catalog ranging from hundreds to thousands of products. To clarify, each client is on a separate local/hosted database combination; they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.

Well, I'm biased, I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have got a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.

How much data are you talking about? How many 'client' DBs are there? And how often does this need to happen? The answers to those questions will make a big difference to the path you should take.

There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
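To make "bulk operations" concrete, one common flavour is a plain BULK INSERT of a file produced by the export step into a staging table, followed by a merge into the live tables. A minimal sketch, where the table name and file path are purely illustrative:

    -- Illustrative only: load an exported CSV into a staging table,
    -- then merge from staging into the live tables in a second step.
    BULK INSERT dbo.ProductStaging
    FROM 'C:\Transfer\products.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);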

I would recommend using Data Transformation Services (DTS) for this. You could create one DTS package for appending data and another for re-creating it.
It is possible to invoke DTS package operations from your code, so you may want to create a wrapper around the packages that you can call from your application.
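As a rough sketch of what that wrapper might end up doing (the package and server names are made up, and the dtsrun switches should be checked against your SQL Server 2000 tooling), a saved DTS package can also be kicked off from T-SQL via xp_cmdshell:

    -- Hypothetical example: run the saved "append" package with dtsrun.
    -- /S = server, /N = package name, /E = Windows (trusted) authentication.
    EXEC master.dbo.xp_cmdshell 'dtsrun /S "LOCALSQL01" /N "CopyCatalogAppend" /E';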

In the end I opted for a set of triggers to capture data modifications in a change log table. An application then polls this table and generates XML files for submission to a web service running at the remote location.
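For anyone going down the same road, here is a minimal sketch of that pattern; the change-log table, trigger and column names are purely illustrative:

    -- Illustrative change-log table: one row per modified record.
    CREATE TABLE dbo.ChangeLog (
        ChangeLogId int IDENTITY(1,1) PRIMARY KEY,
        TableName   sysname  NOT NULL,
        RecordId    int      NOT NULL,
        ChangeType  char(1)  NOT NULL,              -- 'I', 'U' or 'D'
        ChangedAt   datetime NOT NULL DEFAULT GETDATE(),
        Published   bit      NOT NULL DEFAULT 0
    );
    GO

    -- Illustrative trigger on a product table, logging every modification.
    CREATE TRIGGER dbo.trg_Product_Log ON dbo.Product
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Inserts and updates appear in the "inserted" pseudo-table.
        INSERT INTO dbo.ChangeLog (TableName, RecordId, ChangeType)
        SELECT 'Product', i.ProductId,
               CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END
        FROM inserted AS i;

        -- Deletes appear only in the "deleted" pseudo-table.
        INSERT INTO dbo.ChangeLog (TableName, RecordId, ChangeType)
        SELECT 'Product', d.ProductId, 'D'
        FROM deleted AS d
        WHERE NOT EXISTS (SELECT 1 FROM inserted);
    END

The polling application then reads the unpublished rows (for example with a SELECT ... FOR XML query), ships the XML to the web service, and flips the Published flag once the remote side acknowledges the batch.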

Related

Point Connection String to custom utility

Currently we have our ASP.NET MVC LOB web application talking to a SQL Server database. This is set up through a connection string in the web.config as usual.
We are having performance issues with some of our bigger customers who are running some really large reports and KPIs against the database, which chokes it up and causes performance issues for the rest of the users.
Our solution so far is to set up replication on the database and pass all the report and KPI data calls off to the replicated server, leaving the main server for common critical use.
Without having to add another connection string to the config for the replicated server and go through the application redirecting the report, KPI and other read-only calls to the secondary DB, is there a way I can point the web.config connection string at an intermediary node that will analyse each data request and route it to the appropriate DB? I.e. if the call is a standard update it goes to the main DB, and if a report is being loaded it is passed off to the secondary replicated server.
We will only need to add this node for the bigger customers with larger DBs, so if we can get away with adding a node outside the current application setup it will save us a lot of code changes and testing.
Thanks in advance
I would say it may be easier for you to add a second connection string for reports, etc. instead of trying to analyse the request.
The reasons are as follows:
You probably have a fairly good idea which areas of your system need to go to the second database. Once you identify them, you can just point them at the second database and not worry about switching them back and forth.
You can simply create two connection strings in your config file. If you have only one database for smaller customers, you can point both connection strings at that one database; for bigger customers, you can use two different connection strings. This way you make the system flexible and configurable.
Analysing requests usually turns out to be complex and adding this additional complexity seems unwarranted in this case.
All my comments are based on what you wrote above and may not be absolutely valid - you know the system better, just use them if you want.

Merge multiple Access databases into one big database

I have multiple ~50 MB Access 2000-2003 databases (MDB files) that only contain tables with data. These data-databases are located on a server in my enterprise that can take ~1-2 seconds to respond (and about 10 seconds to actually open a 50 MB MDB file manually while browsing in the file explorer). I have other databases that only contain forms. Most of those forms-databases (still MDB files) are actually copied from the server to the client with a batch file before execution (after some testing, execution looks smoother that way). Most of those forms-databases use table links to fetch the data from the data-databases.
Now, my question is: is there any advantage/disadvantage to merging all the data-databases from my ~50 MB databases into one big database (let's say 500 MB)? Will it be slower? It would actually help to clean up my code if I didn't have to connect to all those different databases, and I don't think 500 MB is a lot, but I don't pretend to be really used to Access by any means, which is why I'm asking. If Access needs to read the whole MDB file to get the data from a specific table, then it would be slower. It wouldn't really be that surprising from Microsoft, but I've been pleased so far with MS Access database performance.
There will never be more than ~50 people connected to the database at the same time (most likely this number won't in fact exceed 10, but I prefer being a little conservative here just to be sure).
The db engine does not read the entire MDB file to get information from a specific table. It must read information from the system tables (hidden tables whose names start with MSys) to determine where the data you need is stored. Furthermore, if you're using a query to retrieve information from the table, and the db engine can use an index to determine which rows satisfy the query's WHERE clause, it may read only those rows from the table.
However, you have issues with your network's performance. When those lead to dropped connections, you risk corrupting the MDB. That is why Access is not well suited for use in wide area networks or with wireless connections. And even on a wired LAN, you can suffer such problems when the network is flaky.
So while reducing the amount of data you pull across the network is a good thing, it is not the best remedy for Access on a flaky network. Instead you should migrate the data to a client-server db so it can be kept safe in spite of dropped connections.
You are walking on thin ice here.
Access will handle your scenario, but it is not really meant to support that many concurrent connections.
Merging everything into one big database (500 MB) is not a wise move.
Have you tried to open it from a network location?
My suggestion would be to use a SQL Server Express back end and merge all the tables into a single, real client-server database.
The changes required to the client MDB front end should not be very extensive.

Data Replication vs Service Bus vs App Fabric vs...?

I am building an application which needs to consume data from a source database. The source database has several issues, including:
Performance issues
Legacy structure with terrible keys, naming conventions, etc.
Lots of data my application doesn’t care about
I would like to setup an application specific SQL Server database. The new database will be populated with a subset of data from the source database (and from a few other source systems). The data will always move one way from the source databases to the application specific database (i.e. - data won't sync back to the source). It will have a different DDL model than the source database.
The data doesn't need to be synced absolutely real time, but any longer than a few minute lag could cause issues.
How should I move data from the source database into the application database? Should I use
Replication
Write Custom SSIS Packages
Abstract to a higher-level SOA solution like NServiceBus, AppFabric, etc.?
Some other ideas?
Pros/cons to each?
Sounds to me like you don't need a messaging service like NServiceBus - this would involve modifying the legacy system to publish events whenever data changes, something I expect you don't want to get into. Because it is acceptable in your case for your local store of data to be slightly out of date, an SSIS package could be acceptable.
However, if the source database is very large, this could be an issue, as you will be doing it every few minutes. Also, if users of the legacy system are already experiencing performance problems, an SSIS package running every few minutes won't help. Maybe you could introduce a timestamp on the source data, so that you only copy new/modified rows?
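A minimal sketch of that timestamp idea, assuming the source table has a LastModified column maintained on writes and the application database keeps a small watermark table (all names here are made up); the same query works as a plain T-SQL step or as the source query of an SSIS package:

    -- Illustrative incremental load: copy only rows changed since the last run.
    DECLARE @LastLoaded datetime;
    SELECT @LastLoaded = LastLoadedAt
    FROM   app.LoadWatermark
    WHERE  TableName = 'Customer';

    INSERT INTO app.Customer (CustomerId, Name, LastModified)
    SELECT s.CustomerId, s.Name, s.LastModified
    FROM   SourceDb.dbo.Customer AS s
    WHERE  s.LastModified > @LastLoaded;

    UPDATE app.LoadWatermark
    SET    LastLoadedAt = (SELECT MAX(LastModified) FROM SourceDb.dbo.Customer)
    WHERE  TableName = 'Customer';

Rows that were updated rather than inserted would need a MERGE (or delete-and-reinsert) instead of a plain INSERT, but the high-water-mark idea is the same.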
If the source data is very large and performance is seriously an issue, then maybe NServiceBus would be a good idea. You could also consider MassTransit or your own simple solution built on MSMQ. But this will mean getting your hands dirty with the legacy code.

How to copy entire SQL Server 2008 database, applying WHERE clause to restrict copied data

To allow more realistic conditions during development and testing, we want to automate a process to copy our SQL Server 2008 databases down from production to developer workstations. Because these databases range in size from several GB up to 1-2 TB, it will take forever and not fit onto some machines (I'm talking to you, SSDs). I want to be able to press a button or run a script that can clone a database - structure and data - except be able to specify WHERE clauses during the data copy to reduce the size of the database.
I've found several partial solutions, but nothing that is able to copy schema objects and a custom-restricted subset of data without requiring lots of manual labor to ensure objects/data are copied in the correct order to satisfy dependencies, FK constraints, etc. I fully expect to write the WHERE clause for each table manually, but am hoping the rest can be automated so we can use this easily, quickly, and frequently. Bonus points if it automatically picks up new database objects as they are added.
Any help is greatly appreciated.
Snapshot replication with conditions on tables. That way you will get your schema and data replicated whenever needed.
This article describes how to create a merge replication, but when you choose snapshot replication the steps are the same. The most interesting part is Step 8: Filter Table Rows, because this is where you can filter out all the unnecessary data so it doesn't get replicated. This step needs to be done for every entity though, and if you've got hundreds of them you'd be better off working out how to do it programmatically instead of going through the wizard windows.
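If you do go the programmatic route, the row filter is essentially just a parameter on the article definition. A rough sketch, with the publication and table names as placeholders and the rest of the publication/snapshot-agent setup omitted:

    -- Illustrative only: add a horizontally filtered article to an existing
    -- snapshot publication; @filter_clause is the WHERE condition applied
    -- when the snapshot is generated.
    EXEC sp_addarticle
        @publication   = N'DevSubsetPublication',
        @article       = N'Orders',
        @source_owner  = N'dbo',
        @source_object = N'Orders',
        @filter_clause = N'OrderDate >= DATEADD(MONTH, -3, GETDATE())';
    -- When scripting by hand, sp_articlefilter / sp_articleview (or the wizard)
    -- complete the filter definition for the article.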

Why are multiple DBs actually needed?

I was looking at godaddy.com, which says they offer up to 10 MySQL DBs, but I don't know why you would ever need more than one, since a DB can have multiple tables. Can't multiple DBs be integrated into a single DB? Is there an example case where it's better, or even unfeasible, not to have multiple ones? And how do you differentiate between them when you want to call them - from their directory or by name?
Best,
I guess separation of concerns would be the most obvious answer. Just as you could put all of your functionality in one humongous class in object-oriented programming (but shouldn't), it's a good idea to keep unrelated information separate. It's easier to wrap your head around smaller chunks of data, and future developers might start to think tables are related and aggregate data in a way they were never meant to be.
Imagine that you're doing two different projects with two different teams. Maybe you don't want one team to access the other team's tables.
There can also be a space limit on each database, and each one can be configured with specific parameters to optimize its performance.
On the other hand, different end users can be assigned to back up each database, and you wouldn't want one user making backups of the other DB, because they could restore that database somewhere else and get access to its data.
I'm sure there are some pretty good DBAs on the forum who can answer this in detail.
Storing tables in different databases makes sense because you are able to back them up individually. Furthermore, you will be able to control access to each database under different NT groups (e.g. admins vs. users). Although this can be done at the individual table level, sometimes it makes sense to grant or deny access to an entire database to a particular group.
When you need to call them in SQL Server, you append the database name to the query, like this: SELECT * FROM [MyDatabase].[dbo].[MyTable].
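For example (the database and group names below are made up), access can be granted per database while other databases stay completely closed to the same group, and cross-database queries just use the three-part name:

    -- Cross-database query: fully qualify objects as database.schema.object.
    SELECT p.ProductName, s.QtyOnHand
    FROM   Catalog.dbo.Product AS p
    JOIN   Warehouse.dbo.Stock AS s ON s.ProductId = p.ProductId;

    -- Per-database access: the Reporting group gets read access in Catalog
    -- but is never mapped to a user in the Warehouse database at all.
    USE Catalog;
    CREATE USER [DOMAIN\Reporting] FOR LOGIN [DOMAIN\Reporting];
    EXEC sp_addrolemember N'db_datareader', N'DOMAIN\Reporting';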
One other reason to use separate databases relates to whether you need full transactional recovery or not. For instance, if I have a bunch of tables that are populated on a schedule through import processes and never by the users, putting them in a separate database allows me to set the recovery model to simple, which reduces the logging (a good thing when you are loading millions of records at once). I also don't have to take a transaction log backup every fifteen minutes like I do for the database with the user-entered data. It could also make recovery faster when needed, as the databases would be smaller and thus individually take less time to recover. That won't help much when the whole server crashes, but it could help a lot if only one database gets corrupted for some reason. If the data relates to different applications, having it in separate databases simplifies the security as well. And of course sometimes we have commercial databases that we can't add tables to, so we may need a separate database to handle the things we want to add to that data (we do this, for instance, with our project management software: we have a separate database where we extract and summarize data from the PM system for reporting, and then write all our custom reports off that).
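To make the recovery-model point concrete (the database names here are illustrative), the setting is per database, which is exactly why splitting imported data from user-entered data gives you this flexibility:

    -- Staging/imported data: simple recovery, minimal log maintenance.
    ALTER DATABASE StagingImports SET RECOVERY SIMPLE;

    -- User-entered data: full recovery so frequent log backups are possible.
    ALTER DATABASE LineOfBusiness SET RECOVERY FULL;
    BACKUP LOG LineOfBusiness TO DISK = N'D:\Backups\LineOfBusiness_log.trn';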
