Copy Database Data from Many DBs to One. Data Replication (sort of) - sql-server

This involves data replication, kind of:
We have many sites with SQL Express installed, there is an 'audit' database on each site that has one table in 1st normal form (to make life simple :)
Now I need to get this table from each site, and copy the contents (say, with a Date Time Value > 1/1/200 00:00, but this will change obviously) and copy it to a big 'super table' in sql server proper, that also has the primary key as the Site Name (That needs injecting in) and the current primary key from the SQL Express table)
e.g. Many SQL Express DBs with the following table columns
ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
And the big super table needs to have:
SiteName, ID, Definition Name, Definition Type, DateTime, Success, NvarChar1, NvarChar2 etc etc etc
Where items in bold are the primary key(s)
Is there a Microsoft (or non MS I suppose) app/tool/thing to manager copying all this data accross already, or do we need to write our own?
Many thanks.

You can use SSIS (which comes with SQL Server) to populate, it can be set up with variables to change the connection string to the various databases. I have one that loops through the whole list and does the same process using three differnt files from three differnt vendors. You could so something simliar to loop through the different site databases. Put the whole list of database you want to copy the audit data from in a table and loop through it changing the connection string each time.
However, why on earth would you want one mega audit table per site? If every table in the database populates the audit table as changes happen, then the audit table eventually becomes a huge problem for performance. Every insert, update and delete has to hit this table and then you are proposing to add an export on top of that. This seems to me to be a guaranteed structure for locking and deadlocks and all sorts of nastiness. Do yourself a favor and limit each audit table to the table it is auditing.

Things to consider:
Linked servers and sp_msforeachdb as part of a do-it-yourself solution.
SQL Server Replication (by Microsoft) (which I believe can pull data from SQL Server Express)
SQL Server Integration Services which can pull data from SQL Server Express instances.
Personally, I would investigate Integration Services first.
Good luck.

You could do this with SymmetricDS. SymmetricDS is open source, web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
As of right now, however, you would need to implement a custom IDataLoaderFilter extension point (in Java) to add the extra column. The metadata would be available though because your SiteName would be the external_id.

Related

SQL Server 2005 Auditing

Background
I have a production SQL Server 2005 server to which 4 different applications connect and make changes.
There are no foreign keys and in some cases no primary keys.
Unfortunately throwing the whole thing out and starting from scratch is not an option.
So my solution is to start migrating each of the applications to a service layer approach so that there is only one application directly connecting to the database.
However there are problems that need to be fixed before that service layer is written and all the applications are migrated over.
So rather than make changes and hope they don't break any one of the 4 badly written applications (with no way of quickly testing all functionality) my solution is to start auditing the database
Problem
How do I audit what stored procedures, tables, columns, views are being accessed/updated/called by each user on SQL Server 2005.
I can find out which tables are being updated but I have no idea which columns and by what users.
I also don't know if certain tables are being accessed only through stored procedures/views.
I know that SQL Server 2008 has better auditing features but if I could do this without spending money that would be great. That said if the best solution is to upgrade or buy software that's also an option.
Check out SQL Server 2008's CDC feature. You can't use this directly in 2005 but you can write a trigger for each table to log all data changes to a new audit table. i.e. you'd have an audit table for each table in your db, with all the same columns plus some additional columns saying what the operation was and when it occurred.
If the nature of your applications means you can get user information and/or application information from CURRENT_USER and APP_NAME() you could include that information in the audit table too.
And check out this answer for more goodness.

Is there a free GUI tool for data sync between DB in which it is possible to script rules?

What I need to do is some data between 2 databases. The source can be anything (comma separated file, xls file, any database, ...), the destination is MS SQL Server.
I do not need to sync all data, I just need to sync particular tables.
Example:
I need to sync accounting Software (runs on PostgreSQL) CUSTOMERS table with CRM (runs on SQL Server).
Some problems this tool should be able to face:
1) Accounting software customers table has 1 field that is not mapped in crm customers table. (In this way I want to map this extra field to the field CUSTOMERS_CUSTOM_DATA.EXTRA_FIELD)
2) Having some rules (like sync only customers whose code is between 10000 and 99999)
3) Allowing to perform some post insert tasks (for example I am using manually managed seuqences for the tanble IDs, so after inserting 10 records I need to add 10 to the sequence)
4) Having an exception handling mechanism so if something is wrong it can wither call a sql server stored procedure (that I already have and it will send an e-mail to me) or simply send a message to notify that something was wrong in the nightly sync.
5) Be easy to schedule when to perform data sync (hourly, daily, INCLUDING MANUAL)
6) Perform data conversion: if Surname field in source table is varchar(20) and in destination table is varchar(15) I want to explicitly say "perform a truncation".
7) Have different rules for insert or update. For example in the source e-mail field is not present, but I want to populate it in the destination I decide to perform this operation on insert only, not on update. (for example as I insert a new customer I want to populate the e-mail field concatenating name and surname, but then I want to let the users to modify it, this first insertion is just to simplify data entry, but then this particular case will be handled manually. So I want to say (on insert populate e-mail field, on update don't do anything with email field)
8) In case of delete in the source db don't delete on the destination but only change the varchar(10) STATUS to DELETED.
Note: I know that Integration Services will be perfect for this, but I must support the Express Edition, so SSIS is not an option.
I created a bunch of scripts and scheduled stored procedure that at present do what I need, but it is very hard to maintain and the total lack of a GUI makes the work much slower. I remember having seeing TALEND time ago, maybe that tool is also the answr I need, anyway I need to provide a quick answer to management, so I have now no time to investigate all the tools on the market, and I would prefer to have a suggestion from an expert.
I believe SQL Server Integration Services does all that, and I believe SQL Server Management Studio allows you to create and package your SSIS jobs so that they can be deployed elsewhere.
Finally I went for TALEND, I never really used SSIS, I just saw a live demo of it at a SQL Server conference. Anyway Talend is a free alternative (and quite rich) to SSIS, so it will suite the needs of all customers, including the ones (95%) that has SQL Server Express.

Linking tables between databases

I’m after a bit of advice on the best way to go about this is SQL server 2008R2 express. I have a number of applications that are in separate databases on the same server. They are all “plugins” that use a central staff/structure list that will be in a separate database. The application is in the process of being migrated from JET.
What I’m looking for is the best way of all the “plugin” databases being able to see the central database and use those tables in standard queries and views etc.
As I’m using express that rules out any replication solution and so far the only option I can think of is to use triggers or a stored procedure to “push” out all the changes to the plugins. The information needs to be populated on a near enough real time basis however the number of changes will be very small maybe up to 100 a day and the biggest table only has about 1000 rows at the moment (the staff names table).
Hopefully that will cover all everything but if anyone needs any more details then just ask
Thanks
Apologies if I've misunderstood, but from your description it sounds like all these databases are hosted on the same instance of SQL Server - it's your mention of replication that makes me uncertain.
Assuming that's the case, you should be able to replace any copies of tables from the central database which are held in the "plugin" databases with views or synonyms which reference the central tables directly, since SQL server allows you to make references between databases on the same server using three-part naming (database_name.schema_name.object_name)
For example, if each plugin db has a table StaffNames, you could replace this with a view by dropping the table, then creating a view:
drop table StaffNames
go
create view StaffNames
as
select * from <centraldbname>.<schema - probably dbo>.StaffNames
go
and your code should continue to work seamlessly, as long as permissions are set up.
Alternatively, you could replace all the references to the shared tables in the plugin databases with three-part name references to the central database, but the view method requires less work.

Moving client data from one database to a new one

Our application architecture allows us to host multiple clients in a single database, and also host multiple databases. This allows us to scale out by distributing clients across multiple databases. For example, 20 clients can be in database A, and another 15 could be in database B. We use a ClientID field in almost every table to partition client data. All our table's primary keys are INT identity TableID fields.
I'm looking for a tool/script that would help me extract client data from one database, and move it to a brand new database (so the PKs can stay the same). I'm hoping this exists already so we don't have to build our own. Pretty flexible in how this could work, but ideally it just generates a large .sql file with all the necessary INSERTS in the right order to move the data, and another sql file with all the necessary DELETES to erase the data from the source.
If it makes any difference we are on SQL Server 2008.
If you have standard or enterprise, you do have SSIS. Although it may not qualify as a "tool", it is fairly easy to implement in this scenario.
I can recomend redgate SQL DataCompare for this, we use it for syncing data, and use their SQL Compare to sync the database schema.
Both tools can either output sql, you can execute yourself, or the tools can execute the sql scripts themself.
They have a command line version of the tools to, so you could use them in an deployment script, tho i haven't tried this.
They both work really well, and are no doubt worth the price.
Not the answer you may be looking for, but you should consider using a GUID as a key. This will ensure that you have some type of unique identifier for your all records and that you can avoid collisions with identity keys / integer based indexes. It would add another degree of traceability should something go wrong when you migrate between databases.
SplendidCRM uses this technique when importing data from other DB systems.
Update:
My assumption was that the operation of transferring data between databases was not that frequent and that you needed database architecture for that task. I would use the GUID as lookup key specifically validation for the transfer of data, but I would NOT use that as a primary key for joins for standard operations like URL's. Although unique across databases, the trade-off is that GUIDs are slow.
In other words, the GUIDS would in addition to your existing primary keys now, and act as a means of validation for you should something go wrong. If you need ClientID in Database A to retain the same value in Database B then an identity column as that identifier will be an issue. You may have to create another identifier that is not "auto-generated". This could something other than the GUID, but my instinct is that integers alone will not be enough. Maybe you can create a columns that is a hash of the identity key, customer name and database name, or more simply, just concatenate those columns into a varchar column.

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server Database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsofts SQL Server Integration Services (SSIS). Now I have gotten to the point that I want to split a table vertically for normalization reasons.
A made up example looks like this
MS Access table person
ID
Name
Street
SQL Server table person
id
name
SQL Server table address
id
person_id
street
How can I complete this task best with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID. How can I put the correct person_id foreign key in the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to table1 and a row in table3 belongs to a row table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need to have an automated and repeatable process, because the production database is under heavy usage and we are working on the migration in our development environment with recent, but not up-to-date data. We plan for one test run of the migration and have the customer review the behaviour. If everything is fine, we will go for the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the execute SQL Task and write the statement yourself.
For the parent table do Select into table from table... then do the same for the rest as you progress. Make sure you set identity insert to ON for the parent table and reuse your old ID's. That will help you keep your data integrity.
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically, at least I couldn't find any, but it's not too difficult to do manually although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it: I know your example was just that but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise but I've seen people going overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution which was not mentioned yet and allows us to use all the comfort and options of the dataflow task:
If the destination database is on a local SQL Server, you can use a dataflow task with SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can mark the "keep identities" option. (I do not know if the english names are correct, because we have a german version.) With this you can write into identity columns
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with columns
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_id s for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
So we can use a join to translate from old_id to new_id where needed. We use the new_id values to fill the identity (primary key) columns in the new tables with the "keep identities" option and can simply look them up in our mapping table for the foreign keys by a join.
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.

Resources