Best Method to transform data between multiple databases - sql-server

I have a very old system that contains three separate databases. The tables within these databases are organized very poorly and the system is inefficient because of it. Our new system requires the data from the old system to be transformed into one new database that has a different schema.
Question
What is the best approach to transform the data between the old system and the new? Also is there a method to synchronize the data so as people still use the old system the new system will be updated. This is my first time doing this, so please bare with my poorly worded question.
We are currently using MSSQL 2008 R2 and the new system is moving to MSSQL 2012.

To move the data I would suggest using SSIS. You can build queries that transform the data in the tables to the new tables. If the schema remains the same you could use SQL Server replication to synchronize them: http://msdn.microsoft.com/en-us/library/ms151198.aspx. However, this could lead to some performance issues in practice.

Related

Keep two different databases synchronized

I'm modeling a new microservice architecture migrating some part of a monolithic software to microservices.
I'm adding a new PostgreSQL database and the idea is in the future use that database but for now I still need to keep updated the old SQL Server database and also synchronize the PostgreSQL database if something new appears in the old database.
I've searched for ETL tools but are meant to move data to a datawarehouse (that's not what I need). I just can't replicate the information because the DB model is not the same.
Basically I need a way to detect new rows inserted in the SQL Server database, transform that information and insert it in my PostgreSQL.
Any suggestions?
PostgreSQL's foreign data wrappers might be useful. My approach would be, to change the frontend to use PostgreSQL and let postgreSQL handle the split via it's various features (triggers, rules, ...)
Take a look at StreamSets Data Collector. It can detect changes in SQL Server and insert/update/delete to any DB that has a JDBC driver including Postgres. It is open source but you can buy support. You can also make field changes/additions/removals/renaming to the data stream so that the fields match the target table.

How to eager load entire database with EF

My database consists of 5 tables with ~10000 rows combined. It takes ~1Mb in SQL Server CE which is on shared folder. The database itself is hierarchical Country-Region-City-Street-Building. I am using Entity Framework 4.
Because the database is small users are able to explore and edit all 2000 Cities in a WPF ListView. But with every approach I tried so far the GUI is sluggish (because of many database round-trips, with dummy data GUI is lightfast). How can I load entire database into memory with one or few database round-trips?
I tried multiple Include() but I noted great performance penalty as described here
Should I write my own ORM-light? I could also use plain ascii CSV files instead of database but it would obviously exclude concurrency.
Honestly, I've done something like this myself, and the answer for me was to copy the whole database locally and work on it.
If you're looking not only to read but also to write, I'd definitely suggest ditching CE and installing one of the Express versions of Sql Server. They are designed for this kind of situation; CE is not*.
*SP1 is better for concurrent access, but over the network will never be performant for large datasets.
I re-asked this question on Microsoft forum and they were kind to give me some guidance:
Basically my question can be restated as following:
read only once from database on application start
do all subsequent queries from local data, not database (for performance)
write to both context and database each time an entity is being added or deleted.
With plain EF it is not possible because each query goes to the database. This implies that I must read data fast on start and then cache it.
Implementation details:
The best way seems to be using ESQL to import data fast and then cache it, for example using entities not connected to context. From my first experiments it seems to work well.

LINQ with existing databases and unknown schema

I'm working on a database heavy project, where the Microsoft SQL databases are very mature (16 or more years-old mature), and an old product uses VB6 and ADO to generate sql which interacts with the database. I've been given the task of porting/re-writing the ancient version with a new .NET version.
I'd love to use LINQ-to-* to ensure easy maintainability, but having tried for the last several weeks I feel like LINQ-to-SQL isn't flexible enough, LINQ-to-Entities has too much overhead, and LINQ-to-Datasets is pointless since I would be just as happy using Ado.Net.
The program operates on two databases at once: one is a database with a very consistent schema containing meta-data, and the other a database which has a varying schema, is tightly coupled to the meta-database, and dictates what information from the meta-database you are interested in at any given time. Furthermore, I need non-LINQ information from both databases (such as system-stored procedures, and system-tables).
Is there any way to use LINQ intelligently here? I'd love the static typing, but if I can't have it I don't want to force my square app into a round framework.
Just an FYI, you can get access system tables (and sys stored procs too?) using LINQ. Here is how:
Create a connection to the server you want.
Right-click the server and choose Change View > Object Type.
You should now see System Tables and User Tables. You should see sysjobs there, and you can easily drag it onto a .dbml surface.
Above was stolen from this post.
The best answer seems to be to use ADO.NET completely. I have the option of using Linq-to-Sql over the metabase and ADO.NET for any other database access, but that would make the code feel too inconsistent for me.

How to have a "master-structure" database with "children-data" databases in SQL SERVER 2005?

I have been googling a lot and I couldn't find if this even exists or I'm asking for some magic =P
Ok, so here's the deal.
I need to have a way to create a "master-structured" database which will only contain the schemas, structures, tables, store procedures, udfs, etc, everything but real data in SQL SERVER 2005 (if this is available in 2008 let me know, I could try to convince my client to pay for it =P)
Then I want to have several "children" of that master db which implement those schemas, tables, etc but each one has different data.
So when I need to create a new stored procedure or something like that, I just create it on the master database (and of course it's available on its children).
Actually I have several different databases with the same schema and different data. But the problem is to maintain congruency between them. Everytime I create a script to create some SP or add some index or whatever, I have to execute it in every database, and sometimes I could miss one =P
So let's say you have a UNIVERSE (would be the master db) and the universe has SPACES (each one represented by a child db). So the application I'm working on needs to dynamically "clone" SPACES. To do that, we have to create a new database. Nowadays I'm creating a backup of the db being cloned, restoring it as a new one and truncate the tables.
I want to be able to create a new "child" of the "master" db, which will maintain the schemas and everything, but will start with empty data.
Hope it's clear... My english is not perfect, sorry about that =P
Thanks to all!
What you really need is to version-control your database schema.
See do-you-source-control-your-databases
If you use SQL Server, I would recommend dbGhost - not expensive and does a great job of:
synchronizing 2 databases
diff-ing 2 databases
creating a database from a set of scripts (I would recommend this version).
batch support, so that you can upgrade all your databases using a single batch
You can use this infrastructure for both:
rolling development versions to test, integration and production systems
rolling your 'updated' system to multiple production deployments (especially in a hosted environment)
I would write my changes as a sql file and use OSQL or SQLCMD via a batchfile to ensure that I repeatedly executed on all the databases without thinking about it.
As an alternative I would use the VisualStudio Database Pro tools or RedGate SQL compare tools to compare and propogate the changes.
There are kludges, but the mainstream way to handle this is still to use Source Code Control (with all its other attendant benefits.) And SQL Server is increasingly SCC friendly.
Also, for many (most robust) sites it's a per-server issue as much as a per-database issue.
You can put things in master like SPs and call them from anywhere. As far as other objects like tables, you can put them in model and new databases will get them when you create a new database.
However, in order to get new tables to simply pop up in the child databases after being added to the parent, nothing.
It would be possible to create something to look through the databases and script them from a template database, and there are also commercial tools which can help discover differences between databases. You could also have a DDL trigger in the "master" database which went out and did this when you created a new table.
If you kept a nice SPACES template, you could script it out (without data) and create the new database - so there would be no need to TRUNCATE. You can script it out from SQL or an external tool.
Little trivia here. The mssqlsystemresource database works as you describe: is defined once and 'appears' in every database as the special sys schema. Unfortunately the special 'magic' needed to get this working is not available to the user databases. You'll have to use deployment techniques to keep your schema in synk. That is, apply the changes to every database as the other answers already suggested.
In theory, you could put a trigger on your UNIVERSE.sysobjects table (assuming SQL Server), and then you could enumerate master.dbo.sysdatabases to find all the child databases. If you have a special table that indicates it's a child database, you can reference child.dbo.sysobjects to find it.
Make no mistake, it would be difficult to implement. But it's one way you could do it.

MS Access Application - Convert data storage from Access to SQL Server

Bear in mind here, I am not an Access guru. I am proficient with SQL Server and .Net framework. Here is my situation:
A very large MS Access 2007 application was built for my company by a contractor.
The application has been split into two tiers BY ACCESS; there is a front end portion that holds all of the Ms Access forms, and then on the back end part, which are access tables, queries, etc., that is stored on a computer on the network.
Well, of course, there is a need to convert the data storage portion to SQL Server 2005 while keeping all of these GUI forms which were built in Ms Access. This is where I come in.
I have read a little, and have found that you can link the forms or maybe even the access tables to SQL Server tables, but I am still very unsure on what exactly can be done and how to do it.
Has anyone done this? Please comment on any capabilities, limitations, considerations about such an undertaking. Thanks!
Do not use the upsizing wizard from Access:
First, it won't work with SQL Server 2008.
Second, there is a much better tool for the job:
SSMA, the SQL Server Migration Assistant for Access which is provided for free by Microsoft.
It will do a lot for you:
move your data from Access to SQL Server
automatically link the tables back into Access
give you lots of information about potential issues due to differences in the two databases
keeps track of the changes so you can keep the two synchronised over time until your migration is complete.
I wrote a blog entry about it recently.
You have a couple of options, the upsizing wizard does a decent(ish) job of moving structure and data from access to Sql. You can then setup linked tables so your application 'should' work pretty much as it does now. Unfortunately the Sql dialect used by Access is different from Sql Server, so if there are any 'raw sql' statements in the code they may need to be changed.
As you've linked to tables though all the other features of Access, the QBE, forms and so on should work as expected. That's the simplest and probably best approach.
Another way of approaching the issue would be to migrate the data as above, and then rather than using linked tables, make use of ADO from within access. That approach is kind of famaliar if you're used to other languages/dev environments, but it's the wrong approach. Access comes with loads of built in stuff that makes working with data really easy, if you go back to use ADO/Sql you then lose many of those benefits.
I suggest start on a small part of the application - non essential data, and migrate a few tables and see how it goes. Of course you back everything up first.
Good luck
Others have suggested upsizing the Jet back end to SQL Server and linking via ODBC. In an ideal world, the app will work beautifully without needing to change anything.
In the real world, you'll find that some of your front-end objects that were engineered to be efficient and fast with a Jet back end don't actually work very well with a server database. Sometimes Jet guesses wrong and sends something really inefficient to the server. This is particular the case with mass updates of records -- in order not to hog server resources (a good thing), Jet will send a single UPDATE statement for each record (which is a bad thing for your app, since it's much, much slower than a single UPDATE statement).
What you have to do is evaluate everything in your app after you've upsized it and where there are performance problems, move some of the logic to the server. This means you may create a few server-side views, or you may use passthrough queries (to hand off the whole SQL statement to SQL Server and not letting Jet worry about it), or you may need to create stored procedures on the server (especially for update operations).
But in general, it's actually quite safe to assume that most of it will work fine without change. It likely won't be as fast as the old Access/Jet app, but that's where you can use SQL Profiler to figure out what the holdup is and re-architect things to be more efficient with the SQL Server back end.
If the Access app was already efficiently designed (e.g., forms are never bound to full tables, but instead to recordsources with restrictive WHERE clauses returning only 1 or a few records), then it will likely work pretty well. On the other hand, if it uses a lot of the bad practices seen in the Access sample databases and templates, you could run into huge problems.
It's my opinion that every Access/Jet app should be designed from the beginning with the idea that someday it will be upsized to use a server back end. This means that the Access/Jet app will actually be quite efficient and speedy, but also that when you do upsize, it will cause a minimum of pain.
This is your lowest-cost option. You're going to want to set up an ODBC connection for your Access clients pointing to your SQL Server. You can then use the (I think) "Import" option to "link" a table to the SQL Server via the ODBC source. Migrate your data from the Access tables to SQL Server, and you have your data on SQL Server in a form you can manage and back up. Important, queries can then be written on SQL Server as views and presented to the Access db as linked tables as well.
Linked Access tables work fine but I've only used them with ODBC and other databases (Firebird, MySQL, Sqlite3). Information on primary or foreign keys wasn't passing through. There were also problems with datatype interpretation: a date in MySQL is not the same thing as in Access VBA. I guess these problems aren't nearly as bad when using SQL Server.
Important Point: If you link the tables in Access to SQL Server, then EVERY table must have a Primary Key defined (Contractor? Access? Experience says that probably some tables don't have PKs). If a PK is not defined, then the Access forms will not be able to update and insert rows, rendering the tables effectively read-only.
Take a look at this Access to SQL Server migration tool. It might be one of the few, if not the ONLY, true peer-to-peer or server-to-server migration tools running as a pure Web Application. It uses mostly ASP 3.0, XML, the File System Object, the Data Dictionary Object, ADO, ADO Extensions (ADOX), the Dictionary Scripting Objects and a few other neat Microsoft techniques and technologies. If you have the Source Access Table on one server and the destination SQL Server on another server or even the same server and you want to run this as a Web Internet solution this is the product for you. This example discusses the VPASP Shopping Cart, but it will work for ANY version of Access and for ANY version of SQL Server from SQL 2000 to SQL 2008.
I am finishing up development for a generic Database Upgrade Conversion process involving the automated conversion of Access Table, View and Index Structures in a VPASP Shopping or any other Access System to their SQL Server 2005/2008 equivalents. It runs right from your server without the need for any outside assistance from external staff or consultants.
After creating a clone of your Access tables, indexes and views in SQL Server this data migration routine will selectively migrate all the data from your Access tables into your new SQL Server 2005/2008 tables without having to give out either your actual Access Database or the Table Contents or your passwords to anyone.
Here is the Reverse Engineering part of the process running against a system with almost 200 tables and almost 300 indexes and Views which is being done as a system acceptance test. Still a work in progress, but the core pieces are in place.
http://www.21stcenturyecommerce.com/SQLDDL/ViewDBTables.asp
I do the automated reverse engineering of the Access Table DDLs (Data Definition Language) and convert them into SQL equivalent DDL Statements, because table structures and even extra tables might be slightly different for every VPASP customer and for every version of VP-ASP out there.
I am finishing the actual data conversion routine which would migrate the data from Access to SQL Server after these new SQL Tables have been created including any views or indexes. It is written entirely in ASP, with VB Scripting, the File System Object (FSO), the Dictionary Object, XML, DHTML, JavaScript right now and runs pretty quickly as you will see against a SQL Server 2008 Database just for the sake of an example.
It takes perhaps 15-20 seconds to reverse engineer almost 500 different database objects. There might be a total of over 2,000 columns involved in this example for the 170 tables and 270 indexes involved.
I have even come up with a way for you to run both VPASP systems in parallel using 2 different database connection files on the same server just to be sure that orders entered on the Access System and the SQL Server system produce the same results before actual cutover to production.
John (a/k/a The SQL Dude)
sales#designersyles.biz
(This is a VP-ASP Demo Site)
Here is a technique I've heard one developer speak on. This is if you really want something like a Client-Server application.
Create .mdb/.mde frontend files distributed to each user (You'll see why).
For every table they need to perform an CRUD, have a local copy in the file in #1.
The forms stay linked to the local tables.
Write VBA code to handle the CRUD from the local tables to the SQL Server database.
Reports can be based off of temp tables created from the SQL Server (Won't be able to create temp tables in mde file I don't think).
Once you decide how you want to do this with a single form, it is not too difficult to apply the same technique to the rest. The nice thing about working with the form on a local table is you can keep a lot of the existing functionality as the existing application (Which is why they used and continue to use Access I hope). You just need to address getting data back and forth to the SQL Server.
You can continue to have linked tables, and then gradually phase them out with this technique as time and performance needs dictate.
Since each user has their own local file, they can work on their local copy of the data. Only the minimum required to do their task should ever be copied locally. Example: if they are updating a single record, the table would only have that record. When a user adds a new record, you would notice that the ID field for the record is Null, so an insert statement is needed.
I guess the local table acts like a dataset in .NET? I'm sure in some way this is an imperfect analogy.

Resources