Migrate multiple Access DB (with same function but different data) to single SQL Server DB - sql-server

Situation
I have 5 Access DB files, each one with 10 tables, 40 queries and 8 macros. All 5 Access DB files have the same table names, table structures, queries and macros. The only difference is the data contained in the tables. If it matters, some tables in each database have between a few hundred and 100K+ rows.
What I am trying to achieve
I am migrating these 5 Access DB files to a single SQL Server (2008) database. Edit: After migrating, I do need to know which tables belong to which database, since each original Access DB is associated with a company department, so I need to keep track of this.
My Solutions or Options
Tables will be imported to SQL Server as tables. Queries will be imported as stored procedures. Macros will be imported as new stored procedures.
1. Import each Access DB's tables and queries to the SQL Server DB and rename each table and query with a prefix that identifies which database it came from.
2. Same as #1, but only import the tables. For the queries, import only one set (40 queries) and modify them to dynamically select, insert, update or delete from the right tables.
3. Import table A from the 1st Access DB, table A from the 2nd Access DB, table A from the 3rd Access DB and so on, into one new table in SQL Server, and give the rows a unique identifier that records which database each row came from.
What do you think is the best approach? Please tell me if there is better way to do this than what I have listed. Thanks!

I would migrate them to MS SQL like so:
Import all tables from database 1 into the corresponding tables in SQL Server, but add a new primary key that takes the name of the old one, rename the old PK column, and add an identifier column for the source database.
Update all foreign keys to point at the new PK, using the old PK plus the identifier.
Repeat for databases 2-5.
Either delete the identifier or keep it, depending on whether you need to know where the rows came from (same for the old primary keys).
Only import queries/macros once, as they are the same.
When doing it this way, you keep the PK-FK relations and the queries intact and still know where the rows came from.
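
For illustration, a minimal T-SQL sketch of this approach for one parent/child pair (table and column names are invented for the example; the real schema comes from the Access files):

    -- New IDENTITY PK, old Access PK kept under a new name, plus a source-database identifier
    CREATE TABLE dbo.Customer (
        CustomerID    INT IDENTITY(1,1) PRIMARY KEY,   -- new PK, takes the old name
        OldCustomerID INT           NOT NULL,          -- renamed PK from the Access file
        SourceDb      TINYINT       NOT NULL,          -- 1..5: which Access DB the row came from
        CustomerName  NVARCHAR(100) NOT NULL
    );

    -- After loading a child table (which was imported with the old FK value in
    -- OldCustomerID and the same SourceDb), repoint its FK to the new PK
    UPDATE o
    SET    o.CustomerID = c.CustomerID
    FROM   dbo.[Order]  AS o
    JOIN   dbo.Customer AS c
           ON  c.OldCustomerID = o.OldCustomerID
           AND c.SourceDb      = o.SourceDb;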

I would say number 3. You would get no code duplication and much easier maintenance.
One example of easier maintenance is performance tuning. You say the queries are the same in the 5 Access DBs: say you detect that one of the queries runs too slowly and you decide you need to create an index on an underlying table. In options #1 and #2 this would mean recreating the same index on 5 "twin" tables.
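
For example, with hypothetical table names, under option #3 the tuning work happens once on the merged table instead of being repeated per department:

    -- Option #3: one merged table, one index
    CREATE NONCLUSTERED INDEX IX_Order_CustomerID
        ON dbo.[Order] (CustomerID)
        INCLUDE (DepartmentID);

    -- Options #1/#2 would need the same index five times:
    -- CREATE INDEX IX_Dept1_Order_CustomerID ON dbo.Dept1_Order (CustomerID);
    -- CREATE INDEX IX_Dept2_Order_CustomerID ON dbo.Dept2_Order (CustomerID);
    -- ... and so on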

In Access, for each of these databases, you could add a department ID field (a new field) with its own identifier value (a different value for each department), and add this value to each of the tables that is to be imported. Create a new table that holds the department information, then create a join that connects these tables. That way, each department's rows are differentiated from the others.
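
A rough sketch of what that could look like once it lands in SQL Server (table and column names are made up for the example):

    -- Lookup table: one row per department / original Access DB
    CREATE TABLE dbo.Department (
        DepartmentID   INT          NOT NULL PRIMARY KEY,
        DepartmentName NVARCHAR(50) NOT NULL
    );

    -- Each imported table gets the department value; added as NULL first because the
    -- table already contains rows, then filled per import batch and constrained
    ALTER TABLE dbo.Contact ADD DepartmentID INT NULL;

    UPDATE dbo.Contact SET DepartmentID = 3 WHERE DepartmentID IS NULL;  -- value for the batch just imported

    ALTER TABLE dbo.Contact
        ADD CONSTRAINT FK_Contact_Department
        FOREIGN KEY (DepartmentID) REFERENCES dbo.Department (DepartmentID);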

Related

What causes the same data to have different sizes in two different SQL Server databases?

I have a table with 339 million rows and twenty-one columns, of which seventeen are varchar(100), two are integers, one is a float and one is a datetime. It is in an Azure SQL Server database. The table has no indexes, only the primary key constraint. I aim to copy this table to a new database and delete the old one. My approach is to save the data to Delta Lake and use an Azure Data Factory pipeline to load it into the new database. I have used this approach several times before for migrating tables to new databases.
However, I have run into a strange problem. In the old database, the table is about 80 GB in total. Yet in the new database, only 54% of the data (182 million rows) already fills 276 GB. There is no DBA on my team to help me with this matter. What could possibly be causing this? I hope I have included all the information that could help with this issue.
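
One way to start investigating (a sketch; dbo.BigTable is a placeholder name) is to compare how the space is actually allocated for the table in each database:

    -- Overall reserved/data/index space for the table, run in both databases
    EXEC sp_spaceused N'dbo.BigTable';

    -- Reserved space and row counts per heap/index
    SELECT  index_id,
            SUM(reserved_page_count) * 8 / 1024 AS reserved_mb,
            SUM(row_count)                      AS row_count
    FROM    sys.dm_db_partition_stats
    WHERE   object_id = OBJECT_ID(N'dbo.BigTable')
    GROUP BY index_id;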

Can we implement SymmetricDS in databases which are identical, but tables have different PK IDs for the same tables?

Can I implement SymmetricDS in identical databases?
My scenario
I have two databases:
Database A
Database B
Whatever data change happens in either one of them should be reflected in the other.
Current situation:
Even though the DBs are identical, database B has fewer tables than database A.
Consider a table tableA from database A and the same table in database B.
The PK IDs for the same records are actually different in the two tables.
Can I expand and implement SymmetricDS if I want to add a third database?
Currently I am using a mapping table and an API to handle data sync.
Can I move to SymmetricDS for syncing data?
Yes, go ahead
SymmetricDS allows for bidirectional synchronization of databases.
Only the tables of database B will be configured for synchronization. The extra tables from database A might be added to the mix using table transformation.
As long as there are uniqueness constraints on the columns in, for example, database A that serve as PKs in database B, that will not be a problem.
You can add as many types, and instances of those types, of databases as you like. Bear in mind that the graph of database relationships must satisfy the definition of a tree.
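
For orientation only, a rough sketch of the kind of rows SymmetricDS configuration takes for one table synced from node group 'group_a' to 'group_b' (table and column names are from its sym_* configuration tables as I recall them from the documentation; the group, trigger and router IDs are made up, and your version's docs should be the reference):

    -- Route changes from group_a to group_b
    INSERT INTO sym_router (router_id, source_node_group_id, target_node_group_id, router_type, create_time, last_update_time)
    VALUES ('a_to_b', 'group_a', 'group_b', 'default', current_timestamp, current_timestamp);

    -- Capture changes on tableA
    INSERT INTO sym_trigger (trigger_id, source_table_name, channel_id, create_time, last_update_time)
    VALUES ('tableA', 'tableA', 'default', current_timestamp, current_timestamp);

    -- Tie the trigger to the router (a second router/trigger_router pair gives the reverse direction)
    INSERT INTO sym_trigger_router (trigger_id, router_id, initial_load_order, create_time, last_update_time)
    VALUES ('tableA', 'a_to_b', 1, current_timestamp, current_timestamp);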

Data Warehousing GUID to Int PrimaryKeys

I'm a (very) junior analyst responsible for setting up an MS SQL DWH which hosts data from our CRM for reporting purposes.
The current CRM uses uniqueidentifiers in its MS SQL database for all keys, and some of the tables have 8M+ rows. In our reporting software (Qlikview) I can swap the GUIDs for ints and take an 800 MB data file down to 90 MB, which is excellent; however, I'd like to perform this logic in the DWH if possible to make it faster and a little cleaner.
My issue is I have no idea how to do so while maintaining FK links to other tables. I have considered maintaining a staging table of GUIDs and associated numeric IDs however this seems inefficient and poses a problem of then trying to write some arbitrary numeric ID to the PK column of the destination table which I'm sure is a terrible idea.
The DWH import works as follows: I have USPs on the source db performing SELECTs which are executed by a SSIS package, the output of which are placed in tables of the same name on the [Staging] schema of the DWH. From there, transform is performed by USPs on the DWH, also executed by the same SSIS package, which handles execution order and multi-threading. Whatever implementation I come up with will need to be compatible with this architecture (done within USPs that potentially run asynchronously).
I'm very much a SQL noob so I do ask to please link documentation if necessary or at least describe answers in a google-friendly way.
Is the removal of the GUIDs the major cause of the possible shrink to 90 MB? Do you not need the GUIDs to process the report?
Do you strip the relationships and join almost all of the tables into as few tables as possible when creating the staging tables?
If the answer to 1 and 2 is yes, then you do not need the GUIDs and simply need a unique int column.
I suggest that, in the SELECT used to create/insert the staging table, you use ROW_NUMBER to replace the GUID column with a unique int column. This will only work if you recreate the staging table on each run of the SSIS package.
If you are simply inserting data into an already existing staging table when running the SSIS package, you can instead create an autoincrement (IDENTITY) primary key column. When you insert data into the staging table, leave the identity column out of the insert so that it generates the unique int values automatically.
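
A sketch of both variants, each shown independently, with placeholder table and column names:

    -- Variant 1: recreate the staging table each run and derive an int key with ROW_NUMBER
    SELECT  ROW_NUMBER() OVER (ORDER BY c.CreatedDate) AS ContactKey,  -- replaces the GUID
            c.Name,
            c.Email
    INTO    Staging.Contact
    FROM    Source.Contact AS c;

    -- Variant 2: pre-created staging table with an IDENTITY key; leave the
    -- identity column out of the INSERT so it is generated automatically
    CREATE TABLE Staging.Contact (
        ContactKey INT IDENTITY(1,1) PRIMARY KEY,
        Name       NVARCHAR(100),
        Email      NVARCHAR(255)
    );

    INSERT INTO Staging.Contact (Name, Email)
    SELECT Name, Email
    FROM   Source.Contact;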

How do I store data that is shared between databases?

Suppose a database for a contact management system. Each user is given a separate database. Users can store their contacts' education information.
Currently there's a table called School in every database, where the name of every school in the country is stored. The School table is referenced as an FK by the Contact table.
The School table gets updated every year or so, as new schools get added or existing schools change names.
As the school information is common across all user databases, moving it into a separate common database seems like a better idea. But when it's moved to a separate database, you cannot create an FK constraint between School and Contact.
What is the best practice for this kind of situation?
(p.s. I'm using SQL Server if that is relevant)
Things to consider
Database is a unit of backup/restore.
It may not be possible to restore two databases to the same point in time.
Foreign keys are not supported across databases.
Hence, I would suggest managing the School -- and any other common table -- in one reference DB and then replicating those tables to other DBs.
Just straight out of the box, foreign key constraints aren't going to help you. You could look into replicating the individual schools table.
Based on the fact that you won't query the tables with the SchoolID column very often, I'll assume that inserts/updates to these tables will be really rare... In that case you could create a constraint on the table that needs the FK which checks for the existence of the given SchoolID in the Schools table.
Note that every insert/update to the table with the SchoolID column will literally perform a query against another DB, so the distance between the databases, the way they connect to each other and many other factors may impact the performance of the insert/update statements.
Still, if they're on the same server and you have your indexes and primary keys all set up, the query should be fairly fast.
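
A minimal sketch of that idea, assuming the shared database is called CommonDb and the tables are named as in the question:

    -- Scalar function in the user database that looks across to the shared database
    CREATE FUNCTION dbo.fn_SchoolExists (@SchoolID INT)
    RETURNS BIT
    AS
    BEGIN
        IF EXISTS (SELECT 1 FROM CommonDb.dbo.School WHERE SchoolID = @SchoolID)
            RETURN 1;
        RETURN 0;
    END;
    GO

    -- "FK-like" check on the referencing table
    ALTER TABLE dbo.Contact
        ADD CONSTRAINT CK_Contact_SchoolExists
        CHECK (SchoolID IS NULL OR dbo.fn_SchoolExists(SchoolID) = 1);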

Use SSIS to migrate and normalize database

We have an MS Access database that we want to migrate to a SQL Server Database with a new DB design. A part of the application that uses the SQL Server DB is already written.
I looked around to find out how to do the migration step most easily and started with Microsoft's SQL Server Integration Services (SSIS). Now I have gotten to the point where I want to split a table vertically for normalization reasons.
A made-up example looks like this:
MS Access table person: ID, Name, Street
SQL Server table person: id, name
SQL Server table address: id, person_id, street
How can I complete this task best with SSIS? The id columns are identity (autoincrement) columns, so I cannot insert the old ID. How can I put the correct person_id foreign key in the address table?
There might even be a table which has to be broken up into three tables, where a row in table2 belongs to table1 and a row in table3 belongs to a row in table2.
Is SSIS the appropriate means for this?
EDIT
Although this is a one-time migration, we need to have an automated and repeatable process, because the production database is under heavy usage and we are working on the migration in our development environment with recent, but not up-to-date data. We plan for one test run of the migration and have the customer review the behaviour. If everything is fine, we will go for the real migration.
Most of the given solutions include lots of manual steps and are thus not appropriate.
Use the Execute SQL Task and write the statements yourself.
For the parent table, do an INSERT ... SELECT from the source table, then do the same for the rest as you progress. Make sure you set IDENTITY_INSERT to ON for the parent table and reuse your old IDs. That will help you keep your data integrity.
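
A sketch of the kind of statements the Execute SQL Task would run, assuming the Access data has first been staged into a table such as dbo.Staging_Person (ID, Name, Street):

    -- Parent table: keep the old IDs so child rows can still reference them
    SET IDENTITY_INSERT dbo.person ON;

    INSERT INTO dbo.person (id, name)
    SELECT ID, Name
    FROM   dbo.Staging_Person;

    SET IDENTITY_INSERT dbo.person OFF;

    -- Child table: let its own id be generated, reuse the old person ID as the FK
    INSERT INTO dbo.address (person_id, street)
    SELECT ID, Street
    FROM   dbo.Staging_Person
    WHERE  Street IS NOT NULL;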
For migrating your Access tables into SQL Server, use SSMA, not the Upsizing Wizard from Access.
You'll get a lot more tools at your disposal.
You can then break up your tables one by one from within SQL Server.
I'm not sure if there are any tools that can help you split your tables automatically (at least I couldn't find any), but it's not too difficult to do manually, although how much work is required depends on how you used the original tables in your VBA code and forms in the first place.
A side note
Regarding normalization, don't go overboard with it: I know your example was just that but normalizing customer addresses is not always (rarely?) needed.
How many addresses can a person have?
If you count a home address, business address, delivery address, billing address, that's probably the most you'll ever need.
In that case, it's better to just keep them in the same table. Normalizing that data will just require more work to recombine and offers no benefit.
Of course, there are cases where it would make sense to normalise but I've seen people going overboard with the notion (I've been guilty of it as well) and then find themselves struggling to build more complex queries to join all that split data, making development and maintenance harder and often suffering a performance penalty in the process.
Access is so user-friendly, why not normalize your tables in Access, and then upsize the finished structure from there?
I found a different solution which was not mentioned yet and allows us to use all the comfort and options of the dataflow task:
If the destination database is on a local SQL Server, you can use a dataflow task with SQL Server destination instead of an OLE DB destination.
For a SQL Server destination you can tick the "keep identities" option. (I do not know if the English names are correct, because we have a German version.) With this you can write into identity columns.
We found that we cannot use the old primary keys everywhere, because we have some tables that take a union of records from multiple tables.
We start the process by building a temporary mapping table with columns
new_id (identity)
old_id (int)
old_tablename (string)
We first fill in all the old_id values for every table that is referenced by a foreign key in the new schema. The new_id values are generated automatically by SQL Server.
So we can use a join to translate from old_id to new_id where needed. We use the new_id values to fill the identity (primary key) columns in the new tables via the "keep identities" option, and for the foreign keys we can simply look them up in our mapping table with a join.
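
In T-SQL terms, the mapping table and the lookup join look roughly like this (Staging.person is a placeholder for the staged copy of the Access table):

    CREATE TABLE dbo.id_mapping (
        new_id        INT IDENTITY(1,1) PRIMARY KEY,
        old_id        INT     NOT NULL,
        old_tablename SYSNAME NOT NULL
    );

    -- One pass per source table that is referenced by a foreign key in the new schema
    INSERT INTO dbo.id_mapping (old_id, old_tablename)
    SELECT ID, 'person'
    FROM   Staging.person;

    -- Translating an old FK to the new key when loading a child table
    SELECT  m.new_id AS person_id,
            s.Street AS street
    FROM    Staging.person AS s
    JOIN    dbo.id_mapping AS m
            ON  m.old_id        = s.ID
            AND m.old_tablename = 'person';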
You might also look at Jamie Thomson's SSIS Normalizer component. I just found out about it today (haven't actually tried it yet). The example he posts looks a lot like the one in your question.
