I have several versions of a single Access table which I need to move to a SQL Server database that I have built. What I mean by several versions: there are 6 different schools that each built their own Access table to track records, and I am consolidating them. The wrinkle is that each location made its table slightly different.
The Access table is like:
Student (StudentId,
StudentName,
StudentAddress,
TeacherId,
TeacherName,
Class1,
Class2,
Class3,
Class4,
Class5,
StudentStatus)
I have built a new database with multiple tables, such as (this is a rough layout):
Student table (StudentId, FirstName, LastName, TeacherId)
Teacher table (TeacherId, FirstName, LastName)
Class table (ClassId, ClassName, TeacherId)
etc
I imported the Access tables into a 'staging' schema in the SQL Server DB (staging.Location1, staging.Location2, etc.). Now I need to either create an SSIS package to copy the data to the new tables or write some stored procs. I imagine I will need to handle each location individually.
As I stated above, each Access table is slightly different, with different data types and column names. For example, one location has the student ID as a varchar and calls it StudId, while another has it as an int and calls it SID, or StudentId, etc.
I cannot decide on the most efficient route. Am I correct in thinking that, either way, this will require at least one SSIS package or stored procedure per location because of the differences?
I say do it this way:
Create only one staging table that covers all the data you have in those location tables. Use the right data types and field names. Use an identity field as the primary key, and use the school name as an identifier for each school.
Copy the data from each location's Access table into the single staging table, doing data conversion where needed. This way you will have all the data in one place and it will all be consistent.
Read the data from staging and populate the 3 tables you have in your new design. You only need one SSIS package to do this part; you just need to change the destination connection and the filter on school name.
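To illustrate the per-location load into that single staging table, here is a minimal T-SQL sketch; the staging table staging.AllStudents, its SchoolName column, and the per-location column names (StudId, SID) are assumptions based on the question, and the class columns are omitted for brevity:

INSERT INTO staging.AllStudents (SchoolName, StudentId, StudentName, TeacherName, StudentStatus)
SELECT 'Location1', TRY_CAST(StudId AS int), StudentName, TeacherName, StudentStatus   -- varchar ID converted here (TRY_CAST needs SQL Server 2012+)
FROM staging.Location1;

INSERT INTO staging.AllStudents (SchoolName, StudentId, StudentName, TeacherName, StudentStatus)
SELECT 'Location2', SID, StudentName, TeacherName, StudentStatus                       -- already an int at this location
FROM staging.Location2;

Once every location is in the single staging table, the downstream package or procedure only has to filter on SchoolName.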
I am attempting to create an Employee table in SQL Server 2016 and I want to use EmpID as the primary key and identity. Here is what I believe to be true, and my question: when I create the Employee table with EmpID as the primary key and an IDENTITY(100, 1) column, each time I add a new employee, SQL Server will auto-generate the EmpID starting with 100 and increment by 1 with each new employee. What happens if I want to import a list of existing employees from another company and those employees already have an existing EmpID? I haven't been able to figure out how I would import those employees with their existing EmpIDs. If there is a way to import the employee list with the existing EmpIDs, will SQL Server check to make sure the EmpIDs from the new list do not exist for a current employee? Or is there some code I need to write to make that happen?
Thanks!
You are right about primary keys, but about importing employees from another company and merging them with your employee list, you have to ask these things:
Why? Sure, there are ways to solve this problem, but why would you merge another company's employees into your own employee table?
The other company's ID structure: most of the time, companies have different ID structures; some use 4 characters, others use only numbers, and so on. You have to know how the two companies' ID structures differ.
If the merge can't be avoided, then you have to raise the concern with the higher-ups and tell them that you must give the merging company's employees new employee IDs. With this in mind, simply appending the new data to your database is the solution.
This is an extremely normal data warehousing issue, where a table has data sourced from multiple places. It also comes up in migrations, acquisitions, etc.
There is no way to keep the existing IDs as a primary key if there are multiple people with the same ID.
In the data warehouse world we would always create a new surrogate key, which is the primary key to the table, and include the original key and a source system identifier as two attributes.
In your scenario you will probably keep the existing keys for the original company, create new IDs for the new employees, and save the old ID in an additional column for historical use.
Either of these choices also means that as you migrate other associated data, such as leave information imported from the old system, you can translate it to the new key by looking up OldID in the employee table and finding the associated newID to use when saving your leave records in the new system.
At the end of the day there is no alternative to this, as you simply can't have two employees with the same primary key.
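A minimal sketch of that surrogate-key pattern (the table and column names here are purely illustrative, not taken from the question):

CREATE TABLE dbo.Employee (
    EmployeeKey  int IDENTITY(1, 1) PRIMARY KEY,   -- new surrogate key generated by SQL Server
    SourceSystem varchar(20)   NOT NULL,           -- which company/system the row came from
    OldEmpID     varchar(20)   NOT NULL,           -- the original ID, kept for history and lookups
    EmpName      nvarchar(100) NOT NULL,
    CONSTRAINT UQ_Employee_Source UNIQUE (SourceSystem, OldEmpID)
);

Translating old foreign keys (leave records, etc.) then becomes a join on SourceSystem and OldEmpID to pick up the new EmployeeKey.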
I have never seen a company that migrates employees from another company and keeps their existing employee IDs. Usually they give them new IDs and keep the old ones in the employee file for reference, but they never use the old ID as an active ID.
Large companies usually use a set of special identifiers that are already defined in the system to distinguish employees based on field, specialty, etc.
Most companies don't do the same as the large ones; instead, they stick with one identifier and use dimensions alongside it. These dimensions specify areas of work for employees, projects, vendors, etc. They are used globally in the system and feed into the company's financial reports (which is the main point of using them).
So what you need to do is look at the company's ID sequence requirements and plan around that, as depending on IDENTITY alone won't be enough for most companies. If you find that you can depend on identity alone, then use it; if not, see whether you can use dimensions alongside the ID (you could create five dimensions - Company, Project, Department, Area, Cost Center - which will be enough for any company).
If you use identity alone and want to migrate, then in your insert statement do:
SET IDENTITY_INSERT tableName ON
INSERT INTO tableName (columns)   -- the column list must be explicit and include the identity column
...
SET IDENTITY_INSERT tableName OFF
This will allow you to insert values into the identity column. However, doing this might require you to reseed the identity to a new value afterwards to avoid issues; read up on DBCC CHECKIDENT.
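For example, a sketch of a reseed after such an import (the table name dbo.Employee is an assumption):

-- With RESEED and no explicit new value, SQL Server resets the current identity value
-- to the maximum value in the identity column if it is currently lower than that.
DBCC CHECKIDENT ('dbo.Employee', RESEED);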
If you end up using dimensions, you could make the dimension and the ID together a composite primary key, which will make sure the combination is unique in the table (treated as one set).
I am trying to use change tracking to copy data incrementally from a SQL Server to an Azure SQL Database. I followed the tutorial on Microsoft Azure documentation but I ran into some problems when implementing this for a large number of tables.
In the source part of the copy activity I can use a query that gives me a change table of all the records that are updated, inserted or deleted since the last change tracking version. This table will look something like
PersonID  Age   Name   SYS_CHANGE_OPERATION
--------  ----  -----  --------------------
1         12    John   U
2         15    James  U
3         NULL  NULL   D
4         25    Jane   I
with PersonID being the primary key for this table.
The problem is that the copy activity can only append the data to the Azure SQL Database so when a record gets updated it gives an error because of a duplicate primary key. I can deal with this problem by letting the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but the problem is that I have a large number of tables.
I would like the pre-copy script to delete the deleted and updated records on the Azure SQL Database, but I can't figure out how to do this. Do I need to create separate stored procedures and corresponding table types for each table that I want to copy or is there a way for the pre-copy script to delete records based on the change tracking table?
You have to use a Lookup activity before the Copy activity. With that Lookup activity you can query the database to get the deleted and updated PersonIDs, preferably all in one field, separated by commas (so it's easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
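As a sketch of what that Lookup query could look like (assuming the tracked table is dbo.Person, change tracking is enabled on it, and the source is SQL Server 2017+ so STRING_AGG is available):

DECLARE @last_sync_version bigint = 0;  -- replace with the change tracking version stored from the previous run
SELECT STRING_AGG(CAST(CT.PersonID AS varchar(20)), ',') AS PersonIDs
FROM CHANGETABLE(CHANGES dbo.Person, @last_sync_version) AS CT
WHERE CT.SYS_CHANGE_OPERATION IN ('U', 'D');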
Then you can do the following in your pre-copy script:
delete from TableName where PersonID in (@{activity('MyLookUp').output.firstRow.PersonIDs})
This way you will be deleting all the deleted or updated rows before inserting the new ones.
Hope this helped!
In the meantime, Azure Data Factory provides a metadata-driven copy task. After going through the dialogue-driven setup, a metadata table is created which has one row for each dataset to be synchronized. I solved this UPSERT problem by adding a stored procedure as well as a table type for each dataset to be synchronized. Then I added the relevant information to the metadata table for each row, like this:
{
    "preCopyScript": null,
    "tableOption": "autoCreate",
    "storedProcedure": "schemaname.UPSERT_SHOP_SP",
    "tableType": "schemaname.TABLE_TYPE_SHOP",
    "tableTypeParameterName": "shops"
}
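For reference, a sketch of what the corresponding table type and UPSERT stored procedure could look like for the entry above; the target table schemaname.Shop and its columns are assumptions:

CREATE TYPE schemaname.TABLE_TYPE_SHOP AS TABLE
(
    ShopId   int           NOT NULL PRIMARY KEY,
    ShopName nvarchar(100) NULL
);
GO
CREATE PROCEDURE schemaname.UPSERT_SHOP_SP
    @shops schemaname.TABLE_TYPE_SHOP READONLY
AS
BEGIN
    -- Update rows that already exist in the target, insert the ones that do not
    MERGE schemaname.Shop AS tgt
    USING @shops AS src
        ON tgt.ShopId = src.ShopId
    WHEN MATCHED THEN
        UPDATE SET tgt.ShopName = src.ShopName
    WHEN NOT MATCHED THEN
        INSERT (ShopId, ShopName) VALUES (src.ShopId, src.ShopName);
END
GO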
After that you need to adapt the sink properties of the copy task like this (stored procedure, table type, table type parameter name):
@json(item().CopySinkSettings).storedProcedure
@json(item().CopySinkSettings).tableType
@json(item().CopySinkSettings).tableTypeParameterName
If the destination table does not exist yet, you need to run the whole task once before adding the above variables, because auto-creation of tables only works as long as no stored procedure is given in the sink properties.
I need to move data between two databases and wanted to see if SSIS would be a good tool. I've pieced together the following solution, but it is much more complex than I was hoping it would be - any insight on a better approach to tackling this problem would be greatly appreciated!
So here is what makes my situation unique: we have a large volume of data, so to keep the system performant we have split our customers across multiple database servers. These servers have databases with the same schema, but each is populated with unique data. Occasionally we need to move a customer's data from one server to another. Because of this, simply recreating the tables and moving the data as-is won't work: in the database on server A there could be 20 records, but there could be 30 records in the same table in the database on server B. So when moving record 20 from A to B, it will need to be assigned ID 31. Getting past this wasn't difficult, but the trouble comes when moving the tables which have a foreign key reference to what is now record 31...
An example:
Here's a sample schema for a simple example:
There is a table to track manufacturers, and a table to track products which each reference a manufacturer.
Example of data in the source database:
To handle moving this data while maintaining relational integrity, I've taken the approach of gathering the manufacturer records, looping through them, and for each manufacturer moving the associated products. Here's a high level look at the Control Flow in SSDT:
The first Data Flow grabs the records from the source database and pulls them into a Recordset Destination:
The OLE DB Source pulls all columns from the source database's Manufacturer table and places them into a record set:
Back in the control flow, I then loop through the records in the Manufacturer recordset:
For each record in the manufacturer recordset I then execute a SQL task which determines what the next available auto-incrementing ID will be in the destination database, inserts the record, and then returns the results of a SELECT MAX(ManufacturerID) in the Execute SQL Task result set so that the newly created Manufacturer ID can be used when inserting the related products into the destination database:
The above works, however once you get more than a few layers deep of tables that reference one another, this is no longer very tenable. Is there a better way to do this?
You could always try this:
Populate your manufacturers table.
Get your products data (ensure you have a reference to the manufacturer, such as its name).
Use a Lookup to get the manufacturer ID in the destination where that name (or whatever column you choose) matches.
Insert into the database.
This will keep your FK constraints intact and not require you to do all that max-key selection (see the sketch below).
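As a T-SQL illustration of the same idea (in SSIS this is a Lookup transform between the source and the destination); the column names ManufacturerName and ProductName and the linked-server path to the source database are assumptions:

INSERT INTO dbo.Product (ProductName, ManufacturerID)
SELECT src.ProductName,
       destM.ManufacturerID                          -- ID looked up in the destination, not copied from the source
FROM   SourceServer.SourceDb.dbo.Product      AS src
JOIN   SourceServer.SourceDb.dbo.Manufacturer AS srcM  ON srcM.ManufacturerID = src.ManufacturerID
JOIN   dbo.Manufacturer                       AS destM ON destM.ManufacturerName = srcM.ManufacturerName;

This assumes step 1 has already populated the destination's Manufacturer table.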
I have a linked table in my Access database (dbo_Billing_denied (DSN=WTSTSQL05_BB;DATABASE=DEPTFINANCE), etc.) and I want to create a table that will store the data from this linked table in a local table, so I can use it to run other queries. Currently I can't use it, because it tells me that it cannot make the connection (ODBC--connection to 'WTSTSQL05_BB' failed).
Do I have to create the table first and define all the fields before I can do this (create a table with fields that match the linked table and then create an append query to populate it)?
It sounds like you might have two problems; I will address the second one. You will still need to reestablish the connection to the linked table before this will work.
You can use a "make table query" in Access to make a local copy of the linked table. You can use the GUI for this, or you can structure the SQL something like this:
SELECT <list of various fields, or * for all fields>
INTO <name of new local table>
FROM <name of linked table(s) on the server>
WHERE <any other conditions you want to put on which records are included>;
I mentioned that there might be more than one table. You can also do this with joined tables or unions etc. The "where" clause is optional. Removing it will copy the entire data set.
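For instance, using the linked table name from the question (the local table name Billing_denied_local is just illustrative), the make-table query could be:

SELECT dbo_Billing_denied.*
INTO Billing_denied_local
FROM dbo_Billing_denied;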
You will get a warning when you try to execute this query in Access. It will tell you that you are about to write (or overwrite) a table. If you are trying to write a cleaner application with fewer nuisance messages for the end user, call this query from a macro. The macro would need to turn the warnings off, execute the query, then turn the warnings back on.
Microsoft Access does not require you to create this table before you write it; if the table does not exist Access will create this table for you, based on the field definitions in the source data. If a table of the same name does exist, Access will drop this table from your database and then create a new table of that name.
This also implies that the local table you are generating will need a unique name. If your query tries to overwrite the linked table by using the same name, the first thing Access will do is drop the linked table. It will then look for field definitions and input data in the linked table that it just dropped.
Since the new local table will have a new name, queries developed for the linked table will not work with the new local table. One possible work-around would be to rename the linked table in your local Access database. The table name in Access does not need to equal the name in the database it's linking to. The query could then write to a table with the correct name, and previous queries should work. Still, keep in mind that these queries would no longer be working on live data.
I want to write an enterprise application and I'm now in the DB design phase. The software will have some master data such as Suppliers, Customers, Inventories, Bankers...
I am considering 2 options:
Put each of these in its own separate table. Advantage: the table has all the necessary information for that kind of master data (Customer: name, address, ... / Inventory: type, manufacturer, condition, ...). Disadvantage: not flexible. When I want a new type of master data, such as Insurer, I have to design another table.
Put everything in one table and have that table reference, via a foreign key, another table that holds the type of each kind of master data (table 1: id, data_type, code, name, address, ...; table 2: data_type, data_type_name). Advantage: flexible; if I want more master data such as Insurer, I just add a row to table 2 (code: 002, name: Insurer) and then put the details of each insurer into table 1. Disadvantage: table 1 must have enough fields to store every kind of information, including customer name, address, account, inventory manufacturer, inventory quality...
So which method do you usually use (or which do you think works better)?
Thank you very much
I would advise creating separate tables for each entity type - it will be a lot easier to maintain in the future when you discover things you want to add for one entity type that don't apply to the others. If all of the entities (Suppliers, Customers, etc) are going to have the same fields and the only difference is their type then you could theoretically use one table. However, I would expect that there would be enough differences between the entity types that it would be worth creating separate tables for each. If there are several fields in common (e.g. address information) you could create a table for the common elements and have a foreign key in the individual tables to the table with the common data (e.g. AddressID).
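A minimal sketch of the separate-tables approach with a shared address table, as described above (all table and column names are illustrative):

CREATE TABLE dbo.Address (
    AddressID int IDENTITY(1, 1) PRIMARY KEY,
    Street    nvarchar(200) NOT NULL,
    City      nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Customer (
    CustomerID int IDENTITY(1, 1) PRIMARY KEY,
    Name       nvarchar(200) NOT NULL,
    AddressID  int NULL REFERENCES dbo.Address (AddressID)
);

CREATE TABLE dbo.Supplier (
    SupplierID int IDENTITY(1, 1) PRIMARY KEY,
    Name       nvarchar(200) NOT NULL,
    AddressID  int NULL REFERENCES dbo.Address (AddressID)
    -- supplier-specific columns can be added here without affecting Customer
);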
Logically, each "master" entity should be in its own table.
If you don't do that, you'll find joins become very painful, and your generic lookup table will accumulate all kinds of useless fields.