SQL Normalizing array of tables into multiple new tables - sql-server

I have a database with 51 tables all with the same schema (one table per state). Each table has a couple million rows and about 50 columns.
I've normalized the columns into 6 other tables, and now I want to import all of the data from those 51 tables into the 6 new tables. The column names are all the same, and so I'm hoping I can automate the process of importing all the data.
I'm assuming what I'll need to do is:
Select the names of all the tables that are in the raw schema
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'raw'
Iterate over all the results
Grab all rows from that table, and SELECT INTO the appropriate cols into the appropriate tables
Delete the rows from the raw table
Is there anything I'm missing? Also, is there any way to have this run on the SQL Server so I don't have to have my SQL Server Management Studio open the whole time?

Yes, obviously, you can automate it with T-SQL. But I recommend using SSIS in this case. As you say, the structure of all the tables is the same, so you can build one ETL process and then just change the table name in the source. Consequently, you will have the following advantages:
Solve the issue with a couple of clicks
Low risk of errors
You will be able to use a number of data transformations
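For comparison, the plain T-SQL route mentioned above could be scripted rather than written by hand. The sketch below, in Python, only generates the T-SQL text for each raw table. The target tables and column lists (Person, Address, and so on) are hypothetical placeholders, since the question does not show the real schema, and deduplication or key lookups are left out.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_import_batch(raw_table, targets, raw_schema="raw"):
    """Return a T-SQL batch that copies one raw table into the new tables.

    targets maps a target table name to the list of columns it takes from
    the raw table (hypothetical names; adjust to the real schema).
    """
    source = f"{quote_name(raw_schema)}.{quote_name(raw_table)}"
    # XACT_ABORT makes a run-time error roll back the whole transaction.
    lines = ["SET XACT_ABORT ON;", "BEGIN TRANSACTION;"]
    for target, columns in targets.items():
        cols = ", ".join(quote_name(c) for c in columns)
        # INSERT ... SELECT appends to an existing table; SELECT ... INTO
        # would try to create the target table again for every state.
        lines.append(f"INSERT INTO [dbo].{quote_name(target)} ({cols})")
        lines.append(f"SELECT {cols} FROM {source};")
    # The raw rows are deleted in the same transaction as the inserts.
    lines.append(f"DELETE FROM {source};")
    lines.append("COMMIT TRANSACTION;")
    return "\n".join(lines)


# Hypothetical mapping of normalized tables to raw columns.
TARGETS = {
    "Person": ["first_name", "last_name"],
    "Address": ["street", "city", "zip"],
}

# In practice the names would come from INFORMATION_SCHEMA.TABLES.
script = "\n\n".join(build_import_batch(t, TARGETS) for t in ["Alabama", "Alaska"])
print(script)
```

A script like this could be run with sqlcmd or as a SQL Server Agent job step, so Management Studio would not need to stay open while it runs.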

Related

Synchronize table between two different databases

Once a day I have to synchronize table between two databases.
Source: Microsoft SQL Server
Destination: PostgreSQL
Table contains up to 30 million rows.
The first time, I will copy the whole table, but after that, for efficiency, my plan is to insert/update only the changed rows.
With this approach, if I delete a row from the source database, it will not be deleted from the destination database.
The problem is that I don’t know which rows were deleted from the source database.
My rough idea right now is to use a binary search: compare the sum of the rows on each side, and narrow down the ranges that differ to catch the deleted rows.
I’m at a dead end - please share your thoughts on this...
In SQL Server you can enable Change Tracking to track which rows are Inserted, Updated, or Deleted since the last time you synchronized the tables.
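Change Tracking reports each changed primary key together with an operation code (I, U, or D in the SYS_CHANGE_OPERATION column of CHANGETABLE(CHANGES ...)). A minimal Python sketch of how a sync job might apply such a change list to the destination follows. The in-memory dict stands in for the PostgreSQL table, and the function and variable names are illustrative, not part of any library.

```python
def apply_changes(destination, changes):
    """Apply (operation, key, row) tuples to a dict keyed by primary key.

    operation is 'I' (insert), 'U' (update) or 'D' (delete), mirroring the
    codes Change Tracking reports; row is None for deletes.
    """
    for operation, key, row in changes:
        if operation == "D":
            # Deleted at the source: remove from the destination if present.
            destination.pop(key, None)
        elif operation in ("I", "U"):
            # Inserts and updates are both handled as an upsert.
            destination[key] = row
        else:
            raise ValueError(f"unknown operation {operation!r}")
    return destination


# Stand-in for the destination table after the initial full copy.
dest = {1: "alice", 2: "bob", 3: "carol"}
# Stand-in for the changes reported since the last synchronized version.
changes = [("U", 2, "robert"), ("D", 3, None), ("I", 4, "dave")]
apply_changes(dest, changes)
print(dest)
```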
With a TDS FDW (foreign data wrapper), map the source table to a temp table in PostgreSQL, and use a join to find or exclude the rows that you need.
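The join-based approach could look like the following sketch. It uses an in-memory SQLite database purely as a stand-in so the example is runnable; in the real setup source_rows would be the table mapped to SQL Server and dest_rows the PostgreSQL table. All table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_rows (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE dest_rows   (id INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO source_rows VALUES (1, 'a'), (3, 'c');
    INSERT INTO dest_rows   VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Anti-join: destination rows whose key no longer exists at the source.
deleted_ids = [row[0] for row in conn.execute("""
    SELECT d.id
    FROM dest_rows AS d
    LEFT JOIN source_rows AS s ON s.id = d.id
    WHERE s.id IS NULL
    ORDER BY d.id
""")]

# Remove those rows from the destination.
conn.executemany("DELETE FROM dest_rows WHERE id = ?", [(i,) for i in deleted_ids])
conn.commit()
remaining = [row[0] for row in conn.execute("SELECT id FROM dest_rows ORDER BY id")]
print(deleted_ids, remaining)
```

Only the key column takes part in the comparison, so the other columns of the source table do not have to be transferred to detect deletions.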

SSIS - Looking Up Records from Different Databases

I have a source table in a Sybase database (ORDERS) and a Source table in an MSSQL Database (DOCUMENTS). I need to query the Sybase database and for each row found in the ORDERS table get the matching row(s) by order number from the DOCUMENTS table.
I originally wrote the SSIS package using a lookup transformation, which is simple, except that it could be a one-to-many relationship: one order number will exist in the ORDERS table, but more than one document could exist for it in the DOCUMENTS table. The SSIS lookup will only return the first match.
My 2nd attempt will be to stage the rows from the ORDERS table into a staging table in MSSQL and then loop through the rows in this table using a FOR EACH LOOP CONTAINER and get the matching rows from the DOCUMENTS table, inserting the DOCUMENTS rows into another staging table. After all rows from ORDERS have been processed I will write a query to join the two staging tables to give me my result. A concern with this method is that I will be opening and closing the DOCUMENTS database connection many times, which will not be very efficient (although there will probably be less than 200 records).
Or could you let me know of any other way of doing this?
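One possible alternative to looping, sketched under the assumption that the ORDERS rows have already been staged next to DOCUMENTS: a single set-based join returns every matching document per order, so the one-to-many case needs no per-row lookup. SQLite stands in for MSSQL here so the example runs, and the table and column names (orders_staging, order_no, doc_name) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_staging (order_no INTEGER PRIMARY KEY);
    CREATE TABLE documents (
        doc_id INTEGER PRIMARY KEY,
        order_no INTEGER,
        doc_name TEXT
    );
    INSERT INTO orders_staging VALUES (100), (200);
    INSERT INTO documents VALUES
        (1, 100, 'invoice.pdf'),
        (2, 100, 'packing-slip.pdf'),
        (3, 300, 'unrelated.pdf');
""")

# One query on one connection: each order is paired with all its documents.
matches = conn.execute("""
    SELECT o.order_no, d.doc_name
    FROM orders_staging AS o
    JOIN documents AS d ON d.order_no = o.order_no
    ORDER BY o.order_no, d.doc_id
""").fetchall()
print(matches)
```

Order 200 has no documents and so produces no rows; a LEFT JOIN would keep it with a NULL document name if unmatched orders need to be reported.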

SQL Server: adding rows/tables with the same columns

I have two tables in SQL Server, and both of them have the same headers, which means they have the same columns. Because I imported them from Excel, I was not able to import them as one table, since together they have more than 1 million rows.
So now I have one table with a bit less than a million rows and one with like 400000 rows and actually it should be one table, but Excel only allows around one million.
I have them both imported into SQL Server and actually I really want them to be both in one table like union.
The question is how to do it.
I just want to put one of them below the other since it is exactly the same column header.
What you should have done was import the first sheet, creating the table at the same time, and then import the second sheet into the existing table in a separate import process. Or, if you were using SSIS, you could have used a Union All transformation to "combine" the 2 datasets into one, and then inserted all the data into a single table.
You can, however, easily get the data into one table. Assuming you want to retain Table1 and that Table1 and Table2 do indeed have the same definitions (and don't have IDENTITY columns) you can just do the following:
INSERT INTO dbo.Table1
SELECT *
FROM dbo.Table2;
DROP TABLE dbo.Table2;
Now all your data is in one table, Table1.
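A runnable illustration of the same append-then-drop pattern, using SQLite in place of SQL Server and invented column names. It lists the columns explicitly, which is an optional safeguard in case the two tables have their columns in a different order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (name TEXT, amount INTEGER);
    CREATE TABLE Table2 (name TEXT, amount INTEGER);
    INSERT INTO Table1 VALUES ('a', 1), ('b', 2);
    INSERT INTO Table2 VALUES ('c', 3);
""")

# Append every row of Table2 to Table1, then remove the now redundant table.
conn.execute("INSERT INTO Table1 (name, amount) SELECT name, amount FROM Table2")
conn.execute("DROP TABLE Table2")
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
print(row_count, tables)
```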

Insert data into just one column from csv

Somehow, one of the columns in one of my tables in my database now shows all NULL values throughout all 1000 rows. If I have a CSV file of the data is it possible to insert the data of just that one column into the database without disturbing the existing information without large amounts of queries?
If it was me I'd probably just upload (use the Import/Export wizard) your csv to a new table then just join between the old table and the new (import) table and run an update - very simple.
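A small sketch of that import-then-update idea, with SQLite standing in for SQL Server and an in-memory string standing in for the CSV file. The table and column names (products, products_import, price) are assumptions for illustration only.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO products VALUES (1, 'bolt', NULL), (2, 'nut', NULL);
    CREATE TABLE products_import (id INTEGER PRIMARY KEY, price REAL);
""")

# Stand-in for the CSV file; only the key and the lost column are needed.
csv_file = io.StringIO("id,price\n1,0.25\n2,0.10\n")
rows = [(int(r["id"]), float(r["price"])) for r in csv.DictReader(csv_file)]
conn.executemany("INSERT INTO products_import VALUES (?, ?)", rows)

# Restore only the NULLed column, matching rows on the key.
conn.execute("""
    UPDATE products
    SET price = (SELECT i.price FROM products_import AS i
                 WHERE i.id = products.id)
    WHERE id IN (SELECT id FROM products_import)
""")
conn.commit()
restored = conn.execute(
    "SELECT id, name, price FROM products ORDER BY id"
).fetchall()
print(restored)
```

The other columns are never mentioned in the UPDATE, so they stay as they were. In T-SQL the same update is usually written as UPDATE ... SET ... FROM with a JOIN between the two tables.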

How to create a 'sanitized' copy of our SQL Server database?

We're a manufacturing company, and we've hired a couple of data scientists to look for patterns and correlation in our manufacturing data. We want to give them a copy of our reporting database (SQL 2014), but it must be in a 'sanitized' form. This means that all table names get converted to 'Table1', 'Table2' etc., and column names in each table become 'Column1', 'Column2' etc. There will be roughly 100 tables, some having 30+ columns, and some tables have 2B+ rows.
I know there is a hard way to do this. This would be to manually create each table, with the sanitized table name and column names, and then use something like SSIS to bulk insert the rows from one table to another. This would be rather time consuming and tedious because of the manual SSIS column mapping required, and manual setup of each table.
I'm hoping someone has done something like this before and has a much faster, more efficient way.
By the way, the 'sanitized' database will have no indexes or foreign keys. Also, it may not seem to make any sense why we would want to do this, but this is what was agreed to by our Director of Manufacturing and the data scientists as the first round of an analysis that will involve many rounds.
You basically want to scrub the data and objects, correct? Here is what I would do.
1. Restore a backup of the db.
2. Drop all objects not needed (indexes, constraints, stored procedures, views, functions, triggers, etc.).
3. Create a table with two columns and populate it so that each row has the original table name and the new table name.
4. Write a script that iterates through the table, row by row, and renames your tables. Better yet, put the data into Excel, and create a third column that builds the T-SQL you want to run, then cut/paste and execute it in SSMS.
5. Repeat step 4, but for all columns. It is best to query sys.columns to get all the objects you need, put them into Excel, and build your T-SQL.
6. Repeat again for any other objects needed.
Backup/restore will be quicker than dabbling in SSIS and data transfer.
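Instead of a spreadsheet, the rename statements could also be generated by a short script. The following Python sketch builds sp_rename calls plus the old-to-new mapping. The input list is a hypothetical stand-in for the result of a query against sys.tables and sys.columns, and the sketch assumes plain names without quotes or dots and no existing names of the form ColumnN or TableN that would collide.

```python
def build_rename_script(tables):
    """Return (mapping, statements) for sanitizing table and column names.

    tables is a list of (schema, table, [columns]) tuples.
    """
    mapping = []
    statements = []
    for t_index, (schema, table, columns) in enumerate(tables, start=1):
        new_table = f"Table{t_index}"
        # Rename columns first, while the original table name still resolves.
        for c_index, column in enumerate(columns, start=1):
            new_column = f"Column{c_index}"
            old_name = f"{schema}.{table}.{column}"
            mapping.append((old_name, f"{new_table}.{new_column}"))
            statements.append(
                f"EXEC sp_rename '{old_name}', '{new_column}', 'COLUMN';"
            )
        mapping.append((f"{schema}.{table}", new_table))
        statements.append(f"EXEC sp_rename '{schema}.{table}', '{new_table}';")
    return mapping, statements


# Hypothetical input; real names would come from the system catalog.
tables = [
    ("dbo", "Furnace", ["Temp", "OperatorId"]),
    ("dbo", "Shift", ["StartTime"]),
]
mapping, statements = build_rename_script(tables)
print("\n".join(statements))
```

The mapping list plays the role of the two-column table from step 3: it stays with the database owners so that findings reported against Table1.Column2 can be translated back.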
They can see the data but they can't see the column names? What can that possibly accomplish? What are you protecting by not revealing the table or column names? How is a data scientist supposed to evaluate data without context? Without an FK, all I see is a bunch of numbers in a column named colx. What are you expecting to accomplish? Get a confidentiality agreement. Consider an FK column customerID versus a materialID. Patterns have widely different meanings and analysis. I would correlate a quality measure with materialID or shiftID but not with a customerID.
Oh look, there is a correlation between tableA.colB and tableX.colY. Well yes, that customer is a college team and they use aluminum bats.
On top of that you strip indexes (on tables with 2B+ rows) so the analysis they run will be slow. What does that accomplish?
As for the question as stated, do a backup and restore. Using the system tables, drop all triggers, FKs, indexes, and constraints. Don't forget to drop the triggers and constraints, as they may disclose some trade secret. Then rename the columns and then the tables.

Resources