How to use BULK INSERT to move data between two tables? - sql-server

How to use BULK INSERT to move data between two tables? I can't find a tutorial with examples. I have 10 million records to move. SQL Server 2012 SP3. It is a one-time action. I can't use cmdshell and I can't use SSIS. I have to move the data in batches. It is going to be a night job. I say "move" but I don't have to delete the records in the source. The target table already exists, I don't have to check any constraints, and there are NO foreign keys.

It is very easy to shift data from one database to another, as long as both are located on the same server (as you wrote).
As you want to use a script and you want to copy records from one table to another you might use this:
INSERT INTO TargetDB.dbo.TableName(col1,col2,col3...)
SELECT col1,col2, col3 ... FROM SourceDB.dbo.TableName
This would copy all rows from here to there.
Your question does not provide much more detail, but you want to use a script, and the above is a script...
If you have existing data you should read about MERGE
If Target and Source do not have the same structure, you can easily adapt the SELECT to return exactly the set you need to insert
If you do not need to copy all rows, just add a WHERE clause
If the user you connect with does not have the necessary rights, ask the admin. But - from your question and comments - I take it that this will be applied by an admin anyway...
If the databases live on two different servers, you might read about linked servers
And finally you might read about import and export, which is widely supported by SSMS...
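Since the question asks for a batched night job over 10 million rows, here is a minimal batched sketch. It assumes an ascending key column (called ID here) to act as a watermark; all table and column names are placeholders:
DECLARE @BatchSize INT = 50000;
DECLARE @LastID BIGINT = 0;
DECLARE @Inserted TABLE (ID BIGINT);

WHILE 1 = 1
BEGIN
    DELETE FROM @Inserted;

    -- Copy the next slice of rows, remembering which keys were inserted.
    INSERT INTO TargetDB.dbo.TableName (ID, col1, col2, col3)
    OUTPUT inserted.ID INTO @Inserted
    SELECT TOP (@BatchSize) ID, col1, col2, col3
    FROM SourceDB.dbo.TableName
    WHERE ID > @LastID
    ORDER BY ID;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to copy

    -- Advance the watermark to the highest key copied so far.
    SELECT @LastID = MAX(ID) FROM @Inserted;
END
Each iteration commits as its own transaction, so a modest batch size keeps the transaction log manageable during the night job.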

Related

How do I compare SQL Server databases and update another?

I am currently dealing with a scenario whereby I have 2 SQL databases and I need to compare the changes in data in each of the tables in the 2 databases. The databases have exactly the same tables, but each may contain differing data. The first hurdle is how to compare the data sets, and the second challenge is, once I have identified the differences, how to (for example) update database 2 with data that is missing there but exists in database 1.
Why not use SQL Data Compare instead of re-inventing the wheel? It does exactly what you're asking - compares two databases and writes scripts that will sync in either direction. I work for a vendor that competes with some of their tools and it's still the one I would recommend.
http://www.red-gate.com/products/sql-development/sql-data-compare/
One powerful command for comparing data is EXCEPT. With this, you can compare two tables with same structure simply by doing the following:
SELECT * FROM Database1.dbo.Table
EXCEPT
SELECT * FROM Database2.dbo.Table
This will give you all the rows that exist in Database1 but not in Database2, including rows that exist in both but are different (because it compares every column). Then you can reverse the order of the queries to check the other direction.
Once you have identified the differences, you can use INSERT or UPDATE to transfer the changes from one side to the other. For example, assuming you have a primary key field PK, and new rows only come into Database2, you might do something like:
INSERT INTO Database1.dbo.Table
SELECT T2.*
FROM Database2.dbo.Table T2
LEFT JOIN Database1.dbo.Table T1 on T2.PK = T1.PK
WHERE T1.PK IS NULL -- insert rows that didn't match, i.e., are new
The actual methods used depend on how the two tables are related together, how you can identify matching rows, and what the sources of changes might be.
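For rows that exist on both sides but differ, the same EXCEPT idea can drive an UPDATE. A minimal sketch, assuming the key PK and two data columns (all names are placeholders):
-- Update rows in Database1 whose values differ from Database2.
UPDATE T1
SET T1.col1 = T2.col1,
    T1.col2 = T2.col2
FROM Database1.dbo.[Table] AS T1
JOIN Database2.dbo.[Table] AS T2 ON T1.PK = T2.PK
WHERE EXISTS (
    SELECT T2.col1, T2.col2
    EXCEPT
    SELECT T1.col1, T1.col2  -- EXCEPT treats NULLs as equal, so NULLable columns compare safely
);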
You can also look at the Data Compare feature in Visual Studio 2010 (Premium and higher). I use it to make sure configuration tables in all my environments (i.e. development, test and production) are in sync. It made my life enormously easier.
You can select the tables you want to compare, and you can choose the columns to compare. What I haven't learned to do yet, though, is how to save my selections for future use.
You can do this with SQL Compare, which is a great tool for development, but if you want to do this on a scheduled basis a better solution might be Simego's Data Sync Studio. I know it can compare about 100m rows (30 columns wide) with 16GB of RAM on an i3 iMac (Boot Camp). In reality it is comfortable with 1m -> 20m rows on each side. It uses a column storage engine.
In this scenario it would only take a couple of minutes to download, install and run a test.
I hope this helps as I always look for the question mark to work out what someone is asking.

TSQL copy one database to another

I have two identical databases, let's call them A & B. What I want to do is to have two copy options:
1- Copy the whole database and overwrite everything from A to B with TSQL
2- With TSQL, loop over each table row in A, check the last-modified-date field, and if it is greater than the last-modified date in the corresponding B table row, copy and overwrite the whole row.
Please let me know if something is not clear; any help is very appreciated, and thanks in advance.
Using Visual Studio 2010 Premium or Ultimate you can do a schema compare and create the new database on the server you need to. Getting the data over there is another matter entirely and is something you can perform with SSIS or linked server queries, especially considering you're simply pumping data from one side to another.
See this post for schema compare tools:
How can I find the differences between two databases?
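For the second option in the question, a set-based UPDATE is usually far faster than looping over rows one by one. A minimal sketch, assuming a key column ID and a LastModified column (table and column names are placeholders):
-- Overwrite rows in B with the newer version from A.
UPDATE tgt
SET tgt.col1 = src.col1,
    tgt.col2 = src.col2,
    tgt.LastModified = src.LastModified
FROM B.dbo.YourTable AS tgt
JOIN A.dbo.YourTable AS src ON src.ID = tgt.ID
WHERE src.LastModified > tgt.LastModified;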
Something we've recently completed is using CHECKSUM and table switching to bring in new data, keep the old, and compare both sets.
You would basically set up three tables for each table doing the switch: one for production, one _staging table, and one _old table that is an exact schema match. Three tables are needed since, when you perform the switch (which is instantaneous), you need to switch into an empty table. You would pump the data into the staging table and execute a switch to change the pointer for the table definition from production to _old and from _staging to production. And if you have the checksum calculated on each row, you can compare and see what's new or different.
It's worked out well with our use of it, but it's a fair amount of prep work to make it happen, which could be prohibitive depending on your environment.
http://msdn.microsoft.com/en-us/library/ms190273.aspx
http://technet.microsoft.com/en-us/library/ms191160.aspx
http://msdn.microsoft.com/en-us/library/ms189788.aspx
Here's a sample query that we ran on GoDaddy.com's sql web editor:
-- Two tables with identical schemas; SWITCH requires an exact match.
CREATE TABLE TestTable (Val1 INT NOT NULL)
CREATE TABLE TestTable2 (Val1 INT NOT NULL)
INSERT INTO TestTable (Val1) VALUES (4)
SELECT * FROM TestTable
-- Metadata-only move of all rows into the empty table; effectively instantaneous.
ALTER TABLE TestTable SWITCH TO TestTable2
SELECT * FROM TestTable2

Large Data Service Architecture

Everyday a company drops a text file with potentially many records (350,000) onto our secure FTP. We've created a windows service that runs early in the AM to read in the text file into our SQL Server 2005 DB tables. We don't do a BULK Insert because the data is relational and we need to check it against what's already in our DB to make sure the data remains normalized and consistent.
The problem with this is that the service can take a very long time (hours). This is problematic because it is inserting and updating into tables that constantly need to be queried and scanned by our application which could affect the performance of the DB and the application.
One solution we've thought of is to run the service on a separate DB with the same tables as our live DB. When the service is finished we can do a BCP into the live DB so it mirrors all of the new records created by the service.
I've never worked with handling millions of records in a DB before and I'm not sure what a standard approach to something like this is. Is this an appropriate way of doing this sort of thing? Any suggestions?
One mechanism I've seen is to insert the values into a temporary table with the same schema as the target table. Null IDs signify new records and populated IDs signify updated records. Then use the SQL MERGE command to merge it into the main table. MERGE will perform better than individual inserts/updates.
Doing it individually, you will incur maintenance of the indexes on the table - which can be costly if it's tuned for selects. I believe with MERGE it's a bulk action.
It's touched upon here:
What's a good alternative to firing a stored procedure 368 times to update the database?
There are MSDN articles about SQL merging, so Googling will help you there.
Update: it turns out you cannot use MERGE on SQL Server 2005 (you can in 2008). Your idea of having another database is usually handled by SQL replication. Again, I've seen in production a copy of the current database used to perform a long-running action (reporting and aggregation of data in this instance), however this wasn't merged back in. I don't know what merging capabilities are available in SQL replication - but it would be a good place to look.
Either that, or resolve the reason why you cannot bulk insert/update.
Update 2: as mentioned in the comments, you could stick with the temporary table idea to get the data into the database, and then insert/update join onto this table to populate your main table. The difference is now that SQL is working with a set so can tune any index rebuilds accordingly - should be faster, even with the joining.
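For the update half of that insert/update join, a minimal sketch using the ID convention from the first paragraph; the staging table #Tempo and its columns are assumptions:
-- Populated IDs mark existing rows whose values should be refreshed.
UPDATE t
SET t.val1 = tmp.val1,
    t.val2 = tmp.val2,
    t.val3 = tmp.val3
FROM MyTable AS t
JOIN #Tempo AS tmp ON tmp.ID = t.ID;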
Update 3: you could possibly remove the data checking from the insert process and move it to the service. If you can stop inserts into your table while this happens, then this will allow you to solve the issue stopping you from bulk inserting (i.e., you are checking for duplicates based on column values, as you don't yet have the luxury of an ID). Alternatively, with the temporary table idea, you can add a WHERE condition to first see if the row exists in the database, something like:
INSERT INTO MyTable (val1, val2, val3)
SELECT tmp.val1, tmp.val2, tmp.val3
FROM #Tempo AS tmp
WHERE NOT EXISTS
(
SELECT *
FROM MyTable t
WHERE t.val1 = tmp.val1 AND t.val2 = tmp.val2 AND t.val3 = tmp.val3 -- the alias is needed so the comparison isn't self-referential
)
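On SQL Server 2008 and later, the same staging-table idea pairs naturally with the MERGE mentioned above. A minimal sketch, assuming ID is an identity key as described in the first paragraph (table and column names are placeholders):
MERGE MyTable AS t
USING #Tempo AS tmp
    ON t.ID = tmp.ID
WHEN MATCHED THEN
    -- Populated IDs signify updated records.
    UPDATE SET t.val1 = tmp.val1, t.val2 = tmp.val2, t.val3 = tmp.val3
WHEN NOT MATCHED THEN
    -- Null IDs signify new records; the identity column generates the ID.
    INSERT (val1, val2, val3)
    VALUES (tmp.val1, tmp.val2, tmp.val3);  -- MERGE must end with a semicolon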
We do much larger imports than that all the time. Create an SSIS package to do the work. Personally, I prefer to create a staging table, clean it up, and then do the update or import. But SSIS can do all the cleaning in memory, if you want, before inserting.
Before you start mirroring and replicating data, which is complicated and expensive, it would be worthwhile to check your existing service to make sure it is performing efficiently.
Maybe there are table scans you can get rid of by adding an index, or lookup queries you can get rid of by doing smart error handling. Analyze the execution plans for the queries that your service performs and optimize those.

What is a good approach to preloading data?

Are there best practices out there for loading data into a database, to be used with a new installation of an application? For example, for application foo to run, it needs some basic data before it can even be started. I've used a couple options in the past:
TSQL for every row that needs to be preloaded:
IF NOT EXISTS (SELECT * FROM Master.Site WHERE Name = @SiteName)
INSERT INTO [Master].[Site] ([EnterpriseID], [Name], [LastModifiedTime], [LastModifiedUser])
VALUES (@EnterpriseId, @SiteName, GETDATE(), @LastModifiedUser)
Another option is a spreadsheet. Each tab represents a table, and data is entered into the spreadsheet as we realize we need it. Then, a program can read this spreadsheet and populate the DB.
There are complicating factors, including the relationships between tables. So, it's not as simple as loading tables by themselves. For example, if we create Security.Member rows, then we want to add those members to Security.Role, we need a way of maintaining that relationship.
Another factor is that not all databases will be missing this data. Some locations will already have most of the data, and others (that may be new locations around the world), will start from scratch.
Any ideas are appreciated.
If it's not a lot of data - the bare initialization of configuration data - we typically script it with any database creation/modification.
With scripts you have a lot of control, so you can insert only missing rows, remove rows which are known to be obsolete, not override certain columns which have been customized, etc.
If it's a lot of data, then you probably want to have an external file(s) - I would avoid a spreadsheet, and use a plain text file(s) instead (BULK INSERT). You could load this into a staging area and still use techniques like you might use in a script to ensure you don't clobber any special customization in the destination. And because it's under script control, you've got control of the order of operations to ensure referential integrity.
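For the plain-text-file route, a minimal BULK INSERT sketch; the file path, staging table name, and delimiters here are all assumptions:
-- Load a tab-delimited text file into a staging table.
BULK INSERT Staging.dbo.SiteLoad
FROM 'C:\loads\sites.txt'
WITH (
    FIELDTERMINATOR = '\t',  -- tab-delimited columns
    ROWTERMINATOR = '\n',    -- one row per line
    FIRSTROW = 2,            -- skip a header row
    TABLOCK                  -- enables minimal logging under a suitable recovery model
);
From the staging area you can then insert only missing rows and skip customized columns, exactly as you would in a hand-written script.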
I'd recommend a combination of the 2 approaches indicated by Cade's answer.
Step 1. Load all the needed data into temp tables (on Sybase, for example, load data for table "db1..table1" into "temp..db1_table1"). In order to be able to handle large datasets, use a bulk copy mechanism (whichever one your DB server supports) without writing to the transaction log.
Step 2. Run a script whose main step iterates over each table to be loaded, creates indexes on the newly created temp table if needed, compares the data in the temp table to the main table, and inserts/updates/deletes the differences. Then, as needed, the script can do auxiliary tasks like the security role setup you mentioned.

Changing the column order/adding new column for existing table in SQL Server 2008

I have a situation where I need to change the order of the columns/add new columns for an existing table in SQL Server 2008. It does not allow me to do this without dropping and recreating the table. But this is a production system and the table holds data. I can take a backup of the data, drop the existing table, change the order/add new columns, recreate it, and insert the backed-up data into the new table.
Is there any better way to do this without dropping and recreating? I think SQL Server 2005 allowed changing an existing table's structure without dropping and recreating it.
Thanks
You can't really change the column order in a SQL Server 2008 table - it's also largely irrelevant (at least it should be, in the relational model).
With the visual designer in SQL Server Management Studio, as soon as you make too big a change, the only reliable way for SSMS to apply it is to re-create the table in the new format, copy the data over, and then drop the old table. There's really nothing you can do to change that.
What you can do at all times is add new columns to a table or drop existing columns from a table using SQL DDL statements:
ALTER TABLE dbo.YourTable
ADD NewColumn INT NOT NULL ........
ALTER TABLE dbo.YourTable
DROP COLUMN OldColumn
That'll work, but you won't be able to influence the column order. But again: for your normal operations, column order in a table is totally irrelevant - it's at best a cosmetic issue on your printouts or diagrams... so why are you so fixated on a specific column order?
There is a way to do it by updating a SQL Server system table:
1) Connect to SQL Server in DAC mode
2) Run a query that updates the column order:
update syscolumns
set colorder = 3
where name = 'column2'
But this way is not recommended, because you can destroy something in the DB.
One possibility would be to not bother about reordering the columns in the table and simply modify it by adding the columns. Then, create a view which has the columns in the order you want -- assuming that the order is truly important. The view can be easily changed to reflect any ordering that you want. Since I can't imagine that the order would be important for programmatic applications, the view should suffice for those manual queries where it might be important.
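A minimal sketch of that view approach; the table, column, and view names are placeholders:
-- The base table keeps its physical order; the view presents the preferred one.
CREATE VIEW dbo.YourTable_Ordered
AS
SELECT NewColumn, ExistingCol1, ExistingCol2
FROM dbo.YourTable;
Queries that care about the ordering reference dbo.YourTable_Ordered, and changing the order later is just an ALTER VIEW.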
As the other posters have said, there is no way without re-writing the table (but SSMS will generate scripts which do that for you).
If you are still in design/development, I certainly advise making the column order logical - nothing worse than having a newly added column become part of a multi-column primary key and having it nowhere near the other columns! But you'll have to re-create the table.
One time I used a 3rd party system which always sorted their columns in alphabetical order. This was great for finding columns in their system, but whenever they revved their software, our procedures and views became invalid. This was in an older version of SQL Server, though. I think since 2000, I haven't seen much problem with incorrect column order. When Access used to link to SQL tables, I believe it locked in the column definitions at time of table linking, which obviously has problems with almost any table definition changes.
I think the simplest way would be re-create the table the way you want it with a different name and then copy the data over from the existing table, drop it, and re-name the new table.
Would it perhaps be possible to script the table with all its data?
Then edit the script file in something like Notepad++,
thus recreating the table with the new columns but the same data.
Just a suggestion, but it might take a while to accomplish this.
Unless you write yourself a small C# application that can work with the file and apply rules to it.
If only Notepad++ supported a find-and-move operation
