Way to import a file using insert_update mode in SQL Server

In DB2, data can be imported from a file with the 'import' command, which provides an insert_update mode that inserts a record if it doesn't exist and updates it if it does.
Is there a way in SQL Server to import/load data from a file into a table such that records from the file are inserted if they do not exist and updated if they do exist?
The only way I could figure out is to bulk load into an intermediate/temporary table and then use MERGE from that table to insert or update the target table.
With this approach there may be a performance issue, as all the data is first loaded into a temporary table. Please advise if there is a way to do this without creating a temporary table.
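For reference, a minimal sketch of the staging-plus-MERGE approach described above; the file path, CSV layout, and all table/column names are illustrative:

BULK INSERT dbo.Staging_Target
FROM 'C:\data\input.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Upsert from the staging table into the target, keyed on Id
MERGE dbo.Target AS t
USING dbo.Staging_Target AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name, t.Amount = s.Amount
WHEN NOT MATCHED THEN
    INSERT (Id, Name, Amount) VALUES (s.Id, s.Name, s.Amount);

If materializing the staging table is the concern, OPENROWSET(BULK ...) with a format file can also serve directly as the MERGE source.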

You could use SSIS.
In your data flow you would perform a lookup to see if the record already exists; if so, send it down an update code path (which will probably involve a staging update and then joining the two). If it doesn't exist, perform an insert.
Along the lines of https://social.msdn.microsoft.com/forums/sqlserver/en-US/9e14507d-2a30-403b-98f5-a6d2468b384e/update-else-insert-ssis-record

Related

How to insert data into destination table using flat file source in SSIS

I have an SSIS package in which the flow is:
Get the data from the flat file source and insert it into a staging table.
Use the staging table data for transformation with a SELECT and WHERE clause, then insert the filtered data into the destination table.
For the 1st point, I have used a Data Flow Task to get the data from the source and insert it into the staging table. For the 2nd point, I am confused about how to do it. I am using an Execute SQL Task to run the SELECT-WHERE query, but I am not getting how to insert that query result into the destination table. Which SSIS component should I use here? Or shall I change the entire flow for better performance? Kindly suggest. Thanks in advance.
You are on the right track. Mostly, for a simple data import, I use this flow.
Let's say we have a destination table named FiscalYear.
The first thing I would do is create the staging table. If it exists, I drop it and recreate the table.
The next step is, using the data flow, to stage the file to the staging table.
For the last step, using an Execute SQL Task and a SQL Server MERGE query, I insert or update the data. To insert or update the data, you need a unique identifier for each row that is in the file. This unique identifier keeps you from inserting duplicates in case you run the package more than once.
This row identifier can be a single column or a combination of columns. In my case, I usually have a column named rowguid of type uniqueidentifier.
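A minimal sketch of that last MERGE step, assuming a staging table dbo.FiscalYear_Staging with the same layout as the destination; every column name except rowguid is illustrative:

MERGE dbo.FiscalYear AS dst
USING dbo.FiscalYear_Staging AS src
    ON dst.rowguid = src.rowguid
WHEN MATCHED THEN
    -- Row already loaded on a previous run: refresh its values
    UPDATE SET dst.YearName = src.YearName,
               dst.StartDate = src.StartDate,
               dst.EndDate = src.EndDate
WHEN NOT MATCHED BY TARGET THEN
    -- New row from the file: insert it
    INSERT (rowguid, YearName, StartDate, EndDate)
    VALUES (src.rowguid, src.YearName, src.StartDate, src.EndDate);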

Remove duplicate rows from SQL Server using DISTINCT

I need to remove duplicated SQL Server rows when importing a file into the database, using the DISTINCT method.
HallGroup is my table in the database. I'm using this SQL procedure:
-- Copy the distinct rows to a temporary table, empty the original, and reload it
SELECT DISTINCT * INTO tempdb.dbo.tmpTable
FROM HallGroup;
DELETE FROM HallGroup;
INSERT INTO HallGroup SELECT * FROM tempdb.dbo.tmpTable;
DROP TABLE tempdb.dbo.tmpTable;
This procedure works fine and the duplicated rows are deleted, but the problem is that when I import data into SQL Server again, rows are duplicated once more. What am I missing? Any hint?
How do I properly remove duplicated SQL Server rows when importing a file into the database with the DISTINCT method?
I am just getting back into SQL after being out for a bit, but I would not have solved your problem the way you are trying (not that I completely understand why you are doing it that way). Even if it were working correctly, over time your process will take longer each time you run it as the size of the table increases.
It would be much more efficient to insert the new data based on the absence of a key (you indicate you are already using a stored proc). If you don't have a key to use (which very recently happened to me), make one. I just solved a similar problem to yours, where I am importing data into a table from an external source and wanted to eliminate the possibility of duplicates. In my case, I associate the name of the external source data file (which is distinct per dataset to import) with the data to be imported, and use that to ensure I am not re-importing already imported data, as sketched below. I load the external data into a table using a dtsx package and then run a stored proc to merge that data with the existing table. This gives me the added advantage of having an audit trail of where each record came from.
Hope this helps.
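A sketch of that filename-based guard, assuming hypothetical tables dbo.ImportStaging and dbo.TargetData, each with a SourceFile column recording which external file a row came from:

INSERT INTO dbo.TargetData (SourceFile, Col1, Col2)
SELECT s.SourceFile, s.Col1, s.Col2
FROM dbo.ImportStaging AS s
WHERE NOT EXISTS (
    -- Skip any file that has already been imported
    SELECT 1 FROM dbo.TargetData AS t
    WHERE t.SourceFile = s.SourceFile
);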

Inserting data into newly created tables from a table variable in SSIS, not one single table

I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
Loop through a parent folder and its subfolders for a .csv with a particular naming structure (works).
Create a table for each .csv based on the enumeration of each file (works).
Import the data into SQL Server, each file into its own table named after that file, rather than into a single OLE DB Destination (which does not work). It works if there is one fixed destination for everything, but when I use a table-name variable it does not work.
What I did was add an Execute SQL Task to the Foreach container to create a table, with a variable for the file path mapped as an expression in the Foreach container inside a CREATE TABLE query under the SqlStatementSource expression property. The tables are created, but when I use the variable that was mapped in the Foreach Loop as the table name in the OLE DB Destination, I get an error asking me to check whether the table exists. The tables are created, but I cannot get the data inserted into their own tables, even when I bypass the "Destination table has not been provided" error and run the package. I set DelayValidation to true and still nothing. SSIS, from what I have seen so far, does some cool things. However, I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to SQL Server.
Thanks for everything.
You can't create an OLE DB Destination at design time with a variable for the table name. The OLE DB Destination needs to know the table name and the columns so that it can pre-map the data flow to the table columns.
You have a couple of other options:
You can use BiML to dynamically create your data flows and destinations.
You can use an ExecuteSQL transformation as your data flow destination and write a dynamic SQL statement that inserts each row of the data flow into the desired table, along the lines of the sketch below.
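A rough sketch of such a dynamic statement, assuming the table name arrives in a variable and the column layout is fixed (all names are illustrative); QUOTENAME guards against injection through the table name:

DECLARE @TableName sysname = N'ImportedFile_20240101';  -- supplied by the SSIS variable
DECLARE @sql nvarchar(max) =
    N'INSERT INTO dbo.' + QUOTENAME(@TableName) +
    N' (Col1, Col2) VALUES (@p1, @p2);';
-- Parameterize the values so only the table name is built dynamically
EXEC sp_executesql @sql,
     N'@p1 nvarchar(100), @p2 int',
     @p1 = N'some value', @p2 = 42;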

Export large amounts of binary data from one SQL database and import it into another database of the same schema

I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all 37,000 of those records into another database containing the same table and schema, which has about 12,500 records. I need to insert these images with an IF NOT EXISTS approach to make sure there are no duplicates when I am done.
I tried exporting the data into Excel and formatting it into a script (I have done this before with other tables). The thing is, Excel does not support binary data.
I also tried the "Generate Scripts" wizard in SSMS, which did not work because the .sql file was well over 18 GB and my PC could not handle it.
Is there some other SQL tool that can do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files, and the text file contains the file names).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approache you could use the update/insert mode, although that would change existing row.
I don't think there is a "insert only if it does not exist mode", but you should be able to achieve this by defining a unique index and ignore errors (although that wouldn't be really fast, but should be OK for that small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert into another.
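For example (server names and paths here are illustrative): -N uses Unicode native format, which preserves binary columns, and -T uses a trusted Windows connection:

bcp SourceDb.dbo.Images out images.dat -N -T -S SourceServer
bcp TargetDb.dbo.tmp_images in images.dat -N -T -S TargetServer

Note that bcp alone appends rows; the duplicate check still has to happen in SQL afterwards.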
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare the images. Unfortunately, you cannot compare image columns directly, but you can cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images.
2. Insert only the new images, with MERGE or with a query like this one:
-- DB2.dbo.table_name stands for the source copy (e.g. the tmp_images table),
-- DB1.dbo.table_name for the target; column_name is the comparison key.
INSERT INTO DB1.dbo.table_name
SELECT * FROM DB2.dbo.table_name
WHERE column_name NOT IN
(
    SELECT column_name FROM DB1.dbo.table_name
);
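If the column being compared is of the legacy image type, a NOT EXISTS form with an explicit cast avoids the direct image comparison mentioned above (a sketch; table and column names are illustrative):

INSERT INTO DB1.dbo.Images (ImageData)
SELECT s.ImageData
FROM DB1.dbo.tmp_images AS s
WHERE NOT EXISTS (
    SELECT 1 FROM DB1.dbo.Images AS t
    WHERE CAST(t.ImageData AS varbinary(max)) = CAST(s.ImageData AS varbinary(max))
);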

PostgreSQL COPY FROM Command Help

I have a CSV file which is quite large (a few hundred MB) that I am trying to import into a Postgres table. The problem arises when there is a primary key violation (a duplicate record in the CSV file).
If it were just one, I could manually filter out those records, but these files are generated by a program that produces such data every hour. My script has to import them into the database automatically.
My question is: Is there some way to set a flag in the COPY command, or in Postgres, so it can skip the duplicate records and continue importing the file into the table?
My thought would be to approach this in two ways:
Use a utility that can help create an "exception report" of duplicate rows, such as this one, during the COPY process.
Change your workflow by loading the data into a temp table first, massaging it for duplicates (maybe JOIN it with your target table and mark all rows already present with a dup flag), and then import only the missing records, sending the dups to an exception table.
I personally prefer the second approach, but that's a matter of specific workflow in your case.
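A sketch of the second approach in PostgreSQL, assuming version 9.5+ and that id is the primary key being violated (table and path names are illustrative); ON CONFLICT DO NOTHING skips both rows that already exist and duplicates within the file itself:

-- Stage the hourly file, then insert only the rows that don't conflict
CREATE TEMP TABLE staging (LIKE target_table);
COPY staging FROM '/path/to/hourly_file.csv' WITH (FORMAT csv);  -- or \copy from psql
INSERT INTO target_table
SELECT * FROM staging
ON CONFLICT (id) DO NOTHING;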
