I have a CSV file that is quite large (a few hundred MB) which I am trying to import into a Postgres table. The problem arises when there is a primary key violation (a duplicate record in the CSV file).
If it had been a one-off, I could manually filter out those records, but these files are generated by a program that produces such data every hour. My script has to import it into the database automatically.
My question is: is there a flag I can set in the COPY command or in Postgres so that it skips the duplicate records and continues importing the file into the table?
My thought would be to approach this in two ways:
1. Use a utility that can help create an "exception report" of duplicate rows during the COPY process, such as this one.
2. Change your workflow by loading the data into a temp table first, massaging it for duplicates (maybe JOIN with your target table and mark all existing rows in the temp table with a dup flag), and then import only the missing records and send the dups to an exception table.
I personally prefer the second approach, but that's a matter of specific workflow in your case.
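The second approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not Postgres-specific code: it uses Python's built-in sqlite3 module as a stand-in database, and the table names target, staging and import_exceptions, the columns id and payload, and the function name are all hypothetical. In Postgres the staging table would be filled with COPY rather than executemany.

```python
import csv
import io
import sqlite3


def load_with_staging(conn, csv_text):
    """Load CSV rows through a staging table; return (inserted, rejected)."""
    # The staging table has no primary key, so duplicates load without error.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER, payload TEXT)")
    conn.execute("DELETE FROM staging")
    rows = [(int(r[0]), r[1]) for r in csv.reader(io.StringIO(csv_text)) if r]
    conn.executemany("INSERT INTO staging (id, payload) VALUES (?, ?)", rows)

    # Rows whose key is already in the target, or that repeat an earlier
    # row of the same file, are sent to the exception table.
    start = conn.total_changes
    conn.execute("""
        INSERT INTO import_exceptions (id, payload)
        SELECT s.id, s.payload FROM staging AS s
        WHERE EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
           OR s.rowid > (SELECT MIN(s2.rowid) FROM staging AS s2 WHERE s2.id = s.id)
    """)
    rejected = conn.total_changes - start

    # Only the first occurrence of each genuinely new key reaches the target.
    start = conn.total_changes
    conn.execute("""
        INSERT INTO target (id, payload)
        SELECT s.id, s.payload FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
          AND s.rowid = (SELECT MIN(s2.rowid) FROM staging AS s2 WHERE s2.id = s.id)
    """)
    inserted = conn.total_changes - start
    conn.commit()
    return inserted, rejected
```

The import never fails on a duplicate because the primary key is only enforced on the target table, and by the time rows reach it they have already been filtered.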
The process is one where I get 28 fixed-width files and combine them into one table. In the past, this was done via FoxPro. As I learned today, there were duplicates that FoxPro did not reject or have any issues with. I have discovered that I need to write a MERGE statement in order to import the 28 files without getting tripped up by duplicate primary key errors, which occur when I try to import each one separately using the Import Wizard.
I use SQL Server Express with Management Studio as the front end, and therefore can't create SSIS packages.
I am going to break this up into two questions so as to not make this too convoluted. First, I have since converted the fixed width files into tab-delimited text files by using Excel.
First question: Can one construct a merge statement that brings the files (tab-delimited) into SQL Server from the C drive? I could import each using the import wizard but that is cumbersome. I know how to write a merge statement but it demands that the data already exist in SQL Server. Below is an example. The question is how would I bring it in from outside.
Merge Industry as TARGET
Using Table1 as SOURCE
On (TARGET.Primary keys 1-9 = SOURCE.Primary keys 1-9)
No, you can't import data during or as part of a MERGE statement. The MERGE operation is purely for the 'upsert' situation: it defines how to combine two result sets, with criteria for matches and mismatches.
To get data into SQL Server you can either work via the UI (which is tedious and error-prone when you have 28 files), or you can use one of the built-in commands such as BULK INSERT.
Perhaps you could BULK INSERT the files one by one, and merge after each import.
If you wanted to continue using Foxpro but eliminate the duplicate records, the first piece of advice would be to quit using the Import Wizard.
Wizards may be convenient to use, but they come with their own set of 'baggage' which can be problematic.
Aside from saying that they are in fixed field length format, you don't indicate which format(s) the 28 import files are in (CSV, SDF, TXT, etc.). Regardless, you can fairly easily write Foxpro code to handle all of the importing without the use of a 'Wizard'.
Then once all of the records have been imported you can readily eliminate the duplicates with something like the following:
SELECT ImportDBF && Assuming it is used EXCLUSIVELY
DELETE ALL
INDEX ON <primary key> UNIQUE TAG Uniq && Create an Index on only UNIQUE instances of your Primary key field
RECALL ALL && Recall only those UNIQUE records
DELETE TAG Uniq && Eliminate the temporary Index
PACK && PACK out the duplicate records
Now your Foxpro data table should be ready to go.
Good Luck
In DB2, data from a file can be imported with the 'import' command in insert_update mode, which inserts a record if it doesn't exist and updates it if it does.
Is there a way to import/load data from a file into a table such that records from the file are inserted if they do not exist and updated if they do?
The only way I could figure out is to bulk load into an intermediate/temporary table and then use that table to insert-update (merge) into the target table.
With this approach there may be a performance issue, as all the data is first loaded into the temporary table. Please advise if there is a way to do this without creating a temporary table.
You could use SSIS.
In your data flow you would perform a lookup to see if the record already exists. If so, send it down an update code path (which will probably involve using a staging update and then joining the two). If it doesn't exist, perform an insert.
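Outside SSIS, the same lookup-then-branch flow might look roughly like the sketch below. It is an illustration only: it uses Python's built-in sqlite3 module rather than SQL Server, the table items with columns id and value is hypothetical, and it assumes the set of existing keys is small enough to hold in memory as the lookup cache.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert rows whose key is new and update rows whose key already exists."""
    # Lookup step: cache the keys that are already in the target table.
    existing = {r[0] for r in conn.execute("SELECT id FROM items").fetchall()}
    to_insert, to_update = [], []
    for key, value in rows:
        if key in existing:
            to_update.append((value, key))   # update path
        else:
            to_insert.append((key, value))   # insert path
            existing.add(key)  # a later repeat of this key becomes an update
    # Inserts run first so that updates to keys added in this batch find a row.
    conn.executemany("INSERT INTO items (id, value) VALUES (?, ?)", to_insert)
    conn.executemany("UPDATE items SET value = ? WHERE id = ?", to_update)
    conn.commit()
    return len(to_insert), len(to_update)
```

Splitting the rows into two batches keeps each statement simple, at the cost of reading all existing keys up front.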
Along the lines of https://social.msdn.microsoft.com/forums/sqlserver/en-US/9e14507d-2a30-403b-98f5-a6d2468b384e/update-else-insert-ssis-record
I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all of those 37,000 records into another database containing the same table and schema that has about 12,500 records. I need to insert these images into the database with an IF NOT EXISTS approach to make sure that there are no duplicates when I am done.
I tried exporting the data into Excel and formatting it into a script. (I have done this before with other tables.) The thing is, Excel does not support binary data.
I also tried the "generate scripts" wizard in SSMS which did not work because the .sql file was well over 18GB and my PC could not handle it.
Is there some other SQL tool to be able to do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files and the text file contains the filename).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approach you could use the update/insert mode, although that would change existing rows.
I don't think there is an "insert only if it does not exist" mode, but you should be able to achieve this by defining a unique index and ignoring errors (that wouldn't be really fast, but should be OK for that small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert into another.
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare images. Unfortunately, you cannot compare images directly. But you could cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images
2. Use the MERGE command to insert new images only.
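-- Insert only the rows from DB2 whose key is not already present in DB1.
-- Caution: NOT IN matches nothing if the subquery returns any NULL.
INSERT INTO DB1.dbo.table_name
SELECT *
FROM DB2.dbo.table_name
WHERE column_name NOT IN
(
    SELECT column_name FROM DB1.dbo.table_name
)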
I have a situation. While upgrading our server drives, we took a backup of our DB, imported it again, and put the site back live. Only after a week did we realize that the import had not copied the entire backup from the dump file: the database was only 13 GB when it was supposed to be 60 GB. The interesting thing is that one big table was copied in a funny way. It is huge, yet only the first few records (say 2000-5000) and the last records (say 400000-500000) were copied, with no records in between. Isn't that weird? When we imported, we checked the first and last entries and assumed everything had been copied. We then created a new DB and imported the dump again, and now it seems to be OK, but the live DB (the 13 GB of data) already contains new entries, so we have to copy those and add them to the newly imported DB. What would be the ideal way to copy those new records into the recovered DB? I mean some query that will find and add only the new records. Do I have to take a dump of our 13 GB DB and then import it? Or is there a way to copy directly from the live DB?
You can use mysqldump with the --insert-ignore option, which writes INSERT IGNORE statements rather than INSERT statements (http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_insert-ignore).
With the IGNORE keyword, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table is not inserted and no duplicate-key error is issued.
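The effect of skipping rows on a duplicate key can be tried out with a small sketch. It does not use MySQL: it relies on Python's built-in sqlite3 module, whose INSERT OR IGNORE behaves the same way for PRIMARY KEY and UNIQUE conflicts. The table posts with columns id and body, and the function name, are hypothetical.

```python
import sqlite3


def copy_new_rows(live, recovered):
    """Copy rows from the live DB into the recovered DB, skipping existing keys."""
    rows = live.execute("SELECT id, body FROM posts").fetchall()
    before = recovered.total_changes
    # Rows whose id is already present are skipped silently instead of
    # aborting the whole import with a duplicate-key error.
    recovered.executemany(
        "INSERT OR IGNORE INTO posts (id, body) VALUES (?, ?)", rows
    )
    recovered.commit()
    return recovered.total_changes - before  # number of rows actually added
```

Note that a skipped row is left exactly as it was in the destination, so this only adds missing records and never overwrites existing ones.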
How can several sqlite databases (one table per file) be combined into one big sqlite database containing all the tables? For example, you have the database files db1.dat, db2.dat, db3.dat, ... and you want to create one file dbNew.dat which contains the tables from all of db1, db2, ...
Several similar questions have been asked on various forums. I posted this question (with answer) for a particular reason. When you are dealing with several tables and have indexed many fields in them, recreating the indexes properly in the destination database tables causes unnecessary confusion. You may miss one or two indexes, and it's just annoying. The given method can also deal with large amounts of data, i.e. when you really have GBs of tables. The steps are as follows:
1. Download sqlite expert: http://www.sqliteexpert.com/download.html
2. Create a new database dbNew: File-> New Database
3. Load the 1st sqlite database db1 (containing a single table): File-> Open Database
4. Click on the 'DDL' option. It gives you a list of the commands needed to create the particular sqlite table CONTENT.
5. Copy these commands and select the 'SQL' option. Paste the commands there. Change the name of the destination table DEST (from the default name CONTENT) to whatever you want.
6. Click on 'Execute SQL'. This should give you a copy of the table CONTENT in db1 with the name DEST. The main benefit of doing it this way is that all the indexes are also created in the DEST table as they were in the CONTENT table.
7. Now just click and drag the DEST table from the database db1 to the database dbNew.
8. Now just delete the database db1.
9. Go back to step 3 and repeat with the next database, db2, and so on.
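For anyone who prefers a script over the GUI, the steps above can be approximated with Python's built-in sqlite3 module. This is a rough sketch under stated assumptions, not the procedure SQLite Expert itself performs: the function names are hypothetical, the table DDL is read from sqlite_master, and only plain column indexes created with CREATE INDEX are rebuilt (partial and expression indexes raise an error). Index names are prefixed with the destination table name, since index names must be unique within one database file.

```python
import sqlite3


def quote(name):
    """Double-quote an SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def copy_table(dest_path, src_path, src_table, dest_table):
    """Copy src_table and its indexes from src_path into dest_path as dest_table."""
    conn = sqlite3.connect(dest_path, isolation_level=None)  # explicit transactions
    conn.execute("ATTACH DATABASE ? AS src", (src_path,))
    try:
        # sqlite_master stores the CREATE TABLE text, i.e. the table's DDL.
        found = conn.execute(
            "SELECT sql FROM src.sqlite_master WHERE type = 'table' AND name = ?",
            (src_table,),
        ).fetchall()
        if not found:
            raise ValueError(f"no table {src_table!r} in {src_path!r}")
        indexes = conn.execute(
            f"PRAGMA src.index_list({quote(src_table)})"
        ).fetchall()

        conn.execute("BEGIN")
        conn.execute(found[0][0])  # creates the table in the destination file
        if dest_table != src_table:
            conn.execute(
                f"ALTER TABLE main.{quote(src_table)} RENAME TO {quote(dest_table)}"
            )
        conn.execute(
            f"INSERT INTO main.{quote(dest_table)} "
            f"SELECT * FROM src.{quote(src_table)}"
        )
        for _seq, idx, unique, origin, partial in indexes:
            if origin != "c":
                continue  # PRIMARY KEY / UNIQUE indexes come with the table DDL
            cols = [r[2] for r in conn.execute(
                f"PRAGMA src.index_info({quote(idx)})").fetchall()]
            if partial or None in cols:
                raise ValueError(f"index {idx!r} needs hand-written DDL")
            conn.execute(
                f"CREATE {'UNIQUE ' if unique else ''}INDEX "
                f"main.{quote(dest_table + '_' + idx)} ON {quote(dest_table)} "
                f"({', '.join(quote(c) for c in cols)})"
            )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("DETACH DATABASE src")
        conn.close()
```

Calling copy_table("dbNew.dat", "db1.dat", "CONTENT", "DEST1") and then repeating it for db2.dat with a different destination name corresponds to going back to step 3 for each source file.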