What is the best (the fastest) way to import huge CSV files to MS SQL Server?
I'm new to this kind of task, so I first learned how to use the Import Wizard, but the speed is unacceptable for me: importing a 5 GB CSV with 60 million rows took around 12 hours. (My particular issue is that I have some long text fields in Russian, so they have to be stored as NVARCHAR(MAX) to display correctly, and I guess that conversion takes 90% of the time.)
The next day I learned about the Import-Csv cmdlet in PowerShell and started importing the same file again as an experiment, and the speed seems to be about the same.
I also know there is a way to use the BULK INSERT command in SQL Server (I don't know how to use it; I've never done it before).
Maybe someone can recommend something else?
Additional question: does importing always load the full CSV into RAM? If I have a 25 GB CSV file and only 16 GB of RAM, what can I do?
Conclusion: I have huge CSVs of roughly 5-25 GB, with hundreds of millions of rows, Russian-language text, and lots of float/int numbers. What can you recommend for importing them into MS SQL Server?
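For reference, the BULK INSERT syntax I found in the documentation looks roughly like the sketch below; I have not tried it yet, and the table name and file path are just placeholders:

    BULK INSERT dbo.MyTable                -- placeholder table; it must already exist with matching columns
    FROM 'C:\data\big.csv'                 -- placeholder path; the file must be readable by the SQL Server service
    WITH (
        FIELDTERMINATOR = ',',             -- CSV column delimiter
        ROWTERMINATOR   = '\n',            -- line delimiter
        FIRSTROW        = 2,               -- skip a header row, if the file has one
        BATCHSIZE       = 100000,          -- commit in 100k-row batches instead of one huge transaction
        TABLOCK                            -- table lock; one of the prerequisites for minimal logging
        -- , DATAFILETYPE = 'widechar'     -- only if the file is saved as UTF-16, to preserve the Cyrillic text
    );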
Thanks a lot!
I already have the schema/tables all created.
I don't have access to a .bak file, so this 18 GB SQL file is all I have. It contains all the data I need to restore the database. I generated this file using SSMS "Generate Script" functionality.
As we all know, SSMS can't import such a big file.
What I have tried so far is splitting the file into 1 GB chunks, opening each one in Sublime Text, and fixing the lines that were cut by the split.
Then I use the command line to import the 1 GB files one by one into the SQL Server database.
But this approach takes forever and is error prone. Is there an easier way to import a huge SQL file? Again, generating a .bak file is not an option because my access is very limited.
Are there tools I can use specifically for this scenario?
For reference, foreign key checks and other constraints are not needed to import this huge file.
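(If it helps, I assume the existing constraints could simply be switched off for the duration of the import with something like the sketch below, using the undocumented but widely used sp_MSforeachtable procedure; I have not tried this yet:)

    -- Disable all foreign key and check constraints before running the script files
    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';

    -- ... import the split .sql files here ...

    -- Re-enable and re-validate the constraints afterwards
    EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';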
Thanks in advance.
I have about 100 GB of data generated with the TPC-DS dsdgen tool, and each .dat file is around 20 GB or more.
I have searched exhaustively for how to load flat files into SQL Server 2016, and I tried importing with an SSIS package with fast parse enabled on all the decimal fields.
It still takes a very long time to load into a table: the load has been running for 24 hours and has only finished 9,700,000 records of a single table so far.
Is the above the ideal way to do this, or what could be the problem?
Please help me; I am stuck with this problem and also new to the MSSQL world.
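For reference, the alternative I was considering is a plain BULK INSERT per table, roughly like the sketch below (the table name and path are just examples, and I am assuming the dsdgen .dat files are pipe-delimited; if the rows end with a trailing delimiter, a format file may be needed to line the columns up):

    BULK INSERT dbo.store_sales              -- example TPC-DS table; repeat per target table
    FROM 'D:\tpcds\store_sales.dat'          -- placeholder path to the generated flat file
    WITH (
        FIELDTERMINATOR = '|',               -- assuming pipe-delimited dsdgen output
        ROWTERMINATOR   = '\n',
        TABLOCK,                             -- table lock helps qualify for minimal logging
        BATCHSIZE = 1000000                  -- commit in batches rather than one giant transaction
    );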
What's the most performant way of inserting a 3GB .tsv into SQL Server?
I do not know the column names, and whenever I try to preview the file, it takes forever...
Would it be easier to first create the table in SQL Server and then insert the .tsv? I could contact the provider of the .tsv and find out the exact column names.
I would think that your best bet is to extract a set number of lines, use them to estimate some broad, catch-all columns, and create a table you can load into for analysis of the complete dataset.
You could do this with a small script file like those discussed here:
VBScript - skip and read lines in text file
Once you have an understanding of your data structure, in my experience the fastest way to load data into SQL Server - though far from the most convenient - is the Bulk Copy Program Utility, or BCP:
https://msdn.microsoft.com/en-us/library/ms162802.aspx
A faster way to insert data into SQL Server is to use BULK INSERT:
https://msdn.microsoft.com/en-us/library/ms188365.aspx
But you need to know the data schema.
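For example, a catch-all staging table plus a BULK INSERT for a tab-separated file could look roughly like the sketch below (the table name, number of columns, and file path are placeholders to adapt to the actual .tsv):

    -- Wide staging table: one forgiving NVARCHAR(MAX) column per field in the file
    CREATE TABLE dbo.TsvStaging (
        Col01 NVARCHAR(MAX),
        Col02 NVARCHAR(MAX),
        Col03 NVARCHAR(MAX)
        -- ...one column per field in the .tsv
    );

    BULK INSERT dbo.TsvStaging
    FROM 'C:\data\file.tsv'                -- placeholder path
    WITH (
        FIELDTERMINATOR = '\t',            -- tab-separated values
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2,               -- skip the header row, if there is one
        TABLOCK
    );

Once the data is in the staging table, the real column names and types can be worked out and the rows moved into a properly typed table.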
I'm working on an application that requires a lot of data. This data is stored in SAP (a big enterprise resource planning tool) and needs to be loaded into an Oracle database. The data I'm talking about is 15,000+ rows, each with 21 columns.
Every time an interaction is made with SAP (4 times a day), those 15,000 rows are exported and have to be loaded into the Oracle database. I'll try to explain what I do now to achieve my goal:
Export data from SAP to a CSV file
Remove all rows in the Oracle database
Load the exported CSV file and import it into the Oracle database
What you can conclude from this is that the data in the Oracle database has to be updated whenever there is a change in a row. This whole process takes about 1 minute.
Now I'm wondering whether it would be faster to check each row in the Oracle database for changes against the CSV file. The reason I'm asking before just trying it is that it would require a lot of coding. Maybe someone has done something similar before and can point me to the best solution.
All the comments helped me reduce the time: first truncate the table, then insert all rows using the Oracle DataAccess library instead of OleDb.
I have a question about Apache Sqoop. I have imported data into HDFS using the Sqoop import facility.
Next, I need to put the data into another database (basically I am transferring data from one database vendor to another) using Hadoop (Sqoop).
To put the data into SQL Server, there are two options:
1) Use the Sqoop export facility to connect to my RDBMS (SQL Server) and export the data directly.
2) Copy the HDFS data files (which are in CSV format) to my local machine using the copyToLocal command, and then run BCP (or a BULK INSERT query) on those CSV files to put the data into the SQL Server database.
I would like to understand which of these is the correct approach, and which of the two is faster: BULK INSERT, or Sqoop export from HDFS into the RDBMS?
Are there any other ways, apart from these two, that could transfer the data faster from one database vendor to another?
I am using 6-7 mappers (around 20-25 million records need to be transferred).
Please advise, and kindly let me know if my question is unclear.
Thanks in Advance.
If all you are doing is ETL from one vendor to another, then going through Sqoop/HDFS is a poor choice. Sqoop makes perfect sense if the data originates in HDFS or is meant to stay in HDFS. I would also consider Sqoop if the data set were so large as to warrant a big cluster for the transformation stage. But a mere 25 million records is not worth it.
With SQL Server it is imperative, on large imports, to achieve minimal logging, which requires a bulk insert. Although 25 million rows is not so large as to make the bulk path imperative, as far as I know neither Sqoop nor Sqoop2 supports bulk insert for SQL Server yet.
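For context, the minimally logged bulk path looks roughly like the plain T-SQL sketch below (database, table, and file names are placeholders; whether the load is actually minimally logged also depends on the target table's indexes and existing data):

    -- Switch to a recovery model that allows minimal logging (placeholder database name)
    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

    BULK INSERT dbo.MyTable                     -- placeholder target table
    FROM 'D:\export\data.csv'                   -- placeholder path to the exported file
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK,                                -- table lock is one of the conditions for minimal logging
        BATCHSIZE = 500000
    );

    ALTER DATABASE MyDb SET RECOVERY FULL;      -- put the original recovery model back afterwards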
Rather than Sqoop, I would recommend SSIS. It is much more mature than Sqoop, it has a Bulk Insert task, and it has a rich transformation feature set. Your small import is well within the size SSIS can handle.