I have about 100 GB of TPC-DS data generated with dsdgen, and each .dat file is around 20 GB+.
I have exhaustively searched for how to load a flat file into SQL Server 2016, and I tried importing it using an SSIS package with fast parse enabled on all the decimal fields.
It still takes a very long time to load into a table: the load has been running for 24 hours and has so far loaded only 9,700,000 records of a single table.
Is the above the ideal way, or what could be the problem?
Please help me; I am stuck with this problem and also new to the MSSQL world.
Related
I'm trying to import data into SQL Server using SQL Server Management Studio and I keep getting the "output column... failed because truncation occurred" error. This is because I'm letting the Studio autodetect the field length which it isn't very good at.
I know I can go back and extend the column length, but I'm thinking there must be a better way to get it right the first time without having to manually work out how long each column is.
I know that this must be a common issue but my Google searches aren't coming up with anything as I'm more looking for a technique rather than a specific issue.
One approach you may take, assuming the import is not something which would take hours to complete, is to just set every text column to VARCHAR(MAX), and then complete the CSV import. Once you have the actual table in SQL Server, you can inspect each column using LEN to see how wide it is. Based on that, you can either alter columns, or you could just take notes, drop the table, and reimport using appropriate widths.
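A minimal sketch of that inspection step, assuming a staging table named dbo.Staging with two text columns imported as VARCHAR(MAX) (the table and column names here are placeholders, not from the question):

```sql
-- Find the widest value actually present in each text column
SELECT
    MAX(LEN(CustomerName)) AS MaxCustomerName,
    MAX(LEN(City))         AS MaxCity
FROM dbo.Staging;

-- Then size the columns accordingly (leave some headroom
-- above the observed maximums for future data)
ALTER TABLE dbo.Staging ALTER COLUMN CustomerName VARCHAR(100);
ALTER TABLE dbo.Staging ALTER COLUMN City VARCHAR(60);
```

Altering in place avoids the drop-and-reimport round trip when the import itself is slow.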
You should look into leveraging SSIS for this task. There is somewhat of a fixed cost in terms of time spent setting up the import process for the csv file and creating a physical table in the database. Ultimately, though, you will be able to set the data types for each column/field in your file. Further, SSIS will enable you to transform or reformat the data along the way.
I would suggest downloading Visual Studio and SQL Server Data Tools. The latter contains the tools you would need to complete this task, including SSIS, SSRS, and SSAS.
The main point is being able to automate this task, especially if it's an ongoing project of uploading csv files into the database.
I have more than 100 million records in a txt file. I would like to import them into SQL Server. So, which one should I choose between the data loader and SSIS? Thank you so much!
Assuming you are referring to the Import Data wizard when you say "the data loader", it is just a wizard that creates an SSIS package for you to import your data. You even get the option to save your import as an SSIS package at the end of the process.
If you care more about the speed of the import, for 100 million records within a text file you would probably (but not definitely) be better off using the Bulk Copy Program (BCP) Utility provided by Microsoft.
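For reference, the same kind of batched bulk load can also be expressed in T-SQL with BULK INSERT, BCP's T-SQL counterpart; this is a sketch only, and the file path, table name, and delimiters below are placeholder assumptions:

```sql
-- Bulk load a delimited text file; TABLOCK allows minimal logging
-- on a heap when the database uses SIMPLE or BULK_LOGGED recovery
BULK INSERT dbo.TargetTable
FROM 'C:\data\records.txt'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000,  -- commit in batches to keep the log manageable
    TABLOCK
);
```

BATCHSIZE matters at the 100-million-row scale: each batch commits separately, so a failure only rolls back the current batch rather than the whole load.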
Edit following comments
From what I can see, DataLoader.io is a Salesforce-only tool; it seems you cannot use it to load data into SQL Server. In this case, out of the two options you have suggested, SSIS is the only viable one. Whether or not SSIS is suitable for your current and ongoing situation, however, is a larger discussion not really suited to the Stack Overflow format.
I have a table in a db in SQL Server that reads data from a csv file that's being uploaded to a ftp each night. The table shows data for the past 30 days, but suddenly it stopped showing entries past a certain date.
I've checked the ftp dump and the csv file - everything looks fine here (nothing has changed). The table itself is created using SSIS, and I've found various scripts for the extract, load, and transform steps. However, I'm unsure of how and where to start troubleshooting.
I realize that this is somewhat of a broad question, so I'm looking for a way of narrowing down the problem.
Every time I had an SSIS issue like this, I would open the package and run it manually from within SSIS, because I've found the logging at that level better than when it runs as a SQL job.
For large and complicated packages I would have to select parts of the package and run it piece by piece.
I understand this may be a little far-fetched, but is there a way to take an existing SSIS package and get an output of the job it's doing as T-SQL? I mean, that's basically what it is, right? Transferring data from one database to another can be done with T-SQL as well.
I'm wondering this because I'm trying to get away from using SSIS packages for data transfer and instead use EF/LINQ to do this on the fly in my application. My thought process is that currently I have an SSIS package that transfers and formats data from one database to another in preparation for it to be spit out to an Excel file. This SSIS package runs nightly and helps speed up the generation of the Excel file, as once the data is transferred to the second db, it's already nicely and correctly formatted.
However, if I could leverage EF and maybe some LINQ to SQL to format the data from the first database on the fly and spit it out to Excel quickly without having to use this second db, that would be great. So can my original question be done - can I extract the T-SQL representation of an SSIS package somehow?
SSIS packages are not exclusively T-SQL. They can consist of custom back-end code, file system changes, Office document creation steps, etc., to name only a few. As a result, generating the entirety of an SSIS package's work as T-SQL isn't possible, because the full breadth of its work isn't limited to SQL Server.
I am very new to SSIS and its capabilities. I am busy building a new project that will upload files to a database. The problem I am facing is that the files and tables differ from one another.
So what I have done is create a separate mapping table that maps each file's columns to the specific table columns the data needs to be stored in. I want the user to manage this part when they receive a new file or the file layout changes somehow.
As far as I know, SSIS lets you map each file to a table, and the import can be scheduled as a task.
My question is will SSIS be able to handle this or should I handle this process in code?
Many thanks in advance
I would say it all depends on the amount of data that would be imported into your SQL Server; for large data sets (normally 10000+ rows) it becomes worthwhile to use SSIS, as you would see performance gains in your application. Here is a simple example of creating an SSIS package using code. For smaller data operations I would suggest using a combination of this and this. Or, to create a dynamic table on your SQL Server based on the file format, look at this.
SSIS can be very picky about file formats, so if the files are completely different, then it probably isn't the tool for the job. For flat files, SSIS requires the ordering of columns to be the same.
If you know that your files will only ever arrive in one of 5 formats (for example), it wouldn't be much trouble to write 5 packages to import them. If any new file could have a totally different schema, I don't think SSIS would be the right tool for the job.