Huge text or Excel data: import FASTER into SQL Server - sql-server

I have a huge text file with 1 million rows, and each row contains only a 28-character number stored as text.
I want to import them into a SQL Server table that has a corresponding column, so that a million values end up in one column of the table.
I used SSIS, but it's quite slow (1 million rows take 4.5 hours or more to insert). Are there any other ways to do this much faster?

You can use the BCP utility for fast imports. See the official documentation here: DOC
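A minimal sketch of such a bulk load, with placeholder database, table, and file names (the batch size is only an example; the bcp command line in the comment is the equivalent of the T-SQL below):

    -- bcp equivalent (command line, illustrative only):
    --   bcp MyDb.dbo.Numbers in C:\data\numbers.txt -S myserver -T -c -b 100000
    BULK INSERT dbo.Numbers
    FROM 'C:\data\numbers.txt'
    WITH (ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000);

TABLOCK plus a large batch size lets SQL Server use minimally logged bulk inserts when the recovery model allows it.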

In the end, I decided to split the huge file into parts and run several SSIS packages at the same time, all inserting into the same table, so the inserts don't block each other. I expect 6 parallel SSIS packages to finish the job in about an hour.
Thanks.

Related

How to copy 19 million rows with text data type columns to another table in a faster way in SQL Server 2012

I need to perform a task in which we have a table with 19 columns of text data type. I want to delete these columns from the source table and move them to a new table with data type varchar(max). The source table currently has 30k rows (with text data type data), and this will grow as the client uses the database for record storage. For transferring this old data I tried an "insert into .. select .." query, but it takes around 25-30 minutes to transfer those 30k rows. The same is the case with a "select .. insert .." query. I have also tried an SSIS data flow task with OLE DB as both source and destination, but it still takes the same amount of time. I'm really confused, as posts all over the internet suggest SSIS is the fastest way to transfer data. Can you please suggest a better way to improve the performance of this data transfer?
Thanks
SSIS probably isn't faster if the source and the destination are in the same database and the SSIS process is on the same box.
One approach might be to figure out where you are spending the time and optimise that. If you set Management Studio to "discard results after execution" and run just the select part of your query, how long does that take? If this is a substantial part of the 25-30 minutes then work on optimising that.
If the select statement turns out to be really fast, then all the time is being spent on the insert and you need to look at improving that part of the process instead. There are a couple of things you can try here before you go hardware shopping: are there any indexes or constraints (or triggers!) on the target table that you can drop for the duration of the insert and put back again at the end? Can you put the database in the simple recovery model for the duration of the load?
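A minimal T-SQL sketch of that idea; the database, table, index, and trigger names are hypothetical, and you should only drop what you can safely recreate afterwards:

    -- reduce logging overhead for the duration of the load
    ALTER DATABASE MyDb SET RECOVERY SIMPLE;

    -- drop or disable anything that slows the insert on the target table
    DROP INDEX IX_Target_SomeColumn ON dbo.TargetTable;
    ALTER TABLE dbo.TargetTable DISABLE TRIGGER ALL;

    -- the actual transfer
    INSERT INTO dbo.TargetTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.SourceTable;

    -- put everything back afterwards
    ALTER TABLE dbo.TargetTable ENABLE TRIGGER ALL;
    CREATE INDEX IX_Target_SomeColumn ON dbo.TargetTable (SomeColumn);
    ALTER DATABASE MyDb SET RECOVERY FULL;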

Import data from Oracle data warehouse to SQL Server via SSIS 2008

I have an Oracle data warehouse which contains a huge amount of data (around 11 million rows), and I want to extract this data to a SQL Server database on a daily basis.
SSIS Package
I have created a package to import data from Oracle to SQL Server using slowly changing dimensions; however, it handles only around 600 rows per second.
I need my package to just insert new records, without updating or touching old records, as the data volume is huge.
Is there any way to do it very fast with any other data flow items?
You could try a Merge Join in SSIS; this should allow a comparison where only new records are inserted. Also, I don't like using just a datetime to determine which data does and does not get inserted, though I guess it depends on your source data. It sounds like there is no sequential ID field in the Oracle source data? If there is, I'd use that and the datetime in combination to decide what to insert. This could be done in SQL or SSIS.
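A minimal T-SQL sketch of that watermark idea on the SQL Server side, with hypothetical table and column names (the Oracle extract would be filtered by the same watermark before it ever reaches SSIS):

    -- find the highest key already loaded
    DECLARE @LastId BIGINT = (SELECT ISNULL(MAX(SourceId), 0) FROM dbo.TargetTable);

    -- insert only rows newer than the watermark
    INSERT INTO dbo.TargetTable (SourceId, LoadedAt, Payload)
    SELECT s.SourceId, s.SourceDate, s.Payload
    FROM dbo.OracleStaging AS s
    WHERE s.SourceId > @LastId;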
600/sec is not too bad in your case.
If we assume those 11 million rows were collected over just one year, that means only about 30K new records per day, which is roughly one minute of loading at that rate.
The biggest problem is identifying which records to insert.
You need a timestamp or a sequential ID to identify the most recently inserted records.
If your ID is not sequential, you can extract ONLY the ID field from the Oracle table into SSIS, compare it to the existing dataset, and then request only the newest records from Oracle.
If you don't have these fields, you can extract all 11 million records, generate a hash on both sides, and compare the hash values to know what is new and needs to be inserted.
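On the SQL Server side, that hash could be computed with HASHBYTES; the sketch below is illustrative only and assumes a hypothetical staging table holding the hashes computed from the Oracle extract:

    -- hash the existing rows (column list and algorithm are examples)
    SELECT Id, HASHBYTES('SHA2_256', CONCAT(Col1, '|', Col2, '|', Col3)) AS RowHash
    INTO #ExistingHashes
    FROM dbo.TargetTable;

    -- rows whose hash is not present yet are the ones to insert
    SELECT o.Id
    FROM dbo.OracleHashStaging AS o
    LEFT JOIN #ExistingHashes AS e ON e.RowHash = o.RowHash
    WHERE e.RowHash IS NULL;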

insert data into vertica from MATLAB

I have to insert millions of rows from MATLAB into Vertica. I tried the datainsert function in MATLAB, but it seems slow, taking about 6 seconds for 3000 records. The other functions, fastinsert and insert, are even slower. Is there a faster method to insert the data?
Do yourself a favor and export the data to CSV format. See this link for more details.
Vertica's performance on sequential insert statements is poor. Use Vertica's native COPY command to load the data from the exported CSV file; it will do about 1 million rows per second on a small single-node cluster.
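A minimal sketch of such a COPY statement, assuming a hypothetical schema, table, and file path; the delimiter and NULL handling depend on how the CSV was exported, and COPY ... FROM LOCAL can be used instead if the file sits on the client rather than on the Vertica node:

    -- bulk load the exported CSV into the target table
    COPY my_schema.my_table
    FROM '/data/matlab_export.csv'
    DELIMITER ','
    NULL ''
    DIRECT;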

What is the fastest approach to populate MS SQL Database with large amount of data

Dilemma:
I am about to populate an MS SQL Server (2012 Dev Edition) database with data based on production data. The volume is around 4 TB (around 250 million items).
Purpose:
To test performance of full-text search and of regular indexes. The target is around 300 million items of around 500K each.
Question:
What should I do beforehand to speed up the process, and what consequences should I worry about?
Ex.
Switching off statistics?
Should I do a bulk insert of 1k items per transaction instead of a single transaction?
Simple recovery model?
Log truncation?
Important:
I will use a sample of 2k production items to create each random item that will be inserted into the database. I will use near-unique samples generated in C#. It will be one table:
table
(
long[id],
nvarchar(50)[index],
nvarchar(50)[index],
int[index],
float,
nvarchar(50)[index],
text[full text search index]
)
Almost invariably in a situation like this, and I've had several of them, I've used SSIS. SSIS is the fastest way I know to import large amounts of data into a SQL Server database. You have complete control over the batch (transaction) size and it will perform bulk inserts. In addition, if you have transformation requirements, SSIS will handle them with ease.
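As a hedged sketch of the database-level switches the question lists (placeholder database name; whether each is appropriate depends on your environment and on whether you need point-in-time recovery during the test load):

    -- simple recovery model: the log is truncated at checkpoints,
    -- so no separate log truncation step is needed during the load
    ALTER DATABASE LoadTestDb SET RECOVERY SIMPLE;

    -- optionally defer statistics maintenance until after the load
    ALTER DATABASE LoadTestDb SET AUTO_UPDATE_STATISTICS OFF;

    -- ... run the SSIS load here, with a sensible batch/commit size ...

    -- restore settings and refresh statistics afterwards
    ALTER DATABASE LoadTestDb SET AUTO_UPDATE_STATISTICS ON;
    ALTER DATABASE LoadTestDb SET RECOVERY FULL;
    EXEC sp_updatestats;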

sql server table fast load isn't

I've inherited an SSIS package which loads 500K rows (about 30 columns) into a staging table.
It's been cooking now for about 120 minutes and it's not done, which suggests it's running at less than 70 rows per second. I know that everybody's environment is different, but I think this is a couple of orders of magnitude off from "typical".
Oddly enough the staging table has a PK constraint on an INT (identity) column -- and now I'm thinking that it may be hampering the load performance. There are no other constraints, indexes, or triggers on the staging table.
Any suggestions?
---- Additional information ------
The source is a tab-delimited file which connects to two separate Data Flow Components that add some static data (the run date and batch ID) to the stream, which then connects to an OLE DB Destination Adapter.
Access mode is OpenRowset using FastLoad
FastLoadOptions are TABLOCK,CHECK_CONSTRAINTS
Maximum insert commit size: 0
I’m not sure about the etiquette of answering my own question -- so sorry in advance if this is better suited for a comment.
The issue was the data type of the input columns from the text file: they were all declared as “text stream [DT_TEXT]”, and when I changed that to “String [DT_STR]”, 2 million rows loaded in 58 seconds, which is now in the realm of “typical”. I'm not sure what the Text file source is doing when columns are declared that way, but it's behind me now!
I'd say there is a problem of some sort; I bulk insert into a staging table from a file with 20 million records, more columns, and an identity field in far less time than that, and SSIS is supposed to be faster than SQL Server 2000 bulk insert.
Have you checked for blocking issues?
If it is running in one big transaction, that may explain things. Make sure that a commit is done every now and then.
You may also want to check processor load, memory and IO to rule out resource issues.
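For the blocking check, one quick way to spot blocked sessions while the package runs is the standard sys.dm_exec_requests DMV; this is just a generic query, not specific to this package:

    -- requests currently blocked by another session
    SELECT session_id, blocking_session_id, wait_type, wait_time, command
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0;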
This is hard to say.
If there were complex ETL, I would check the max number of threads allowed in the data flows and see if some things can run in parallel.
But it sounds like it's a simple transfer.
With 500,000 rows, batching is an option, but I wouldn't think it necessary for that few rows.
The PK identity should not be an issue. Do you have any complex constraints or persisted calculated columns on the destination?
Is this pulling or pushing over a slow network link? Is it pulling or pushing from a complex SP or view? What is the data source?
