What's the most performant way of inserting a 3GB .tsv into SQL Server?
I do not know the column names, and whenever I try to preview the file it takes forever...
Would it be easier to first create the table in SQL Server and then insert the .tsv? If so, I could contact the provider of the .tsv and find out the exact column names.
I would think that your best bet is to extract a set number of lines, use those to estimate some broad, catch-all column definitions, and create a table you can load into for analysis of the complete dataset.
You could do this with a small script file like those discussed here:
VBScript - skip and read lines in text file
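For example, the catch-all table itself might be nothing more than a set of wide text columns, one per tab-delimited field; the names and widths below are placeholders until you know the real header:

    -- Placeholder catch-all staging table; column names and widths are guesses
    -- until the real header is known.
    CREATE TABLE dbo.TsvStaging
    (
        Col01 NVARCHAR(500) NULL,
        Col02 NVARCHAR(500) NULL,
        Col03 NVARCHAR(500) NULL
        -- ...one column per tab-delimited field in the file
    );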
Once you have an understanding of your data structure, in my experience the fastest way to load data into SQL Server - though far from the most convenient - is the Bulk Copy Program Utility, or BCP:
https://msdn.microsoft.com/en-us/library/ms162802.aspx
The fastest way to insert data into SQL Server is to use BULK INSERT:
https://msdn.microsoft.com/en-us/library/ms188365.aspx
But you need to know the data schema.
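As a rough sketch, assuming the target table already exists (for instance the catch-all staging table above) and a hypothetical file path:

    -- Hedged example: the path, header handling, and terminators depend on the actual file.
    BULK INSERT dbo.TsvStaging
    FROM 'C:\data\bigfile.tsv'              -- hypothetical path
    WITH (
        FIELDTERMINATOR = '\t',             -- tab-separated values
        ROWTERMINATOR   = '\n',             -- or '\r\n', depending on how the file was produced
        FIRSTROW        = 2,                -- skip the header row, if there is one
        TABLOCK                             -- helps SQL Server do a faster, minimally logged load
    );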
Related
We are trying to devise an optimal method for importing very large Excel files into a SQL Server database. Using SSIS is somewhat troublesome because it scans the top X rows to determine the format of the file, but rows further down may be different, so it takes a lot of trial and error, with us having to bring the unusual columns to the top so SSIS can "learn".
When we get new file formats to import, they conform to a specification in terms of row formatting etc., so we can say we know the schema in advance. The SQL destination tables have the same schema, with a couple of extra columns such as date inserted and original filename.
Is there an easier way to create format definitions for the new files we are going to insert? We don't have to use SSIS; we are open to any other tool, with a view to as much automation as possible. There's also the question of testing the sanity of the data we import: we were planning on running basic queries against staging datasets, such as "less than 1% of records can be missing a postal code", etc.
Many thanks
Maybe you can import the data as text and then convert it using a Derived Column transformation. You can read data from Excel as text using the IMEX option in the connection string. You can find more information about this parameter here.
First of all: I don't need a full-text-search engine, and I don't need full-text search in my code. I have a database with ~2000 tables, and I need to find the table and column in which certain information is stored, for development purposes. Is there any quick way (maybe an SQL Server Management Studio trick that I should know of) to do this? I think phpMyAdmin provides such a feature for MySQL databases. At the moment I'm seriously thinking of dumping the database to an .sql file and using a text editor to search for the phrases I'm looking for.
Check the INFORMATION_SCHEMA views. You can select from them - there is a view containing all the column names etc., and you can then search that one.
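For example, to list every table and column whose name contains a given phrase (the search term below is just a placeholder):

    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME LIKE '%customer%'     -- placeholder search term
    ORDER BY TABLE_SCHEMA, TABLE_NAME;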
I don't see a way to do it without dynamic SQL: get the list of all tables and their columns from sys.tables and sys.columns (don't forget to include the proper schema if you're using schemas), construct a query for each column that checks for the values you're trying to find and stores the table and column name in a temporary table, place all the queries into a (temp) table, and finally cursor/loop over that table, executing all the queries.
PS: your idea of dumping everything into *.sql files should work as well; it depends on the volume of data.
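A minimal sketch of that approach, restricted to string columns and looping with a cursor directly over the generated queries rather than staging them first; the search term and the type filter are assumptions you would adjust:

    DECLARE @search NVARCHAR(100) = N'some value';   -- placeholder search term

    IF OBJECT_ID('tempdb..#hits') IS NOT NULL DROP TABLE #hits;
    CREATE TABLE #hits (TableName NVARCHAR(260), ColumnName SYSNAME);

    DECLARE @sql NVARCHAR(MAX);
    DECLARE queries CURSOR LOCAL FAST_FORWARD FOR
        -- One query per string column; each records the table/column when a match exists.
        SELECT 'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
             + ' WHERE ' + QUOTENAME(c.name) + ' LIKE ''%' + REPLACE(@search, '''', '''''') + '%'') '
             + 'INSERT #hits VALUES (''' + s.name + '.' + t.name + ''', ''' + c.name + ''');'
        FROM sys.tables t
        JOIN sys.schemas s ON s.schema_id = t.schema_id
        JOIN sys.columns c ON c.object_id = t.object_id
        JOIN sys.types  ty ON ty.user_type_id = c.user_type_id
        WHERE ty.name IN ('char', 'varchar', 'nchar', 'nvarchar');

    OPEN queries;
    FETCH NEXT FROM queries INTO @sql;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM queries INTO @sql;
    END
    CLOSE queries;
    DEALLOCATE queries;

    SELECT * FROM #hits;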
I'm copying about 200 tables from Oracle to SQL Server using SSIS. Right now, the basic package template follows this logic:
Get time
Truncate table
Load data and get row count
Record the table name, row count, and time to a log table.
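For reference, that last step is just a simple insert into a log table along these lines; the table and column names are made up, and in the package the values come from SSIS variables rather than the local variables shown here:

    -- Hypothetical log table and the insert the final step performs.
    CREATE TABLE dbo.LoadLog
    (
        TableName  SYSNAME  NOT NULL,
        RowsLoaded INT      NOT NULL,
        LoadStart  DATETIME NOT NULL,
        LoadEnd    DATETIME NOT NULL DEFAULT GETDATE()
    );

    DECLARE @TableName  SYSNAME  = N'SomeSourceTable',   -- placeholder values
            @RowsLoaded INT      = 0,
            @LoadStart  DATETIME = GETDATE();

    INSERT INTO dbo.LoadLog (TableName, RowsLoaded, LoadStart)
    VALUES (@TableName, @RowsLoaded, @LoadStart);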
Currently, I copy and paste the package and change the data flow. Is there a better way to do this? I know SSIS is metadata driven, but doing 200 tables like this is a little ridiculous. And if my boss wants me to change something in the template then I get to do it all over again. Is there a way to loop through tables? I would just use linked servers in SQL Server but since we have SQL Server Enterprise I'm able to use the Attunity connectors and they are much faster.
Any help would be appreciated. It seems like there must be a better way but I'm not familiar enough with SSIS to really know what to ask for.
I'm not a good SQL programmer; I've got only the basics. But I've heard of some BCP thing for fast data loading. I've searched the internet and it seems to be a command-line-only utility, not something you can use in code.
The thing is, I want to be able to make very fast inserts and updates in a SQL Server 2008 database. I would like to have a function in the database that would accept:
The name of the table I want to execute an insert/update operation against
The names of the columns I'll be feeding data to
The data in a CSV format or something that SQL can read stupid-fast
A flag indicating whether the function should perform an insert or update operation
This function would then read the CSV string and generate the necessary code for inserting into/updating the table.
I would then write code in C# to call that function passing it the table name, column names, a list of objects serialized as a CSV string and the insert/update flag.
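Just to make that shape concrete, here is a very rough sketch with invented names; it cheats by assuming the data already arrives formatted as a VALUES list, because safely parsing a real CSV string inside T-SQL is the hard part, and the update branch is only stubbed out:

    -- Hypothetical sketch of the generic procedure described above; all names are invented.
    CREATE PROCEDURE dbo.GenericBulkWrite
        @TableName  SYSNAME,
        @ColumnList NVARCHAR(MAX),   -- e.g. N'Id, Name, Price'
        @Rows       NVARCHAR(MAX),   -- assumed pre-formatted as a VALUES list, e.g. N'(1, ''a'', 9.99), (2, ''b'', 5.00)'
        @IsUpdate   BIT              -- 0 = insert, 1 = update
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @sql NVARCHAR(MAX);

        IF @IsUpdate = 0
        BEGIN
            -- Build and run a dynamic INSERT for the requested table and columns.
            SET @sql = N'INSERT INTO ' + QUOTENAME(@TableName)
                     + N' (' + @ColumnList + N') VALUES ' + @Rows + N';';
            EXEC sys.sp_executesql @sql;
        END
        ELSE
        BEGIN
            -- An update path would need a key column to match on; not sketched here.
            RAISERROR('Update path not implemented in this sketch.', 16, 1);
        END
    END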
As you can see, this is intended to be both fast and generic, suitable for any project dealing with large amounts of data, and thus a candidate for my company's framework.
Am I thinking along the right lines? Is this a good idea? Can I use that BCP thing, and is it suitable for every case?
As you can see, I need some directions on this... thanks in advance for any help!
In C#, look at SqlBulkCopy. It's what SSIS uses in the background.
For true bcp/BULK INSERT, you'd need bulkadmin rights, which may not be allowed.
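If you want to check up front whether your login has that role, one quick way is (run as the login in question):

    -- Returns 1 if the current login is a member of the bulkadmin fixed server role,
    -- 0 if it is not, and NULL if the role name is invalid.
    SELECT IS_SRVROLEMEMBER('bulkadmin') AS is_bulkadmin;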
Have you considered using SQL Server Integration Services (SSIS)? It's designed to do exactly what you describe. It is very fast, you can insert data on a transactional basis, you can set it up to run on a schedule, and much more.
I am tasked with exporting the data contained inside a MaxDB database to SQL Server 200x. I was wondering if anyone has gone through this before and what your process was.
Here is my idea, but it's not automated.
1) Export data from MaxDB for each table as a CSV.
2) Clean the CSV to remove the ? characters (which MaxDB uses for nulls) and fix the date strings.
3) Use SSIS to import the data into tables in SQL Server.
I was wondering if anyone has tried linking MaxDB to SQL Server or what other suggestions or ideas you have for automating this.
Thanks.
AboutDev.
I managed to find a solution to this. There is an open source MaxDB library that will allow you to connect to it through .Net much like the SQL provider. You can use that to get schema information and data, then write a little code to generate scripts to run in SQL Server to create tables and insert the data.
MaxDb Data Provider for ADO.NET
If this is a one-time thing, you don't have to have it all automated.
I'd pull the CSVs into SQL Server tables and keep them forever; they will help with any questions a year from now. You can prefix them all the same, "Conversion_" or whatever. There are no constraints or FKs on these tables. You might consider using varchar for every column (or just the ones that cause problems, or none at all if the data is clean), just to be sure there are no data type conversion issues.
Then pull the data from these conversion tables into the proper final tables. I'd use a single conversion stored procedure to do everything (but I like T-SQL). If the data isn't that large, millions and millions of rows or less, just loop through and build out all the tables, printing log info as necessary, or inserting into exception/bad-data tables as necessary.
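As a small sketch of what one of those per-table steps might look like, with made-up table and column names, turning MaxDB's '?' placeholders into NULLs and converting the text dates on the way through:

    -- Hypothetical example: move one Conversion_ staging table into its final table.
    INSERT INTO dbo.Customer (CustomerId, CustomerName, SignupDate)
    SELECT CAST(NULLIF(CustomerId, '?') AS INT),
           NULLIF(CustomerName, '?'),
           CONVERT(DATETIME, NULLIF(SignupDate, '?'), 120)   -- 120 = yyyy-mm-dd hh:mi:ss; adjust to the real format
    FROM dbo.Conversion_Customer;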