I have an Access 2010 database (front-end) with linked SQL Server tables (back-end), and I need to import text files into these tables. The text files are very large (some have more than 200,000 records and approximately 20 fields).
The problem is that I can't import the text files directly into the SQL Server tables. Some files contain empty lines at the start, as well as other lines that I don't want to import into the tables.
So here's what I did in my Access database:
1) I created a link to the text files.
2) I also have links to the SQL Server tables.
3a) I created an append query that copies the records from the linked text file to the linked SQL Server table (a rough sketch of this query follows these steps).
3b) I wrote VBA code that opens both tables and copies the records from the text file to the SQL Server table, record by record. (I tried it in different ways: with DAO and with ADODB.)
[Steps 3a and 3b are two different ways I tried to import the data; I use one of them, not both. I prefer option 3b, because I can show a counter in the status bar with how many records still need to be imported at any moment, so I can see how far along it is.]
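Roughly, the append query from step 3a looks something like this (the table and field names here are simplified placeholders, not the real ones); the WHERE clause is where I skip the empty lines and other unwanted rows:

    INSERT INTO dbo_MyTable ( Field1, Field2 )
    SELECT Field1, Field2
    FROM tblTextFileLink
    WHERE Field1 Is Not Null And Field1 <> '';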
The problem is that it takes a lot of time to run... and I mean a LOT of time: 3 hours for a file with 70,000 records and 20 fields!
When I do the same with an Access table (from TXT to Access), it's much faster.
I have 15 tables like this (some with even more records), and I need to do these imports every day. I run this procedure automatically every night (between 20:00 and 6:00).
Is there an easier way to do this?
What is the best way to do this?
This feels like a good case for SSIS to me.
You can create a data flow from a flat file (as the data source) to a SQL DB (as the destination).
You can add some validation or selection steps in between.
You can easily find tutorials like this one online.
Alternatively, you can do what Gord mentioned: import the data from the text file into a local Access table and then use a single INSERT INTO LinkedTable SELECT * FROM LocalTable to copy the data to the SQL Server table.
I just tested importing 1321 records (one int column as a key, two text columns as nvarchar(100)) into an MS SQL Server database.
In Navicat this took me 7 seconds with the import wizard.
In DataGrip it took 280 ms per row (370 seconds in total). The method I used was:
1) Open .csv file
2) Use the SQL Insert and Data Extractor options
3) Rename MY_TABLE to the appropriate name (this caused lag on my system with 16 GB of RAM)
4) Press Ctrl+A and then execute
I saw it inserting each row one at a time. This is a simple lookup table. After this I am planning to import records from 2014 to the present (I am creating a new database), which amounts to several million rows. Am I importing .csv files incorrectly? What options do I have here?
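For illustration, what I saw being executed was essentially one INSERT statement per row rather than a single batched INSERT (the column names below are just placeholders):

    -- what was generated: one statement per row
    INSERT INTO MY_TABLE (id, col1, col2) VALUES (1, N'a', N'b');
    INSERT INTO MY_TABLE (id, col1, col2) VALUES (2, N'c', N'd');
    -- ...and so on, one statement for each of the 1321 rows

    -- versus a single multi-row INSERT, which avoids a round trip per row
    INSERT INTO MY_TABLE (id, col1, col2) VALUES
        (1, N'a', N'b'),
        (2, N'c', N'd');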
Open the context menu of the data source you want to import into.
Choose Import from File....
Customize the table that will be created, check the preview, and press OK.
I'm having several issues with importing a flat file into MS SQL Server using the SQL Server import / export wizard. I'd like to know how to effectively load the file into a SQL Server table.
File Conditions:
The flat file is fairly large (800 MB and several million rows)
It's poorly formatted
The first column is empty
The header is a 3 row set: top blank, middle has field names, bottom blank
This 3 row header is repeated approximately every 60,000 rows
Some values are nulls
It's tab delimited
First, I tried to load it as a Flat File source, but SQL Server failed to recognize the tab delimiters. Excel opens it correctly (although only partially), but SQL Server sticks everything into one column.
Second, I tried opening it in Excel, saving it as an Excel file, and loading that into the SQL Server import wizard (and I'm not sure whether resaving it in Excel keeps all the data anyway). Now SQL Server parses the columns correctly, but it reports broken integrity constraints when it hits the repeated headers (every numeric field gets a string header value every 60,000 rows).
If anyone can tell me how to get around this, that would be great. Ideally I'd like to load it without the integrity constraints and then remove the extra headers with a DELETE whose WHERE clause matches the header or blank rows. That's not the only solution I'll take, just an idea.
Also, this is my first Stack Overflow post, so patience is appreciated.
Thanks,
Since I don't have a formal answer yet, I'll post what I ended up doing.
Essentially, I just made every column a varchar so the file would load into a table at all. Then I wrote several queries to clean up the garbage in it. Later I created new, properly typed fields and filled them with an INSERT that CASTs from the varchar fields.
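A rough sketch of what that looked like, with made-up table and column names:

    -- staging table with every column loaded as varchar
    DELETE FROM RawImport
    WHERE Col2 = 'Amount'              -- repeated header rows: the numeric column contains its own field name
       OR (Col2 = '' AND Col3 = '');   -- blank rows

    -- later: a properly typed table, filled with INSERT + CAST from the varchar columns
    INSERT INTO CleanImport (Amount, TradeDate)
    SELECT CAST(Col2 AS decimal(18, 2)), CAST(Col3 AS date)
    FROM RawImport;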
I don't know that this will ever help someone, but at least there's an answer here.
I have 12 very-large Excel sheets.
Each one is 102 columns wide, and an average of 600K rows long.
They're all identical in structure.
Quite often I need to run a specific query, over only a subset of columns, with specific criteria. The process of doing so by opening each file and filtering/sorting/finding what I need has become very tedious.
If they were all in a database, such queries would be so much easier.
I tried MS Access, and I tried SQL Server Express.
Both die on me during the respective Import wizards.
The failure is certainly due to the size of the data set, because if I manually trim a file down to, say, 10 rows, the import works fine.
Any ideas how to do this? I'm open to using Access, or SQL Server Express, or any other tool that does the job really.
Note: Some of the columns contain values that themselves contain commas, so several suggestions I found to convert the files to CSV before importing ended up breaking the structure.
Edit 1: They're 12 separate workbooks with one sheet each, and the aim is to create a single table into which all the data is appended.
I have a desktop application through which data is entered and captured in an MS Access database. The application is used by multiple users (at different locations). The idea is to export the data entered on a particular day into an Excel sheet and load it into a centralized server, which is an MS SQL Server instance.
That is, data (in the form of Excel sheets) will come from multiple locations and be saved into a shared folder on the server, and it then needs to be loaded into SQL Server.
There is an ID column with IDENTITY in the SQL Server table, which is the primary key, and there are no other columns in the table that contain unique values. Although the data comes from multiple sources, we need to maintain a single auto-incrementing series (IDENTITY).
Suppose there are two sources:
Source1: has 100 records entered for the day.
Source2: has 200 records entered for the day.
When they get loaded into the destination (SQL Server), the table should have 300 records, with ID values from 1 to 300.
Also, on the next day, when new data comes from the sources, the destination has to continue loading from ID 301.
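For illustration, the destination table is something along these lines (the column names here are made up); the IDENTITY column is what should keep the single series running across days and sources:

    CREATE TABLE dbo.CentralData (
        ID         int IDENTITY(1,1) PRIMARY KEY,  -- single auto-incrementing series
        SourceName nvarchar(50),
        EntryDate  date
    );
    -- Day 1: 100 rows from Source1 + 200 rows from Source2 -> ID values 1..300
    -- Day 2: the next row inserted automatically gets ID 301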
The issue is that there may be requests to change data at a source that has already been loaded into the central server. So how do I update that row in the central server, given that the ID value will not be the same in the source and the destination? As mentioned earlier, the ID is the only unique column in the table.
Please suggest some ideas for doing this, or whether I have to take a different approach to accomplish this task.
Thanks in advance!
Krishna
Okay, so first I would suggest .NET, reading the files with a file stream reader and dumping them into the disconnected layer of ADO.NET: a DataSet with multiple DataTables, one per source. But... you mentioned SSIS, so I will go that route.
Create an SSIS project in Business Intelligence Development Studio (BIDS).
If you know for a fact that you are just importing a bunch of Excel files, I would create either many 'Data Flow Tasks' or many source-to-destination flows in a single 'Data Flow Task'; it's up to you.
a. Personally, I would create a table in the database for each Excel file location and have their columns match up. I will explain why later.
b. In the data flow task, select 'Excel Source' as the source. Double-click the Excel Source and create a new connection pointing at the appropriate file location.
c. Choose an 'ADO NET Destination' and drag the blue line from the Excel Source to this endpoint.
d. Map the destination to the corresponding table in SQL Server.
e. Repeat as needed for each Excel file.
Set up the SSIS package to run automatically from SQL Server through SQL Server Management Studio. Remember to connect to an Integration Services instance, not a database engine instance.
Okay, now you have a bunch of tables instead of one big one, right? I did that for a reason: these should just be entry points, and I would leave the logic for detecting dupes and recording import time to another table.
I would set up another two tables to handle that combining logic and to allow auditing later.
a. Create a table named 'Imports' or similar. Have the columns be the same as the staging tables, except add three more: an identity column as the first column, seeded with the default of (1,1) and assigned as the primary key, plus 'ExcelFileLocation' and 'DateImported'. (A rough sketch of this table follows these steps.)
b. Create a second table named 'ImportDupes' or similar, repeating the same column layout.
c. Create a unique constraint on the first table over the column or set of columns that makes an import unique.
d. Write a stored procedure in SQL that inserts from the MANY staging tables that match up to the Excel files into the ONE 'Imports' table. For each of those inserts, do something similar to:
Begin Try
    Insert into Imports (datacol1, datacol2, ExcelFileLocation, DateImported)
    Select datacol1, datacol2, '(location of file)', getdate()   -- placeholder: the path of the source Excel file
    From TableExcel1
End Try
Begin Catch
    -- if the insert breaks the unique constraint, put the rows into the second table instead
    Insert into ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported)
    Select datacol1, datacol2, '(location of file)', getdate()
    From TableExcel1
End Catch
-- repeat the above for EACH Excel staging table
-- then clean up the individual staging tables for the next import cycle, for EACH Excel table
Truncate Table TableExcel1
e. Automate the procedure so it runs on a schedule.
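As a rough sketch of step a (and the unique constraint from step c), the 'Imports' table could look something like this; apart from datacol1/datacol2, ExcelFileLocation and DateImported, the names and types are placeholders, and which columns go into the constraint depends on what makes your data unique:

    CREATE TABLE Imports (
        ImportID          int IDENTITY(1,1) PRIMARY KEY,  -- identity column seeded at (1,1)
        datacol1          nvarchar(255),
        datacol2          nvarchar(255),
        ExcelFileLocation nvarchar(500),
        DateImported      datetime DEFAULT getdate()
    );

    -- the unique constraint from step c, on whatever makes an imported row unique
    ALTER TABLE Imports
        ADD CONSTRAINT UQ_Imports_datacol1 UNIQUE (datacol1);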
You now have two tables, one for successful imports and one for duplicates.
The reason I did what I did is twofold:
A lot of the time you need to know more than just the data itself: when it came in, what source it came from, whether it was a duplicate, and, if you are doing this for millions of rows, whether it can be indexed easily.
This model is easier to take apart and automate. It may be more work to set up, but if a piece breaks you can see where, and you can easily stop the import for one location by turning off that section of code.
How do you combine several SQLite databases (one table per file) into one big SQLite database containing all the tables? E.g. you have database files db1.dat, db2.dat, db3.dat, ... and you want to create one file, dbNew.dat, which contains the tables from db1, db2, and so on.
Several similar questions have been asked on various forums. I posted this question (with an answer) for a particular reason: when you are dealing with several tables and have indexed many fields in them, it causes unnecessary confusion to recreate the indexes properly in the destination database tables. You may miss one or two indexes, and that's just annoying. The method given here can also deal with large amounts of data, i.e. when you really have GBs of tables. Following are the steps:
1) Download SQLite Expert: http://www.sqliteexpert.com/download.html
2) Create a new database dbNew: File -> New Database
3) Load the 1st SQLite database db1 (containing a single table): File -> Open Database
4) Click on the 'DDL' option. It gives you the list of commands needed to create the particular SQLite table CONTENT (see the illustrative DDL sketch after these steps).
5) Copy these commands and select the 'SQL' option. Paste the commands there and change the name of the destination table from the default CONTENT to DEST (or whatever you want).
6) Click on 'Execute SQL'. This should give you a copy of the table CONTENT from db1 under the name DEST. The main benefit of doing it this way is that all the indexes that existed on the CONTENT table are also created on the DEST table.
7) Now just click and drag the DEST table from database db1 to database dbNew.
8) Now just delete the database db1.
9) Go back to step 3 and repeat with the next database, db2, etc.
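For illustration, the DDL you copy in step 4 and edit in step 5 looks something like this (the column and index names here are invented); note that the index definitions come along with the table definition, which is the whole point:

    CREATE TABLE DEST (      -- table name changed here from the original CONTENT
        id     INTEGER PRIMARY KEY,
        name   TEXT,
        amount REAL
    );
    CREATE INDEX idx_dest_name ON DEST (name);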