Large Excel File Imported Into SQL Server Database - sql-server

I have a client who needs to import rows from a LARGE Excel file (72K rows) into their SQL Server database. The file is uploaded by users of the system. Performance became an issue when we tried to process the file at upload time, so now we just save it to disk, and an admin picks it up, splits it into 2K-row chunks, and runs each chunk through an upload tool one at a time. Is there an easier way to accomplish this without hurting performance or hitting timeouts?

If I understand your problem correctly, you receive a large spreadsheet and need to upload it into a SQL Server database. I'm not sure why your process is slow at the moment, but I don't think that data volume should be inherently slow.
Depending on what development tools you have available, it should be possible to get this to import in a reasonable time.
SSIS can read from Excel files. You could schedule a job that wakes up periodically and checks for a new file. If it finds one, it uses a data flow task to import it into a staging table and then a SQL task to run some processing on it.
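If you would rather keep the pickup itself outside of SQL Agent, the polling loop is simple to sketch in .NET. Everything named below (the folders, the package path, the User::SourceFile variable) is hypothetical, and the real work is still delegated to the SSIS package via dtexec:

```csharp
// Minimal polling pickup: watch a drop folder, run the SSIS package on
// each new workbook via dtexec, then archive the file.
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

class FilePoller
{
    static void Main()
    {
        const string dropFolder = @"\\server\uploads";       // hypothetical share
        const string doneFolder = @"\\server\uploads\done";  // hypothetical archive
        const string package    = @"C:\etl\ImportExcel.dtsx";

        while (true)
        {
            foreach (string file in Directory.GetFiles(dropFolder, "*.xls*"))
            {
                string args = "/F \"" + package + "\"" +
                              " /SET \\Package.Variables[User::SourceFile].Value;\"" + file + "\"";
                using (Process p = Process.Start("dtexec", args))
                {
                    p.WaitForExit();  // let the package finish before archiving
                }
                File.Move(file, Path.Combine(doneFolder, Path.GetFileName(file)));
            }
            Thread.Sleep(TimeSpan.FromMinutes(5));  // wake up periodically
        }
    }
}
```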
If you can use .NET, you could write an application that reads the data out through the OLE automation API and loads it into a staging area through SqlBulkCopy. You can read the entire range into a variant array in one go through the Excel COM API. This is not super fast, but it should be fast enough for your purposes.
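A minimal sketch of that approach, assuming a header row, an all-varchar staging table named dbo.ExcelStaging, and a local workbook path (all hypothetical); it needs a COM reference to Microsoft.Office.Interop.Excel:

```csharp
// Read the used range into one object[,] array (a single COM round trip),
// then push it into a staging table with SqlBulkCopy.
using System;
using System.Data;
using System.Data.SqlClient;
using Excel = Microsoft.Office.Interop.Excel;

class ExcelLoader
{
    static void Main()
    {
        var app = new Excel.Application();
        Excel.Workbook wb = app.Workbooks.Open(@"C:\uploads\big.xls");
        try
        {
            var sheet = (Excel.Worksheet)wb.Worksheets[1];
            object[,] values = (object[,])sheet.UsedRange.Value2;  // 1-based indices

            var table = new DataTable();
            int rows = values.GetLength(0), cols = values.GetLength(1);
            for (int c = 1; c <= cols; c++)
                table.Columns.Add("Col" + c, typeof(string));

            for (int r = 2; r <= rows; r++)  // row 1 is assumed to be headers
            {
                DataRow row = table.NewRow();
                for (int c = 1; c <= cols; c++)
                    row[c - 1] = Convert.ToString(values[r, c]);
                table.Rows.Add(row);
            }

            using (var bulk = new SqlBulkCopy("Server=.;Database=Staging;Integrated Security=true"))
            {
                bulk.DestinationTableName = "dbo.ExcelStaging";
                bulk.BatchSize = 5000;  // keep transaction log pressure down
                bulk.WriteToServer(table);
            }
        }
        finally
        {
            wb.Close(false);
            app.Quit();
        }
    }
}
```

SqlBulkCopy streams the rows in batches, so 72K rows should go in quickly; the slow part is usually the COM read, which the single Value2 call keeps to one round trip.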
If you don't mind using VBA, you can write a macro that does something similar. However, I don't think traditional ADO has a bulk-load feature. To do this you would need to export a .CSV (or something similar) to a drive that is visible from the server and then BULK INSERT from that file. You would also have to make a bcp format file describing the .CSV.
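Whichever client drives it, the server-side statement is a single BULK INSERT. Here is a sketch, driven from .NET purely for illustration (VBA would send the same text through an ADO Command); the paths, table, and format file names are hypothetical:

```csharp
// Execute a BULK INSERT against a CSV the server can see, described by a
// bcp format file.
using System.Data.SqlClient;

class BulkLoad
{
    static void Main()
    {
        const string sql = @"
            BULK INSERT dbo.ExcelStaging
            FROM '\\fileserver\drop\upload.csv'       -- path visible to the server
            WITH (FORMATFILE = '\\fileserver\drop\upload.fmt',
                  FIRSTROW   = 2,                     -- skip the header row
                  TABLOCK);";

        using (var conn = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 600;  // a large file can take a while
            cmd.ExecuteNonQuery();
        }
    }
}
```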
Headless imports from user-supplied spreadsheets are always troublesome, so there is quite a bit of merit in doing it through a desktop application. The principal benefit is with error reporting. A headless job can really only send an email with some status information. If you have an interactive application the user can troubleshoot the file and make multiple attempts until they get it right.

I could be wrong, but from your description it sounds like you were doing the processing in code in your application (i.e. the file is uploaded and the code that handles the upload then processes the import, possibly on a row-by-row basis).
In any event, I've had the most success importing large datasets like that using SSIS. I've also set up a spreadsheet as a linked server, which works but has always felt a bit hacky to me.
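For reference, registering a workbook as a linked server and pulling from it looks roughly like this. The provider string is the classic Jet 4.0 one (newer installs would use the ACE provider), and the server name, paths, and staging table are hypothetical:

```csharp
// Register the workbook once as a linked server, then query the sheet
// like a table using a four-part name.
using System.Data.SqlClient;

class LinkedSheet
{
    static void Main()
    {
        const string setup = @"
            EXEC sp_addlinkedserver
                 @server     = 'XLSHEET',
                 @srvproduct = 'Excel',
                 @provider   = 'Microsoft.Jet.OLEDB.4.0',
                 @datasrc    = 'C:\uploads\big.xls',
                 @provstr    = 'Excel 8.0;HDR=Yes';";

        const string import = @"
            INSERT INTO dbo.ExcelStaging
            SELECT * FROM XLSHEET...[Sheet1$];  -- sheet name as the table";

        using (var conn = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(setup, conn)) cmd.ExecuteNonQuery();
            using (var cmd = new SqlCommand(import, conn)) cmd.ExecuteNonQuery();
        }
    }
}
```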
Take a look at this article which details how to import data using several different methods, namely:
SQL Server Data Transformation Services (DTS)
Microsoft SQL Server 2005 Integration Services (SSIS)
SQL Server linked servers
SQL Server distributed queries
ActiveX Data Objects (ADO) and the Microsoft OLE DB Provider for SQL Server
ADO and the Microsoft OLE DB Provider for Jet 4.0

Related

Can I automatically export data from a Cognos report into a database?

The overall goal is to have data from an automated daily Cognos report stored in a database so that I am able to report not only on that day's data but also on historical data if I so choose. My general thought is that if I can find a way to automatically add the new daily data to an existing Excel file, I can then use that as my data source and create a dashboard in Tableau. However, I don't have any programming experience, so I'm floundering here.
I'm committed to using Tableau, but I chose Excel only because I'm more familiar with that program than others, along with the fact that an Excel output file is an option in Cognos. If you have better ideas, please don't hesitate to suggest them along with why you believe it's a better idea.
Update: I'm still jumping through hoops to try to get read-only access to the backend database to make this process a lot more efficient, but in the meantime I've moved forward with the long method utilizing Cognos.
I was able to get a coworker to create a file system folder to automatically save the Cognos reports to, and I then scheduled a job to run the reports I need. Each report now saves into a folder on a shared network drive (so my entire team has access to the files), and I wrote a series of macros to append the data from those feeder files to a Master File each day. Now all that's left is to create a Tableau dashboard using the Master File as the data source and I'll have what I need.
Thanks for all your help!
I'm posting this as an answer because it's just too much to leave as a comment.
What you need are three things:
1.) Figure out how to have Cognos run your report and download your Excel file.
2.) Use Visual Studio with BIDS (the suite of SQL Server analysis, reporting, and integration services) to automate all the work needed to append your Excel files, etc. Then use the same tools to import that data into your SQL Server. In fact, if all you're doing is trying to get this data into SQL Server, you can skip the Excel-append step and just append the data directly to your SQL table. Once your package is built, you can save it as an automated job on your SQL Server to run whenever you wish.
3.) Tableau can use your SQL Server as a data source. Once you have that updated, you can run your reports.

What is the best way to continuously (every 10-20 seconds) import a remote XML (accessible through HTTP) into SQL Server?

I have a remote XML file, which is zipped (approx 100MB in size). I need to download, extract, read, parse and import into SQL Server.
Before I start coding this solution (in Python): is there any ready-made utility that could do this? Note that it needs to run on a schedule, preferably as a service or via the Windows Task Scheduler.
What's really important is that it needs to be really fast!
Thank you,
Giorgoc
Following on from my comments, you can quite easily do this with SSIS:
1.) Download the remote XML file. Here is an example: How to read a remote xml file from SQL Server 2005
2.) Do any transformation on the data, if needed, using transformation tasks.
3.) Use SSIS to load the XML data into the SQL Server DB. Here is an example: How to load an XML file into a database using an SSIS package?
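If you end up coding it yourself after all, here is a rough .NET sketch of the download/extract/parse/bulk-load pipeline. The feed URL, the XML element and attribute names, and the target table are all hypothetical, and error handling is omitted:

```csharp
// Download the zipped XML, extract it, stream-parse it, and bulk copy
// the rows into SQL Server. Needs a reference to
// System.IO.Compression.FileSystem for ZipFile.
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO.Compression;
using System.Net;
using System.Xml;

class XmlImport
{
    static void Main()
    {
        // 1. Download the zip.
        using (var web = new WebClient())
            web.DownloadFile("http://example.com/feed.zip", "feed.zip");

        // 2. Extract the first entry.
        using (var zip = ZipFile.OpenRead("feed.zip"))
            zip.Entries[0].ExtractToFile("feed.xml", true);

        // 3. Stream-parse into a buffer table, then bulk copy.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(string));
        table.Columns.Add("Name", typeof(string));

        using (var reader = XmlReader.Create("feed.xml"))
        {
            while (reader.ReadToFollowing("item"))       // hypothetical element
            {
                DataRow row = table.NewRow();
                row["Id"] = reader.GetAttribute("id");   // hypothetical attribute
                reader.ReadToFollowing("name");
                row["Name"] = reader.ReadElementContentAsString();
                table.Rows.Add(row);
            }
        }

        using (var bulk = new SqlBulkCopy("Server=.;Database=Feed;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.FeedItems";
            bulk.WriteToServer(table);
        }
    }
}
```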
Hope they point you in the right direction and help you in your tasks.

Extract from Progress Database to SQL Server

I'm looking for the best approach (or a couple of good ones to choose from) for extracting from a Progress database (v10.2b). The eventual target will be SQL Server (v2008). I say "eventual target" because I don't necessarily have to connect directly to Progress from within SQL Server, i.e. I'm not averse to extracting from Progress to a text file and then importing that into SQL Server.
My research on approaches came up with scenarios that don't match mine:
Migrating an entire Progress DB to SQL Server
Exporting entire tables from Progress to SQL Server
Using Progress-specific tools, something to which I do not have access
I am able to connect to Progress using ODBC, and have written some queries from within Visual Studio (v2010). I've also done a bit of custom programming against the Progress database, building a simple web interface to prove out a few things.
So, my requirement is to use ODBC and build a routine that runs a specific query on a daily basis. The results of this query will then be imported into a SQL Server database. Thanks in advance for your help.
Update
After some additional research, I did find that a Linked Server is what I'm looking for. Some notes for others working with SQL Server Express:
If it's SQL Server Express that you are working with, you may not see a program on your desktop or in the Start Menu for DTS. I found DTSWizard.exe nested in my SQL Server Program Files (for me, C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn), and was able to simply create a shortcut.
Also, because I'm using SQL Server Express, I wasn't able to save the Package I'd created. So, after creating the Package and running it once, I simply re-ran the package and saved off my SQL for use in the future.
Bit of a late answer, but in case anyone else was looking to do this...
You can use a linked server, but you will find that the performance won't be as good as connecting directly via the ODBC drivers, and the translation of the data types may mean that you cannot access some tables. The linked server might be handy for exploring the data, though.
If you use SSIS with the ODBC drivers (you will have to use ADO.NET data sources), this will perform the most efficiently, and you should also get more accurate data types (remember that data types within Progress can change dynamically).
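If you'd rather hand-roll the daily pull than maintain an SSIS package, the same ODBC-to-bulk-copy idea is only a few lines of .NET. The DSN, credentials, query, and target table here are hypothetical:

```csharp
// Pull from Progress over ODBC and stream straight into SQL Server with
// SqlBulkCopy, without buffering the whole result set.
using System.Data.Odbc;
using System.Data.SqlClient;

class DailyExtract
{
    static void Main()
    {
        using (var source = new OdbcConnection("DSN=Progress10;UID=extract;PWD=secret"))
        using (var cmd = new OdbcCommand("SELECT * FROM PUB.customer", source))
        {
            source.Open();
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy("Server=.;Database=Target;Integrated Security=true"))
            {
                bulk.DestinationTableName = "dbo.customer_stage";
                bulk.BulkCopyTimeout = 0;    // no timeout for the big pull
                bulk.WriteToServer(reader);  // streams rows as they arrive
            }
        }
    }
}
```

Scheduled through the Windows Task Scheduler or a SQL Agent job, that covers the "runs a specific query daily" requirement.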
If you have to extract a lot of tables, I would look at BIML to help you achieve this. BIML (Business Intelligence Markup Language) can help you dynamically generate many SSIS packages on the fly, which can be called from a master package. This master package can then be scheduled or run ad hoc, and so can any of the child packages as needed.
Can you connect to the Progress DB using OLE DB? If so, you could use a SQL Server Linked Server to bypass the need for extracting to a file that would then be loaded into SQL Server. Alternatively, you could extract to Excel and then import from Excel to SQL Server.

Suitable method for synchronising online and offline data

I have two applications, each with its own database.
1.) A desktop application with a VB.NET WinForms interface, which runs on an offline enterprise network and stores data in a central database [SQL Server].
**All the data entry and other office operations are carried out and stored in the central database.
2.) The second application is built on PHP. It has HTML pages and runs as a website in an online environment. It stores all its data in a MySQL database.
**This application is accessed by registered members only, and it provides them with various reports on the data processed by the first application.
Now I have to synchronize data between the online and offline database servers. I am planning the following:
1.) Write a small program to export all the data from SQL Server [the offline server] to a file in CSV format (see the sketch after this list).
2.) Log in to the admin section of the live server.
3.) Upload the exported CSV file to the server.
4.) Import the data from the CSV file into the MySQL database.
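For step 1, a minimal .NET sketch of the export; the connection string, table, and output path are hypothetical, and fields are naively quoted:

```csharp
// Step 1 as a tiny console program: stream the table out row by row and
// write naively quoted CSV.
using System;
using System.Data.SqlClient;
using System.IO;

class ExportCsv
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Office;Integrated Security=true"))
        using (var cmd = new SqlCommand("SELECT * FROM dbo.Orders", conn))
        using (var writer = new StreamWriter(@"C:\export\orders.csv"))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var fields = new string[reader.FieldCount];
                    for (int i = 0; i < reader.FieldCount; i++)
                        fields[i] = "\"" + Convert.ToString(reader[i]).Replace("\"", "\"\"") + "\"";
                    writer.WriteLine(string.Join(",", fields));
                }
            }
        }
    }
}
```

On the MySQL side, step 4 would then be a LOAD DATA INFILE (or mysqlimport) against the uploaded file.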
Is the method I am planning good, or can it be tuned to perform better? I would also appreciate suggestions for other good ways to synchronise the data, short of changing the applications themselves (i.e. moving the network application over to some other one that uses the MySQL database).
What you are asking for does not actually sound like bidirectional sync (movement of data both ways, from SQL Server to MySQL and from MySQL to SQL Server), which is a good thing, as it really simplifies things for you. Although I suspect your method of using CSVs (which I assume you would generate with something like BCP) would work, one of the issues is that you are moving ALL of the data every time you run the process, and you are basically overwriting the whole MySQL db every time. This is obviously somewhat inefficient, not to mention that during that window the MySQL db would not be in a usable state.
One alternative (assuming you have SQL Server 2008 or higher) would be to look into using this technique along with Change Tracking or Change Data Capture. These are capabilities within SQL Server that let you determine what data has changed since a certain point in time. What you could do is create a process that extracts just the changes since the last time you checked to a CSV file and then applies those to MySQL. If you do this, don't forget to apply the deletes as well.
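A sketch of the Change Tracking variant, with hypothetical database and table names (the statements in the comment run once; the query runs on every sync):

```csharp
// Pull only rows changed since the last synchronized version, including
// deletes, so each sync moves a small delta instead of the whole table.
using System;
using System.Data.SqlClient;

class IncrementalSync
{
    // One-time setup (SQL Server 2008+):
    //   ALTER DATABASE Office SET CHANGE_TRACKING = ON
    //       (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);
    //   ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;
    static void Main()
    {
        long lastVersion = 0;  // persist between runs, e.g. in a sync-state table

        const string connStr = "Server=.;Database=Office;Integrated Security=true";
        const string delta = @"
            SELECT ct.SYS_CHANGE_OPERATION,   -- I / U / D
                   ct.OrderId,                -- primary key survives deletes
                   o.CustomerId, o.Total      -- NULL for deleted rows
            FROM CHANGETABLE(CHANGES dbo.Orders, @last) AS ct
            LEFT JOIN dbo.Orders AS o ON o.OrderId = ct.OrderId;";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Capture the high-water mark first so nothing is missed.
            var verCmd = new SqlCommand("SELECT CHANGE_TRACKING_CURRENT_VERSION();", conn);
            long newVersion = (long)verCmd.ExecuteScalar();

            var cmd = new SqlCommand(delta, conn);
            cmd.Parameters.AddWithValue("@last", lastVersion);
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                {
                    // Emit an INSERT/UPDATE/DELETE line to the CSV here,
                    // keyed on SYS_CHANGE_OPERATION, and apply it to MySQL.
                }

            lastVersion = newVersion;  // start point for the next run
        }
    }
}
```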
I don't think there's an off-the-shelf solution for what you want that you can use without customization, but the MS Sync Framework (http://msdn.microsoft.com/en-us/sync/default) sounds close.
You will probably need to write a provider for MySQL to make it go, which may well be less work than writing the whole data synchronization logic from scratch. Voclare is right about the challenges you could face writing your own synchronization mechanism...
Do look into SQL Server Integration Services as a good alternative.

Transfer data from SQL Server table using query to Excel and vice-versa

I want to transfer data from a SQL Server table to an Excel sheet using a 3-tier architecture in ASP.NET 3.5. After the user has made the required changes in the Excel sheet, I want the sheet to be uploaded and the data in the table updated, with validation for proper data.
You could set up an SSIS package to import the data from Excel.
You could either look at libraries (there are tons of them out there) to programmatically read and write Excel sheets, and handle it all manually,
OR: check out SQL Server Integration Services (SSIS) - it offers neat ways to export SQL Server data into a multitude of formats (including Excel), and it also offers the route back. You can easily control and execute SSIS packages from a .NET application, too.
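Running a package from .NET is only a couple of lines with the ManagedDTS API. A sketch, assuming a file-stored package and a User::TargetFile variable defined inside it (both hypothetical), with a reference to Microsoft.SqlServer.ManagedDTS:

```csharp
// Load an SSIS package from disk, set a package variable, and execute it.
using Microsoft.SqlServer.Dts.Runtime;

class RunPackage
{
    static void Main()
    {
        var app = new Application();
        Package package = app.LoadPackage(@"C:\etl\ExportToExcel.dtsx", null);

        // Pass the file to work on via a package variable, if one is defined.
        package.Variables["User::TargetFile"].Value = @"C:\out\report.xls";

        DTSExecResult result = package.Execute();
        System.Console.WriteLine(result);  // Success or Failure
    }
}
```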
I can think of two methods. The first is to make an ADO connection to your SQL Server from Excel VBA (which can be password-protected). When you press a button in Excel, it either reads your spreadsheet data one record at a time and applies the validation logic, OR it simply uploads the data with insert queries to a temporary table, and a trigger then sees the data and processes it. That way you don't have to upload any files.
I have used ADO commands in VBA against SQL Server and they are amazingly easy, reliable and exceptionally fast. There are plenty of examples out there through a Google search. It's great because you can use all kinds of Excel-specific VBA commands to build a record, then update or insert it or whatever you need to do.
For security, you can limit the user (connection string and password hidden in the VBA) to just inserting data into a certain database, so even if the password is somehow lifted from the VBA it won't do anyone any good, as they can only insert.
The second method is to create an ordinary ASP.NET upload control that accepts your Excel file when the user is done. There is an event in ASP.NET where you can run code when the file upload is complete. That code would see the uploaded Excel file, read it through ordinary .NET means for reading Excel files (Excel automation), process it, then either keep it or discard it. I do not know for sure whether there are complications running Excel automation on a server; probably there are, because in essence it is running a version of Excel.exe on your web server (not really a good idea).
I believe you can also make an ADO connection from ASP.NET to an Excel file and run SQL queries against it. I have done that successfully, but unfortunately the provider decides the type of each field based on the first few records, and this can sometimes cause a lot of problems when reading an Excel file as a database. I suppose you could write some quick VBA to output the Excel data to CSV and upload that CSV file instead, so that nothing on the web server has to try to read an Excel file. In VBA, you can even automate the upload through SendKeys and Internet Explorer automation. I've done that and it works amazingly well. SendKeys is the only way to populate the file-upload text box, for security reasons.
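For completeness, that server-side OLE DB read looks roughly like this in .NET; the Jet provider string and sheet name are assumptions, and IMEX=1 softens (but does not eliminate) the type-guessing problem:

```csharp
// Query an uploaded workbook through OLE DB instead of Excel automation.
// IMEX=1 tells the Jet provider to treat mixed-type columns as text.
using System.Data;
using System.Data.OleDb;

class ReadSheet
{
    static DataTable Load(string path)
    {
        string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;" +
                         "Data Source=" + path + ";" +
                         "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"";

        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", conn))
        {
            conn.Open();
            var table = new DataTable();
            table.Load(cmd.ExecuteReader());
            return table;  // validate rows here before touching the database
        }
    }
}
```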
As you can see, I think the first method is the better one. That is how I would do it, because that way you can also refresh your spreadsheet with new data.
I actually think you posted a very interesting question here. It's a lot easier to edit data in an Excel spreadsheet and send it back up. I have replicated a lot of that functionality using the Excel-style grid control from essentialobjects -- great software, but emulating a spreadsheet takes a lot of coding, and even then it's just an Excel-like form, not a full spreadsheet.
If you are willing to put MS Access in the middle, that can get you around a lot of these complications, but it is itself an extra layer.
