I have a directory of SAS7BDAT files - about 300 of them - that I need to import into a SQL Server table. Unfortunately, the date field is not part of the dataset but is in the filename. So I need to parse the filename, get the date, and append it to each dataset at the time of import.
Is SSIS a good candidate for this? If so, do I use a Foreach Loop for this? How do I parse the filename and append the date?
For individual files, I can easily use SQL Server Management Studio to import them. I could do the same for this exercise and then handle the date when loading to the final table, but I'm hoping there is a much cleaner solution.
Is there any other backend way of handling this without SAS installed? Python or otherwise?
TIA
[Solved]
Came across an article which mentioned R's sas7bdat library.
Using that, I could successfully load all the files, along with the filename, into an R list using ldply.
After some data frame manipulation, I could load all the files into SQL Server using sqlSave.
The files are very small in size. So performance wasn't much of an issue, although I suspect it can be for larger volumes.
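For the "Python or otherwise" route: pandas can read SAS7BDAT files directly via `pandas.read_sas`, so no SAS install is needed. A minimal sketch, assuming the filenames embed a `YYYYMMDD` date; the table name and the `load_folder` helper are hypothetical, and you'd supply your own SQLAlchemy engine for SQL Server:

```python
import re
from datetime import datetime
from pathlib import Path

def date_from_filename(name):
    """Pull a YYYYMMDD date out of a filename like 'claims_20040215.sas7bdat'."""
    m = re.search(r"\d{8}", name)
    if m is None:
        raise ValueError(f"no date found in {name!r}")
    return datetime.strptime(m.group(0), "%Y%m%d").date()

def load_folder(folder, engine, table="sas_import"):
    """Read every SAS7BDAT file, tag its rows with the filename date, append to SQL Server."""
    import pandas as pd  # pip install pandas sqlalchemy pyodbc
    for path in sorted(Path(folder).glob("*.sas7bdat")):
        df = pd.read_sas(path)
        df["file_date"] = date_from_filename(path.name)
        df.to_sql(table, engine, if_exists="append", index=False)
```

For 300 small files this single loop replaces the per-file import in Management Studio, and the date lands in the data at load time rather than in a post-processing step.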
Related
Problem:
I receive multiple sets of flat files on a weekly basis that need to be imported into my database. The flat files I receive are not in a conventional format to import, so they need to be run through a script and parsed into a more SQL-friendly format. These flat files are usually JSON, TXT, XML, LOG, etc.
Current Solutions
Currently I have a Windows Forms application and another GUI to transform the files and bulk import them to SQL tables. However, I'm finding it unreliable to ask users to import data, and I would much rather automate the tasks.
More recently, I have been creating SSIS packages. This proves to be much faster and more useful, since I can add Script Components that let me manually parse whatever flat files I throw at them. My issue is finding a way to automate this. I have no control of the server where my database is hosted, so I'm unable to deploy the packages there to bring in the files. Currently, I'm just running the packages on my local machine manually to get the data in.
Needed Solution
I need a way to automate getting these flat files in. Originally I wanted to request an FTP server for the files to be dumped in. Then the files would be picked up by my packages and imported into the SQL Server DB. However, since I have no control of any of the local folders on that server, it seems to be impossible for me to automate this. Is there a better way for me to find a solution? Could I build something custom in C#, Python, PowerShell, etc.? I'm very new to the scene, and trying to find a solution for this problem has been a nightmare.
I need to have users import Excel/CSV files to my database.
Currently, I have a VB.net application that will let me import CSV files only to our database. Rather than scaling this application to keep fitting my needs, and deploying it to users to import data, I'm considering switching to SSIS.
How do I deploy packages so that my users are able to use them to import Excel/CSV files? I know SSIS is not intended to be a front end, so should I not use it for my needs? Is it only meant for SQL developers to import data?
Also, my users have no experience with SQL or using a database. They are used to putting their Excel files on SharePoint or passing them around via email. I just introduced them to SSRS, which works wonderfully as a reporting service, but I need a simple and reliable import process.
Probably not for a few reasons:
You'd have to deploy the SSIS runtime for the package to run - this is not something that is usually done, and you'd probably have to pay a license cost
SSIS stores metadata (i.e. the type and number of columns in the source and target). If this metadata changes then the package will usually fail
SSIS is a server tool. It's not really built for user feedback
Excel as a source is difficult for two reasons:
It has no validation. Users can put anything they want in it, including invalid or missing values
Excel drivers work out metadata by inspecting rows on the fly and this is sometimes incorrect (I'm sure you've already encountered this in your program)
A custom-built solution requires more maintenance but has a lot more flexibility, and you probably need this flexibility given that you have Excel sources.
If your Excel files are guaranteed to be clean every time, and all of your users use a single SQL Server (with a single licensed install of SSIS), then it might be practical.
Added to reflect discussion below:
In this case you have consistent data files coming from elsewhere that need to be automatically uploaded into the database. SSIS can help in this case with the following proven pattern:
A user (or process) saves the file to a specific shared folder
A package, scheduled to run every (say) one minute in SQL Agent, imports all files in that folder
If the import is successful, the file is moved to a 'successful' folder
If the import is unsuccessful, the file is moved to a 'failed' folder
This way, a thick-client app doesn't need to be deployed to everyone. Instead, any user can drop the file in (if they have share access), and it will be automatically pulled in
Users can also confirm that the file was successful by checking the folder
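For what it's worth, the same pick-up / 'successful' / 'failed' loop is easy to sketch outside SSIS as well (useful for prototyping the flow, or where you can't deploy packages). A minimal Python sketch; `import_file` is a placeholder for whatever import routine you actually use:

```python
import shutil
from pathlib import Path

def process_folder(inbox, import_file, ok_dir="successful", fail_dir="failed"):
    """Import every file dropped in `inbox`; move each one to a
    'successful' or 'failed' subfolder depending on the outcome."""
    inbox = Path(inbox)
    for sub in (ok_dir, fail_dir):
        (inbox / sub).mkdir(exist_ok=True)
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        try:
            import_file(path)          # your import routine (e.g. bulk insert)
            dest = inbox / ok_dir
        except Exception:
            dest = inbox / fail_dir
        shutil.move(str(path), str(dest / path.name))
```

Scheduled every minute (SQL Agent, Task Scheduler, cron), this gives the same behaviour as the SSIS pattern: users just drop files and check the folders.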
Here's an example of a package that imports all files in a folder and moves them when complete:
SSIS - How to loop through files in folder and get path+file names and finally execute stored Procedure with parameter as Path + Filename
The overall goal is to have data from an automated daily Cognos report stored in a database so that I am able to report not only on that day but also historical data if I so choose. My general thought is that if I can find a way to automatically add the new daily data to an existing Excel file, I can then use that as my data source and create a dashboard in Tableau. However, I don't have any programming experience, so I'm floundering here.
I'm committed to using Tableau, but I chose Excel only because I'm more familiar with that program than others, along with the fact that an Excel output file is an option in Cognos. If you have better ideas, please don't hesitate to suggest them along with why you believe it's a better idea.
Update: I'm still jumping through hoops to try to get read-only access to the backend database to make this process a lot more efficient, but in the meantime I've moved forward with the long method utilizing Cognos.
I was able to leverage a coworker to create a system file folder to automatically save the Cognos reports to, and then I scheduled a job to run the reports I need. Each of those now saves into a folder in a shared network drive (so my entire team has access to the files), and I wrote a series of macros to append the data each day from those feeder files in the shared drive to a Master File. Now all that's left is to create a Tableau dashboard using the Master File as the data source and I'll have what I need.
Thanks for all your help!
I'm posting this as an answer because it's just too much to leave as a comment.
What you need are three things:
Figure out how to have COGNOS run your report and download your Excel file.
Use Visual Studio with BIDS (the suite of SQL Server analysis, reporting, and integration services) to automate all the steps you need to append your Excel files, etc. Then you can use the same tools to import that data into your SQL Server.
In fact, if all you're doing is trying to get this data into SQL, you can skip the Append Excel part, and just append the data directly to your SQL table.
Once your package is built, you can save it as an automated job on your SQL server to run whenever you wish.
Tableau can use your SQL server as a data source. Once you have that updated, you can run your reports.
I am looking into copying a file from the client computer to the server computer. One path I've looked into is creating a CLR method which accepts a stream as input. Another suggestion I've received is to use the BCP utility, though I have been unsuccessful in finding any examples where this was done with BCP.
Is it possible to pass a blob to BCP and import to a table, or would there be more steps involved to make this work?
Which method would be best for a file copy functionality?
You can BCP blobs in and out of the database. Also, I found a reference to scripts for importing and exporting files as blobs which might be helpful.
What is the best way to import highly formatted data from Excel to SQL server.
Basically, I have 250+ Excel files that have been exported from a reporting tool in a format that our business users prefer. This is a 3rd-party tool that cannot export data in any other format. I need to "scrub" these files on a monthly basis and import them into a database. I want to use SQL Server 2005.
File formats look like this:
Report Name
Report Description

             MTH/DEC/2003   MTH/JAN/2004   MTH/FEB/2004
             Data Type      Data Type      Data Type
Grouping 1   1900           1700           2800
Grouping 2   1500           900            1300
Detail       300            500            1000
Detail       1100           200            200
Detail       100            200            100
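Once the cell values are out of Excel (exported to text, or read with any Excel library), unpivoting the layout above into one row per (label, month, value) is the core of the scrubbing. A hypothetical sketch, assuming each data row ends with exactly one value per month column:

```python
def parse_report(lines, n_header=4):
    """Unpivot the report layout: line 3 holds the month headers,
    everything after the header block is label + one value per month."""
    months = lines[2].split()
    records = []
    for line in lines[n_header:]:
        tokens = line.split()
        values = tokens[-len(months):]          # last N tokens are the figures
        label = " ".join(tokens[:-len(months)])  # whatever precedes them is the label
        for month, value in zip(months, values):
            records.append((label, month, int(value)))
    return records
```

The output tuples map directly onto a normalized (label, period, value) table, which is much friendlier to query than the crosstab layout the reporting tool emits.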
You could write a simple parser application; there are many APIs that will handle reading Excel files. I have written one in Java and it only took a day or two.
Good luck.
EDIT: Forgot to mention you will also need a SQL API such as JDBC. We use JDBC for the majority of our applications and it works great.
Personally, I would do it using SSIS. It might not be trivial to set up, as the file format looks relatively complex (but I suspect that would be true no matter what tool you use); as long as the format stays consistent, it will run quickly each month, and SSIS packages are easy to put under source control. Since SSIS is part of SQL Server, it is easy to make sure all the servers have it available. The key is to have a good understanding of how that format relates to how you store data in the database. That's the hard part, no matter what tool you use.
Assuming that you have Microsoft Excel, you can also use Excel's own exposed ActiveX interface. More information here:
http://msdn.microsoft.com/en-us/library/wss56bz7(VS.80).aspx
You could use that from anything that can use ActiveX (C++, VB6, VB.NET, etc.) to create a parser as well, to follow up on what Berek said.
I have done this before with Perl and MySQL. I wrote a simple Perl script which parsed the file and output the contents to a .sql file. Then this can either be done manually or included in the Perl script: open MySQL and run the .sql file.
This may seem a bit simplistic, but you could simply dump the data in csv format and do some parsing of the output to convert to insert statements for SQL.
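That dump-and-convert step can be sketched in a few lines. This is a deliberately naive illustration - it quotes every value as a string and only escapes single quotes, so treat it as a starting point rather than a safe SQL generator:

```python
import csv
import io

def csv_to_inserts(csv_text, table):
    """Turn CSV text (header row + data rows) into INSERT statements."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(header)
    stmts = []
    for row in data:
        # Naive: everything becomes a quoted string; only ' is escaped.
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return stmts
```

For anything beyond a one-off load, a parameterized bulk insert is safer and faster than generated INSERT statements, but this shows how little code the "simplistic" route actually needs.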
For a Java-based application, POI (http://poi.apache.org/) is pretty good for Excel integration.
You might want to look at CLR procedures and functions in SQL Server. With a CLR procedure you could do all of your scrubbing in a VB or C# .NET application but still run the jobs from SQL Server just like any other stored procedure or UDF.