Is there a way to read Excel 2010/2013 files natively?
We are importing Excel files into SQL Server and have come across a specific issue: the Excel driver appears to decide the type of a destination data column by testing the contents of only the first 65K-odd rows.
This has only just started happening within the past 3 weeks; before then we had managed to convince Excel of the error of its ways with a simple registry hack that forced it to read the entire set of rows.
The problem is that we have some datasets that contain, say, 120,000 rows. These may have all-numeric values for the first 80,000, and then some non-numeric yet vital information that we wish to retain.
Yes, the data is not correctly typed, we know.
Because the source data type has been determined by the Excel driver to be a float, it promptly turns all our non-numeric values into NULLs - not very useful.
If there were some other way to read an Excel file, not using the standard ODBC/OLEDB drivers, that might help.
We have tried saving it into various other formats before importing, but of course all these exports use the Excel driver, which has the problem.
I think the closest we have got is to save it as XML (which is frankly huge at 800MB) and then shred it using standard XPath queries and some pretty dodgy workarounds to handle the no doubt well-formed but still tricky variations in how column data is represented.
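To give a sense of the shredding involved, a minimal VB.Net sketch (assuming the "XML Spreadsheet 2003" SpreadsheetML export format; the file name is made up) that streams the document rather than loading all 800MB at once:

Imports System.Xml

Module ShredSpreadsheetMl
    Sub Main()
        Const ssNs As String = "urn:schemas-microsoft-com:office:spreadsheet"
        ' Stream the file; an 800MB document is far too big for XmlDocument.
        Using reader As XmlReader = XmlReader.Create("C:\data\export.xml")
            While reader.ReadToFollowing("Data", ssNs)
                ' Each ss:Data element carries its own ss:Type attribute,
                ' so text values in a mostly numeric column survive intact.
                Dim cellType As String = reader.GetAttribute("Type", ssNs)
                Dim value As String = reader.ReadElementContentAsString()
                Console.WriteLine("{0}: {1}", cellType, value)
            End While
        End Using
    End Sub
End Module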
Edit: changed title to more closely reflect the issue
As well as the registry key (presumably TypeGuessRows), when connecting to your Excel file have you tried setting the following:
;Extended Properties="IMEX=1"
See here
Also see this MSDN article
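To put IMEX=1 in context, a minimal VB.Net sketch (assuming the ACE OLE DB provider is installed; the file path and sheet name are made up):

Imports System.Data
Imports System.Data.OleDb

Module ReadExcelWithImex
    Sub Main()
        ' IMEX=1 makes the driver treat intermixed columns as text instead
        ' of guessing a numeric type and NULLing the non-numeric values.
        Dim connStr As String =
            "Provider=Microsoft.ACE.OLEDB.12.0;" &
            "Data Source=C:\data\import.xlsx;" &
            "Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"""
        Using conn As New OleDbConnection(connStr)
            Dim dt As New DataTable()
            Using adapter As New OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn)
                adapter.Fill(dt)  ' opens and closes the connection itself
            End Using
            Console.WriteLine("{0} rows read", dt.Rows.Count)
        End Using
    End Sub
End Module

Note that IMEX=1 is still driven by the same row-sampling behaviour, so it is worth testing against values that appear beyond the sampled range.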
The firm I work in has a lot of data sources entering the firm database using the Informatica ETL tool, stored in mapplets and other data models (sorry if I'm not using the exact terminology).
The problem is that all the business logic is stored in the 'graphical interface' and nowhere else - every time I want to see which field goes into a target field I have to trace the inputs through the mapplet, and that takes a very long time.
The question is: is there a tool that can take all the relationships in the Informatica mapplet and somehow export them to an Excel table (so I can see it all without tracing)? That way I could try to write proper documentation...
Thanks in advance.
It's possible to export mappings or whole workflows to XML. Next, you can use this tool - it will create tables with source-to-target dependencies for every mapping.
Keep in mind it will only map inputs to outputs; it won't extract the full logic and transformations done along the way - that would be too complex for a simple visualization.
Informatica supports exporting mapping information to Excel - just search the documentation, which tells you how to do it.
However, for anything other than the simplest of mappings, what ends up in Excel is not that easy to understand. If your Informatica installation supports it, then using the lineage capabilities is a much better bet.
I have to import about 50 different types of files every day. Some of them have a few columns; some include up to 250 columns.
The Flat File connection always defaults all columns to 50 chars.
Some columns can be way longer than 50 chars and will of course end up producing errors.
Currently I am doing a stupid search & replace with Notepad++: opening all SSIS packages and replacing:
DTS:MaximumWidth="50"
by
DTS:MaximumWidth="500"
This is an annoying workaround.
Is there any possibility to set a default length for flat file string columns to a certain value?
I am developing in Microsoft Visual Studio Professional 2015 and SQL Server Data Tools 14.0.61021.0
Thanks!
I don't think that there is a way to achieve this from SQL Server Data Tools.
But you can do some workaround to achieve this:
Easiest solution: in the Flat File Connection Manager's Advanced tab, select all columns (using the Ctrl key) and change the data length property for them all in one edit (detailed in @MikeHoney's answer).
You can use BIML (Business Intelligence Markup Language) to create the SSIS packages; if you're new to BIML, the BimlScript website has detailed tutorials.
You can create a small application that loops over the .dtsx files in a folder and replaces DTS:MaximumWidth="50" with DTS:MaximumWidth="500", using the normal String.Replace function or regular expressions. (You can read my answer to "Automate Version number Retrieval from .Dtsx files" for an example of reading a .dtsx file using regular expressions.)
A small VB.Net routine to read and patch the content of a .dtsx file:

Public Sub FixDTSX(ByVal strFile As String)
    ' Read the whole package file into memory.
    Dim strContent As String = String.Empty
    Using sr As New IO.StreamReader(strFile)
        strContent = sr.ReadToEnd()
    End Using
    ' Widen every default 50-char column to 500 chars.
    strContent = strContent.Replace("DTS:MaximumWidth=""50""", "DTS:MaximumWidth=""500""")
    ' Overwrite the original file with the patched content.
    Using sw As New IO.StreamWriter(strFile, False)
        sw.Write(strContent)
    End Using
End Sub
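And a small driver to apply it across a folder, as described above (the folder path would be whatever you use):

Public Sub FixAllPackages(ByVal strFolder As String)
    ' Patch every .dtsx package found in the folder.
    For Each strFile As String In IO.Directory.GetFiles(strFolder, "*.dtsx")
        FixDTSX(strFile)
    Next
End Sub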
There is a way to achieve what you want using the standard Visual Studio SSDT UI, although it is quite obscure. AFAIK it works in every version of this editor since SQL Server 2005.
With the package open, from the Connection Managers pane, right-click your Flat File Connection and choose Edit. Then navigate to the Advanced page. Then multi-select the columns you want to change (e.g. shift-click a range or ctrl-click a specific set). Now the Properties appearing at the right will be applied to all the selected columns.
In the example shown below, I have set all the selected columns to a width of 255.
Esteban,
I suggest you use the Object Model API, which allows you to develop SSIS packages programmatically. Using that, you can make use of any .Net code that gathers data/metadata from text files. Also, the assumption is that, since you are using SSIS, you might already be familiar with writing code in C#/VB.Net.
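To give a flavour of the raw Object Model API, a minimal sketch (the names and paths are made up, a real package still needs its data flow wired up, and it requires a reference to Microsoft.SqlServer.ManagedDTS):

Imports Microsoft.SqlServer.Dts.Runtime

Module ObjectModelSketch
    Sub Main()
        Dim app As New Application()
        Dim pkg As New Package()
        pkg.Name = "LoadCustomers"
        ' Add a flat file connection manager and point it at the source file.
        Dim cm As ConnectionManager = pkg.Connections.Add("FLATFILE")
        cm.Name = "Customers source"
        cm.ConnectionString = "C:\data\Customers.txt"
        cm.Properties("Format").SetValue(cm, "Delimited")
        cm.Properties("ColumnNamesInFirstDataRow").SetValue(cm, True)
        ' ... define columns, add a data flow task, map to the destination ...
        app.SaveToXml("C:\data\LoadCustomers.dtsx", pkg, Nothing)
    End Sub
End Module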
Now, if you are just starting with the Object Model API, there is a huge learning curve (but it is worth climbing if SSIS is your day-to-day life). If you do not have the time to invest right now, I would recommend using a library I wrote (called Pegasus) which greatly simplifies the Object Model API; you can create your packages in an almost declarative fashion (using C#).
On Github, there is an example that shows how to create a package that loads any number of text files with differing schemas in a given folder. See here; specifically the method GenerateProjectToLoadTextFilesToSqlServerDatabase().
In the example, I use a third-party .Net library called lumenworks.framework to probe delimited files and get their metadata. Using this library, I get the names of the columns, and I also infer data types and lengths by sampling the first 'n' rows. (In my code, I am only inferring ints, dates and strings; if you have more data types, add the relevant code accordingly.) Or you can specify one specific data type and length (it looks like you want strings of 500 chars) for all your columns. (Or, in some cases, you might have this metadata available outside in an Excel file/config file.) Then I use this metadata to configure my text file connection managers programmatically.
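For illustration only (this is not Pegasus itself), the sampling idea might look roughly like this; the file path and the 1000-row sample size are made up:

Imports System.IO
Imports LumenWorks.Framework.IO.Csv

Module InferColumnWidths
    Sub Main()
        Using csv As New CsvReader(New StreamReader("C:\data\input.csv"), True)
            Dim headers As String() = csv.GetFieldHeaders()
            Dim maxLen(headers.Length - 1) As Integer
            Dim rowCount As Integer = 0
            ' Sample the first 1000 records, tracking the longest value per column.
            While csv.ReadNextRecord() AndAlso rowCount < 1000
                For i As Integer = 0 To csv.FieldCount - 1
                    maxLen(i) = Math.Max(maxLen(i), csv(i).Length)
                Next
                rowCount += 1
            End While
            ' Use the observed widths (or a fixed fallback) when configuring
            ' the flat file connection manager's columns.
            For i As Integer = 0 To headers.Length - 1
                Console.WriteLine("{0}: DT_STR({1})", headers(i), maxLen(i))
            Next
        End Using
    End Sub
End Module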
You can download the code from Github and run the DataFlowExample by specifying where your source files are, and see how far it gets you.
Another recommendation would be Biml, but I am not sure if you can incorporate your own/third-party full-fledged C# code (not just snippets) into a Biml workflow. If you can, then go with Biml.
Let me know if you have any questions.
Background: This is day 1 of my learning ETL. I had a little computer science training (not very systematic) before.
So I was learning this SSIS Tutorial on msdn (https://msdn.microsoft.com/en-us/library/ms170583.aspx?f=255&MSPPError=-2147217396), one of the steps is to remap column data types from a flat file.
I had never run into these kinds of data types (I mean those [DT_WSTR] things) before. My question is: how do I find the correct data type in the Flat File Connection Manager Editor for my destination column? A complete reference list of data types would be perfect. Thanks everyone!
It's actually a rather complex beast. SO best practice is not to just link to external resources, but I'm going to disregard that here:
http://www.cathrinewilhelmsen.net/2014/05/27/sql-server-ssis-and-biml-data-types/
http://milambda.blogspot.com/2014/02/sql-server-integration-services-data.html
http://www.sqlservercentral.com/blogs/dknight/2010/12/22/ssis-to-sql-server-data-type-translations/
https://msdn.microsoft.com/en-us/library/ms187752.aspx
https://msdn.microsoft.com/en-us/library/ms141036.aspx
For quick orientation, the most common translations are: DT_STR → varchar, DT_WSTR → nvarchar, DT_I4 → int, DT_I8 → bigint, DT_R8 → float, DT_NUMERIC → numeric, DT_BOOL → bit, and DT_DBTIMESTAMP → datetime.
Is there a way to automatically generate SSIS packages? I need to create a lot of SSIS packages that just erase data from one table and import data from a text file. The file name matches table name and the column headers are in the first line of the file.
For more detailed information:
I am working on a project in which I have to separate two systems that are currently coupled (one system has direct access to the other's database). After the modifications, one system will provide data through txt files to be loaded in the other database.
We have to use SSIS to load data into the database from the text files.
The text files will be provided in CSV format with column headers in the first line.
The tables from both databases have matching column names, and all we need to do is clear the table and load data from the files.
I have more than one hundred tables with different numbers of columns. Do I need to create each package manually?
I'm familiar with 2 free options.
EzAPI might be a good place to start if you're a .NET-heavy shop or just really want to geek out with the API. This approach lets you control pretty much the entire package generation, but at the cost of coding time. I find EzAPI generally easier than working with the base COM/.NET libraries for SSIS.
Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you would need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and whoosh, it generates packages.
I did this just using VB: I passed in the table names as a command parameter and used VB to generate the insert and clear. Worked a charm... I can try and dig it out tomorrow when I'm back in the office, but it was pretty simple. There didn't seem to be any other way to say "just get x and export it" or "just take y and import it into z", so VB it had to be. In fact, come to think of it, I think I actually used a small XML file to pass the table info for export and then determined the table name for import from the CSV file name. To be clear, this was only one package, but it could dynamically choose the number of imports/exports it did. Further clarification: this was VB within SSIS as a processing step.
I have for some time helped a customer to export mdb table data to csv files (and then to further process these csv files). I have used Ubuntu, so mdbtools (mdb viewer) has been available to me. Now the customer wants me to automate the work I do in the form of a Windows program. I have run into two problems:
After some hours, I still haven't found a free tool on Windows that can export my table data in a way that I can incorporate into a program/script. Jackcess (jackcess.sourceforge.net) looks promising, but when running the downloaded jar, a totally unrelated Nokia Suite program pops up...
I have managed to open two of the tables in a Python program by using the pyodbc module, but all the other tables fail to open because of "no read permissions". Until now I thought that there were no access restrictions on the database, because mdb viewer on Ubuntu opens all tables without any fuss. There is no other file available to me, just the mdb file. One possibility might be that this is not a permissions problem at all, but a problem with special characters in column names. All the tables that I cannot open have at least one column name with a national character, whereas the two tables I can open do not. I tried to use square brackets in the SQL select called from Python, like so:
SQL = 'SELECT [colname] from SomeTable;'
but it makes no difference. I cannot fetch data from the columns that do not contain national characters either (except from the two tables that do work).
If it indeed is a permission problem, any solution must also be possible for my program to perform, there must not be any manual steps.
Edit: The developer of the program that produces the mdb files has confirmed that there are no restrictions on any tables. So, the error message "no read permissions" is misleading. I will instead focus on getting around what I presume is a problem with national characters in column names. I will start with the JSDB approach suggested below. Thanks everyone!
Edit 2: I made a discovery that I feel is important: All tables that I can open using pyodbc have Owner=Admin whereas all tables that I cannot open have no owner at all (empty string it seems, "Owner=").
Edit 3: I gave JDBC a shot. Same error again, as one could expect given the finding in Edit 2. Apparently the problem to solve is the table ownership (although MDB Viewer under Linux doesn't seem to care about that...). Since the creator of the files says he didn't introduce any permission settings, I guess the strange table ownership could be the result of using newer programs (like 2010) to read data produced by an old program (from sometime in the 90s), or was introduced during some migration of the old program. Any ideas on how to solve it?
You might be able to use VBScript. VBScript is usually used in ASP files for web pages, but it can be used stand-alone as a Windows program as well.
VBScript is free as it's code you write in Notepad.
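If a compiled program ends up suiting you better than a script, the same ADO/OLE DB idea sketched in VB.Net (the provider choice, paths and table name are all assumptions here):

Imports System.Data.OleDb
Imports System.IO

Module ExportMdbTable
    Sub Main()
        ' The Jet provider is 32-bit only; on 64-bit use Microsoft.ACE.OLEDB.12.0.
        Dim connStr As String =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\customer.mdb"
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            ' Square brackets let the query tolerate unusual table/column names.
            Dim cmd As New OleDbCommand("SELECT * FROM [SomeTable]", conn)
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                Using sw As New StreamWriter("C:\data\SomeTable.csv")
                    While reader.Read()
                        Dim fields(reader.FieldCount - 1) As String
                        For i As Integer = 0 To reader.FieldCount - 1
                            fields(i) = reader(i).ToString()
                        Next
                        sw.WriteLine(String.Join(";", fields))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module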
Others may come up with better answers for you. Good luck.