Until now I have always been able to import data into SQL Server where the column names are on the first row and the data begins on the second row. Now, however, I have to work with data that has a different structure, and I have no way to fix it at the source, so I have to take it as "given".

In this data the first three rows are completely empty. The column names are on the fourth and fifth rows: the names of the first three columns are on the fourth row and the names of the remaining columns on the fifth. The data itself begins on the sixth row.

I have no idea how to handle this kind of data structure in a SQL environment. My former approach seems useless here, or at least the results have been far from what I want and need.
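If the source is an Excel sheet, one possible workaround (a minimal sketch, not taken from this thread; the file path, sheet name, and column range are all assumptions) is to read an explicit range that starts at the sixth row with HDR=NO, so the driver skips the header rows entirely and you assign column names yourself:

    using System;
    using System.Data;
    using System.Data.OleDb;

    class ImportFromRowSix
    {
        static void Main()
        {
            // HDR=NO: treat every cell as data, because the real headers
            // are split across rows 4 and 5 and are useless to the driver.
            var connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                          @"Data Source=C:\data\input.xlsx;" +
                          "Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\"";

            using (var cn = new OleDbConnection(connStr))
            {
                cn.Open();
                // Open-ended range: start at row 6 and let the driver stop
                // at the last populated row. [Sheet1$A6:H] is an assumption.
                var cmd = new OleDbCommand("SELECT * FROM [Sheet1$A6:H]", cn);
                var table = new DataTable();
                new OleDbDataAdapter(cmd).Fill(table);

                // Assign your own column names before loading the table
                // into SQL Server (e.g. with SqlBulkCopy).
                table.Columns[0].ColumnName = "Column1";
                Console.WriteLine("{0} data rows read", table.Rows.Count);
            }
        }
    }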
Alright, so I am not sure how to go about this. I have files coming in a format like this that I need to read into a SQL Server database:

As you can see, it is "~" delimited and it contains no column names at all. I will have multiple files like this coming in every couple of hours, and I have the entire SSIS setup ready except for the part where I actually need to read the data, because I am confused about how to handle this delimiter format that another department came up with.
As you can see, if I simply specify the column delimiter as "~", it works fine until it reaches the point where a row ends. At that point there is an unnecessary row of "~" characters, which confuses the connection manager into thinking these are separate columns, creating a bunch of empty columns.
I can't simply delete all empty columns, because some legitimate columns can sometimes come in empty. The only mediocre solution I have found so far is to go to the advanced options of the file connection manager and manually delete all of the columns I don't need. The reason this will not work is that the next file I get might contain more rows than this one, and SSIS will still treat the "~" after every data row as a column delimiter when in reality it is just a row separator. The number of columns, however, will always stay the same in each file.
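One loudly hedged workaround, not from the thread: preprocess the file in an SSIS Script Task before the Data Flow runs, dropping the separator lines that consist only of "~" characters so the connection manager only ever sees clean data rows. The paths are assumptions, and Dts/ScriptResults are the standard Script Task scaffolding:

    using System.IO;
    using System.Linq;

    // Inside the SSIS Script Task's ScriptMain class.
    public void Main()
    {
        string inPath  = @"C:\incoming\feed.txt";        // assumed path
        string outPath = @"C:\incoming\feed_clean.txt";  // assumed path

        // Keep only real data lines; skip blank lines and the stray
        // row-separator lines made up entirely of '~' characters.
        var cleanLines = File.ReadLines(inPath)
            .Where(line => line.Trim().Length > 0 &&
                           !line.Trim().All(c => c == '~'));

        File.WriteAllLines(outPath, cleanLines);
        Dts.TaskResult = (int)ScriptResults.Success;
    }

The flat file connection manager would then point at the cleaned copy and keep its static "~"-delimited column list.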
I'm trying to make a SQL query's output more readable for our staff. It is a kind of warehouse delivery note. It doesn't matter whether I use Word, Excel, or something else.

So I wrote a SQL query (MS SQL Server 2000), and it works fine. The output, obviously, is a table. The first 7 columns contain data that needs to appear once on the printout, as a heading. The remaining columns contain data that needs to appear as a list on the printout. I have already used PowerPivot to pull it into Excel.

To make it clearer, I made some pictures in Excel. The first column contains a "warehouse" number, and I need a separate printout per warehouse. As you can see, the first 7 columns also contain data concerning the warehouse. The last 3 columns are the products.
I have to put this:
into this, where Column 1 contains the same value. The next value of Column 1 goes to the next page, and so on:
Return two datasets into your Excel workbook, one for the heading values and the other for the details. I'm assuming your first dataset returns only one row, which means you can reference the values in the data table directly without issue.
Your second table can then be directly inserted into the details section of your spreadsheet and formatted accordingly.
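In code terms, "two datasets" is just two queries filled into one DataSet. This is only a sketch; the connection string, table names, and the hard-coded warehouse number are all assumptions:

    using System.Data;
    using System.Data.SqlClient;

    class DeliveryNoteData
    {
        static void Main()
        {
            const string connStr =
                "Data Source=.;Initial Catalog=Warehouse;Integrated Security=SSPI";

            var ds = new DataSet();
            using (var cn = new SqlConnection(connStr))
            {
                // Heading: the seven warehouse columns, one row per note.
                new SqlDataAdapter(
                    "SELECT TOP 1 * FROM dbo.NoteHeading WHERE Warehouse = 1", cn)
                    .Fill(ds, "Heading");

                // Details: the product rows for the same warehouse.
                new SqlDataAdapter(
                    "SELECT * FROM dbo.NoteDetails WHERE Warehouse = 1", cn)
                    .Fill(ds, "Details");
            }
        }
    }

The Heading table feeds the one-time header cells, and the Details table feeds the list area of the spreadsheet.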
Well, it is working now, so far. I used the grouped mail merge ("serial letter") function in Word with a single Excel table. I also tried the version that uses two Excel tables as a mail-merge database, but somehow it did not work. What exactly did not work? I don't know; when I run the database version I only get gibberish output and errors.

So now my process is: I run a SQL query that gives me one Excel table, and after that I run Word and print a mail-merge letter.
I'm learning how to develop SSIS packages for ETL systems this week. One of my first objectives is to discover different ways to import flat files into a database. As this is pretty straightforward for the most part, I've been playing around with different flat files that contain a variety of data.

One issue I ran into today was with an Excel document that contained data in the first row, the header information in the second row, and footer information in the last couple of rows. What I want to import into the database is the header and all the rows leading up to the footer. I do not want the first row and I do not want the footer.

My current solution is to create a Data Flow task and, in the Advanced Settings, set OpenRowset to "Sheet1$A2:I20000". This lets me open the sheet I want, start at the second row (where my header resides), and then select all other rows between A2 and I20000.

This solution allows me to read the header information (which I want) and all the rows that follow it. Unfortunately, it also selects the footer rows, and it doesn't seem optimized for performance, as the package has to scan a massive range of rows regardless of whether there is data in them or not.
The screenshot below contains the Excel sheet that I'm trying to import, based on the MS SQL sample database. The rows I want to remove or ignore are circled with the red box. Everything not circled is what I want to import.

Any thoughts on how I can ignore the first row, read the second row for my header information, read the rows that follow the header as my data set, and then ignore the last couple of rows that I'm deeming the footer?
Additional Information About This File
The first row will never change.
The header row will never change.
The data set after the header will change values, not data types.
The first column of the footer will never change.
The second column of the footer will change values, not data types.
The rest of the footer columns will never change.
I figured out the solution to my own question.
I used a Conditional Split, as shown in my diagram, to filter out the rows I didn't need. For example, I put in a condition that checks whether the first column of data (member_no) is less than a given number. If true, the row goes to my OLE DB destination; if false, it goes nowhere. This prevented the "SUM TOTAL" footer from being passed to the database.

I also changed my range to 'Sheet1$A2:I' instead of 'Sheet1$A2:I20000'. That way the package scans until there are no more records to scan and then stops (I assume).
SSIS does two things in relation to handling flat files which are particularly frustrating, and it seems there should be a way around them, but I can't figure it out. If you define a flat file with 10 columns, tab delimited, with CRLF as the end-of-row marker, this will work perfectly for files where there are exactly 10 columns in every row. The two painful scenarios are these:
If someone supplies a file with an 11th column anywhere, it would be nice if SSIS simply ignored it, since you haven't defined it. It should just read the 10 columns you have defined and then skip to the end-of-row marker, but what it does instead is concatenate any additional data with the data in the 10th column and bung all of that into the 10th column. Kind of useless, really. I realise this happens because the delimiter for the 10th column is not tab like all the others, but CRLF, so it just grabs everything up to the CRLF, replacing extra tabs with nothing as it does so. This is not smart, in my opinion.
If someone supplies a file with only 9 columns something even worse happens. It will temporarily disregard the CRLF it has unexpectedly found and pad any missing columns with columns from the start of the next row! Not smart is an understatement here. Who would EVER want that to happen? The remainder of the file is garbage at that point.
It doesn't seem unreasonable to have variations in file width for whatever reason (of course, only variations at the end of a row can reasonably be handled: x fewer or extra columns), but it looks like this is simply not handled well, unless I'm missing something.
So far our only solution is to load each row as one giant column (column0) and then use a script task to dynamically split it using however many delimiters it finds, roughly as in the sketch below. This works well, except that it limits row widths to 4,000 characters (the maximum width of one Unicode column). If you need to import a wider row (say, with multiple 4,000-wide columns for a text import), then you need to define multiple columns as above, but you are then stuck with requiring a strict number of columns per row.
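For reference, the splitting step looks roughly like this inside a script component; the output column names (Col0 through Col9) and the tab delimiter are assumptions:

    // Script component (transformation) sketch: Input0 carries the single
    // giant column; Col0..Col9 are the ten typed output columns you define.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] parts = Row.Column0.Split('\t');

        // Read exactly ten fields: extras are ignored and missing ones are
        // padded with empty strings, which is exactly the tolerant
        // behaviour SSIS itself refuses to provide.
        Row.Col0 = Field(parts, 0);
        Row.Col1 = Field(parts, 1);
        Row.Col2 = Field(parts, 2);
        // ... and so on through Col9.
    }

    private static string Field(string[] parts, int i)
    {
        return i < parts.Length ? parts[i] : string.Empty;
    }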
Is there any way around these limitations?
Glenn, I feel your pain :)
SSIS cannot make the columns dynamic, as it needs to store metadata for each column as it comes through, and since we're working with flat files that can contain any kind of data, it can't assume that a CRLF in a column that is not the last column really is the end of the data line it's supposed to read.

Unlike DTS in SQL 2000, you can't change the properties of an SSIS package at runtime.

What you could do is create a parent package with a script task that reads only the first line of the flat file, to get the number of columns and the column names. This info can be stored in a variable.

Then the parent package loads the child package programmatically (a script task again) and updates the metadata of the child package's Source Connection. This is where you would:
1. Add / remove columns to match the flat file.
2. Set the column delimiter for each column; the last column's delimiter has to be CRLF, matching the row delimiter.
3. Reinitialise the metadata (ComponentMetadata.ReinitializeMetadata()) of the Source Component in the Data Flow task (so it recognises the recent changes to the Source Connection).
4. Save the child SSIS package.
Details on programmatically modifying a package are readily available online.
Then your parent package just executes the child package (Execute Package Task), and it will run with your new mappings.
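A rough sketch of that parent-package script, using the SSIS runtime API. The package path, connection manager name, and delimiters are assumptions, the add/remove-columns step is elided, and the interface suffix (100) varies with the SSIS version:

    using System.IO;
    using System.Linq;
    using Microsoft.SqlServer.Dts.Runtime;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    // Read the first line of the flat file to learn the column layout.
    string header = File.ReadLines(@"C:\feeds\input.txt").First();
    int columnCount = header.Split('\t').Length;

    // Load the child package and reach into its flat file connection.
    var app = new Microsoft.SqlServer.Dts.Runtime.Application();
    Package child = app.LoadPackage(@"C:\packages\LoadFlatFile.dtsx", null);
    var flatFile = (IDtsConnectionManagerFlatFile100)
        child.Connections["FlatFileSource"].InnerObject;

    // (Add or remove columns here until flatFile.Columns.Count matches
    // columnCount.) Then make every delimiter a tab except the last,
    // which must be CRLF so it doubles as the row delimiter.
    foreach (IDtsConnectionManagerFlatFileColumn100 col in flatFile.Columns)
        col.ColumnDelimiter = "\t";
    flatFile.Columns[flatFile.Columns.Count - 1].ColumnDelimiter = "\r\n";

    app.SaveToXml(@"C:\packages\LoadFlatFile.dtsx", child, null);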
My problem is as follows. I have a CSV file (~100k rows) containing history information with this column format:
ID1,History1,ID2,History2...ID110,History110
Each row may have anywhere between 0 and 110 history entries. Each separate entry requires a stored procedure to be called.
If there were a small number of possible entries per row, I imagine the way to do this would be to transform the data using a script and send each entry down its own path. Creating 110 paths would probably work, but it isn't very elegant (and would be quite time-consuming).
What would the best way to approach this be?
Just load the data (raw CSV unchanged, one row per file line) into a staging table. Then call a stored procedure that uses a string splitter to break up the staging rows, loops over the entries, and calls your other procedure for each history entry.
see: Arrays and Lists in SQL Server 2005 and Beyond
also see this previous answer: SQL comma delimted column => to rows then sum totals?
If you want to solve this in SSIS without the staging table, you could create a destination script component. You could use a switch statement or a hashtable to look up the right sproc to execute for each data row.

It is unclear whether this is a better solution than the staging-table approach above, but it is an alternative.
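A minimal sketch of such a destination script component, assuming each row arrives as the raw CSV line in Column0 and that a procedure named dbo.usp_AddHistory exists (the connection string is also an assumption, and it is opened per row only for brevity; PreExecute would be the proper place):

    using System.Data;
    using System.Data.SqlClient;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // ID1,History1,ID2,History2,... : walk the fields in pairs.
        string[] fields = Row.Column0.Split(',');

        using (var cn = new SqlConnection(
            "Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI"))
        {
            cn.Open();
            for (int i = 0; i + 1 < fields.Length; i += 2)
            {
                if (fields[i].Length == 0) continue;  // unused trailing slots

                // A switch or hashtable on fields[i] could pick a different
                // sproc per entry type here, as suggested above.
                using (var cmd = new SqlCommand("dbo.usp_AddHistory", cn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@Id", fields[i]);
                    cmd.Parameters.AddWithValue("@History", fields[i + 1]);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }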
I know you already accepted an answer, but couldn't you use an Unpivot task to achieve what you wanted to do here?