How to deal with Weird records in FLAT FILE? - sql-server

SSIS falls flat on its back with this scenario.
In my flat file, we have normal-looking records like this:
"1","2","STATUSCHANGED","A","02-MAY-12 21:52:34","","Re","Initial review",""
And some like this (record spread over several lines):
"1","2","SALESNOTIFICATIONRESPOND","Ac","02-MAY-12 21:55:19","From: W, J
Sent: Wednesday, May 08, 2012 2:00 PM
To: XXXX, A; Acost
Subject: RE: Notification Id 1219 - Qu ID XXXXXX
I got this from earlier today. Our team is reviewing the request.
Thanks,
Hi,
This account belongs to D please approve/deny.
Thanks!
Claud","","","Reassign"
Looking at the file in Notepad++ (which is amazing), I can see that I should take out all the {CR}{LF} within the field that is spread over several lines.
The row delimiter for this file is LF and the text qualifier is ".
So there are two things I need to do on a collection of 200 files:
Remove all the {CR}{LF} inside fields.
Remove any embedded " in the actual fields, since " is the text qualifier.
Does anyone have any idea how to do this in Windows, DOS, or VBA for such a large number of files, so it's automated?

For data such as this, I prefer using a Script Component to perform the parse. I wrote a blog post describing one approach.
Hope this helps,
Andy

PowerShell will do this for you for the {CR}{LF}, but it might take you a while to code if you have never used PowerShell before.
The " qualifier appearing in the middle of fields is a real mess; you may be able to develop rules to clean it up, but there is no guarantee that you will succeed.

If the proper row terminator is just LF and you are certain that every row is properly terminated by LF, then you can remove all {CR}{LF}, but you should not actually need to: as long as the {CR}{LF} is properly inside a pair of text qualifiers, it should just be imported literally.
And yes, you definitely need to remove (or escape, as you prefer) any text qualifiers within an actual field when the entire field is surrounded by text qualifiers; otherwise they will cause confusion.
Personally, I would approach this by either writing a Python script to preprocess the data before feeding it to SSIS, or just having the script import the entire thing into SQL for me.
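Taking that Python-preprocessing route, here is a minimal sketch (assuming the row delimiter is LF and the text qualifier is "; the function name and folder path are illustrative, not from any of the answers). It walks each file character by character, tracks whether it is inside a quoted field, and drops the {CR}{LF} found there:

```python
import glob

def strip_breaks_in_quotes(text, qualifier='"'):
    """Drop CR/LF characters that fall inside quoted fields; normalize
    any remaining CRLF row endings to LF. Illustrative sketch only."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == qualifier:
            in_quotes = not in_quotes
            out.append(ch)
        elif ch in "\r\n" and in_quotes:
            continue  # line break embedded in a quoted field: remove it
        elif ch == "\r":
            continue  # stray CR outside quotes: keep only the LF
        else:
            out.append(ch)
    return "".join(out)

# Illustrative batch loop over the collection of files (path is hypothetical):
for path in glob.glob(r"C:\data\*.txt"):
    with open(path, newline="") as f:   # newline="" keeps \r visible
        cleaned = strip_breaks_in_quotes(f.read())
    with open(path, "w", newline="") as f:
        f.write(cleaned)
```

Note this only handles the line-break problem; as pointed out above, embedded " qualifiers inside fields are a separate issue that no character-level scan can resolve reliably.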

I agree with Andy. I had a similar issue and I took care of it with a Script Component task.
Your code could look something like this (it doesn't handle the {CR}{LF} issue):
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim strRow As String
        Dim strColSeparator As String
        Dim rowValues As String()
        strRow = Row.Line.ToString()
        If strRow.Contains(",") Then
            strColSeparator = ","
        ElseIf strRow.Contains(";") Then
            strColSeparator = ";"
        Else
            Exit Sub ' no recognized column separator in this row
        End If
        rowValues = Row.Line.Split(CChar(strColSeparator))
        If rowValues.Length > 1 Then
            Row.Code = rowValues.GetValue(0).ToString()
            Row.Description = rowValues.GetValue(1).ToString()
            Row.Blank = rowValues.GetValue(2).ToString()
            Row.Weight = rowValues.GetValue(3).ToString()
            Row.Scan = rowValues.GetValue(4).ToString()
        End If
    End Sub
End Class
A step-by-step tutorial is available at Andy Mitchell's post.

Related

ssis unicode flatfile processing by script component

I have an awkward flat file input that can be virtually any length. It is a comma-delimited file, but it has embedded tables delimited by "[{" and "}]" or "{" and "}", depending on the table type. I cannot use the off-the-shelf SSIS comma-delimited flat file source, as there may be records with no embedded tables at all.
To get around this I've set the flat file input to be ragged right, with one column of 8,000 characters.
I've then done the string splitting in a script component and output the table data to separate output streams.
However, I am now receiving files that exceed 8,000 characters, which has broken my process.
I've tried converting the flat file from "1252 (ANSI Latin 1)" into Unicode, with the column as NTEXT.
I've then inserted the following code to convert this to a string (see http://www.bimonkey.com/2010/09/convert-text-stream-to-string/):
Dim TextStream As Byte() ' To hold Text Stream
Dim TextStreamAsString As String ' To Hold Text Stream converted to String
' Load Text Stream into variable
TextStream = Row.CopyofColumn0.GetBlobData(0, CInt(Row.CopyofColumn0.Length))
' Convert Text Stream to string
TextStreamAsString = System.Text.Encoding.Unicode.GetString(TextStream)
But when I look at the string, I appear to get a lot of kanji-type characters and no line feeds.
Any ideas what I can try next?
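One thing worth checking: kanji-like output is the classic symptom of single-byte ANSI text being decoded as UTF-16, so every pair of bytes fuses into one two-byte character. A quick illustration of the effect (in Python rather than the question's VB.NET, purely for brevity):

```python
# Why ANSI (code page 1252) bytes decoded as UTF-16 look like kanji:
# each pair of single-byte characters fuses into one 16-bit code unit.
data = "abcd".encode("latin-1")       # four single-byte characters
wrong = data.decode("utf-16-le")      # two CJK-looking characters
assert wrong == "\u6261\u6463"        # 'b'+'a' -> U+6261, 'd'+'c' -> U+6463
```

In other words, the file contents are still ANSI, so decoding the blob with System.Text.Encoding.GetEncoding(1252) instead of System.Text.Encoding.Unicode (which is UTF-16LE) should give readable text again. The missing line feeds are the same effect: each LF byte is being swallowed into the high or low half of a fused character.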
As I found it difficult to find an exact example of using the FileSystemObject in an SSIS VB.NET script component source transformation, I thought I'd share my findings!
The following imports are required:
Imports System.IO
Imports System.Text
and the code:
Public Overrides Sub CreateNewOutputRows()
    ' Rows are added to the output via the <Output Name>Buffer (e.g. Output0Buffer.AddRow()).
    Dim strFilePath As String
    Dim strFileContent As String
    Dim objFileInfo As FileInfo
    Dim objStreamReader As StreamReader
    Try
        strFilePath = "c:\myfile.csv" 'Me.Variables.FullFilePath
        objFileInfo = New FileInfo(strFilePath)
        objStreamReader = New StreamReader(strFilePath)
        Do Until objStreamReader.EndOfStream
            strFileContent = objStreamReader.ReadLine
            Process_data(strFileContent) ' do the work in this sub!
        Loop
        objStreamReader.Close()
    Catch ex As Exception
        MessageBox.Show(ex.Message.ToString(), "Error", MessageBoxButtons.OK)
    End Try
End Sub
Note: I use a Foreach Loop to obtain the file name in my script. The hard-coded file path here is just an example.
Instead of using a flat file source, you could just use a script source component that opens the file with a file system object.

Using VB6 to update info in database

I'm being taught VB6 by a co-worker who gives me assignments every week. I think this time he's overestimated my skills. I'm supposed to find a line in a text file that contains Brand IDs and their respective brand name. Once I find the line, I'm to split it into variables and use that info to create a program that, via an inserted SQL statement, finds the brand, and replaces the "BrandName" in the item description with the "NewBrandname".
Here's what I'm working with:
Dim ff As Integer
ff = FreeFile
Open "\\tsclient\c\temp\BrandNames.txt" For Input As #ff
Do Until EOF(ff)
    Dim fileline As String, linefields() As String
    Line Input #ff, fileline
    linefields = Split(fileline, ",")
    brandID = linefields(0)
    BrandName = linefields(1)
    NewBrandName = linefields(2)
Loop
Close #ff
I want to use the following line from the text file, since it's the brand I'm working with:
BrandID =CHEFJ, BrandName=Chef Jay's NewBrandName=Chef Jays
That's what 'fileline' is; I just don't know how to select that one line.
As for updating the info, here's what I've got:
Dim rs As ADODB.Recordset, newDesc1 As String
Set rs = hgSelect("select desc1 from thprod where brandID='CHEFJ'")
Do While Not rs.EOF
    If Left(rs!desc1, Len(BrandName)) = BrandName Then
        newDesc1 = NewBrandName & Mid(rs!desc1, Len(BrandName) + 1)
        hgExec "update thprod set desc1=" & adoquote(newDesc1) & _
            " where brandID='CHEFJ' and desc1=" & adoquote(rs!desc1)
    End If
    rs.MoveNext
Loop
How do I put this all together?
Just to give you some guidelines:
First, you need to read the text file, which you are already doing.
Then, once you get the data, you need to spot the format and SPLIT the data to retrieve only the parts you need.
For example, if the data read from the text file gives you BrandID=CHEFJ, BrandName=Chef Jay's, NewBrandName=Chef Jays, you will see that the values are delimited by commas and that each property value is preceded by an equals sign.
Follow LINK for more info on how to split.
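To illustrate the split step (shown in Python for brevity; VB6's Split function works the same way), assuming the line has been normalized to comma-separated key=value pairs as in the example above:

```python
# Sketch: split one "key=value, key=value" line into named parts.
# Assumes the line is comma-delimited and each value follows an
# equals sign.
line = "BrandID=CHEFJ, BrandName=Chef Jay's, NewBrandName=Chef Jays"
fields = dict(part.split("=", 1) for part in line.split(", "))
# fields["BrandID"] -> "CHEFJ", fields["NewBrandName"] -> "Chef Jays"
```

In VB6 the equivalent is two nested Split calls: first on the comma, then on the equals sign of each part.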
Once you've split the data, you can easily use it to proceed with your database update. To update your db, first of all you will need to create the connection, then your query to update using the data you've fetched from the text file.
Finally, you need to execute your query using ADODB. This EXAMPLE can help.
Do not forget to dispose of the objects used, including your connection.
Hope it helps.

How to convert data in excel spreadsheet forms into a format for upload into a database

I need a good way to convert input data from 900 spreadsheets into a format suitable for upload to a relational database (XML or flat file(s)). The spreadsheets are multi-sheet, multi-line Excel 2007 workbooks, each consisting of 7 forms (so it's definitely not a simple grid). There will be no formula data to get; just text, dates, and integer data.
The 900 spreadsheets are all in the same format.
I will need some kind of scripted solution.
I'm expecting I should be able to do this with Excel macros (and I expect a fancy scriptable editor could do it too), or possibly SSIS.
Can someone tell me how you would approach this if it were yours to do?
Can anyone give a link to some technical info on a good way to do this?
I'm new to Excel macros but used to programming and scripting languages, SQL, and others.
Why? We're using spreadsheet forms as an interim solution, and I then need to get the data into the database.
You probably want to write the data out to a plain text file, using the CreateTextFile method of the FileSystemObject. Documentation here:
http://msdn.microsoft.com/en-us/library/aa265018(v=vs.60).aspx
There are many examples on the web of how to iterate over worksheets, capture the data and then use WriteLine method.
Sub ExampleTextFile()
    Dim fso As Object
    Dim oFile As Object
    Dim fullExport As String
    ' Script that will capture data from the worksheet belongs here;
    ' use the fullExport string variable to hold this data. For now we
    ' just create a dummy string for illustration purposes.
    fullExport = "Example string contents that will be inserted in to my text file!"
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' In the next line, replace "C:\filename.txt" with the file you want to create.
    ' The positional arguments are overwrite:=True, unicode:=True (named
    ' arguments are unreliable with late binding).
    Set oFile = fso.CreateTextFile("C:\filename.txt", True, True)
    oFile.WriteLine fullExport ' <-- writes the captured string to your new TXT file
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End Sub
If you have character-encoding issues (I recently ran into a problem with UTF-16LE vs. UTF-8 encoding), you will need to use the ADODB.Stream object, but that will require a different method of writing the file.

Database access excel formatting or if statement or grouping

I have a database formatting problem in which I am trying to concatenate column B rows based on column A rows, like so:
https://docs.google.com/spreadsheet/ccc?key=0Am8J-Fv99YModE5Va3hLSFdnU0RibmQwNVFNelJCWHc
Sorry I couldn't post a picture; I don't have enough reputation points yet (I'll get them eventually, though).
I'd like to solve this problem within Excel or Access. It's currently an Access database, but I can export it to Excel easily. As you can see, I want to find "userid" in column A, and where there are multiple column-A rows, such as "shawn", I'd like to combine the multiple instances of shawn and concatenate their property nums.
Even though there would still be multiple instances in column A, I could just filter all unique instances of the table later. My concern is how to concatenate column B with a "|" in the middle when column A has multiple instances.
This is just a segment of my data (There is a lot more), so I would be very thankful for your help.
The pseudo code in my head so far is:
If( Column A has more than one instance)
Then Concatenate(Column B with "#"+ "|" +"#")
I'm also wondering if there is a way to do this on access with grouping.
Well, anyway, please help!
In Excel we can achieve this easily with a custom function in a VBA module. Hopefully using VBA (macros) is not an issue for you.
Here is the code for the function, which can be added in VBA. (Press Alt+F11 to open the Visual Basic editor, right-click the project, and add a module. Add the code below to the module.)
Public Function ConcatenatePipe(ByVal lst As Range, ByVal values As Range, ByVal name As Range) As String
    ConcatenatePipe = ""
    Dim i As Integer
    For i = 1 To lst.Count
        If name.Value = lst.Item(i).Value Then ConcatenatePipe = ConcatenatePipe & "|" & values.Item(i).Value
    Next
    ConcatenatePipe = Mid(ConcatenatePipe, 2)
End Function
You can use this function in column F of your example. Copy the formula below into F2, then copy and paste that cell down the rest of column F:
=ConcatenatePipe($A$2:$A$20,$B$2:$B$20,E2)
I believe you can solve this with an SQL GROUP BY query. At least, here's how I'd do it in MySQL or similar:
SELECT userid, GROUP_CONCAT(propertynum SEPARATOR '|') FROM Names GROUP BY userid
as described in this stack overflow post: How to use GROUP BY to concatenate strings in MySQL?
Here's a link on how to use SQL in MS Access: http://www.igetit.net/newsletters/Y03_10/SQLInAccess.aspx
Unfortunately there is no GROUP_CONCAT function in MS Access, but this other SO post explains some ways around that: is there a group_concat function in ms-access?
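For what it's worth, the logic behind GROUP_CONCAT is simple enough to sketch outside the database too, which can help when deciding how to emulate it in Access. An illustrative version (in Python, with made-up sample rows):

```python
# Sketch of GROUP_CONCAT(propertynum SEPARATOR '|') ... GROUP BY userid,
# run over illustrative (userid, propertynum) rows (sample data made up).
rows = [("shawn", "101"), ("shawn", "102"), ("ann", "205")]

grouped = {}
for userid, propertynum in rows:
    grouped.setdefault(userid, []).append(propertynum)

concatenated = {user: "|".join(nums) for user, nums in grouped.items()}
# concatenated["shawn"] -> "101|102"
```

The Access workarounds in the linked post do essentially the same accumulate-then-join, just inside a VBA function called from a query.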

Bulk importing text files / VB2005 / SQL Server 2005

I've inherited a .NET app to support and enhance, which reads in a couple of files of many hundreds of thousands of rows, and one of millions of rows.
The original developer left me code like this:
For Each ModelListRow As String In ModelListDataArray
    If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
        GoTo SKIP_ROW
    End If
    Dim ModelInfo = ModelListRow.Split(",")
    Dim ModelLocation As String = UCase(ModelInfo(0))
    Dim ModelCustomer As String = UCase(ModelInfo(1))
    Dim ModelNumber As String = UCase(ModelInfo(2))
    If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
        GoTo SKIP_ROW
    End If
    Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
    MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
    MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
    dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
Next
and it takes an age (well, nearly half an hour) to import these files.
I suspect there's a MUCH better way to do it. I'm looking for suggestions.
Thanks in advance.
Take a look at BULK INSERT.
http://msdn.microsoft.com/en-us/library/ms188365(v=SQL.90).aspx
Basically, you point SQL Server at a text file in CSV format and it does all the logic of pulling the data into a table. If you need to massage the data more than that, you can pull the text file into a staging table in SQL Server and then run a stored proc to massage it into the format you are looking for.
The main options (apart from writing your own code from scratch) are:
BULK INSERT or bcp.exe, which work well if your data is cleanly formatted
SSIS, if you need workflow, data type transformations, data cleansing etc.
.NET SqlBulkCopy API
jkohlhepp's suggestion about pulling data into a staging table and then cleaning it is a good one and a very common pattern in ETL processes. But if your "massaging" isn't easy to do in T-SQL, then you will probably need some .NET code anyway, whether it's in SSIS or in a CLR procedure.
Personally I would use SSIS in your case, because it looks like the data is not cleanly formatted so you will probably need some custom code to clean/re-format the data on its way to the database. However it does depend on what you're most comfortable/productive with and what existing tools and standards you have in place.
Dim ExcelConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcelSpreadsheet.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=Yes""")
ExcelConnection.Open()
