SSIS Unicode flat file processing by script component - sql-server
I have an awkward flat file input that can be virtually any length. It is a comma-delimited file, but it has embedded tables delimited by "[{" and "}]" or by "{" and "}", depending on the table type. I cannot use the off-the-shelf SSIS comma-delimited flat file source, as there may be records with no embedded tables at all.
To get around this I've set the flat file input to ragged right, with a single column of 8,000 characters.
I've then done the string splitting in a script component and output the table data to separate output streams.
However, I am now receiving files with records that exceed 8,000 characters, which has broken my process.
I've tried converting the flat file from "1252 (ANSI Latin 1)" to Unicode, with the column as NTEXT.
I've then inserted the following code to convert this to a string:
See http://www.bimonkey.com/2010/09/convert-text-stream-to-string/
Dim TextStream As Byte() ' To hold Text Stream
Dim TextStreamAsString As String ' To Hold Text Stream converted to String
' Load Text Stream into variable
TextStream = Row.CopyofColumn0.GetBlobData(0, CInt(Row.CopyofColumn0.Length))
' Convert Text Stream to string
TextStreamAsString = System.Text.Encoding.Unicode.GetString(TextStream)
But when I look at the string, I appear to get a lot of kanji-type characters and no line feeds.
Any ideas what I can try next?
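One possible explanation for the symptom, assuming the blob still holds single-byte code page 1252 data: decoding those bytes as UTF-16 fuses every two ANSI bytes into one code unit, which tends to land in the CJK range and swallows the CR/LF pair as well. The sketch below uses Python purely as a neutral illustration, and the sample row is made up; in the script component the analogous change would be to decode with the encoding that matches the column's actual code page.

```python
# Hypothetical sample row; the real file content is not shown in the question.
raw = "id,name\r\n1,[{a,b}]\r\n".encode("cp1252")

# Wrong: treating single-byte data as UTF-16 fuses byte pairs into one character.
wrong = raw.decode("utf-16-le")

# Right: decode with the code page the bytes were actually written in.
right = raw.decode("cp1252")

print(len(raw), len(wrong), len(right))   # the wrong string is half as long
print("\n" in wrong, "\n" in right)       # line feeds survive only in the right one
print(right.splitlines())
```

If the decoded string comes out at half the expected length with no line breaks, a code page mismatch of this kind is a likely suspect.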
As I found it difficult to find an exact example of using a file system object in an SSIS VB.NET script component source, I thought I'd share my findings!
The following imports are required
Imports System.IO
Imports System.Text
Imports System.Windows.Forms ' needed for MessageBox
and the code ..
Public Overrides Sub CreateNewOutputRows()
' Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
Dim strFilePath As String
Dim strFileContent As String
Dim objStreamReader As StreamReader = Nothing
Try
strFilePath = "c:\myfile.csv" 'Me.Variables.FullFilePath
objStreamReader = New StreamReader(strFilePath)
' Read one line at a time, so the line length is not capped by a column width
Do Until objStreamReader.EndOfStream
strFileContent = objStreamReader.ReadLine()
Process_data(strFileContent) ' do the work in this sub!
Loop
Catch ex As Exception
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK)
Finally
' Release the file handle even if processing fails
If objStreamReader IsNot Nothing Then objStreamReader.Close()
End Try
End Sub
Note: I use a foreach loop to obtain the filename in my script. The hard coded filepath here is just as an example.
Instead of using a flat file source, you could just use a script source component that opens the file with a file system object.
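Once each line is read whole, the per-line work is splitting on commas while leaving the embedded tables intact. The question does not spell out the full format, so the following is only a sketch under stated assumptions: tables are wrapped in "[{" ... "}]" or "{" ... "}", commas inside them are not field separators, and bracket characters never appear in ordinary field values. It is written in Python as a neutral illustration; `split_top_level` is a hypothetical helper name, and the same loop translates directly to a VB.NET sub.

```python
def split_top_level(line):
    """Split on commas that are not inside {...} or [{...}] groups."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch in "[{":
            depth += 1                  # entering an embedded table
        elif ch in "]}":
            depth = max(depth - 1, 0)   # leaving it; never go negative
        if ch == "," and depth == 0:
            fields.append("".join(current))   # top-level comma ends a field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))     # last field has no trailing comma
    return fields


# Made-up sample rows: one with embedded tables, one without any.
print(split_top_level("1,abc,[{x,y},{z}],{p,q},end"))
print(split_top_level("a,b,c"))
```

Each returned field that starts with "[{" or "{" can then be routed to the matching output buffer, while plain fields go to the main output.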
Related
Using VB6 to update info in database
I'm being taught VB6 by a co-worker who gives me assignments every week. I think this time he's overestimated my skills. I'm supposed to find a line in a text file that contains Brand IDs and their respective brand name. Once I find the line, I'm to split it into variables and use that info to create a program that, via an inserted SQL statement, finds the brand, and replaces the "BrandName" in the item description with the "NewBrandname".
Here's what I'm working with
Dim ff as integer
ff = freefile
Open "\\tsclient\c\temp\BrandNames.txt" For Input as #ff
Do Until EOF(ff)
Dim fileline as string,linefields() as string
line input #ff, fileline
linefields = split(fileline,",")
brandID = linefields(0)
BrandName = linefields(1)
NewBrandName = linefields(2)
I want to use the following line in the text file, since It's the brand I'm working with:
BrandID =CHEFJ, BrandName=Chef Jay's NewBrandName=Chef Jays
That's what 'fileline' is- just don't know how to select just that one line
As for updating the info, here's what I've got:
dim rs as ADODB.Recordset, newDesc1 as String
rs = hgSelect("select desc1 from thprod where brandID='CHEFJ'")
do while not rs.eof
if left(rs!desc1,len(BrandName)) = BrandName then
dim newDesc1 as string
newDesc1 = NewBrandname & mid(rs!desc1, len(BrandName)+1)
hgExec "update thprod set desc1=" & adoquote(NewBrandName) & "+right(desc1,len(BrandName))" where brandId=CHEFJ and desc1 like 'BrandName%'"
end if
rs.movenext
loop
end while
How do I put this all together?
Just to give you some guidelines: First, you need to read the text file, which you are already doing. Then, once you have the data, you need to identify the format and split the data to retrieve only the parts you need. For example, if the line read from the text file is BrandID=CHEFJ, BrandName=Chef Jay's, NewBrandName=Chef Jays, you can see that the fields are delimited by commas and that each property value is preceded by an equals sign. Follow LINK for more info on how to split. Once you've split the data, you can use the values to proceed with your database update. To update your database, first create the connection, then build your update query using the data you've fetched from the text file, and finally execute the query using ADODB. This EXAMPLE can help. Do not forget to dispose of the objects used, including your connection. Hope it helps.
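The split step described above can be sketched as follows. This is a Python illustration rather than VB6, it assumes the comma-separated Key=Value layout shown in this answer, and both helper names and the sample description "Chef Jay's BBQ Sauce" are made up for the example.

```python
def parse_brand_line(line):
    """Turn 'Key=Value, Key=Value' text into a dictionary."""
    pairs = {}
    for part in line.split(","):
        key, _, value = part.partition("=")   # split at the first '=' only
        pairs[key.strip()] = value.strip()
    return pairs


def rename_prefix(desc, old, new):
    """Swap the brand name only when it starts the description."""
    if desc.startswith(old):
        return new + desc[len(old):]
    return desc


fields = parse_brand_line("BrandID=CHEFJ, BrandName=Chef Jay's, NewBrandName=Chef Jays")
print(fields["BrandID"])
print(rename_prefix("Chef Jay's BBQ Sauce", fields["BrandName"], fields["NewBrandName"]))
```

To pick out just one brand, the same parse can run on every line in the read loop, acting only when the BrandID value equals the one wanted. Note that "Chef Jay's" contains an apostrophe, so the values are safer passed to the database as parameters (or through a quoting helper) than concatenated into the SQL text.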
Text File to Array in Access VBA
I am looking to load a .txt file into a VBA array (in Access VBA), manipulate it there, and then paste the manipulated array into an Access table. I will then loop this macro through roughly 500 .txt files and populate the database with years of data.
I have done it before using Excel in this way: populating the sheet for one .txt file, manipulating the data, loading it into the database, clearing the sheet, and loading the next .txt file, looping until all files have been processed. However, this takes ages, and it has become obvious that Excel is not really needed, since everything is stored in Access anyway.
Since then, I have tried to figure out how to do it in Access directly, without using Excel, but I have been struggling. I found some nice code on this board here: Load csv file into a VBA array rather than Excel Sheet and tried to amend it so it would work for me. The_Barman's answer at the bottom in particular seems simple and very interesting.
I am new to arrays, VBA, SQL, basically everything, so maybe there is some big mistake I am making here that is easy to resolve. See my code below:
Sub LoadCSVtoArray()
Dim strfile As String
Dim strcon As String
Dim cn As ADODB.Connection
Dim rs As Recordset
Dim rsARR() As Variant
Set cn = New ADODB.Connection
strcon = "Provider=Microsoft.JET.OLEDB.4.0;Data Source=" & "\\filename.txt" & ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
cn.Open strcon
strSQL = "SELECT * filename.txt;"
Set rs = cn.Execute(strSQL)
rsARR = WorksheetFunction.Transpose(rs.GetRows)
rs.Close
Set cn = Nothing
[a1].Resize(UBound(rsARR), UBound(Application.Transpose(rsARR))) = rsARR
End Sub
I don't even know if the bottom part of the code works, because an error message pops up saying that the file is not found. The interesting thing is, if I debug and copy the value of strcon into Windows "Run", it opens the correct file. So I guess the path is correct? Can I even open a .txt file through an ADODB connection?
Right now I am a bit confused about whether this is possible and whether it is the best solution. Some more background on the text files I am trying to load into the array: they are output from another program, so they always have the same structure and are very organized. The data comes in this format:
Column a Column b ....
data 1 data 1
data 2 Data 2
... and so on.
If possible, I would like to retain this structure, and even save it as a table with the first row as column headers.
The Data Source is the folder path containing the file and the file is the table (SELECT * FROM ..). Replace "\\filename.txt" in strcon with the folder path. http://www.connectionstrings.com/textfile/
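To make the folder/file split concrete, here is a small sketch that derives both strings from one full path. It is Python used only for illustration; the path `C:\data\filename.txt` is hypothetical, the provider and Extended Properties are copied from the question, and the bracketed table name is an assumption about how a file name containing a dot is usually quoted for the text driver.

```python
import ntpath  # Windows path rules, regardless of the OS running this sketch


def text_driver_parts(full_path):
    """Return (connection string, SQL) for a delimited text file."""
    folder, file_name = ntpath.split(full_path)
    conn = ("Provider=Microsoft.JET.OLEDB.4.0;Data Source=" + folder +
            ';Extended Properties="text;HDR=Yes;FMT=Delimited";')
    sql = "SELECT * FROM [" + file_name + "]"   # the file plays the role of the table
    return conn, sql


conn, sql = text_driver_parts(r"C:\data\filename.txt")
print(conn)
print(sql)
```

In the VBA code, the same idea means putting only the folder in strcon and the file name after FROM in strSQL.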
Bulk Insert from CSV file to MS SQL Database
I have this working script that I use to bulk insert a CSV file. The code is:
' OPEN DATABASE
dim objConn,strQuery,objBULK,strConnection
set objConn = Server.CreateObject("ADODB.Connection")
objConn.ConnectionString = "Driver={SQL Server Native Client 11.0};Server=DemoSrvCld;Trusted_Connection=no;UID=dcdcdcdcd;PWD=blabla;database=demotestTST;"
objConn.Open strConnection
set objBULK = Server.CreateObject("ADODB.Recordset")
set objBULK.ActiveConnection = objConn
dim strAPPPATH
strAPPPATH="C:\DEMO001Test.CSV"
strQuery = "BULK INSERT EFS_OlderStyle FROM '" & strAPPPATH & "' WITH (firstrow=1, FIELDTERMINATOR=',', ROWTERMINATOR='\n')"
Set objBULK= objConn.Execute(strQuery)
objConn.Close
Here is an example of the .CSV file:
Date,Time,Card Number,Driver Id,Driver Name,Unit No,Sub-Fleet,Hub Miles,Odo Miles,Trip No,Invoice,T/S Code,In Dir,T/S Name,T/S City,ST,Total Inv,Fee,PPU,Fuel_UOM,Fuel_CUR,RFuel_UOM,RFuel_CUR,Oil_CUR,Add_CUR,Cash Adv,Tax,Amt Billed,Svc Bill,Chain,Ambest,MPU
10/08/13,03:20,70113531460800693,,,2100,,,,,0454591156,546200,Y,PILOT QUARTZSITE 328,QUARTZSITE,AZ,742.30,1.00,3.749,149.000,558.60,49.00,183.70,0.00,0.00,0.00,0.00,743.30,S, ,N,0.0
10/08/13,07:03,70110535170800735,,,6210,,,,,343723,512227,Y,PETRO WHEELER RIDGE,LEBEC,CA,678.78,1.00,4.169,139.140,580.08,23.68,98.70,0.00,0.00,0.00,0.00,679.78,S, ,N,0.0
But the .CSV file I have now is different from the one above. Here is an example of the current .CSV file:
"BRANCH","CARD","BILL_TYPE","AUTH_CODE","INVOICE","UNIT","EMP_NUM","TRIP","TRAILER","HUB/SPEED","VEH_LICENSE","DRIVER","DATE","TIME","CHAIN","IN_NETWORK","TS#","TS_NAME","TS_CITY","TS_STATE","PPG","NET_PPG","FUEL_GALS","FUEL_AMT","RFR_GALS","RFR_AMT","CASH","MISC","INV_TOTAL","FEE","DISC","INV_BALANCE",1.00,1.00,"E","004ACS","02812","365","-","-","0",0.00,"-","JOHN S ",11/4/2013,"16:18:49E","IC","N",3257.00,"IRVING HOULTON","HOULTON","ME",3.95,3.95,121.57,480.08,0.00,0.00,0.00,0.00,480.08,1.50,0.00,481.58
"BRANCH","CARD","BILL_TYPE","AUTH_CODE","INVOICE","UNIT","EMP_NUM","TRIP","TRAILER","HUB/SPEED","VEH_LICENSE","DRIVER","DATE","TIME","CHAIN","IN_NETWORK","TS#","TS_NAME","TS_CITY","TS_STATE","PPG","NET_PPG","FUEL_GALS","FUEL_AMT","RFR_GALS","RFR_AMT","CASH","MISC","INV_TOTAL","FEE","DISC","INV_BALANCE",1.00,2.00,"E","014ACI","976234","430","-","-","0",0.00,"-","STACY ",11/4/2013,"00:21:16E","F","Y",8796.00,"PILOT 405","TIFTON","GA",3.77,3.77,172.65,650.73,0.00,0.00,0.00,0.00,650.73,1.50,0.00,652.23
I have edited the MS SQL database fields to reflect the new .csv fields, but the old and new .csv files do not store the information in the same way: in the new file, every record starts with a copy of the header fields. How do I fix this so that it works?
I was thinking of first removing all of the " characters and then removing all but one occurrence of
"BRANCH","CARD","BILL_TYPE","AUTH_CODE","INVOICE","UNIT","EMP_NUM","TRIP","TRAILER","HUB/SPEED","VEH_LICENSE","DRIVER","DATE","TIME","CHAIN","IN_NETWORK","TS#","TS_NAME","TS_CITY","TS_STATE","PPG","NET_PPG","FUEL_GALS","FUEL_AMT","RFR_GALS","RFR_AMT","CASH","MISC","INV_TOTAL","FEE","DISC","INV_BALANCE",
then saving the .csv file and reopening it. But I think/hope there is another way? Please help... Thank you.
Sure: there are a few ways you can work around this problem. The right solution for you will depend on the time and energy you have to dedicate to it, as well as on whether this is a one-time import or a process you want to streamline. A few solutions:
1. Change the formatting of your CSV file to resemble the old version. This can be done relatively easily:
Download a text editor like Notepad++.
Open your CSV file in this editor.
Do a find/replace operation for:
"BRANCH","CARD","BILL_TYPE","AUTH_CODE","INVOICE","UNIT","EMP_NUM","TRIP","TRAILER","HUB/SPEED","VEH_LICENSE","DRIVER","DATE","TIME","CHAIN","IN_NETWORK","TS#","TS_NAME","TS_CITY","TS_STATE","PPG","NET_PPG","FUEL_GALS","FUEL_AMT","RFR_GALS","RFR_AMT","CASH","MISC","INV_TOTAL","FEE","DISC","INV_BALANCE"
replace with: ""
Finally, add the line above back once as your header, which quickly puts your new file in the same format as your old file.
Note: this may be the best option if you have a one-time import.
2. Make the above changes in your code programmatically. Since the beginning of each line contains the fields you wish to ignore, you can easily truncate each line based on the number of characters. The String.Replace function can be used to replace the initial (ignorable) part of the line with String.Empty before it is inserted into the DB.
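Option 2 can be sketched as a small pre-processing pass that runs before the bulk insert. This is a Python illustration under assumptions: every record begins with an exact copy of the header followed by one comma, and the two-field header in the demo is a shortened stand-in for the full 32-field header. The function names are hypothetical.

```python
def strip_repeated_header(line, header):
    """Remove a leading copy of the header and the single comma after it."""
    if not line.startswith(header):
        return line                      # already in the plain layout
    rest = line[len(header):]
    return rest[1:] if rest.startswith(",") else rest


def normalize(lines, header):
    """Yield the header once, then each record without its header prefix."""
    yield header
    for line in lines:
        data = strip_repeated_header(line.rstrip("\r\n"), header)
        if data:                         # skip lines that held only a header
            yield data


# Shortened stand-in data; the real header has 32 quoted field names.
demo_header = '"BRANCH","CARD"'
demo_lines = ['"BRANCH","CARD",1.00,1.00\n', '"BRANCH","CARD",1.00,2.00\n']
cleaned = list(normalize(demo_lines, demo_header))
print("\n".join(cleaned))
```

Writing the cleaned lines to a new file gives BULK INSERT one header row followed by data rows, so firstrow would then need to be 2 to skip that header. The quoted values are left untouched here; stripping the double quotes is a separate decision.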
How to convert data in Excel spreadsheet forms into a format for upload into a database
I need a good way to convert input data from 900 spreadsheets into a format suitable for upload to a relational database (XML or flat file/s). The spreadsheets are multi-sheet, multi-line Excel 2007 files, each one consisting of 7 forms (so it's definitely not a simple grid). There will be no formula data to get, just text, dates, and integer data. The 900 spreadsheets are all in the same format. I will need some kind of scripted solution. I'm expecting I should be able to do this with Excel macros (and I expect a fancy scriptable editor could do it too) or possibly SSIS. Can someone tell me how you would approach this if it was yours to do? Can anyone give a link to some technical info on a good way to do this? I'm new to Excel macros but used to programming and scripting languages, SQL, and others. Why? We're using spreadsheet forms as an interim solution and I then need to get the data into the database.
You probably want to write data out to a plain text file. Use the CreateTextFile method of FileSystemObject. Documentation here: http://msdn.microsoft.com/en-us/library/aa265018(v=vs.60).aspx There are many examples on the web of how to iterate over worksheets, capture the data, and then use the WriteLine method.
Sub ExampleTextFile()
Dim fso as Object
Dim oFile as Object
Dim fullExport as String
'Script that will capture data from worksheet belongs here _
' use the fullExport string variable to hold this data, for now we will _
' just create a dummy string for illustration purposes
fullExport = "Example string contents that will be inserted in to my text file!"
Set fso = CreateObject("Scripting.FileSystemObject")
'In the next line, replace "C:\filename.txt" with the specified file you want to create
set oFile = fso.CreateTextFile("C:\filename.txt", Overwrite:=True, unicode:=True)
oFile.WriteLine(fullExport) '<-- inserts the captured string to your new TXT file
oFile.Close
Set fso = Nothing
Set oFile = Nothing
End Sub
If you have character encoding issues (I recently ran into a problem with UTF16LE vs. UTF8 encoding), you will need to use the ADODB.Stream object, but that will require a different method of writing the file.
How to split the data in this file vb6
I have this file. It stores a name, a project, the week that the data is being stored for, and the hours spent on the project. Here is an example:
"James","Project5","15/05/2010","3"
"Matt","Project1","01/05/2010","5"
"Ellie","Project5","24/04/2010","1"
"Ellie","Project2","10/05/2010","3"
"Matt","Project3","03/05/2010","4"
I need to print it on the form without quotes. It should only show each name once and then display the projects under that name. I've looked this up and the Split function seems interesting. Any help would be good.
Create a Dictionary object and then put everything you find for a given name into one dictionary entry. Then in a second iteration print all that out.
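The two-pass idea can be sketched like this. It is a Python illustration rather than VB6: a `defaultdict` stands in for the Dictionary object, the `csv` module handles the quote stripping, and the sample text is the data from the question.

```python
import csv
from collections import defaultdict
from io import StringIO

SAMPLE = '''"James","Project5","15/05/2010","3"
"Matt","Project1","01/05/2010","5"
"Ellie","Project5","24/04/2010","1"
"Ellie","Project2","10/05/2010","3"
"Matt","Project3","03/05/2010","4"
'''


def group_by_name(text):
    """First pass: collect every (project, week, hours) entry under its name."""
    grouped = defaultdict(list)
    for row in csv.reader(StringIO(text)):
        if len(row) != 4:
            continue                     # ignore blank or malformed lines
        name, project, week, hours = row
        grouped[name].append((project, week, hours))
    return grouped


grouped = group_by_name(SAMPLE)

# Second pass: print each name once, with its projects listed underneath.
for name, entries in grouped.items():
    print(name)
    for project, week, hours in entries:
        print("   ", project, week, hours)
```

Names come out in the order they first appear in the file; sorting the keys before the second pass would give alphabetical output instead.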
Microsoft has a CSV ADO provider. I think it is installed along with the rest of ADO. This is exactly the format it was designed to read. See http://www.vb-helper.com/howto_ado_load_csv.html for a VB sample.
Do I understand you correctly in that you want to keep track of the names entered and thus re-order the data that way? Why not just read the data into a list of some new type that has the name, project, and other information and then sort that before printing it? While the Dictionary solution is simpler, this may be a better solution if you are OK with building a class and implementing the IComparer so that you could sort the list to get this done pretty easily.
You could read each line, strip out the quotes, split on the comma, then process the array of data you would be left with:
Dim filenum As Integer
Dim inputLine As String
Dim data() As String
filenum = FreeFile
Open "U:\test.txt" For Input As #filenum
Do While Not EOF(filenum)
Line Input #filenum, inputLine
inputLine = Replace(inputLine, Chr(34), vbNullString)
data = Split(inputLine, ",")
Debug.Print data(0), data(1), data(2), data(3)
Loop
Close #filenum
Or you could have the Input command strip the quotes, and read the data into variables:
Dim filenum As Integer
Dim name As String, project As String, dat As String, hours As String
filenum = FreeFile
Open "U:\test.txt" For Input As #filenum
Do While Not EOF(filenum)
Input #filenum, name, project, dat, hours
Debug.Print name, project, dat, hours
Loop
Close #filenum