Combine 2 rows of CSV file in SSIS - sql-server

I have a CSV file where the information for each record is spread across two lines.
Line 1 contains the name and age.
Line 2 contains details such as address, city, salary, and occupation.
I want to combine the two rows before inserting the record into a database.
CSV file:
Raju, 42
12345 west andheri,Mumbai, 100000, service
In SQL Server I could do this with a cursor, but I have to do it in SSIS.

For a similar case, I read each line as one column and use a Script Component to fix the structure. You can follow my answer to the following question; it contains a step-by-step guide:
SSIS reading LF as terminator when its set as CRLF

I like using a Script Component here because it lets me store data from a different row.
Read the file as a single-column CSV into Column1.
Add a Script Component and add a new output called CorrectedOutput, defining all the columns from both rows. Also, mark Column1 as a read-only input column.
Create 2 variables outside of row processing to 'hold' the first row:
string name = string.Empty;
string age = string.Empty;
Use a split to break the line into fields:
string[] str = Row.Column1.Split(',');
Use an if on the field count to determine whether this is row 1 or row 2:
if (str.Length == 2)
{
    name = str[0];
    age = str[1];
}
else
{
    CorrectedOutputBuffer.AddRow();
    CorrectedOutputBuffer.Name = name; // This uses the stored value from the prior row
    CorrectedOutputBuffer.Age = age;   // This uses the stored value from the prior row
    CorrectedOutputBuffer.Address = str[0];
    CorrectedOutputBuffer.City = str[1];
    CorrectedOutputBuffer.Salary = str[2];
    CorrectedOutputBuffer.Occupation = str[3];
}
The overall effect is this:
On row 1, you just hold the data in the variables.
On row 2, you write out the combined data as 1 new row.
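The same hold-then-emit pairing logic can be sketched outside SSIS in a few lines of plain Python, assuming every record is exactly one 2-field line followed by one 4-field line (the function name and dictionary keys are illustrative):

```python
# Pair up a 2-field "header" line with the 4-field "detail" line that follows it.
def combine_rows(lines):
    records = []
    name = age = None
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 2:   # row 1: hold the values
            name, age = fields
        else:                  # row 2: emit one combined record
            address, city, salary, occupation = fields
            records.append({"name": name, "age": age, "address": address,
                            "city": city, "salary": salary,
                            "occupation": occupation})
    return records

print(combine_rows(["Raju, 42", "12345 west andheri,Mumbai, 100000, service"]))
```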

Related

Csv file to a Lua table and access the lines as new table or function()

Currently my code has simple tables containing the data needed for each object, like this:
infantry = {class = "army", type = "human", power = 2}
cavalry = {class = "panzer", type = "motorized", power = 12}
battleship = {class = "navy", type = "motorized", power = 256}
I use the table names as identifiers in various functions so their values can be processed one by one; a function is simply called to access the values.
Now I want to have this data stored in a spreadsheet (csv file) instead that looks something like this:
Name class type power
Infantry army human 2
Cavalry panzer motorized 12
Battleship navy motorized 256
The spreadsheet will not have more than 50 lines and I want to be able to increase columns in the future.
I tried a couple of approaches from similar situations I found here, but due to my lacking skills I failed to access any values from the nested table. I think this is because I don't fully understand what the table structure looks like after reading each line from the CSV file into the table, and therefore I fail to print any values at all.
If there is a way to get the name, class, type, and power from the table and use that line just like my old simple tables, I would appreciate an educational example. Another approach could be to declare new tables from the CSV that behave exactly like my old simple tables, line by line from the CSV file. I don't know if this is doable.
Using Lua 5.1
You can read the CSV file in as a string; I will use a multi-line string here to represent the CSV.
gmatch with the pattern [^\n]+ will return each row of the CSV.
gmatch with the pattern [^,]+ will return the value of each column from a given row.
If more rows or columns are added, or if the columns are moved around, we will still reliably convert the information as long as the first row has the header information.
The only column that cannot move is the first one, the Name column; if that is moved, it will change the key used to store the row in the table.
Using gmatch and the 2 patterns, [^,]+ and [^\n]+, you can separate the string into each row and column of the CSV. Comments in the following code:
local csv = [[
Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
]]
local items = {} -- Store our values here
local headers = {} -- Store the column names from the first row
local first = true
for line in csv:gmatch("[^\n]+") do
  if first then -- this block handles the first line and captures our headers
    local count = 1
    for header in line:gmatch("[^,]+") do
      headers[count] = header
      count = count + 1
    end
    first = false -- set first to false to switch off the header block
  else
    local name
    local i = 2 -- We start at 2 because we don't increment for the Name column
    for field in line:gmatch("[^,]+") do
      name = name or field -- the first field on the line is the row's name
      if items[name] then -- if the name is already in items, this field is a value
        items[name][headers[i]] = field -- assign the value under its header in the row's table
        i = i + 1
      else -- if the name is not in the table, create a new entry for it
        items[name] = {}
      end
    end
  end
end
Here is how you can load a csv using the I/O library:
-- Example of how to load the csv.
path = "some\\path\\to\\file.csv"
local f = assert(io.open(path))
local csv = f:read("*all")
f:close()
Alternatively, you can use io.lines(path), which would take the place of csv:gmatch("[^\n]+") in the for loop as well.
Here is an example of using the resulting table:
-- print the table out
print("items = {")
for name, item in pairs(items) do
  print("  " .. name .. " = {")
  for field, value in pairs(item) do
    print("    " .. field .. " = " .. value .. ",")
  end
  print("  },")
end
print("}")
The output (pairs order may vary):
items = {
  Infantry = {
    type = human,
    class = army,
    power = 2,
  },
  Battleship = {
    type = motorized,
    class = navy,
    power = 256,
  },
  Cavalry = {
    type = motorized,
    class = panzer,
    power = 12,
  },
}
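For comparison, the same header-driven, name-keyed structure can be sketched in Python with the standard csv module (the inline sample data stands in for the spreadsheet export):

```python
import csv
import io

# Sample data standing in for the exported spreadsheet.
data = """Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
"""

items = {}
for row in csv.DictReader(io.StringIO(data)):
    name = row.pop("Name")  # the first column becomes the key, like the Lua version
    items[name] = row       # the remaining header/value pairs become the fields

print(items["Cavalry"]["power"])  # → 12
```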

Flink : How to save list of rows in Database

Right now I am reading rows from a file and saving in database using the below code:
String strQuery = "INSERT INTO public.alarm (id, name, marks) VALUES (?, ?, ?)";
JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
.setDrivername("org.postgresql.Driver")
.setDBUrl("jdbc:postgresql://localhost:5432/postgres?user=michel&password=polnareff")
.setQuery(strQuery)
.setSqlTypes(new int[] { Types.INTEGER, Types.VARCHAR, Types.INTEGER}) //set the types
.finish();
DataStream<Row> rows = FilterStream
    .map((tuple) -> {
        Row row = new Row(3);
        row.setField(0, tuple.f0);
        row.setField(1, tuple.f1);
        row.setField(2, tuple.f2);
        return row;
    });
rows.writeUsingOutputFormat(jdbcOutput);
env.execute();
The above is working fine and it picks rows one by one from a file and saves it in the database.
For example:
If the file contains:
1, mark, 20
then database entry will look like:
id name marks
------------------
1 mark 20
Now the requirement is that for every row, I have to create 2 different rows, and the result should look like below:
For example:
If the file contains:
1, mark, 20
then database entry should look like this:
id name marks
------------------
1 mark-1 20
1 mark-2 20
I assume I should now return a List instead of a Row, and the DataStream variable should look like DataStream<List<Row>> rows.
What should I change in the JDBCOutputFormat variable in order to achieve this?
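The core of the requirement is a one-to-many expansion of each input row (in Flink this would typically be a flatMap rather than a map; that choice is my reading, not stated in the thread). The expansion itself, with the "-1"/"-2" suffix scheme from the question, can be sketched in plain Python:

```python
# Expand each (id, name, marks) row into two rows with suffixed names,
# mirroring the desired database output in the question.
def expand(row):
    rid, name, marks = row
    return [(rid, f"{name}-1", marks), (rid, f"{name}-2", marks)]

rows = [(1, "mark", 20)]
out = [r for row in rows for r in expand(row)]
print(out)  # → [(1, 'mark-1', 20), (1, 'mark-2', 20)]
```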

how to copy excel column data to another worksheet and do text-to-columns programmatically

I have a working copy of an application that opens workbooks/sheets and copies data successfully between the two, then saves, but I need to parse some data as I copy it into another cell.
I was thinking..
~ create array
~ get all values in xlSourceFile.worksheets("sheet1") and store into an array
~ parse through the array extracting the data I need (text-to-columns programmatically)
~ write the array data to two specific columns in excel worksheet
The data I am trying to parse is Firstname / Lastname - Email, and I want this as a result:
Joe Shmoe to go into one column // Joe Shmoe's email into another column.
I am writing this in VB.NET using the Microsoft.Office.Interop imports to manipulate Excel.
Excuse the formatting; I'm new to SO. This is VBA, but I believe the general logic will work. It assumes that the email address has no space padding after it. It searches backward in the raw combined string for the first blank space and flags that as the start of the email address (i.e., the end of the name).
It loops out when the next cell is empty.
The data is assumed to look like this:
"First Name Last Name myaddress@example.com"
Sub SplitNamesAndEmails()
    Dim cell As Range, i As Long
    Dim rawstring As String, myname As String, myemail As String
    Dim emailStartPosition As Long
    For Each cell In Worksheets("Sheet1").Range("A:A")
        i = i + 1
        If cell = "" Then GoTo loopout
        rawstring = cell.Value
        'rawstring = "First Name Last Name myaddress@example.com"
        emailStartPosition = InStrRev(rawstring, " ")
        myname = Left(rawstring, emailStartPosition - 1)
        myemail = Right(rawstring, Len(rawstring) - emailStartPosition)
        Worksheets("Sheet1").Range("B" & i).Value = myname
        Worksheets("Sheet1").Range("C" & i).Value = myemail
    Next
loopout:
End Sub
Column B will have the name and Column C will have the email address.
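The same split-on-last-space parse can be sketched in Python (the sample name and address are placeholders):

```python
# Split "First Last address@example.com" on the last space:
# everything before it is the name, everything after it is the email.
def split_name_email(raw):
    name, email = raw.rsplit(" ", 1)
    return name, email

print(split_name_email("Joe Shmoe joe@example.com"))  # → ('Joe Shmoe', 'joe@example.com')
```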

Splitting column with XML data

I have a SQL column named "details" and it contains the following data:
<changes><RoundID><new>8394</new></RoundID><RoundLeg><new>JAYS CLOSE AL6 Odds(1 - 5)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>230</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
<changes><RoundID><new>8404</new></RoundID><RoundLeg><new>HOLLY AREA AL6 (1 - 9)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>730</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
<changes><RoundID><new>8379</new></RoundID><RoundLeg><new>PRI PARK AL6 (1 - 42)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>300</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
What is the easiest way to separate this data out into individual columns? (that is all one column)
Try this:
SELECT DATA.query('/changes/RoundID/new/text()') AS RoundID
,DATA.query('/changes/RoundLeg/new/text()') AS RoundLeg
,DATA.query('/changes/SortType/new/text()') AS SortType
-- And so on and so forth
FROM (SELECT CONVERT(XML, Details) AS DATA
FROM YourTable) AS T
Once you get your result set from the SQL (MySQL or whatever), you will probably have an array of strings. As I understand your question, you want to know how to extract each of the XML nodes contained in the string stored in the column in question. You can loop through the results of the SQL query and extract the data you want. In PHP it would look like this:
// Set a counter variable for the first dimension of the array; this will
// number the result sets, so for each row in the table you will have a
// numeric identifier in the corresponding array.
$i = 0;
$output = array();
foreach ($results as $result) {
    $xml = simplexml_load_string($result);
    // Use SimpleXML to extract the node data just by using the names of the
    // XML nodes, and give each the same name in the array's second dimension.
    $output[$i]['RoundID'] = (string) $xml->RoundID->new;
    $output[$i]['RoundLeg'] = (string) $xml->RoundLeg->new;
    // Simply create more array items here for each of the elements you want.
    $i++;
}
foreach ($output as $out) {
    // Step through the created array and do what you like with it.
    echo $out['RoundID'] . "\n";
    var_dump($out);
}
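The same node extraction can be sketched in Python with the standard library (the sample string below is trimmed from the question's data):

```python
import xml.etree.ElementTree as ET

details = ("<changes><RoundID><new>8394</new></RoundID>"
           "<RoundLeg><new>JAYS CLOSE AL6 Odds(1 - 5)</new></RoundLeg></changes>")

root = ET.fromstring(details)
# findtext walks the element path and returns the text of the first match.
round_id = root.findtext("RoundID/new")
round_leg = root.findtext("RoundLeg/new")
print(round_id, round_leg)  # → 8394 JAYS CLOSE AL6 Odds(1 - 5)
```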

String or binary data would be truncated.\r\nThe statement has been terminated. while xml insertion

Can you please let me know how I can resolve this problem while inserting XML data into SQL Server 2008?
ex = {"String or binary data would be truncated.\r\nThe statement has been terminated."}
I have already replaced ' and "" with an empty string.
Thanks in advance.
Please check the data type of the column. Make sure it has enough space.
Check the database column size against the data length.
You will get this error when you are storing data of a length greater than the column's defined size.
Ex. string Data = "saving this data causes error.";
int datalength = Data.Length; // will be 30 characters, greater than the defined size
Customer.Name = Data;
// You will get an error on this line if you have Name (varchar 25, null) in the db.
If you're using a subquery, change the = between the subquery and the main query to IN.
It may be that the subquery returns more than one value for each record of the main query.
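As a quick way to find the offending value before the insert, each string can be checked against the column's defined size; a minimal sketch, assuming a varchar(25) Name column (the limit and field name are illustrative):

```python
# Flag values that a varchar(25) column would truncate.
MAX_LEN = 25

rows = [{"Name": "ok value"}, {"Name": "saving this data causes error."}]
too_long = [r["Name"] for r in rows if len(r["Name"]) > MAX_LEN]
print(too_long)  # → ['saving this data causes error.']
```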
