SSIS Combine multiple rows into single row - sql-server

I have a flat file that has 6 columns: NoteID, Sequence, FileNumber, EntryDte, NoteType, and NoteText. The NoteText column has 200 characters and if a note is longer than 200 characters then a second row in the file contains the continuation of the note. It looks something like this:
|NoteID | Sequence | NoteText |
---------------------------------------------
|1234 | 1 | start of note text... |
|1234 | 2 | continue of note.... |
|1234 | 3 | more continuation of first note... |
|1235 | 1 | start of new note.... |
How can I combine the multiple rows of NoteText into one row in SSIS, so that the result looks like this:
| NoteID | Sequence | NoteText |
---------------------------------------------------
|1234 | 1 | start of note text... continue of note... more continuation of first note... |
|1235 | 1 | start of new note.... |
I would greatly appreciate any help.
Update: Changing the SynchronousInputID to None exposed the Output0Buffer and I was able to use it. Below is what I have in place now.
Dim NoteID As String = "-1"
Dim NoteString As String = ""
Dim IsFirstRow As Boolean = True
Dim NoteBlob As Byte()
Dim enc As New System.Text.ASCIIEncoding()

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    If Row.NoteID.ToString() = NoteID Then
        NoteString += Row.NoteHTML
        IsFirstRow = True
    Else
        If IsFirstRow Then
            Output0Buffer.AddRow()
            IsFirstRow = False
        End If
        NoteID = Row.NoteID.ToString()
        NoteString = Row.NoteHTML.ToString()
    End If
    NoteBlob = enc.GetBytes(NoteString)
    Output0Buffer.SingleNoteHTML.AddBlobData(NoteBlob)
    Output0Buffer.ClaimID = Row.ClaimID
    Output0Buffer.UserID = Row.UserID
    Output0Buffer.NoteTypeLookupID = Row.NoteTypeLookupID
    Output0Buffer.DateCreatedUTC = Row.DateCreated
    Output0Buffer.ActivityDateUTC = Row.ActivityDate
    Output0Buffer.IsPublic = Row.IsPublic
End Sub
My problem now is that I had to convert the output column from Wstr(4000) to NText because some of the notes are so long. When the data is imported into my SQL table, the notes show up as gibberish characters instead of the actual text.
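One possible cause of the garbled text, offered as an assumption since the rest of the package is not shown: an NText column holds UTF-16 data, while ASCIIEncoding writes one byte per character. The short Python sketch below only illustrates that kind of mismatch. It is not SSIS code, and the sample string is made up. In the script component, the likely equivalent change would be to build the bytes with System.Text.Encoding.Unicode instead of ASCIIEncoding.

```python
# Hypothetical sample text standing in for a note.
text = "start of note text"

ascii_bytes = text.encode("ascii")       # one byte per character
utf16_bytes = text.encode("utf-16-le")   # two bytes per character

# Reading ASCII bytes as UTF-16 pairs up unrelated bytes into wrong characters.
garbled = ascii_bytes.decode("utf-16-le")

# Encoding and decoding with the same encoding returns the original text.
round_trip = utf16_bytes.decode("utf-16-le")

print(repr(garbled))
print(repr(round_trip))
```

If the notes become readable once the writer and the column agree on the encoding, the mismatch was the cause.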

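In SQL (for example, from SQL Server Management Studio), you can combine the NoteText values with the STUFF function and FOR XML PATH, which concatenates the row values into a single column. The ORDER BY inside the subquery keeps the pieces in Sequence order, and TYPE with .value() keeps characters such as & and < from being XML-escaped:
select distinct
    n.noteid,
    min(n.sequence) over (partition by n.noteid order by n.sequence) as sequence,
    stuff((select ' ' + n1.NoteText
           from notes n1
           where n1.noteid = n.noteid
           order by n1.sequence
           for xml path(''), type
          ).value('.', 'nvarchar(max)'), 1, 1, '') as NoteText
from notes n;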
You will probably want something along the same lines in SSIS. A script component that concatenates rows can do this; see the linked article: SSIS Script Component - concat rows
SQL Fiddle Demo
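The grouping logic itself is small, whichever tool runs it: sort by NoteID and Sequence, then join the text of each group. The Python sketch below shows that logic on rows shaped like the question's sample. It is an illustration, not SSIS code, and the function name combine_notes is made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows shaped like the question's table: (NoteID, Sequence, NoteText).
# They are deliberately out of order to show that sorting matters.
rows = [
    (1234, 2, "continue of note...."),
    (1234, 1, "start of note text..."),
    (1235, 1, "start of new note...."),
    (1234, 3, "more continuation of first note..."),
]

def combine_notes(rows):
    """Return one (NoteID, first Sequence, joined NoteText) tuple per note."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # groupby needs sorted input
    combined = []
    for note_id, group in groupby(ordered, key=itemgetter(0)):
        parts = list(group)
        text = " ".join(part[2] for part in parts)
        combined.append((note_id, parts[0][1], text))
    return combined

combined = combine_notes(rows)
for row in combined:
    print(row)
```

A script component would follow the same pattern: keep appending while the NoteID is unchanged, and emit a row when it changes.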

Related

How to transform data when we have comma separated values in csv format file in snowflake

I have a CSV-format data set from Excel with the following data:
Columns: id, product_name, sales, quantity, Profit
Data: 1, "Novimex Executive Leather Armchair, Black","$3,709.40", 9, -$288.77
When I try to load these records from the stage into a Snowflake table, the data shifts to the right starting at the product name column, because the value contains a comma (", Black"). The following columns are shifted in the same way. After loading, the data looks like this:
+----+-------------------------------------+--------+----------+---------+
| id | product_name | sales | quantity | Profit |
+----+-------------------------------------+--------+----------+---------+
| 1 | "Novimex Executive Leather Armchair | Black" | $3 | 709.40" |
+----+-------------------------------------+--------+----------+---------+
Query used:
copy into orders_staging (id, Product_Name, Sales, Quantity, Profit)
from (
    select $1, $2, $3, $4, $5
    from @sales_data_stage
)
file_format = (type = csv field_delimiter = ',' skip_header = 1 ENCODING = 'iso-8859-1');
Use a field enclosure by adding this option to the file format:
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
If accounting-style numbers such as $3,709.40 cause problems, make sure they are enclosed in double quotes in the file too.
https://community.snowflake.com/s/question/0D50Z00008pDcoRSAS/copying-csv-files-delimited-by-commas-where-commas-are-also-enclosed-in-strings
Additional documentation for COPY INTO <table>:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-csv
Additional documentation for CREATE FILE FORMAT:
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
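To see why the enclosure setting matters, the sample line can be parsed two ways in Python: a naive split on commas, and a quote-aware parse. This is only an illustration of the parsing behavior, not Snowflake code. The skipinitialspace option is an assumption made because the sample line has a space after some delimiters.

```python
import csv
import io

# The data line from the question, including the spaces after some commas.
line = '1, "Novimex Executive Leather Armchair, Black","$3,709.40", 9, -$288.77\n'

# Naive split: every comma is a delimiter, so quoted values are broken apart.
naive = line.strip().split(",")

# Quote-aware parse: commas inside double quotes stay part of the field.
reader = csv.reader(io.StringIO(line), quotechar='"', skipinitialspace=True)
row = next(reader)

print(len(naive), naive)
print(len(row), row)
```

The naive split yields seven pieces, which matches the shifted columns in the question, while the quote-aware parse yields the expected five fields.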

How can I write a query that returns two different columns for two different conditions?

I have a Microsoft SQL Server table with three columns: one for locations, one for the associated images, and one indicating whether a given image is meant to be used as the main image for a given location. (Historically, multiple images were uploaded for each location, and this column indicated which image actually gets used.)
Now we want to be able to pick and assign a second image as a logo for our locations, so a fourth column was added to indicate which image becomes that logo.
So I have a table that looks something like this:
+----------+----------+-----------------+-----------+
| filename | location | IsMainImage | IsLogo |
+----------+----------+-----------------+-----------+
| img1 | 10 | True | Null |
| img2 | 10 | Null | True |
| img3 | 10 | Null | Null |
| img4 | 20 | True | Null |
| img5 | 20 | NULL | True |
+----------+----------+-----------------+-----------+
My goal is to write a query that returns img1 and img2 as different columns within the same row, followed by img4 and img5 in another row. From the table above, I need the output to look like this:
+-----------+-------------+
| filename1 | filename2 |
+-----------+-------------+
| img1 | img2 |
| img4 | img5 |
+-----------+-------------+
Please note that my description is an oversimplification. I am modifying an SSIS package that is consumed by another process that I cannot modify, which is why I need the output in this format.
What became Filename1 and Filename2 used to be the same file (logo was a resized version of the main image) and now I need to differentiate between the two.
It is crucial that only the files flagged as IsMainImage appear under filename1 and only the files flagged as IsLogo appear under filename2.
I would appreciate any help with this. Thank you!
Using a CASE expression:
SELECT MAX(CASE WHEN [IsMainImage] = 'True' THEN filename END) AS filename1,
       MAX(CASE WHEN [IsLogo] = 'True' THEN filename END) AS filename2
FROM Table_testcase
GROUP BY location;
You can join the table with itself, something like this:
SELECT im1.filename as filename1,im2.filename as filename2
FROM images im1
JOIN images im2
ON im1.location = im2.location
AND im2.IsLogo = 1
WHERE im1.IsMainImage = 1
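Both approaches can be tried against the sample data without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module, so it runs SQLite rather than T-SQL, and it assumes the flags are stored as 1 or NULL. The table name images follows the join answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (filename TEXT, location INTEGER, "
    "IsMainImage INTEGER, IsLogo INTEGER)"
)
# Sample rows from the question; flags stored as 1 or NULL (an assumption).
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        ("img1", 10, 1, None),
        ("img2", 10, None, 1),
        ("img3", 10, None, None),
        ("img4", 20, 1, None),
        ("img5", 20, None, 1),
    ],
)

# Conditional aggregation: one row per location, one column per flag.
by_case = conn.execute(
    """
    SELECT MAX(CASE WHEN IsMainImage = 1 THEN filename END) AS filename1,
           MAX(CASE WHEN IsLogo = 1 THEN filename END) AS filename2
    FROM images
    GROUP BY location
    ORDER BY location
    """
).fetchall()

# Self-join: pair each main image with the logo at the same location.
by_join = conn.execute(
    """
    SELECT im1.filename, im2.filename
    FROM images im1
    JOIN images im2
      ON im1.location = im2.location AND im2.IsLogo = 1
    WHERE im1.IsMainImage = 1
    ORDER BY im1.location
    """
).fetchall()

print(by_case)
print(by_join)
conn.close()
```

On this data the two queries agree. They differ when a location has a main image but no logo: the CASE version still returns that location with an empty filename2, while the inner join drops it.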

How can I update many rows with different values for the same column?

I have a table with a column containing a path to a file. The path is an absolute path, and values for this column look like this: C:\CI\Media\animal.jpg.
The table looks like so, except there are many rows so editing by hand is not practical:
+----+-----------------------------------+
| ID | Path                              |
+----+-----------------------------------+
| 1  | C:\CI\Media\sushi.jpg             |
| 2  | C:\CI\Media\animal.jpg            |
| 3  | C:\CI\Media\Tuscany Trip\pisa.png |
+----+-----------------------------------+
Path is an nvarchar(260).
What I'd like to do is run a query that updates each record so that its path replaces C:\CI\ with C:\CI\Net\, ending up with a table that looks like this:
+----+---------------------------------------+
| ID | Path                                  |
+----+---------------------------------------+
| 1  | C:\CI\Net\Media\sushi.jpg             |
| 2  | C:\CI\Net\Media\animal.jpg            |
| 3  | C:\CI\Net\Media\Tuscany Trip\pisa.png |
+----+---------------------------------------+
Is there a way to write a query that updates every record based on its existing value (replacing the C:\CI portion with C:\CI\Net while keeping the rest of the value), instead of setting every row to the same value like a normal Update table set column = value?
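Gosh, you almost wrote the code yourself. Including the trailing backslash and adding a WHERE clause keeps the update from touching paths such as C:\CInema\... or paths that were already moved:
Update YourTable
set path = replace(path, 'C:\CI\', 'C:\CI\Net\')
where path like 'C:\CI\%'
  and path not like 'C:\CI\Net\%'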
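The same idea, deriving each new value from the row's existing value, can be sketched outside SQL. One caveat with a plain substring replace is that it matches anywhere in the string and matches again if the update is rerun. The Python sketch below anchors on the prefix instead. The function name move_under_net and the C:\CInema sample path are made up for the illustration.

```python
OLD_PREFIX = "C:\\CI\\"
NEW_PREFIX = "C:\\CI\\Net\\"

def move_under_net(path):
    """Rewrite the leading C:\\CI\\ to C:\\CI\\Net\\; leave other paths alone."""
    if path.startswith(NEW_PREFIX) or not path.startswith(OLD_PREFIX):
        return path
    return NEW_PREFIX + path[len(OLD_PREFIX):]

paths = [
    r"C:\CI\Media\sushi.jpg",
    r"C:\CI\Media\Tuscany Trip\pisa.png",
    r"C:\CInema\clip.mp4",  # hypothetical path that only looks similar
]
updated = [move_under_net(p) for p in paths]
for p in updated:
    print(p)
```

Running the function a second time on its own output changes nothing, which makes the update safe to repeat.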

Usage of " " inside a concat statement in excel

I'm cleansing the data in a database and am currently changing upper-case names into proper case. I'm using Excel to build an update statement for each row, like this:
|   | A    | B  | C                | D                                                                 |
|---|------|----|------------------|-------------------------------------------------------------------|
| 1 | Name | id | Proper case name | SQL Statement                                                     |
| 2 | AAAA | 1  | Aaaa             | =CONCAT("UPDATE table SET Name = "'",C2,"'" WHERE id = ",B2,";") |
| 3 | BBBB | 2  | Bbbb             | =CONCAT("UPDATE table SET Name = "'",C3,"'" WHERE id = ",B3,";") |
The SQL statements should look something like this:
UPDATE table SET Name = 'Aaaa' WHERE id = 1
UPDATE table SET Name = 'Bbbb' WHERE id = 2
I'm finding it difficult to get the apostrophes around the name.
I think you need:
=CONCATENATE("UPDATE table SET Name = '",C2,"' WHERE id = ",B2,";")
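If generating the statements outside Excel is an option, the same string building can be sketched in Python. The sketch also doubles any apostrophe inside a name, since a name such as O'Neil would otherwise end the SQL string literal early. The third sample row and the function name update_statement are made up for the illustration.

```python
# (id, upper-case name) pairs; the last row is a hypothetical edge case.
rows = [(1, "AAAA"), (2, "BBBB"), (3, "O'NEIL")]

def update_statement(row_id, name):
    """Build one UPDATE statement with the name in proper case."""
    proper = name.title()
    escaped = proper.replace("'", "''")  # double embedded single quotes
    return f"UPDATE table SET Name = '{escaped}' WHERE id = {row_id};"

statements = [update_statement(row_id, name) for row_id, name in rows]
for statement in statements:
    print(statement)
```

In the Excel formula, a similar effect should be achievable by wrapping the name cell in SUBSTITUTE(C2,"'","''").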

Loops through Access Table and for each Column with Data

I think this should be simple, but I can't find the right way to do it. I have a table with an ID number column, followed by 10 columns labeled Question #1, Question #2, and so forth.
There are no duplicate ID numbers, but each ID number could have more than one question filled in.
I would like to take each ID row and, for each question column that has a value, create a new row with the same ID. So if an ID number has a question listed under Question #1 and Question #2, I'd like to duplicate that ID number and have both questions listed under one column, let's call it "Total Questions", grouped by that ID number. This can be done by creating a new table.
Example:
From:
+-------+---------------------------+---------------------------+
| ID | Question #1 | Question #2 |
+-------+---------------------------+---------------------------+
| 11111 | Was it notated correctly? | Was it completed on time? |
+-------+---------------------------+---------------------------+
To:
+-------+-------------------------------------+
| ID | Total Questions |
+-------+-------------------------------------+
| 11111 | Was it notated correctly? |
| 11111 | Was it completed on time? |
+-------+-------------------------------------+
A simple solution using DAO:
Sub SomeProcedure()
    Dim db As DAO.Database, recIn As DAO.Recordset, recOut As DAO.Recordset
    Dim i As Integer

    Set db = CurrentDb()
    Set recIn = db.OpenRecordset("yourQuestionsInputTable", dbOpenDynaset, dbReadOnly)
    ' dbAppendOnly: the output recordset is only used to add new rows
    Set recOut = db.OpenRecordset("yourQuestionsOutputTable", dbOpenDynaset, dbAppendOnly)

    With recIn
        Do Until .EOF   ' also safe when the input table is empty
            ' The Fields collection is zero-based: valid indexes are 0 to Count - 1
            For i = 0 To .Fields.Count - 1
                If Left(.Fields(i).Name, 8) = "Question" Then
                    If Not IsNull(.Fields(i).Value) Then   ' skip empty question columns
                        recOut.AddNew
                        recOut.Fields("ID") = .Fields("ID").Value
                        recOut.Fields("Total Questions") = .Fields(i).Value
                        recOut.Update
                    End If
                End If
            Next i
            .MoveNext
        Loop
    End With

    recIn.Close
    recOut.Close
    Set db = Nothing
End Sub
The explanation:
What I'm doing is:
Read each record from the input table.
For each column whose name begins with "Question", create a new record in the output table, with the ID from the input table and the value of the selected column.
This is just a draft. You'll need to tweak the code to fit your needs.
Hope this helps.
Alternatives
After thinking about it a little, I may have an alternative that avoids the problem you mention in your comments.
I think you can change the loop like this:
' You'll need a variable of type Field
Dim f As DAO.Field
' Some code
With recIn
    Do Until .EOF
        For Each f In .Fields
            If Left(f.Name, 8) = "Question" Then
                recOut.AddNew
                recOut.Fields("ID") = .Fields("ID").Value
                recOut.Fields("Total Questions") = f.Value
                recOut.Update
            End If
        Next f
        .MoveNext
    Loop
End With
' More code
Instead of iterating over the Fields collection by index, this iterates over each Field object in it directly. That should avoid the "Item not found in collection" error, which an out-of-range index causes because the collection is zero-based.
Warning: Not tested
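Try a couple of queries like this. The first one creates NewTable; the following ones need to be append queries, because SELECT ... INTO creates a new table rather than adding rows to an existing one:
SELECT ID, Question1 AS TotalQuestions
INTO NewTable
FROM OriginalTable;
INSERT INTO NewTable (ID, TotalQuestions)
SELECT ID, Question2
FROM OriginalTable;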
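All of the answers above perform the same transformation, usually called an unpivot: each question column that has a value becomes its own row under the same ID. A minimal Python sketch of that transformation follows. The second record and the function name unpivot_questions are made up for the illustration, and the column names follow the question's example.

```python
# Input rows shaped like the question's example; ID 22222 is hypothetical.
records = [
    {"ID": 11111,
     "Question #1": "Was it notated correctly?",
     "Question #2": "Was it completed on time?"},
    {"ID": 22222,
     "Question #1": "Was it signed?",
     "Question #2": None},
]

def unpivot_questions(records):
    """Emit one {ID, Total Questions} row per non-empty question column."""
    out = []
    for record in records:
        for column, value in record.items():
            if column.startswith("Question") and value:
                out.append({"ID": record["ID"], "Total Questions": value})
    return out

long_rows = unpivot_questions(records)
for row in long_rows:
    print(row)
```

Empty question columns produce no output row, which matches the "where applicable" requirement in the question.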

Resources