Loading a Flat File into SQL Server using SSIS

I'm new to SSIS and am trying to import a flat file into my database. There are six different rows in the flat file that I need to combine into one row in the database; each of these rows contains a different price for one symbol. For example:
IGBGK 21 w 47
IGBGK 21 u 2.9150
IGBGK 21 h 2.9300
IGBGK 21 l 2.9050
IGBGK 22 h 2.9300
IGBGK 22 l 2.8800
Each of these is on a different row in the flat file, but together they should become one row, in different columns, for symbol IGBGK. I can transform the data to place each number into its own column, but I cannot get them to combine into one row.
Any help on the direction I need to go with this is greatly appreciated.
End product should look like:
Symbol | col 1 | col 2 | col 3 | col 4 | col 5 | col 6
-------+-------+-------+-------+-------+-------+-------
IGBGK | 47 | 2.915 | 2.930 | 2.905 | 2.930 | 2.880

1. Create a variable (any name you like) with the Object data type.
2. Add an Execute SQL Task.
3. Use a query like this against your table:
WITH ABC
AS
(SELECT * FROM table -- which gives you the original result
)
SELECT * FROM ABC
-- MAX keeps the price itself; COUNT would only return how many rows matched
PIVOT (MAX(**4th Column Name**) FOR **1st Column Name** IN ([col 1],[col 2],[col 3],[col 4],[col 5],[col 6]))
4. Copy the complete query into that task and set ResultSet to "Full result set".
5. Switch to the Result Set page, choose the variable you created, and set the Result Name to 0.
6. Now, every time you run the package, the variable is assigned the complete result table in the format you described above.
7. Create another seven variables, one for each column (Symbol, [col 1], ... [col 6]). Each should have the String data type.
8. Add another Execute SQL Task and set SQLSourceType to Variable. On the Parameter Mapping page, choose the Object variable and set its Parameter Name to 0. Then, on the Result Set page, add the seven variables one by one and set their Result Names to 0, 1, 2, 3, 4, 5, 6.
From now on, every time you run the package, each variable is assigned its value. If you want to load them into a target table, here is the last step.
Add another Execute SQL Task with a query like this:
Insert into table
select ?,?,?,?,?,?,?
On the Parameter Mapping page, add all seven variables and set their Parameter Names to 0, 1, 2, 3, 4, 5, 6, one by one, to map them to the ? placeholders.
There may be some small issues you need to work out yourself, such as data types, but this is the general logic.
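If you want to sanity-check the pivot logic outside SSIS, here is a minimal sketch in Python using the standard sqlite3 module rather than SQL Server. The staging table, its column names (symbol, day, kind, price) and the function name pivot_prices are assumptions made up for illustration. It uses MAX(CASE ...) conditional aggregation, which produces the same shape of result as a PIVOT with MAX.

```python
import sqlite3

# Sample rows taken from the question: (symbol, day, kind, price).
ROWS = [
    ("IGBGK", 21, "w", 47.0),
    ("IGBGK", 21, "u", 2.9150),
    ("IGBGK", 21, "h", 2.9300),
    ("IGBGK", 21, "l", 2.9050),
    ("IGBGK", 22, "h", 2.9300),
    ("IGBGK", 22, "l", 2.8800),
]

# Assumed mapping of (day, kind) pairs to output columns col1..col6.
KEYS = [(21, "w"), (21, "u"), (21, "h"), (21, "l"), (22, "h"), (22, "l")]


def pivot_prices(rows, keys):
    """Return one tuple per symbol with one price column per (day, kind) key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging (symbol TEXT, day INTEGER, kind TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)

    # One conditional aggregate per output column; the keys are bound as parameters.
    cols = ", ".join(
        f"MAX(CASE WHEN day = ? AND kind = ? THEN price END) AS col{i}"
        for i in range(1, len(keys) + 1)
    )
    params = [value for key in keys for value in key]
    sql = f"SELECT symbol, {cols} FROM staging GROUP BY symbol ORDER BY symbol"
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_prices(ROWS, KEYS):
        print(row)
```

A symbol that is missing one of the six rows simply gets NULL (None in Python) in that column, because the CASE expression has no ELSE branch.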
Hope this helps!

Related

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns, but the strings vary in length and do not always contain the same number of columns. I'm not exactly sure how to do this, so I was looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
In some cases the string might not have all the medical conditions, just some of them.
I need to split it into columns where the column name is the text between the tildes (e.g. MedCond1) and the value is the text to the right of the second tilde but before the pipe, so that I end up with something like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.1 24 110 64 100 88 21 79
I need to do this for a lot of rows within a large table. As I said, not all the columns are always present, but the names never differ: you might have MedCond1 to MedCond8 in one row, then only MedCond3, MedCond4 and MedCond7 in another.
Here is a query I created that is roughly what I want, but it is not dynamic, so it picks up the values along with some extra bits of the string:
select MainCol, case when charindex('MedCond1', MainCol) > 0 then
substring(MainCol, charindex('MedCond1', MainCol) + 9, 4) end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up with an additional part of the string because the length passed to substring is hard-coded. The value is sometimes 4 characters long with a decimal place, sometimes 2 characters long with no decimal place. I would like to make this dynamic. The pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name.
Thanks for any thoughts on making this dynamic.
Andrew
This data looks like a table itself. It could have been stored in SQL Server as xml. SQL Server supports xml fields and allows querying them. In fact, one could try to convert this string to XML, then try to query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace `|` with <item> tags and `~` with <tag> tags
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
            replace(item, '|', '</tag></item><item><tag>'),
            '~', '</tag><tag>')
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select different tags and display them as fields
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is :
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
It would be better to perform this conversion when loading the data though. It's easier for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can modify this query too, to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml);
-- The statement before a CTE must be terminated with a semicolon
with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>') + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items;
-- Query the new table from now on
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
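If you do make the conversion at load time, as suggested above, the same split is only a few lines in most languages. Here is a minimal Python sketch of the idea; the function name split_items is made up for illustration, and it assumes every non-empty item has exactly three tilde-separated parts.

```python
SAMPLE = (
    "VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|"
    "SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|"
)


def split_items(raw):
    """Split 'A~B~C|D~E~F|' into a list of (code, name, value) tuples."""
    triples = []
    for chunk in raw.split("|"):
        if not chunk:
            # The trailing pipe leaves an empty chunk (the all-NULL row
            # in the XML query result); skip it here.
            continue
        code, name, value = chunk.split("~")  # raises ValueError if not 3 parts
        triples.append((code, name, value))
    return triples


if __name__ == "__main__":
    for triple in split_items(SAMPLE):
        print(triple)
```

Unlike the XML query, this sketch drops the empty item produced by the trailing pipe instead of returning a row of NULLs.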
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round-trip via a text file.
The approach uses the following steps:
1. Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
2. Round-trip the data from SQL Server to a text file and back
3. Separate the data into columns on the tilde ~ symbol delimiter
4. Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
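Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row. Note that CHAR(13) is a carriage return; use CHAR(10) or CHAR(13) + CHAR(10) instead if your export tool expects a different line ending.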
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to concatenate multiple lines within the same row with other lines in different rows.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 and 3 as the pipe symbol. I've put in the additional step 1 using newlines only to make it easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you'd have a trivially simple step of pivoting rows to columns, which can be achieved with a statement of the form:
SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) AS MedCond1
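To make the end result of the four steps concrete, here is a minimal Python sketch that does the unpivot and pivot in memory, without the text-file round-trip. The function name pivot_row and the fixed column list COLUMNS are assumptions for illustration. Unlike the SUM(CASE ... ELSE 0 END) form, it leaves missing conditions as None rather than 0.

```python
# Assumed fixed set of output columns: MedCond1 .. MedCond8.
COLUMNS = [f"MedCond{i}" for i in range(1, 9)]


def pivot_row(raw, columns=COLUMNS):
    """Turn one delimited string into a dict with one entry per column."""
    row = dict.fromkeys(columns)  # every column starts as None
    # Unpivot: one chunk per pipe-delimited item, skipping empty chunks.
    for chunk in filter(None, raw.split("|")):
        _code, name, value = chunk.split("~")
        # Pivot: place the value under its column name, if it is a known column.
        if name in row:
            row[name] = float(value)
    return row


if __name__ == "__main__":
    partial = "VS1~MedCond3~155|VS2~MedCond4~70|FiO2~MedCond7~21|"
    print(pivot_row(partial))
```

A row that only contains MedCond3, MedCond4 and MedCond7 still produces all eight columns, with the absent ones left empty.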

Sqoop & Hadoop - How to join/merge old data and new data imported by Sqoop in lastmodified mode?

Background:
I have a table with the following schema on a SQL server. Updates to existing rows is possible and new rows are also added to this table.
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-18 19:07:00.0 | 180
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
I am using Sqoop to add incremental updates in lastmodified mode. My --check-column parameter is the last_login_date column. In my first run, I got the above two records into Hadoop. Let's call this the current data. I noted that the last value (the max value of the check column from this first import) is 2016-06-18 19:07:00.0.
Assuming there is a change on the SQL server side, I now have the following changes on the SQL server side:
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-25 20:10:00.0 | 200
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
125-500 | 500 | 2016-06-28 19:54:00.0 | 1
I have the row 123-111 updated with a more recent last_login_date value and the count column has also been updated. I also have a new row 125-500 added.
On my second run, Sqoop looks at all rows with a last_login_date value greater than my known last value from the previous import: 2016-06-18 19:07:00.0
This gives me only the changed data, i.e. 123-111 and 125-500 records. Let's call this - new data.
Question
How do I do a merge join in Hadoop/Hive using the current data and the new data so that I end up with the updated version of 123-111, 124-100, and the newly added 125-500?
Loading changed data with Sqoop is a two-phase process.
1st phase: load the changed data into a temporary (staging) table using the sqoop import utility.
2nd phase: merge the changed data with the old data using the sqoop-merge utility.
If the table is small (say, a few million records), then simply do a full load using sqoop import.
Sometimes it's possible to load only the latest partition. In that case, use the sqoop import utility to load the partition with a custom query; then, instead of merging, simply insert overwrite the loaded partition into the target table, or copy the files. This will work faster than sqoop merge.
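Conceptually, the merge in the second phase keeps one record per key and lets the newer record win. The following Python sketch only illustrates that idea using the sample rows from the question; it is not how Sqoop implements the merge, and the function name merge_by_key is made up for illustration.

```python
# "Current data" from the first import.
CURRENT = [
    {"unique_id": "123-111", "user_id": 111,
     "last_login_date": "2016-06-18 19:07:00.0", "count": 180},
    {"unique_id": "124-100", "user_id": 100,
     "last_login_date": "2016-06-02 10:27:00.0", "count": 50},
]

# "New data" from the second, incremental import.
NEW = [
    {"unique_id": "123-111", "user_id": 111,
     "last_login_date": "2016-06-25 20:10:00.0", "count": 200},
    {"unique_id": "125-500", "user_id": 500,
     "last_login_date": "2016-06-28 19:54:00.0", "count": 1},
]


def merge_by_key(current, new, key="unique_id"):
    """Return one record per key; records in `new` replace those in `current`."""
    merged = {row[key]: row for row in current}
    for row in new:
        merged[row[key]] = row  # the newer record replaces the old one
    return sorted(merged.values(), key=lambda row: row[key])


if __name__ == "__main__":
    for record in merge_by_key(CURRENT, NEW):
        print(record)
```

With the sample data this yields the updated 123-111, the untouched 124-100, and the new 125-500, which is the outcome the question asks for.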
You can change the existing Sqoop query (by specifying a new custom query) to get ALL the data from the source table instead of getting only the changed data. Refer to using_sqoop_to_move_data_into_hive. This would be the simplest way to accomplish this, i.e. doing a full data refresh instead of applying deltas.

execute stored procedure with dbslim with Fitnesse (Selenium,Xebium)

https://github.com/markfink/dbslim
I'd like to execute the stored procedures with DbSlim using Fitnesse (Selenium, Xebium)
now what I tried to do is:
!define dbQuerySelectCustomerbalance (
execute dbo.uspLogError
)
| script | Db Slim Select Query | !-${dbQuerySelectCustomerbalance}-! |
This gives a green indicator; however, Microsoft SQL Server Profiler shows no actions or logging.
So what I'd like to know is: is it possible to use DbSlim for executing stored procedures and, if so, what is the correct way to do it?
By the way, I have the connection to the database on one page, and I included that page in the query page. Is that OK?
Take out the !- ... -!. It is used to escape wikified words. But in this case you want it to be translated to the actual query.
!define dbQuerySelectCustomerbalance ( execute dbo.uspLogError )
| script | Db Slim Select Query | ${dbQuerySelectCustomerbalance} |
| show | data by column index | 1 | and row index | 1 |
You can add the last line, which outputs the first column of the first row, for testing purposes if your SP returns a result (or you can create a simple SP just to test this out).
Specifying the connection anywhere before this block is fine, be it on the same page or in a SetUp/SuiteSetUp/normal page that is included or executed before it.

SSIS "Enumerator failed to retrieve element at index" Error

In my SSIS package I am using a data flow task to extract data from SQL Server and put it into a dataset with the following schema:
Column1 Int32
Column2 Object
Column3 Object
Column4 String
Column5 Double
That step seems to work well. In the foreach editor I mapped the columns to variables like this:
VARIABLE | INDEX
User::Column1 | 0
User::Column2 | 1
User::Column3 | 2
User::Column4 | 3
User::Column5 | 4
When I run the package I get the following error on the foreach task:
Error: The enumerator failed to retrieve element at index "4".
Error: ForEach Variable Mapping number 5 to variable "User::Column5" cannot be applied.
There are no null values in Column5 and I can clearly see all 5 columns in the query when I run it against the database. Any assistance is greatly appreciated!
I finally found the problem. The target dataset in the data flow task was dropping the last column for some reason. Once I recreated the dataset destination everything worked.

Loop 5 records at a time and assign it to variable

I have a table of 811 records. I want to get five records at a time and assign them to a variable. On the next iteration of the Foreach Loop task in SSIS, it should read the next five records and overwrite the variable. I have tried doing this with a cursor but couldn't find a solution. Any help will be highly appreciated. My table looks like this, for example:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
6 Opq66
7 Rst77
. .
. .
. .
I want the query to take the first five names, as follows, and assign them to the variable:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
Then the next loop takes another five names and overwrites the variable value, and so on until the last record is consumed.
Taking ltn's answer into consideration this is how you can achieve limiting the rows in SSIS.
The Design will look like
Step 1 : Create the variables
Name DataType
Count int
Initial int
Final int
Step 2 : For the 1st Execute SQL Task write the sql to store the count
Select count(*) from YourTable
In the General tab of this task Select the ResultSet as Single Row.
In the ResultSet tab map the result to the variable
ResultName VariableName
0 User::Count
Step 3 : In the For Loop container enter the expression as shown below
Step 4 : Inside the For Loop drag an Execute SQL Task and write the expression
In Parameter Mapping map the initial variable
VariableName Direction DataType ParameterName ParameterSize
User::Initial Input NUMERIC 0 -1
Result Set tab
Result Name Variable Name
0 User::Final
Inside the Data Flow Task you can write the SQL to get the particular rows.
Click on Parameters and select the variables Initial and Final.
If your data will not be updated between paging cycles and the sort order is always the same, then you could try an approach similar to:
CREATE PROCEDURE TEST
(
    @StartNumber INT,
    @TakeNumber INT
)
AS
SELECT TOP(@TakeNumber)
    *
FROM (
    SELECT
        RowNumber = ROW_NUMBER() OVER (ORDER BY IDField DESC),
        NameField
    FROM
        TableName
) AS X
WHERE RowNumber >= @StartNumber
-- Without an ORDER BY, TOP may return any qualifying rows
ORDER BY RowNumber
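The paging arithmetic behind this is easy to check outside SSIS. Here is a minimal Python sketch, assuming the 811 records are already in a list; the names batches and SERVERS are made up for illustration, and the generated server names are placeholders rather than real data.

```python
def batches(records, size=5):
    """Yield consecutive slices of `records`, each with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Placeholder data standing in for the 811-row table (ServerId, ServerName).
SERVERS = [(i, f"Server{i}") for i in range(1, 812)]

if __name__ == "__main__":
    for batch in batches(SERVERS):
        # Each iteration overwrites `batch`, like the loop variable in the package.
        print([name for _id, name in batch])
```

Because 811 is not a multiple of five, the last batch is shorter than the others, so whatever consumes the variable has to cope with fewer than five rows.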
