ssis merge join more than 2 data sets - sql-server

I'm working on an ssis package to fix some data from a table. The table looks something like this:
CustID  FieldID  INT_VAL   DEC_VAL  VARCHAR_VAL  DATE_VAL
1       1        23
1       2                  500.0
1       3                           David
1       4                                        4/1/05
1       5        52369871
2       1        25
2       2                  896.23
2       3                           Allan
2       4                                        9/20/03
2       5        52369872
I want to transform it into this:
CustID FirstName AccountNumber Age JoinDate Balance
1 David 52369871 23 4/1/05 500.0
2 Allan 52369872 25 9/20/03 896.23
Currently, my SSIS package pulls in the data from the source table, does a conditional split on the field ID, and then generates a derived column on each split. The part I'm stuck on is joining the data back together on the CustID.
However, the Merge Join transformation only joins 2 data sets at a time, and in the end I will need to join about 30 data sets. Is there a good way to do that without chaining a bunch of merge joins?

That seems a bit awkward, why not just do it in a query?
select
CustID,
max(case when FieldID = 3 then VARCHAR_VAL else null end) as 'FirstName',
max(case when FieldID = 5 then INT_VAL else null end) as 'AccountNumber',
max(case when FieldID = 1 then INT_VAL else null end) as 'Age',
max(case when FieldID = 4 then DATE_VAL else null end) as 'JoinDate',
max(case when FieldID = 2 then DEC_VAL else null end) as 'Balance'
from
dbo.StagingTable
group by
CustID
If your source system is MSSQL, then you can use that query from SSIS or even create a view in the source database (if you're allowed to). If not, then copy the data directly to a staging table in MSSQL and query it from there.
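As a rough sketch of the view option mentioned above, the same query could be wrapped like this. The view name dbo.CustomerPivoted is made up for illustration, and dbo.StagingTable is the placeholder table name from the query above; adjust both to your schema.

```sql
-- Hypothetical view name; dbo.StagingTable is the placeholder used in the answer.
CREATE VIEW dbo.CustomerPivoted
AS
SELECT
    CustID,
    -- Each CASE yields a value only for the matching FieldID;
    -- MAX ignores the NULLs and collapses the group to one row per customer.
    MAX(CASE WHEN FieldID = 3 THEN VARCHAR_VAL END) AS FirstName,
    MAX(CASE WHEN FieldID = 5 THEN INT_VAL END)     AS AccountNumber,
    MAX(CASE WHEN FieldID = 1 THEN INT_VAL END)     AS Age,
    MAX(CASE WHEN FieldID = 4 THEN DATE_VAL END)    AS JoinDate,
    MAX(CASE WHEN FieldID = 2 THEN DEC_VAL END)     AS Balance
FROM dbo.StagingTable
GROUP BY CustID;
GO

-- The SSIS source component could then read from the view directly.
SELECT CustID, FirstName, AccountNumber, Age, JoinDate, Balance
FROM dbo.CustomerPivoted;
```

With about 30 fields this means 30 CASE expressions in one query, which is still a single pass over the table rather than a chain of merge joins.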

Related

Grouping ID while counting specific attribute values

I want to count how many occurrences there are of the value 1 in the attribute Months for each ID in a table.
Here is what I am working with:
ID    Months
1000 1
1000 1
1000 2
1001 2
1002 3
1003 1
This is what I would like to have:
ID    Count(Months=1)
1000 2
1003 1
If you want to count rows for just one month, you can use a WHERE clause for filtering:
select id,
count(*) as cnt
from your_table
where month = 1
group by id;
If you want to get counts for multiple months in one row (this is called pivoting), you can use conditional aggregation in most databases:
select id,
count(case when month = 1 then 1 end) as cnt_month_1,
count(case when month = 2 then 1 end) as cnt_month_2,
count(case when month = 3 then 1 end) as cnt_month_3,
. . .
from your_table
group by id;
Some databases offer a PIVOT operator for this task. For that, you'll need to specify which database you are using.
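For illustration only, here is roughly what the PIVOT form looks like if the database turns out to be SQL Server. The table and column names (your_table, id, month) are the placeholders from the queries above, and the helper column hit is made up so that the aggregated column differs from the pivoted one.

```sql
-- Sketch assuming SQL Server; your_table, id and month are placeholders.
SELECT id,
       [1] AS cnt_month_1,
       [2] AS cnt_month_2,
       [3] AS cnt_month_3
FROM (
    -- hit is a hypothetical helper column: one countable value per row
    SELECT id, [month], 1 AS hit
    FROM your_table
) AS src
PIVOT (
    COUNT(hit) FOR [month] IN ([1], [2], [3])
) AS p;
```

The list of month values still has to be written out by hand, so for a fixed set of months this is mostly a matter of taste compared with conditional aggregation.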

Efficiently select nested dependency tables and multiple columns in a single row using join in SQL server

I have one parent table "Employee". The employee information is stored in three child tables, and each child table has one child table of its own. Consider the following tables:
Table: Employee (Level #1)
EmpId IsActive
__________________
1 1
2 1
3 1
4 0
5 0
6 1
Table: EmployeeEmail (Level #2)
EmpEmailId EmpId EmailId
______________________________
1 1 1
2 4 3
3 6 4
Table: EmailAddress (Level #3)
EmailId Email
____________________________
1 one#gmail.com
2 two#gmail.com
3 three#gmail.com
4 four#gmail.com
Table: EmployeePhone (Level #2)
EmpPhoneId EmpId PhoneId Type
____________________________________________
1 1 1 Mobile
2 2 2 Mobile
3 5 4 Fax
4 1 6 Fax
5 2 9 Home
Table: PhoneNumber (Level #3)
PhoneId PhoneNumber
_______________________
1 9912345671
2 9912345672
3 9912345673
4 9912345674
5 9912345675
6 9912345676
7 9912345677
8 9912345678
9 9912345679
Now I need to select the active employee records (full information). If the employee has a phone number, it should be returned; otherwise it should be NULL. I need the same for email.
My expected output:
EmpId Email Home Mobile Fax
____________________________________________________________________
1 one#gmail.com NULL 9912345671 9912345676
(...)
This question is similar to my previous question How to effeciently SELECT Nested dependency Tables using JOIN in SQL Server
How can I fetch the multiple phone numbers in a single row?
This is a simple case style pivot that looks like it would suit your needs:
select
e.EmpId
, Email = max(ea.Email)
, Home = max(case when ep.Type = 'Home' then pn.PhoneNumber else null end)
, Mobile = max(case when ep.Type = 'Mobile' then pn.PhoneNumber else null end)
, Fax = max(case when ep.Type = 'Fax' then pn.PhoneNumber else null end)
from Employee as e
left join EmployeeEmail as ee on e.EmpId = ee.EmpId
left join EmailAddress as ea on ee.EmailId = ea.EmailId
left join EmployeePhone as ep on e.EmpId = ep.EmpId
left join PhoneNumber as pn on ep.PhoneId = pn.PhoneId
where e.IsActive = 1
group by e.EmpId

SQL Pivot only select rows

I am attempting to pivot a database so that only certain rows become columns. Below is what my table looks like:
ID  QType   CharV     NumV
1   AccNum            10
1   EmpNam  John Inc  0
1   UW      Josh      0
2   AccNum            11
2   EmpNam  CBS       0
2   UW      Dan       0
I would like the table to look like this:
ID AccNum EmpNam
1 10 John Inc
2 11 CBS
I have two main problems I am trying to account for.
1st: the value that I am trying to get isn't always in the same column. AccNum is always in the NumV column, while EmpNam is always in the CharV column.
2nd: I need to find a way to ignore data that I don't want. In this example it would be the row with UW in the QType column.
Below is the code that I have:
SELECT *
FROM testTable
Pivot(
MAX(NumV)
FOR[QType]
In ([AccNum],[TheValue])
)p
But it's giving me the below result:
ID  CharV     AccNum  TheValue
1             10      NULL
2             11      NULL
2   CBS       NULL    NULL
2   Dan       NULL    NULL
1   John Inc  NULL    NULL
1   Josh      NULL    NULL
In this case grouping with conditional aggregation should work. Try something like:
SELECT ID
, MAX(CASE WHEN QType = 'AccNum' THEN NumV END) AS AccNum
, MAX(CASE WHEN QType = 'EmpNam' THEN CharV END) AS EmpNam
FROM testTable
GROUP BY ID
Since the inner CASE only returns a value when the WHEN condition is met (and NULL otherwise), the MAX function will give you the desired value. Of course, this only works as long as each QType appears at most once per ID.
Generally, PIVOT in SQL Server doesn't work in one step when your conditions are complex, especially when you need values from different columns. You could pivot your table in two queries and join those, but it would perform poorly and is less readable than my suggestion.
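For comparison, the two-pivot-and-join alternative mentioned above might look roughly like the sketch below. It assumes the testTable columns from the question; the aliases src, p, n and c are arbitrary. Each derived table selects only the columns it needs, which avoids the extra rows the original attempt produced by carrying CharV along as an implicit grouping column.

```sql
-- Sketch of the less readable alternative: one PIVOT per value column, then a join.
SELECT n.ID, n.AccNum, c.EmpNam
FROM (
    -- Numeric values: only ID, QType and NumV reach the PIVOT
    SELECT ID, [AccNum]
    FROM (SELECT ID, QType, NumV FROM testTable) AS src
    PIVOT (MAX(NumV) FOR QType IN ([AccNum])) AS p
) AS n
INNER JOIN (
    -- Character values: only ID, QType and CharV reach the PIVOT
    SELECT ID, [EmpNam]
    FROM (SELECT ID, QType, CharV FROM testTable) AS src
    PIVOT (MAX(CharV) FOR QType IN ([EmpNam])) AS p
) AS c
    ON c.ID = n.ID;
```

Rows with QType = 'UW' are ignored simply because UW is not listed in either IN clause.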

Transposing only few columns in SQL Server

I have 4 columns in my table like :
key cusi isi name
1 46644UAQ1 US46642EAV83 A
1 46644UAR9 XS0062104145 A
1 254206AC9 A
2 05617YAJ8 US86359AXP38 B
2 885220BP7 B
2 null B
3 885220BP5 885220BP7345 c
The key and name values are duplicated because of the cusi and isi columns. I would like to transpose only a few columns, in this case cusi and isi, so that I get one record for key = 1 and another for key = 2. In my use case there can be at most 3 distinct cusi or 3 distinct isi values.
The transpose table should like
key name cusi1 cusi2 cusi3 isi1 isi2 isi3
1 A 46644UAQ1 46644UAR9 254206AC9 US46642EAV83 XS0062104145 NULL
2 A 46644UAR9 05617YAJ8 885220BP7 US86359AXP38 NULL NULL
3 c 885220BP5 null null 885220BP7345 NULL NULL
In some cases there might be only 1 row, as for key = 3 in the example above.
I know that SQL has PIVOT and UNPIVOT queries, but I am not sure how to use them to transpose selected columns of a table.
Any help would be of great help.
Thanks
If you know that each key-name group will have a fixed maximum number of records (three, based on the sample data you gave us), then ordinary conditional aggregation, without PIVOT, should work. In the query below, I use the row number to distinguish each of the three columns you want for cusi and isi in your result set.
-- key is a reserved word in SQL Server, so it is bracketed throughout
SELECT t.[key],
       t.name,
       MAX(CASE WHEN t.rn = 1 THEN t.cusi END) AS cusi1,
       MAX(CASE WHEN t.rn = 2 THEN t.cusi END) AS cusi2,
       MAX(CASE WHEN t.rn = 3 THEN t.cusi END) AS cusi3,
       MAX(CASE WHEN t.rn = 1 THEN t.isi END) AS isi1,
       MAX(CASE WHEN t.rn = 2 THEN t.isi END) AS isi2,
       MAX(CASE WHEN t.rn = 3 THEN t.isi END) AS isi3
FROM
(
    SELECT [key],
           cusi,
           isi,
           name,
           -- number the rows within each key so they can be spread into columns
           ROW_NUMBER() OVER(PARTITION BY [key] ORDER BY cusi) AS rn
    FROM yourTable
) t
GROUP BY t.[key],
         t.name
Note that SQL Server also has a PIVOT operator, which is an alternative to what I gave.
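As a sketch of that PIVOT alternative, here is how the cusi half could be written. It assumes the same yourTable placeholder and row-number approach as the query above. A single PIVOT spreads only one value column, so isi would need a second PIVOT (or the conditional aggregation already shown).

```sql
-- Sketch: PIVOT on the row number, for the cusi columns only.
SELECT p.[key],
       p.name,
       p.[1] AS cusi1,
       p.[2] AS cusi2,
       p.[3] AS cusi3
FROM (
    -- Only key, name, cusi and rn are selected, so key and name
    -- become the implicit grouping columns of the PIVOT.
    SELECT [key],
           name,
           cusi,
           ROW_NUMBER() OVER (PARTITION BY [key] ORDER BY cusi) AS rn
    FROM yourTable
) AS src
PIVOT (
    MAX(cusi) FOR rn IN ([1], [2], [3])
) AS p;
```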

COUNT and COUNT DISTINCT for different groups

For a SQL Server based report,
Table:
CID Date ID Service Days
1 3/7/2016 1 Individual 3
2 4/5/2016 2 Individual 4
3 5/24/2016 1 Individual 3
4 4/4/2016 4 Group 2
5 4/4/2016 4 Group 2
6 2/18/2016 4 Group 2
7 5/5/2016 5 Group 1
8 5/5/2016 5 Group 1
I used this code:
SELECT
ID,
Service,
COUNT(CASE WHEN Days = 4 THEN 1 END) AS '4Days',
COUNT(CASE WHEN Days = 3 THEN 1 END) AS '3Days',
COUNT(CASE WHEN Days = 2 THEN 1 END) AS '2Days',
COUNT(CASE WHEN Days = 1 THEN 1 END) AS '1Day'
FROM Table T1
GROUP BY
ID,
Service
which gives me this Output:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 3 0
5 Group 0 0 0 2
What I want to do is not count the Group services as separate services for separate individuals, but just as one service per group. A Count Distinct used with the Date or ID could help me do that but I don't know how to make that play with the Individual services where I just wanna count them individually and not using DISTINCT. So the desired output is:
ID Service 4Days 3Days 2Days 1Day
1 Individual 0 2 0 0
2 Individual 1 0 0 0
4 Group 0 0 2 0
5 Group 0 0 0 1
I'll edit the post in case I oversimplified the problem since this is dummy data.
Looks like you could use distinct this way if you wanted:
count(distinct
case when Days = 1 then case when Service = 'Group' then 1 else "Date" end end
) as [1Day]
Depending on your indexing it's possible that introducing another column in the query would change the query plan. I suspect that probably isn't the case though.
If I understand correctly, the '2Days' count for the 'Group' service should be 2 when grouping on the 'Date' column. If so, try this:
SELECT
ID,
Service,
CASE WHEN MAX(t.days) = 4 THEN MAX(t.date) ELSE 0 END AS '4Days',
CASE WHEN MAX(t.days) = 3 THEN MAX(t.date) ELSE 0 END AS '3Days',
CASE WHEN MAX(t.days) = 2 THEN MAX(t.date) ELSE 0 END AS '2Days',
CASE WHEN MAX(t.days) = 1 THEN MAX(t.date) ELSE 0 END AS '1Day'
FROM table T1
OUTER APPLY (SELECT days,
COUNT(DISTINCT(date)) date
FROM Table WHERE days = t1.days GROUP BY days) t
GROUP BY id, service
ORDER BY ID
Based on your last edit, this is the most straightforward way I could think of to handle the query:
with cte as (
select id, service, days
from table t1
where service = 'Individual'
union all
select id, service, days
from table t1
where service = 'Group'
group by id, service, days, date
)
select id,
service,
count(case when days = 4 then 'X' end) as [4Days],
count(case when days = 3 then 'X' end) as [3Days],
count(case when days = 2 then 'X' end) as [2Days],
count(case when days = 1 then 'X' end) as [1Day]
from cte
group by id, service

Resources