SSIS Flat File Source Row Restructure - sql-server

I have a flat file source of thousands of records (> 100K in some cases). This source is externally procured, and I cannot request a layout revision.
In this flat file, each row contains four columns:
| User ID | Status_1 | Status_2 | Status_3
| 1337 | Green | Yellow | Red
| 1234 | Red | Red | Green
The destination table was designed to accept two columns:
| User ID | Status Codes
| 1337 | Green
| 1337 | Yellow
| 1337 | Red
| 1234 | Red
| 1234 | Red
| 1234 | Green
Until now, I have been running 3 different SSIS packages to my destination table, one for each Status Column in the Flat File.
What I would like is to use a single SSIS package, and create either another Flat File Destination or Temp Table to mirror the destination table, and import from there.
Is this achievable? If so, what are the best-practice tasks to use, rather than simply running UPDATE ... SET against the temp table?

heh looks like a case for good ole SQL. I would use an UNPIVOT on this one.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
That link has a great example which looks very similar to your data:
--Create the table and insert values as portrayed in the previous example.
CREATE TABLE pvt (VendorID int, Emp1 int, Emp2 int,
Emp3 int, Emp4 int, Emp5 int);
GO
INSERT INTO pvt VALUES (1,4,3,5,4,4);
INSERT INTO pvt VALUES (2,4,1,5,5,5);
INSERT INTO pvt VALUES (3,4,3,5,4,4);
INSERT INTO pvt VALUES (4,4,2,5,5,4);
INSERT INTO pvt VALUES (5,5,1,5,5,5);
GO
--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO
Back when I was data warehousing, half my job seemed like it was using UNPIVOT on crap data I got through spreadsheets.
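Applied to the layout in the question, the same pattern might look like the sketch below. It assumes the flat file has first been loaded into a staging table; the table variable @Staging and the output names StatusCode and StatusColumn are made up for illustration. Keep in mind that UNPIVOT needs the unpivoted columns to share one data type and length, and that it drops rows where the value is NULL.

```sql
-- Hypothetical staging table standing in for the rows loaded from the flat file.
DECLARE @Staging TABLE
(
    UserID   int,
    Status_1 varchar(10),
    Status_2 varchar(10),
    Status_3 varchar(10)
);

INSERT INTO @Staging (UserID, Status_1, Status_2, Status_3) VALUES
    (1337, 'Green', 'Yellow', 'Red'),
    (1234, 'Red',   'Red',    'Green');

-- Produces one output row per (UserID, status column) pair.
SELECT UserID, StatusCode
FROM
    (SELECT UserID, Status_1, Status_2, Status_3
     FROM @Staging) AS p
UNPIVOT
    (StatusCode FOR StatusColumn IN (Status_1, Status_2, Status_3)) AS unpvt;
```

In a single SSIS package this could run as the source query of the load into the destination table once the staging load has finished. The Unpivot transformation in the Data Flow is another option if you would rather not stage the data.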

Related

How to print out multiple rows of XML values using SQL

I have a database called Companies.
There is a table within Companies called Employees.
Within Employees there is a column that contains an XML response. The Column is called Data.
The XML Response looks like this
<Employee>
<Tenure>7</Tenure>
<Age>55</Age>
<OfficesVisited>
<int>1132</int>
<int>3345</int>
<int>7534</int>
</OfficesVisited>
</Employee>
What I would like my sql query to print out is:
OfficesVisited
1132
3345
7534
What I am currently getting is 113233457534
I am using this sql query:
use Companies
SELECT Employees.Data.query('(/Employee/OfficesVisited/int/text())') as OfficesVisited
FROM Employees
WHERE Employees.Employee_ID = 65035277
I've tried using OUTER APPLY and CROSS APPLY and I can get it into 3 rows but all three rows look like the above.
Can anyone help?
Thanks!
Please try the following. It shows how to use .nodes() and .value() XML methods correctly.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [data] XML);
INSERT INTO @tbl ([data]) VALUES
(N'<Employee>
<Tenure>7</Tenure>
<Age>55</Age>
<OfficesVisited>
<int>1132</int>
<int>3345</int>
<int>7534</int>
</OfficesVisited>
</Employee>')
, (N'<Employee>
<Tenure>77</Tenure>
<Age>58</Age>
<OfficesVisited>
<int>7777</int>
<int>8888</int>
</OfficesVisited>
</Employee>')
, (N'<Employee>
<Tenure>90</Tenure>
<Age>50</Age>
<OfficesVisited>
<int>1111</int>
</OfficesVisited>
</Employee>');
-- DDL and sample data population, end
-- .nodes() returns one row per <int> element; .value() extracts its text as INT
SELECT c.value('(./text())[1]','INT') AS OfficesVisited
FROM @tbl AS tbl
CROSS APPLY tbl.[data].nodes('/Employee/OfficesVisited/int') AS t(c)
WHERE id IN (1,2);
Output
+----------------+
| OfficesVisited |
+----------------+
| 1132 |
| 3345 |
| 7534 |
| 7777 |
| 8888 |
+----------------+
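Carried over to the table in the question, the query might look like this. It is only a sketch: the table, column, and ID are taken from the question, and it assumes the Data column is of type xml. If the XML is stored as text, it would need a CAST to xml first.

```sql
-- Sketch only: Employees, Data and Employee_ID are the names used in the question.
SELECT t.c.value('(./text())[1]', 'INT') AS OfficesVisited
FROM Employees AS e
CROSS APPLY e.[Data].nodes('/Employee/OfficesVisited/int') AS t(c)
WHERE e.Employee_ID = 65035277;
```

Because .nodes() shreds the document into one row per int element, each office number comes back on its own row instead of being concatenated.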

SQL - aggregate related accounts - manually set up ID

I have two tables:
Account & Amount column
list of related accounts
Data samples:
Account | Amount
--------+---------
001 | $100
002 | $150
003 | $200
004 | $300
Account | Related Account
--------+------------------
001 | 002
002 | 003
003 | 002
My goal is to be able to aggregate all related accounts. From table two - 001,002 & 003 are actually all related to each other. What I would like to be able to do is to get a sum of all related accounts. Possibly ID 001 to 003 as Account #1, so I can aggregate them.
Result below
ID | Account | Amount
-----+-----------+--------
#1 | 001 | $100
#1 | 002 | $150
#1 | 003 | $200
#2 | 004 | $300
I can then manipulate the above table as below (final result)
ID | Amount
-----+--------
#1 | $450
#2 | $300
I tried doing a join, but it doesn't quite achieve what I want. I still have a problem relating account 001 with 003 (they are indirectly related because 002 is related to both 001 and 003).
If anyone can point me to the right direction, will be much appreciated.
Well, you really made this harder than it should be.
If you could change the data in the second table, so it will not contain reversed duplicates (in your sample data - 2,3 and 3,2) it would simplify the solution.
If you could refactor both tables into a single table, where the related column is a self referencing nullable foreign key, it would simplify the solution even more.
Let's assume for a minute you can't do either, and you have to work with the data as provided. So the first thing you want to do is to ignore the reversed duplicates in the second table. This can be done using a common table expression and a couple of case expressions.
First, create and populate sample tables (Please save us this step in your future questions):
DECLARE @TAccount AS TABLE
(
    Account int,
    Amount int
)
INSERT INTO @TAccount (Account, Amount) VALUES
(1, 100),
(2, 150),
(3, 200),
(4, 300)
DECLARE @TRelatedAccounts AS TABLE
(
    Account int,
    Related int
)
INSERT INTO @TRelatedAccounts (Account, Related) VALUES
(1,2),
(2,3),
(3,2)
You want to get only the first two records from the @TRelatedAccounts table.
This is the AccountAndRelated CTE.
Now, you want to left join the @TAccount table with the results of this query, so for each Account we will have the Account, the Amount, and the Related Account, or NULL if the account is not related to any other account or is the first in the relationship chain.
This is the CTERecursiveBase CTE.
Then, based on that, you can create a recursive CTE (called CTERecursive), and finally select the sum of Amount from the recursive CTE, grouped by the root of the recursion.
Here is the entire script:
;WITH AccountAndRelated AS
(
    -- Normalize each pair so the larger account number comes first; DISTINCT then removes reversed duplicates
    SELECT DISTINCT CASE WHEN Account > Related THEN Account ELSE Related END AS Account,
                    CASE WHEN Account > Related THEN Related ELSE Account END AS Related
    FROM @TRelatedAccounts
)
, CTERecursiveBase AS
(
    SELECT A.Account, Related, Amount
    FROM @TAccount AS A
    LEFT JOIN AccountAndRelated AS R ON A.Account = R.Account
)
, CTERecursive AS
(
    -- Anchor: accounts that start a chain (no related account)
    SELECT Account AS Id, Account, Related, Amount
    FROM CTERecursiveBase
    WHERE Related IS NULL
    UNION ALL
    -- Recursive step: accounts related to one already in the chain inherit its root Id
    SELECT Id, B.Account, B.Related, B.Amount
    FROM CTERecursiveBase AS B
    JOIN CTERecursive AS R ON B.Related = R.Account
)
SELECT Id, SUM(Amount) AS TotalAmount
FROM CTERecursive
GROUP BY Id
Results:
Id TotalAmount
1 450
4 300
You can see a live demo on rextester.
Now, let's assume you can modify the data in the second table. You can use the AccountAndRelated CTE to find the records you need to keep in the @TRelatedAccounts table. Once the table holds only those records, you can skip the AccountAndRelated CTE and use @TRelatedAccounts directly in the CTERecursiveBase CTE.
You can see a live demo of that as well.
Finally, let's assume you can refactor your database. In that case, I would recommend merging the two tables into one, so your @TAccount table would look like this:
Account Amount Related
1 100 NULL
2 150 1
3 200 2
4 300 NULL
Then you only need the recursive cte.
Here is a live demo of that option as well.
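For reference, the single-table variant could be sketched as follows. This is an illustration built from the sample data above rather than the contents of the linked demo; the three-column @TAccount layout is the refactored design just described.

```sql
-- Refactored layout: Related is a nullable self-reference to Account.
DECLARE @TAccount AS TABLE
(
    Account int,
    Amount  int,
    Related int NULL
);

INSERT INTO @TAccount (Account, Amount, Related) VALUES
    (1, 100, NULL),
    (2, 150, 1),
    (3, 200, 2),
    (4, 300, NULL);

;WITH CTERecursive AS
(
    -- Anchor: accounts with no related account are the roots.
    SELECT Account AS Id, Account, Amount
    FROM @TAccount
    WHERE Related IS NULL
    UNION ALL
    -- Recursive step: each related account inherits the Id of its root.
    SELECT R.Id, A.Account, A.Amount
    FROM @TAccount AS A
    JOIN CTERecursive AS R ON A.Related = R.Account
)
SELECT Id, SUM(Amount) AS TotalAmount
FROM CTERecursive
GROUP BY Id;
```

With this sample data the totals match the earlier results: 450 for Id 1 and 300 for Id 4.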

Condensing Row Data into a View

I have data in my PeopleInfo table where there are some people that have multiple records that I am trying to combine together into one record for a view.
All of a person's data is almost the same across records, except for the PlanId and PlanName. So:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | NULL |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 2 | Plan B | Secondary | NULL | 5.00 |
I would like to have only one John Doe record in my view, which would look like this:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | 5.00 |
Where the Primary status determines which PlanId and PlanName to show. Can anyone help me with this query?
declare @t table (FNAME varchar(10), LNAME varchar(10), SSN varchar(10), PLANID int, PLANNAME varchar(10), stat varchar(10), Price1 decimal(18,2), Price2 decimal(18,2))
insert into @t (FNAME, LNAME, SSN, PLANID, PLANNAME, stat, Price1, Price2) values
('john', 'doe', '12345', 1, 'PlanA', 'primary', 9.00, NULL),
('john', 'doe', '12345', 1, 'PlanB', 'secondary', NULL, 8.00)
select
    FNAME,
    LNAME,
    SSN,
    MAX(PLANID) PLANID,
    MIN(PLANNAME) PLANNAME,
    MIN(stat) stat,
    MIN(Price1) Price1,
    MIN(Price2) Price2
from @t
GROUP BY FNAME, LNAME, SSN
(I can't yet add a comment, so have an answer.)
The only thing that troubles me here is that I am also determining which PlanId and PlanName to show, since they are different and I want to show a specific one based on the Status field of both records.
Then you don't even need grouping. It would be much simpler: just SELECT the rows WHERE 'Primary' = Status, assuming that (A) there will always be a row with this Status for each user, and (B) you are happy to ignore all others.
P.S. If you will only be using Primary and Secondary statuses, you might want to change the column to a bit named something like isPrimaryPlan, where 1 indicates true and 0 false. However, if you might bring in e.g. Bronze and Consolation Prize plans later, then you'll need to retain a more variable datatype. Perhaps store the plans in a separate table and have an int FOREIGN KEY to it... I could go on!
OK, I'm back after having a sleep, which has improved my brain slightly.
First, let the record reflect that I don't like the database design here. The People and Plans should be separate tables, linked by foreign keys - via a 3rd table, e.g. PeoplePlans. That takes me to another point: the people here have no primary key (at least not that you have specified). So when writing the below, I had to pick the SSN, assuming that will always be present and unique.
Anyway, something like this should work, with the caveat that I'm not going to replicate the database structure to test it.
select
FirstName,
LastName,
SSN,
PlanId,
PlanName,
Status,
_ca._sum_Price1,
_ca._sum_Price2
from
PeopleInfo as _Primary
cross apply (
select
sum(Price1) as _sum_Price1,
sum(Price2) as _sum_Price2
from
PeopleInfo
where
_Primary.SSN = SSN
) as _ca
where
'Primary' = Status;
This SELECTs all People with Primary status in order to get you those rows. It then CROSS APPLYs their Primary and any other rows and takes the summed Prices.
Hopefully this makes sense. If not, you'll have to do some reading about CROSS APPLY, as well as about good relational database design. ;-)
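For anyone who wants to try the idea without the real table, here is a small self-contained sketch. The table variable @PeopleInfo is a stand-in that mirrors the two sample rows from the question; the column sizes are guesses.

```sql
-- Stand-in for the PeopleInfo table, populated with the sample rows from the question.
DECLARE @PeopleInfo TABLE
(
    FirstName varchar(20), LastName varchar(20), SSN char(9),
    PlanId int, PlanName varchar(20), [Status] varchar(20),
    Price1 decimal(18,2), Price2 decimal(18,2)
);

INSERT INTO @PeopleInfo VALUES
    ('John', 'Doe', '123456789', 1, 'Plan A', 'Primary',   9.00, NULL),
    ('John', 'Doe', '123456789', 2, 'Plan B', 'Secondary', NULL, 5.00);

-- Keep only the Primary row per person; take the prices from all of that person's rows.
SELECT p.FirstName, p.LastName, p.SSN, p.PlanId, p.PlanName, p.[Status],
       ca.Price1, ca.Price2
FROM @PeopleInfo AS p
CROSS APPLY
(
    SELECT SUM(x.Price1) AS Price1, SUM(x.Price2) AS Price2
    FROM @PeopleInfo AS x
    WHERE x.SSN = p.SSN
) AS ca
WHERE p.[Status] = 'Primary';
```

SUM skips NULLs, so the single returned row carries Plan A with Price1 = 9.00 and Price2 = 5.00. If a person could have several non-NULL prices in the same column, SUM would add them up, which may or may not be what you want.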

How to update records with multiple tables linked by a foreign key using XQuery (or OPENXML) in SQL Server?

I have T-SQL code like this:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
    ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
    ES.value('name[1]', 'VARCHAR(8)') Name,
    ES.value('date[1]', 'VARCHAR(8)') Date,
    ES.value('test[1]', 'VARCHAR(3)') Test,
    ES.value('testing[1]', 'VARCHAR(511)') Testing,
    ES.value('testingest[1]', 'VARCHAR(5)') Testingest
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- Data for Table 2
SELECT
    U.value('fork[1]', 'VARCHAR(8)') Fork,
    U.value('spoon[1]', 'VARCHAR(3)') Spoon,
    U.value('spork[1]', 'VARCHAR(3)') Spork
FROM @xml.nodes('xmlnodes/path/nextpath') AS Utensils(U)
Now, I've tried what I normally use, and other variants, such as:
AS XML ON xml.[id-number] = [table1].[id-number]
For the record, id-number is unique across the entire document. It can never occur again.
This is good for grabbing the data from my XML file, but there's zero referential integrity. How do I make sure that Table 2 (and up) maintains referential integrity when inserting?
This should be a much better explanation:
I want to load XML values from a file. For INSERT, I have no trouble using OPENXML and binding it based on the id-number using AS XML ON xml.[id-number] = [table1].[id-number] at the end.
I want to update the database record (with all linked tables and their columns) using UPDATE, MERGE, or something -- anything! To do this, I believe I need to find a way to maintain referential integrity based on the Foreign_ID value present in each table. There are dozens of tables which are all linked via Foreign_ID, so how do I update all of these?
Table Example
Table #1
+-------------+-----------+-----------+------------+---------+-----------+------------+
| Primary_Key | ID_Number | Name | Date | Test | Testing | Testingest |
+-------------+-----------+-----------+------------+---------+-----------+------------|
| 70001 | 12345 | Tom | 01/21/14 | Hi | Yep | Of course! |
| 70002 | 12346 | Dick | 02/22/14 | Bye | No | Never! |
| 70003 | 12347 | Harry | 03/23/14 | Sup | Dunno | Same. |
+----^--------+-----------+-----------+------------+---------+-----------+------------+
|
|-----------------|
|
Table #2 | Linked to primary key in the first table.
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Foreign_ID | Fork | Spoon | Spork |
+-------------+-----------------+-------------+-------------+------------+
| 0001 | 70001 | Yes | No | No |
| 0002 | 70002 | No | Yes | No |
| 0003 | 70003 | No | No | Yes |
+-------------+-----------------+-------------+-------------+------------+
After that is inserted, I need to be able to UPDATE the tables and columns from the XML files. After much research, I can't figure out how to update the values of every table linked by Foreign_ID while maintaining referential integrity. This means I am inserting the wrong data in the other tables.
I want the correct data updated. To update it correctly, I need to ensure that XQuery is matching the right data. Some tables have multiple fields for one particular Foreign_ID.
Here's the code I'm using:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
    ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
    ES.value('name[1]', 'VARCHAR(8)') Name,
    ES.value('date[1]', 'VARCHAR(8)') Date,
    ES.value('test[1]', 'VARCHAR(3)') Test,
    ES.value('testing[1]', 'VARCHAR(511)') Testing,
    ES.value('testingest[1]', 'VARCHAR(5)') Testingest
INTO #TempTable
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- @Serial Error: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SET @IDNumber = (SELECT SerialNumber from #TempTable)
SET @Foreign_ID = (SELECT [Foreign_ID] from [table] WHERE [id-number] = @IDNumber)
MERGE dbo.[table1] AS CF
USING (SELECT IDNumber, Name, Date, Test, Testing, Testingest FROM #TempTable) AS src
ON CF.[id-number] = src.IDNumber
-- ID-Number is unique, and is used to setup the initial referential integrity. Foreign_ID does not exist in the XML files, so we are not matching on that.
WHEN MATCHED THEN UPDATE
SET
CF.[id-number] = src.IDNumber
-- and so on...
WHEN NOT MATCHED THEN
-- Insert statements here
GO
This works for the first table. It does not maintain integrity when updating the other tables via Foreign_ID. Note that SET @Serial has an error, but when I set it to anything else, it will update properly.
I am not fully sure what you are asking here, but if you cannot use the suggested article to enforce references in your XML, there is not really a post-op way for you to do it just in XML.
For Table2+ you can do EXISTS checks against TABLE 1 and process accordingly that way (see Referential integrity issue with Untyped XML in TSQL for example)
The only other way that I can think of is to create "real" tables that represent your schema for table 1, table 2...tableN that have the relevant FKs and insert into them.
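One possible way to tie the child rows back to their parent is sketched below. It shreds each path element together with its nextpath children using a nested .nodes() call, so every child row carries the parent's id-number, and then joins through Table 1 to reach the right Foreign_ID. Everything here is an assumption made for illustration: the table variables @table1 and @table2, the inline XML, and the premise that each path has a single nextpath child.

```sql
-- Hypothetical stand-ins for the real tables and for the XML file.
DECLARE @table1 TABLE (Primary_Key int, [id-number] varchar(8));
DECLARE @table2 TABLE (Primary_Key int, Foreign_ID int,
                       Fork varchar(8), Spoon varchar(3), Spork varchar(3));

INSERT INTO @table1 VALUES (70001, '12345'), (70002, '12346');
INSERT INTO @table2 VALUES (1, 70001, 'Yes', 'No', 'No'),
                           (2, 70002, 'No', 'Yes', 'No');

DECLARE @xml XML = N'<xmlnodes>
  <path>
    <id-number>12345</id-number>
    <nextpath><fork>No</fork><spoon>No</spoon><spork>Yes</spork></nextpath>
  </path>
</xmlnodes>';

WITH src AS
(
    -- Each child row keeps the id-number of the <path> element it belongs to.
    SELECT
        P.ES.value('(id-number/text())[1]', 'VARCHAR(8)') AS IDNumber,
        C.U.value('(fork/text())[1]',  'VARCHAR(8)') AS Fork,
        C.U.value('(spoon/text())[1]', 'VARCHAR(3)') AS Spoon,
        C.U.value('(spork/text())[1]', 'VARCHAR(3)') AS Spork
    FROM @xml.nodes('xmlnodes/path') AS P(ES)
    CROSS APPLY P.ES.nodes('nextpath') AS C(U)
)
UPDATE t2
SET t2.Fork = src.Fork,
    t2.Spoon = src.Spoon,
    t2.Spork = src.Spork
FROM @table2 AS t2
JOIN @table1 AS t1 ON t1.Primary_Key = t2.Foreign_ID   -- resolve the foreign key
JOIN src ON src.IDNumber = t1.[id-number];             -- match on the unique id-number

SELECT Primary_Key, Foreign_ID, Fork, Spoon, Spork FROM @table2;
```

This works set-based, so there is no need for scalar variables that break when the subquery returns more than one row. For tables that hold several rows per Foreign_ID, the join would also need some second key from the XML to pick the right child row.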

SQL make rows into columns, PIVOT maybe

I have an MS SQL Server with a database for an E-commerce storefront.
These are some of the tables I have:
Products:
Id | Name | Price
ProductAttributeTypes: -Color, Size, Format
Id | Name
ProductAttributes: --Red, Green, 12x20 cm, Mirrored
Id | ProductAttributeTypeId | Name
Orders:
Id | DateCreated
OrderItems:
Id | OrderId | ProductId
OrderItemsToProductAttributes: --Relates an OrderItem to its product and selected attributes
OrderItemId | ProductAttributeId | ProductAttributeTypeId | ProductId
I want to select from the OrderItems table, to see which items have been purchased.
To see which variants (ProductAttributes) were selected, I want those as "dynamic" columns in the result set.
So the resultset should look like this:
OrderItemId | ProductId | ProductName | Color | Size | Format
1234 123 Mount. Bike Red 2x20 Mirror
I don't know if PIVOT is the thing to use? I'm not using any aggregate functions, so I guess not...
Are there any SQL ninjas who can help me out?
If you are using SQL Server 2005 or 2008 you can use the PIVOT command. See here.
In the example below the OrderAttributes set will look like:
OrderItemId AttName AttValue
----- ------ -----
100 Color Red
100 Size Small
101 Color Blue
101 Size Small
102 Color Red
102 Size Small
103 Color Blue
103 Size Large
The final results after the PIVOT will be:
OrderItemId Size Color
----- ------ -----
100 Small Red
101 Small Blue
102 Small Red
103 Large Blue
WITH OrderAttributes(OrderItemId, AttName, AttValue)
AS (
SELECT
OrderItemId,
pat.Name AS AttName,
pa.Name AS AttValue
FROM OrderItemsToProductAttributes x
INNER JOIN ProductAttributes pa
ON x.ProductAttributeId = pa.id
INNER JOIN ProductAttributeTypes pat
ON pa.ProductAttributeTypeId = pat.Id
)
SELECT AttrPivot.OrderItemId,
[Size] AS [Size],
[Color] AS Color
FROM OrderAttributes
PIVOT (
MAX([AttValue])
FOR [AttName] IN ([Color],[Size])
) AS AttrPivot
ORDER BY AttrPivot.OrderItemId
There is a way to dynamically build the columns (i.e. the Color and Size columns), as can be seen here. Make sure the compatibility level of your database is set to 90 (SQL Server 2005) or higher, because PIVOT is not recognized at lower levels and you will get strange errors.
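A rough sketch of the dynamic variant, assuming the table and column names from the question and that attribute type names are unique: the column list is built from ProductAttributeTypes, spliced into the PIVOT query as text, and executed with sp_executesql. QUOTENAME is used so that odd characters in a type name cannot break the generated SQL.

```sql
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build a bracketed, comma-separated column list such as [Color],[Format],[Size].
SELECT @cols = STUFF(
    (SELECT ',' + QUOTENAME(Name)
     FROM ProductAttributeTypes
     ORDER BY Name
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
    1, 1, '');

-- Splice the column list into the same PIVOT query used above.
SET @sql = N'
WITH OrderAttributes (OrderItemId, AttName, AttValue) AS
(
    SELECT x.OrderItemId, pat.Name, pa.Name
    FROM OrderItemsToProductAttributes AS x
    INNER JOIN ProductAttributes AS pa
        ON x.ProductAttributeId = pa.Id
    INNER JOIN ProductAttributeTypes AS pat
        ON pa.ProductAttributeTypeId = pat.Id
)
SELECT OrderItemId, ' + @cols + N'
FROM OrderAttributes
PIVOT (MAX(AttValue) FOR AttName IN (' + @cols + N')) AS AttrPivot
ORDER BY OrderItemId;';

EXEC sys.sp_executesql @sql;
```

The trade-off is that the shape of the result set is only known at run time, which can be awkward for consumers such as SSIS or views that expect fixed columns.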
In the past, I've created physical tables for read purposes only. The structure you have above is GREAT for storage, but terrible for reporting.
So you could do the following:
Write a script (that is scheduled nightly) or a trigger (on data change) that does the following tasks:
First, you would dynamically go through each Product and build a static table "Product_[ProductName]"
Then go through each ProductAttributeTypes for each product and create/update/delete a physical column on the corresponding Product table.
Then, fill that table with the proper values based on OrderItemsToProductAttributes and ProductAttributes
This is just a rough idea. Make sure you are storing OrderID in the "Static"/"Flattened" tables. And make sure you do everything else you need to do. But after that, you should be able to start pulling from those flattened tables to get the data you need.
Pivot is your best bet, but what I did for reporting purposes, and to make it work well with SSIS is to create a view, which then has this query:
SELECT [InputSetID], [InputSetName],
       CAST([470] AS int) AS [Created By],
       CAST([480] AS datetime) AS [Created],
       CAST([479] AS int) AS [Updated By],
       CAST([460] AS datetime) AS [Updated]
FROM (SELECT st.InputSetID, st.InputSetName, avt.InputSetID AS avtID, avt.AttributeID, avt.Value
      FROM app.InputSetAttributeValue avt
      JOIN app.InputSets st ON avt.InputSetID = st.InputSetID) AS p
PIVOT (MAX(Value) FOR AttributeID IN ([470], [480], [479], [460])) AS pvt
Then I can just interact with the view, but, I have a trigger on the table that any new dynamic attributes must be added to, which recreates this view, so I can assume the view is always correct.

Resources