SSIS - Split multiple columns into single rows - sql-server

Below is a very small example of the flat table I need to split. The first table is a Lesson table which has the ID, Name and Duration. The second table is a student table which only has the Student Name as a PK. And the third table will be a Many to Many of Lesson ID and Student Name.
Lesson Id
Lesson Name
Lesson Duration
Student1
Student2
Student3
Student4
1
Maths
1 Hour
Jean
Paul
Jane
Doe
2
English
1 Hour
Jean
Jane
Doe
I don't know how, using SSIS, I can assign Jean, Paul, Jane and Doe to their own tables using the Student 1, 2, 3 and 4 columns. When I figure this out, I imagine I can use the same logic to map the Lesson ID and columns to the third Many to Many table?
How do I handle duplicate entries, for example Jean Jane and Doe already exist from the first row so they do not need to be added to the Students table.
I assume I use a conditional split to skip null values? For example Student4 on the second row is Null.
Thanks for the assistance.

Were it me, I would design this as 3 data flows.
Data flow 1 - student population
Since we're assuming the name is what makes a student unique, we need to build a big list of the unique names.
SELECT D.*
FROM
(
SELECT S.Student1 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student2 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student3 AS StudentName
FROM dbo.MyTable AS S
UNION
SELECT S.Student4 AS StudentName
FROM dbo.MyTable AS S
)D
WHERE D.StudentName IS NOT NULL
ORDER BY D.StudentName;
The use of UNION in the query will handle deduplication of data and we wrap that in a derived table to filter the NULLs.
I add an explicit order by not that it's needed but since I'm assuming you're using the name as the primary key, let's avoid sort operation when we land the data.
Add an OLE DB Source to your data flow and instead of picking a table in the drop down, you'll use the above query.
Add an OLE DB Destination to the same data flow and connect the two. Assuming your target table looks something like
CREATE TABLE dbo.Student
(
StudentName varchar(50) NOT NULL CONSTRAINT PK__dbo__Student PRIMARY KEY(StudentName)
);
Data Flow 2 - Lessons
Dealers choice here, you can either write the query or just point at the source table.
A very good practice to get into with SSIS is to only bring the data you need into the buffers so I would write a query like
SELECT DISTINCT S.[Lesson Id], S.[Lesson Name], S.[Lesson Duration]
FROM dbo.MyTable AS S;
I favor a distinct here as I don't know enough about your data but if it were extended and a second Maths class was offered to accommodate another 4 students, it might be Lesson Id 1 again. Or it might be 3 as it indicates course time or something else.
Add an OLE DB Destination and land the data.
Data Flow 3 - Many to Many
There's a few different ways to handle this. I'd favor the lazy way and repeat our approach from the first data flow
SELECT D.*
FROM
(
SELECT S.Student1 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student2 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student3 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
UNION
SELECT S.Student4 AS StudentName, S.[Lesson Id]
FROM dbo.MyTable AS S
)D
WHERE D.StudentName IS NOT NULL
ORDER BY D.StudentName;
And then land in your bridge table with an OLE DB Destination and be done with it.
If this has been homework/an assignment to have you learn the native components...
Do keep with the 3 data flow approach. Trying to do too much in one go is a recipe for trouble.
The operation of moving wide data to narrow data is an Unpivot operation. You'd use that in the Student and bridge table data flows but honestly, I think I've used that component less than 10 times in my career and/or answering SSIS questions here and I do a lot of that.
If the Unpivot operation generates a NULL, then yes, you'd likely want to use a Conditional Split to filter those rows out.
If your reference tables were more complex, then you'd likely be adding a Lookup component to your bridge table population step to retrieve the surrogate key.

Related

how to use case to combine spelling variations of an item in a table in sql

I have two SQL tables, with deviations of the spellings of department names. I'm needing to combine those using case to create one spelling of the location name. Budget_Rc is the only one with same spelling in both tables. Here's an example:
Table-1 table-2
Depart_Name Room_Loc Depart_Name Room_Loc
1. Finance_P1 P144 1. Fin_P1 P1444
2. Budget_Rc R2c 2. Budget_Rc R2c
3. Payroll_P1_2 P1144 3. Finan_P1_1 P1444
4. PR_P1_2 P1140
What I'm needing to achieve is for the department to be 1 entity, with one room location. These should show as one with one room location in the main table (Table-1).
Depart_Name Room_Loc
1. Finance_P1 F144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Many many thanks in advance!
I'd first try a
DECLARE #AllSpellings TABLE(DepName VARCHAR(100));
INSERT INTO #AllSpellings(DepName)
SELECT Depart_Name FROM tbl1 GROUP BY Depart_Name
UNION
SELECT Depart_Name FROM tbl2 GROUP BY Depart_Name;
SELECT DepName
FROM #AllSpellings
ORDER BY DepName
This will help you to find all existing values...
Now you create a clean table with all Departments with an IDENTITY ID-column.
Now you have two choices:
In case you cannot change the table's layout
Use the upper select-statement to find all existing entries and create a mapping table, which you can use as indirect link
Better: real FK-relation
Replace the department's names with the ID and let this be a FOREIGN KEY REFERENCE
Can more than one department be in a Room?
If so then its harder and you can't really write a dynamic query without having a list of all the possible one to many relationships such as Finance has the department key of FIN and they have these three names. You will have to define that table to make any sort of relationship.
For instance:
DEPARTMENT TABLE
ID NAME ROOMID
FIN FINANCE P1444
PAY PAYROLL P1140
DEPARTMENTNAMES
ID DEPARTMENTNAME DEPARTMENTID
1 Finance_P1 FIN
2 Payroll_P1_2 PAY
3 Fin_P1 FIN
etc...
This way you can correctly match up all the departments and their names. I would use this match table to get the data organized and normalized before then cleaning up all your data and then just using a singular department name. Its going to be manual but should be one time if you then clean up the data.
If the room is only ever going to belong to one department you can join on the room which makes it a lot easier.
Since there does not appear any solid rule for mapping department names from table one to table two, the way I would approach this is to create a mapping table. This mapping table will relate the two department names.
mapping
Depart_Name_1 | Depart_Name_2
-----------------------------
Finance_P1 | Fin_P1
Budget_Rc | Budget_Rc
Payroll_P1_2 | PR_P1_2
Then, you can do a three-way join to bring everything into a single result set:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN mapping m
ON t1.Depart_Name = m.Depart_Name_1
INNER JOIN table2 t2
ON m.Depart_Name_2 = t2.Depart_Name
It may seem tedious to create the mapping table, but it may be unavoidable here. If you can think of a way to automate it, then this could cut down on the time spent there.

SQL SERVER - Retrieve Last Entered Data

I've searched for long time for getting last entered data in a table. But I got same answer.
SELECT TOP 1 CustomerName FROM Customers
ORDER BY CustomerID DESC;
My scenario is, how to get last data if that Customers table is having CustomerName column only? No other columns such as ID or createdDate I entered four names in following order.
James
Arun
Suresh
Bryen
Now I want to select last entered CustomerName, i.e., Bryen. How can I get it..?
If the table is not properly designed (IDENTITY, TIMESTAMP, identifier generated using SEQUENCE etc.), INSERT order is not kept by SQL Server. So, "last" record is meaningless without some criteria to use for ordering.
One possible workaround is if, by chance, records in this table are linked to some other table records (FKs, 1:1 or 1:n connection) and that table has a timestamp or something similar and you can deduct insertion order.
More details about "ordering without criteria" can be found here and here.
; with cte_new as (
select *,row_number() over(order by(select 1000)) as new from tablename
)
select * from cte_new where new=4

SQL JOIN all tables from one data

I am trying to get all the data from all tables in one DB.
I have looked around, but i haven't been able to find any solution that works with my current problems.
I made a C# program that creates a table for each day the program runs. The table name will be like this tbl18_12_2015 for today's date (Danish date format).
Now in order to make a yearly report i would love if i can get ALL the data from all the tables in the DB that stores these reports. I have no way of knowing how many tables there will be or what they are called, other than the format (tblDD-MM-YYYY).
in thinking something like this(that obviously doesen't work)
SELECT * FROM DB_NAME.*
All the tables have the same columns, and one of them is a primary key, that auto increments.
Here is a table named tbl17_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 92545 TOM 20,5 A NULL NULL
4 92545 TOM 20,5 A NULL NULL
6 117681 LISA NULL NULL 207 R
Here is a table named tbl18_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 117681 LISA 30 A NULL NULL
4 53694 DAVID 78 A NULL NULL
6 58461 MICHELLE NULL NULL 207 R
What i would like to get is something like this(from all tables in the DB):
PERSONID NAME PAYMENT TYPE RESULT TYPE
92545 TOM 20,5 A NULL NULL
92545 TOM 20,5 A NULL NULL
117681 LISA NULL NULL 207 R
117681 LISA 30 A NULL NULL
53694 DAVID 78 A NULL NULL
58461 MICHELLE NULL NULL 207 R
Have tried some different query's but none of them returned this, just a lot of info about the tables.
Thanks in advance, and happy holidays
edit: corrected tbl18_12_2015 col 3 header to english rather than danish
Thanks to all those who tried to help me solving this question, but i can't (due to my skill set most likely) get the UNION to work, so that's why i decided to refactor my DB.
While you could store the table names in a database and use dynamic sql to union them together, this is NOT a good idea and you shouldn't even consider it - STOP NOW!!!!!
What you need to do is create a new table with the same fields - and add an ID (auto-incrementing identity column) and a DateTime field. Then, instead of creating a new table for each day, just write your data to this table with the DateTime. Then, you can use the DateTime field to filter your results, whether you want something from a day, week, month, year, decade, etc. - and you don't need dynamic sql - and you don't have 10,000 database tables.
I know some people posted comments expressing the same sentiments, but, really, this should be an answer.
If you had all the tables in the same database you would be able to use the UNION Operator to combine all your tables..
Maybe you can do something like this to select all the tables names from a given database
For SQL Server:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='dbName'
For MySQL:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA='dbName'
Once you have the list of tables you can move all the tables to 1 database and create your report using Unions..
You will need to use a UNION between each select query.
Do not use *, always list the name of the columns you are bringing up.
If you want duplicates, then UNION ALL is what you want.
If you want unique records based on the PERSONID, but there is likely to be differences, then I will guess that an UPDATE_DATE column will be useful to determine which one to use but what if each records with the same PERSONID lived a life of its own on each side?
You'd need to determine business rules to find out which specific changes to keep and merge into the unique resulting record and you'd be on your own.
What is "Skyttenavn"? Is it Danish? If it is the same as "NAME", you'd want to alias that column as 'NAME' in the select query, although it's the order of the columns as listed that counts when determining what to unite.
You'd need a new auto-incremented ID as a unique primary key, by the way, if you are likely to have conflicting IDs. If you want to merge them together into a new primary key identity column, you'd want to set IDENTITY_INSERT to OFF then back to ON if you want to restart natural incrementation.

break normalization rules , creat table /view from muliple tables

I have a question and am not sure if this is the correct forum to post it .
I have two tables, StudentTable and CourseTable, where each student takes more than one course.
Example: Student1 takes 2 courses: (C1, C2).
Student2 takes 3 courses (C1, C2, C3).
I need to create a table/view that contains student information from StudentTable, plus all the courses and the score for each course from CourseTable - in one row.
Example:
Row1= Student1_Id, C1_code, C1_name, C1_Score, C2_code, C2_name, C2_Score
Row2=
Student2_Id, C1_code, C1_name, C1_Score, C2_code, C2_name, C2_Score, C3_code, C3_name, C3_Score
Since Student1 has just two courses, I should enter NULL in 'Course 3 fields'
My struggle is in the insert statement. I tried the following but it showed an error.
Insert Into Newtable
( St_ID, C1_code,c1_name, C1_Score ,C2_code ,C2_name,C2_score,C3_code ,C3_name,C3_score)
Select
(Select St_ID from StudentTable)
,
(Select C_code,c_name,c_Score
from Coursetable,SudentTable
where course.Stid =Studet.stid)
,
(Select C_code,c_name,c_Score
from course ,student
where course.Stid =Studet.stid ),
(Select C_code,c_name,c_Score
from course ,student
where course.Stid =Studet.stid );
I'm fully aware that the New table/View will break the rules of normalization, but I need it for a specific purpose.
I tried also the PIVOT BY functionality but no luck with it.
FYI, I'm not expert in SQL syntax. I just know the basics.
I will be great full for any helpful suggestions to try.
I added My DB structure so you can have better Idea
First Table is Member table which Represent Students Information .The fields in this table are
member_sk (PrimaryKey), full_or_part_time, gender, age_at_entry, age_band_at_entry, disability, ethnicity,
widening_participation_level, nationality
Second Table is Modules table which include the Courses' scores that Student took .
The fields in this table are
Module_result_k(Primary Key), member_sk(Foreign key to connect to Member table), member_stage_sk ,module_k(Foreign key to connect to Module table), module_confirmed_grade_src, credit_or_result
Third Table is AllModuleInfo which is include general information for each course .The fields in this table are
Module_k (Primary key), module_name ,module_code, Module_credit, Module stage.
The New table that I will create has the following fields
member_sk (PrimaryKey), full_or_part_time, gender, age_at_entry, age_band_at_entry, disability, ethnicity,
widening_participation_level, nationality " This will be retrieved from Member table"
Also will include
Module 1_name ,module1_code, Module1_credit, Module1_ stage, member1_stage_sk , module1_confirmed_grade_src, credit1_or_result
Module 2_name ,module2_code, Module2_credit, Module2_ stage, member2_stage_sk , module2_confirmed_grade_src, credit2_or_result
-
-
-
I will repeat this fields 14 times which is equal to Maximum courses number that any of the students took.
//// I hope now my questions become more clear

How to automatically set the SNo of the row in MS SQL Server 2005?

I have a couple of rows in a database table (lets call it Customer). Each row is numbered by SNo, which gets automatically incremented by the identity property inherent in MS SQLServer. But when I delete a particular row that particular row number is left blank, but I want the table to auto correct itself.
To give you a example:
I have a sample Customer Table with following rows:
SNo CustomerName Age
1 Dani 28
2 Alex 29
3 Duran 21
4 Mark 24
And suppose I delete 3rd row the table looks like this:
SNo CustomerName Age
1 Dani 28
2 Alex 29
4 Mark 24
But I want the table to look like this:
SNo CustomerName Age
1 Dani 28
2 Alex 29
3 Mark 24
How can I achieve that?
Please help me out
Thanks in anticipation
As has been pointed out doing that would break anything in a relationship with SNo, however if your doing this because you need ordinal numbers in you presentation layer for example, you can pull off a [1..n] row number with;
SELECT ROW_NUMBER() OVER(ORDER BY SNo ASC), SNo, CustomerName, Age FROM Customer
Obviously in this case the row number is just an incrementing number, its meaningless in relation to anything else.
I don't think you want to do that. Imagine the scenario where you have another table CustomerOrder that stores all customer orders. The structure for that table might look something like this:
CustomerOrder
-------------
OrderID INT
SNo INT
OrderDate DATETIME
...
In this case, the SNo field is a foreign key into the CustomerOrder table, and we use it to relate orders to a customer. If you delete a record from your Customer table (say with SNo = 1), are you going to go back and update the SNo values in the entire CustomerOrder table? It's best to just let the ID's autoincrement and not worry about spaces in the IDs due to deletions.
Why not create a view?
CREATE VIEW <ViewName>
AS
SELECT
ROW_NUMBER() OVER(ORDER BY SNo ASC) AS SNo
,CustomerName
,Age
FROM Customers
GO
Then access the data in customers table by selecting from the view.
Of course the SNo shown by the view has no meaning in the context of relationships, but the data returned will look exactly like you want it to look.
Using transactions when inserting records in the Database with C#
You have to use DBCC CHECKIDENT(table_name, RESEED, next_val_less_1);
As have been pointed out in other answers, this is a bad idea, and if the reason is for a presentation there are other solutions.
-- Add data to temp table
select SNo, CustomerName, Age
into #Customer
from Customer
-- Truncate Customer
-- Resets identity to seed value for column
truncate table Customer
-- Add rows back to Customer
insert into Customer(CustomerName, Age)
select CustomerName, Age
from #Customer
order by SNo
drop table #Customer

Resources