T-SQL for loop through column names and insert - sql-server

I'm working on data warehouse project, and i'm stuck on moment, where i need to loop through column names in dimension table and select value coresponding to specific column name in my base data table (the one with actual data, that i want to insert into fact table). Here's my table structure:
Data Table
closing_course | max_course | min_course
234 | 241 | 187
254 | 277 | 198
Dimension Table
course_id | course_type
1 | closing_course
2 | max_course
3 | min_course
In short i want to build a procedure that FOR EVERY COURSE TYPE it will get the VALUE of each course and insert course_id and coresponding value inside FACT TABLE (among other dimension data but i think i can handle that).

I am not quite sure what you are looking for, maybe you could show an example what do you want to achieve. Here a some possible solution:
INSERT INTO FactTable (courseId,value)
SELECT 1, closing_course FROM DataTable
UNION
SELECT 2, max_course FROM DataTable
UNION
SELECT 3, min_course FROM DataTable

Related

Does taking advantage of dynamic columns in Cassandra require duplicated data in each row?

I've been trying to understand how one would model time series data in Cassandra, like shown in the below image from a popular System Design Interview video, where counts of views are stored hourly.
While I would think the schema for this time series data would be something like the below, I don't believe this would lead to data actually being stored in the way the screenshot shows.
CREATE table views_data {
video_id uuid
channel_name varchar
video_name varchar
viewed_at timestamp
count int
PRIMARY_KEY (video_id, viewed_at)
};
Instead, I'm assuming it would lead to something like this (inspired by datastax), where technically there is a single row for each video_id, but the other columns seem like they would all be duplicated, such as channel_name, video_name, etc.. within the row for each unique viewed_at.
[cassandra-cli]
list views_data;
RowKey: A
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=2, viewed_at=1370463146717000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=3, viewed_at=1370463282090000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=8, viewed_at=1370463282093000)
-------------------
RowKey: B
=> (channel_name='Some other channel', video_name='Some video', count=4, viewed_at=1370463282093000)
I assume this is still considered dynamic wide row, as we're able to expand the row for each unique (video_id, viewed_at) combination. But it seems less than ideal that we need to duplicate the extra information such as channel_name and video_name.
Is the screenshot of modeling time series data misleading or is it actually possible to have dynamic columns where certain columns in the row do not need to be duplicated?
If I was upserting time series data to this row, I wouldn't want to have to provide the channel_name and video_name for every single upsert, I would just want to provide the count.
No, it is not necessary to duplicate the values of columns within the rows of a partition. It is possible to model your table to accomodate your use case.
In Cassandra, there is a concept of "static columns" -- columns which have the same value for all rows within a partition.
Here's the schema of an example table that contains two static columns, colour and item:
CREATE TABLE statictbl (
pk int,
ck text,
c int,
colour text static,
item text static,
PRIMARY KEY (pk, ck)
)
In this table, each partition share the same colour and item for all rows of the same partition. For example, partition pk=1 has the same colour='red' and item='apple' for all rows:
pk | ck | colour | item | c
----+----+--------+--------+----
1 | a | red | apple | 12
1 | b | red | apple | 23
1 | c | red | apple | 34
If I insert a new partition pk=2:
INSERT INTO statictbl (pk, ck, colour, item, c) VALUES (2, 'd', 'yellow', 'banana', 45)
we get:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
If I then insert another row withOUT specifying a colour and item:
INSERT INTO statictbl (pk, ck, c) VALUES (2, 'e', 56)
the new row with ck='e' still has the colour and item populated even though I didn't insert a value for them:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
2 | e | yellow | banana | 56
In your case, both the channel and video names will share the same value for all rows in a given partition if you declare them as static and you only ever need to insert them once. Note that when you update the value of static columns, ALL the rows for that partition will reflect the updated value.
For details, see Sharing a static column in Cassandra. Cheers!

How to add data to a single column

I have a question in regards to adding data to a particular column of a table, i had a post yesterday where a user guided me (thanks for that) to what i needed and said an update was the way to go for what i need, but i still can't achieve my goal.
i have two tables, the tables where the information will be added from and the table where the information will be added to, here is an example:
source_table (has only a column called "name_expedient_reviser" that is nvarchar(50))
name_expedient_reviser
kim
randy
phil
cathy
josh
etc.
on the other hand i have the destination table, this one has two columns, one with the ids and the other where the names will be inserted, this column values are null, there are some ids that are going to be used for this.
this is how the other table looks like
dbo_expedient_reviser (has 2 columns, unique_reviser_code numeric PK NOT AI, and name_expedient_reviser who are the users who check expedients this one is set as nvarchar(50)) also this is the way this table is now:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | NULL
2 | NULL
3 | NULL
4 | NULL
5 | NULL
6 | NULL
what i need is the information of the source_table to be inserted into the row name_expedient_reviser, so the result should look like this
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | kim
2 | randy
3 | phil
4 | cathy
5 | josh
6 | etc.
how can i pass the information into this table? what do i have to do?.
EDIT
the query i saw that should have worked doesn't update which is this one:
UPDATE dbo_expedient_reviser
SET dbo_expedient_reviser.name_expedient_reviser = source_table.name_expedient_reviser
FROM source_table
JOIN dbo_expedient_reviser ON source_table.name_expedient_reviser = dbo_expedient_reviser.name_expedient_reviser
WHERE dbo_expedient_reviser.name_expedient_reviser IS NULL
the query was supposed to update the information into the table, extracting it from the source_table as long as the row name_expedient_reviser is null which it is but is doesn't work.
Since the Names do not have an Id associated with them I would just use ROW_NUMBER and join on ROW_NUMBER = unique_reviser_code. The only problem is, knowing what rows are null. From what I see, they all appear null. In your data, is this the case or are there names sporadically in the table like 5,17,29...etc? If the name_expedient_reviser is empty in dbo_expedient_reviser you could also truncate the table and insert values directly. Hopefully that unique_reviser_code isn't already linked to other things.
WITH CTE (name_expedient_reviser, unique_reviser_code)
AS
(
SELECT name_expedient_reviser
,ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = cte.name_expedient_reviser
FROM dbo_expedient_reviser er
JOIN CTE on cte.unique_reviser_code = er.unique_reviser_code
Or Truncate:
Truncate Table dbo_expedient_reviser
INSERT INTO dbo_expedient_reviser (name_expedient_reviser, unique_reviser_code)
SELECT DISTINCT
unique_reviser_code = ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
,name_expedient_reviser
FROM source_table
it is not posible to INSERT the data into a single column, but to UPDATE and move the data you want is the only way to go in that cases

T-SQL Select, manipulate, and re-insert via stored procedure

The short version is I'm trying to map from a flat table to a new set of tables with a stored procedure.
The long version: I want to SELECT records from an existing table, and then for each record INSERT into a new set of tables (most columns will go into one table, but some will go to others and be related back to this new table).
I'm a little new to stored procedures and T-SQL. I haven't been able to find anything particularly clear on this subject.
It would appear I want to something along the lines of
INSERT INTO [dbo].[MyNewTable] (col1, col2, col3)
SELECT
OldCol1, OldCol2, OldCol3
FROM
[dbo].[MyOldTable]
But I'm uncertain how to get that to save related records since I'm splitting it into multiple tables. I'll also need to manipulate some of the data from the old columns before it will fit into the new columns.
Thanks
Example data
MyOldTable
Id | Year | Make | Model | Customer Name
572 | 2001 | Ford | Focus | Bobby Smith
782 | 2015 | Ford | Mustang | Bobby Smith
Into (with no worries about duplicate customers or retaining old Ids):
MyNewCarTable
Id | Year | Make | Model
1 | 2001 | Ford | Focus
2 | 2015 | Ford | Mustang
MyNewCustomerTable
Id | FirstName | LastName | CarId
1 | Bobby | Smith | 1
2 | Bobby | Smith | 2
I would say you have your OldTable Id to preserve in new table till you process data.
I assume you create an Identity column Id on your MyNewCarTable
INSERT INTO MyNewCarTable (OldId, Year, Make, Model)
SELECT Id, Year, Make, Model FROM MyOldTable
Then, join the new table and above table to insert into your second table. I assume your MyNewCustomerTable also has Id column with Identity enabled.
INSERT INTO MyNewCustomerTable (CustomerName, CarId)
SELECT CustomerName, new.Id
FROM MyOldTable old
JOIN MyNewCarTable new ON old.Id = new.OldId
Note: I have not applied Split of Customer Name to First Name and
Last Name as I was unsure about existing data.
If you don't want your OldId in MyNewCarTable, you can DELETE it
ALTER TABLE MyNewCarTable DROP COLUMN OldId
You are missing a step in your normalization. You do not need to duplicate your customer information per vehicle. You need three tables for 4th Normal form. This will reduce storage size and more importantly allow an update to the customer data to take place in one location.
Customer
CustomerID
FirstName
LastName
Car
CarID
Make
Model
Year
CustomerCar
CustomerCarID
CarID
CustomerID
DatePurchaed
This way you can have multiple owners per car, multiple cars per owner and only one record needs to be updated per car and or customer...4th Normal Form.
If I am reading this correctly, you want to take each row from table 1, and create a new record into table A using some of that row data, and then data from the same original row into Table B, Table C but referencing back to Table A again?
If that's the case, you will create TableA with an Identity and make thats the PK.
Insert the required column data into that table and use the #IDENTITY to retrieve the last identity value, then you will insert the remaining data from the original table into the other tables, TableB, TableC, etc. and use the identity you retrieved from TableA as the FK in the other tables.
By Example:
Table 1 has columns col1, col2, col3, col4, col5
Table A has TabAID, col1, col2
Table B has TabBID, TabAID, col3
TableC has TabCID, TabAID, col4
When the first row is read, the values for col1 & col2 are inserted into TableA.
The Identity is captured from that row inserted, and then value for col3 AND the identity are entered into TableB, and then value for col4 AND the identity are entered into TableC.
This is a standard data migration technique for normalizing data.
Hope this assists,

I want to dynamically change the lookup columns in look up transformer in SSIS

I just want to map dynamically lookup column in LOOKUP Transform SSIS
In my task that look up column will all ways change
For Example:
TableA
```````
Col1 | Col2 | Col3
--------+-----------+---------
1 | 2 | 3
2 | 1 | 4
3 | 2 | 1
This time lookup columns are Col1+Col2
Next day it will change to Col2+Col3
I want to map dynamic input column with ssn
Depending on how the lookup columns are defined from one day to the next, it may be easier to make the query in the Lookup transformation dynamic instead. Check out the following link:
https://suneethasdiary.wordpress.com/2011/12/28/creating-a-dynamic-query-in-lookup-transformation-in-ssis/

References in a table

I have a table like this, that contains items that are added to the database.
Catalog table example
id | element | catalog
0 | mazda | car
1 | penguin | animal
2 | zebra | animal
etc....
And then I have a table where the user selects items from that table, and I keep a reference of what has been selected like this
User table example
id | name | age | itemsSelected
0 | john | 18 | 2;3;7;9
So what I am trying to say, is that I keep a reference to what the user has selected as a string if ID's, but I think this seems a tad troublesome
Because when I do a query to get information about a user, all I get is the string of 2;3;7;9, when what I really want is an array of the items corresponing to those ID's
Right now I get the ID's and I have to split the string, and then run another query to find the elements the ID's correspond to
Is there any easier ways to do this, if my question is understandable?
Yes, there is a way to do this. You create a third table which contains a map of A/B. It's called a Multiple to Multiple foreign-key relationship.
You have your Catalogue table (int, varchar(MAX), varchar(MAX)) or similar.
You have your User table (int, varchar(MAX), varchar(MAX), varchar(MAX)) or similar, essentially, remove the last column and then create another table:
You create a UserCatalogue table: (int UserId, int CatalogueId) with a Primary Key on both columns. Then the UserId column gets a Foreign-Key to User.Id, and the CatalogueId table gets a Foreign-Key to Catalogue.Id. This preserves the relationship and eases queries. It also means that if Catalogue.Id number 22 does not exist, you cannot accidentally insert it as a relation between the two. This is called referential-integrity. The SQL Server mandates that if you say, "This column must have a reference to this other table" then the SQL Server will mandate that relationship.
After you create this, for each itemsSelected you add an entry: I.e.
UserId | CatalogueId
0 | 2
0 | 3
0 | 7
0 | 9
This also alows you to use JOINs on the tables for faster queries.
Additionally, and unrelated to the question, you can also optimize the Catalogue table you have a bit, and create another table for CatalogueGroup, which contains your last column there (catalog: car, animal) which is referenced via a Foreign-Key Relationship in the current Catalogue table definition you have. This will also save storage space and speed up SQL Server work, as it no longer has to read a string column if you only want the element value.

Resources