Loading a flexible fact table SSIS - sql-server

I have the following fact table: PlaceId, DateId, StatisticId, StatisticValue.
I also have a dimension holding the statistic IDs and their names: StatisticId, StatisticName.
I want to load the fact table with data containing 2 statistics. With this architecture, each row of my data will be represented by 2 rows in the fact table.
The data has the following attributes: Place, Date, Stat1_Value, Stat2_Value.
How do I load my fact table with the IDs of these measures and their corresponding values?
Thank you.

I would use SSIS to move your data into a holding table that has the same columns as your data. Then call a stored procedure that uses SQL to populate your fact table, using UNION to get all the Stat1_Values, and then all the Stat2_Values.
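A minimal sketch of that stored procedure's core statement, assuming a holding table Staging_Stats, dimensions DimPlace, DimDate and DimStatistic, and statistic names 'Stat1' and 'Stat2' (all hypothetical names); UNION ALL is used rather than UNION so duplicate measure values are not silently dropped:

    -- Hypothetical holding table: Staging_Stats(Place, Date, Stat1_Value, Stat2_Value)
    INSERT INTO FactStatistic (PlaceId, DateId, StatisticId, StatisticValue)
    SELECT p.PlaceId, d.DateId, st.StatisticId, s.Stat1_Value
    FROM Staging_Stats s
    JOIN DimPlace p ON p.PlaceName = s.Place
    JOIN DimDate d ON d.FullDate = s.[Date]
    JOIN DimStatistic st ON st.StatisticName = 'Stat1'
    UNION ALL
    SELECT p.PlaceId, d.DateId, st.StatisticId, s.Stat2_Value
    FROM Staging_Stats s
    JOIN DimPlace p ON p.PlaceName = s.Place
    JOIN DimDate d ON d.FullDate = s.[Date]
    JOIN DimStatistic st ON st.StatisticName = 'Stat2';

Each staged row produces one fact row per statistic, which is exactly the 1-row-to-2-rows shape the question describes.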

Related

Storing processed results of connection in RDBMS

A csv file contains the following two columns: admission_number, project_name.
The relationship between the two entities is many-to-many: a specific admission_number can work on multiple projects, and a specific project may have multiple admission_numbers.
The data looks as follows. Initially there are 1,000 million rows, and the table is updated daily, growing to 1,300 million rows.
admission_number,project_name
1234567890,ABC1234567
1234567890,ABC1234568
1234567891,ABC1234569
1234567892,ABC1234569
1234567893,ABC1234570
1234567894,ABC1234567
1234567895,ABC1234567
For a specific admission number (let's say 1234567890), I want to know all the admission_numbers that are working on the same projects (ABC1234567, ABC1234568). The output of the above query will be
1234567894, 1234567895.
Explanation: for admission number '1234567890', the project names are 'ABC1234567' and 'ABC1234568'. The other admission_numbers working on these two projects are '1234567894' and '1234567895'.
I came up with two solutions; an RDBMS will be used to store the data.
Approach 1: use two retrieval queries: the first returns all the project_names for a specific admission_number, and the second returns all the admission_numbers for those project_names.
select admission_number from table where project_name IN (select project_name from table where admission_number = '1234567890');
Approach 2: before loading, I preprocess the results and store them directly in the database; I store only the connected admission_numbers.
E.g., for project_name 'ABC1234567', these 3 admission_numbers are working: '1234567890', '1234567894', '1234567895'. I want to store all connected admission_numbers in a table with two columns (number, connected_number), like ('1234567890','1234567894'), ('1234567890','1234567895'), ('1234567894','1234567895'), and queries will work on both columns (number and connected_number).
But in this approach there will be many rows: if a specific project_name 'p' has n admission_numbers, the total number of rows will be n(n-1)/2. For example, a single project with 1,000 admission_numbers alone produces 499,500 pairs.
How can I store all the connected admission_numbers in an RDBMS? Loading of data can be slow, but retrieval should be fast.
Do not pre-optimize the data structure; it would only cause problems.
Create a simple table with two columns, one for each ID, and create an index on each column.
The RDBMS will build and maintain an index of the column values, which enables fast lookup of a specific record.
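A minimal sketch of that design in T-SQL, with hypothetical table and index names:

    CREATE TABLE AdmissionProject (
        admission_number BIGINT      NOT NULL,
        project_name     VARCHAR(10) NOT NULL,
        -- The composite key also serves lookups by admission_number.
        PRIMARY KEY (admission_number, project_name)
    );

    -- Second index to support lookups by project_name.
    CREATE INDEX IX_AdmissionProject_Project
        ON AdmissionProject (project_name, admission_number);

    -- Retrieval: all admission numbers sharing a project with 1234567890.
    SELECT DISTINCT t2.admission_number
    FROM AdmissionProject t1
    JOIN AdmissionProject t2 ON t2.project_name = t1.project_name
    WHERE t1.admission_number = 1234567890
      AND t2.admission_number <> 1234567890;

Because both indexes include both columns, the self-join can be answered entirely from index pages, keeping retrieval fast even at the row counts described.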

Data Factory - dynamically copy subsection of columns from one database table to another

I have a database on SQL Server on premises and need to regularly copy the data from 80 different tables to an Azure SQL Database. For each table, the columns I need to select and map are different - for example, for TableA I need columns 1, 2 and 5, while for TableB I need just column 1. The tables are named the same in the source and target, but the column names are different.
I could create multiple Copy data pipelines and select the source and target data sets and map to the target table structures, but that seems like a lot of work for what is ultimately the same process repeated.
I've so far created a meta table, which lists all the tables and the column mapping information. This table holds the following data:
SourceSchema, SourceTableName, SourceColumnName, TargetSchema, TargetTableName, TargetColumnName.
For each table, data is held in this table to map the source tables to the target tables.
I have then created a Lookup activity which selects each table from the mapping table. A ForEach loop then runs another lookup to get the source and target column data for the table in the current iteration.
From this information, I'm able to map the Source table and the Sink table in a Copy Data activity created within the foreach loop, but I'm not sure how I can dynamically map the columns, or dynamically select only the columns I require from each source table.
I have the "activity('LookupColumns').output" from the column lookup, but would be grateful if someone could suggest how I can use this to then map the source columns to the target columns for the copy activity. Thanks.
In your case, you can use an expression in the mapping setting.
You need to provide an expression whose value looks like this:
{
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "name": "Id" }, "sink": { "name": "CustomerID" } },
    { "source": { "name": "Name" }, "sink": { "name": "LastName" } },
    { "source": { "name": "LastModifiedDate" }, "sink": { "name": "ModifiedDate" } }
  ]
}
So add a column named Translator to your meta table, holding JSON like the above for each table. Then use this expression to do the mapping: @item().Translator
Reference: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#parameterize-mapping
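If you would rather generate that Translator JSON from the mapping rows than hand-write it, T-SQL along these lines could produce it (a sketch, assuming SQL Server 2016+ FOR JSON support; ColumnMapping is a hypothetical name for the meta table with the columns listed above):

    -- One row per table, with the TabularTranslator JSON built from its column mappings.
    SELECT m.SourceSchema, m.SourceTableName,
           (SELECT 'TabularTranslator' AS [type],
                   JSON_QUERY((SELECT c.SourceColumnName AS [source.name],
                                      c.TargetColumnName AS [sink.name]
                               FROM ColumnMapping c
                               WHERE c.SourceSchema = m.SourceSchema
                                 AND c.SourceTableName = m.SourceTableName
                               FOR JSON PATH)) AS [mappings]
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS Translator
    FROM ColumnMapping m
    GROUP BY m.SourceSchema, m.SourceTableName;

You could store the result back into the Translator column, or return it directly from the first Lookup so each ForEach iteration already carries its mapping.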

Data warehouse dimension design technique

When implementing a development process to load data into an SCD2 dimension table, what is the most practical method for a scenario where there are multiple records in the staging table per BusinessKey in the dimension table?
The first issue in this scenario is that you have 2 or more records in your staging table that can each update the IsCurrentFlag and EffectiveToDate.
Is implementing a post-process that recalculates IsCurrentRecord and the EffectiveToDates after the data is loaded the only solution?
Scenario example:
The dimension table is populated from 1 source system.
The source system table (Customer) from which the data is extracted contains history. Multiple updates can be made to the same customer in 1 day, resulting in multiple records in the source system table.
Dimension table:
Staging table:
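The post-process the question mentions can be a single set-based statement rather than the only fallback. A minimal sketch, assuming SQL Server, a hypothetical DimCustomer table with BusinessKey, EffectiveFromDate, EffectiveToDate and IsCurrentFlag columns, and an EffectiveFromDate precise enough to order same-day changes: after inserting all staged versions, recompute the end dates and flags from each row's successor:

    -- Close out each version with the start date of its successor;
    -- the latest version per BusinessKey stays current.
    WITH Ordered AS (
        SELECT BusinessKey, EffectiveFromDate,
               LEAD(EffectiveFromDate) OVER (PARTITION BY BusinessKey
                                             ORDER BY EffectiveFromDate) AS NextFrom
        FROM DimCustomer
    )
    UPDATE d
    SET d.EffectiveToDate = COALESCE(o.NextFrom, '9999-12-31'),
        d.IsCurrentFlag   = CASE WHEN o.NextFrom IS NULL THEN 1 ELSE 0 END
    FROM DimCustomer d
    JOIN Ordered o
      ON o.BusinessKey = d.BusinessKey
     AND o.EffectiveFromDate = d.EffectiveFromDate;

Because the statement is driven by LEAD over all versions, it handles any number of staged rows per BusinessKey in one pass.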

Need help in a MS SQL DB design

Hi, I have 2 types of data entry which need to be stored in a DB so they can be used for calculations later. Each entry has a unique ID. The data entries are -
1.
2.
So I have to save this data in a DB. With my understanding, I thought of the following -
Create 3 tables - Common, Entry1 and Entry2 (multiple tables with the unique ID as the name).
The Common table will have a unique entry for each piece of data and which table to refer to for the value (Entry1/Entry2).
The Entry1 data is a single line, so it can be inserted directly. But the Entry2 data will require a complete table because of its structure, so whenever we add a type 2 entry a new table has to be created, which will create a lot of tables.
Or I could save the type 2 values in another database and fetch the values from there. Please suggest a better way than this.
I believe that you have 2 entry types with identical structure, but one containing a single row and one containing many.
In this case, I would suggest a single table containing the data for all entries, with a second table grouping them together. Even if your input contains a single row, it should still get an EntryID. Perhaps something like the below:
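A sketch of that two-table shape, with hypothetical names and column types:

    -- Groups the rows that belong to one logical entry, whatever its type.
    CREATE TABLE Entry (
        EntryID   INT IDENTITY(1, 1) PRIMARY KEY,
        EntryType TINYINT NOT NULL  -- 1 = single-row entry, 2 = multi-row entry
    );

    -- One row of data; a type 1 entry has exactly one row here, a type 2 entry has many.
    CREATE TABLE EntryRow (
        EntryRowID INT IDENTITY(1, 1) PRIMARY KEY,
        EntryID    INT NOT NULL REFERENCES Entry (EntryID),
        -- ...the actual data columns shared by both entry types go here...
        Value      DECIMAL(18, 4) NULL
    );

This avoids creating a table per entry: every new entry, of either type, is just a new Entry row plus one or more EntryRow rows.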

SQL equivalent query in Qlikview?

In SQL we can write a query like:
Select field1,field2,field3,field4,field5,field6,field7
from table1 t1,table2 t2,table3 t3
where t1.field1 = t3.field3 and
t2.field2 = 'USD'
In Qlikview, I have created QVDs for 6 tables, and now I want to combine these 6 QVDs into a single QVD. Unfortunately these tables don't contain primary keys, so I can't use a join. I have also tried this:
fact:
load *
from
[D:\path\fact*.qvd](qvd);
//To store all qvd's into one qvd.
store fact into [D:\path\facttable.qvd];
This script creates a fact table, but with only 2 columns, and those columns come from the first fact QVD. The diagram shows it more clearly:
Internally the fact tables are named fact, fact-1, fact-2 and so on, and I have written store fact into [D:\path\facttable.qvd]; since the fact table in the diagram contains only two columns, it stores the fact table with those two columns only.
Please let me know how this can be written in Qlikview, or how a fact table can be created from all the QVDs.
Thanks in advance.
Since the QVDs contain different field names, load * will create several tables, with synthetic keys for any fields they share.
You can use Concatenate Load to stack each QVD onto one fact table. One simple example would be to first create a Fact table:
Fact:
Load * INLINE [
dummyField
];
Now you can concatenate the qvd's onto that Fact table:
concatenate(Fact)
load *
from
[D:\path\fact*.qvd](qvd);
//To store all qvd's into one qvd.
store Fact into [D:\path\facttable.qvd];
//drop the dummy field.
drop field dummyField;
