As a modeler, I am trying to find the best way to handle deletes in SCD Type 2 tables.
In principle, an SCD Type 2 table tracks changes using ETL dates such as START_DT and END_DT.
START_DT is the date the record is effective from.
END_DT is the date the record changed to another version, or Null/High Date to denote the current version of the record.
At any point in time, each key combination has one current-version record whose END_DT is either Null or the High Date.
Now, if a record is deleted from the source, which of the options below is best?
Option 1: Have an additional column such as SRC_DELETE_IND, which defaults to 'N' and is set to 'Y' if the record is deleted from the source.
Example: the record arrived on 1st Oct
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  Null        ABC    N
The record had an update on 2nd Oct
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  2021-10-02  ABC    N
1      2021-10-02  Null        XYZ    N
The record got deleted on 3rd Oct
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  2021-10-02  ABC    N
1      2021-10-02  Null        XYZ    Y
Option 2: Same as Option 1, but insert a new duplicate row when the delete arrives.
The record got deleted on 3rd Oct
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  2021-10-02  ABC    N
1      2021-10-02  2021-10-03  XYZ    N
1      2021-10-03  Null        XYZ    Y
Option 3: Instead of using SRC_DELETE_IND, expire (end-date) the record.
The record got deleted on 3rd Oct
PK_ID  START_DT    END_DT      VALUE
1      2021-10-01  2021-10-02  ABC
1      2021-10-02  2021-10-03  XYZ
But here we no longer have an open record left.
Complexity is added if the record reappears in the source because the delete was incorrect, let's say on 10th Oct.
For Option 1, the data will look like:
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  2021-10-02  ABC    N
1      2021-10-02  Null        XYZ    N  -- Reversed
For Option 2:
PK_ID  START_DT    END_DT      VALUE  SRC_DELETE_IND
1      2021-10-01  2021-10-02  ABC    N
1      2021-10-02  2021-10-03  XYZ    N
1      2021-10-03  Null        XYZ    N  -- Reversed, but now the row is a duplicate
For Option 3:
PK_ID  START_DT    END_DT      VALUE
1      2021-10-01  2021-10-02  ABC
1      2021-10-02  2021-10-03  XYZ
1      2021-10-10  Null        XYZ  -- Considered new since no open record existed; creates an ETL gap
Which option makes more sense and follows DWH best practices?
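To make the Option 2 behaviour concrete, here is a minimal Python sketch of the delete and reversal steps as described above. The function names and the in-memory list of dicts are assumptions made for illustration, not part of the original post, and a real ETL job would do this in the warehouse rather than in Python.

```python
from datetime import date

def open_row(history, pk_id):
    """Return the current version (END_DT is None) for pk_id, or None."""
    return next((r for r in history
                 if r["PK_ID"] == pk_id and r["END_DT"] is None), None)

def add_version(history, pk_id, value, as_of, deleted="N"):
    """End-date the open row, if any, and append a new open version."""
    current = open_row(history, pk_id)
    if current is not None:
        current["END_DT"] = as_of
    history.append({"PK_ID": pk_id, "START_DT": as_of, "END_DT": None,
                    "VALUE": value, "SRC_DELETE_IND": deleted})

def mark_deleted(history, pk_id, as_of):
    """Option 2: the delete becomes a new version that repeats the last value."""
    current = open_row(history, pk_id)
    if current is not None and current["SRC_DELETE_IND"] == "N":
        add_version(history, pk_id, current["VALUE"], as_of, deleted="Y")

def reverse_delete(history, pk_id):
    """Reappearance with an unchanged value: flip the flag back in place."""
    current = open_row(history, pk_id)
    if current is not None and current["SRC_DELETE_IND"] == "Y":
        current["SRC_DELETE_IND"] = "N"

# Replay the timeline from the question.
history = []
add_version(history, 1, "ABC", date(2021, 10, 1))
add_version(history, 1, "XYZ", date(2021, 10, 2))
mark_deleted(history, 1, date(2021, 10, 3))
reverse_delete(history, 1)  # record reappears on 10th Oct
```

After the replay, the history holds the same three rows as the Option 2 reversal table: there is always exactly one open row, but the last two rows carry the same VALUE and flag.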
I would go for a simpler way: put a default END_DT on the deleted records, such as 1000-12-31:
PK_ID  START_DT    END_DT      VALUE
1      2021-10-01  2021-10-02  ABC
1      2021-10-02  1000-12-31  XYZ  --> this row is deleted
Also, avoid using NULL values.
A NULL value indicates the lack of a value, which is not the same thing as a value of zero. SQL NULL is a state, not a value. This usage is quite different from most programming languages, where a null reference means it is not pointing to any object.
I recommend using a default date for END_DT, for example 9999-12-31, so that when you insert a row your dimension looks like this:
PK_ID  START_DT    END_DT      VALUE
1      2021-10-01  9999-12-31  ABC
instead of:
PK_ID  START_DT    END_DT      VALUE
1      2021-10-01  NULL        ABC
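One practical effect of a high date is that a point-in-time lookup needs no special case for the open row. The Python sketch below illustrates this; the half-open interval (START_DT inclusive, END_DT exclusive) is an assumption based on the question's data, where one row's END_DT equals the next row's START_DT, and the names HIGH_DATE and version_as_of are hypothetical.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed sentinel for "still current"

def version_as_of(rows, pk_id, as_of):
    """Return the row that was effective on as_of, or None if there was none."""
    for row in rows:
        # A single range test covers both closed and open rows.
        if row["PK_ID"] == pk_id and row["START_DT"] <= as_of < row["END_DT"]:
            return row
    return None

rows = [
    {"PK_ID": 1, "START_DT": date(2021, 10, 1), "END_DT": date(2021, 10, 2), "VALUE": "ABC"},
    {"PK_ID": 1, "START_DT": date(2021, 10, 2), "END_DT": HIGH_DATE, "VALUE": "XYZ"},
]
```

With a NULL END_DT the same test would need an extra "or END_DT is missing" branch for the current row.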
I also recommend adding a surrogate key to your dimensions.
A dimension table is designed with one column serving as a unique primary key. This primary key cannot be the operational system’s natural key because there will be multiple dimension rows for that natural key when changes are tracked over time. In addition, natural keys for a dimension may be created by more than one source system, and these natural keys may be incompatible or poorly administered. The DW/BI system needs to claim control of the primary keys of all dimensions; rather than using explicit natural keys or natural keys with appended dates, you should create anonymous integer primary keys for every dimension. These dimension surrogate keys are simple integers, assigned in sequence, starting with the value 1, every time a new key is needed. The date dimension is exempt from the surrogate key rule; this highly predictable and stable dimension can use a more meaningful primary key.
Related
I am using VB.NET and SQL Server 2012.
I have a SQL Database named DB_COLLECTOR,
as well as a table named Fee_Payment,
with 6 columns named:
'S_No' Int Primary key identity(1,1),
'Date_Start' datetime Null,
'Date_End' datetime Null,
'Prefixed_Fee' decimal(10) Null,
'Paid_Amount' decimal(10) Null,
'Balance' decimal(10) Null
I also have 2 Forms:
The 1st Form takes Person Name, Deal_Start_Date, Deal_End_Date, Monthly_Fee and saves it into a Customer_Master table in the database.
The 2nd Form takes Person Name and Payment_Amount and saves them into the Fee_Payment table.
The problem:
If I enter a Deal_Start_Date of 01/04/2018 and a Deal_End_Date of 31/03/2019, for a customer named "John" who has a monthly fee of $50.00 on the 1st Form, there are three things that should happen:
It should automatically add 12 rows into the Fee_Payment table, one per month of the deal, with the first row having Date_Start as 01-Apr-2018 and Date_End as 30-Apr-2018, in addition to saving the relevant data to Customer_Master. Like this:
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 Null Null Null
2 01-May-2018 31-May-2018 Null Null Null
3 01-Jun-2018 30-Jun-2018 Null Null Null
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
It should also automatically replace Null with $50.00 in the Prefixed_Fee column of the Apr-2018 row (the top one) once the current date falls within the range 01-Apr-2018 to 30-Apr-2018, e.g. if the current date is 02-Apr-2018. Like this:
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 50.00 Null 50.00
2 01-May-2018 31-May-2018 Null Null Null
3 01-Jun-2018 30-Jun-2018 Null Null Null
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
Finally, if the customer, John, has paid $125.00, it should allocate $50.00 (from the $125.00) to the first row, matching the amount in Prefixed_Fee, then put $50.00 from the remainder into the second row and, finally, the remaining $25.00 into the third row.
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 50.00 50.00 0.00
2 01-May-2018 31-May-2018 50.00 50.00 0.00
3 01-Jun-2018 30-Jun-2018 50.00 25.00 25.00
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
How do I do this?
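The row generation and the payment allocation described above can be sketched as follows. This is a minimal Python illustration of the logic only, not VB.NET or T-SQL; the function names are hypothetical, it assumes the deal starts on the first day of a month, and it sets Prefixed_Fee on a row at the moment a payment reaches it rather than when the current date enters the row's range.

```python
import calendar
from datetime import date
from decimal import Decimal

def monthly_periods(deal_start, deal_end):
    """Yield (Date_Start, Date_End) for every calendar month of the deal."""
    year, month = deal_start.year, deal_start.month
    while (year, month) <= (deal_end.year, deal_end.month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def build_rows(deal_start, deal_end):
    """Build the empty Fee_Payment rows, numbered from 1 like S_No."""
    return [{"S_No": n, "Date_Start": start, "Date_End": end,
             "Prefixed_Fee": None, "Paid_Amount": None, "Balance": None}
            for n, (start, end) in enumerate(monthly_periods(deal_start, deal_end), 1)]

def allocate_payment(rows, monthly_fee, payment):
    """Spread a payment over the rows in date order; return what is left over."""
    remaining = payment
    for row in rows:
        if remaining <= 0:
            break
        if row["Prefixed_Fee"] is None:  # month not charged yet
            row["Prefixed_Fee"] = monthly_fee
            row["Balance"] = monthly_fee
        applied = min(remaining, row["Balance"])
        row["Paid_Amount"] = (row["Paid_Amount"] or Decimal("0")) + applied
        row["Balance"] -= applied
        remaining -= applied
    return remaining

rows = build_rows(date(2018, 4, 1), date(2019, 3, 31))
leftover = allocate_payment(rows, Decimal("50.00"), Decimal("125.00"))
```

For the John example this yields 12 rows, with the first two fully paid and the third holding a Paid_Amount of 25.00 and a Balance of 25.00, matching the last table in the question.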
Please find the input and required output below. I need a query/procedure/function in T-SQL to get this output.
Requirement: I have table a and table b.
Get all the date ranges from table b, plus the date ranges from table a that are missing from table b.
Basically, we need to make sure all the date ranges in table a are covered in the output.
Input
table b
Start date End date ID
1/1/2009 9/30/2009 1
1/1/2013 9/30/2013 1
11/1/2014 11/30/2014 1
2/2/2015 12/31/2016 1
table a
1/1/2009 12/31/2011 1
1/1/2013 9/30/2013 1
1/1/2014 4/30/2014 1
10/1/2014 12/31/2014 1
2/2/2015 12/31/9999 1
Output
table b
Start date End date ID
1/1/2009 9/30/2009 1
1/1/2013 9/30/2013 1
11/1/2014 11/30/2014 1
2/2/2015 12/31/2016 1
table a
10/1/2009 12/31/2011 1
1/1/2014 4/30/2014 1
10/1/2014 10/31/2014 1
12/1/2014 12/31/2014 1
1/1/2017 12/31/9999 1
Table [a] contains 5 records and table [b] contains 4 records, and you need that 5th record to be inserted into table [b]. If this is correct, then...
Do a simple outer join of table [a] and [b] on start date and end date, and take the rows from table [a] whose matching row in table [b] is NULL.
seems simple as it is :-)
Happy coding!!!
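Note that a join on exact start and end dates only finds ranges in table a that have no identical range in table b, whereas the expected output in the question also splits partially covered ranges. A minimal Python sketch of that interval subtraction follows. It is not T-SQL, the function name is hypothetical, and it assumes inclusive day-level ranges for a single ID.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def missing_ranges(a_ranges, b_ranges):
    """Return the parts of each inclusive range in a_ranges not covered by b_ranges."""
    b_sorted = sorted(b_ranges)
    gaps = []
    for a_start, a_end in sorted(a_ranges):
        cursor = a_start  # first day of this a range not yet accounted for
        covered_to_end = False
        for b_start, b_end in b_sorted:
            if b_end < cursor:
                continue  # b range lies entirely before the cursor
            if b_start > a_end:
                break     # b range lies entirely after this a range
            if b_start > cursor:
                gaps.append((cursor, b_start - ONE_DAY))
            if b_end >= a_end:
                covered_to_end = True
                break
            cursor = b_end + ONE_DAY
        if not covered_to_end:
            gaps.append((cursor, a_end))
    return gaps

table_b = [(date(2009, 1, 1), date(2009, 9, 30)), (date(2013, 1, 1), date(2013, 9, 30)),
           (date(2014, 11, 1), date(2014, 11, 30)), (date(2015, 2, 2), date(2016, 12, 31))]
table_a = [(date(2009, 1, 1), date(2011, 12, 31)), (date(2013, 1, 1), date(2013, 9, 30)),
           (date(2014, 1, 1), date(2014, 4, 30)), (date(2014, 10, 1), date(2014, 12, 31)),
           (date(2015, 2, 2), date(9999, 12, 31))]
gaps = missing_ranges(table_a, table_b)
```

On the sample input, gaps contains the five "table a" rows of the expected output; combining them with all rows of table b gives the full result.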
I have below SQL table:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 2017-04-20 15:12:12.132
6 DFG3 2017-04-20 13:34:42.132 NULL
Id is an auto numeric field.
Code is a string, and DateTime1 and DateTime2 are of datetime type. DateTime1 cannot be null, but DateTime2 can be.
I would like to obtain the last row by DateTime1 (MAX DateTime1, the most recent one) that matches a specific code and has DateTime2 set to NULL.
For example, taking the above table into account, for code DFG3 I would like to obtain the row with Id=6, since it has the max DateTime1, which is "2017-04-20 13:34:42.132".
But now imagine the following case:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 NULL
6 DFG3 2017-05-02 16:34:34.032 2017-05-02 21:00:00.032
Again, taking the above table into account, I would like to obtain the same thing, that is, the last row by DateTime1 (MAX DateTime1, the most recent one) that matches a specific code and has DateTime2 set to NULL.
In this last case, no rows must be returned for code DFG3, because the row with Id=6 is the last by DateTime1 (most recent) for code DFG3 but its DateTime2 is not NULL.
How can I do this?
Can you try this query and let me know if it works for your case?
SELECT * FROM [TableName]
WHERE [Code] = 'DFG3' AND [datetime2] IS NULL
  AND [datetime1] = (SELECT MAX([datetime1]) FROM [TableName] WHERE [Code] = 'DFG3')
This numbers the rows of each code from latest to oldest, then you select only the latest one if its DateTime2 is null.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Code
                              -- Id DESC breaks ties when two rows share the same DateTime1
                              ORDER BY DateTime1 DESC, Id DESC) AS rn
    FROM yourTable
) AS T
WHERE rn = 1                 -- the row with the latest date for each code has rn = 1
  AND DateTime2 IS NULL
  AND Code = 'DFG3'          -- OPTIONAL
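To check the logic against both sample tables, here is a small self-contained Python sketch that runs the MAX-subquery form of the query in an in-memory SQLite database. SQLite stands in for SQL Server here purely for illustration; the table name t and the helper latest_open_ids are assumptions, the sample rows are abridged to one non-DFG3 row, and the datetimes are stored as ISO text so that MAX compares them chronologically.

```python
import sqlite3

QUERY = """
SELECT Id FROM t
WHERE Code = ? AND DateTime2 IS NULL
  AND DateTime1 = (SELECT MAX(DateTime1) FROM t WHERE Code = ?)
"""

def latest_open_ids(rows, code):
    """Load rows into a scratch table and return the Ids the query selects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY, Code TEXT, "
                 "DateTime1 TEXT NOT NULL, DateTime2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    ids = [r[0] for r in conn.execute(QUERY, (code, code))]
    conn.close()
    return ids

first_case = [
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (4, "LKF0", "2017-04-24 11:14:00.000", "2017-04-24 13:40:00.000"),
    (5, "DFG3", "2017-04-20 13:34:42.132", "2017-04-20 15:12:12.132"),
    (6, "DFG3", "2017-04-20 13:34:42.132", None),
]
second_case = [
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (4, "LKF0", "2017-04-24 11:14:00.000", "2017-04-24 13:40:00.000"),
    (5, "DFG3", "2017-04-20 13:34:42.132", None),
    (6, "DFG3", "2017-05-02 16:34:34.032", "2017-05-02 21:00:00.032"),
]
```

Calling latest_open_ids(first_case, "DFG3") selects row 6, while the second case selects nothing, which is the behaviour the question asks for.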
I need to write a DAX statement which is somewhat complex from a conceptual/logical standpoint- so this might be hard to explain.
I have two tables.
On the first table (shown below) I have a list of numeric values (Wages). For each value I have a corresponding date range. I also have EmployeeID and FunctionID. The purpose of this table is to keep track of the hourly Wages paid to employees performing specific functions during specific date ranges. Each Function has its own Wage on the Wage table, BUT each employee might get paid a different Wage for the same Function (there are also dimensions for functions and employees).
'Wages'
Wage StartDate EndDate EmployeeID FunctionID
20 1/1/2016 1/30/2016 3456 20
15 1/15/2016 2/12/2016 3456 22
27.5 1/20/2016 2/20/2016 7890 20
20 1/21/2016 2/10/2016 1234 19
On 'Table 2' I have a record for every day that an Employee worked a certain Function. Remember, Table 1 contains the Wage information for every function.
'Table 2'
Date EmployeeID FunctionID DailyWage
1/1/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/2/2016 1234 $20 =CALCULATE( SUMX( ??? ) )
1/3/2016 1234 $22 see below
1/4/2016 1234 $22
1/1/2016 4567 $27
1/2/2016 4567 $27
1/3/2016 4567 $27
(Note that wages can change over time)
What I'm trying to do is create a Calculated Column on 'Table 2' called 'DailyWage'. I want every row on 'Table 2' to tell me how much the EmployeeID was paid for the full day (assuming an 8 hour workday).
I'm really struggling with the logic steps, so I'm not sure what the best way to do this calculation is...
To make things worse, an EmployeeID might get paid a different Wage for the same Function on a different Date. They might start out at one wage working function X, and then their wage would generally go up a few months later. That means that if I concatenate the EmployeeID and the FunctionID, I won't be able to relate the tables on the concatenated value, because neither table will have unique values.
In other words, if we CONCATENATE the EmployeeID and FunctionID into EmpFunID, we need to say: "take the EmpFunID and the date for the current row, then return the value from the Wage column on the Wages table that has the same EmpFunID AND a StartDate less than the CurrentRowDate AND an EndDate greater than the CurrentRowDate."
HERE IS WHAT I HAVE SO FAR:
Step 1 = Filter 'Wages' table so that StartDate < CurrentRowDate
Step 2 = Filter 'Wages' table so that EndDate > CurrentRowDate
Step 3 = LOOKUPVALUE( 'Wages'[Wage], 'Wages'[EmpFunID], Table2[EmpFunID])
Now I just need that converted into a DAX function.
I'm not sure I got it totally right, but maybe something similar? If you put this into Table2 as a calculated column, CALCULATE will transform the current row context of Table2 into a filter context.
So SUMX will use the current row's data from Table2 and sum over a filtered version of the Wages table: the Wages table is filtered using the current Date, EmployeeId and FunctionId from Table2, so for each row in Table2 it only sums the wages that belong to that row.
CALCULATE(
    SUMX(
        FILTER(
            'Wages',
            -- FILTER takes a single condition, so the tests are combined with &&
            'Wages'[StartDate] <= 'Table2'[Date]
                && 'Wages'[EndDate] >= 'Table2'[Date]
                && 'Wages'[EmployeeId] = 'Table2'[EmployeeId]
                && 'Wages'[FunctionId] = 'Table2'[FunctionId]
        ),
        'Wages'[Wage]
    )
)
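The same range lookup can be sketched outside DAX to make the logic easier to check. The Python below is only an illustration: the Wages rows come from the question's sample, while the function name, the inclusive date bounds, and the multiplication by an 8-hour day (taken from the question's stated assumption) are choices made for this sketch.

```python
from datetime import date

# (Wage, StartDate, EndDate, EmployeeID, FunctionID) from the 'Wages' sample
WAGES = [
    (20.0, date(2016, 1, 1), date(2016, 1, 30), 3456, 20),
    (15.0, date(2016, 1, 15), date(2016, 2, 12), 3456, 22),
    (27.5, date(2016, 1, 20), date(2016, 2, 20), 7890, 20),
    (20.0, date(2016, 1, 21), date(2016, 2, 10), 1234, 19),
]
HOURS_PER_DAY = 8  # the question assumes an 8-hour workday

def daily_wage(work_date, employee_id, function_id, wages=WAGES):
    """Sum the hourly wages whose range contains work_date, times the hours worked."""
    hourly = sum(wage for wage, start, end, emp, fn in wages
                 if emp == employee_id and fn == function_id
                 and start <= work_date <= end)
    return hourly * HOURS_PER_DAY
```

For example, daily_wage(date(2016, 1, 22), 1234, 19) uses the 20-per-hour row, and a date outside every matching range gives 0. If two wage ranges overlapped for the same employee and function, both would be summed, just as SUMX would do.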
Can we create a pivot table with multiple columns, where each column header contains multiple rows?
For example:
Database Table:
BatchID BatchName Chemical Value
--------------------------------------------------------
BI-1 BN-1 CH-1 1
BI-2 BN-2 CH-2 2
--------------------------------------------------------
This is the table; I need to display it like below in an Excel sheet:
BI-1 BI-2
BN-1 BN-2
------------------------------------------
CH-1 1 null
------------------------------------------
CH-2 null 2
------------------------------------------
Here BI-1 and BN-1 are two rows in a single column header, and I need to display the chemical values as rows under it.
Could you please help me solve this problem?
Thank You.
It's a bit difficult to get it in exactly the format you want; however, a close option would be as below:
Sum of Value  BatchName  BatchID
              BN-1       BN-1 Total  BN-2  BN-2 Total  Grand Total
Chemical      BI-1                   BI-2
CH-1          1          1           NULL  NULL        1
CH-2          NULL       NULL        2     2           2
Grand Total   1          1           2     2           3
Process:
Create a pivot table.
Add Chemical to the row field.
Add the BatchName and BatchID fields to the column field.
Add Value to the data items field.
Hope this helps... Cheers..
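For anyone who wants to reproduce the layout outside Excel, the grouping that the pivot table performs can be sketched in a few lines of Python. This is only an illustration of the idea; the function name and the choice of a (BatchID, BatchName) pair as the column key are assumptions.

```python
from collections import defaultdict

def pivot(rows):
    """Group (BatchID, BatchName, Chemical, Value) rows into
    {Chemical: {(BatchID, BatchName): sum of Value}}."""
    table = defaultdict(lambda: defaultdict(int))
    for batch_id, batch_name, chemical, value in rows:
        table[chemical][(batch_id, batch_name)] += value
    # Convert to plain dicts; combinations with no data are simply absent.
    return {chemical: dict(columns) for chemical, columns in table.items()}

source_rows = [
    ("BI-1", "BN-1", "CH-1", 1),
    ("BI-2", "BN-2", "CH-2", 2),
]
result = pivot(source_rows)
```

Each chemical becomes a row and each batch pair becomes a column; the cells that the question shows as null are the pairs missing from a chemical's dictionary.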