I have two tables.
1) A staging table with multiple columns, Date being one of them:
Date
9/1/2018
2) A Date Dimension table which has only one column, called Date:
Date
1/1/2018
2/1/2018
3/1/2018
4/1/2018
I am writing logic in SSIS that checks the staging table against the dimension table and inserts the missing dates into the dimension table.
To do this, I use the following logic.
The Lookup component receives the correct single input row from the staging table but returns a value of NULL, so the insertion fails due to constraints.
I do have "Redirect rows to no match output" enabled inside the Lookup, as shown in screen 1.
Kindly help me with this.
The solution is to change the Lookup operation to "Add as new column":
That's not the problem. The problem is when a date gets past the lookup and is a duplicate.
Run it through an Aggregate (Group By) transformation on the date column before inserting into the dimension.
Make sure you are using the correct date column. There will be no lookup matches for the records you want to insert (therefore that column is NULL by default). You shouldn't even add any columns from the lookup for this use case.
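For comparison, the whole check-and-insert can also be done set-based in T-SQL. This is only a sketch, assuming illustrative names dbo.Staging and dbo.DimDate:

    -- Insert staging dates that are missing from the dimension.
    -- DISTINCT plays the role of the Aggregate (Group By) step,
    -- so duplicate staging dates are inserted only once.
    INSERT INTO dbo.DimDate ([Date])
    SELECT DISTINCT s.[Date]
    FROM dbo.Staging AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.DimDate AS d
        WHERE d.[Date] = s.[Date]
    );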
In my database table, I have a LastUpdated column that describes when the current row was last updated.
What the customer has now asked for is a few more DateTime columns in the table to keep track of when individual values within the same row were changed.
E.g. there's a column called Address and they would like to have an extra column AddressLastUpdated to know when it was last changed.
For some reason, this does not look like a good solution to me. It is certainly doable, but I am wondering if there is a better way of implementing it. If we have this in place for one column, chances are they are going to want a LastUpdated column for every column in the table.
Keeping a bridge table with the structure below will help.
Structure:
Key Column of the table (e.g. Customer Key / Customer No)
Updated Column Name
Last Updated Date / DateTime
This solution helps in two ways:
It keeps the existing table structure intact.
All future requests of this kind can be easily accommodated.
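As a sketch, such a bridge table could look like this in T-SQL (names and types are illustrative, assuming a customer table with an integer key):

    -- One row per (row key, column name) pair, holding the latest update time.
    CREATE TABLE dbo.ColumnUpdateLog (
        CustomerKey       INT          NOT NULL,   -- key of the updated row
        UpdatedColumnName SYSNAME      NOT NULL,   -- e.g. N'Address'
        LastUpdatedOn     DATETIME2(0) NOT NULL
            CONSTRAINT DF_ColumnUpdateLog_LastUpdatedOn
            DEFAULT (SYSUTCDATETIME()),
        CONSTRAINT PK_ColumnUpdateLog
            PRIMARY KEY (CustomerKey, UpdatedColumnName)
    );

Whenever Address changes for a customer, the application (or a trigger) upserts the matching (CustomerKey, N'Address') row instead of altering the main table.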
I am given the table shown in Image 1.
How do I use an ORDER BY statement to get the resultant table? I don't know how to solve it. I tried ORDER BY on columns C and D, but all the NULLs come up first, irrespective of column B.
The result is given in Image 2.
Update:
Sorry, I just forgot to mention that this table also contains an id column and is already sorted on the basis of id. So I am not even able to sort it by column A. Because of this, SQL thinks the whole table is already sorted, but I still want to sort on the basis of column A.
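Since the images are not shown here, a common SQL Server pattern for this kind of problem is a CASE expression in the ORDER BY that pushes NULLs to the bottom. A sketch using the question's column letters and a hypothetical table name:

    -- Order by B first, then put rows whose C is NULL after the rest.
    SELECT A, B, C, D
    FROM dbo.MyTable
    ORDER BY
        B,
        CASE WHEN C IS NULL THEN 1 ELSE 0 END,  -- non-NULL C first
        C,
        D;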
I am working on creating a unique key to find the rows that have changed since the last refresh of the table. My approach is to take the PK of the table and also create an MD5 column for each row; based on the PK and the MD5, I check whether any rows in the table have changed since last time.
What is the best method to create an MD5 hash in MS SQL based on the query itself, one that takes care of all the data types and NULL columns as well?
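One common approach in SQL Server is HASHBYTES('MD5', ...) over a delimited concatenation of the columns, with each column cast to a string and NULLs replaced by a sentinel so they cannot be confused with empty values. A sketch with placeholder table/column names (CONCAT_WS requires SQL Server 2017+):

    -- MD5 per row; the '|' separator prevents ('ab','c') and ('a','bc')
    -- from hashing to the same value.
    SELECT
        t.PkCol,
        HASHBYTES('MD5',
            CONCAT_WS(N'|',
                ISNULL(CAST(t.Col1 AS NVARCHAR(4000)), N'~NULL~'),
                ISNULL(CAST(t.Col2 AS NVARCHAR(4000)), N'~NULL~'),
                ISNULL(CONVERT(NVARCHAR(30), t.DateCol, 126), N'~NULL~')
            )
        ) AS RowMd5
    FROM dbo.MyTable AS t;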
I have a 55 GB fact table from which I have to delete some records that may later need to be restored. The number of deleted records varies between 10 and 100 thousand.
Currently my delete strategy is based on this:
I update the dateKey of the records to be deleted, e.g. from the positive int 20080122 to the negative int -20080122, so that current date filters don't include them.
My thinking here is that instead of moving data out of and back into the fact table, I move records out of the filterable date range, and later back in, purely by updating dateKey.
I would like to hear your views on this delete strategy, especially around non-clustered index (NCI) behavior. Do you think updating the indexed dateKey is better than moving the actual data?
Rather than re-purpose the dateKey column, our standard practice is to add a "soft delete" column to the table, either an "is_deleted" bit column or a "deleted_on" datetime column and use that column to filter out "deleted" rows.
This requires more work on your part, as all of your existing queries will have to be modified to use the new column, but your database no longer has to do the work of re-indexing or deleting/inserting actual data.
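A minimal sketch of that approach, with an illustrative fact table name and key:

    -- Add the soft-delete flag once.
    ALTER TABLE dbo.FactSales
        ADD is_deleted BIT NOT NULL
            CONSTRAINT DF_FactSales_is_deleted DEFAULT (0);

    -- "Delete" a batch by flipping the flag (set it back to 0 to revert).
    UPDATE dbo.FactSales
    SET is_deleted = 1
    WHERE dateKey = 20080122;   -- example predicate for the batch

    -- Existing queries gain one extra predicate.
    SELECT SUM(Amount)
    FROM dbo.FactSales
    WHERE is_deleted = 0;

If most rows are never deleted, a filtered index (e.g. WHERE is_deleted = 0) can keep the read-side cost of the extra predicate low.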
I have a simple SSIS data flow which extracts records from table A and loads them into table B. Table A and table B each have a unique key.
What is the best way to extract and load only new records?
A) If records have a sequential unique key, check the max key in table B and then select records from table A greater than that value.
B) After each row iteration, save the unique key in some table/XML file. Use this value at the start of the task to select records from table A.
I have found the Lookup-task solution, but I am still searching for other good solutions.
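For option A, the source query is typically a simple high-water-mark filter; a sketch with placeholder table and key names:

    -- Pull only rows whose key is above the highest key already loaded.
    SELECT a.*
    FROM dbo.TableA AS a
    WHERE a.UniqueKey > (
        SELECT ISNULL(MAX(b.UniqueKey), 0)   -- 0 when table B is empty
        FROM dbo.TableB AS b
    );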