SQL Server - Slowly changing dimension join

I have a fact table and an employee "tier" table.
The fact table looks roughly like this:
employee_id  call  date
Mark         1     1-1-2017
Mark         2     1-2-2017
John         3     1-2-2017
Then there needs to be a data structure for 'tier level' - a slowly changing dimension table. I want to keep this simple -- I can change the structure of this table to whatever, but for now I've created it as such.
employee_id tier1_start ... tier2_start ... tier3_start
Mark 5-1-2016
John 6-1-2016 8-1-2016
Lucy 6-1-2016 10-1-2016
Two important notes. This table sort of operates under the assumption that a promotion will only occur once - aka no demotions and repromotions will occur. Also, it's possible one can jump from tier 1 to tier 3.
I am trying to find the best query for deriving a 'tier' dimension (denormalization) for the fact table.
For instance, I want to see the Tier 1 metrics for February, or the Tier 2 metrics for February. Obviously the historically-changing tier dimension must be linked.
The clumsiest way I can think of doing this for now ... is simply joining the fact table on the tier table using employee_id.
Then, doing an even clumsier case statement:
case
-- a NULL start date never matches, so evaluation falls through to the next tier
when tier3_start < date then 'T3'
when tier2_start < date then 'T2'
when tier1_start < date then 'T1'
else 'other'
end as tier_level
Yes, as you can see this is very clumsy.
I'm thinking maybe I need to change the structure of this a bit.

You're probably better off splitting your tier table in two.
So have a Tier table like this:
TierID  Tier
------------------
1       Tier 1
2       Tier 2
3       Tier 3
And an EmployeeTier table:
ID  EmpID  TierID  TierDate
---------------------------------------
1   1      1       Jun 1, 2016
2   1      3       Oct 2, 2016
3   2      1       Jul 10, 2016
4   2      2       Nov 11, 2016
Now you can query the EmployeeTier table and filter on the TierID you're looking for.
This also gives you the ability to promote/demote multiple times. You simply filter by the employee and sort by date to find the current tier.
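As a rough sketch of how the fact table could pick up the tier that was in effect on each call date, the query below joins each call to the EmployeeTier row with the latest TierDate on or before the call. It is wrapped in Python's standard sqlite3 module only so it can run standalone; the Calls table, its column names, and the sample call dates are assumptions for illustration, and dates are stored as ISO text here where SQL Server would use a DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeTier (EmpID INTEGER, TierID INTEGER, TierDate TEXT);
-- Hypothetical fact table; names and rows are made up for this sketch.
CREATE TABLE Calls (EmpID INTEGER, CallID INTEGER, CallDate TEXT);
INSERT INTO EmployeeTier VALUES
  (1, 1, '2016-06-01'), (1, 3, '2016-10-02'),
  (2, 1, '2016-07-10'), (2, 2, '2016-11-11');
INSERT INTO Calls VALUES
  (1, 1, '2016-08-15'), (1, 2, '2017-01-02'), (2, 3, '2017-01-02');
""")

# For each call, keep the tier row whose TierDate is the latest one
# on or before the call date (an "as of" join).
rows = conn.execute("""
SELECT c.CallID, c.EmpID, et.TierID
FROM Calls AS c
JOIN EmployeeTier AS et
  ON et.EmpID = c.EmpID
 AND et.TierDate = (SELECT MAX(e2.TierDate)
                    FROM EmployeeTier AS e2
                    WHERE e2.EmpID = c.EmpID
                      AND e2.TierDate <= c.CallDate)
ORDER BY c.CallID
""").fetchall()
print(rows)
```

Once every call carries a TierID, metrics per tier and month become an ordinary GROUP BY on TierID and the call date.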


Editing MS SQL DB with 20 tables to 1 table without data lost

Hello, I don't know how to change my MS SQL database so that it works with new software.
The issue comes from a limitation of the software.
Our old software can read, write, and work with an MS SQL database with multiple tables, but the new software only understands a single table or a single view.
My question is: how can I convert my MS SQL database from 20 tables into a database with one table that holds all the data and columns from those 20 tables, without losing data?
One last question: is it true that views in MS SQL are read-only for applications and software?
Why is it not a good idea to put the data from all your 20 tables into one table?
I will try to explain with an example. Since I do not know your database, I will just make up some tables here.
Suppose you have a table Clients
ClientID  Name  Street        City
1         John  ChurchStreet  Denver
2         Anna  FlowerStreet  Boston
and a table Products
ProductID  Name       Price
1          Mouse      10
2          Keyboard   30
3          Usb Cable  10
and a table Orders
OrderID  OrderNumber  ClientID  TotalAmount
1        123          1         10
2        345          1         20
3        678          2         30
and finally a table OrderDetail
OrderDetailID  OrderID  ProductID  Quantity
1              1        1          1
2              2        1          1
3              2        3          1
4              3        2          1
Now, to put this into one table, you could do this:
ID  ClientName  ClientStreet  ClientCity  OrderNumber  TotalAmount  ProductName  ProductPrice  ProductQuantity
1   John        ChurchStreet  Denver      123          10           Mouse        10            1
2   John        ChurchStreet  Denver      345          20           Mouse        10            1
3   John        ChurchStreet  Denver      345          20           Usb Cable    10            1
4   Anna        FlowerStreet  Boston      678          30           Keyboard     30            1
Now you can already see the redundancy:
you need to repeat the address of each customer time and time again in your table,
you need to repeat the order number and total amount time and time again,
you need to repeat the product name and price time and time again.
Now suppose that John moves to another address. You then have to find John in every row of the table and adjust the address.
Now suppose a product name changes. Again you have to search all rows and update them.
That is a lot of work, very inefficient, and guaranteed to go wrong at some point.
I only used 4 tables in this example. Can you imagine what will happen if you merge 20 tables into 1?
And the redundancy is not your only problem. What if you want to look at a client, what row should you use?
What if you want to look at an order, what row should you use?
What if you want to look at a product, what row should you use?
In this one-table design you can no longer identify a single row for a customer or an order. That is because each row contains everything; there is no distinct row anymore for a customer, a product, or an order.
Merging all tables into one big table is simply not maintainable.
Hi, thank you for your answers!
I am trying to integrate an old MS SQL database into the new IDFLOW software.
This software can work with an MS SQL database, but only with a single table, and my old database contains 18 tables.
IDFLOW can also read views, but views do not work well here: they are fine for card design in IDFLOW, but not for entering new data from the IDFLOW interface, because I cannot write new data to the database through a view.
So I decided to create a new database with one table containing all the columns, and I have done that successfully.
My SQL Server now has two databases, the old one and the new one.
What I don't know is how to export the data from the old database and import it into the new one.
I mean exporting column2 of table1 in db1 and importing it into column2 of table1 in db2,
and so on: column2 of table2 in db1 into column3 of table1 in db2,
and so on, up to column5 of table18 in db1 into column10 of table1 in db2.
Is it possible to do that?
Okay, but IDFLOW works with only one table.
I asked Jolly, and they said to adapt the database to use one table.
IDFLOW is software for ID cards.
I think it is possible to use one database with one table that has all the columns for employee information.
The goal is to enter all the data for a new employee in IDFLOW. Data entered through the IDFLOW interface goes into the MS SQL database, all database fields are configured in the card design, and when you select an employee's record from the IDFLOW interface (from the database) you can print an ID card with all of that employee's data.
The question is how to migrate the old records from the old database into the new one.
It is not a big database: only 6 GB, with data since 2006.
We use it only for printing ID cards.
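For the migration step asked about above, one common pattern is a single INSERT ... SELECT that joins the old tables and writes the result into the new wide table. The sketch below is only an illustration: it runs on Python's standard sqlite3 module, with an attached in-memory database standing in for the old database, and every table and column name (Employees, Departments, CardData) is hypothetical. In SQL Server the same statement would refer to the old tables with three-part names such as OldDb.dbo.Employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                    # stands in for the new database
conn.execute("ATTACH DATABASE ':memory:' AS olddb")   # stands in for the old database
conn.executescript("""
-- Hypothetical old schema: two of the source tables.
CREATE TABLE olddb.Employees (EmpID INTEGER PRIMARY KEY, FullName TEXT, DeptID INTEGER);
CREATE TABLE olddb.Departments (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
INSERT INTO olddb.Departments VALUES (10, 'Security'), (20, 'Finance');
INSERT INTO olddb.Employees VALUES (1, 'Ann Lee', 10), (2, 'Bob Ray', 20), (3, 'Cy Doe', NULL);

-- Hypothetical single wide table for the card software.
CREATE TABLE main.CardData (EmpID INTEGER PRIMARY KEY, FullName TEXT, DeptName TEXT);

-- One statement copies and flattens the data; LEFT JOIN keeps
-- employees that have no matching department row.
INSERT INTO main.CardData (EmpID, FullName, DeptName)
SELECT e.EmpID, e.FullName, d.DeptName
FROM olddb.Employees AS e
LEFT JOIN olddb.Departments AS d ON d.DeptID = e.DeptID;
""")

migrated = conn.execute(
    "SELECT EmpID, FullName, DeptName FROM main.CardData ORDER BY EmpID"
).fetchall()
print(migrated)
```

Choosing one row per employee as the grain of the wide table avoids the repeated rows described in the answer; one-to-many data from the old tables would need a decision about which value to keep.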

SQL Optimize Group By Query

I have a table here with following fields:
id, name, kind, date
Data:
id  name    kind  date
1   Thomas  1     2015-01-01
2   Thomas  1     2015-01-01
3   Thomas  2     2014-01-01
4   Kevin   2     2014-01-01
5   Kevin   2     2014-01-01
6   Kevin   2     2014-01-01
7   Kevin   2     2014-01-01
8   Sasha   1     2014-01-01
I have an SQL statement like this:
Select name,kind,Count(*) AS RecordCount
from mytable
group by kind, name
I want to know how many records there are for any name and kind. Expected results:
name    kind  count
Thomas  1     2
Thomas  2     1
Kevin   2     4
Sasha   1     1
The problem is that it is a big table, with more than 50 Million records.
Also, I'd like to know the result for the last hour, last day, last week and so on, for which I need to add a WHERE clause like this:
Select name,kind,Count(*) AS RecordCount
from mytable
WHERE Date > '2015-26-07'
group by kind, name
I use T-SQL with the SQL Server Management Studio. All of the relevant columns have a non clustered index and the primary key is a clustered index.
Does somebody have ideas how to make this faster?
Update:
The execution plan says:
Select, Compute Scalar, Stream Aggregate, Sort, Parallelism: 0% costs.
Hash Match (Partial Aggregate): 12%.
Clustered Index Scan: 88%
Sorry, I forgot to check the SQL-statements.
50 million is just a lot of rows.
I can't see much you can do to optimize that query.
Possibly a composite index on kind, name.
Or try name, kind.
Or name only.
I think the query optimizer is smart enough for this not to be a factor, but try switching the GROUP BY to name, kind, as name is more selective.
If kind is not very selective (just 1 and 2), then you may be better off with no index on it.
I'd also defragment the indexes you have.
Querying the last day is no big deal, because you already have a date column that you can put an index on.
For the last week I would create a separate date table which contains one row per day, with columns id, date, week.
You have to pre-calculate the week. Then, if you want to query a specific week, you can look it up in the date table, get the dates, and query only those dates from mytable.
You should test whether it performs better to join on the date columns or to put the id column in mytable and join on id. For big tables, id might be the better choice.
To query the last hour you could add an [hour] column to mytable and query it in combination with the date.
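Since the posted plan is dominated by a clustered index scan, one thing that may be worth testing for the date-filtered version is an index that leads on date and also carries name and kind, so the filtered GROUP BY can be answered from the index alone. The sketch below only demonstrates the shape of the index and the query on a handful of sample rows using Python's standard sqlite3 module; the index name is made up, and whether it helps on 50 million rows in SQL Server (where it might be written with INCLUDE (name, kind)) has to be measured on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, kind INTEGER, date TEXT);
INSERT INTO mytable (id, name, kind, date) VALUES
  (1, 'Thomas', 1, '2015-01-01'), (2, 'Thomas', 1, '2015-01-01'),
  (3, 'Thomas', 2, '2014-01-01'), (4, 'Kevin', 2, '2014-01-01'),
  (5, 'Kevin', 2, '2014-01-01'),  (6, 'Kevin', 2, '2014-01-01'),
  (7, 'Kevin', 2, '2014-01-01'),  (8, 'Sasha', 1, '2014-01-01');

-- Hypothetical index: date first for the range filter,
-- name and kind included so the base table need not be read.
CREATE INDEX ix_mytable_date_name_kind ON mytable (date, name, kind);
""")

query = """
SELECT name, kind, COUNT(*) AS RecordCount
FROM mytable
WHERE date > ?
GROUP BY name, kind
ORDER BY name, kind
"""
all_counts = conn.execute(query, ("1900-01-01",)).fetchall()
recent_counts = conn.execute(query, ("2014-06-01",)).fetchall()
print(all_counts)
print(recent_counts)
```

Passing the cutoff as a parameter also sidesteps any ambiguity in how a date literal is parsed.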

T-SQL Pivoting approach

I have a View that has a structure similar to the following:
Id Name State ZipCode #Requests AmtReq Price Month Year
1 John IN 46202 203 33 $300 1 2015
1 Jane IN 46202 200 45 $100 2 2015
...
Queries require reports to be generated for given quarters (1st quarter will include the first three months ...) grouped by state
The result should look like this:
1st Quarter ...
January February ...
State ZipCode #Requests AmtReq Price #Requests AmtReq Price ...
IN 46202 203 33 45 200 45 100
I feel that this can be done using pivoting, but I do not have experience with it. I tried single-column pivoting and had some success, but not at this scale.
Another approach would be to create a stored procedure that generates the data for me and then just fix some formatting (e.g., the first two rows) in the client. Any suggestions on how to approach this problem?
I am using SQL Server as the DBMS.
If you have MS Excel on your machine, you can export the view to Excel and summarize it in a pivot table. From there you can create tables and charts as needed.
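If the report has to come out of the database rather than Excel, a multi-column pivot like the one asked about can also be written with conditional aggregation: one SUM(CASE ...) per month and measure. This is a different technique from the PIVOT operator, sketched here under assumptions: the view is represented by a table named RequestsView, #Requests is renamed Requests, Price is stored as a plain number, and the query runs through Python's standard sqlite3 module just to make it executable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the view in the question.
CREATE TABLE RequestsView (
  Id INTEGER, Name TEXT, State TEXT, ZipCode TEXT,
  Requests INTEGER, AmtReq INTEGER, Price INTEGER, Month INTEGER, Year INTEGER);
INSERT INTO RequestsView VALUES
  (1, 'John', 'IN', '46202', 203, 33, 300, 1, 2015),
  (1, 'Jane', 'IN', '46202', 200, 45, 100, 2, 2015);
""")

# One output column per (month, measure) pair of the first quarter.
first_quarter = conn.execute("""
SELECT State, ZipCode,
  SUM(CASE WHEN Month = 1 THEN Requests ELSE 0 END) AS JanRequests,
  SUM(CASE WHEN Month = 1 THEN AmtReq   ELSE 0 END) AS JanAmtReq,
  SUM(CASE WHEN Month = 1 THEN Price    ELSE 0 END) AS JanPrice,
  SUM(CASE WHEN Month = 2 THEN Requests ELSE 0 END) AS FebRequests,
  SUM(CASE WHEN Month = 2 THEN AmtReq   ELSE 0 END) AS FebAmtReq,
  SUM(CASE WHEN Month = 2 THEN Price    ELSE 0 END) AS FebPrice,
  SUM(CASE WHEN Month = 3 THEN Requests ELSE 0 END) AS MarRequests,
  SUM(CASE WHEN Month = 3 THEN AmtReq   ELSE 0 END) AS MarAmtReq,
  SUM(CASE WHEN Month = 3 THEN Price    ELSE 0 END) AS MarPrice
FROM RequestsView
WHERE Year = 2015 AND Month BETWEEN 1 AND 3
GROUP BY State, ZipCode
ORDER BY State, ZipCode
""").fetchall()
print(first_quarter)
```

The two header rows (quarter and month names) are presentation only, so leaving them to the client, as the question suggests, fits this approach.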

SQL Server, can SQL Server store 10 pieces of information of item in a row?

I'm wondering whether SQL Server can store 10 pieces of information about items in one row.
I want to make a table with Date, Item_Name, Quantity,
but I want a row to contain only one date (e.g. 21 November 2014) together with several item names (chicken, rabbit, cow) and their quantities (2, 4, 3).
Can SQL do that?
If not, can you recommend an alternative? I want to make a daily report of which items were sold on a given day, the day before, and so on.
I hope you can understand what I mean; my English is not very good.
You should probably do something like this:
Table Dates:
DateId Date
1 21/11/2014
2 23/11/2014
Table Items:
DateId Name Quantity
1 Chicken 2
1 Rabbit 4
1 Cow 3
2 Dinosaur 666
Dates.DateId should be Primary Key and, depending on your logic, perhaps also identity (it autogenerates the following id), and Items.DateId should have a Foreign Key with Dates.DateId.
For more background, read up on database normalization.
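A minimal sketch of the two tables above and the daily report they enable, run through Python's standard sqlite3 module so it is self-contained. The column types, the ISO date strings, and the extra 'Ghost' row used to show the foreign key rejecting an unknown DateId are assumptions; in SQL Server the Date column would be a DATE and DateId could be an IDENTITY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Dates (
  DateId INTEGER PRIMARY KEY,
  Date   TEXT NOT NULL UNIQUE);
CREATE TABLE Items (
  DateId   INTEGER NOT NULL REFERENCES Dates (DateId),
  Name     TEXT NOT NULL,
  Quantity INTEGER NOT NULL);
INSERT INTO Dates VALUES (1, '2014-11-21'), (2, '2014-11-23');
INSERT INTO Items VALUES
  (1, 'Chicken', 2), (1, 'Rabbit', 4), (1, 'Cow', 3), (2, 'Dinosaur', 666);
""")

# Daily report: one output row per item sold on each date.
report = conn.execute("""
SELECT d.Date, i.Name, i.Quantity
FROM Dates AS d
JOIN Items AS i ON i.DateId = d.DateId
ORDER BY d.Date, i.Name
""").fetchall()
print(report)

# The foreign key rejects an item that points at a date that does not exist.
try:
    conn.execute("INSERT INTO Items VALUES (99, 'Ghost', 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```

Each date can have any number of item rows, so there is no need to fit ten items into one row.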

Repeat parent column in child table

I have the following three tables
Periods
--------------------------------
ID StartDate EndDate Type
--------------------------------
1 2013-01-01 2013-01-01 D
2 2013-01-02 2013-01-02 D
Attendance
---------------------------------------------------
ID PeriodID UploadedBy uploadDateTime Approved
--------------------------------------------------
1 1 25 2013-01-01-11:00 1
2 1 54 2013-01-01-10:00 1
Attendance Detail
---------------------------------------------
ID EmployeeID AttendanceTime Status AttendanceID
---------------------------------------------
1 24 2013-01-01 09:05 CheckIn 1
1 28 2013-01-01 09:08 CheckOut 2
Attendance data is loaded from CSV files generated by biometric machines. AttendanceDetail may grow over time, as there are multiple check-ins and check-outs per employee per day. Attendance is approved for each period.
Question
I need attendance data on a per-period basis. I know I can achieve this through joins, but then I have to use a BETWEEN filter on AttendanceTime. I was thinking of also adding PeriodID to the AttendanceDetail table to simplify queries and avoid future performance issues. Should I go for it, or is there a better solution available?
If you often need attendance details per period (so you would usually have to join all three tables), but the data from the Attendance table itself is not so important to you, then a PeriodID in the Attendance Detail table will certainly help.
Even if you need all three tables, a WHERE condition on PeriodID will narrow down the number of rows from Attendance Detail, so it will again help performance.
It can be a bit annoying to maintain a schema that is not fully normalized, but if it's not a big hassle and it doesn't hurt your write performance, go for the PeriodID in Attendance Detail. Your selects will thank you :)
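To make the trade-off concrete, here is a small sketch comparing the two ways of fetching one period's details: the range join on AttendanceTime and a direct filter on a denormalized PeriodID. It runs on Python's standard sqlite3 module; the third detail row, the index name, and the use of SQLite's date() function (CAST(AttendanceTime AS date) would be the SQL Server counterpart) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Periods (ID INTEGER PRIMARY KEY, StartDate TEXT, EndDate TEXT, Type TEXT);
CREATE TABLE AttendanceDetail (
  ID INTEGER PRIMARY KEY, EmployeeID INTEGER, AttendanceTime TEXT,
  Status TEXT, AttendanceID INTEGER,
  PeriodID INTEGER REFERENCES Periods (ID));   -- the proposed extra column
-- Hypothetical index to support filtering by period.
CREATE INDEX ix_detail_period ON AttendanceDetail (PeriodID);

INSERT INTO Periods VALUES
  (1, '2013-01-01', '2013-01-01', 'D'), (2, '2013-01-02', '2013-01-02', 'D');
INSERT INTO AttendanceDetail VALUES
  (1, 24, '2013-01-01 09:05', 'CheckIn',  1, 1),
  (2, 28, '2013-01-01 09:08', 'CheckOut', 2, 1),
  (3, 24, '2013-01-02 09:01', 'CheckIn',  3, 2);
""")

# Without the extra column: match details to a period by date range.
by_range = conn.execute("""
SELECT ad.ID
FROM AttendanceDetail AS ad
JOIN Periods AS p
  ON date(ad.AttendanceTime) BETWEEN p.StartDate AND p.EndDate
WHERE p.ID = ?
ORDER BY ad.ID
""", (1,)).fetchall()

# With the extra column: a plain equality filter, no join needed.
by_period_id = conn.execute(
    "SELECT ID FROM AttendanceDetail WHERE PeriodID = ? ORDER BY ID", (1,)
).fetchall()
print(by_range, by_period_id)
```

Both queries should return the same rows as long as PeriodID is kept consistent with AttendanceTime when the CSV files are loaded, which is the maintenance cost mentioned above.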
