Repeat parent column in child table - sql-server

I have the following three tables:
Periods
---------------------------------------
ID  StartDate   EndDate     Type
---------------------------------------
1   2013-01-01  2013-01-01  D
2   2013-01-02  2013-01-02  D
Attendance
------------------------------------------------------------
ID  PeriodID  UploadedBy  uploadDateTime    Approved
------------------------------------------------------------
1   1         25          2013-01-01-11:00  1
2   1         54          2013-01-01-10:00  1
Attendance Detail
------------------------------------------------------------
ID  EmployeeID  AttendanceTime    Status    AttendanceID
------------------------------------------------------------
1   24          2013-01-01 09:05  CheckIn   1
1   28          2013-01-01 09:08  CheckOut  2
Attendance data is loaded from CSV files generated by biometric machines. The Attendance Detail table may grow over time, as there are multiple check-ins and check-outs per employee per day. Attendance is approved for each period.
Question
I need attendance data on a per-period basis. I know I can achieve this through joins, but I have to use a BETWEEN filter on AttendanceTime. I was thinking of adding PeriodID to the Attendance Detail table as well, to simplify queries and avoid future performance issues. Should I go for it, or is there a better solution available?

If you often need Attendance Detail rows per period, which normally means joining all three tables, but the columns of the Attendance table itself are not important to you, then a PeriodID column in the Attendance Detail table will certainly help.
Even if you need all three tables, a WHERE condition on PeriodID will narrow down the number of rows read from Attendance Detail, so it will again help performance.
Maintaining a schema that is not fully normalized can be a bit annoying, but if it's not a big hassle and it doesn't hurt your write performance, go for the PeriodID in Attendance Detail. Your selects will thank you :)
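As a rough illustration of the trade-off described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The index name, the reduced column list, and the sample rows are assumptions made for the example, not the asker's real schema. It only shows that filtering Attendance Detail directly on a repeated PeriodID returns the same rows as going through the Attendance table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Periods (ID INTEGER PRIMARY KEY, StartDate TEXT, EndDate TEXT, Type TEXT);
CREATE TABLE Attendance (ID INTEGER PRIMARY KEY, PeriodID INTEGER REFERENCES Periods(ID));
-- PeriodID is repeated here (denormalized) so detail rows can be filtered directly.
CREATE TABLE AttendanceDetail (
    ID INTEGER PRIMARY KEY,
    EmployeeID INTEGER,
    AttendanceTime TEXT,
    Status TEXT,
    AttendanceID INTEGER REFERENCES Attendance(ID),
    PeriodID INTEGER REFERENCES Periods(ID)
);
-- Hypothetical index name; it supports the direct PeriodID filter.
CREATE INDEX IX_AttendanceDetail_PeriodID ON AttendanceDetail (PeriodID);

-- Made-up sample rows, loosely based on the question.
INSERT INTO Periods VALUES (1, '2013-01-01', '2013-01-01', 'D'),
                           (2, '2013-01-02', '2013-01-02', 'D');
INSERT INTO Attendance VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO AttendanceDetail VALUES
    (1, 24, '2013-01-01 09:05', 'CheckIn',  1, 1),
    (2, 28, '2013-01-01 09:08', 'CheckOut', 2, 1),
    (3, 24, '2013-01-02 09:01', 'CheckIn',  3, 2);
""")

# Normalized route: reach the period through the parent Attendance row.
via_join = conn.execute(
    """SELECT d.ID
       FROM AttendanceDetail AS d
       JOIN Attendance AS a ON a.ID = d.AttendanceID
       WHERE a.PeriodID = ?
       ORDER BY d.ID""",
    (1,),
).fetchall()

# Denormalized route: filter the child table on its own PeriodID column.
direct = conn.execute(
    "SELECT ID FROM AttendanceDetail WHERE PeriodID = ? ORDER BY ID", (1,)
).fetchall()

print(via_join, direct)
```

The price of the shortcut is that whatever loads the CSV files has to keep AttendanceDetail.PeriodID consistent with the parent Attendance row.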

Related

SQL Server - Slowly Changing dimension join

I have a fact table and employee "tier" table, let's say.
So the fact table looks sorta like
employee_id call date
Mark 1 1-1-2017
Mark 2 1-2-2017
John 3 1-2-2017
Then there needs to be a data structure for 'tier level' - a slowly changing dimension table. I want to keep this simple -- I can change the structure of this table to whatever, but for now I've created it as such.
employee_id tier1_start ... tier2_start ... tier3_start
Mark 5-1-2016
John 6-1-2016 8-1-2016
Lucy 6-1-2016 10-1-2016
Two important notes. This table sort of operates under the assumption that a promotion will only occur once - aka no demotions and repromotions will occur. Also, it's possible one can jump from tier 1 to tier 3.
I was trying to come up with the best possible query for coming up with a 'tier' dimension (denormalization) for the fact table.
For instance, I want to see the Tier 1 metrics for February, or the Tier 2 metrics for February. Obviously the historically-changing tier dimension must be linked.
The clumsiest way I can think of doing this for now ... is simply joining the fact table on the tier table using employee_id.
Then, doing an even clumsier case statement:
case
    when isnull(tier3_start, '0') < date then 'T3'
    when isnull(tier2_start, '0') < date then 'T2'
    when isnull(tier1_start, '0') < date then 'T1'
    else 'other'
end as tier_level
Yes, as you can see this is very clumsy.
I'm thinking maybe I need to change the structure of this a bit.
You're probably better off splitting your tier table in two.
So have a Tier table like this:
TierID Tier
------------------
1 Tier 1
2 Tier 2
3 Tier 3
And an EmployeeTier table:
ID EmpID TierID TierDate
---------------------------------------
1 1 1 Jun 1, 2016
2 1 3 Oct 2, 2016
3 2 1 Jul 10, 2016
4 2 2 Nov 11, 2016
Now you can query the EmployeeTier table and filter on the TierID you're looking for.
This also gives you the ability to promote/demote multiple times. You simply filter by the employee and sort by date to find the current tier.
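One way to link the fact table to this structure is an "as of" lookup: for each fact row, take the employee's latest EmployeeTier row dated on or before the fact date. The sketch below runs on Python's sqlite3 module rather than SQL Server. The Calls table, its columns, its rows, and the ISO date strings are assumptions made for the example. Only the Tier and EmployeeTier rows come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE Tier (TierID INTEGER PRIMARY KEY, Tier TEXT);
CREATE TABLE EmployeeTier (ID INTEGER PRIMARY KEY, EmpID INTEGER, TierID INTEGER, TierDate TEXT);
-- Hypothetical fact table: one row per call.
CREATE TABLE Calls (CallID INTEGER PRIMARY KEY, EmpID INTEGER, CallDate TEXT);

INSERT INTO Tier VALUES (1, 'Tier 1'), (2, 'Tier 2'), (3, 'Tier 3');
INSERT INTO EmployeeTier VALUES
    (1, 1, 1, '2016-06-01'),
    (2, 1, 3, '2016-10-02'),
    (3, 2, 1, '2016-07-10'),
    (4, 2, 2, '2016-11-11');
INSERT INTO Calls VALUES
    (1, 1, '2016-08-15'),
    (2, 1, '2017-01-02'),
    (3, 2, '2017-01-02');
""")

# For each call, pick the employee's most recent tier change on or before the call date.
tier_per_call = conn.execute("""
    SELECT c.CallID, t.Tier
    FROM Calls AS c
    JOIN EmployeeTier AS et
      ON et.EmpID = c.EmpID
     AND et.TierDate = (SELECT MAX(e2.TierDate)
                        FROM EmployeeTier AS e2
                        WHERE e2.EmpID = c.EmpID
                          AND e2.TierDate <= c.CallDate)
    JOIN Tier AS t ON t.TierID = et.TierID
    ORDER BY c.CallID
""").fetchall()

print(tier_per_call)
```

Because the lookup keys on dates rather than on fixed tier columns, it keeps working if an employee is demoted and promoted again, as long as an employee never has two tier changes on the same date.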

SQL delete rows based on date difference

The situation is quite complicated to express in the title. An example should be much easier to understand.
My table A:
uid id ticket created_date
001 1 movie 2015-01-23 08:23:16
002 25 TV 2012-01-13 12:02:20
003 1 movie 2015-02-01 07:15:36
004 1 movie 2014-02-15 15:38:40
What I need to achieve is to remove duplicate records that appear within 31 days of each other and retain the record that appears first. So the above table would be reduced to B:
uid id ticket created_date
001 1 movie 2015-01-23 08:23:16
002 25 TV 2012-01-13 12:02:20
004 1 movie 2014-02-15 15:38:40
because the 3rd row in A was within 31 days of row 1 and it appeared later than row 1 (2015-02-01 vs 2015-01-23), so it gets removed.
Is there a clean way to do this?
I would suggest the following approach:
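-- Collect every row that has an earlier row with the same id and ticket
-- less than 31 days before it.
SELECT DISTINCT A.uid AS uid
INTO #tempA
FROM A
INNER JOIN A AS B
    ON A.id = B.id AND A.ticket = B.ticket
WHERE DATEDIFF(SECOND, B.created_date, A.created_date) > 0
  AND DATEDIFF(SECOND, B.created_date, A.created_date) < 31*24*60*60;
-- Delete the later duplicates; the earliest row of each group is kept.
DELETE FROM A WHERE uid IN (SELECT uid FROM #tempA);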
This is assuming that by 'duplicate records' you mean records that have both identical id as well as identical ticket fields. If that's not the case you should adjust the ON clause accordingly.
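To check the idea against the sample data from the question, here is a small sketch on Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, so julianday() replaces DATEDIFF and a Python list plays the role of the temp table. The self-join logic is otherwise the same as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (uid TEXT PRIMARY KEY, id INTEGER, ticket TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?)",
    [
        ("001", 1, "movie", "2015-01-23 08:23:16"),
        ("002", 25, "TV", "2012-01-13 12:02:20"),
        ("003", 1, "movie", "2015-02-01 07:15:36"),
        ("004", 1, "movie", "2014-02-15 15:38:40"),
    ],
)

# Step 1: find rows that have an earlier row with the same id and ticket
# less than 31 days before them (julianday differences are in days).
doomed = conn.execute("""
    SELECT DISTINCT later.uid
    FROM A AS later
    JOIN A AS earlier
      ON later.id = earlier.id AND later.ticket = earlier.ticket
    WHERE julianday(later.created_date) > julianday(earlier.created_date)
      AND julianday(later.created_date) - julianday(earlier.created_date) < 31
""").fetchall()

# Step 2: delete the collected rows (the list stands in for #tempA).
conn.executemany("DELETE FROM A WHERE uid = ?", doomed)

remaining = [row[0] for row in conn.execute("SELECT uid FROM A ORDER BY uid")]
print(remaining)
```

Note that this compares every row with every earlier row, so in a chain of records each less than 31 days apart, all but the first are removed.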

SQL Optimize Group By Query

I have a table here with following fields:
id, name, kind, date
Data:
id name kind date
1 Thomas 1 2015-01-01
2 Thomas 1 2015-01-01
3 Thomas 2 2014-01-01
4 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
6 Sasha 1 2014-01-01
I have an SQL statement like this:
Select name,kind,Count(*) AS RecordCount
from mytable
group by kind, name
I want to know how many records there are for any name and kind. Expected results:
name kind count
Thomas 1 2
Thomas 2 1
Kevin 2 4
Sasha 1 1
The problem is that it is a big table, with more than 50 Million records.
I'd also like to know the result for the last hour, last day, last week and so on, for which I need to add a WHERE clause like this:
Select name,kind,Count(*) AS RecordCount
from mytable
WHERE Date > '2015-26-07'
group by kind, name
I use T-SQL with SQL Server Management Studio. All of the relevant columns have a nonclustered index, and the primary key is a clustered index.
Does somebody have ideas how to make this faster?
Update:
The execution plan says:
Select, Compute Scalar, Stream Aggregate, Sort, Parallelism: 0% costs.
Hash Match (Partial Aggregate): 12%.
Clustered Index Scan: 88%
Sorry, I forgot to check the SQL-statements.
50 million is just a lot of rows.
I can't see much you can do to optimize that query itself.
Possibly a composite index on kind, name.
Or try name, kind.
Or name only.
I think the query optimizer is smart enough for this not to be a factor, but try switching the GROUP BY to name, kind, as name is more selective.
If kind is not very selective (just 1 and 2), you may be better off with no index on it.
I would also defragment the indexes you have.
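To make the composite-index suggestion concrete, here is a sketch on Python's sqlite3 module, which is only a stand-in for SQL Server. The index name and the choice to lead with the date column (to serve the WHERE clause) are assumptions, not something the answer states. The printed query plan is SQLite's, so the real effect has to be measured on the actual server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, kind INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "Thomas", 1, "2015-01-01"),
        (2, "Thomas", 1, "2015-01-01"),
        (3, "Thomas", 2, "2014-01-01"),
        (4, "Kevin", 2, "2014-01-01"),
        (5, "Kevin", 2, "2014-01-01"),
        (5, "Kevin", 2, "2014-01-01"),
        (5, "Kevin", 2, "2014-01-01"),
        (6, "Sasha", 1, "2014-01-01"),
    ],
)

# Hypothetical composite index: date first for the range filter, then the
# grouped columns, so the query can be answered from the index alone.
conn.execute("CREATE INDEX ix_mytable_date_name_kind ON mytable (date, name, kind)")

query = """
    SELECT name, kind, COUNT(*) AS RecordCount
    FROM mytable
    WHERE date > ?
    GROUP BY name, kind
    ORDER BY name, kind
"""
cutoff = "2014-06-01"  # made-up cutoff for the example

# Show how SQLite plans the query (output format varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (cutoff,)):
    print(row)

counts = conn.execute(query, (cutoff,)).fetchall()
print(counts)
```

On SQL Server, an index that contains every column the query touches may let the engine avoid the clustered index scan shown in the execution plan, at the cost of extra storage and slower writes.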
Querying the last day is no big deal, because you already have a date column that you can put an index on.
For the last week, I would create a separate date table that contains one row per day, with the columns id, date, week.
You have to pre-calculate the week. Then, if you want to query a specific week, you can look it up in the date table, get the dates, and query only those dates from mytable.
You should test whether it performs better to join on the date columns or to put the date table's id column into mytable and join on id. For big tables, id might be the better choice.
To query the last hour, you could add an [hour] column to mytable and query it in combination with the date.

T-SQL Pivoting approach

I have a View that has a structure similar to the following:
Id Name State ZipCode #Requests AmtReq Price Month Year
1 John IN 46202 203 33 $300 1 2015
1 Jane IN 46202 200 45 $100 2 2015
...
Queries require reports to be generated for given quarters (1st quarter will include the first three months ...) grouped by state
The result should look like this:
1st Quarter ...
January February ...
State ZipCode #Requests AmtReq Price #Requests AmtReq Price ...
IN 46202 203 33 45 200 45 100
I feel that this can be done using pivoting but I do not have experience with it. I tried with single column pivoting and had some success, but not in this scale.
Another approach would be to create a stored procedure that will generate the data for me and then just fix some formatting (e.g., the first two rows) in the client. Any suggestions on how to approach this problem?
I am using SQL Server as a DBMS.
If you have MS Excel on your machine, you can export the view to Excel and summarize it in a pivot table. From there you can create tables and charts as needed.
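If the report has to come from the database rather than Excel, a different technique from the one suggested above is conditional aggregation: one SUM(CASE ...) column per month and measure. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server. The table name RequestsView, the column name Requests (standing in for #Requests), and the output column names are assumptions. Only January and February are shown; March would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the view from the question.
conn.execute("""
    CREATE TABLE RequestsView (
        Id INTEGER, Name TEXT, State TEXT, ZipCode TEXT,
        Requests INTEGER, AmtReq INTEGER, Price INTEGER,
        Month INTEGER, Year INTEGER
    )
""")
conn.executemany(
    "INSERT INTO RequestsView VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "IN", "46202", 203, 33, 300, 1, 2015),
        (1, "Jane", "IN", "46202", 200, 45, 100, 2, 2015),
    ],
)

# One output column per (month, measure) pair; rows from other months
# contribute NULL to a CASE and are ignored by SUM.
first_quarter = conn.execute("""
    SELECT State, ZipCode,
           SUM(CASE WHEN Month = 1 THEN Requests END) AS JanRequests,
           SUM(CASE WHEN Month = 1 THEN AmtReq   END) AS JanAmtReq,
           SUM(CASE WHEN Month = 1 THEN Price    END) AS JanPrice,
           SUM(CASE WHEN Month = 2 THEN Requests END) AS FebRequests,
           SUM(CASE WHEN Month = 2 THEN AmtReq   END) AS FebAmtReq,
           SUM(CASE WHEN Month = 2 THEN Price    END) AS FebPrice
    FROM RequestsView
    WHERE Year = 2015 AND Month BETWEEN 1 AND 3
    GROUP BY State, ZipCode
""").fetchall()

print(first_quarter)
```

The two header rows of the report (quarter and month names) would still be added by the client, since a result set has a single row of column names.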

RANKX not working when data summarized in power pivot table

I am trying to rank records in a Power Pivot table using DAX, as below, in a SQL Server Analysis Services tabular model.
Example details:
I have shop sales details in a table.
e.g.
ShopNo date sales
-----------------
1 2014-11-09 120
1 2014-11-09 130
2 2014-11-10 130
2 2014-11-10 135
In pivot table data is analyzed month and year wise.
I want to see result like
ShopNo sales rank
-----------------
2 265 1
3 250 2
Any solution is there to display statewise population automatically.
Thanks
You should be able to achieve the ranking quite easily with PowerPivot using this formula:
RankShop:=RANKX(ALL(SalesTable[ShopNo]), [Sum of sales],,,Dense)
Here SalesTable is your shop sales table. Then create a pivot table: drag ShopNo onto Rows and add a new Measure (Excel 2010; in Excel 2013 it is called a Calculated Field). The resulting pivot table then lists each ShopNo with its rank.
To find out more about the RANK function, I suggest this article.
To hide the rank value in the Grand Total row, add a simple condition that returns a blank value for the grand total:
=IF(HASONEVALUE(SalesTable[ShopNo]), [RankShop], BLANK())
Hope this helps.

Resources