SQL Optimize Group By Query

I have a table with the following fields:
Id, Name, kind, date
Data:
id name kind date
1 Thomas 1 2015-01-01
2 Thomas 1 2015-01-01
3 Thomas 2 2014-01-01
4 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
5 Kevin 2 2014-01-01
6 Sasha 1 2014-01-01
I have an SQL statement like this:
SELECT name, kind, COUNT(*) AS RecordCount
FROM mytable
GROUP BY kind, name
I want to know how many records there are for each combination of name and kind. Expected results:
name kind count
Thomas 1 2
Thomas 2 1
Kevin 2 4
Sasha 1 1
The problem is that it is a big table, with more than 50 million records.
I'd also like to know the result for the last hour, last day, last week and so on, for which I need to add a WHERE clause like this:
SELECT name, kind, COUNT(*) AS RecordCount
FROM mytable
WHERE date > '2015-07-26'
GROUP BY kind, name
I use T-SQL with SQL Server Management Studio. All of the relevant columns have a nonclustered index, and the primary key is a clustered index.
Does anybody have ideas on how to make this faster?
Update:
The execution plan says:
Select, Compute Scalar, Stream Aggregate, Sort, Parallelism: 0% costs.
Hash Match (Partial Aggregate): 12%.
Clustered Index Scan: 88%
Sorry, I forgot to check the SQL-statements.

50 million is just a lot of rows.
I cannot see much you can do to optimize that query.
Possibly a composite index on kind, name.
Or try name, kind.
Or name only.
I think the query optimizer is smart enough for this not to be a factor, but try switching the GROUP BY to name, kind, since name is more selective.
If kind is not very selective (just 1 and 2), then you may be better off with no index on it.
I would also defragment the indexes you have.
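To make the composite-index suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, purely so that it runs anywhere; SQLite's planner behaves differently from SQL Server's, so treat the printed plan as an illustration only. The index name ix_mytable_name_kind is a hypothetical choice, and the rows are the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, kind INTEGER, date TEXT)")

# Sample rows as listed in the question (id 5 appears three times there).
rows = [
    (1, "Thomas", 1, "2015-01-01"),
    (2, "Thomas", 1, "2015-01-01"),
    (3, "Thomas", 2, "2014-01-01"),
    (4, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (5, "Kevin", 2, "2014-01-01"),
    (6, "Sasha", 1, "2014-01-01"),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Hypothetical composite index: it contains every column the query reads,
# so the engine can aggregate from the index instead of the base table.
conn.execute("CREATE INDEX ix_mytable_name_kind ON mytable (name, kind)")

query = """
    SELECT name, kind, COUNT(*) AS RecordCount
    FROM mytable
    GROUP BY name, kind
"""

# Print the access path SQLite picks (the wording varies by SQLite version).
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)

counts = {(name, kind): n for name, kind, n in conn.execute(query)}
print(counts)
```

For the date-filtered variant, an index that leads with the date column would be the analogous thing to try, but whether it helps on 50 million rows has to be measured on the real data.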

Querying the last day is no big deal, because you already have a date column on which you can put an index.
For the last week I would create a separate date table which contains one row per day, with the columns id, date, week.
You have to pre-calculate the week. Then, if you want to query a specific week, you can look it up in the date table, get its dates, and query only those dates from mytable.
You should test whether it performs better to join on the date columns or to put the date table's id column into mytable and join on id. For big tables, id might be the better choice.
To query the last hour you could add an [hour] column to mytable and query it in combination with the date.
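The date-table idea could look roughly like the sketch below. It again uses Python's sqlite3 module as a runnable stand-in for SQL Server; the table name dates, the date range and the fact rows are all made up for illustration, and the week is pre-calculated as the ISO week number (a real table would also need the year to tell weeks apart).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (id INTEGER PRIMARY KEY, date TEXT, week INTEGER)")
conn.execute("CREATE TABLE mytable (name TEXT, kind INTEGER, date TEXT)")

# Pre-calculate the ISO week number for each day of a hypothetical range.
start = date(2015, 7, 13)
for offset in range(21):
    day = start + timedelta(days=offset)
    conn.execute(
        "INSERT INTO dates (date, week) VALUES (?, ?)",
        (day.isoformat(), day.isocalendar()[1]),
    )

# Hypothetical fact rows spread over three consecutive weeks.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Thomas", 1, "2015-07-14"),
        ("Thomas", 1, "2015-07-21"),
        ("Kevin", 2, "2015-07-22"),
        ("Kevin", 2, "2015-07-22"),
        ("Sasha", 1, "2015-07-28"),
    ],
)

# Count per name and kind for one pre-calculated week by joining on the date.
week_counts = conn.execute(
    """
    SELECT m.name, m.kind, COUNT(*) AS RecordCount
    FROM mytable AS m
    JOIN dates AS d ON d.date = m.date
    WHERE d.week = ?
    GROUP BY m.name, m.kind
    ORDER BY m.name
    """,
    (30,),
).fetchall()
print(week_counts)
```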

Related

SQL Server, can SQL Server store 10 pieces of information of item in a row?

I'm wondering whether SQL Server can store 10 pieces of information about items in a single row.
I want to make a table of Date, Item_Name, Quantity,
but I want each row to hold only one date (e.g. 21 November 2014) together with several item names (such as chicken, rabbit, cow) and their quantities (2, 4, 3).
Can SQL do that?
If not, can you recommend an alternative? I want to make a daily report of which items were sold on a given day, the day before, and so on.
I hope this is understandable; my English is not very good.

You should probably do something like this:
Table Dates:
DateId Date
1 21/11/2014
2 23/11/2014
Table Items:
DateId Name Quantity
1 Chicken 2
1 Rabbit 4
1 Cow 3
2 Dinosaur 666
Dates.DateId should be the primary key and, depending on your logic, perhaps also an identity column (so that the next id is generated automatically), and Items.DateId should be a foreign key referencing Dates.DateId.
More info about normalization here.
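As a rough sketch of how the two tables above fit together for a daily report, the following uses Python's sqlite3 module as a stand-in for SQL Server. The column types and the ISO date format are assumptions; the rows are the ones from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE Dates (DateId INTEGER PRIMARY KEY, Date TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE Items (
        DateId INTEGER NOT NULL REFERENCES Dates (DateId),
        Name TEXT NOT NULL,
        Quantity INTEGER NOT NULL
    )
    """
)

conn.executemany(
    "INSERT INTO Dates VALUES (?, ?)",
    [(1, "2014-11-21"), (2, "2014-11-23")],
)
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(1, "Chicken", 2), (1, "Rabbit", 4), (1, "Cow", 3), (2, "Dinosaur", 666)],
)

# Daily report: one output row per item sold on the requested date.
report = conn.execute(
    """
    SELECT d.Date, i.Name, i.Quantity
    FROM Dates AS d
    JOIN Items AS i ON i.DateId = d.DateId
    WHERE d.Date = ?
    ORDER BY i.Name
    """,
    ("2014-11-21",),
).fetchall()
print(report)
```

Each item gets its own row, so a date can have any number of items instead of a fixed ten.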

RANKX not working when data summarized in power pivot table

I am trying to rank records in a Power Pivot table using DAX, as shown below, in a SQL Server Analysis Services tabular model.
Example details:
I have a shop sales detail in table.
e.g.
ShopNo date sales
-----------------
1 2014-11-09 120
1 2014-11-09 130
2 2014-11-10 130
2 2014-11-10 135
In the pivot table the data is analyzed by month and year.
I want to see a result like
ShopNo sales rank
-----------------
2 265 1
1 250 2
Is there any way to display this rank automatically?
Thanks

You should be able to achieve the ranking quite easily with PowerPivot using this formula:
RankShop:=RANKX(ALL(SalesTable[ShopNo]), [Sum of sales],,,Dense)
Here SalesTable is your shop sales table. Then create a pivot table, drag ShopNo onto Rows, and add the new measure (a Measure in Excel 2010; in Excel 2013 it is called a Calculated Field).
To find out more about the RANKX function, I suggest this article.
In order to hide the rank value in the Grand Total row, add a simple condition that returns a blank value for the grand total:
=IF(HASONEVALUE(SalesTable[ShopNo]), [RankShop], BLANK())
Hope this helps.
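For anyone unsure what the Dense option does, here is a small sketch in plain Python (not DAX) of the calculation the measure is expected to perform: sum sales per shop, then rank the sums in descending order without skipping rank numbers on ties. It only mirrors the idea and says nothing about how the engine evaluates RANKX internally; the rows are the ones from the question.

```python
from collections import defaultdict

# Sales rows from the question: (ShopNo, date, sales).
sales_rows = [
    (1, "2014-11-09", 120),
    (1, "2014-11-09", 130),
    (2, "2014-11-10", 130),
    (2, "2014-11-10", 135),
]

# Counterpart of the [Sum of sales] measure evaluated per shop.
totals = defaultdict(int)
for shop_no, _date, sales in sales_rows:
    totals[shop_no] += sales

# Dense ranking: equal totals share a rank and no rank numbers are skipped.
distinct_totals = sorted(set(totals.values()), reverse=True)
rank_of_total = {total: pos for pos, total in enumerate(distinct_totals, start=1)}
ranks = {shop_no: rank_of_total[total] for shop_no, total in totals.items()}

# Print ShopNo, summed sales and rank, best shop first.
for shop_no in sorted(ranks, key=ranks.get):
    print(shop_no, totals[shop_no], ranks[shop_no])
```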

Why sort on sorted non clustered index field?

Say I have a table with ID, Name, and Date.
And I have a non-clustered index like,
CREATE NONCLUSTERED INDEX IX_Test_NameDate ON [dbo].[Test] (Name, Date)
When I run the query,
select
[Name], [Date]
from
[dbo].[Test] WITH (INDEX(IX_Test_NameDate))
where
[Name] like 'A%'
order by
[Date] asc
I get in SQL Server's execution plan,
Select <-- Sort <-- Index Seek (NonClustered)
Why the sort? Isn't the date already sorted in the non-clustered index? What would a better non-clustered index look like, one that doesn't require a sort (only an index seek)?
(Can't use a clustered index as this example is a condensed version of a bigger example with multiple rows/indexes).
For example, I get the execution plan (with sort) for a table that looks like this,
ID Name Date
1 A 2014-01-01
2 A 2014-02-01
3 A 2014-03-01
4 A 2014-04-01
5 B 2014-01-01
6 B 2014-02-01
7 B 2014-03-01
8 B 2014-04-01
9 B 2014-05-01
10 B 2014-06-01
Shouldn't the dates be sorted in this case?

No, the Date column is not "already sorted in the non-clustered index", at least, not by itself. It is sorted after Name.
Consider the following trivial table data:
Name Date
----- --------
Allen 1/1/2014
Barb 1/1/2013
Charlie 1/1/2015
Darlene 1/1/2012
Ernie 1/1/2016
Faith 1/1/2011
Once you've sorted by Name, the Date values are potentially out of order. Dates are guaranteed to be in order only for rows that have the same Name.
Your goals are at cross-purposes to each other. You want multiple names--so the data is best ordered by name so that the seek is possible, but then you want to sort by Date. How would you propose storing the above six-row table so that it is sorted by Date for every possible range of names?
If there is some kind of regularity or pattern about the ranges of names (perhaps, for example, you always pull names by first letter only) then there is a possible workaround.
ALTER TABLE dbo.Test ADD NamePrefix AS (Left(Name, 1)) PERSISTED;
CREATE NONCLUSTERED INDEX IX_Test_NamePrefix_Date ON dbo.Test (NamePrefix, Date);
Now this query theoretically should not need to perform the sort:
SELECT Name, Date
FROM dbo.Test
WHERE NamePrefix = 'A'
ORDER BY Date;
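The difference between the two index orderings can be imitated with plain Python sorting. This is only an analogy for how index entries are ordered, not a model of SQL Server's storage, and the rows are hypothetical ones chosen so that several names share the first letter.

```python
# Hypothetical rows (Name, Date); three names start with "A".
rows = [
    ("Aaron", "2014-03-01"),
    ("Abby", "2014-01-01"),
    ("Adam", "2014-02-01"),
    ("Barb", "2013-01-01"),
]

# An index on (Name, Date) keeps its entries in this order.
by_name_date = sorted(rows)
dates_seek_name = [d for name, d in by_name_date if name.startswith("A")]

# An index on (NamePrefix, Date) keeps its entries in this order instead.
by_prefix_date = sorted(rows, key=lambda r: (r[0][0], r[1]))
dates_seek_prefix = [d for name, d in by_prefix_date if name[0] == "A"]

print(dates_seek_name)    # dates in (Name, Date) index order
print(dates_seek_prefix)  # dates in (NamePrefix, Date) index order
```

Reading the "A" range from the first ordering returns the dates in name order, so a sort is still needed; the second ordering returns them already sorted by date.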
Be aware that there are some likely gotchas with adding a persisted computed column like this: increased data size, the fact that such a design is almost certainly wrong in almost every case, that the proliferation of computed columns would be very bad, among others.
P.S. It is generally not best practice to force indexes manually--let the optimizer choose.

Repeat parent column in child table

I have the following three tables
Periods
--------------------------------
ID StartDate EndDate Type
--------------------------------
1 2013-01-01 2013-01-01 D
2 2013-01-02 2013-01-02 D
Attendance
---------------------------------------------------
ID PeriodID UploadedBy uploadDateTime Approved
--------------------------------------------------
1 1 25 2013-01-01-11:00 1
2 1 54 2013-01-01-10:00 1
Attendance Detail
---------------------------------------------
ID EmployeeID AttendanceTime Status AttendanceID
---------------------------------------------
1 24 2013-01-01 09:05 CheckIn 1
1 28 2013-01-01 09:08 CheckOut 2
Attendance data is filled from CSV files generated by a biometric machine. Attendance Detail may grow over time, as there are multiple check-ins and check-outs per employee per day. Attendance is approved for each period.
Question
I need attendance data on a per-period basis. I know I can achieve this through joins, but I would have to use a BETWEEN filter on AttendanceTime. I was thinking of also adding PeriodID to the Attendance Detail table to simplify queries and avoid future performance issues. Should I go for it, or is there a better solution available?

If you often need attendance details per period, so that you usually have to join all three tables even though the data in the Attendance table itself is not that important to you, then a PeriodID column in the Attendance Detail table will certainly help.
Even if you need all three tables, a WHERE condition on PeriodID will narrow down the number of rows read from Attendance Detail, so it will again help performance.
Maintaining a schema that is not fully normalized can be a bit annoying, but if it's not a big hassle and doesn't hurt your write performance, go for the PeriodID in Attendance Detail. Your selects will thank you :)
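Here is a minimal sketch of the two query shapes being compared, using Python's sqlite3 module as a stand-in for the real database. The PeriodID column on AttendanceDetail is the proposed addition, the index name ix_detail_period and the third detail row are hypothetical, and the other tables are reduced to what the example needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Periods (ID INTEGER PRIMARY KEY, StartDate TEXT, EndDate TEXT, Type TEXT)"
)
conn.execute(
    """
    CREATE TABLE AttendanceDetail (
        ID INTEGER PRIMARY KEY,
        EmployeeID INTEGER,
        AttendanceTime TEXT,
        Status TEXT,
        AttendanceID INTEGER,
        PeriodID INTEGER REFERENCES Periods (ID)  -- proposed redundant column
    )
    """
)
conn.execute("CREATE INDEX ix_detail_period ON AttendanceDetail (PeriodID)")

conn.executemany(
    "INSERT INTO Periods VALUES (?, ?, ?, ?)",
    [(1, "2013-01-01", "2013-01-01", "D"), (2, "2013-01-02", "2013-01-02", "D")],
)
conn.executemany(
    "INSERT INTO AttendanceDetail VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 24, "2013-01-01 09:05", "CheckIn", 1, 1),
        (2, 28, "2013-01-01 09:08", "CheckOut", 2, 1),
        (3, 24, "2013-01-02 09:01", "CheckIn", 3, 2),  # hypothetical extra row
    ],
)

# Without the extra column: match each detail row to a period by date range.
via_between = conn.execute(
    """
    SELECT d.ID
    FROM AttendanceDetail AS d
    JOIN Periods AS p
      ON date(d.AttendanceTime) BETWEEN p.StartDate AND p.EndDate
    WHERE p.ID = ?
    ORDER BY d.ID
    """,
    (1,),
).fetchall()

# With the extra column: a plain equality filter that an index can serve.
via_period_id = conn.execute(
    "SELECT ID FROM AttendanceDetail WHERE PeriodID = ? ORDER BY ID",
    (1,),
).fetchall()

print(via_between, via_period_id)
```

Both queries return the same rows; the second just avoids the range join, at the cost of keeping PeriodID correct on every insert.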

MS Access row number, specify an index

Is there a way in MS Access to return only the rows of a dataset between two specific positions?
So let's say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return the records between 'rank' 2 and 4, so that my result set is Bob, Sid and Billy. However, rank is not part of the table; it should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted the numbering becomes inconsistent, and what if I wanted the results in reverse?
This is obviously very simple. The reason I ask is that I am working on a product catalogue and am looking for a more efficient way of paging through the returned dataset: returning only one page worth of data from the database is obviously going to be quicker than returning a complete set of 3000 records and then having to subselect from that set.
Thanks R.

Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after the comment that rank does not exist in the table structure:
Select top 100 * from table;
And if you want the subsequent results, take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.

How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.

I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
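To show the correlated-subquery rank actually being used for paging, here is a small sketch with Python's sqlite3 module standing in for the Access engine (SQL dialect details differ, so this is not Access syntax verbatim). It keeps the guess that rank is based on name, and the alias name_rank is a hypothetical name picked to avoid any clash with a built-in rank function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("Max", 23), ("Bob", 40), ("Sid", 25), ("Billy", 18), ("Sally", 19)],
)

# Rank each row by counting the names that sort before it, then keep one page.
page = conn.execute(
    """
    SELECT first_name, age, name_rank
    FROM (
        SELECT T1.first_name, T1.age,
               (SELECT COUNT(*) + 1
                FROM MyTable AS T2
                WHERE T1.first_name > T2.first_name) AS name_rank
        FROM MyTable AS T1
    )
    WHERE name_rank BETWEEN ? AND ?
    ORDER BY name_rank
    """,
    (2, 4),
).fetchall()
print(page)
```

Because the rank here follows alphabetical order, ranks 2 to 4 are Bob, Max and Sally rather than the rows in the order shown in the question.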

Unfortunately the LIMIT keyword isn't available in MS Access; that's what MySQL uses for multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder IN
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC

If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE FirstName = 'Bob'
OR FirstName = 'Sid'
OR FirstName = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see the Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field and want to read 3 rows, and you know the ID of the first row you want to read, then you can use TOP to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can query Bob, Sid and Billy with the following query.
SELECT TOP 3 first_name, age
FROM Table
WHERE ID >= 2
ORDER BY ID
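The same idea can be turned into a small paging helper. This sketch uses Python's sqlite3 module, so it relies on SQLite's LIMIT where Access would use TOP; the table name People and the function fetch_page are hypothetical, and the rows are the ones from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [(1, "Max", 23), (2, "Bob", 40), (6, "Sid", 25), (8, "Billy", 18), (15, "Sally", 19)],
)


def fetch_page(conn, after_id, page_size):
    """Return up to page_size rows whose ID is greater than after_id, in ID order."""
    return conn.execute(
        "SELECT ID, first_name, age FROM People WHERE ID > ? ORDER BY ID LIMIT ?",
        (after_id, page_size),
    ).fetchall()


# Start after ID 1, then continue from the last ID seen on the previous page.
page = fetch_page(conn, 1, 3)
next_page = fetch_page(conn, page[-1][0], 3)
print(page)
print(next_page)
```

Gaps in the ID sequence do not matter, because each page starts from the last ID actually returned rather than from a computed offset.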