How to determine data distribution in SQL Server columns using T-SQL

Can someone show me how to write T-SQL that will let me view the distribution of data in columns?
For example, in the sample table there is a column called model. In that column, 50% of the values are Fiestas. I would like a query that will help determine the distribution of data in columns.
I have included some sample code to help:
CREATE TABLE #tmpTable
(
registration varchar(50),
make varchar(50),
model varchar(50),
engine_size float
)
INSERT INTO #tmpTable VALUES
('JjFw5a0','SKODA','OCTAVIA',1.8),
('VkfCDpZ','FORD','FIESTA',1.7),
('5E93ZEq','SKODA','OCTAVIA',1.3),
('L2PPN0m','FORD','FIESTA',1.1),
('9xKghxp','FORD','FIESTA',1.5),
('WHShdBm','FORD','FIESTA',1.4),
('TNRHyy7','NISSAN','QASHQAI',1.2),
('6RNX0XG','SKODA','OCTAVIA',1.4),
('tJ9bOD8','FORD','FIESTA',1.1),
('ablFUSC','FORD','FIESTA',1),
('4B7RLYL','MERCEDED_BENZ','E CLASS',1.3),
('tlJiwVY','FORD','FIESTA',1),
('Fb9lcvG','FORD','FIESTA',1.4),
('nW4lqBC','FORD','FIESTA',1.6),
('LggTmL5','HYUNDAI','I20',1),
('2mGgSjS','FORD','FIESTA',1.1),
('IDvOzcM','FORD','FIESTA',1.3),
('JefpXK2','FORD','FIESTA',1.5),
('0h1uWfZ','MERCEDED_BENZ','E CLASS',1.4),
('ylBoGbV','MERCEDED_BENZ','E CLASS',1.7),
('XzoILDK','VAUXHALL','CORSA',1.8),
('Xhocs1Z','FORD','FIESTA',1.5),
('Lh2yWGa','KIA','RIO',1.5),
('hM5GWA0','FORD','FIESTA',1.3),
('PbpxkFt','FORD','FIESTA',1.7),
('SDHWV2r','FORD','FIESTA',1.2),
('n83Je2D','FORD','FIESTA',1.8),
('sDN0gex','FORD','FIESTA',1.2),
('7EICOZY','KIA','RIO',1.5),
('PUuMmIH','FORD','FIESTA',1),
('HiBwSg2','FORD','FIESTA',1.8),
('1yk1vDm','KIA','RIO',1.7),
('cMpH72R','HYUNDAI','I20',1.1),
('ZgQL0gt','MERCEDED_BENZ','E CLASS',1.3),
('jhpamQG','KIA','RIO',1.1),
('pk0lU2F','VAUXHALL','CORSA',1.4),
('fDCUeq1','FORD','FIESTA',1.1),
('ono5QFC','FORD','FIESTA',1.7),
('VohWwGR','FORD','FIESTA',1.5),
('Hih8dKc','SUZUKI','SWIFT',1.2),
('D2RNn3h','SUZUKI','SWIFT',1.2),
('QaYQulE','FORD','FIESTA',1.1),
('xmQPxAG','FORD','FIESTA',1.8),
('vmTqkTO','FORD','FIESTA',1.2),
('lvUtVUA','MERCEDED_BENZ','E CLASS',1),
('SFoj00d','FORD','FIESTA',1),
('9S6wrWV','MERCEDED_BENZ','E CLASS',1),
('0SBnW0z','FORD','FIESTA',1.1),
('HnDHdfj','MERCEDED_BENZ','E CLASS',1),
('RV7q947','FORD','FIESTA',1.4),
('JZqCtTg','FORD','FIESTA',1.7),
('XVgBwgi','FORD','FIESTA',1.8),
('iqJDsIF','FORD','FIESTA',1.6),
('CMbpRFa','FORD','FIESTA',1.6),
('vF7K5Xg','SUZUKI','SWIFT',1.1),
('3j6XGDH','FORD','FIESTA',1.5),
('ommqugM','FORD','FIESTA',1.1),
('LMQkPnw','NISSAN','QASHQAI',1.4),
('1dKgcdd','FORD','FIESTA',1.5),
('hC8BxiP','MERCEDED_BENZ','E CLASS',1.1),
('wLTWol7','FORD','FIESTA',1.6),
('TY8ChYN','FORD','FIESTA',1.6),
('Gw1CpI8','FORD','FIESTA',1.4),
('L4OPAJq','FORD','FIESTA',1.1),
('6TyYpfi','NISSAN','QASHQAI',1.6),
('ozoOcGL','FORD','FIESTA',1.4),
('6IME19U','FORD','FIESTA',1.4),
('BxpmJO5','FORD','FIESTA',1.4),
('0zc2n5A','FORD','FIESTA',1.3),
('FqbBZE2','FIAT','500',1.7),
('2EkTOTz','FORD','FIESTA',1.4),
('fNBvIvg','MERCEDED_BENZ','E CLASS',1.2),
('u5j4R4S','KIA','RIO',1.4),
('zpWaUZo','FORD','FIESTA',1.1),
('FQPVQYc','NISSAN','QASHQAI',1.7),
('8RBQADq','KIA','RIO',1.7),
('TOz2bcT','HYUNDAI','I20',1.7),
('jebhCex','FORD','FIESTA',1.3),
('cdHA1gL','FORD','FIESTA',1.2),
('FoaN4AT','FORD','FIESTA',1.7),
('atGn288','FORD','FIESTA',1.5),
('es8VNdW','FIAT','500',1.3),
('hDWoMXa','KIA','RIO',1.4),
('Q9C6Br1','KIA','RIO',1.5),
('mFSy4aF','FORD','FIESTA',1.6),
('bbbKnrM','SKODA','OCTAVIA',1.5),
('qY7lz6I','FORD','FIESTA',1),
('8Ch2OeU','VAUXHALL','CORSA',1.3),
('dcWsjJv','VAUXHALL','CORSA',1.3),
('bnnoBPg','SKODA','OCTAVIA',1.8),
('mvDyYkK','FORD','FIESTA',1.4),
('KpWDYap','FORD','FIESTA',1.3),
('7EK9K4z','FORD','FIESTA',1.3),
('ZPLHtlP','FORD','FIESTA',1.6),
('4EpYeSB','FORD','FIESTA',1.6),
('O1eZ20M','FORD','FIESTA',1),
('WfVntKk','FORD','FIESTA',1.7),
('6VlkBdi','FORD','FIESTA',1.1),
('hFQfKjk','KIA','RIO',1.4),
('3Y4njNP','KIA','RIO',1),
('3UuNqG0','FORD','FIESTA',1.7),
('qpvMYAu','FORD','FIESTA',1.1),
('NCYJUqx','FORD','FIESTA',1.3),
('M0AvWzg','FORD','FIESTA',1.6),
('XbVmtFf','FORD','FIESTA',1.3),
('l8qZy0H','SKODA','OCTAVIA',1.3),
('EDUbxaU','MERCEDED_BENZ','E CLASS',1.6),
('nWLd82o','FORD','FIESTA',1.7),
('4AkoyWx','FORD','FIESTA',1),
('nOoO25v','FORD','FIESTA',1.3),
('VAm5aV8','NISSAN','QASHQAI',1.4),
('zbd3cie','FORD','FIESTA',1.5),
('hyAN71W','NISSAN','QASHQAI',1),
('FxACHDf','FIAT','500',1.7),
('wOZdaeV','FORD','FIESTA',1.6),
('gfxZl99','VAUXHALL','CORSA',1.1),
('06HhwEJ','SKODA','OCTAVIA',1.7),
('PCTgYiG','KIA','RIO',1.7),
('U54WXZQ','KIA','RIO',1.6),
('FHgrRiF','FORD','FIESTA',1.6),
('R3jP73p','SKODA','OCTAVIA',1.5),
('etVPKX9','SUZUKI','SWIFT',1.1),
('BE3yReB','FORD','FIESTA',1.7),
('zXmX878','FORD','FIESTA',1.6),
('wdM3P2m','FORD','FIESTA',1.7),
('tb727BM','FORD','FIESTA',1.1)
SELECT * FROM #tmpTable

You can apply a Windowed Aggregate to get the overall count:
SELECT make
, model
, count(*) as cnt -- count per Model
, cast(count(*) * 100.0 -- compared to all counts
/ sum(count(*))
over () as dec(5,2)) as distribution
FROM #tmptable
group by make
, model
order by distribution desc;
If you want the percentage of the Model for each Make you need to add PARTITION BY:
SELECT make
, model
, count(*) as cnt -- count per Model
, cast(count(*) * 100.0
/ sum(count(*)) -- compared to all counts per Make
over (partition by Make) as dec(5,2)) as distribution
FROM #tmptable
group by make
, model
order by make, distribution desc;

You can use conditional aggregation to get the ratio of the count of Ford Fiestas and the total count.
SELECT 100.0
* count(CASE
WHEN make = 'FORD'
AND model = 'FIESTA' THEN
1
END)
/ count(*)
FROM #tmptable;
Edit:
If you want the figures for all car models you can simply aggregate and group to get the count for each car model and divide that by the total count which you can get via a subquery.
SELECT make,
model,
100.0
* count(*)
/ (SELECT count(*)
FROM #tmptable)
FROM #tmptable
GROUP BY make,
model;
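The same pattern applies to any column, not only make and model. As a minimal sketch, assuming the sample #tmpTable from the question still exists in the session, the query below profiles engine_size instead. The aliases cnt and distribution are illustrative names only.

```sql
-- Distribution of the values in the engine_size column of the sample table.
-- COUNT(*) is the count per engine size; SUM(COUNT(*)) OVER () is the total row count.
SELECT engine_size,
       COUNT(*) AS cnt,
       CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS decimal(5,2)) AS distribution
FROM #tmpTable
GROUP BY engine_size
ORDER BY cnt DESC, engine_size;
```

Swapping engine_size for another column name gives the distribution of that column in the same way.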

Related

T-SQL Max Weight Query

I have a table with these columns:
BatchNumber, BagNumber, BagWeight, CumulativeWeight
Each batch can have up to 30 bags and the other columns are self-explanatory.
What I need is a query which finds the maximum cumulative weight for each batch, here is what I have so far.
DECLARE @HighestBagNumber INT;
DECLARE @BatchNumber CHAR(8);
SET @BatchNumber = 37708;
SELECT @HighestBagNumber = MAX(BagNumber)
FROM FSD3BagLog
WHERE BatchNumber = @BatchNumber
SELECT BatchNumber, BagNumber, CumulativeWeight
FROM FSD3BagLog
WHERE BagNumber = @HighestBagNumber
AND BatchNumber = @BatchNumber
This works for one batch at a time, but I need it to look at all batches in the table. As you might be able to tell, I am a total beginner, so please be as critical as you want; it's all good.

Yes, GROUP BY seems right. With the proper storage model it should be:
SELECT BatchNumber, COUNT(BagNumber), SUM(BagWeight)
FROM FSD3BagLog
GROUP BY BatchNumber
Result: 100, 30, 600
(Where batch number = 100, there are 30 bags in the batch, and the weight of each bag = 20)
But based on your current working query, it looks like you are denormalizing the data and storing the cumulative weight as you go, probably using triggers or some other code that fires when the table is updated.
So if cumulative weight represents the total weight of a given batch, you can get rid of it and use the query above.
If cumulative weight is something else, such as the total of all bags up to a certain point in time, you can still get rid of it. In that case you would simply do something like:
SELECT BatchNumber, SUM(BagWeight) AS CumulativeWeight
FROM FSD3BagLog
WHERE ModifiedDate <= '2018-08-11 06:18:00'
GROUP BY BatchNumber
Provided you are storing ModifiedDate as a column on your table, this will give you the cumulative weight of all bags in each batch up to 6:18 AM on 2018-08-11.

Simple GROUP BY should do the job:
SELECT BatchNumber, MAX(CumulativeWeight)
FROM my_table
GROUP BY BatchNumber

with batches_ranked as
(
select BatchNumber, BagNumber,
CumulativeWeight = sum(BagWeight) over (partition by BatchNumber order by BagNumber),
[Rank] = row_number() over (partition by BatchNumber order by BagNumber desc)
from FSD3BagLog
)
select * from batches_ranked where [Rank] = 1
It sounds like you have CumulativeWeight stored in the table. If it always increases with BagNumber, you can simplify the query to just:
select BatchNumber, max(BagNumber), max(CumulativeWeight)
from FSD3BagLog group by BatchNumber
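To check the GROUP BY approach without touching the real table, here is a small self-contained sketch. The table variable @BagLog, its column types, and the three sample rows are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical stand-in for FSD3BagLog with two batches.
DECLARE @BagLog TABLE
(
    BatchNumber      char(8),
    BagNumber        int,
    BagWeight        decimal(9,2),
    CumulativeWeight decimal(9,2)
);

INSERT INTO @BagLog (BatchNumber, BagNumber, BagWeight, CumulativeWeight)
VALUES ('37708', 1, 20, 20),
       ('37708', 2, 20, 40),
       ('37709', 1, 25, 25);

-- One row per batch: the last bag number and the highest cumulative weight.
SELECT BatchNumber,
       MAX(BagNumber)        AS LastBagNumber,
       MAX(CumulativeWeight) AS MaxCumulativeWeight
FROM @BagLog
GROUP BY BatchNumber;
```

With these rows, batch 37708 returns bag 2 with 40.00 and batch 37709 returns bag 1 with 25.00. This relies on CumulativeWeight never decreasing within a batch.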

Get random data from SQL Server without performance impact

I need to select random rows from my SQL table. When I searched for this on Google, the suggestion was to ORDER BY NEWID(), but that reduces performance. Since my table has more than 2,000,000 rows of data, this solution does not suit me.
I tried this code to get random data :
SELECT TOP 10 *
FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS INT)) % 100) < 10
It also hurts performance sometimes.
Could you please suggest a good solution for getting random data from my table? I only need a small number of rows from the table, about 30 rows per request. I tried TABLESAMPLE to get the data, but it returns nothing once I add my WHERE condition, because it samples by page rather than by row.

Try calculating the random ids before filtering your big table.
Since your key is not an identity column, you need to number the records, and this will affect performance.
Note that I have used a DISTINCT clause to be sure of getting different numbers.
EDIT: I have modified the query to use an arbitrary filter on your big table.
declare @n int = 30
;with
t as (
-- EXTRACT DATA AND NUMBER ROWS
select *, ROW_NUMBER() over (order by YourPrimaryKey) n
from YourBigTable t
-- SOME FILTER
WHERE 1=1 /* <-- PUT HERE YOUR COMPLEX FILTER LOGIC */
),
r as (
-- RANDOM NUMBERS BETWEEN 1 AND COUNT(*) OF FILTERED TABLE
select distinct top (@n) abs(CHECKSUM(NEWID()) % n)+1 rnd
from sysobjects s
cross join (SELECT MAX(n) n FROM t) t
)
select t.*
from t
join r on r.rnd = t.n

If your uniqueidentifier key is a random GUID (not generated with NEWSEQUENTIALID() or UuidCreateSequential), you can use the method below. This will use the clustered primary key index without sorting all rows.
SELECT t1.*
FROM (VALUES(
NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())
,(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())
,(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())) AS ThirtyKeys(ID)
CROSS APPLY(SELECT TOP (1) * FROM dbo.Table1 WHERE ID >= ThirtyKeys.ID) AS t1;

how to get certain sql results

I'm looking to get certain SQL results from a query depending on where they are positioned. For example, consider this code:
SELECT * FROM Product ORDER BY id asc
which could return at least 100 or so results.
The question is, how can I get the first 1 - 10 results of that, and then, in a separate query, how can I get the results that are 11 - 20, or even the results that are positioned 51 - 60 of that query?

Use a CTE to get the row number and then query by the row column
with your_query as(
SELECT ROW_NUMBER() OVER(ORDER BY ID ASC) AS Row, *
FROM Product
)
select * from your_query
where Row >=5 and Row<=10

There are a number of ways, here's one approach using ROW_NUMBER:
DECLARE @StartRow INTEGER = 11
DECLARE @EndRow INTEGER = 20
;WITH Data AS
(
SELECT TOP(@EndRow) ROW_NUMBER() OVER (ORDER BY id) AS RowNo, *
FROM Product
ORDER BY id -- makes TOP pick the first @EndRow rows by id
)
SELECT *
FROM Data
WHERE RowNo BETWEEN @StartRow AND @EndRow
ORDER BY Id
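On SQL Server 2012 and later, the same paging can also be written with OFFSET ... FETCH on the ORDER BY clause, with no row-number column at all. A minimal sketch, assuming the Product table from the question; the variables @PageNumber and @PageSize are illustrative names.

```sql
-- Page 2 with 10 rows per page, i.e. rows 11 - 20 when ordered by id.
DECLARE @PageNumber int = 2;
DECLARE @PageSize   int = 10;

SELECT *
FROM Product
ORDER BY id
OFFSET (@PageNumber - 1) * @PageSize ROWS   -- skip the rows of the earlier pages
FETCH NEXT @PageSize ROWS ONLY;             -- return one page
```

Setting @PageNumber to 6 returns the rows positioned 51 - 60.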

Any way to get a value similar to @@ROWCOUNT when TOP is used?

If I have a SQL statement such as:
SELECT TOP 5
*
FROM Person
WHERE Name LIKE 'Sm%'
ORDER BY ID DESC
PRINT @@ROWCOUNT
-- shows '5'
Is there any way to get a value like @@ROWCOUNT that is the actual count of all of the rows that match the query, without re-issuing the query sans the TOP 5?
The actual problem is a much more complex and intensive query that performs beautifully since we can use TOP n or SET ROWCOUNT n, but then we cannot get the total count that is required to display paging information in the UI correctly. Presently we have to re-issue the query with a @Count = COUNT(ID) instead of *.

Whilst this doesn't exactly meet your requirement (in that the total count isn't returned as a variable), it can be done in a single statement:
;WITH rowCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (ORDER BY ID DESC) AS rn1
,ROW_NUMBER() OVER (ORDER BY ID ASC) AS rn2
FROM Person
WHERE Name LIKE 'Sm%'
)
SELECT *
,(rn1 + rn2) - 1 as totalCount
FROM rowCTE
WHERE rn1 <=5
The totalCount column will have the total number of rows matching the where filter.
It would be interesting to see how this stacks up performance-wise against two queries on a decent-sized data-set.

You'll have to run another COUNT() query:
SELECT TOP 5
*
FROM Person
WHERE Name LIKE 'Sm%'
ORDER BY ID DESC
DECLARE @r int
SELECT
@r=COUNT(*)
FROM Person
WHERE Name LIKE 'Sm%'
select @r

Something like this may do it:
SELECT TOP 5
*
FROM Person
cross join (select count(*) HowMany
from Person
WHERE Name LIKE 'Sm%') tot
WHERE Name LIKE 'Sm%'
ORDER BY ID DESC
The subquery returns one row with one column containing the full count; the cross join includes it with all rows returned by the "main" query; and "SELECT *" would include the new column HowMany.
Depending on your needs, the next step might be to filter out that column from your return set. One way would be to load the data from the query into a temp table, and then return just the desired columns, and get rowcount from the HowMany column from any row.
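Another option in the same spirit is a windowed COUNT(*) with an empty OVER () clause, which attaches the full match count to every returned row. A minimal sketch, assuming the Person table from the question; the alias TotalCount is an illustrative name. The window function is evaluated over all rows that pass the WHERE clause, before TOP trims the result.

```sql
-- Every returned row carries the number of rows matching the filter,
-- even though only the top 5 are returned.
SELECT TOP 5
       *,
       COUNT(*) OVER () AS TotalCount
FROM Person
WHERE Name LIKE 'Sm%'
ORDER BY ID DESC;
```

The server still has to visit every matching row to produce the count, so this saves a round trip rather than the counting work itself; it is worth measuring against the two-query version on realistic data.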

How do I select last 5 rows in a table without sorting?

I want to select the last 5 records from a table in SQL Server without arranging the table in ascending or descending order.

This is just about the most bizarre query I've ever written, but I'm pretty sure it gets the "last 5" rows from a table without ordering:
select *
from issues
where issueid not in (
select top (
(select count(*) from issues) - 5
) issueid
from issues
)
Note that this makes use of SQL Server 2005's ability to pass a value into the "top" clause - it doesn't work on SQL Server 2000.

Suppose you have an index on id and the id values are consecutive with no gaps (otherwise fewer than 5 rows come back); then this will be lightning fast:
SELECT * FROM [MyTable] WHERE [id] > (SELECT MAX([id]) - 5 FROM [MyTable])

The way your question is phrased makes it sound like you think you have to physically re-sort the data in the table in order to get it back in the order you want. This is not the case; the ORDER BY clause exists for this purpose. The physical order in which the records are stored remains unchanged when using ORDER BY. The records are sorted in memory (or in temporary disk space) before they are returned.
Note that the order in which records are returned is not guaranteed without an ORDER BY clause. So, while any of the suggestions here may work, there is no reason to think they will continue to work, nor can you prove that they work in all cases with your current database. This is by design. I am assuming it is to give the database engine the freedom to do as it will with the records in order to obtain the best performance when no explicit order is specified.
Assuming you wanted the last 5 records sorted by the field Name in ascending order, you could do something like this, which should work in either SQL 2000 or 2005:
select Name
from (
select top 5 Name
from MyTable
order by Name desc
) a
order by Name asc

First count the number of rows in the table (say we have 12 rows),
then subtract 5 from that count (we are now at 7),
and select the rows whose index column is greater than 7:
select * from users
where user_id >
( (select COUNT(*) from users) - 5)
You can then order them ASC or DESC.
By contrast, with this code
select TOP 5 * from users order by user_id DESC
the rows come back in descending order and are not as easy to reorder.

select * from table limit 5 offset (select count(*) from table) - 5;

Without an order, this is impossible. What defines the "bottom"? The following will select 5 rows according to how they are stored in the database.
SELECT TOP 5 * FROM [TableName]

Well, the "last five rows" are actually the last five rows according to your clustered index. Your clustered index, by definition, is the way that the rows are ordered. So you really can't get the "last five rows" without some order. You can, however, get the last five rows as it pertains to the clustered index.
SELECT TOP 5 * FROM MyTable
ORDER BY MyCLusteredIndexColumn1 DESC, MyCLusteredIndexColumn2 DESC, ..., MyCLusteredIndexColumnN DESC

To get the last 5 records by identity value, you can use this:
SELECT *
FROM TableName
WHERE ID <= IDENT_CURRENT('TableName')
AND ID > IDENT_CURRENT('TableName') - 5

If you know how many rows there will be in total you can use the ROW_NUMBER() function.
Here's an example from MSDN (http://msdn.microsoft.com/en-us/library/ms186734.aspx)
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;

In SQL Server 2012 you can do this:
DECLARE @Count int;
SELECT @Count = COUNT(*)
FROM [Log] AS L;
SELECT
*
FROM [Log] AS L
ORDER BY L.id
OFFSET @Count - 5 ROWS
FETCH NEXT 5 ROWS ONLY;

Try this if you don't have a primary key or identity column:
select [Stu_Id],[Student_Name] ,[City] ,[Registered],
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from student
ORDER BY RowNum desc

You can retrieve them from memory.
So first you get the rows in a DataSet, and then get the last 5 out of the DataSet.

There is a handy trick that works in some databases for ordering in database order,
SELECT * FROM TableName ORDER BY true
Apparently, this can work in conjunction with any of the other suggestions posted here to leave the results in "order they came out of the database" order, which in some databases, is the order they were last modified in.

select *
from table
order by empno desc -- empno is the primary key
fetch first 5 rows only

To retrieve the last 5 rows in MySQL,
this query works perfectly:
SELECT * FROM (SELECT * FROM recharge ORDER BY sno DESC LIMIT 5)sub ORDER BY sno ASC
or
select sno from(select sno from recharge order by sno desc limit 5) as t where t.sno order by t.sno asc

When the number of rows in the table is less than 5, the answers of Matt Hamilton and msuvajac are incorrect,
because a TOP N rowcount value may not be negative.
A great example can be found Here.

I am using this code:
select * from tweets where placeID = '$placeID' and id > (
(select count(*) from tweets where placeID = '$placeID')-2)

In SQL Server, it does not seem possible without using ordering in the query.
This is what I have used.
SELECT *
FROM
(
SELECT TOP 5 *
FROM [MyTable]
ORDER BY Id DESC /*Primary Key*/
) AS T
ORDER BY T.Id ASC; /*Primary Key*/

DECLARE @MYVAR NVARCHAR(100)
DECLARE @step int
SET @step = 0;
DECLARE MYTESTCURSOR CURSOR
DYNAMIC
FOR
SELECT col FROM [dbo].[table]
OPEN MYTESTCURSOR
FETCH LAST FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
WHILE @step < 10
BEGIN
FETCH PRIOR FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
SET @step = @step + 1;
END
CLOSE MYTESTCURSOR
DEALLOCATE MYTESTCURSOR

Thanks to @Apps Tawale. Based on his answer, here's my slightly different version.
To select the last 5 records without an identity column:
select top 5 *,
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
Admittedly, it still has an ORDER BY, but on RowNum :)
Note(1): The above query reverses the order of the rows compared with running the main select query.
So to maintain the order, we can adjust it slightly:
select *, RowNum2 = row_number() OVER (ORDER BY (SELECT 0))
from (
select top 5 *, RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
) as t1
order by RowNum2 desc
Note(2): Without an identity column, the query takes a while on large data sets.

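Get the count of the table, then exclude everything except the last 5 rows by primary key (MyTable and pk_column are placeholder names):
declare @count int = (select count(*) from MyTable)
select * from MyTable where pk_column NOT IN (select top (@count - 5) pk_column from MyTable)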

If you do not want to arrange the table in ascending or descending order, use this:
select * from table limit 5 offset (select count(*) from table) - 5;
