I am just curious about something I've never come across in SQL Server before.
This query:
SELECT N FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(N)
gives me result:
+---+
| N |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
What is the rule here? Obviously this is aligning all the values into one column. Is it SQL Server's grammar that defines this with T(N)?
On the other side, this query gives results by separate columns:
select 0,1,2,3,4,5,6,7,8,9
I just don't understand why the results from the first query are all aligned into one column.
The VALUES clause is similar to what you can use in an INSERT statement, and it's called a Table Value Constructor. Your example has only one column and several rows, but you can also have multiple columns separated by commas. T(N) defines the alias name for the table (T) and the name for the column (N).
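For example, a two-column constructor looks like this (the column names id and name here are just illustrative, not from your query):
SELECT id, name
FROM (VALUES (1, 'one'),
             (2, 'two'),
             (3, 'three')) AS T(id, name);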
James Z is right on the money, but to expand on what it does in the answer you were referencing:
In the code that is pulled from, that section is used to start a numbers table for a stacked CTE. The numbers themselves don't matter, but I like them like that. They could all be 1, or 0; it would not change how it is used in this instance.
Basically we have 10 rows, and then we cross join the set to itself several times to increase the row count until we have as many rows as we need, or more. In the cross joins I alias n according to the resulting number of rows: deka is 10, hecto is 100, kilo is 1,000, et cetera.
Here is a similar query outside of the function that you were referencing:
declare @fromdate date = '20000101';
declare @years int = 30;

;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
    select top (datediff(day, @fromdate, dateadd(year, @years, @fromdate)))
        [Date] = convert(date, dateadd(day, row_number() over(order by (select 1)) - 1, @fromdate))
    from n as deka cross join n as hecto cross join n as kilo
        cross join n as tenK cross join n as hundredK
    order by [Date]
)
select [Date]
from dates;
The stacked CTE is very efficient for generating or simulating a numbers or dates table, though using an actual numbers or calendar table will perform better as the scale increases.
Check these out for related benchmarks:
Generate a set or sequence without loops - 1 - Aaron Bertrand
Generate a set or sequence without loops - 2 - Aaron Bertrand
Generate a set or sequence without loops - 3 - Aaron Bertrand
In his articles, Aaron Bertrand creates a stacked CTE using
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b),
....
I have a problem that I am trying to solve with T-SQL, but I can't figure it out by myself.
I have a simple query:
select StartDate, EndDate
from ProductTable
where Site = 'X' and Product_ID = '1'
The result can look like this (there can be one or more rows with start and end dates):
StartDate     EndDate
2019-06-01    2019-09-30
2019-12-01    2020-04-30
2020-11-30    2020-12-31
What I want to do is, for each row in this resultset, create a list of months between the dates in the format "yyyymm", and then union the results of these lists into one resultset.
So for the 3 rows in the first resultset the first step should give:
ROW 1: 201906, 201907, 201908, 201909
ROW 2: 201912, 202001, 202002, 202003, 202004
ROW 3: 202011, 202012
And the final expected result is then of course:
Months
201906
201907
201908
201909
201912
202001
202002
202003
202004
202011
202012
I have experimented a bit with CTEs and cursors but I haven't really had any success yet.
Can someone help me out? :-)
If you have a "Tally" or "Nums" function, this becomes child's play.
SELECT Months = CONVERT(CHAR(6),DATEADD(mm,t.N,StartDate),112)
FROM dbo.ProductTable
CROSS APPLY dbo.fnTally(0,DATEDIFF(mm,StartDate,EndDate))t
WHERE Site = 'X'
AND Product_ID = '1'
ORDER BY Site,Product_ID,Months --Just in case we expand on this later.
;
It also consumes 1 read instead of the 64 that the rCTE method does, and the rCTE is also slower than a While loop.
I know a lot of people don't care about that kind of performance for such small sets of data but that's also how they end up with a slow server due to "Death by a Thousand Cuts".
You can search the web for such a function but I can save you some time by posting the link to the one I use. I know the author. :D
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
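If you just want something quick to play with before grabbing the real thing from that link, here is a minimal stand-in of my own (a rough sketch, not the linked fnTally): an inline table-valued function that returns integers from @ZeroOrOne through @MaxN in a column named N, capped at a million rows, which is enough to run the query above.
CREATE FUNCTION dbo.fnTally (@ZeroOrOne bigint, @MaxN bigint)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
    WITH e1(n) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)), -- 10 rows
         e3(n) AS (SELECT 1 FROM e1 a CROSS JOIN e1 b CROSS JOIN e1 c),                 -- 1,000 rows
         e6(n) AS (SELECT 1 FROM e3 a CROSS JOIN e3 b)                                  -- 1,000,000 rows
    SELECT TOP (@MaxN - @ZeroOrOne + 1)
           N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 + @ZeroOrOne
    FROM e6;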
A recursive CTE works well to unfold date ranges.
;WITH RCTE_DATES AS (
SELECT
DATEADD(month, -1, DATEADD(day, 1, EOMONTH(StartDate))) AS StartDate
, DATEADD(month, -1, DATEADD(day, 1, EOMONTH(EndDate))) AS EndDate
FROM ProductTable
WHERE Site = 'X' AND Product_ID = '1'
UNION ALL
SELECT DATEADD(month, 1, StartDate), EndDate
FROM RCTE_DATES
WHERE StartDate < EndDate
)
, CTE_YEARMONTHS AS (
SELECT DISTINCT
YEAR(StartDate)*100+MONTH(StartDate) AS YearMonth
FROM RCTE_DATES
)
SELECT *
FROM CTE_YEARMONTHS
ORDER BY YearMonth;
| YearMonth |
| --------: |
| 201906 |
| 201907 |
| 201908 |
| 201909 |
| 201912 |
| 202001 |
| 202002 |
| 202003 |
| 202004 |
| 202011 |
| 202012 |
Test on db<>fiddle here
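One caveat with the recursive approach: a recursive CTE stops after 100 recursion levels by default, so a single range spanning more than about 100 months would raise an error. If that can happen in your data, add a MAXRECURSION hint at the end of the statement, for example by changing the final SELECT to:
SELECT *
FROM CTE_YEARMONTHS
ORDER BY YearMonth
OPTION (MAXRECURSION 0); -- 0 removes the 100-level default limit; or pick a sensible cap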
If you have a numbers table that contains tinyints to represent months (and assuming start dates can't be more than 20 years apart), this allocates a single data page:
CREATE TABLE dbo.Months
(
m tinyint NOT NULL PRIMARY KEY
);
INSERT dbo.Months SELECT TOP (256) ROW_NUMBER() OVER
(ORDER BY [object_id])-1 FROM sys.all_objects
ORDER BY [object_id];
(You may already have a numbers table of some kind, which is always a good idea to have around - more background here and here.)
Now you can use CROSS APPLY:
SELECT m = CONVERT(char(6), DATEADD(MONTH, m.m, p.StartDate), 112)
FROM dbo.ProductTable AS p
CROSS APPLY (SELECT m FROM dbo.Months
WHERE m <= DATEDIFF(MONTH, p.StartDate, p.EndDate)) AS m
WHERE Product_ID = '1' AND Site = 'X';
Or a simple join:
SELECT m = CONVERT(char(6), DATEADD(MONTH, m.m, p.StartDate), 112)
FROM dbo.ProductTable AS p
INNER JOIN dbo.Months AS m
ON m.m <= DATEDIFF(MONTH, p.StartDate, p.EndDate)
WHERE p.Product_ID = '1' AND p.Site = 'X';
I'll leave the performance tuning and analysis to you but I'll just share what I observed comparing the initial RCTE, Jeff's fnTally, and the above:
Yes, fnTally has fewer total reads, but higher estimates (30% higher, which could affect memory grants at scale), and a higher compile cost. Which of those is more important to you depends on your workload, the size of the product table, the skew in (a) matching rows and (b) max datediff, and your hardware.
Now, you can do this without a recursive CTE, a helper table, or a helper function, but it leads to higher reads:
;WITH m(m) AS
(
SELECT TOP (256) m = ROW_NUMBER() OVER (ORDER BY [object_id])-1
FROM sys.all_objects ORDER BY [object_id]
)
SELECT CONVERT(char(6), DATEADD(MONTH, m.m, p.StartDate), 112)
FROM dbo.ProductTable AS p
INNER JOIN m ON m.m <= DATEDIFF(MONTH, p.StartDate, p.EndDate)
WHERE Product_ID = '1' AND Site = 'X';
You can squeeze those read numbers down by reducing 256 in the TOP clause if you know your datediffs can be < 256 months, but it's hard to get faster than 0. Here are the comparison results:
Example db<>fiddle
And finally another good resource for number generator functions is this series from Itzik Ben-Gan (work backward and read all the comments).
I'm quite new to SQL but I now use it a lot in my work (Microsoft SQL Server).
So the issue is this: I collect data that sometimes has atypical values in a certain column.
Let's say I have different burgers and each should have a standardized calories value. I produced this overview with a query:
+---------+----------+------------+------+
| Burger  | calories | numBurgers | Rank |
+---------+----------+------------+------+
| Chicken | 600      | 20         | 1    |
| Chicken | 400      | 3          | 2    |
| Beef    | 700      | 35         | 1    |
| Beef    | 850      | 4          | 2    |
+---------+----------+------------+------+
To get a list of all the "wrong" burgers I use a CTE and filter out Rank = 1:
USE database;
GO
WITH GapRanking AS
(
SELECT TOP 100 PERCENT Burger, calories, COUNT(calories) AS numBurgers,
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
FROM BaseTable   -- source table, implied by the join further down
GROUP BY Burger, calories
)
SELECT * FROM GapRanking
WHERE Rank <> 1
...
This gives me all combinations of Burger and calories that are not "standard".
Then I do an inner join with the original table to get all of its columns:
SELECT * FROM BaseTable AS base
INNER JOIN
(SELECT * FROM GapRanking
WHERE Rank <> 1) AS err
ON (base.Burger = err.Burger
AND base.calories = err.calories)
This way I get a table with complete information about the "non-standard" burgers. So far so good.
Now I want to add other rows where there is a deviation in another criterion, price for example, not just calories, and add them to the list if they are not already there.
So I thought of UNION or JOIN.
So what is the best approach? UNION the above query with the same query on a different column (price instead of calories)?
Or do a JOIN with the same query on a different column (price instead of calories)?
The code gets quite "ugly" and I'm not sure I'm taking the right approach here.
Also, because I'm using the WITH-based temporary result, a UNION does not seem so easy.
I'd be really glad for any ideas here. Cheers
Use sub-queries and a join. The code below is just pseudo-code, not an actual working query, but you can follow this approach:
select t1.*, t2.required_column    -- pick whatever columns you need from the second set
from
    (SELECT Burger, calories, COUNT(calories) AS cnt,
            ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
     FROM BaseTable
     GROUP BY Burger, calories) as t1
join
    -- the second sub-query is the same ranking applied to the other column (price instead of calories)
    (SELECT Burger, calories, COUNT(calories) AS cnt,
            ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
     FROM BaseTable
     GROUP BY Burger, calories) as t2
    on t1.Burger = t2.Burger
where t1.Rank != 1 and t2.Rank != 1
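If you would rather go the UNION route you mentioned, here is a rough sketch of how that could look. I'm assuming the same BaseTable and guessing at a price column from your description, so adjust the names to your schema:
;WITH CalorieRank AS (
    SELECT Burger, calories,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS Rnk
    FROM BaseTable
    GROUP BY Burger, calories
), PriceRank AS (
    SELECT Burger, price,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS Rnk
    FROM BaseTable
    GROUP BY Burger, price
)
SELECT base.*
FROM BaseTable AS base
INNER JOIN CalorieRank AS c ON c.Burger = base.Burger AND c.calories = base.calories
WHERE c.Rnk <> 1
UNION   -- UNION (not UNION ALL) so a burger flagged by both criteria appears only once
SELECT base.*
FROM BaseTable AS base
INNER JOIN PriceRank AS p ON p.Burger = base.Burger AND p.price = base.price
WHERE p.Rnk <> 1;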
How can I get a running sum in a third column, where each row's sum is the current value added to the previous row's sum?
For example: for id 1 the sum is 10, for id 2 the sum is 10 + 50 = 60,
for id 3 it is 60 + 100 = 160, and so on.
With a CTE it works fine for me, but I need to do it without the ;WITH CTE, i.e. through plain code.
My current CTE-based example is shown below:
DECLARE @t TABLE(ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO @t
VALUES (10,'1'), (50,'2'), (100,'3'), (5,'4'), (45,'5');
;WITH cte AS
(
SELECT ColumnB, SUM(ColumnA) asum
FROM @t
GROUP BY ColumnB
), cteRanked AS
(
SELECT asum, ColumnB, ROW_NUMBER() OVER(ORDER BY ColumnB) rownum
FROM cte
)
SELECT
(SELECT SUM(asum)
FROM cteRanked c2
WHERE c2.rownum <= c1.rownum) AS ColumnA,
ColumnB
FROM
cteRanked c1;
One option, which doesn't require explicit analytic functions, would be to use a correlated subquery to calculate the running total:
SELECT
t1.ID,
t1.Currency,
(SELECT SUM(t2.Currency) FROM yourTable t2 WHERE t2.ID <= t1.ID) AS Sum
FROM yourTable t1
Demo here: Rextester
It looks like you need a simple running total.
There is an easy and efficient way to calculate a running total in SQL Server 2012 and later. You can use SUM(...) OVER (ORDER BY ...), as in the example below:
Sample data
DECLARE @t TABLE(ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO @t
VALUES (10,'1'), (50,'2'), (100,'3'), (5,'4'), (45,'5');
Query
SELECT
ColumnB
,ColumnA
,SUM(ColumnA) OVER (ORDER BY ColumnB
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumColumnA
FROM @t
ORDER BY ColumnB;
Result
+---------+---------+------------+
| ColumnB | ColumnA | SumColumnA |
+---------+---------+------------+
| 1 | 10 | 10 |
| 2 | 50 | 60 |
| 3 | 100 | 160 |
| 4 | 5 | 165 |
| 5 | 45 | 210 |
+---------+---------+------------+
For SQL Server 2008 and below you need to use either correlated sub-queries as you do already or a simple cursor, which may be faster if the table is large.
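For completeness, here is a rough sketch of that cursor approach against the same @t sample data from above (my own illustration, not a tuned implementation):
DECLARE @ColumnA int, @ColumnB varchar(50), @RunningTotal int = 0;
DECLARE @Results TABLE (ColumnB varchar(50), ColumnA int, SumColumnA int);

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ColumnA, ColumnB FROM @t ORDER BY ColumnB;

OPEN cur;
FETCH NEXT FROM cur INTO @ColumnA, @ColumnB;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal += @ColumnA;                  -- accumulate row by row
    INSERT @Results (ColumnB, ColumnA, SumColumnA)
    VALUES (@ColumnB, @ColumnA, @RunningTotal);
    FETCH NEXT FROM cur INTO @ColumnA, @ColumnB;
END
CLOSE cur;
DEALLOCATE cur;

SELECT ColumnB, ColumnA, SumColumnA FROM @Results ORDER BY ColumnB;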
I have a table where I want to select the maximum of a column, but only when the difference between subsequent dates is small (let's say 3 days or less). When two subsequent dates are very close, the data are likely spurious and I want to keep the highest state when that happens.
My data looks similar to this
DECLARE @TestingResults TABLE (
IDNumber varchar(100),
DateSeen date,
[state] int)
INSERT INTO @TestingResults VALUES
('A','2015-04-21',2),
('A','2015-05-08',2),
('A','2015-07-01',3),
('B','2014-06-18',100), -- this is the one I want
('B','2014-06-19',2),
('B','2014-07-31',2),
('B','2014-08-11',3),
('B','2014-09-24',3),
('B','2014-10-24',3),
('B','2014-11-24',3),
('B','2014-12-15',3),
('B','2015-01-12',3),
('B','2015-01-13',400), -- this is the one I want
('B','2015-04-06',10), -- either will do
('B','2015-04-07',10),
('B','2015-07-06',3), -- either will do
('B','2015-07-07',3),
('B','2015-10-12',3),
('C','2012-02-20',3),
('C','2012-03-12',3),
('C','2012-04-02',3),
('C','2012-11-21',3)
What I really want is something like this, where I take the maximum of state when the difference between dates is < 3 days (note that some rows may have the same state even when the dates are close):
IDNumber DateSeen state
A 2015-04-21 2
A 2015-05-08 2
A 2015-07-01 3
-- if there are observations < 3 days apart, take MAX
B 2014-06-18 100
B 2014-07-31 2
B 2014-08-11 3
B 2014-09-24 3
B 2014-10-24 3
B 2014-11-24 3
B 2014-12-15 3
-- if there are observations < 3 days apart, take MAX
B 2015-01-13 400
-- if there are observations < 3 days apart, take MAX
B 2015-04-07 10
-- if there are observations < 3 days apart, take MAX
B 2015-07-07 3
B 2015-10-12 3
C 2012-02-20 3
C 2012-03-12 3
C 2012-04-02 3
C 2012-11-21 3
I guess I could create another table variable to hold it and then query it, but there are a couple of problems. First, as you can see, IDNumber = 'B' has several of these close-date clusters in its sequence of dates, so I am thinking there should be a "smarter" way.
Thanks!
After your clarifying comments (thanks for that!), I would do this as follows:
SELECT ISNULL(high.IDNumber, results.IDNumber) AS IDNumber,
ISNULL(high.DateSeen, results.DateSeen) AS DateSeen,
ISNULL(high.[state], results.[state]) AS [state]
FROM @TestingResults results
OUTER APPLY
(
SELECT TOP 1 IDNumber, DateSeen, [state]
FROM @TestingResults highest
WHERE highest.DateSeen < results.DateSeen
AND highest.IDNumber = results.IDNumber
AND DATEDIFF(DAY,highest.DateSeen,results.DateSeen) <=3
ORDER BY [state] DESC, [DateSeen] DESC
) high
WHERE NOT EXISTS
(
SELECT 1
FROM @TestingResults nearFuture
WHERE nearFuture.DateSeen > results.DateSeen
AND nearFuture.IDNumber = results.IDNumber
AND DATEDIFF(DAY,results.DateSeen,nearFuture.DateSeen) <=3
)
This is almost certainly not the most elegant way to achieve this (I suspect it could be done more efficiently with window functions or a recursive CTE or similar), but I believe it gives you the behaviour and results you desire.
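For what it's worth, here is a rough sketch of the window-function idea (SQL Server 2012+ only; my own illustration, not a tuned solution). It uses LAG to flag the first row of each cluster of close dates, a running SUM to turn those flags into group ids, and then keeps the row with the highest state per group:
WITH Flagged AS (
    SELECT *,
           CASE WHEN DATEDIFF(DAY,
                              LAG(DateSeen) OVER (PARTITION BY IDNumber ORDER BY DateSeen),
                              DateSeen) < 3
                THEN 0 ELSE 1 END AS IsNewGroup     -- first row per ID also gets 1 (LAG is NULL)
    FROM @TestingResults
), Grouped AS (
    SELECT *,
           SUM(IsNewGroup) OVER (PARTITION BY IDNumber ORDER BY DateSeen
                                 ROWS UNBOUNDED PRECEDING) AS GrpID
    FROM Flagged
), Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY IDNumber, GrpID
                              ORDER BY [state] DESC, DateSeen DESC) AS rn
    FROM Grouped
)
SELECT IDNumber, DateSeen, [state]
FROM Ranked
WHERE rn = 1
ORDER BY IDNumber, DateSeen;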
This should do it using a recursive CTE:
WITH TestingResults AS (
SELECT
*
,ROW_NUMBER() OVER(ORDER BY IDNumber, DateSeen) AS RowNum
FROM @TestingResults
), Data AS (
SELECT
tmp1.IDNumber,
tmp1.DateSeen,
tmp1.state,
tmp1.RowNum,
tmp1.RowNum AS GroupID
FROM (
SELECT
*
,ABS(DATEDIFF(DAY, DateSeen, LAG(DateSeen, 1, NULL) OVER(PARTITION BY IDNumber ORDER BY DateSeen))) AS AbsPrev
FROM TestingResults
) AS tmp1
WHERE tmp1.AbsPrev IS NULL OR tmp1.AbsPrev >= 3 --the first date in a sequence
UNION ALL
SELECT
r.IDNumber,
r.DateSeen,
r.state,
r.RowNum,
d.GroupID
FROM Data d
INNER JOIN TestingResults r ON
r.IDNumber = d.IDNumber
AND DATEDIFF(DAY, d.DateSeen, r.DateSeen) < 3
AND d.RowNum+1 = r.RowNum
)
SELECT MIN(d.IDNumber) AS IDNumber, MAX(d.DateSeen) AS DateSeen, MAX(d.state) AS state
FROM Data d
GROUP BY d.GroupID
How do I find the unique groups that are present in my table, and display how often that type of group is used?
For example (SQL Server 2008 R2; the sample data has itemid, code and pct columns):
So, I would like to find out how many times the combination of
PMI 100
RT 100
VT 100
is present in my table, and for how many itemids it is used.
These three form a group because together they are assigned to a single itemid. The same combination is assigned to id 2527 and 2529, so therefore this group is used at least twice. (usagecount = 2)
(and I want to know that for all types of groups that are appearing)
The entire dataset is quite large, about 5,000,000 records, so I'd like to avoid using a cursor.
The number of code/pct combinations per itemid varies between 1 and 6.
The values in the "code" field are not known up front; there are more than a dozen distinct values on average.
I tried using PIVOT, but I got stuck eventually, and I also tried various combinations of GROUP BY and counts.
Any bright ideas?
Example output:
code pct groupid usagecount
PMI 100 1 234
RT 100 1 234
VT 100 1 234
CD 5 2 567
PMI 100 2 567
VT 100 2 567
PMI 100 3 123
PT 100 3 123
VT 100 3 123
RT 100 4 39
VT 100 4 39
etc
Just using a simple group:
SELECT
code
, pct
, COUNT(*)
FROM myTable
GROUP BY
code
, pct
Not too sure, but maybe this is more like what you're looking for:
select
uniqueGrp
, count(*)
from (
select distinct
itemid
from myTable
) as I
cross apply (
select
cast(code as varchar(max)) + cast(pct as varchar(max)) + '_'
from myTable
where myTable.itemid = I.itemid
order by code, pct
for xml path('')
) as x(uniqueGrp)
group by uniqueGrp
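If you also want output shaped like the example in the question (one row per code/pct with a groupid and usagecount), one way, sketched here on the same assumed myTable(itemid, code, pct) columns, is to number the distinct signatures and join them back to the rows:
;with sigs as (
    select i.itemid, x.uniqueGrp
    from (select distinct itemid from myTable) as i
    cross apply (
        select cast(code as varchar(max)) + cast(pct as varchar(max)) + '_'
        from myTable
        where myTable.itemid = i.itemid
        order by code, pct
        for xml path('')
    ) as x(uniqueGrp)
)
select t.code,
       t.pct,
       dense_rank() over (order by s.uniqueGrp) as groupid,    -- one id per distinct signature
       count(distinct s.itemid)                 as usagecount  -- how many itemids use that signature
from sigs as s
inner join myTable as t on t.itemid = s.itemid
group by s.uniqueGrp, t.code, t.pct
order by groupid, t.code;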
Either of these should return each combination of code and pct, with a group id for the code and the total number of instances of that code against it. You could also extend them to add the number of instances of the specific code/pct combination, e.g. for determining percentage contribution.
select
distinct
t.code, t.pct, v.groupcol, v.vol
from
[tablename] t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
or
select
t.code, t.pct, v.groupcol, v.vol
from
(select code, pct from [tablename] group by code, pct) t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
Grouping by code and pct should be enough, I think. See the following:
select code, pct, count(*)
from [table] as p
group by code, pct