SQL Server, subset a table and create new column based on condition - sql-server

If I have a table in SQL as:
id
code
1
6
1
8
1
4
2
3
2
7
2
4
3
7
3
6
3
7
What I need to do, logically is:
Get the top row of each group when grouped by id, ordered by code
Create a new column to show if the code column contained a 7 anywhere, within the group
Desired result:
id
code
c7
1
4
N
2
3
Y
3
6
Y
I think it needs a "CASE WHEN" statement in the SELECT, but have not worked it out. What query can I execute to get this?

Seems like you can use a MIN and a conditional aggregate:
SELECT id,
MIN(Code) AS Code,
CASE WHEN COUNT(CASE code WHEN 7 THEN 1 END) > 0 THEN 'Y' ELSE 'N' END AS C7
FROM dbo.YourTable
GROUP BY id;

There is probably a better way to do this, but what comes to mind is that first you have to partition the table to get the top one based on whatever that criteria is, then join back against itself to find the ones with the 7
declare #table1 table (id int not null, code int not null)
insert into #table1 (id, code)
values
(1,6),
(1,8),
(1,4),
(2,3),
(2,7),
(2,4),
(3,7),
(3,6),
(3,7)
select id, code, c7
from (
select t.id ,t.code
,(CASE WHEN c.id is null then 'N' else 'Y' END) as c7
,ROW_NUMBER() OVER (PARTITION BY t.id order by t.code) AS p
from #table1 t
left outer join (
select id, code, 'Y' as c7
from #table1
where code = 7) c on c.id = t.id
) sorted
where sorted.p = 1

Related

How to select the value from the table based on category_id USING SQL SERVER

How to select the value from the table based on category_id?
I have a table like this. Please help me.
Table A
ID Name category_id
-------------------
1 A 1
2 A 1
3 B 1
4 C 2
5 C 2
6 D 2
7 E 3
8 E 3
9 F 3
How to get the below mentioned output from table A?
ID Name category_id
--------------------
1 A 1
2 A 1
4 C 2
5 C 2
7 E 3
8 E 3
Give a row number for each row based on group by category_id and sort by ascending order of ID. Then select the rows having row number 1 and 2.
Query
;with cte as (
select [rn] = row_number() over(
partition by [category_id]
order by [ID]
), *
from [your_table_name]
)
select [ID], [Name], [category_id]
from cte
where [rn] < 3;
Kindly run this query It really help You Out.
SELECT tbl.id,tbl.name, tbl.category_id FROM TableA as tbl WHERE
tbl.name IN(SELECT tbl2.name FROM TableA tbl2 GROUP BY tbl2.name HAVING Count(tbl2.name)> 1)
Code select all category_id from TableA which has Name entries more then one. If there is single entry of any name group by category_id then such data will be excluded. In above example questioner want to eliminate those records that have single Name entity like wise category_id 1 has name entries A and B among which A has two entries and B has single entry so he want to eliminate B from result set.

TSQL - Find Customers who purchased two sets of Products

I have a table with values like this:
Customer Product#
A 1
A 3
A 5
A 7
A 11
A 13
B 5
B 11
B 13
C 4
C 5
C 6
C 11
C 14
C 42
C 60
I would like a query that will give me all customers who have:
a. Either Product# 1 OR 5
AND
b. BOTH Product# 11 AND 13.
I would appreciate any help or suggestions.
Thanks!
Try this:
select customer
from your_table
group by customer
having count(distinct case
when product# in (11, 13)
then product#
end) = 2
and count(case
when product# in (1, 5)
then 1
end) > 0
grouping by customer to find if both 11 and 13 are present, use conditional aggregation and similarly, check if there is atleast one row with 1 or 5 for the customer.
'HAVING' works but you could also do this with a couple of subselects- longer and less efficient but easier to document and/or use with more complex outputs.
SELECT DISTINCT
a.customer
FROM
your_table AS a
WHERE
a.customer IN ( SELECT customer FROM your_table WHERE product# IN (1,5) )
AND a.customer IN ( SELECT customer FROM your_table WHERE product# = 11)
AND a.customer IN ( SELECT customer FROM your_table WHERE product# = 11)

Performance issue with CTE SQL Server query

We have a table with a parent child relationship, that represents a deep tree structure.
We are using a view with a CTE to query the data but the performance is poor (see code and execution plan below).
Is there any way we can improve the performance?
WITH cte (ParentJobTypeId, Id) AS
(
SELECT
Id, Id
FROM
dbo.JobTypes
UNION ALL
SELECT
e.Id, cte.Id
FROM
cte
INNER JOIN
dbo.JobTypes AS e ON e.ParentJobTypeId = cte.ParentJobTypeId
)
SELECT
ISNULL(Id, 0) AS ParentJobTypeId,
ISNULL(ParentJobTypeId, 0) AS Id
FROM
cte
A quick example of using the range keys. As I mentioned before, hierarchies were 127K points and some sections where 15 levels deep
The cte Builds, let's assume the hier results will be will be stored in a table (indexed as well)
Declare #Table table(ID int,ParentID int,[Status] varchar(50))
Insert #Table values
(1,101,'Pending'),
(2,101,'Complete'),
(3,101,'Complete'),
(4,102,'Complete'),
(101,null,null),
(102,null,null)
;With cteOH (ID,ParentID,Lvl,Seq)
as (
Select ID,ParentID,Lvl=1,cast(Format(ID,'000000') + '/' as varchar(500)) from #Table where ParentID is null
Union All
Select h.ID,h.ParentID,cteOH.Lvl+1,Seq=cast(cteOH.Seq + Format(h.ID,'000000') + '/' as varchar(500)) From #Table h INNER JOIN cteOH ON h.ParentID = cteOH.ID
),
cteR1 as (Select ID,Seq,R1=Row_Number() over (Order by Seq) From cteOH),
cteR2 as (Select A.ID,R2 = max(B.R1) From cteOH A Join cteR1 B on (B.Seq Like A.Seq+'%') Group By A.ID)
Select B.R1
,C.R2
,A.Lvl
,A.ID
,A.ParentID
Into #TempHier
From cteOH A
Join cteR1 B on (A.ID=B.ID)
Join cteR2 C on (A.ID=C.ID)
Select * from #TempHier
Select H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
,Total = count(*)
,Complete = sum(case when D.Status = 'Complete' then 1 else 0 end)
,Pending = sum(case when D.Status = 'Pending' then 1 else 0 end)
,PctCmpl = format(sum(case when D.Status = 'Complete' then 1.0 else 0.0 end)/count(*),'##0.00%')
From #TempHier H
Join (Select _R1=B.R1,A.* From #Table A Join #TempHier B on A.ID=B.ID) D on D._R1 between H.R1 and H.R2
Group By H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
Order By 1
Returns the hier in a #Temp table for now. Notice the R1 and R2, I call these the range keys. Data (without recursion) can be selected and aggregated via these keys
R1 R2 Lvl ID ParentID
1 4 1 101 NULL
2 2 2 1 101
3 3 2 2 101
4 4 2 3 101
5 6 1 102 NULL
6 6 2 4 102
VERY SIMPLE EXAMPLE: Illustrates the rolling the data up the hier.
R1 R2 Lvl ID ParentID Total Complete Pending PctCmpl
1 4 1 101 NULL 4 2 1 50.00%
2 2 2 1 101 1 0 1 0.00%
3 3 2 2 101 1 1 0 100.00%
4 4 2 3 101 1 1 0 100.00%
5 6 1 102 NULL 2 1 0 50.00%
6 6 2 4 102 1 1 0 100.00%
The real beauty of the the range keys, is if you know an ID, you know where it exists (all descendants and ancestors).

Moving Median, Mode in T-SQL

I am using SQL Server 2012 and I know it is quite simple to calculate moving averages.
But what I need is to get the mode and the median for a defined window frame like so (with a window of 2 preceding to current row; month unique):
MONTH | CODE | MEDIAN | MODE
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If several values qualify as mode, than pick the first.
I commented my code thoroughly. Read my comments on my Mode calculations and let me know it needs tweaking. Overall, it's a relatively simple query. It just has a lot of ugly subqueries and it has a lot of comments. Check it out:
DECLARE #Table TABLE ([Month] INT,[Code] INT);
INSERT INTO #Table
VALUES (1,0),
(2,3),
(3,2),
(4,2), --Try commenting this out to test my special mode thingymajig
(5,2),
(6,5),
(7,3);
WITH CTE
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [Month]) row_num,
[Month],
CAST(Code AS FLOAT) Code
FROM #Table
)
SELECT [Month],
Code,
ISNULL((
SELECT CASE
--When there is only one previous value at row_num = 2, find Mean of first two codes
WHEN A.row_num = 2 THEN (LAG(B.code,1) OVER (ORDER BY [Code]) + B.Code)/2.0
--Else find middle code value of current and previous two rows
ELSE B.Code
END
FROM CTE B
--How subquery relates to outer query
WHERE B.row_num BETWEEN A.row_num - 2 AND A.row_num
ORDER BY B.[Code]
--Order by code and offset by 1 so don't select the lowest value, but fetch the one above the lowest value
OFFSET 1 ROW FETCH NEXT 1 ROW ONLY),
0) AS Median,
--I did mode a little different
--Instead of Avg(D.Code) you could list the values because with mode,
--If there's a tie with more than one of each number, you have multiple modes
--Instead of doing that, I simply return the mean of the tied modes
--When there's one, it doesn't change anything.
--If you were to delete the month 4, then your number of Codes 2 and number of Codes 3 would be the same in the last row.
--Proper mode would be 2,3. I instead average them out to be 2.5.
ISNULL((
SELECT AVG(D.Code)
FROM (
SELECT C.Code,
COUNT(*) cnt,
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) dnse_rank
FROM CTE C
WHERE C.row_num <= A.row_num
GROUP BY C.Code
HAVING COUNT(*) > 1) D
WHERE D.dnse_rank = 1),
0) AS Mode
FROM CTE A
Results:
Month Code Median Mode
----------- ---------------------- ---------------------- ----------------------
1 0 0 0
2 3 1.5 0
3 2 2 0
4 2 2 2
5 2 2 2
6 5 2 2
7 3 3 2
If I understood your requirements correctly, your source table contains MONTH and CODE columns, and you want to calculate MEDIAN and MODE.
The query below calculates MEDIAN and MODE with moving window <= than 3 month ("2 preceding to current row") and returns the results matching your example.
-----------------------------------------------------
--Demo data
-----------------------------------------------------
CREATE TABLE #Data(
[Month] INT NOT NULL,
[Code] INT NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[Month] ASC
));
INSERT #Data
([Month],[Code])
VALUES
(1,0),
(2,3),
(3,2),
(4,2),
(5,2),
(6,5),
(7,3);
-----------------------------------------------------
--Query
-----------------------------------------------------
DECLARE #PrecedingRowsLimit INT = 2;
WITH [MPos] AS
(
SELECT [R].[Month]
, [RB].[Month] AS [SubId]
, [RB].[Code]
, ROW_NUMBER() OVER(PARTITION BY [R].[Month] ORDER BY [RB].[Code]) AS [RowNumberInPartition]
, CASE
WHEN [R].[Count] % 2 = 1 THEN ([R].[Count] + 1) / 2
ELSE NULL
END AS [MedianPosition]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2
ELSE NULL
END AS [MedianPosition1]
, CASE
WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2 + 1
ELSE NULL
END AS [MedianPosition2]
FROM
(
SELECT [RC].[Month]
, [RC].[RowNumber]
, CASE WHEN [RC].[Count] > #PrecedingRowsLimit + 1 THEN #PrecedingRowsLimit + 1 ELSE [RC].[Count] END AS [Count]
FROM
(
SELECT [Month]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [RowNumber]
, ROW_NUMBER() OVER(ORDER BY [Month]) AS [Count]
FROM #Data
) [RC]
) [R]
INNER JOIN #Data [RB]
ON [R].[Month] >= [RB].[Month]
AND [RB].[Month] >= [R].[RowNumber] - #PrecedingRowsLimit
)
SELECT DISTINCT [M].[Month]
, [ORIG].[Code]
, COALESCE([ME].[Code],([M1].[Code] + [M2].[Code]) / 2.0) AS [Median]
, [MOD].[Mode]
FROM [MPos] [M]
LEFT JOIN [MPOS] [ME]
ON [M].[Month] = [ME].[Month]
AND [M].[MedianPosition] = [ME].[RowNumberInPartition]
LEFT JOIN [MPOS] [M1]
ON [M].[Month] = [M1].[Month]
AND [M].[MedianPosition1] = [M1].[RowNumberInPartition]
LEFT JOIN [MPOS] [M2]
ON [M].[Month] = [M2].[Month]
AND [M].[MedianPosition2] = [M2].[RowNumberInPartition]
INNER JOIN
(
SELECT [MG].[Month]
, FIRST_VALUE([MG].[Code]) OVER (PARTITION BY [MG].[Month] ORDER BY [MG].[Count] DESC , [MG].[SubId] ASC) AS [Mode]
FROM
(
SELECT [Month] , MIN([SubId]) AS [SubId], [Code] , COUNT(1) AS [Count]
FROM [MPOS]
GROUP BY [Month] , [Code]
) [MG]
) [MOD]
ON [M].[Month] = [MOD].[Month]
INNER JOIN #Data [ORIG]
ON [ORIG].[Month] = [M].[Month]
ORDER BY [M].[Month];

Select a distinct record, filtering is not working

Hello EVery I am new to SQl. query to result in the following records.
I have a table with records as
c1 c2 c3 c4 c5 c6
1 John 2.3.2010 12:09:54 4 7 99
2 mike 2.3.2010 13:09:59 8 6 88
3 ahmad 2.3.2010 13:09:59 1 9 19
4 Jim 23.3.2010 16:35:14 4 5 99
5 run 23.3.2010 12:09:54 3 8 12
I want to fecth only records.
i.e only 1 latest record per day. If both of them happen at the same time, sort by C1.so in 1&3 it should fetch 3.
3 ahmad 2.3.2010 14:09:59 1 9 19
4 Jim 23.3.2010 16:35:14 4 5 99
I have got a new problem in this.
If i filter the records based on conditions the last record is missing. I tried many ways but still it is failing. Here update_log is my table.
SELECT * FROM update_log t1
WHERE (t1.c3) =
(
SELECT MAX(t2.c3)
FROM update_log t2
WHERE DATEDIFF(dd,t2.c3, t1.c3) = 0
)
and t1.c3 > '02.03.2010' and t1.modified_at <= '22.03.2010'
ORDER BY t1.c3 ASC. But i am not able to retrieve the record
4 Jim 23.3.2010 16:35:14 4 5 99
I dont know this query results in only
3 ahmad 2.3.2010 14:09:59 1 9 19
The format of the column c3 is datetime. I am pumping the data into the column as
using $date = date("d.m.Y H:i",time()); -- simple date fetch of today.
Another query that i tried for the same purpose.
select * from (select convert(varchar(10), c3,104) as date, max(c3) as max_date, max(c1) as Nr from update_log group by convert(varchar(10), c3,104)) as t2 inner join update_log as t1 on (t2.max_date = t1.c3 and convert(varchar(10), c3,104) = date and t1.[c1]= Nr) WHERE t1.c3 >= '02.03.2010' and t1.c3 <= '16.04.2010' . I even tried this way..the same error last record is not coming..
Following steps should produce the results you are after
Find max c3 for every day.
Join the results with your original table, witholding only the max c1 values.
SQL Statement (Edited)
DECLARE #update_log TABLE (c1 INTEGER, c3 DATETIME)
INSERT INTO #update_log
SELECT 1, '3.2.2010 12:09:54'
UNION ALL SELECT 2, '3.2.2010 13:09:59'
UNION ALL SELECT 3, '3.2.2010 13:09:59'
UNION ALL SELECT 4, '3.23.2010 16:35:14'
UNION ALL SELECT 5, '3.23.2010 12:09:54'
SELECT c1 = MAX(l.c1), l.c3
FROM #update_log l
INNER JOIN (
SELECT c3_max = MAX(c3)
FROM #update_log
WHERE c3 > '3.2.2010 00:00:00'
AND c3 < '3.24.2010 00:00:00'
GROUP BY
CONVERT(VARCHAR(10), c3, 101)
) l_maxdate ON l_maxdate.c3_max = l.c3
GROUP BY
l.c3
Notes
You should read the FAQ regarding as to how this site operates. As has been mentioned, Stack Overflow is not intented to be used like one would use a newsreader where you ask follow-up questions after an answer has been given by creating a new answer.
You should edit your question for any additional information or use the comments. If the additional information is that much that it in effect changes the entire question, you should consider making a new question out of it.
That being said, enjoy SO.
Assuming c1 is unique, hopefully even the primary key. This also assumes the use of SQL Server 2008 for the DATE data type.
SELECT t1.*
FROM update_log t1
WHERE t1.c3 > '02.03.2010'
AND t1.modified_at <= '22.03.2010'
AND t1.c1 IN
( SELECT TOP 1 c1
FROM update_log t2
WHERE CAST(t1.c3 As DATE) = CAST(t2.c3 As DATE)
ORDER BY c3 DESC, c1 DESC
)
OK, Now i understood. actually i wrote the query for the same purpose this way.
select * from #temp
select * from
(select max(c1) as nr from
(select convert(varchar(10), c3,104) as date, max(c3) as max_date
from #temp where
convert(varchar(10),c3,104) >= '02.02.2010' and
convert(varchar(10),c3,104) <= '23.02.2010'
group by convert(varchar(10), c3,104))
as t2 inner join #temp as t1 on (t2.max_date = t1.c3 and
convert(varchar(10), c3,104) = date)
group by convert(varchar(10),max_date,104))
as t3 inner join #temp as t4 on (t3.nr = t4.c1 )
If i change these 2 lines to c3 >= '02.02.2010' and
c3 <= '24.02.2010'. It is working fine. but the query that i have posted is not able to filter the records properly based on dates.
I want to know where i went to wrong to enhance my knoweldge rather than just copying ur query:-)

Resources