SQLServer Order By IGNORE NULL values - sql-server

For some reason I can't change the table or update the data for now. Here is the problem:
I have the menu_user table below:
userID menuID
(null) 2
(null) 3
1 3
2 1
3 2
4 5
5 0
userID and menuID are not duplicated. The problem is how to ORDER BY userID, menuID so that a row whose userID is NULL is placed right after the other row that has the same menuID. A menuID appears at most twice, and if it does, one of the two rows has a NULL userID.
The expected result order:
userID menuID
1 3
(null) 3
2 1
3 2
(null) 2
4 5
5 0
Here is a sample script:
CREATE TABLE [dbo].[menu_user](
[userID] [int] NULL,
[menuID] [int] NULL
);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (NULL, 3);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (1, 3);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (2, 1);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (3, 2);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (4, 5);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (5, 0);
INSERT [dbo].[menu_user] ([userID], [menuID]) VALUES (NULL, 2);
ADDED: If possible, I want this as a view (just a SELECT, with no variables).

This seems to do the trick. You need to do something to relate multiple rows together. Here I've chosen to use a left join:
select
m1.*
from
menu_user m1
left join
menu_user m2
on
m1.userID is null and
m1.menuID = m2.menuID and
m2.userID is not null
order by
COALESCE(m1.userID,m2.userID),m1.userID desc
Result:
userID menuID
----------- -----------
1 3
NULL 3
2 1
3 2
NULL 2
4 5
5 0
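The left join attaches a partner to each row with a NULL userID: the row with the same menuID and a non-NULL userID. COALESCE then gives both rows the same sort key. The secondary m1.userID DESC puts the NULL row after its partner, because SQL Server treats NULL as the lowest value when sorting.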

Check this. The ordering is a bit off (the pairs come out in menuID order rather than userID order), but each NULL row lands right after its partner row:
SELECT * FROM menu_user mu
ORDER BY mu.menuID,
CASE WHEN mu.userID IS NULL THEN mu.menuID END

The left join solution will produce duplicate rows when a menuID has more than one non-NULL user. This is another method:
select userID, menuID
From (
select *, Case when a.UseriD is not null then cast(a.userID as float) else
(select max(b.userID) + 0.1 from menu_user b where a.menuID = b.menuID and a.userID is null) end as SortCol
from menu_user a
) c Order by SortCol
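Since a view was requested, here is one option. The sort key that SortCol computes can also be written with a window aggregate, MAX() OVER (PARTITION BY ...), available since SQL Server 2005, instead of a correlated subquery. Because there is no join, a menuID with several non-NULL users does not produce duplicate rows.

This is a minimal sketch that assumes the question's table and data. The view name dbo.v_menu_user_sorted and the columns SortUser and NullLast are hypothetical. SQL Server rejects ORDER BY in a view unless TOP, OFFSET or FOR XML is used, and even then the outer query's order is not guaranteed. So the view exposes the sort keys, and the ORDER BY goes in the query that reads it.

```sql
-- Sample table and data from the question.
CREATE TABLE dbo.menu_user (userID int NULL, menuID int NULL);
INSERT INTO dbo.menu_user (userID, menuID)
VALUES (NULL, 3), (1, 3), (2, 1), (3, 2), (4, 5), (5, 0), (NULL, 2);
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

-- Hypothetical view: exposes sort keys instead of promising an order.
CREATE VIEW dbo.v_menu_user_sorted
AS
SELECT userID,
       menuID,
       -- a NULL-userID row borrows the largest non-NULL userID of its menuID
       COALESCE(userID, MAX(userID) OVER (PARTITION BY menuID)) AS SortUser,
       -- 0 for the row with a real userID, 1 for the NULL row, so it sorts after
       CASE WHEN userID IS NULL THEN 1 ELSE 0 END AS NullLast
FROM dbo.menu_user;
GO

SELECT userID, menuID
FROM dbo.v_menu_user_sorted
ORDER BY SortUser, NullLast;
```

With the sample data this should produce the order from the question. A NULL-userID row whose menuID has no non-NULL partner gets a NULL SortUser and therefore sorts first.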

Try this query:
SELECT *
FROM menu_user mu
WHERE userID IS NOT NULL
ORDER BY mu.menuID

Related

Identify duplicates based on multiple columns and parent row

This is an example of the table data I am working on (the table contains a lot of columns; I am showing only the relevant ones here):
Id job_number status parent_id
1 42FWD-42 0 0
2 42FWD-42 1 1
3 42FWD-42 5 1
Id is auto-generated. parent_id links the job rows using the Id.
When a new job is created via the app, a new row is created (with status "0"). Its auto-generated Id is then used as the parent_id of subsequent rows of the same job.
Another record with status "1" (the code for started) is also created just after the parent record.
Explanation of the problem: due to a bug in the app, there are duplicate sets of rows for the same job.
Example of the problem:
Id job_number status parent_id
1 42FWD-42 0 0
2 42FWD-42 0 0
3 42FWD-42 1 1
4 42FWD-42 1 2
5 42FWD-42 5 1
As you can see from this example, due to the bug, there are 2 rows with "0" status for the same job, and 2 rows with "1" status.
This causes a lot of problems in the app, where the job is updated using the job number.
The status number should not repeat for a specific job.
What I want is to find all duplicates like those in the example. For example, I want a query that finds all duplicates which have the same job number, but a different parent_id and NO "5" status.
Using the example table above, I need the query to return:
Id job_number status parent_id
2 42FWD-42 0 0
4 42FWD-42 1 2
Explanation of this result:
Row with Id=1 is considered the correct record because it has an associated record with status "5"
Row with Id=2 is considered duplicate and its associated records are also considered duplicate
Another possible case: there are duplicate rows, but none have status=5. These rows can be discarded, i.e. they need not be shown in the results.
A brief explanation of how the query works will be appreciated.
EDIT:
I forgot to add an important piece of information: job_number is case sensitive.
That is, 42FWD-42 and 42fwd-42 are different, valid job numbers. They should not be considered duplicates; they are 2 separate jobs.
The reason is that the actual job number is not short text as in my example; it is a long string like a GUID.
First, I must mention that you should block identical rows by means of a unique constraint. I suggest that once you have eliminated all duplicates, you add such a constraint to keep this from happening again.
Now for your question: you can do this by grouping on the duplicate columns and keeping only the groups with a count greater than one.
Here is an example
declare @t table (id int, job_number varchar(10), status int, parent_id int)
insert into @t
values (1, '42FWD-42', 0, 0), (2, '42FWD-42', 0, 0), (3, '42FWD-42', 1, 1), (4, '42FWD-42', 1, 2), (5, '42FWD-42', 5, 1)
select max(t.id) as id, t.job_number, t.status
from @t t
group by t.job_number, t.status
having count(*) > 1
The result is:
id job_number status
2 42FWD-42 0
4 42FWD-42 1
To also get the parent_id, you can add a self-join style lookup (a correlated subquery on the same table):
select max(t.id) as id,
t.job_number,
t.status,
(select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number, t.status
having count(*) > 1
This returns:
id job_number status parent_id
2 42FWD-42 0 0
4 42FWD-42 1 2
EDIT
To solve the additional problem from the edit to your question (case sensitivity), you can use COLLATE both in the selected column and in the grouping comparison.
This should do it:
declare @t table (id int, job_number varchar(10), status int, parent_id int)
insert into @t
values (1, '42FWD-42', 0, 0),
(2, '42FWD-42', 0, 0),
(3, '42FWD-42', 1, 1),
(4, '42fwd-42', 1, 2), -- LOWERCASE !!!
(5, '42FWD-42', 5, 1)
select max(t.id) as id,
t.job_number COLLATE Latin1_General_CS_AS as job_number,
t.status,
(select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
And now the result will be:
id job_number status parent_id
2 42FWD-42 0 0
Yet another edit
Now, suppose you need to use these duplicate ids in another query; you could do something like this:
select t.*
from @t t
where t.id in ( select max(t.id) as id
from @t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)
What I am doing here is getting only the duplicate ids, in a form that can feed a WHERE clause in another query.
This way you can use the result set in any way you wish.
Also note that for this we no longer need the self-join lookup to retrieve the parent_id.
One possible use is to delete the duplicate rows; you can write:
delete from yourtable
where id in ( select max(t.id) as id
from yourtable t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)
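The unique constraint recommended at the start of this answer could look roughly like the sketch below. It assumes the rule from the question that a status never repeats within a job. It declares job_number with a case-sensitive collation so that 42FWD-42 and 42fwd-42 remain separate jobs. The temp table #jobs is hypothetical. On the real table, the constraint can only be created once the existing duplicates are gone.

```sql
-- Hypothetical table mirroring the question's relevant columns.
CREATE TABLE #jobs (
    id         int IDENTITY(1,1) PRIMARY KEY,
    -- case-sensitive collation: '42FWD-42' and '42fwd-42' are different jobs
    job_number varchar(50) COLLATE Latin1_General_CS_AS NOT NULL,
    status     int NOT NULL,
    parent_id  int NOT NULL,
    -- a status may appear only once per job
    UNIQUE (job_number, status)
);

INSERT INTO #jobs (job_number, status, parent_id)
VALUES ('42FWD-42', 0, 0), ('42FWD-42', 1, 1), ('42fwd-42', 0, 0);

BEGIN TRY
    -- a second status-0 row for the same job violates the constraint
    INSERT INTO #jobs (job_number, status, parent_id) VALUES ('42FWD-42', 0, 0);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```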
You can try using the ROW_NUMBER window function to find the duplicate status-0 rows per job_number (every row after the first). Then use a recursive CTE to collect all the erroneous records that descend from those ids.
Query 1:
;WITH CTE AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY job_number ORDER BY Id) rn
FROM T
WHERE status = 0
), CTE1 AS (
SELECT id,job_number,status,parent_id
FROM CTE
WHERE rn > 1
UNION ALL
SELECT t.id,t.job_number,t.status,t.parent_id
FROM CTE1 c INNER JOIN T t
ON c.id = t.parent_id
)
SELECT *
FROM CTE1
Results:
| id | job_number | status | parent_id |
|----|------------|--------|-----------|
| 2 | 42FWD-42 | 0 | 0 |
| 4 | 42FWD-42 | 1 | 2 |
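The queries above treat the later row as the duplicate. The question states a more specific rule:

- the status-0 row with a status-5 child is the correct one;
- other status-0 rows of the same job, and their children, are duplicates;
- jobs with no status-5 row are skipped.

One possible sketch that follows this rule checks for the status-5 child explicitly. It assumes only the four columns shown in the question and compares job_number case-sensitively. The temp tables #job_rows and #dup_rows are hypothetical. The result is kept in #dup_rows so it can feed a later DELETE, as suggested above.

```sql
-- Hypothetical temp table with the columns shown in the question.
CREATE TABLE #job_rows (id int, job_number varchar(50), status int, parent_id int);
INSERT INTO #job_rows (id, job_number, status, parent_id) VALUES
    (1, '42FWD-42', 0, 0), (2, '42FWD-42', 0, 0), (3, '42FWD-42', 1, 1),
    (4, '42FWD-42', 1, 2), (5, '42FWD-42', 5, 1);

WITH parents AS (
    -- every status-0 row starts a job; flag it if one of its children has status 5
    SELECT p.id, p.job_number,
           CASE WHEN EXISTS (SELECT 1 FROM #job_rows c
                             WHERE c.parent_id = p.id AND c.status = 5)
                THEN 1 ELSE 0 END AS has_status5
    FROM #job_rows p
    WHERE p.status = 0
), dup_parents AS (
    -- status-0 rows without a status-5 child, for jobs where another
    -- status-0 row (same job_number, compared case-sensitively) has one
    SELECT p.id
    FROM parents p
    WHERE p.has_status5 = 0
      AND EXISTS (SELECT 1 FROM parents g
                  WHERE g.has_status5 = 1
                    AND g.job_number = p.job_number COLLATE Latin1_General_CS_AS)
)
SELECT r.id, r.job_number, r.status, r.parent_id
INTO #dup_rows
FROM #job_rows r
WHERE r.id IN (SELECT id FROM dup_parents)          -- the duplicate parent itself
   OR r.parent_id IN (SELECT id FROM dup_parents);  -- and the rows hanging off it

SELECT * FROM #dup_rows ORDER BY id;
```

With the sample data, #dup_rows should contain ids 2 and 4.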

How to fill ParentID column in one update statement?

My table is like this:
ID Code ParentID
-------------------
1 A01 NULL
2 B83 NULL
3 H92 NULL
15 A013 NULL
23 A018 NULL
33 A01899 NULL
44 B8329 NULL
67 B83293 NULL
What I want is to update ParentID to match the ID of the parent code.
A01 is the parent for A013
A01 is the parent for A018
A018 is the parent for A01899
and so on.
As you can see, A01 has length 3 while its child A013 has length 4, and A018 has length 4 while its child A01899 has length 6.
I can do that with multiple UPDATE statements, repeating one for each case:
UPDATE A
SET ParentID = B.ID
FROM Table A
INNER JOIN Table B ON A.Code like B.Code + '%'
WHERE LEN(A.Code) = 4 AND LEN(B.Code) = 3
But how can I do that in a single UPDATE statement?
You can first find all the relevant matches (similar to what you have), along with the length of the code each one matched, and then keep the one with the longest matched code.
CREATE TABLE #TableA (ID int, Code varchar(10), ParentID int);
INSERT INTO #TableA (ID, Code, ParentID)
VALUES
(1 , 'A01' , NULL),
(2 , 'B83' , NULL),
(3 , 'H92' , NULL),
(15, 'A013' , NULL),
(23, 'A018' , NULL),
(33, 'A01899', NULL),
(44, 'B8329' , NULL),
(67, 'B83293', NULL);
WITH A AS
(SELECT TA.ID, TA.ParentID, TB.ID AS TB_ID,
ROW_NUMBER() OVER (PARTITION BY TA.ID ORDER BY TB.Len_Code DESC) AS rn
FROM #TableA TA
INNER JOIN
(SELECT ID, Code, LEN(CODE) AS Len_Code
FROM #TableA
) TB ON TA.Code LIKE TB.Code + '%'
WHERE TA.ID <> TB.ID
)
UPDATE A
SET A.ParentId = A.TB_ID
WHERE A.rn = 1;
Result
ID Code ParentID
1 A01 NULL
2 B83 NULL
3 H92 NULL
15 A013 1
23 A018 1
33 A01899 23
44 B8329 2
67 B83293 44
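Another way to say "take the longest matching prefix" in a single statement is an UPDATE with CROSS APPLY and TOP (1). This is a sketch on a hypothetical temp table #Codes holding the question's sample data. Codes with no matching prefix get no row from CROSS APPLY, so their ParentID simply stays NULL. Like the LIKE-based answer above, it assumes the codes contain no LIKE wildcards (% or _).

```sql
-- Hypothetical temp table holding the question's sample data.
CREATE TABLE #Codes (ID int, Code varchar(10), ParentID int);
INSERT INTO #Codes (ID, Code, ParentID) VALUES
    (1, 'A01', NULL), (2, 'B83', NULL), (3, 'H92', NULL), (15, 'A013', NULL),
    (23, 'A018', NULL), (33, 'A01899', NULL), (44, 'B8329', NULL), (67, 'B83293', NULL);

UPDATE c
SET ParentID = p.ID
FROM #Codes c
CROSS APPLY (
    -- among all other codes that are a prefix of c.Code, take the longest one
    SELECT TOP (1) b.ID
    FROM #Codes b
    WHERE c.Code LIKE b.Code + '%'
      AND b.ID <> c.ID
    ORDER BY LEN(b.Code) DESC
) p;

SELECT ID, Code, ParentID FROM #Codes ORDER BY ID;
```

With the sample data this should produce the same ParentID values as the result shown above.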
Just another option.
Significant digits can be a risky business in the long run. In my experience they tend to have a rather short shelf life before an exception needs to be made.
In the example below, we allow for up to three characters of distance, and apply the closest match first via coalesce().
Just to be clear, Left Join D may not be necessary if parents are within 1 or 2 characters. Conversely, this could be extended with further left joins if needed.
Example
Declare @YourTable Table ([ID] int,[Code] varchar(50),[ParentID] int)
Insert Into @YourTable Values
(1,'A01',NULL)
,(2,'B83',NULL)
,(3,'H92',NULL)
,(15,'A013',NULL)
,(23,'A018',NULL)
,(33,'A01899',NULL)
,(44,'B8329',NULL)
,(67,'B83293',NULL)
;with cte as (
Select A.*
,PtNr = coalesce(B.ID,C.ID,D.ID)
From @YourTable A
Left Join @YourTable B on left(A.[Code],len(A.[Code])-1)=B.[Code]
Left Join @YourTable C on left(A.[Code],len(A.[Code])-2)=C.[Code]
Left Join @YourTable D on left(A.[Code],len(A.[Code])-3)=D.[Code]
)
Update cte set ParentID=PtNr
Select * From @YourTable
The updated table:
ID Code ParentID
1 A01 NULL
2 B83 NULL
3 H92 NULL
15 A013 1
23 A018 1
33 A01899 23
44 B8329 2
67 B83293 44

Selecting the smallest value in one column per group

I have a table that looks like the following, which was created using this code:
SELECT Orders.ID, Orders.CHECKIN_DT_TM, Orders.CATALOG_TYPE,
Orders.ORDER_STATUS, Orders.ORDERED_DT_TM, Orders.COMPLETED_DT_TM,
Min(DateDiff("n",Orders.ORDERED_DT_TM,Orders.COMPLETED_DT_TM)) AS
Time_to_complete
FROM Orders
GROUP BY Orders.ORDER_ID, Orders.ID,
Orders.CHECKIN_DT_TM, Orders.CATALOG_TYPE, Orders.ORDER_STATUS, Orders.ORDERED_DT_TM,
Orders.COMPLETED_DT_TM HAVING (((Orders.CATALOG_TYPE)="radiology"));
ID Time_to_complete ... .....
1 5
1 7
1 8
2 23
2 6
3 7
4 16
4 14
I'd like to add to this code so that it selects the smallest Time_to_complete value per subject ID, leaving the desired table:
ID Time_to_complete ... .....
1 5
2 6
3 7
4 14
I'm using Access and prefer to continue using Access to finish this code but I do have the option to use SQL Server if this is not possible in Access. Thanks!
I suspect you need a correlated subquery:
SELECT O.*, DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) AS Time_to_complete
FROM Orders O
WHERE DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) = (SELECT Min(DateDiff("n", O1.ORDERED_DT_TM, O1.COMPLETED_DT_TM))
FROM Orders O1
WHERE O1.ORDER_ID = O.ORDER_ID AND . . .
);
EDIT: If you want unique records, you can do this instead:
SELECT O.*, DateDiff("n", O.ORDERED_DT_TM, O.COMPLETED_DT_TM) AS Time_to_complete
FROM Orders O
WHERE o.pk = (SELECT TOP (1) o1.pk
FROM Orders O1
WHERE O1.ORDER_ID = O.ORDER_ID AND . . .
ORDER BY DateDiff("n", O1.ORDERED_DT_TM, O1.COMPLETED_DT_TM) ASC
);
pk is your identity column that uniquely identifies each row in the Orders table, so change it accordingly.
Have a look at this:
DECLARE @myTable AS TABLE (ID INT, Time_to_complete INT);
INSERT INTO @myTable
VALUES (1, 5)
, (1, 7)
, (1, 8)
, (2, 23)
, (2, 6)
, (3, 7)
, (4, 16)
, (4, 14);
WITH cte AS
(SELECT *
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Time_to_complete) AS RN
FROM @myTable)
SELECT cte.ID
, cte.Time_to_complete
FROM cte
WHERE RN = 1;
Results :
ID Time_to_complete
----------- ----------------
1 5
2 6
3 7
4 14
It uses row numbers over groups, then selects the first row of each group. You should be able to adjust your code to use this technique; if in doubt, wrap your entire query in a CTE first, then apply the technique shown here.
It's worth becoming familiar with this pattern, as it gets used in a lot of places, especially for de-duplicating data.
Try this:
DECLARE @myTable AS TABLE (ID INT, Time_to_complete INT);
INSERT INTO @myTable
VALUES (1, 5)
, (1, 7)
, (1, 8)
, (2, 23)
, (2, 6)
, (3, 7)
, (4, 16)
, (4, 14);
SELECT O.ID, O.Time_to_complete
FROM @myTable O
WHERE o.Time_to_complete = (Select min(m.Time_to_complete) FROM @myTable m
Where o.id=m.ID
);
Result :
ID Time_to_complete
1 5
2 6
3 7
4 14
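If only the ID and its smallest value are needed (no other columns from that row), a plain GROUP BY with MIN is enough. The same GROUP BY/MIN form also works in Access. Here is a sketch on a hypothetical temp table #times holding the sample values:

```sql
-- Hypothetical temp table with the sample values from the question.
CREATE TABLE #times (ID int, Time_to_complete int);
INSERT INTO #times (ID, Time_to_complete)
VALUES (1, 5), (1, 7), (1, 8), (2, 23), (2, 6), (3, 7), (4, 16), (4, 14);

-- one row per ID, keeping only the smallest Time_to_complete
SELECT ID, MIN(Time_to_complete) AS Time_to_complete
FROM #times
GROUP BY ID
ORDER BY ID;
```

To keep other columns from the winning row, the ROW_NUMBER or correlated-subquery versions above are a better fit.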

Compare the most recent row with the immediate previous in the same table

I am facing a problem where I need to compare the most recent row with the immediately previous one based on the same criterion (the trader, in this case).
Here is my table:
ID Trader Price
-----------------
1 abc 5
2 xyz 5.2
3 abc 5.7
4 xyz 5
5 abc 5.2
6 abc 6
Here is the script:
CREATE TABLE Sale
(
ID int not null PRIMARY KEY ,
trader varchar(10) NOT NULL,
price decimal(2,1),
)
INSERT INTO Sale (ID,trader, price)
VALUES (1, 'abc', 5), (2, 'xyz', 5.2),
(3, 'abc', 5.7), (4, 'xyz', 5),
(5, 'abc', 5.2), (6, 'abc', 6);
So far I am working with this solution, which is not perfect yet:
select
a.trader,
(a.price - b.price ) New_price
from
sale a
join
sale b on a.trader = b.trader and a.id > b.ID
left outer join
sale c on a.trader = c.trader and a.id > c.ID and b.id < c.ID
where
c.ID is null
The above is not perfect because I want to compare only the most recent row with the immediately previous one. In this sample, for example:
Trader abc : I will compare only id = 6 and id = 5
Trader xyz : id = 4 and id = 2
Thanks for any help!
If you are using SQL Server 2012 or later, you can use the LEAD and LAG functions to access the previous and next rows. Unfortunately, these functions can only be used in the SELECT or ORDER BY clause, so you will need to use a subquery to get the data you need:
SELECT t.trader, t.current_price - t.previous_price as difference
FROM (
SELECT
a.trader,
a.price as current_price,
LAG(a.price) OVER(PARTITION BY a.trader ORDER BY a.ID) as previous_price,
LEAD(a.price) OVER(PARTITION BY a.trader ORDER BY a.ID) as next_price
FROM sale a
) t
WHERE t.next_price IS NULL
Here, the subquery creates additional columns for the previous and next values. The main query then keeps only the rows where the next price is NULL, which indicates the last row for that trader.
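LAG and LEAD require SQL Server 2012. On older versions (2005 and later), a similar result can be sketched with ROW_NUMBER: number each trader's rows from newest to oldest and pair row 1 with row 2. The temp table #Sale is a hypothetical copy of the question's Sale table.

```sql
-- Hypothetical copy of the question's Sale table.
CREATE TABLE #Sale (ID int NOT NULL PRIMARY KEY, trader varchar(10) NOT NULL, price decimal(2,1));
INSERT INTO #Sale (ID, trader, price)
VALUES (1, 'abc', 5), (2, 'xyz', 5.2), (3, 'abc', 5.7), (4, 'xyz', 5), (5, 'abc', 5.2), (6, 'abc', 6);

WITH numbered AS (
    -- rn = 1 is each trader's most recent row, rn = 2 the one just before it
    SELECT trader, price,
           ROW_NUMBER() OVER (PARTITION BY trader ORDER BY ID DESC) AS rn
    FROM #Sale
)
SELECT cur.trader, cur.price - prev.price AS difference
FROM numbered cur
JOIN numbered prev
  ON prev.trader = cur.trader AND prev.rn = 2
WHERE cur.rn = 1;
```

Traders with only one row are dropped by the inner join; a LEFT JOIN would keep them with a NULL difference. With the sample data this should give 0.8 for abc and -0.2 for xyz.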

Tsql group by clause with exceptions

I have a problem with a query.
This is the data (ordered by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract distinct values and the date of their first occurrence. The exception is that rows should be grouped only if the run is not interrupted by a different value in that timeframe.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I'm sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with; it could probably work. This query should locate where a value changes:
SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and
d1.Value <> d2.Value
It could probably be done with good use of ROW_NUMBER, but I can't manage it.
Sample data:
declare @T table (ID int, Value int, Timestamp date)
insert into @T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from @T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If they are, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The second CTE (Firsts) simply finds the rows that have no immediately preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
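On SQL Server 2012 or later, LAG can read the previous row's Value directly, which removes the need for the rn self-join. A row starts a new run when there is no previous row or the previous Value differs. Here is a sketch using a hypothetical temp table #Data with the sample rows:

```sql
-- Hypothetical temp table with the question's sample rows.
CREATE TABLE #Data (ID int, Value int, [Timestamp] date);
INSERT INTO #Data (ID, Value, [Timestamp]) VALUES
    (1, 0, '20010101'), (2, 0, '20020101'), (3, 1, '20030101'), (4, 1, '20040101'),
    (5, 0, '20050101'), (6, 2, '20060101'), (7, 2, '20070101'), (8, 2, '20080101');

SELECT ID, Value, [Timestamp]
FROM (
    SELECT ID, Value, [Timestamp],
           -- Value of the row immediately before this one in Timestamp order
           LAG(Value) OVER (ORDER BY [Timestamp]) AS prev_value
    FROM #Data
) d
WHERE prev_value IS NULL      -- the very first row
   OR prev_value <> Value     -- the value changed compared to the previous row
ORDER BY [Timestamp];
```

With the sample data this should return IDs 1, 3, 5 and 6. If Value itself can be NULL, the comparison needs extra handling, because NULL <> x never evaluates to true.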
