Select unique rows with where in clause (SQL Server) - sql-server

I am trying to return one json per unique value in a column.
However, when I try to select from a sub-query the entire result set is returned.
Example Table
| json | column |
|--------|--------|
| {obj1} | 1 |
| {obj2} | 1 |
| {obj3} | 2 |
Example Query
select distinct
[json]
from someTable
where [column] in (
select distinct
[column]
from someTable
)
Actual Output
| json |
|--------|
| {obj1} |
| {obj2} |
| {obj3} |
Expected Output
| json |
|--------|
| {obj1} |
| {obj3} |
How can I select just 1 json per unique column value?

If doesn't Matter output Related To Column is obj1 Or obj2 you must Use Group By Like this :
SELECT
Min(JSon) AS Json ,Column
FROM
someTable
Group By column

If you want the 1st [json] of each column then use row_number():
select t.[json] from (
select
[json],
row_number() over (partition by column order by id) rn
from someTable
) t
where t.rn = 1

Related

How to identify valid records based on column values in snowflake

I have a table as below
I want output like below
This means I have few predefined pairs, example
if one employee is coming from both HR_INTERNAL and HR_EXTERNAL, take only that record which is from HR_INTERNAL
if one employee is coming from both SALES_INTERNAL and SALES_EXTERNAL, take only that record which is from SALES_INTERNAL
etc.
Is there a way to achieve this?
I used ROW_NUMBER to rank
ROW_NUMBER() OVER(PARTITION BY "EMPID" ORDER BY SOURCESYSTEM ASC) AS RANK_GID
I just put them on a table like this:
create or replace table predefined_pairs ( pairs ARRAY );
insert into predefined_pairs select [ 'HR_INTERNAL', 'HR_EXTERNAL' ] ;
insert into predefined_pairs select [ 'SALES_INTERNAL', 'SALES_EXTERNAL' ] ;
Then I use the following query to produce the output you wanted:
select s.sourcesystem, s.empid,
CASE WHEN COUNT(1) OVER(PARTITION BY EMPID) = 1 THEN 'ValidRecord'
WHEN p.pairs[0] IS NULL THEN 'ValidRecord'
WHEN p.pairs[0] = s.sourcesystem THEN 'ValidRecord'
ELSE 'InvalidRecord'
END RecordValidity
from source s
left join predefined_pairs p on array_contains( s.sourcesystem::VARIANT, p.pairs ) ;
+-------------------+--------+----------------+
| SOURCESYSTEM | EMPID | RECORDVALIDITY |
+-------------------+--------+----------------+
| HR_INTERNAL | EMP001 | ValidRecord |
| HR_EXTERNAL | EMP001 | InvalidRecord |
| SALES_INTERNAL | EMP002 | ValidRecord |
| SALES_EXTERNAL | EMP002 | InvalidRecord |
| HR_EXTERNAL | EMP004 | ValidRecord |
| SALES_INTERNAL | EMP005 | ValidRecord |
| PURCHASE_INTERNAL | EMP003 | ValidRecord |
+-------------------+--------+----------------+

SQL query to get value having multiple in the same table in SQL Server

Let's say I have a table with many columns like col1, col2, col3, id, variantId, col4, col5 etc
However I am only interested in id, variantId which look like this:
+----------+-----------+
| id | variantId |
+----------+-----------+
| a | 11 |
| a | 12 |
| b | 31 |
| c | 41 |
| c | 54 |
| d | abc |
| e | xyz |
| e | xyz |
+----------+-----------+
I need distinct ids which having count of distinct variantId more than once
In this case I would only get a and c
You can use group by and having:
select id
from t
group by id
having min(variant_id) <> max(variant_id);
You can also use:
having count(distinct variant_id) > 1
Try with group by having clause
select id
from table
group by id
having count(distinct variant_id) > 1
You can do it more efficiently with EXISTS:
select distinct t.id
from tablename t
where exists (
select 1 from tablename
where id = t.id and variantid <> t.variantid
)

Find records of nearest date SQL

I have a table dbo.X with DateTime column lastUpdated and a code product column CodeProd which may have hundreds of records, with CodeProd duplicated because the table is used as "stock history"
My Stored Procedure has parameter #Date, I want to get all CodeProd nearest to that date so for example if I have:
+----------+--------------+--------+
| CODEPROD | lastUpdated | STATUS |
+----------+--------------+--------+
| 10 | 2-1-2019 | C1 |
| 10 | 1-1-2019 | C2 |
| 10 | 31-12-2019 | C1 |
| 11 | 31-12-2018 | C1 |
| 11 | 30-12-2018 | C1 |
| 12 | 30-8-2018 | C3 |
+----------+--------------+--------+
and #Date= '1-1-2019'
I wanna get:
+----+--------------+------+
| 10 | 1-1-2019 | C2 |
| 11 | 31-12-2018 | C1 |
| 12 | 30-8-2018 | C3 |
+----+--------------+------+
How to find it?
You can use TOP(1) WITH TIES to get one row with nearest date for each CODEPROD which should be less than provided date.
Try like following code.
SELECT TOP(1) WITH TIES *
FROM [YourTableName]
WHERE lastupdated <= #date
ORDER BY Row_number()
OVER (
partition BY [CODEPROD]
ORDER BY lastupdated DESC);
You can use apply :
select distinct t.CODEPROD, t1.lastUpdated, t1.STATUS
from table t cross apply
( select top (1) t1.*
from table t1
where t1.CODEPROD = t.CODEPROD and t1.lastUpdated <= #date
order by t1.lastUpdated desc
) t1;

Where clause if there are multiple of the same ID

I have following table:
ID | source | Name | Age | ... | ...
1 | SQL | John | 18 | ... | ...
2 | SAP | Mike | 21 | ... | ...
2 | SQL | Mike | 20 | ... | ...
3 | SAP | Jill | 25 | ... | ...
I want to have one record for each ID. The idea behind this is that if the ID comes only once (no matter the Source), that record will be taken. But, If there are 2 records for one ID, the one containing SQL as source will be the used record here.
So, In this case, the result will be:
ID | source | Name | Age | ... | ...
1 | SQL | John | 18 | ... | ...
2 | SQL | Mike | 20 | ... | ...
3 | SAP | Jill | 25 | ... | ...
I did this with a partition over (ordered by Source desc), but that wouldn't work well if a third source will be added one day.
Any other options/ideas?
The easiest approach(in my opinion) is using a CTE with a ranking function:
with cte as
(
select ID, source, Name, Age, ... ,
rn = row_number() over (partition by ID order by case when source = 'sql'
then 0 else 1 end asc)
from dbo.tablename
)
select ID, source, Name, Age, ...
from cte
where rn = 1
You can use ROW_NUMBER:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER( PARTITION BY ID
ORDER BY CASE WHEN [Source] = 'SQL' THEN 1 ELSE 2 END)
FROM dbo.YourTable
)
SELECT *
FROM CTE
WHERE RN = 1;
You can use the WITH TIES clause and the window function Row_Number()
Select Top 1 With Ties *
From YourTable
Order By Row_Number() over (Partition By ID Order By Case When Source = 'SQL' Then 0 Else 1 End)
How about
SELECT *
FROM table
WHERE ID in (
SELECT ID FROM test
group by ID
having count(ID) = 1)
OR source = 'SQL'

Removing lines from a query

I have the following table:
OrderID | OldOrderID | Action | EntryDate | Source
1 | NULL | Insert | 2016-01-12| A
1 | NULL | Remove | 2016-01-13| A
2 | NULL | Insert | 2016-01-12| B
3 | NULL | Insert | 2016-01-12| C
4 | 3 | Insert | 2016-01-13| C
4 | NULL | Remove | 2016-01-14| C
I want to query all orders that are currently active orders - they dont have the action remove. Currently I do it with this query :
WITH Active AS
(
SELECT *, rn = ROW_NUMBER()
OVER (PARTITION BY OrderID,Source ORDER BY EntryDate DESC)
FROM Orders
)
SELECT *
FROM Active WHERE [Action] <> 'Remove' AND rn = 1;
The problem is that some orders get child orders (OrderID 3 gets a child OrderID 4) and if a child ever gets the Action Remove the query should also ignore the parent, but with the current query it dosent.
In short the current query gets me this result:
OrderID | OldOrderID | Action | EntryDate | Source
2 | NULL | Insert | 2016-01-12| B
3 | NULL | Insert | 2016-01-12| C
But I need this result:
OrderID | OldOrderID | Action | EntryDate | Source
2 | NULL | Insert | 2016-01-12| B
Is it possible to fix the query to get a result like this?
Try this:
;WITH CTE AS (
SELECT OrderID, OldOrderID, Action, EntryDate, Source,
COUNT(CASE WHEN Action = 'Remove' THEN 1 END)
OVER (PARTITION BY OrderID) AS IsRemoved,
ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY EntryDate) AS rn
FROM Orders
)
SELECT c1.*
FROM CTE AS c1
LEFT JOIN CTE AS c2 ON c1.OrderID = c2.OldOrderID AND c2.IsRemoved >= 1
WHERE c1.rn = 1 AND c1.IsRemoved = 0 AND c2.IsRemoved IS NULL
The above query uses COUNT() OVER() in order to count the number of occurrences of Action = 'Remove' within each OrderID partition. Hence, a value of IsRemoved that is equal to or greater than 1 identifies a 'removed' order.
I also asked the question on dba stackexchange and got the following answer, which works well.

Resources