mssql - retrieve unique values of a column based on another column

mssql - retrieve unique values of a column based on another column - sql-server

I have a table with two columns: ColumnA, ColumnB, with rows:
| A | 1 |
| B | 1 |
| B | 2 |
| C | 1 |
| C | 1 |
| C | 1 |
| A | 2 |
| B | 1 |
| A | 2 |
| A | 1 |
I would like to write a query that would return all unique values for ColumnB, for each unique value of ColumnA, where ColumnA has more than 1 value in ColumnB i.e.
| A | 1 |
| A | 2 |
| B | 1 |
| B | 2 |
C 1 should be omitted because there is only one distinct value for ColumnA = 'C'

There might be a simpler approach but this works:
SELECT t.ColumnA, t2.ColumnB
FROM ( select ColumnA
from dbo.TableName t
group by t.ColumnA
having count(distinct t.ColumnB) > 1) t
CROSS APPLY ( select distinct t2.ColumnB
from dbo.TableName t2
where t.ColumnA=t2.ColumnA ) t2
The first subquery returns all unique ColumnA values that have multiple (different) ColumnB values. The 2nd subquery returns all distinct ColumnB values of those ColumnA-values with CROSS APPLY.

SELECT DISTINCT * FROM x WHERE ColumnA IN(
SELECT xd.ColumnA
FROM (
SELECT DISTINCT ColumnA, ColumnB FROM x
) xd
GROUP BY xd.ColumnA HAVING COUNT(*) > 1
)
SELECT y.ColumnA, y.ColumnB
FROM (
SELECT ColumnA, ColumnB, COUNT(*) OVER (PARTITION BY ColumnA) m
FROM x
GROUP BY ColumnA, ColumnB
) y
WHERE m > 1

Related

SQL query to get value having multiple in the same table in SQL Server

Let's say I have a table with many columns like col1, col2, col3, id, variantId, col4, col5 etc
However I am only interested in id, variantId which look like this:
+----------+-----------+
| id | variantId |
+----------+-----------+
| a | 11 |
| a | 12 |
| b | 31 |
| c | 41 |
| c | 54 |
| d | abc |
| e | xyz |
| e | xyz |
+----------+-----------+
I need distinct ids which having count of distinct variantId more than once
In this case I would only get a and c

You can use group by and having:
select id
from t
group by id
having min(variant_id) <> max(variant_id);
You can also use:
having count(distinct variant_id) > 1

Try with group by having clause
select id
from table
group by id
having count(distinct variant_id) > 1

You can do it more efficiently with EXISTS:
select distinct t.id
from tablename t
where exists (
select 1 from tablename
where id = t.id and variantid <> t.variantid
)

Return column names based on which holds the maximum value in the record

I have a table with the following structure ...
+--------+------+------+------+------+------+
| ID | colA | colB | colC | colD | colE | [...] etc.
+--------+------+------+------+------+------+
| 100100 | 15 | 100 | 90 | 80 | 10 |
+--------+------+------+------+------+------+
| 100200 | 10 | 80 | 90 | 100 | 10 |
+--------+------+------+------+------+------+
| 100300 | 100 | 90 | 10 | 10 | 80 |
+--------+------+------+------+------+------+
I need to return a concatenated value of column names which hold the maximum 3 values per row ...
+--------+----------------------------------+
| ID | maxCols |
+--------+----------------------------------+
| 100100 | colB,colC,colD |
+--------+------+------+------+------+------+
| 100200 | colD,colC,colB |
+--------+------+------+------+------+------+
| 100300 | colA,colB,colE |
+--------+------+------+------+------+------+
It's okay to not concatenate the column names, and have maxCol1 | maxCol2 | maxCol3 if that's simpler
The order of the columns is important when concatenating them
The number of columns is limited and not dynamic
The number of rows is many

You could use UNPIVOT and get TOP 3 for each ID
;with temp AS
(
SELECT ID, ColValue, ColName
FROM #SampleData sd
UNPIVOT
(
ColValue For ColName in ([colA], [colB], [colC], [colD], [colE])
) unp
)
SELECT sd.ID, ca.ColMax
FROM #SampleData sd
CROSS APPLY
(
SELECT STUFF(
(
SELECT TOP 3 WITH TIES
',' + t.ColName
FROM temp t
WHERE t.ID = sd.ID
ORDER BY t.ColValue DESC
FOR XML PATH('')
)
,1,1,'') AS ColMax
) ca
See demo here: http://rextester.com/CZCPU51785

Here is one trick to do it using Cross Apply and Table Valued Constructor
SELECT Id,
maxCols= Stuff(cs.maxCols, 1, 1, '')
FROM Yourtable
CROSS apply(SELECT(SELECT TOP 3 ',' + NAME
FROM (VALUES (colA,'colA'),(colB,'colB'),(colC,'colC'),
(colD,'colD'),(colE,'colE')) tc (val, NAME)
ORDER BY val DESC
FOR xml path, type).value('.[1]', 'nvarchar(max)')) cs (maxCols)
If needed it can be made dynamic using Information_schema.Columns

Update several columns with latest values from another table

Here's the data:
[ TABLE_1 ]
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|--------|--------|--------|--------|-------|
1 | null | null | null | null | null | null |
2 | null | null | null | null | null | null |
3 | null | null | null | null | null | null |
[ TABLE_2 ]
id | date | product |
-----|-------------|-----------|
1 | 20140101 | X |
1 | 20140102 | Y |
1 | 20140103 | Z |
2 | 20141201 | data |
2 | 20141201 | Y |
2 | 20141201 | Z |
3 | 20150101 | data2 |
3 | 20150101 | data3 |
3 | 20160101 | X |
Both tables have other columns not listed here.
date is formatted: yyyymmdd and datatype is int.
[ TABLE_2 ] doesn't have empty rows, just tried to make sample above more readable.
Here's the Goal:
I need to update [ TABLE_1 ] prod1,date1,prod2,date2,prod3,date3
with product collected from [ TABLE_2 ] with corresponding date values.
Data must be sorted so that "latest" product becomes prod1,
2nd latest product will be prod2 and 3rd is prod3.
Latest product = biggest date (int).
If dates are equal, order doesn't matter. (see id=2 and id=3).
Updated [ TABLE_1 ] should be:
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|----------|--------|----------|--------|----------|
1 | Z | 20140103 | Y | 20140102 | X | 20140101 |
2 | data | 20141201 | Y | 20141201 | Z | 20141201 |
3 | X | 20160101 | data2 | 20150101 | data3 | 20150101 |
Ultimate goal is to get the following :
[ TABLE_3 ]
id | order1 | order2 | order3 | + Columns from [ TABLE_1 ]
---|--------------------|----------------------|------------|--------------------------
1 | 20140103:Z | 20140102:Y | 20140103:Z |
2 | 20141201:data:Y:Z | NULL | NULL |
3 | 20160101:X | 20150101:data2:data3 | NULL |
I have to admit this exceeds my knowledge and I haven't tried anything.
Should I do it with JOIN or SELECT subquery?
Should I try to make it in one SQL -clause or perhaps in 3 steps,
each prod&date -pair at the time ?
What about creating [ TABLE_3 ] ?
It has to have columns from [ TABLE_1 ].
Is it easiest to create it from [ TABLE_2 ] -data or Updated [ TABLE_1 ] ?
Any help would be highly appreciated.
Thanks in advance.
I'll post some of my own shots on comments.

After looking into it (after my comment), a stored procedure would be best, that you can call to view the data as a pivot, and do away with TABLE_1. Obviously if you need to make this dynamic, you'll need to look into dynamic pivots, it's a bit of a hack with CTEs:
CREATE PROCEDURE DBO.VIEW_AS_PIVOTED_DATA
AS
;WITH CTE AS (
SELECT ID, [DATE], 'DATE' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE2 AS (
SELECT ID, PRODUCT, 'PROD' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR) AS [RN]
FROM TABLE_2)
, CTE3 AS (
SELECT ID, [DATE1], [DATE2], [DATE3]
FROM CTE
PIVOT(MAX([DATE]) FOR RN IN ([DATE1],[DATE2],[DATE3])) PIV)
, CTE4 AS (
SELECT ID, [PROD1], [PROD2], [PROD3]
FROM CTE2
PIVOT(MAX(PRODUCT) FOR RN IN ([PROD1],[PROD2],[PROD3])) PIV)
SELECT A.ID, [PROD1], [DATE1], [PROD2], [DATE2], [PROD3], [DATE3]
FROM CTE3 AS A
JOIN CTE4 AS B
ON A.ID=B.ID

Construction:
WITH ranked AS (
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
)
SELECT id, [prod1],[date1],[prod2],[date2],[prod3],[date3]
FROM
(
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
) unpivoted
PIVOT
(
max(value)
for col IN ([prod1],[date1],[prod2],[date2],[prod3],[date3])
) pivoted
You need to take a few steps to achive the aim.
Rank your products by date:
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
Unpivot your date and product columns into one column. You can use UNPIVOT OR CROSS APPLY statements. I prefer CROSS APPLY
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
or the same result using UNPIVOT
SELECT id, type+cast(rn as varchar(1)) col, value
FROM (
SELECT [id],
rn,
CAST([date] AS varchar(500)) date,
CAST([product] AS varchar(500)) prod
FROM ranked) t
UNPIVOT
(
value FOR type IN (date, product)
) unpvt
and at last you use PIVOTE and get a result.

SUM On Column With Group By SQL

I have following data:
+----------------+--------------+-----+
| StgDescription | ID | Amt |
+----------------+--------------+-----+
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| B | ZA47/ A | 12 |
| B | ZA47/ A | 12 |
| B | ZA47/ B | 10 |
| B | ZA47/ B | 10 |
| B | ZA48/ A | 14 |
| B | ZA48/ F | 10 |
| B | ZA48 /G | 13 |
| B | ZA48 /H | 10 |
| B | ZA48/ I | 15 |
| B | ZA48/ J | 10 |
| B | ZA48/ K | 16 |
| B | ZA48/ L | 10 |
| c | FA01LM100340 | 10 |
| c | PA53 AE | 10 |
+----------------+--------------+-----+
I want to generate report in following format. The amount should be sum for ID for same StgDescription.
+----------------+-----+
| StgDescription | Amt |
+----------------+-----+
| a | 11 |
| b | 120 |
| c | 20 |
+----------------+-----+
I've written following query to get this result:
WITH CTE AS(
SELECT
distinct
s.StgDescription
,p.ID
,Amt
FROM [DinDb].[dbo].[tblTvlTransaction] t
JOIN tblstgmaster s on t.StgId=s.StgId
JOIN tblProjDocSt p on t.TDocID=p.DocId
JOIN [PdasDb].[dbo].[tblIDmaster] f ON p.ID=f.ID
where OptAuthoDateTime between '2015-07-27 00:00:00' and '2015-09-01 00:00:00')
select StgDescription,sum(AMT) from cte group by StgDescription
Is there any other efficient alternative to do this?

First in cte remove duplicates, then GROUP BY like:
WITH cte AS (
SELECT DISTINCT StgDescription, ID, Amt
FROM your_tab
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;
OR:
WITH cte AS (
SELECT StgDescription, ID, Amt
FROM your_tab
GROUP BY StgDescription, ID, Amt
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;

I hope that you get the data from a query, not from a table. It would not be good to store data thus redundantly. And it would not be gould to name a column ID which is not the unique identifier for a row in a table.
Your problem with the data is that you have duplicates, which prevents you from getting the sum directly. So use DISTINCT to make your data unique first.
If this data is from a query then simply add DISTINCT after the SELECT keyword. If not, use a derived table (i.e. a subquery) where you select distinct records from the table.
select stgdescription, sum(amt)
from
(
select distinct stgdescription, id, amt
from mydata
) distinct_data
group by stgdescription;
You may want to replace stgdescription with lower(stgdescription), though, if stgdescription can be 'A' or 'a' and you want to treat them the same.

I'd keep it as simple as possible, like this:
select StgDescription, sum(Amt) from
(
select distinct StgDescription, ID, Amt from tablename
) a
group by StgDescription
Hope it helps!

I suspect your duplicates are coming from [tblTvlTransaction], therefore, I would remove this table as a JOIN and use EXISTS to just check a record is there. So essentially the only tables in the FROM clause are those you actually need data from:
SELECT s.StgDescription, p.ID, s.Amt
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
);
The advantage of EXISTS is that it can use a semi-join, which essentially means rather than pulling back all the rows from the transaction table, it will stop the seek/scan as soon as it finds one matching record. This should leave you without duplicates so you can do the SUM directly:
SELECT s.StgDescription, Amount = SUM(s.Amt)
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
)
GROUP BY s.StgDescription;

Sort String Containing number

Current OutPut:
PM_DIG_OUTPUT_1_CLOSED
PM_DIG_OUTPUT_10_CLOSED
PM_DIG_OUTPUT_14_CLOSED
PM_DIG_OUTPUT_15_CLOSED
PM_DIG_OUTPUT_16_CLOSED
PM_DIG_OUTPUT_2_CLOSED
PM_DIG_OUTPUT_3_CLOSED
Expected Output:
PM_DIG_OUTPUT_1_CLOSED
PM_DIG_OUTPUT_2_CLOSED
PM_DIG_OUTPUT_3_CLOSED
PM_DIG_OUTPUT_10_CLOSED
PM_DIG_OUTPUT_14_CLOSED
PM_DIG_OUTPUT_15_CLOSED
PM_DIG_OUTPUT_16_CLOSED
Index of Number is not fixed
What is the best way to achieve this order?
EDIT:
Some records also contain following data
PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO1
PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO2
PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO3
PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO4

Query:
SQLFIDDLEExample
SELECT *
FROM Table1 t1
ORDER BY CAST(REPLACE(REPLACE(col, 'PM_DIG_OUTPUT_', ''),'_CLOSED', '') AS int)
Result:
| COL |
|-------------------------|
| PM_DIG_OUTPUT_1_CLOSED |
| PM_DIG_OUTPUT_2_CLOSED |
| PM_DIG_OUTPUT_3_CLOSED |
| PM_DIG_OUTPUT_10_CLOSED |
| PM_DIG_OUTPUT_14_CLOSED |
| PM_DIG_OUTPUT_15_CLOSED |
| PM_DIG_OUTPUT_16_CLOSED |
EDITED ANSWER
You could use query:
SQLFIDDLEExample
SELECT *
FROM Table1 t1
ORDER BY CASE WHEN LEFT(col, 2) = 'PM'
THEN CAST(REPLACE(REPLACE(col, 'PM_DIG_OUTPUT_', ''),'_CLOSED', '') AS int)
ELSE RIGHT(col,1) END
Result:
| COL |
|-----------------------------------------------|
| PM_DIG_OUTPUT_1_CLOSED |
| PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO1 |
| PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO2 |
| PM_DIG_OUTPUT_2_CLOSED |
| PM_DIG_OUTPUT_3_CLOSED |
| PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO3 |
| PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO4 |
| PM_DIG_OUTPUT_10_CLOSED |
| PM_DIG_OUTPUT_14_CLOSED |
| PM_DIG_OUTPUT_15_CLOSED |
| PM_DIG_OUTPUT_16_CLOSED |

Assuming that after the number there's only one underscore and some text (then no matter what is before the number, if it's at least one underscore and some text).
Edit for new values. It will sort by the numbers if it can find them and won't break if no numbers were found by matching the pattern:
select
*,
substring(y,len(y)-charindex('_',reverse(y))+2,100) as num
from (
select
*,
substring(x,1,len(x)-charindex('_',reverse(x))) as y
from (
select 'PM_DIG_OUTPUT_1_CLOSED' as x union all
select 'PM_DIG_OUTPUT_10_CLOSED' union all
select 'PM_DIG_OUTPUT_14_CLOSED' union all
select 'PM_DIG_OUTPUT_15_CLOSED' union all
select 'PM_DIG_OUTPUT_16_CLOSED' union all
select 'PM_DIG_OUTPUT_2_CLOSED' union all
select 'PM_DIG_OUTPUT_3_CLOSED' union all
select 'PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO1' union all
select 'PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO2' union all
select 'PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO3' union all
select 'PRM_CODE_MIO_DIGITALOUT_WRITE_LOGIC_CARD2_DO4'
) x
) y
order by
case when isnumeric(substring(y,len(y)-charindex('_',reverse(y))+2,100))=1 then cast(substring(y,len(y)-charindex('_',reverse(y))+2,100) as int) end

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

mssql - retrieve unique values of a column based on another column - sql-server

SELECT DISTINCT * FROM x WHERE ColumnA IN( SELECT xd.ColumnA FROM ( SELECT DISTINCT ColumnA, ColumnB FROM x ) xd GROUP BY xd.ColumnA HAVING COUNT() > 1 ) SELECT y.ColumnA, y.ColumnB FROM ( SELECT ColumnA, ColumnB, COUNT() OVER (PARTITION BY ColumnA) m FROM x GROUP BY ColumnA, ColumnB ) y WHERE m > 1

Related

SQL query to get value having multiple in the same table in SQL Server

Return column names based on which holds the maximum value in the record

Update several columns with latest values from another table

SUM On Column With Group By SQL

Sort String Containing number

Categories

Resources