SQL: selecting one value from several identical ones - sql-server

I have the result of a query that collects data from several tables. It looks like this:
|Name|date      |number|Id|
|alex|01-01-2021|1111  |1 |
|mike|01-01-2021|2222  |2 |
|alex|02-01-2021|1111  |3 |
|alex|03-01-2021|1111  |4 |
|john|04-01-2021|3333  |5 |
I need to get the following result:
|Name|date      |number|Id       |
|mike|01-01-2021|2222  |2        |
|alex|any value |1111  |any value|
|john|04-01-2021|3333  |5        |
I need to select one of the repeated rows and show it. I have a large query with many columns; here I give only a short version to explain the essence of the problem.

select Name,max(date) as date,number
from atable
group by Name, number

You can use this CTE and control which date (first or last) you get by switching the ORDER BY between ascending and descending:
WITH data AS (
    SELECT
        Name,
        date,
        number,
        row_number() OVER (PARTITION BY Name ORDER BY date) AS row_num
    FROM test01
)
SELECT
    Name,
    date,
    number
FROM data
WHERE row_num = 1
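
To try the CTE above without a SQL Server instance, here is a minimal sketch that runs the same ROW_NUMBER() pattern against an in-memory SQLite database from Python. SQLite is only a stand-in (it needs version 3.25 or later for window functions), the helper name one_row_per_name is made up for illustration, and the dates are rewritten as ISO strings under the assumption that the question's dates are day-first.

```python
import sqlite3

# SQLite stands in for SQL Server here so the sketch runs anywhere;
# table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test01 (Name TEXT, date TEXT, number INTEGER, Id INTEGER)")
conn.executemany(
    "INSERT INTO test01 VALUES (?, ?, ?, ?)",
    [
        ("alex", "2021-01-01", 1111, 1),
        ("mike", "2021-01-01", 2222, 2),
        ("alex", "2021-01-02", 1111, 3),
        ("alex", "2021-01-03", 1111, 4),
        ("john", "2021-01-04", 3333, 5),
    ],
)


def one_row_per_name(direction):
    """Return one row per Name; 'ASC' keeps the first date, 'DESC' the last."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    sql = f"""
        WITH data AS (
            SELECT Name, date, number, Id,
                   ROW_NUMBER() OVER (
                       PARTITION BY Name ORDER BY date {direction}
                   ) AS row_num
            FROM test01
        )
        SELECT Name, date, number, Id
        FROM data
        WHERE row_num = 1
        ORDER BY Id
    """
    return conn.execute(sql).fetchall()


first_rows = one_row_per_name("ASC")   # alex is represented by Id 1
last_rows = one_row_per_name("DESC")   # alex is represented by Id 4
```

Unlike the GROUP BY version, this keeps the Id that belongs to the chosen row, which matters when the real query has many more columns.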

Related

PostgreSQL - Filtering result set by array column

I have a function which returns a table. One of the columns happens to be a text array. Currently the values in this array column will only ever have at most 2 elements, however there is the instance when the same row will be returned twice with the duplicate row elements in the opposite order. I'm hoping to find a way to only return 1 of these rows and discard the other. To give an example, I run a function as
SELECT * FROM schema.function($1,$2,$3)
WHERE conditions...;
which returns me something like this
ID | ARRAY_COL | ...
1 | {'Good','Day'} | ...
2 | {'Day','Good'} | ...
3 | {'Stuck'} | ...
4 | {'with'} | ...
5 | {'array'} | ...
6 | {'filtering'} | ...
So in this example, I want to return the whole result set with the exception that I only want either row 1 or 2 as they have the same elements in the array (albeit inverted with respect to each other). I'm aware this is probably a bit of a messy problem, but it's something I need to get to the bottom of. Ideally I would like to stick a WHERE clause at the end of my function call which forced the result set to ignore any array value that had the same elements as a previous row. Pseudo code might be something like
SELECT * FROM schema.function($1,$2,$3)
WHERE NOT array_col #> (any previous array_col value);
Any pointers in the right direction would be much appreciated, thanks.

I'm not sure this is the best solution, but it should work, especially in cases where you might have partially overlapping arrays.
The solution has 3 steps:
1. Unnest the array_col column and order by id and the item value:
select
    id,
    unnest(array_col) array_col_val
from dataset
order by
    id,
    array_col_val
2. Regroup by id; the rows with id 1 and 2 now have the same array_col value:
select id,
    array_agg(array_col_val) array_col
from ordering
group by id
3. Select the min id, grouping by array_col:
select array_col, min(id) from regrouping group by array_col
Full statement:
with dataset as (
    select 1 id, ARRAY['Good','Day'] array_col UNION ALL
    select 2 id, ARRAY['Day','Good'] array_col UNION ALL
    select 3 id, ARRAY['Stuck'] array_col UNION ALL
    select 4 id, ARRAY['with'] array_col UNION ALL
    select 5 id, ARRAY['array_colay'] array_col UNION ALL
    select 6 id, ARRAY['filtering'] array_col
)
, ordering as (
    select id,
        unnest(array_col) array_col_val
    from dataset
    order by id,
        array_col_val
)
, regrouping as (
    select id,
        array_agg(array_col_val) array_col
    from ordering
    group by id
)
select array_col, min(id) from regrouping group by array_col;
Result
array_col | min
---------------+-----
{Stuck} | 3
{array_colay} | 5
{filtering} | 6
{with} | 4
{Day,Good} | 1
(5 rows)
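
The idea behind the three steps (sort the elements so that order no longer matters, then keep the lowest id per distinct array) can also be sketched outside the database. The following is only an illustration in plain Python, not a replacement for the query; the function name dedupe_by_elements is hypothetical, and the sample rows are the ones from the question.

```python
def dedupe_by_elements(rows):
    """Keep the lowest id for each distinct set of array elements.

    rows: iterable of (id, list_of_strings) pairs.
    """
    lowest = {}
    for row_id, items in rows:
        # Canonical form: once sorted, element order no longer matters.
        key = tuple(sorted(items))
        if key not in lowest or row_id < lowest[key]:
            lowest[key] = row_id
    # Return (id, sorted elements) pairs ordered by id.
    return sorted((row_id, list(key)) for key, row_id in lowest.items())


rows = [
    (1, ["Good", "Day"]),
    (2, ["Day", "Good"]),
    (3, ["Stuck"]),
    (4, ["with"]),
    (5, ["array"]),
    (6, ["filtering"]),
]
kept = dedupe_by_elements(rows)  # row 2 is dropped in favour of row 1
```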

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this
ID | City | Sales
1 | London,New York,Paris,Berlin,Madrid| 20,30,,50
2 | Istanbul,Tokyo,Brussels | 4,5,6
There can be an unlimited amount of cities and/or sales.
I need to get each city and its sales amount as its own record. So my result should look something like this:
ID | City | Sales
1 | London | 20
1 | New York | 30
1 | Paris |
1 | Berlin | 50
1 | Madrid |
2 | Istanbul | 4
2 | Tokyo | 5
2 | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APPLY STRING_SPLIT(Table.City,',') splitC
CROSS APPLY STRING_SPLIT(Table.Sales,',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not necessary; it's just how I started on it.

STRING_SPLIT() is not an option, because (as mentioned in the documentation) the output rows might be in any order, and the order is not guaranteed to match the order of the substrings in the input string.
But you may try with a JSON-based approach, using OPENJSON() and string transformation (comma-separated values are transformed into a valid JSON array - London,New York,Paris,Berlin,Madrid into ["London","New York","Paris","Berlin","Madrid"]). The result from the OPENJSON() with default schema is a table with columns key, value and type and the key column is the 0-based index of each item in this array:
Table:
CREATE TABLE Data (
    ID int,
    City varchar(1000),
    Sales varchar(1000)
)
INSERT INTO Data
    (ID, City, Sales)
VALUES
    (1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
    (2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
    SELECT c.[value] AS City, s.[value] AS Sales
    FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
    LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
        ON c.[key] = s.[key]
) a
Result:
ID City Sales
1 London 20
1 New York 30
1 Paris
1 Berlin 50
1 Madrid NULL
2 Istanbul 4
2 Tokyo 5
2 Brussels 6
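
The key point of the statement above is that items are matched by position (the key column), not by a cross product. As a rough illustration of that pairing, here is a small Python sketch; the function name split_pairs is made up, and zip_longest is used as the closest standard-library analogue of the join on the 0-based index.

```python
from itertools import zip_longest


def split_pairs(row_id, cities, sales):
    """Pair the n-th city with the n-th sales value of one row.

    A missing trailing sales value becomes None (NULL in the SQL result);
    an empty item such as the one in '20,30,,50' stays an empty string.
    """
    city_items = cities.split(",")
    sales_items = sales.split(",")
    return [(row_id, c, s) for c, s in zip_longest(city_items, sales_items)]


pairs = split_pairs(1, "London,New York,Paris,Berlin,Madrid", "20,30,,50")
```

One difference worth noting: zip_longest also pads the city side with None when there are more sales than cities, whereas the LEFT OUTER JOIN in the statement drops such extra sales values.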

STRING_SPLIT has no concept of ordinal positions. In fact, the documentation specifically states that it doesn't care about them:
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
    SELECT ID,
           DSc.Item,
           DSc.ItemNumber
    FROM dbo.YourTable YT
         CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc),
Sales AS(
    SELECT ID,
           DSs.Item,
           DSs.ItemNumber
    FROM dbo.YourTable YT
         CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
       C.Item AS City,
       S.Item AS Sale
FROM Cities C
     FULL OUTER JOIN Sales S ON C.ID = S.ID
                            AND C.ItemNumber = S.ItemNumber;
Of course, the real solution is to fix your design. This type of design is only going to cause you hundreds of problems in the future. Fix it now, not later; the earlier you do it, the sooner you'll reap the rewards.

SQL: doing several joins to evaluate different columns

I'm quite new to SQL but now use it a lot in my work (Microsoft SQL Server).
So the issue is this: I collect data that is atypical for a certain column.
Let's say I got different Burgers and they should have a standardized calories value. So I did this with a query
------------------------------------------
| Burger | calories | numBurgers | Rank |
------------------------------------------
| Chicken| 600 | 20 | 1 |
| Chicken| 400 | 3 | 2 |
| Beef | 700 | 35 | 1 |
| Beef | 850 | 4 | 2 |
-------------------------------------------
To get a list of all the "wrong" burgers I use a CTE and filter out the rows with Rank = 1
USE database;
GO
WITH GapRanking AS
(
    SELECT Burger, calories, COUNT(calories) AS numBurgers,
        ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
    FROM BaseTable
    GROUP BY Burger, calories
)
SELECT * FROM GapRanking
WHERE Rank <> 1
...
I get all combinations of Burgers and calories that are not "standard"
Then I do an Inner Join with the original table and all columns on the one above.
SELECT * FROM BaseTable AS base
INNER JOIN
    (SELECT * FROM GapRanking
     WHERE Rank <> 1) AS err
ON (base.Burger = err.Burger
    AND base.calories = err.calories)
This way I get a table with complete information about the "not-standard" burgers. So far so good.
Now I want to add other rows where there is a deviation in another criterion (price, for example, not just calories) and add them to the list if they are not already there.
So I thought of UNION or JOIN.
So what is the best approach? UNION the above query with the same query on a different column (price instead of calories)?
Or do a JOIN with the same query on a different column (price instead of calories)?
The code gets quite "ugly" and I'm not sure I'm taking the right approach here.
Also, because I'm using a CTE (WITH), a UNION does not seem to be so easily possible.
I'm really glad for any ideas here. Cheers

Use subqueries and a join. The following is just pseudocode, not a working query, but you can follow this pattern:
select t1.*, t2.required_column
from
    (SELECT Burger, calories, COUNT(calories) AS numBurgers,
        ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
     FROM BaseTable
     GROUP BY Burger, calories
    ) as t1
join
    (SELECT Burger, calories, COUNT(calories) AS numBurgers,
        ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
     FROM BaseTable
     GROUP BY Burger, calories
    ) as t2
on t1.colname = t2.colname
where t1.Rank != 1 and t2.Rank != 1
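
On the worry that WITH and UNION don't mix: several CTEs can be chained in one WITH clause, and the final SELECT can then be a UNION that reads from each of them. Below is a minimal sketch of that shape, run against in-memory SQLite from Python as a stand-in for SQL Server. The Id and price columns, the sample rows, and the CTE names CalorieRanking and PriceRanking are assumptions for illustration; only BaseTable, Burger, and calories come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BaseTable (Id INTEGER, Burger TEXT, calories INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO BaseTable VALUES (?, ?, ?, ?)",
    [
        (1, "Chicken", 600, 5),
        (2, "Chicken", 600, 5),
        (3, "Chicken", 400, 5),  # unusual calories
        (4, "Beef", 700, 6),
        (5, "Beef", 700, 6),
        (6, "Beef", 700, 8),     # unusual price
        (7, "Beef", 850, 6),     # unusual calories
        (8, "Beef", 850, 8),     # unusual calories and price
    ],
)

sql = """
WITH CalorieRanking AS (
    -- rank each calories value per burger by how often it occurs
    SELECT Burger, calories,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY n DESC) AS rnk
    FROM (SELECT Burger, calories, COUNT(*) AS n
          FROM BaseTable GROUP BY Burger, calories) AS counts
),
PriceRanking AS (
    -- same ranking, but for the price column
    SELECT Burger, price,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY n DESC) AS rnk
    FROM (SELECT Burger, price, COUNT(*) AS n
          FROM BaseTable GROUP BY Burger, price) AS counts
)
SELECT base.Id, base.Burger, base.calories, base.price
FROM BaseTable AS base
JOIN CalorieRanking AS c
  ON base.Burger = c.Burger AND base.calories = c.calories
WHERE c.rnk <> 1
UNION  -- UNION (not UNION ALL) lists a row once even if it deviates twice
SELECT base.Id, base.Burger, base.calories, base.price
FROM BaseTable AS base
JOIN PriceRanking AS p
  ON base.Burger = p.Burger AND base.price = p.price
WHERE p.rnk <> 1
ORDER BY 1
"""
deviating_ids = [row[0] for row in conn.execute(sql)]
```

A UNION fits the "add it to the list if it's not already there" requirement, whereas joining the two deviation sets would keep only the rows that deviate in both columns.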

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY, because the revised date could be greater than or less than the original date, so the aggregate functions MAX and MIN don't work. I thought about using CASE somehow, but I also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and displays one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!

You can use ROW_NUMBER() for this.
WITH T AS
(
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
    FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
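
The query above works because 'revised' sorts after 'original' alphabetically. If you would rather not depend on that, one option is to spell the preference out with a CASE expression in the ORDER BY. A minimal sketch, using in-memory SQLite from Python as a stand-in for SQL Server and the sample rows from the question (dates written as ISO strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, type TEXT, duedate TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, "original", "2017-01-01"),
        (1, "revised", "2017-02-01"),
        (2, "original", "2017-03-01"),
        (3, "original", "2017-10-01"),
        (3, "revised", "2017-09-01"),
    ],
)

sql = """
WITH T AS (
    SELECT id, type, duedate,
           ROW_NUMBER() OVER (
               PARTITION BY id
               -- explicit preference: 'revised' first, anything else after
               ORDER BY CASE type WHEN 'revised' THEN 0 ELSE 1 END
           ) AS rn
    FROM YourTable
)
SELECT id, type, duedate
FROM T
WHERE rn = 1
ORDER BY id
"""
preferred = conn.execute(sql).fetchall()
```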

Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
   OR 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)

You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Multiple Column Duplicate Criteria

I am using SQL Server. This is my sample data set:
IDNO| Consigment | SO_Number | Acc Number | OfficeNumber|PL9 |Remarks
--- | -----------| ----------| -----------| ------------|-------|-------
1 | AA12345MY | 1024450191| 8800400431 |B213 |W449401|Stay
2 | AA12345MY | 1024450192| 8800400431 |B213 |W449401|Remove
3 | BA12345MY | 1024460121| 8800400726 |K678 |W229790|Stay
4 | BA12345MY | 1024460124| 8800400726 |K678 |W229790|Remove
I want to put a remark on rows 2 and 4, as they are duplicates.
Duplicate criteria must match these 4 columns:
Consigment
Acc Number
OfficeNumber
PL9
I want to remove the row with the newest (latest) SO number.
I haven't got a clue how to start, as I have never found a good reference.
Regards,
Fadlisham Fadzil

One approach here is to create a CTE which labels duplicate records and then delete from that CTE:
WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY Consigment, [Acc Number], OfficeNumber, PL9
                           ORDER BY SO_Number) rn
    FROM yourTable
)
DELETE FROM cte
WHERE rn > 1;
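
If you want to fill in the Remarks column rather than delete rows, the same ROW_NUMBER() labelling can drive an UPDATE instead. Here is a minimal sketch against in-memory SQLite from Python; SQLite is only a stand-in for SQL Server (it cannot modify rows through a CTE, so the ranking sits in a subquery), and starting every row at 'Stay' is an assumption about the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE yourTable (
        IDNO INTEGER, Consigment TEXT, SO_Number TEXT, "Acc Number" TEXT,
        OfficeNumber TEXT, PL9 TEXT, Remarks TEXT DEFAULT 'Stay'
    )
    """
)
conn.executemany(
    """
    INSERT INTO yourTable
        (IDNO, Consigment, SO_Number, "Acc Number", OfficeNumber, PL9)
    VALUES (?, ?, ?, ?, ?, ?)
    """,
    [
        (1, "AA12345MY", "1024450191", "8800400431", "B213", "W449401"),
        (2, "AA12345MY", "1024450192", "8800400431", "B213", "W449401"),
        (3, "BA12345MY", "1024460121", "8800400726", "K678", "W229790"),
        (4, "BA12345MY", "1024460124", "8800400726", "K678", "W229790"),
    ],
)

# Within each group of matching rows the lowest SO_Number gets rn = 1;
# every later row is marked as a duplicate.
conn.execute(
    """
    UPDATE yourTable
    SET Remarks = 'Remove'
    WHERE IDNO IN (
        SELECT IDNO
        FROM (
            SELECT IDNO,
                   ROW_NUMBER() OVER (
                       PARTITION BY Consigment, "Acc Number", OfficeNumber, PL9
                       ORDER BY SO_Number
                   ) AS rn
            FROM yourTable
        ) AS ranked
        WHERE rn > 1
    )
    """
)
remarks = conn.execute("SELECT IDNO, Remarks FROM yourTable ORDER BY IDNO").fetchall()
```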
