I have a table Players which has two columns: Name and Sport_Played.
Sample data would be like:
Name    | Sport_Played
--------|-------------
Ravi    | Cricket
Raju    | Cricket
Ronaldo | Football
Messi   | Football
Anand   | Chess
I want to pivot the table so that each sport becomes a column, and each column contains the names of the players of that sport, sorted in ascending order:
Cricket | Football | Chess
--------|----------|------
Raju    | Messi    | Anand
Ravi    | Ronaldo  | Null
The problem is that PIVOT requires an aggregate function. Which aggregate function should I use to display the names of the players in each sport's column? Thanks.
Without an example of your desired output it is difficult to know exactly what you intend, but given:
having columns as sport played and the columns should contain the names of players sorted ascendingly
You do not need to use PIVOT; you can use LISTAGG:
Oracle 11g R2 Schema Setup:
CREATE TABLE players ( Name, Sport_played ) AS
SELECT 'Ravi', 'Cricket' FROM DUAL UNION ALL
SELECT 'Raju', 'Cricket' FROM DUAL UNION ALL
SELECT 'Ronaldo', 'Football' FROM DUAL UNION ALL
SELECT 'Messi', 'Football' FROM DUAL UNION ALL
SELECT 'Anand', 'Chess' FROM DUAL;
Query 1:
SELECT   sport_played,
         LISTAGG( name, ',' ) WITHIN GROUP ( ORDER BY name ) AS names
FROM     players
GROUP BY sport_played
Results:
| SPORT_PLAYED | NAMES |
|--------------|---------------|
| Chess | Anand |
| Cricket | Raju,Ravi |
| Football | Messi,Ronaldo |
Update:
Oracle 11g R2 Schema Setup:
CREATE TABLE players ( Name, Sport_played ) AS
SELECT 'Ravi', 'Cricket' FROM DUAL UNION ALL
SELECT 'Raju', 'Cricket' FROM DUAL UNION ALL
SELECT 'Ronaldo', 'Football' FROM DUAL UNION ALL
SELECT 'Messi', 'Football' FROM DUAL UNION ALL
SELECT 'Anand', 'Chess' FROM DUAL;
Query 1:
SELECT *
FROM   ( SELECT p.*,
                ROW_NUMBER() OVER ( PARTITION BY Sport_played
                                    ORDER BY name ) AS rn
         FROM   players p )
PIVOT (
  MAX( Name )
  FOR Sport_Played IN (
    'Cricket'  AS Cricket,
    'Football' AS Football,
    'Chess'    AS Chess
  )
)
Results:
| RN | CRICKET | FOOTBALL | CHESS |
|----|---------|----------|--------|
| 1 | Raju | Messi | Anand |
| 2 | Ravi | Ronaldo | (null) |
You can use any (string) aggregation function in the PIVOT, including MAX(name), MIN(name) or even LISTAGG( name, ',' ) WITHIN GROUP ( ORDER BY Name ). The ROW_NUMBER() analytic function generates a unique number per sport, so the aggregation function will only ever operate on a single value; it therefore does not matter which aggregation function is used.
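For example, this minimal variant of the pivot above (only the aggregate changes) produces identical output, since each (rn, sport) cell holds exactly one name:

SELECT *
FROM   ( SELECT p.*,
                ROW_NUMBER() OVER ( PARTITION BY Sport_played
                                    ORDER BY name ) AS rn
         FROM   players p )
PIVOT (
  MIN( Name ) -- MIN instead of MAX; each group contains a single row, so the result is unchanged
  FOR Sport_Played IN ( 'Cricket' AS Cricket, 'Football' AS Football, 'Chess' AS Chess )
)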
Related
I've started from a table like this
ID | City                                | Sales
---|-------------------------------------|----------
1  | London,New York,Paris,Berlin,Madrid | 20,30,,50
2  | Istanbul,Tokyo,Brussels             | 4,5,6
There can be an unlimited number of cities and/or sales.
I need to get each city and its sales amount into its own record. So my result should look something like this:
ID | City     | Sales
---|----------|------
1  | London   | 20
1  | New York | 30
1  | Paris    |
1  | Berlin   | 50
1  | Madrid   |
2  | Istanbul | 4
2  | Tokyo    | 5
2  | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APPLY STRING_SPLIT(Table.City,',') splitC
CROSS APPLY STRING_SPLIT(Table.Sales,',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not necessary; it's just how I started on it.
STRING_SPLIT() is not an option, because (as mentioned in the documentation) the output rows may be in any order, and the order is not guaranteed to match the order of the substrings in the input string.
But you may try a JSON-based approach, using OPENJSON() and a string transformation (the comma-separated values are turned into a valid JSON array - London,New York,Paris,Berlin,Madrid becomes ["London","New York","Paris","Berlin","Madrid"]). The result of OPENJSON() with the default schema is a table with columns key, value and type, where the key column holds the 0-based index of each item in the array:
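To see what that looks like in isolation, here is a minimal sketch of OPENJSON() with the default schema (the literal array is only for illustration):

SELECT [key], [value], [type]
FROM OPENJSON('["London","New York","Paris"]');
-- key | value    | type
-- 0   | London   | 1    (type 1 = string)
-- 1   | New York | 1
-- 2   | Paris    | 1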
Table:
CREATE TABLE Data (
ID int,
City varchar(1000),
Sales varchar(1000)
)
INSERT INTO Data
(ID, City, Sales)
VALUES
(1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
(2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
SELECT c.[value] AS City, s.[value] AS Sales
FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
ON c.[key] = s.[key]
) a
Result:
ID  City      Sales
1   London    20
1   New York  30
1   Paris
1   Berlin    50
1   Madrid    NULL
2   Istanbul  4
2   Tokyo     5
2   Brussels  6
STRING_SPLIT has no concept of ordinal positions. In fact, the documentation specifically states that it doesn't care about them:
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
        SELECT ID,
               DSc.Item,
               DSc.ItemNumber
        FROM dbo.YourTable YT
        CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc),
     Sales AS(
        SELECT ID,
               DSs.Item,
               DSs.ItemNumber
        FROM dbo.YourTable YT
        CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
       C.Item AS City,
       S.Item AS Sale
FROM Cities C
FULL OUTER JOIN Sales S ON C.ID = S.ID -- join on ID as well, so positions from different IDs don't cross-match
                       AND C.ItemNumber = S.ItemNumber;
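Side note: if you happen to be on SQL Server 2022 or Azure SQL Database, STRING_SPLIT accepts a third enable_ordinal argument that adds an ordinal column, which makes a split-and-match approach viable without a custom splitter. A sketch, assuming that version:

SELECT YT.ID, c.value AS City, s.value AS Sales
FROM dbo.YourTable YT
CROSS APPLY STRING_SPLIT(YT.City, ',', 1) c     -- 1 = enable_ordinal
OUTER APPLY ( SELECT s2.value
              FROM STRING_SPLIT(YT.Sales, ',', 1) s2
              WHERE s2.ordinal = c.ordinal ) s; -- match positions; OUTER APPLY keeps cities with no matching sales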
Of course, the real solution is to fix your design. This type of design is only going to cause you hundreds of problems in the future. Fix it now, not later; the sooner you do it, the more rewards you'll reap.
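As a sketch of what a normalized design could look like (table and column names here are illustrative, not taken from the original schema):

CREATE TABLE CitySales (
    ID    int          NOT NULL, -- key of the parent row
    City  varchar(100) NOT NULL,
    Sales int          NULL      -- NULL where no sales figure exists (e.g. Paris)
);

INSERT INTO CitySales (ID, City, Sales)
VALUES (1, 'London', 20), (1, 'New York', 30), (1, 'Paris', NULL);

Each city/sales pair then lives in its own row, and no string splitting is ever needed.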
I have a table with two columns: Salary and Department_id
| Salary | Department_id |
|--------|---------------|
| 1000   | 10            |
| 2000   | 90            |
| 3000   | 10            |
| 4000   | 90            |
Now I need to combine these columns into one row and calculate the sum of salary for every department.
Output:
| Dep10 | Dep90 |
|-------|-------|
| 4000  | 6000  |
NOTE: "Dep10" and "Dep90" are aliases.
I tried to use DECODE or CASE:
SELECT DECODE(department_id, 10, SUM(salary),NULL) AS "Dep10",
DECODE(department_id, 90, SUM(salary), NULL) AS "Dep90"
FROM employees
GROUP BY department_id
but I obtain one row per department (with NULLs in the other column) instead of a single row.
Use conditional aggregation, which needs no GROUP BY:
SELECT SUM(CASE WHEN Department_id = 10 THEN Salary END) AS Dep10,
       SUM(CASE WHEN Department_id = 90 THEN Salary END) AS Dep90
FROM employees
Use PIVOT:
Oracle Setup:
CREATE TABLE test_data ( Salary, Department_id ) AS
SELECT 1000, 10 FROM DUAL UNION ALL
SELECT 2000, 90 FROM DUAL UNION ALL
SELECT 3000, 10 FROM DUAL UNION ALL
SELECT 4000, 90 FROM DUAL
Query:
SELECT *
FROM test_data
PIVOT ( SUM( salary ) FOR Department_id IN ( 10 AS Dep10, 90 AS Dep90 ) )
Output:
DEP10 | DEP90
----: | ----:
4000 | 6000
I think you should:
1 - use a GROUP BY clause on your first table.
2 - use the PIVOT feature; you can read about it in the Oracle documentation. In a few words, it lets you transpose columns and rows. A sketch of the GROUP BY step follows below.
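For reference, a sketch of the GROUP BY step on its own (column names taken from the question):

SELECT Department_id, SUM(Salary) AS total_salary
FROM employees
GROUP BY Department_id;

The PIVOT in the answer above folds this aggregation and the transposition into a single step.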
Good luck!
EDIT: Adding a reduced version of the code to make the query clearer:
SELECT
    AMS.arecno AS [RecNo],
    CONVERT(DATE, AMS.adate) AS [Date],
    ac.acommentno AS [commentno],
    ac.acomment AS [comments]
FROM
    amain AS AMS
LEFT JOIN
    asfat AS ASF ON AMS.arecno = ASF.afatrecno
LEFT JOIN
    acomments AS ac ON ac.areportno = ASF.afatrecno
ORDER BY
    AMS.arecno DESC
My first table has this type of info:
recno | date
1234 | 2017
6548 | 2018
I am then left joining on a table called comments.
Per record number (recno) there are multiple comments.
like this:
recno | commentno | comments
1234 | 1 | blah blah...
1234 | 2 | doot doot...
6548 | 1 | jib jab...
6548 | 2 | flib flob...
I'd like to show this:
recno | date | comments |
1234 | 2017 | Comment 1: blah blah... Comment 2: doot doot...
6548 | 2018 | Comment 1: jib jab... Comment 2: flib flob...
I've looked up and tried a few solutions but am really struggling. Any help would be much appreciated.
Note: I can't create any tables or stored procedures due to limitations on our ODBC setup, and I have very limited SQL knowledge compared to most.
You can try the following query:
CREATE TABLE #RecordInfo(RecNo INT,RecDate VARCHAR(10))
INSERT INTO #RecordInfo
SELECT 1234,'2017' UNION ALL SELECT 6548,'2018'
CREATE TABLE #CommentsInfo(RecNo INT,CommentNo INT,Comments VARCHAR(MAX))
INSERT INTO #CommentsInfo
SELECT 1234, 1, 'blah blah' UNION ALL SElECT 1234, 2, 'doot doot'
UNION ALL SELECT 6548, 1, 'jib jab' UNION ALL SELECT 6548 ,2,'flib flob'
;WITH CTE AS
(
    SELECT R.RecNo, R.RecDate, C.CommentNo,
           CONCAT('Comment ', C.CommentNo, ': ', C.Comments) AS Comments
    FROM #RecordInfo R
    LEFT JOIN #CommentsInfo C ON R.RecNo = C.RecNo
)
SELECT
    t1.RecNo, t1.RecDate, Comments =
    STUFF((
        SELECT ' ' + t2.Comments      -- space separator, matching the desired output
        FROM CTE t2
        WHERE t1.RecNo = t2.RecNo
        ORDER BY t2.CommentNo         -- keep comments in numeric order
        FOR XML PATH(''), TYPE).value('.','varchar(1000)'),1,1,'')
FROM CTE AS t1
GROUP BY t1.RecNo, t1.RecDate;
DROP TABLE #RecordInfo
DROP TABLE #CommentsInfo
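As an aside: if you are on SQL Server 2017 or later, STRING_AGG produces the same result more directly. A sketch against the same temp tables:

SELECT r.RecNo,
       r.RecDate,
       STRING_AGG(CONCAT('Comment ', c.CommentNo, ': ', c.Comments), ' ')
           WITHIN GROUP (ORDER BY c.CommentNo) AS Comments
FROM #RecordInfo r
LEFT JOIN #CommentsInfo c ON r.RecNo = c.RecNo
GROUP BY r.RecNo, r.RecDate;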
I think this will solve your problem. Thanks.
We need to mask some Personally Identifiable Information in our Oracle 10g database. The process I'm using is based on another masking script that we are using for Sybase (which works fine), but since the information in the Oracle and Sybase databases is quite different, I've hit a bit of a roadblock.
The process is to select all data out of the PERSON table into a PERSON_TRANSFER table. We then use a random number to select a random name from the PERSON_TRANSFER table, and update the PERSON table with that random name. This works fine in Sybase because there is only one row per person in the PERSON table.
The issue I've encountered is that in the Oracle DB, there are multiple rows per PERSON, and the name may or may not be different for each row, e.g.
PERSON
|PERSON_ID|SURNAME|
|---------|-------|
|1        |Purple |
|1        |Purple |
|1        |Pink   | <--
|2        |Gray   |
|2        |Blue   | <--
|3        |Black  |
|3        |Black  |
PERSON_TRANSFER is a copy of this table. The table has millions of rows, so I'm just giving a very basic example here :)
The logic I'm currently using would just update all rows to be the same for that PERSON_ID, e.g.
PERSON
|PERSON_ID|SURNAME|
|---------|-------|
|1        |Brown  |
|1        |Brown  |
|1        |Brown  | <--
|2        |White  |
|2        |White  | <--
|3        |Red    |
|3        |Red    |
But this is incorrect as the name that is different for that PERSON_ID needs to be masked differently, e.g.
PERSON
|PERSON_ID|SURNAME|
|---------|-------|
|1        |Brown  |
|1        |Brown  |
|1        |Yellow | <--
|2        |White  |
|2        |Green  | <--
|3        |Red    |
|3        |Red    |
How do I get the script to update the distinct names separately, rather than just update them all based on the PERSON_ID? My script currently looks like this
DECLARE
    v_SURNAME VARCHAR2(30);
BEGIN
    SELECT pt.SURNAME
    INTO   v_SURNAME
    FROM   PERSON_TRANSFER pt
    WHERE  pt.PERSON_ID = (SELECT PERSON_ID
                           FROM ( SELECT PERSON_ID
                                  FROM PERSON_TRANSFER
                                  ORDER BY dbms_random.value )
                           WHERE rownum = 1);
END;
Which causes an error because too many rows are returned for that random PERSON_ID.
1) Is there a more efficient way to update the PERSON table so that names are randomly assigned?
2) How do I ensure that the PERSON table is masked correctly, in that the various surnames are kept distinct (or the same, if they are all the same) for any single PERSON_ID?
I'm hoping this is enough information. I've simplified it a fair bit (the table has a lot more columns, such as First Name, DOB, TFN, etc.) in the hope that it makes the explanation easier.
Any input/advice/help would be greatly appreciated :)
Thanks.
One of the complications is that the same surname may appear under different person_ids in the PERSON table. You may be better off using a separate, auxiliary table holding surnames that are distinct (for example, you can populate it by selecting distinct surnames from PERSONS).
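For example, such a helper table could be populated like this (a sketch; the table name distinct_surnames is illustrative):

CREATE TABLE distinct_surnames AS
SELECT DISTINCT surname FROM persons;

In the rest of this answer I use a small hand-filled mask_names table instead, to keep the example self-contained.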
Setup:
create table persons (person_id, surname) as (
select 1, 'Purple' from dual union all
select 1, 'Purple' from dual union all
select 1, 'Pink' from dual union all
select 2, 'Gray' from dual union all
select 2, 'Blue' from dual union all
select 3, 'Black' from dual union all
select 3, 'Black' from dual
);
create table mask_names (person_id, surname) as (
select 1, 'Apple' from dual union all
select 2, 'Banana' from dual union all
select 3, 'Grape' from dual union all
select 4, 'Orange' from dual union all
select 5, 'Pear' from dual union all
select 6, 'Plum' from dual
);
commit;
CTAS to create PERSON_TRANSFER:
create table person_transfer (person_id, surname) as (
select ranked.person_id, rand.surname
from ( select person_id, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join
( select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
);
commit;
Outcome:
SQL> select * from person_transfer order by person_id, surname;
PERSON_ID SURNAME
---------- -------
1 Pear
1 Pear
1 Plum
2 Banana
2 Grape
3 Apple
3 Apple
Added at OP's request: The scope has been extended - the requirement now is to update surname in the original table (PERSONS). This is best done with a MERGE statement and the join (sub)query I demonstrated earlier. It works best when the PERSONS table has a PK, and indeed the OP said the real-life PERSONS table has such a PK, made up of the person_id column and an additional column, date_from. In the script below, I drop persons and recreate it to include this additional column. Then I show the query and the result.
Note - a mask_names table is still needed. A tempting alternative would be to just shuffle the surnames already present in persons, so there would be no need for a "helper" table. Alas, that won't work. For example, in the trivial case persons has only one row; to obfuscate its surname, one MUST come up with a surname not in the original table. More interestingly, assume every person_id has exactly two rows with distinct surnames, but in every case those surnames are 'John' and 'Mary'. It doesn't help to just shuffle those two names. One does need a "helper" table like mask_names.
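One caveat with the rank-based join used here: it is an inner join, so mask_names must contain at least as many surnames as there are distinct surnames in persons; otherwise some source surnames get no mask name (rows are dropped in the CTAS version, and left unmasked by the MERGE). A quick sanity check (a sketch):

SELECT (SELECT COUNT(DISTINCT surname) FROM persons) AS distinct_surnames_needed,
       (SELECT COUNT(*) FROM mask_names)             AS mask_names_available
FROM dual;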
New setup:
drop table persons;
create table persons (person_id, date_from, surname) as (
select 1, date '2016-01-04', 'Purple' from dual union all
select 1, date '2016-01-20', 'Purple' from dual union all
select 1, date '2016-03-20', 'Pink' from dual union all
select 2, date '2016-01-24', 'Gray' from dual union all
select 2, date '2016-03-21', 'Blue' from dual union all
select 3, date '2016-04-02', 'Black' from dual union all
select 3, date '2016-02-13', 'Black' from dual
);
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Purple
1 2016-01-20 Purple
1 2016-03-20 Pink
2 2016-01-24 Gray
2 2016-03-21 Blue
3 2016-04-02 Black
3 2016-02-13 Black
7 rows selected.
New query and result:
merge into persons p
using (
select ranked.person_id, ranked.date_from, rand.surname
from (
select person_id, date_from, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join (
select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
) t
on (p.person_id = t.person_id and p.date_from = t.date_from)
when matched then update
set p.surname = t.surname;
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Apple
1 2016-01-20 Apple
1 2016-03-20 Orange
2 2016-01-24 Plum
2 2016-03-21 Grape
3 2016-04-02 Banana
3 2016-02-13 Banana
7 rows selected.
I have two rows in my table which are exact duplicates with the exception of a date field. I want to find these records and delete the older record by hopefully comparing the dates.
For example I have the following data
ctrc_num | Ctrc_name | some_date
---------------------------------------
12345 | John R | 2011-01-12
12345 | John R | 2012-01-12
56789 | Sam S | 2011-01-12
56789 | Sam S | 2012-01-12
Now the idea is to find duplicates with a different 'some_date' field and delete the older records. The final output should look something like this.
ctrc_num | Ctrc_name | some_date
---------------------------------------
12345 | John R | 2012-01-12
56789 | Sam S | 2012-01-12
Also note that my table does not have a primary key; it was originally created this way (not sure why), and the solution has to fit inside a stored procedure.
If you look at this:
SELECT * FROM <tablename> WHERE some_date IN
(
SELECT MAX(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1
AND COUNT(ctrc_name) > 1
)
You can see that it selects the most recent date for each set of duplicate rows. If you switch the subquery to MIN(some_date) and use it in a DELETE, you remove the older dates for those duplicate rows instead:
DELETE FROM <tablename> WHERE some_date IN
(
SELECT MIN(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1
AND COUNT(ctrc_name) > 1
)
This is for SQL Server
CREATE TABLE StackOverFlow
([ctrc_num] int, [Ctrc_name] varchar(6), [some_date] datetime)
;
INSERT INTO StackOverFlow
([ctrc_num], [Ctrc_name], [some_date])
SELECT 12345, 'John R', '2011-01-12 00:00:00' UNION ALL
SELECT 12345, 'John R', '2012-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2011-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2012-01-12 00:00:00'
;WITH RankedByDate AS
(
SELECT ctrc_num
,Ctrc_name
,some_date
,ROW_NUMBER() OVER(PARTITION BY Ctrc_num, Ctrc_name ORDER BY some_date DESC) AS rNum
FROM StackOverFlow
)
DELETE
FROM RankedByDate
WHERE rNum > 1
SELECT
[ctrc_num]
, [Ctrc_name]
, [some_date]
FROM StackOverFlow
And here is the SQL Fiddle to test it: http://sqlfiddle.com/#!6/32718/6
What I tried to do here is:
1 - rank the records in descending order of date
2 - delete those that are older (keep the latest)