Split two columns in one row - database

I have a table with two columns: Salary and Department_id
|Salary|Department_id|
|------|-------------|
|1000  |10           |
|2000  |90           |
|3000  |10           |
|4000  |90           |
Now I need to pivot these rows into a single row and calculate the sum of salary for every department.
Output:
|Dep10|Dep90|
|-----|-----|
|4000 |6000 |
NOTE: "Dep10" and "Dep90" are aliases.
I tried to use DECODE or CASE:
SELECT DECODE(department_id, 10, SUM(salary),NULL) AS "Dep10",
DECODE(department_id, 90, SUM(salary), NULL) AS "Dep90"
FROM employees
GROUP BY department_id
but I obtain this:

select
sum(case when Department_id = '10' then Salary end) as Dep10,
sum(case when Department_id = '90' then Salary end) as Dep90
from employees
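As a rough illustration of why the GROUP BY attempt returns two rows while the query above returns one, here is a minimal sketch that runs both variants. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for Oracle, so the dialect is not the asker's; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1000, 10), (2000, 90), (3000, 10), (4000, 90)],
)

# With GROUP BY there is one output row per department, so each sum
# lands on its own row and the other column is NULL (None in Python).
grouped = conn.execute(
    """
    SELECT SUM(CASE WHEN department_id = 10 THEN salary END) AS dep10,
           SUM(CASE WHEN department_id = 90 THEN salary END) AS dep90
    FROM employees
    GROUP BY department_id
    ORDER BY department_id
    """
).fetchall()

# Without GROUP BY the whole table is a single group: one row, two sums.
single_row = conn.execute(
    """
    SELECT SUM(CASE WHEN department_id = 10 THEN salary END) AS dep10,
           SUM(CASE WHEN department_id = 90 THEN salary END) AS dep90
    FROM employees
    """
).fetchall()

print(grouped)     # two rows, one per department
print(single_row)  # one row holding both sums
```

Dropping the GROUP BY is what collapses the result to a single row; the CASE expressions only decide which column each salary contributes to.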

Use PIVOT:
Oracle Setup:
CREATE TABLE test_data ( Salary, Department_id ) AS
SELECT 1000, 10 FROM DUAL UNION ALL
SELECT 2000, 90 FROM DUAL UNION ALL
SELECT 3000, 10 FROM DUAL UNION ALL
SELECT 4000, 90 FROM DUAL
Query:
SELECT *
FROM test_data
PIVOT ( SUM( salary ) FOR Department_id IN ( 10 AS Dep10, 90 AS Dep90 ) )
Output:
DEP10 | DEP90
----: | ----:
4000 | 6000

I think you should:
1 - use a GROUP BY clause on your first table.
2 - use the PIVOT feature, which lets you transpose rows into columns.
Good luck!

Related

Which aggregate function to use in the following pivot clause?

I have a table Players which has two columns : Name and Sport_Played.
Sample data would be like:
Name Sport_played
Ravi Cricket
Raju Cricket
Ronaldo Football
Messi Football
Anand Chess
I want to pivot the table having columns as sport played and the columns should contain the names of players sorted ascendingly.
Cricket Football Chess
Raju Messi Anand
Ravi Ronaldo Null
The problem is that PIVOT requires an aggregate function. Which aggregate function should I use to display the players' names under each sport's column? Thanks.
Without an example of how you want your output it is difficult to know what you are intending but:
having columns as sport played and the columns should contain the names of players sorted ascendingly
You do not need to use PIVOT, you can use LISTAGG:
Oracle 11g R2 Schema Setup:
CREATE TABLE players ( Name, Sport_played ) AS
SELECT 'Ravi', 'Cricket' FROM DUAL UNION ALL
SELECT 'Raju', 'Cricket' FROM DUAL UNION ALL
SELECT 'Ronaldo', 'Football' FROM DUAL UNION ALL
SELECT 'Messi', 'Football' FROM DUAL UNION ALL
SELECT 'Anand', 'Chess' FROM DUAL;
Query 1:
SELECT sport_played,
LISTAGG( name, ',' ) WITHIN GROUP ( ORDER BY name ) As names
FROM players
GROUP BY sport_played
Results:
| SPORT_PLAYED | NAMES |
|--------------|---------------|
| Chess | Anand |
| Cricket | Raju,Ravi |
| Football | Messi,Ronaldo |
Update:
Oracle 11g R2 Schema Setup:
CREATE TABLE players ( Name, Sport_played ) AS
SELECT 'Ravi', 'Cricket' FROM DUAL UNION ALL
SELECT 'Raju', 'Cricket' FROM DUAL UNION ALL
SELECT 'Ronaldo', 'Football' FROM DUAL UNION ALL
SELECT 'Messi', 'Football' FROM DUAL UNION ALL
SELECT 'Anand', 'Chess' FROM DUAL;
Query 1:
SELECT *
FROM ( SELECT p.*,
ROW_NUMBER() OVER ( PARTITION BY Sport_played
ORDER BY name ) AS rn
FROM players p )
PIVOT (
MAX( Name )
FOR Sport_Played IN (
'Cricket' As Cricket,
'Football' As Football,
'Chess' AS Chess
)
)
Results:
| RN | CRICKET | FOOTBALL | CHESS |
|----|---------|----------|--------|
| 1 | Raju | Messi | Anand |
| 2 | Ravi | Ronaldo | (null) |
You can use any (string) aggregation function in the PIVOT, including MAX(name), MIN(name) or even LISTAGG( name, ',' ) WITHIN GROUP ( ORDER BY Name ). The ROW_NUMBER() analytic function generates a unique number per player within each sport, so each pivoted cell aggregates exactly one value and the choice of aggregation function does not affect the result.
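The same idea can be tried outside Oracle. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) and replaces the PIVOT clause, which SQLite does not have, with MAX(CASE ...) grouped by the row number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, sport_played TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [
        ("Ravi", "Cricket"),
        ("Raju", "Cricket"),
        ("Ronaldo", "Football"),
        ("Messi", "Football"),
        ("Anand", "Chess"),
    ],
)

# Number the players within each sport, then spread the sports into
# columns. Each (rn, sport) pair holds at most one name, so MAX()
# simply returns that name (MIN() would give the same result).
rows = conn.execute(
    """
    SELECT rn,
           MAX(CASE WHEN sport_played = 'Cricket'  THEN name END) AS cricket,
           MAX(CASE WHEN sport_played = 'Football' THEN name END) AS football,
           MAX(CASE WHEN sport_played = 'Chess'    THEN name END) AS chess
    FROM (
        SELECT name, sport_played,
               ROW_NUMBER() OVER (PARTITION BY sport_played
                                  ORDER BY name) AS rn
        FROM players
    )
    GROUP BY rn
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

Sports with fewer players than the largest group come back as None (NULL) in the extra rows, matching the (null) cell in the result above.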

How can we take the sum of each columns in SQL Server without using ;with cte?

How can I compute a running sum, so that the third column of each row holds the sum of that row and all previous rows?
Here's a screenshot to illustrate:
You can see that for id 1 the sum is 10, for id 2 the sum is 10+50 = 60,
the third sum is 60+100 = 160, and so on.
It works fine for me with a CTE, but I need a solution that does not use ;WITH cte.
An example is shown below:
DECLARE @t TABLE(ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO @t
VALUES (10,'1'), (50,'2'), (100,'3'), (5,'4'), (45,'5');
;WITH cte AS
(
SELECT ColumnB, SUM(ColumnA) asum
FROM @t
GROUP BY ColumnB
), cteRanked AS
(
SELECT asum, ColumnB, ROW_NUMBER() OVER(ORDER BY ColumnB) rownum
FROM cte
)
SELECT
(SELECT SUM(asum)
FROM cteRanked c2
WHERE c2.rownum <= c1.rownum) AS ColumnA,
ColumnB
FROM
cteRanked c1;
One option, which doesn't require explicit analytic functions, would be to use a correlated subquery to calculate the running total:
SELECT
t1.ID,
t1.Currency,
(SELECT SUM(t2.Currency) FROM yourTable t2 WHERE t2.ID <= t1.ID) AS Sum
FROM yourTable t1
It looks like you need a simple running total.
There is an easy and efficient way to calculate a running total in SQL Server 2012 and later. You can use SUM(...) OVER (ORDER BY ...), as in the example below:
Sample data
DECLARE @t TABLE(ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO @t
VALUES (10,'1'), (50,'2'), (100,'3'), (5,'4'), (45,'5');
Query
SELECT
ColumnB
,ColumnA
,SUM(ColumnA) OVER (ORDER BY ColumnB
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumColumnA
FROM @t
ORDER BY ColumnB;
Result
+---------+---------+------------+
| ColumnB | ColumnA | SumColumnA |
+---------+---------+------------+
| 1 | 10 | 10 |
| 2 | 50 | 60 |
| 3 | 100 | 160 |
| 4 | 5 | 165 |
| 5 | 45 | 210 |
+---------+---------+------------+
For SQL Server 2008 and below you need to use either correlated sub-queries as you do already or a simple cursor, which may be faster if the table is large.
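To make the two pre-2012 options concrete, here is a small sketch that computes the running total both ways on the sample data. It is not SQL Server code: it assumes SQLite via Python's sqlite3 module, a plain table named t in place of the table variable, and a Python loop standing in for a server-side cursor.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, "1"), (50, "2"), (100, "3"), (5, "4"), (45, "5")],
)

# Option 1: correlated subquery. For every row, re-sum all rows up to
# and including it. Simple, but the work grows quadratically.
subquery_totals = conn.execute(
    """
    SELECT t1.ColumnB,
           t1.ColumnA,
           (SELECT SUM(t2.ColumnA)
            FROM t AS t2
            WHERE t2.ColumnB <= t1.ColumnB) AS SumColumnA
    FROM t AS t1
    ORDER BY t1.ColumnB
    """
).fetchall()

# Option 2: cursor-style single pass. Read the rows once in order and
# carry the total forward, which is why a cursor can win on large tables.
ordered = conn.execute(
    "SELECT ColumnB, ColumnA FROM t ORDER BY ColumnB"
).fetchall()
running = accumulate(amount for _, amount in ordered)
cursor_totals = [(b, a, total) for (b, a), total in zip(ordered, running)]

print(subquery_totals)
print(cursor_totals)
```

Both lists hold the same running totals as the Result table above; the difference is only in how many times each row is read.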

Oracle Data Masking using random names from a temp table

We need to mask some Personally Identifiable Information in our Oracle 10g database. The process I'm using is based on another masking script that we are using for Sybase (which works fine), but since the information in the Oracle and Sybase databases is quite different, I've hit a bit of a roadblock.
The process is to select all data out of the PERSON table, into a PERSON_TRANSFER table. We then use a random number to select a random name from the PERSON_TRANSFER table, and then update the PERSON table with that random name. This works fine in Sybase because there is only one row per person in the PERSON table.
The issue I've encountered is that in the Oracle DB, there are multiple rows per PERSON, and the name may or may not be different for each row, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Purple |
|1 |Purple |
|1 |Pink | <--
|2 |Gray |
|2 |Blue | <--
|3 |Black |
|3 |Black |
The PERSON_TRANSFER is a copy of this table. The table is in the millions of rows, so I'm just giving a very basic example here :)
The logic I'm currently using would just update all rows to be the same for that PERSON_ID, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Brown |
|1 |Brown |
|1 |Brown | <--
|2 |White |
|2 |White | <--
|3 |Red |
|3 |Red |
But this is incorrect as the name that is different for that PERSON_ID needs to be masked differently, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Brown |
|1 |Brown |
|1 |Yellow | <--
|2 |White |
|2 |Green | <--
|3 |Red |
|3 |Red |
How do I get the script to update the distinct names separately, rather than just update them all based on the PERSON_ID? My script currently looks like this
DECLARE
v_SURNAME VARCHAR2(30);
BEGIN
select pt.SURNAME
into v_SURNAME
from PERSON_TRANSFER pt
where pt.PERSON_ID = (SELECT PERSON_ID FROM
( SELECT PERSON_ID FROM PERSON_TRANSFER
ORDER BY dbms_random.value )
WHERE rownum = 1);
END;
Which causes an error because too many rows are returned for that random PERSON_ID.
1) Is there a more efficient way to update the PERSON table so that names are randomly assigned?
2) How do I ensure that the PERSON table is masked correctly, in that the various surnames are kept distinct (or the same, if they are all the same) for any single PERSON_ID?
I'm hoping this is enough information. I've simplified it a fair bit (the table has a lot more columns, such as First Name, DOB, TFN, etc.) in the hope that it makes the explanation easier.
Any input/advice/help would be greatly appreciated :)
Thanks.
One of the complications is that the same surname may appear under different person_id's in the PERSON table. You may be better off using a separate, auxiliary table holding surnames that are distinct (for example you can populate it by selecting distinct surnames from PERSONS).
Setup:
create table persons (person_id, surname) as (
select 1, 'Purple' from dual union all
select 1, 'Purple' from dual union all
select 1, 'Pink' from dual union all
select 2, 'Gray' from dual union all
select 2, 'Blue' from dual union all
select 3, 'Black' from dual union all
select 3, 'Black' from dual
);
create table mask_names (person_id, surname) as (
select 1, 'Apple' from dual union all
select 2, 'Banana' from dual union all
select 3, 'Grape' from dual union all
select 4, 'Orange' from dual union all
select 5, 'Pear' from dual union all
select 6, 'Plum' from dual
);
commit;
CTAS to create PERSON_TRANSFER:
create table person_transfer (person_id, surname) as (
select ranked.person_id, rand.surname
from ( select person_id, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join
( select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
);
commit;
Outcome:
SQL> select * from person_transfer order by person_id, surname;
PERSON_ID SURNAME
---------- -------
1 Pear
1 Pear
1 Plum
2 Banana
2 Grape
3 Apple
3 Apple
Added at OP's request: The scope has been extended - the requirement now is to update surname in the original table (PERSONS). This can be best done with the merge statement and the join (sub)query I demonstrated earlier. This works best when the PERSONS table has a PK, and indeed the OP said the real-life table PERSONS has such a PK, made up of the person_id column and an additional column, date_from. In the script below, I drop persons and recreate it to include this additional column. Then I show the query and the result.
Note - a mask_names table is still needed. A tempting alternative would be to just shuffle the surnames already present in persons so there would be no need for a "helper" table. Alas that won't work. For example, in a trivial example persons has only one row. To obfuscate surnames, one MUST come up with surnames not in the original table. More interestingly, assume every person_id has exactly two rows, with distinct surnames, but those surnames in every case are 'John' and 'Mary'. It doesn't help to just shuffle those two names. One does need a "helper" table like mask_names.
New setup:
drop table persons;
create table persons (person_id, date_from, surname) as (
select 1, date '2016-01-04', 'Purple' from dual union all
select 1, date '2016-01-20', 'Purple' from dual union all
select 1, date '2016-03-20', 'Pink' from dual union all
select 2, date '2016-01-24', 'Gray' from dual union all
select 2, date '2016-03-21', 'Blue' from dual union all
select 3, date '2016-04-02', 'Black' from dual union all
select 3, date '2016-02-13', 'Black' from dual
);
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Purple
1 2016-01-20 Purple
1 2016-03-20 Pink
2 2016-01-24 Gray
2 2016-03-21 Blue
3 2016-04-02 Black
3 2016-02-13 Black
7 rows selected.
New query and result:
merge into persons p
using (
select ranked.person_id, ranked.date_from, rand.surname
from (
select person_id, date_from, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join (
select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
) t
on (p.person_id = t.person_id and p.date_from = t.date_from)
when matched then update
set p.surname = t.surname;
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Apple
1 2016-01-20 Apple
1 2016-03-20 Orange
2 2016-01-24 Plum
2 2016-03-21 Grape
3 2016-04-02 Banana
3 2016-02-13 Banana
7 rows selected.
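The core of the approach (rank the distinct surnames, then pair each rank with a randomly ordered replacement) can also be sketched outside the database. The following is only an illustration in plain Python: the function name mask_surnames is hypothetical, the data is the sample from above, and unlike the inner join it raises an error rather than silently dropping rows when there are too few replacement names.

```python
import random

persons = [
    (1, "Purple"), (1, "Purple"), (1, "Pink"),
    (2, "Gray"), (2, "Blue"),
    (3, "Black"), (3, "Black"),
]
mask_names = ["Apple", "Banana", "Grape", "Orange", "Pear", "Plum"]


def mask_surnames(rows, replacements, rng):
    """Replace each distinct surname with one distinct random replacement."""
    # Equivalent of DENSE_RANK() OVER (ORDER BY surname): one slot per
    # distinct surname, regardless of how many rows carry it.
    distinct = sorted({surname for _, surname in rows})
    if len(distinct) > len(replacements):
        raise ValueError("not enough replacement names")
    # Equivalent of ROW_NUMBER() OVER (ORDER BY dbms_random.value()):
    # draw replacements in random order, without repeats.
    chosen = rng.sample(replacements, len(distinct))
    mapping = dict(zip(distinct, chosen))
    return [(person_id, mapping[surname]) for person_id, surname in rows]


# A seeded generator keeps the sketch repeatable; drop the seed for real masking.
masked = mask_surnames(persons, mask_names, random.Random(0))
for row in masked:
    print(row)
```

Rows that shared a surname before masking still share one afterwards, and rows that differed still differ, which is the property the question asks for.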

How to remove a duplicate row in SQL with an older date field

I have two rows in my table which are exact duplicates with the exception of a date field. I want to find these records and delete the older record by hopefully comparing the dates.
For example I have the following data
ctrc_num | Ctrc_name | some_date
---------------------------------------
12345 | John R | 2011-01-12
12345 | John R | 2012-01-12
56789 | Sam S | 2011-01-12
56789 | Sam S | 2012-01-12
Now the idea is to find duplicates with a different 'some_date' field and delete the older records. The final output should look something like this.
ctrc_num | Ctrc_name | some_date
---------------------------------------
12345 | John R | 2012-01-12
56789 | Sam S | 2012-01-12
Also note that my table does not have a primary key (it was originally created this way, I am not sure why), and the solution has to fit inside a stored procedure.
If you look at this:
SELECT * FROM <tablename> WHERE some_date IN
(
SELECT MAX(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1
AND COUNT(ctrc_name) > 1
)
You can see it selects the two most recent dates for the duplicate rows. If I switch the select in the brackets to 'min date' and use it to delete then you are removing the two older dates for the duplicate rows.
DELETE FROM <tablename> WHERE some_date IN
(
SELECT MIN(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1
AND COUNT(ctrc_name) > 1
)
This is for SQL Server
CREATE TABLE StackOverFlow
([ctrc_num] int, [Ctrc_name] varchar(6), [some_date] datetime)
;
INSERT INTO StackOverFlow
([ctrc_num], [Ctrc_name], [some_date])
SELECT 12345, 'John R', '2011-01-12 00:00:00' UNION ALL
SELECT 12345, 'John R', '2012-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2011-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2012-01-12 00:00:00'
;WITH RankedByDate AS
(
SELECT ctrc_num
,Ctrc_name
,some_date
,ROW_NUMBER() OVER(PARTITION BY Ctrc_num, Ctrc_name ORDER BY some_date DESC) AS rNum
FROM StackOverFlow
)
DELETE
FROM RankedByDate
WHERE rNum > 1
SELECT
[ctrc_num]
, [Ctrc_name]
, [some_date]
FROM StackOverFlow
And here is the sql fiddle to test it http://sqlfiddle.com/#!6/32718/6
What I tried to do here is:
1 - rank the records within each duplicate group by date, newest first
2 - delete every record ranked after the first (keeping the latest)
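For readers without SQL Server at hand, the keep-the-latest rule can be sketched as follows. This is an assumption-based variant, not the answer's code: it uses SQLite through Python's sqlite3 module, a hypothetical table name contracts, ISO-formatted date strings (which sort correctly as text), and a correlated EXISTS instead of a deletable CTE, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (ctrc_num INTEGER, ctrc_name TEXT, some_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        (12345, "John R", "2011-01-12"),
        (12345, "John R", "2012-01-12"),
        (56789, "Sam S", "2011-01-12"),
        (56789, "Sam S", "2012-01-12"),
    ],
)

# Delete a row whenever the same (ctrc_num, ctrc_name) pair also has a
# row with a later date. Only the newest row of each pair survives.
deleted = conn.execute(
    """
    DELETE FROM contracts
    WHERE EXISTS (
        SELECT 1
        FROM contracts AS newer
        WHERE newer.ctrc_num  = contracts.ctrc_num
          AND newer.ctrc_name = contracts.ctrc_name
          AND newer.some_date > contracts.some_date
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT ctrc_num, ctrc_name, some_date FROM contracts ORDER BY ctrc_num"
).fetchall()

print(deleted)
print(remaining)
```

Because the comparison is made within each duplicate group, a date that happens to be the oldest in one group cannot cause a row in another group to be deleted. Rows with identical dates are both kept in this sketch.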

How to aggregate on the same column?

I am trying to figure out something that seems very simple, but I cannot find a way to do it.
Consider following table and data,
create table dummy_query
(id varchar2(20), amount number(10,2), memo varchar(20));
insert into dummy_query values('1', 10.00, 'Memo');
insert into dummy_query values('1', 20.00, 'Memo1');
I want to get the values as:
Id MemoValue Memo1Value TotalSum
----------
1 10.00 20.00 30.00
Is there any way to get data in this manner?
Thanks!
Oracle 11g R2 Schema Setup:
create table dummy_query (id, amount, memo) AS
SELECT '1', 10.00, 'Memo' FROM DUAL
UNION ALL SELECT '1', 20.00, 'Memo1' FROM DUAL
UNION ALL SELECT '1', 30.00, 'Memo2' FROM DUAL;
Query 1:
If the TotalSum is the total of all memo amounts (and not just Memo and Memo1) then you can do:
SELECT ID,
SUM( CASE memo WHEN 'Memo' THEN amount END ) AS MemoValue,
SUM( CASE memo WHEN 'Memo1' THEN amount END ) AS Memo1Value,
SUM( amount ) AS TotalSum
FROM dummy_query
GROUP BY id
Results:
| ID | MEMOVALUE | MEMO1VALUE | TOTALSUM |
|----|-----------|------------|----------|
| 1 | 10 | 20 | 60 |
Query 2:
But if the TotalSum is just MemoValue + Memo1Value then add in a where clause:
SELECT ID,
SUM( CASE memo WHEN 'Memo' THEN amount END ) AS MemoValue,
SUM( CASE memo WHEN 'Memo1' THEN amount END ) AS Memo1Value,
SUM( amount ) AS TotalSum
FROM dummy_query
WHERE memo IN ( 'Memo', 'Memo1' )
GROUP BY id
Results:
| ID | MEMOVALUE | MEMO1VALUE | TOTALSUM |
|----|-----------|------------|----------|
| 1 | 10 | 20 | 30 |
Query 3:
Or, if you need to include all rows for another reason, then you could do:
SELECT ID,
SUM( CASE memo WHEN 'Memo' THEN amount END ) AS MemoValue,
SUM( CASE memo WHEN 'Memo1' THEN amount END ) AS Memo1Value,
SUM( CASE WHEN memo IN ( 'Memo', 'Memo1' ) THEN amount END ) AS TotalSum
FROM dummy_query
GROUP BY id
Results:
| ID | MEMOVALUE | MEMO1VALUE | TOTALSUM |
|----|-----------|------------|----------|
| 1 | 10 | 20 | 30 |
I'm not very familiar with PIVOT, but I built a query based on what you have tried. I hope this is the output you need:
SELECT a.*, b.SUM_AMT
FROM ((SELECT ID, amount, memo FROM dummy_query) PIVOT (SUM (amount)
FOR (memo)
IN ('Memo', 'Memo1'))) a,
( SELECT ID, SUM (AMOUNT) "SUM_AMT"
FROM dummy_query
GROUP BY id) b
WHERE a.id = b.id
There may be better ways to resolve this, so let's see what others have to share.
Solution with pivot, which you already started:
select id, memo, memo1, memo+memo1 msum from (
select * from dummy_query
pivot (sum(amount) for (memo) in ('Memo' memo,'Memo1' memo1)))
and more traditional, without pivot:
select id, memo, memo1, memo1+memo msum from (
select id,
sum(case when memo='Memo' then amount end) memo,
sum(case when memo='Memo1' then amount end) memo1
from dummy_query group by id)
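One thing worth checking with the memo + memo1 style of total is what happens when an id has no row for one of the memos: the missing side is NULL, and NULL plus a number is NULL. The sketch below illustrates this under stated assumptions: it runs on SQLite via Python's sqlite3 module rather than Oracle, the row for id '2' is hypothetical, and the aliases memo_value and memo1_value are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy_query (id TEXT, amount REAL, memo TEXT)")
conn.executemany(
    "INSERT INTO dummy_query VALUES (?, ?, ?)",
    [
        ("1", 10.00, "Memo"),
        ("1", 20.00, "Memo1"),
        ("2", 5.00, "Memo"),  # hypothetical id with no 'Memo1' row
    ],
)

rows = conn.execute(
    """
    SELECT id,
           memo_value,
           memo1_value,
           memo_value + memo1_value                           AS plain_sum,
           COALESCE(memo_value, 0) + COALESCE(memo1_value, 0) AS safe_sum
    FROM (
        SELECT id,
               SUM(CASE WHEN memo = 'Memo'  THEN amount END) AS memo_value,
               SUM(CASE WHEN memo = 'Memo1' THEN amount END) AS memo1_value
        FROM dummy_query
        GROUP BY id
    )
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

For id '2' the plain sum comes back as None while the COALESCE version returns 5.0. Whether a missing memo should count as zero is a decision for the data owner; the SUM( amount ) variants earlier in the thread avoid the question because SUM skips NULLs.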
