TSQL - UNPIVOT from Excel imported data - sql-server

I have an Excel spreadsheet that imports into a table like so:
+------+-------+------------+------------+------------+
| Col1 | Col2  | Col3       | Col4       | Col5       |
+------+-------+------------+------------+------------+
| Ref  | Name  | 01-01-2013 | 02-01-2013 | 03-01-2013 |
| 1    | John  | 500        | 550        | 600        |
| 2    | Fred  | 600        | 650        | 400        |
| 3    | Paul  | 700        | 750        | 550        |
| 4    | Steve | 800        | 850        | 700        |
+------+-------+------------+------------+------------+
My goal is to change it to look like this:
+-----+------+------------+-------+
| Ref | Name | Date       | Sales |
+-----+------+------------+-------+
| 1   | John | 01-01-2013 | 500   |
| 1   | John | 02-01-2013 | 550   |
| 1   | John | 03-01-2013 | 600   |
| 2   | Fred | 01-01-2013 | 600   |
| ... | ...  | ...        | ...   |
+-----+------+------------+-------+
So far I figured out how to use UNPIVOT to get the dates and sales numbers into 1 column but that doesn't solve the problem of breaking the dates out into their own column. Any help is appreciated. Thanks!!

You could use two separate UNPIVOT queries and then join them. The first subquery unpivots the row that has the Ref value in col1 (the header row that holds the dates). The second subquery unpivots the sales rows. You then join the subqueries on the unpivoted column names (col3, col4, col5):
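select s.col1,
       s.col2,
       d.value date,
       s.value sales
from
(
    -- header row only: one row per source column, holding that column's date
    -- ('ref' matches 'Ref' under a case-insensitive collation, the usual SQL Server default)
    select col1, col2, col, value
    from yt
    unpivot
    (
        value
        for col in (col3, col4, col5)
    ) un
    where col1 = 'ref'
) d
inner join
(
    -- data rows only: one row per (person, source column), holding the sales value
    select col1, col2, col, value
    from yt
    unpivot
    (
        value
        for col in (col3, col4, col5)
    ) un
    where col1 <> 'ref'
) s
    -- pair each sales value with the date from the same source column
    on d.col = s.col;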
See SQL Fiddle with Demo
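A minimal, runnable sketch of the same two-unpivot-and-join idea, assuming Python's built-in sqlite3 module. SQLite has no UNPIVOT operator, so each unpivot is emulated here with UNION ALL, using one SELECT per source column tagged with that column's name. The table name yt and the rows come from the question and answer above. SQLite compares strings case-sensitively by default, so the filter spells 'Ref' exactly.

```python
import sqlite3


def unpivot_excel_import():
    """Reshape the Excel-style table into (ref, name, date, sales) rows."""
    con = sqlite3.connect(":memory:")
    # All columns are TEXT because the header row mixes labels and dates with numbers.
    con.execute("CREATE TABLE yt (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT)")
    con.executemany(
        "INSERT INTO yt VALUES (?, ?, ?, ?, ?)",
        [
            ("Ref", "Name", "01-01-2013", "02-01-2013", "03-01-2013"),
            ("1", "John", "500", "550", "600"),
            ("2", "Fred", "600", "650", "400"),
            ("3", "Paul", "700", "750", "550"),
            ("4", "Steve", "800", "850", "700"),
        ],
    )
    query = """
    WITH un AS (
        -- emulated UNPIVOT: one output row per (source row, source column)
        SELECT col1, col2, 'col3' AS col, col3 AS value FROM yt
        UNION ALL
        SELECT col1, col2, 'col4', col4 FROM yt
        UNION ALL
        SELECT col1, col2, 'col5', col5 FROM yt
    )
    SELECT s.col1, s.col2, d.value AS date, s.value AS sales
    FROM un AS d                      -- d: the header row that holds the dates
    JOIN un AS s ON d.col = s.col     -- s: the data rows, matched by source column
    WHERE d.col1 = 'Ref' AND s.col1 <> 'Ref'
    ORDER BY CAST(s.col1 AS INTEGER), s.col
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in unpivot_excel_import():
    print(row)
```

The join key is the name of the source column. That is the only link between a date in the header row and a sales figure in a data row.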

Related

I have a table where customer IDs are duplicated because of their reactivation dates. I need to pivot the reactivation dates per CustomerID

I have the following table:
I need to pivot the table and have it like the table below:
How can I have the unique customer ID in a column and all the reactivation dates pivoted like in the above picture?
To assign a numeric sequence to the reactivation dates, use row_number() over(), and then pivot that sequence from rows to columns:
select
customer_id
, activation_date
, [1] as reactivation_dt_1
, [2] as reactivation_dt_2
, [3] as reactivation_dt_3
, [4] as reactivation_dt_4
from (
select
customer_id, activation_date, reactivation_date
, row_number() over(partition by customer_id
order by reactivation_date ASC) as pivcol
from mytable
) as d
pivot (
max(reactivation_date)
for pivcol in ([1],[2],[3],[4])
) as p
order by
customer_id
result
+-------------+-----------------+-------------------+-------------------+-------------------+-------------------+
| customer_id | activation_date | reactivation_dt_1 | reactivation_dt_2 | reactivation_dt_3 | reactivation_dt_4 |
+-------------+-----------------+-------------------+-------------------+-------------------+-------------------+
| 1 | 2010-01-01 | 2012-02-01 | 2015-03-01 | 2017-07-01 | 2022-07-01 |
| 2 | 2011-12-03 | 2013-05-01 | 2014-08-10 | 2015-12-09 | |
+-------------+-----------------+-------------------+-------------------+-------------------+-------------------+
see db<>fiddle here
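A runnable sketch of the same row_number-then-pivot approach, assuming Python's sqlite3 module with SQLite 3.25 or later for window functions. SQLite has no PIVOT, so conditional aggregation (MAX(CASE ...)) stands in for it. The GROUP BY lists the non-pivoted columns, which mirrors how T-SQL's PIVOT groups implicitly. The question's table image is missing, so the input rows are an assumption reconstructed from the answer's result table and inserted out of order on purpose.

```python
import sqlite3


def pivot_reactivations():
    con = sqlite3.connect(":memory:")
    # ISO-8601 date strings sort correctly as text.
    con.execute(
        "CREATE TABLE mytable (customer_id INTEGER, activation_date TEXT, reactivation_date TEXT)"
    )
    con.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            (1, "2010-01-01", "2015-03-01"),
            (1, "2010-01-01", "2012-02-01"),
            (1, "2010-01-01", "2022-07-01"),
            (1, "2010-01-01", "2017-07-01"),
            (2, "2011-12-03", "2014-08-10"),
            (2, "2011-12-03", "2013-05-01"),
            (2, "2011-12-03", "2015-12-09"),
        ],
    )
    query = """
    WITH d AS (
        -- number each customer's reactivations, oldest first
        SELECT customer_id, activation_date, reactivation_date,
               row_number() OVER (PARTITION BY customer_id
                                  ORDER BY reactivation_date ASC) AS pivcol
        FROM mytable
    )
    -- conditional aggregation in place of T-SQL's PIVOT
    SELECT customer_id, activation_date,
           MAX(CASE WHEN pivcol = 1 THEN reactivation_date END) AS reactivation_dt_1,
           MAX(CASE WHEN pivcol = 2 THEN reactivation_date END) AS reactivation_dt_2,
           MAX(CASE WHEN pivcol = 3 THEN reactivation_date END) AS reactivation_dt_3,
           MAX(CASE WHEN pivcol = 4 THEN reactivation_date END) AS reactivation_dt_4
    FROM d
    GROUP BY customer_id, activation_date
    ORDER BY customer_id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in pivot_reactivations():
    print(row)
```

As with the PIVOT version, a customer with fewer than four reactivations gets NULL (None in Python) in the remaining columns.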

SQL query to get ids that have multiple values in the same table in SQL Server

Let's say I have a table with many columns like col1, col2, col3, id, variantId, col4, col5 etc
However I am only interested in id, variantId which look like this:
+----------+-----------+
| id | variantId |
+----------+-----------+
| a | 11 |
| a | 12 |
| b | 31 |
| c | 41 |
| c | 54 |
| d | abc |
| e | xyz |
| e | xyz |
+----------+-----------+
I need the distinct ids that have more than one distinct variantId.
In this case I would only get a and c
You can use group by and having:
select id
from t
group by id
having min(variantId) <> max(variantId);
You can also use:
having count(distinct variantId) > 1
Try a GROUP BY with a HAVING clause:
select id
from [table]
group by id
having count(distinct variantId) > 1
You can also do it with EXISTS, which may be more efficient depending on your indexes and data:
select distinct t.id
from tablename t
where exists (
select 1 from tablename
where id = t.id and variantid <> t.variantid
)
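The three answers above can be checked against the question's sample data with a small sketch. It assumes Python's sqlite3 module and a hypothetical table t(id, variantId) holding only the two relevant columns. The queries are the same GROUP BY/HAVING and EXISTS forms, adapted to SQLite.

```python
import sqlite3

QUERIES = {
    "min_max": """
        SELECT id FROM t
        GROUP BY id
        HAVING MIN(variantId) <> MAX(variantId)
        ORDER BY id""",
    "count_distinct": """
        SELECT id FROM t
        GROUP BY id
        HAVING COUNT(DISTINCT variantId) > 1
        ORDER BY id""",
    "exists": """
        SELECT DISTINCT t1.id FROM t AS t1
        WHERE EXISTS (
            SELECT 1 FROM t AS t2
            WHERE t2.id = t1.id AND t2.variantId <> t1.variantId)
        ORDER BY t1.id""",
}


def ids_with_multiple_variants():
    con = sqlite3.connect(":memory:")
    # variantId is TEXT because the sample mixes numbers ('11') and strings ('abc').
    con.execute("CREATE TABLE t (id TEXT, variantId TEXT)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("a", "11"), ("a", "12"), ("b", "31"), ("c", "41"),
         ("c", "54"), ("d", "abc"), ("e", "xyz"), ("e", "xyz")],
    )
    results = {name: [row[0] for row in con.execute(sql)] for name, sql in QUERIES.items()}
    con.close()
    return results


print(ids_with_multiple_variants())
```

Id e is excluded by all three queries: its two rows repeat the same variantId, so MIN equals MAX, COUNT(DISTINCT) is 1, and no differing row exists.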

Update several columns with latest values from another table

Here's the data:
[ TABLE_1 ]
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|--------|--------|--------|--------|-------|
1 | null | null | null | null | null | null |
2 | null | null | null | null | null | null |
3 | null | null | null | null | null | null |
[ TABLE_2 ]
id | date | product |
-----|-------------|-----------|
1 | 20140101 | X |
1 | 20140102 | Y |
1 | 20140103 | Z |
2 | 20141201 | data |
2 | 20141201 | Y |
2 | 20141201 | Z |
3 | 20150101 | data2 |
3 | 20150101 | data3 |
3 | 20160101 | X |
Both tables have other columns not listed here.
date is formatted: yyyymmdd and datatype is int.
[ TABLE_2 ] doesn't have empty rows, just tried to make sample above more readable.
Here's the Goal:
I need to update [ TABLE_1 ] prod1,date1,prod2,date2,prod3,date3
with product collected from [ TABLE_2 ] with corresponding date values.
Data must be sorted so that "latest" product becomes prod1,
2nd latest product will be prod2 and 3rd is prod3.
Latest product = biggest date (int).
If dates are equal, order doesn't matter. (see id=2 and id=3).
Updated [ TABLE_1 ] should be:
id | prod1 | date1 | prod2 | date2 | prod3 | date3 |
---|--------|----------|--------|----------|--------|----------|
1 | Z | 20140103 | Y | 20140102 | X | 20140101 |
2 | data | 20141201 | Y | 20141201 | Z | 20141201 |
3 | X | 20160101 | data2 | 20150101 | data3 | 20150101 |
Ultimate goal is to get the following :
[ TABLE_3 ]
id | order1 | order2 | order3 | + Columns from [ TABLE_1 ]
---|--------------------|----------------------|------------|--------------------------
1 | 20140103:Z | 20140102:Y | 20140101:X |
2 | 20141201:data:Y:Z | NULL | NULL |
3 | 20160101:X | 20150101:data2:data3 | NULL |
I have to admit this exceeds my knowledge and I haven't tried anything.
Should I do it with a JOIN or a SELECT subquery?
Should I try to do it in one SQL statement, or perhaps in 3 steps,
one prod/date pair at a time?
What about creating [ TABLE_3 ]?
It has to have the columns from [ TABLE_1 ].
Is it easier to create it from the [ TABLE_2 ] data or from the updated [ TABLE_1 ]?
Any help would be highly appreciated.
Thanks in advance.
I'll post some of my own attempts in the comments.
After looking into it (following my comment), I think a stored procedure would be best: you can call it to view the data as a pivot and do away with TABLE_1 entirely. If you need this to be dynamic, you'll need to look into dynamic pivots. This version is a bit of a hack with CTEs:
CREATE PROCEDURE DBO.VIEW_AS_PIVOTED_DATA
AS
-- label each ID's dates newest first: DATE1, DATE2, DATE3
;WITH CTE AS (
    SELECT ID, [DATE],
           'DATE' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR(10)) AS [RN]
    FROM TABLE_2)
-- label each ID's products the same way: PROD1, PROD2, PROD3
, CTE2 AS (
    SELECT ID, PRODUCT,
           'PROD' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [DATE] DESC) AS VARCHAR(10)) AS [RN]
    FROM TABLE_2)
-- turn the labelled dates into columns
, CTE3 AS (
    SELECT ID, [DATE1], [DATE2], [DATE3]
    FROM CTE
    PIVOT(MAX([DATE]) FOR RN IN ([DATE1],[DATE2],[DATE3])) PIV)
-- turn the labelled products into columns
, CTE4 AS (
    SELECT ID, [PROD1], [PROD2], [PROD3]
    FROM CTE2
    PIVOT(MAX(PRODUCT) FOR RN IN ([PROD1],[PROD2],[PROD3])) PIV)
-- join the two pivoted halves back together per ID
SELECT A.ID, [PROD1], [DATE1], [PROD2], [DATE2], [PROD3], [DATE3]
FROM CTE3 AS A
JOIN CTE4 AS B
    ON A.ID = B.ID
The complete query:
WITH ranked AS (
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
)
SELECT id, [prod1],[date1],[prod2],[date2],[prod3],[date3]
FROM
(
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
) unpivoted
PIVOT
(
max(value)
for col IN ([prod1],[date1],[prod2],[date2],[prod3],[date3])
) pivoted
You need to take a few steps to achieve the goal.
Rank your products by date:
SELECT [id]
,[date]
,[product]
,row_number() over (partition by id order by date desc) rn
FROM [sistemy].[dbo].[TABLE_2]
Unpivot your date and product columns into one column. You can use either UNPIVOT or CROSS APPLY; I prefer CROSS APPLY:
SELECT id, type+cast(rn as varchar(1)) col, value
FROM ranked
CROSS APPLY
(
SELECT 'date', CAST([date] AS varchar(8))
UNION ALL
SELECT 'prod', product
) ca(type, value)
Or get the same result using UNPIVOT:
SELECT id, type+cast(rn as varchar(1)) col, value
FROM (
SELECT [id],
rn,
CAST([date] AS varchar(500)) date,
CAST([product] AS varchar(500)) prod
FROM ranked) t
UNPIVOT
(
value FOR type IN ([date], prod)
) unpvt
Finally, use PIVOT to get the result.
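The three steps (rank, unpivot, pivot) can be run end to end in a small sketch. It assumes Python's sqlite3 module with SQLite 3.25 or later. SQLite has neither CROSS APPLY, UNPIVOT nor PIVOT, so UNION ALL stands in for the unpivot and conditional aggregation stands in for the pivot. The question says the order of equal dates doesn't matter. As an added assumption, SQLite's implicit rowid (insertion order) breaks those ties so the output is deterministic.

```python
import sqlite3

TABLE_2_ROWS = [
    (1, 20140101, "X"), (1, 20140102, "Y"), (1, 20140103, "Z"),
    (2, 20141201, "data"), (2, 20141201, "Y"), (2, 20141201, "Z"),
    (3, 20150101, "data2"), (3, 20150101, "data3"), (3, 20160101, "X"),
]

QUERY = """
WITH ranked AS (
    -- step 1: rank each id's products, newest date first;
    -- rowid (insertion order) only breaks ties between equal dates
    SELECT id, "date", product,
           row_number() OVER (PARTITION BY id ORDER BY "date" DESC, rowid) AS rn
    FROM table_2
),
unpivoted AS (
    -- step 2: UNION ALL in place of CROSS APPLY / UNPIVOT
    SELECT id, 'date' || rn AS col, CAST("date" AS TEXT) AS value FROM ranked
    UNION ALL
    SELECT id, 'prod' || rn AS col, product AS value FROM ranked
)
-- step 3: conditional aggregation in place of PIVOT
SELECT id,
       MAX(CASE WHEN col = 'prod1' THEN value END) AS prod1,
       MAX(CASE WHEN col = 'date1' THEN value END) AS date1,
       MAX(CASE WHEN col = 'prod2' THEN value END) AS prod2,
       MAX(CASE WHEN col = 'date2' THEN value END) AS date2,
       MAX(CASE WHEN col = 'prod3' THEN value END) AS prod3,
       MAX(CASE WHEN col = 'date3' THEN value END) AS date3
FROM unpivoted
GROUP BY id
ORDER BY id
"""


def latest_three_products():
    con = sqlite3.connect(":memory:")
    # date is an int formatted yyyymmdd, as in the question
    con.execute('CREATE TABLE table_2 (id INTEGER, "date" INTEGER, product TEXT)')
    con.executemany("INSERT INTO table_2 VALUES (?, ?, ?)", TABLE_2_ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in latest_three_products():
    print(row)
```

Dates come back as text because the unpivot puts them in the same column as the product names, just as the T-SQL version casts them to varchar.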

Remove Almost Duplicate Rows in SQL

I have found a lot of examples online of how to remove duplicate rows in a SQL table but I cannot figure out how to remove almost duplicate rows.
Data Example
+--------+----------+--------+
| Col1 | Col2 | NumCol |
+--------+----------+--------+
| USA | Organic | 300 |
| USA | Organic | 400 |
| Canada | Referral | 120 |
| Canada | Referral | 120 |
+--------+----------+--------+
Desired Output
+--------+----------+--------+
| Col1 | Col2 | NumCol |
+--------+----------+--------+
| USA | Organic | 400 |
| Canada | Referral | 120 |
+--------+----------+--------+
In this example, if 2 rows are identical then I would like one of them to be removed. In addition, if 2 rows match based on Col1 and Col2, then I would like the row with the lesser value in NumCol to be removed.
My SQL Server Express code is:
WITH CTE AS(
SELECT [Col1]
,[Col2]
,[NumCol]
, RN = ROW_NUMBER()OVER(PARTITION BY [Col1]
,[Col2]
,[NumCol] ORDER BY [Col1])
FROM [table]
)
DELETE FROM CTE WHERE RN > 1
This code does a good job of deleting duplicates but it doesn't get rid of rows where only Col1 and Col2 match but not NumCol. How should I approach something like this? I'm a newbie to SQL, so any explanation in layman's terms is appreciated!
You can let the row numbers restart per (Col1, Col2) pair by changing:
RN = ROW_NUMBER()OVER(PARTITION BY [Col1]
,[Col2]
,[NumCol] ORDER BY [Col1])
To:
RN = ROW_NUMBER() OVER(
PARTITION BY Col1, Col2
ORDER BY NumCol desc)
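ORDER BY NumCol DESC gives RN = 1 to the row with the highest NumCol in each (Col1, Col2) group, so that row is kept. Rows with a lower NumCol, and any exact duplicates, get RN > 1 and are deleted.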
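A small sketch to check the corrected partitioning on the question's sample data, assuming Python's sqlite3 module and a hypothetical table name yourtable. SQLite cannot DELETE through a CTE the way SQL Server can, so this version collects the rowids ranked RN > 1 in a subquery and deletes them from the base table.

```python
import sqlite3


def remove_almost_duplicates():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (Col1 TEXT, Col2 TEXT, NumCol INTEGER)")
    con.executemany(
        "INSERT INTO yourtable VALUES (?, ?, ?)",
        [("USA", "Organic", 300), ("USA", "Organic", 400),
         ("Canada", "Referral", 120), ("Canada", "Referral", 120)],
    )
    # Partition on (Col1, Col2) only, highest NumCol first; every row after
    # the first in its partition (RN > 1) is deleted by rowid.
    con.execute("""
        DELETE FROM yourtable
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY Col1, Col2
                                          ORDER BY NumCol DESC) AS RN
                FROM yourtable
            )
            WHERE RN > 1
        )
    """)
    rows = con.execute(
        "SELECT Col1, Col2, NumCol FROM yourtable ORDER BY NumCol DESC"
    ).fetchall()
    con.close()
    return rows


print(remove_almost_duplicates())
```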

How to select last occurrence of duplicating record in oracle

I am having a problem with an Oracle query where the basic goal is to get the last row of each group of recurring rows, but there's a complication that you'll understand from the data:
Suppose I have a table that looks like this:
ID | COL1 | COL2 | COL3 | UPDATED_DATE
------|------|------|------|-------------
001 | a | b | c | 14/05/2013
002 | a | b | c | 16/05/2013
003 | a | b | c | 12/05/2013
You should be able to see that, since columns 1 to 3 have the same values in all 3 rows, they are recurring data. The problem is that I want to get the most recently updated row, which is row #2.
I have an existing query that works if the table has no ID column, but I still need that column, so if anybody could help me point out what I'm doing wrong, that'd be great.
select col1,
col2,
col3,
max(updated_date)
from tbl
group by col1, col2, col3;
The above query returns me row #2, which is correct, but I still need the ID.
Note: I know that I could wrap the above query in another query that selects the ID column based on the 4 columns, but since I'm dealing with millions of records, the extra query would make the app very inefficient.
Try
WITH qry AS
(
SELECT ID, COL1, COL2, COL3, updated_date,
ROW_NUMBER() OVER (PARTITION BY COL1, COL2, COL3 ORDER BY updated_date DESC) rank
FROM tbl
)
SELECT ID, COL1, COL2, COL3, updated_date
FROM qry
WHERE rank = 1
or
SELECT t1.ID, t2.COL1, t2.COL2, t2.COL3, t2.updated_date
FROM tbl t1 JOIN
(
SELECT COL1, COL2, COL3, MAX(updated_date) updated_date
FROM tbl
GROUP BY COL1, COL2, COL3
) t2 ON t1.COL1 = t2.COL1
AND t1.COL2 = t2.COL2
AND t1.COL3 = t2.COL3
AND t1.updated_date = t2.updated_date
Output in both cases:
| ID | COL1 | COL2 | COL3 | UPDATED_DATE |
--------------------------------------------------------
| 2 | a | b | c | May, 16 2013 00:00:00+0000 |
Here is a SQLFiddle demo for both queries.
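Both queries can also be exercised locally in a sketch that assumes Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER). The SQL is the same as above except that the alias rank is renamed rn. SQLite has no DATE type, so, as an assumption, the question's dd/mm/yyyy dates are stored as ISO-8601 text, which sorts correctly with MAX and ORDER BY.

```python
import sqlite3

ROWS = [
    ("001", "a", "b", "c", "2013-05-14"),
    ("002", "a", "b", "c", "2013-05-16"),
    ("003", "a", "b", "c", "2013-05-12"),
]

ROW_NUMBER_QUERY = """
WITH qry AS (
    SELECT ID, COL1, COL2, COL3, updated_date,
           ROW_NUMBER() OVER (PARTITION BY COL1, COL2, COL3
                              ORDER BY updated_date DESC) AS rn
    FROM tbl
)
SELECT ID, COL1, COL2, COL3, updated_date FROM qry WHERE rn = 1
"""

JOIN_QUERY = """
SELECT t1.ID, t2.COL1, t2.COL2, t2.COL3, t2.updated_date
FROM tbl t1
JOIN (
    SELECT COL1, COL2, COL3, MAX(updated_date) AS updated_date
    FROM tbl
    GROUP BY COL1, COL2, COL3
) t2 ON t1.COL1 = t2.COL1
    AND t1.COL2 = t2.COL2
    AND t1.COL3 = t2.COL3
    AND t1.updated_date = t2.updated_date
"""


def latest_rows():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (ID TEXT, COL1 TEXT, COL2 TEXT, COL3 TEXT, updated_date TEXT)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)
    result = (
        con.execute(ROW_NUMBER_QUERY).fetchall(),
        con.execute(JOIN_QUERY).fetchall(),
    )
    con.close()
    return result


print(latest_rows())
```

The two queries differ only when several rows tie on the latest date. ROW_NUMBER keeps exactly one of them, while the join returns all of them.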
