Snowflake Flatten scenario - snowflake-cloud-data-platform

I have a variant structure in a source table that I need to read, look up against a reference table, and load as another variant structure into a target table.
-- Source table: clicks
create table clicks
(payload variant
);
-- set data in clicks
insert into clicks
select
object_construct(
'language', 'en-us',
'browser', 'chrome',
'color', '35',
'event_ids', '190,195'
)
union
select
object_construct(
'language', 'en-us',
'browser', 'chrome',
'color', '32',
'event_ids', '201,203,190,195'
)
union
select
object_construct(
'language', 'en-us',
'browser', 'mozilla',
'color', '38',
'event_ids', '188,190,202,203,195,176'
);
-- Target table: clicks_processed
create table clicks_processed
(payload variant,
click_events variant
);
-- Lookup table: click_event
create table click_event
(
key varchar,
event_id varchar,
desc varchar
);
-- set up data for click_event
insert into click_event (key,event_id,desc)
select 'event201','201','pic_click'
union
select 'event202','202','vid_view'
union
select 'event203','203','download';
Requirement:
Insert records into the clicks_processed table.
The payload column of clicks_processed is the same payload from the clicks table.
click_events column of clicks_processed: the event_ids key has to be pulled out of clicks.payload and joined with the lookup table "click_event" to form a variant that is inserted into the click_events column of clicks_processed.
Desired output: clicks_processed (only event_ids with a match in click_event get populated under click_events).
Note: the first record has null for click_events because none of its event_ids match the click_event lookup table (I still need the record).
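Rendered from the sample data above, the desired rows would look roughly like this (key order inside the variant is not significant):
event_ids '190,195' -> click_events = NULL
event_ids '201,203,190,195' -> click_events = { "event201": { "code": "201", "desc": "pic_click" }, "event203": { "code": "203", "desc": "download" } }
event_ids '188,190,202,203,195,176' -> click_events = { "event202": { "code": "202", "desc": "vid_view" }, "event203": { "code": "203", "desc": "download" } }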
Current code I've come up with:
select max(payload),
object_agg(e.key,object_construct(
'code', e.event_id,
'desc', e.desc
)) as click_events
from clicks c, table(flatten(input => split(c.payload:event_ids,','))) as f
inner join click_event e on e.event_id = f.value
group by seq
This works fine when every record has event_ids that match click_event, but not otherwise (the first record in the example above); a left join with click_event also didn't help.
Also, I'm not sure this is the best approach; I'm especially worried about the max(payload) and the group by. In the real scenario the payload is huge, with about 500 keys, and event_ids can hold around 30-50 events, which means the flatten also blows up into that many records per row.
Thanks!

select any_value(payload),
object_agg(e.key,object_construct(
'code', e.event_id,
'desc', e.desc
)) as click_events
from (
select f.seq, f.value::varchar as value, c.payload
from clicks c,
lateral flatten(input => split(c.payload:event_ids,',')) as f
) a
left join click_event e
on a.value = e.event_id
group by a.seq;
Flatten the array and then join back to click_event as a left join. This gives you an empty object ({}) as opposed to a NULL, though.
Also, leverage the any_value() function over max(), since the payload will always be the same within a seq value.
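If you need NULL instead of the empty object for unmatched rows (as the desired output asks), one option is to guard the aggregate with a count of matched lookup rows. A minimal sketch along those lines, reusing the query above; the CASE guard is my own suggestion, not part of the original answer:
insert into clicks_processed (payload, click_events)
select any_value(a.payload),
       -- e.event_id is NULL on unmatched rows and COUNT ignores NULLs,
       -- so the count is 0 exactly when nothing matched
       case when count(e.event_id) = 0 then null
            else object_agg(e.key, object_construct('code', e.event_id, 'desc', e.desc))
       end as click_events
from (
    select f.seq, f.value::varchar as value, c.payload
    from clicks c,
    lateral flatten(input => split(c.payload:event_ids, ',')) as f
) a
left join click_event e
  on a.value = e.event_id
group by a.seq;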

Related

How to use cross apply string split result to update a table in sql?

I am trying to split a column ('categories') of a table 'movies_titles' which has comma-separated values in it.
e.g:
ID title categories
1 Movie A Comedy, Drama, Romance
2 Movie B Animation
3 Movie C Documentary, Life changing
I want to split the comma delimited string and place each values in a separate rows and update the table
-- this query shows the split strings as I want them
SELECT *
FROM dbo.movies_titles
CROSS APPLY
string_split(categories, ',')
O/P:
ID title categories value
1 Movie A Comedy, Drama, Romance Comedy
1 Movie A Comedy, Drama, Romance Drama
1 Movie A Comedy, Drama, Romance Romance
2 Movie B Animation Animation
3 Movie C Documentary, Life changing Documentary
3 Movie C Documentary, Life changing Life changing
I want to use an UPDATE query to store the result from the value column; I don't just want a SELECT to view the result, I want to permanently apply the changes to the table. How do I achieve this in SQL Server?
You can get close to your intention by creating new rows, because an UPDATE statement won't create the additional rows produced by the split.
There can be issues if the ID column is unique, like a primary key, and there is the need to keep the title associated with that column.
I've created two scenarios on DB Fiddle, showing how you can do this using only one table as the question instructed, but a better alternative would be to save this information on another table.
This code on DB Fiddle: link
--Assuming your table is something like this
create table movies_id_as_pk (
ID int identity(1,1) primary key,
title varchar(200),
categories varchar(200),
category varchar(200)
)
--Or this
create table movies_other_pk (
another_id int identity(1,1) primary key,
ID int,
title varchar(200),
categories varchar(200),
category varchar(200)
)
--The example data
set identity_insert movies_id_as_pk on
insert into movies_id_as_pk (ID, title, categories) values
(1, 'Movie A', 'Comedy, Drama, Romance'),
(2, 'Movie B', 'Animation'),
(3, 'Movie C', 'Documentary, Life changing')
set identity_insert movies_id_as_pk off
insert into movies_other_pk (ID, title, categories)
select ID, title, categories from movies_id_as_pk
--You can't directly update either table: since the result of the split
--has more rows than the table, the update would just leave the first value found:
update m set category = rtrim(ltrim(s.value))
from movies_id_as_pk m
cross apply string_split(m.categories, ',') as s
update m set category = rtrim(ltrim(s.value))
from movies_other_pk m
cross apply string_split(m.categories, ',') as s
select * from movies_id_as_pk
select * from movies_other_pk
--What you can do is create the additional rows, inserting them:
--First, let's undo what the last instructions have changed
update movies_id_as_pk set category=NULL
update movies_other_pk set category=NULL
--Then use inserts to create the rows with the categories split
insert into movies_id_as_pk (title, category)
select m.title, rtrim(ltrim(s.value))
from movies_id_as_pk m
cross apply string_split(m.categories, ',') as s
insert into movies_other_pk (ID, title, category)
select m.ID, m.title, rtrim(ltrim(s.value))
from movies_other_pk m
cross apply string_split(m.categories, ',') as s
select * from movies_id_as_pk
select * from movies_other_pk
It actually is possible to insert or update at the same time. That is to say: we can update each row with a single category, then create new rows for the extra ones.
We can use MERGE for this. We can use the same table as source and target. We just need to split the source, then add a row-number partitioned per each original row. We then filter the ON clause to match only the first row.
WITH Source AS (
SELECT
m.ID,
m.title,
category = TRIM(cat.value),
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL))
FROM movies m
CROSS APPLY STRING_SPLIT(m.categories, ',') cat
)
MERGE movies t
USING Source s
ON s.ID = t.ID AND s.rn = 1
WHEN MATCHED THEN
UPDATE
SET categories = s.category
WHEN NOT MATCHED THEN
INSERT (ID, title, categories)
VALUES (s.ID, s.title, s.category)
;
db<>fiddle
I wouldn't necessarily recommend this as a general solution though, because it appears you actually have other normalization problems to sort out first. You should really have separate tables for all this information (a sketch follows the list):
Movie
Category
MovieCategory
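A minimal sketch of that normalized design; table and column names here are illustrative, not from the question:
create table Movie (
MovieID int identity(1,1) primary key,
title varchar(200)
)
create table Category (
CategoryID int identity(1,1) primary key,
name varchar(200)
)
--junction table: one row per movie/category pair
create table MovieCategory (
MovieID int references Movie(MovieID),
CategoryID int references Category(CategoryID),
primary key (MovieID, CategoryID)
)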

How to write the COLUMNS to ROWS in SQL server [duplicate]

Looking for an elegant (or any) solution to convert columns to rows.
Here is an example: I have a table with the following schema:
[ID] [EntityID] [Indicator1] [Indicator2] [Indicator3] ... [Indicator150]
Here is what I want to get as the result:
[ID] [EntityId] [IndicatorName] [IndicatorValue]
And the result values will be:
1 1 'Indicator1' 'Value of Indicator 1 for entity 1'
2 1 'Indicator2' 'Value of Indicator 2 for entity 1'
3 1 'Indicator3' 'Value of Indicator 3 for entity 1'
4 2 'Indicator1' 'Value of Indicator 1 for entity 2'
And so on..
Does this make sense? Do you have any suggestions on where to look and how to get it done in T-SQL?
You can use the UNPIVOT function to convert the columns into rows:
select id, entityId,
indicatorname,
indicatorvalue
from yourtable
unpivot
(
indicatorvalue
for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;
Note: the datatypes of the columns you are unpivoting must be the same, so you might have to convert them prior to applying the unpivot.
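For example, a hedged sketch of casting the indicators to a common type in a derived table first (the nvarchar length is an arbitrary choice):
select id, entityId,
indicatorname,
indicatorvalue
from
(
select id, entityId,
cast(Indicator1 as nvarchar(100)) as Indicator1,
cast(Indicator2 as nvarchar(100)) as Indicator2,
cast(Indicator3 as nvarchar(100)) as Indicator3
from yourtable
) src
unpivot
(
indicatorvalue
for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;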
You could also use CROSS APPLY with UNION ALL to convert the columns:
select id, entityid,
indicatorname,
indicatorvalue
from yourtable
cross apply
(
select 'Indicator1', Indicator1 union all
select 'Indicator2', Indicator2 union all
select 'Indicator3', Indicator3 union all
select 'Indicator4', Indicator4
) c (indicatorname, indicatorvalue);
Depending on your version of SQL Server you could even use CROSS APPLY with the VALUES clause:
select id, entityid,
indicatorname,
indicatorvalue
from yourtable
cross apply
(
values
('Indicator1', Indicator1),
('Indicator2', Indicator2),
('Indicator3', Indicator3),
('Indicator4', Indicator4)
) c (indicatorname, indicatorvalue);
Finally, if you have 150 columns to unpivot and you don't want to hard-code the entire query, then you could generate the sql statement using dynamic SQL:
DECLARE @colsUnpivot AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @colsUnpivot
= stuff((select ','+quotename(C.column_name)
from information_schema.columns as C
where C.table_name = 'yourtable' and
C.column_name like 'Indicator%'
for xml path('')), 1, 1, '')
set @query
= 'select id, entityId,
indicatorname,
indicatorvalue
from yourtable
unpivot
(
indicatorvalue
for indicatorname in ('+ @colsUnpivot +')
) u'
exec sp_executesql @query;
Well, if you have 150 columns, then I think that UNPIVOT is not an option. So you could use an XML trick:
;with CTE1 as (
select ID, EntityID, (select t.* for xml raw('row'), type) as Data
from temp1 as t
), CTE2 as (
select
C.id, C.EntityID,
F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
F.C.value('.', 'nvarchar(max)') as IndicatorValue
from CTE1 as c
outer apply c.Data.nodes('row/@*') as F(C)
)
select * from CTE2 where IndicatorName like 'Indicator%'
sql fiddle demo
You could also write dynamic SQL, but I like xml more - for dynamic SQL you have to have permissions to select data directly from table and that's not always an option.
UPDATE: As there's a big flame war in the comments, I think I'll add some pros and cons of XML vs. dynamic SQL. I'll try to be as objective as I can and not mention elegance or ugliness. If you have any other pros and cons, edit the answer or write them in the comments.
cons
it's not as fast as dynamic SQL; rough tests gave me that XML is about 2.5 times slower than dynamic SQL (it was one query on a ~250000-row table, so this estimate is in no way exact). You can compare them yourself if you want; here's a sqlfiddle example: on 100000 rows it was 29s (XML) vs 14s (dynamic);
it may be harder to understand for people not familiar with XPath;
pros
it's in the same scope as your other queries, and that can be very handy. A few examples come to mind:
you can query the inserted and deleted tables inside your trigger (not possible with dynamic SQL at all);
users don't need permission to select directly from the table. What I mean is, if you have a stored-procedure layer and users have permission to run the procedures but not to query the tables directly, you can still use this query inside a stored procedure;
you can query a table variable you have populated in your scope (to pass it into dynamic SQL you have to either make it a temporary table instead, or create a table type and pass it as a parameter into the dynamic SQL);
you can run this query inside a function (scalar or table-valued); it's not possible to use dynamic SQL inside functions. A sketch of this last point follows.
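For that last point, a minimal sketch of the XML query wrapped in an inline table-valued function, which dynamic SQL cannot do (the function name is illustrative; temp1 is the table from the XML example above):
create function dbo.UnpivotIndicators()
returns table
as
return
with CTE1 as (
    select ID, EntityID, (select t.* for xml raw('row'), type) as Data
    from temp1 as t
)
select
    c.ID, c.EntityID,
    F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
    F.C.value('.', 'nvarchar(max)') as IndicatorValue
from CTE1 as c
outer apply c.Data.nodes('row/@*') as F(C);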
Just to help new readers, I've created an example to better understand @bluefeet's answer about UNPIVOT.
SELECT id
,entityId
,indicatorname
,indicatorvalue
FROM (VALUES
(1, 1, 'Value of Indicator 1 for entity 1', 'Value of Indicator 2 for entity 1', 'Value of Indicator 3 for entity 1'),
(2, 1, 'Value of Indicator 1 for entity 2', 'Value of Indicator 2 for entity 2', 'Value of Indicator 3 for entity 2'),
(3, 1, 'Value of Indicator 1 for entity 3', 'Value of Indicator 2 for entity 3', 'Value of Indicator 3 for entity 3'),
(4, 2, 'Value of Indicator 1 for entity 4', 'Value of Indicator 2 for entity 4', 'Value of Indicator 3 for entity 4')
) AS Category(ID, EntityId, Indicator1, Indicator2, Indicator3)
UNPIVOT
(
indicatorvalue
FOR indicatorname IN (Indicator1, Indicator2, Indicator3)
) UNPIV;
Just because I did not see it mentioned: if you are on SQL Server 2016+, here is yet another option to dynamically unpivot data without actually using dynamic SQL.
Example
Declare @YourTable Table ([ID] varchar(50),[Col1] varchar(50),[Col2] varchar(50))
Insert Into @YourTable Values
(1,'A','B')
,(2,'R','C')
,(3,'X','D')
Select A.[ID]
,Item = B.[Key]
,Value = B.[Value]
From @YourTable A
Cross Apply ( Select *
From OpenJson((Select A.* For JSON Path,Without_Array_Wrapper ))
Where [Key] not in ('ID','Other','Columns','ToExclude')
) B
Returns
ID Item Value
1 Col1 A
1 Col2 B
2 Col1 R
2 Col2 C
3 Col1 X
3 Col2 D
I needed a solution to convert columns to rows in Microsoft SQL Server without knowing the column names (this is used in a trigger) and without dynamic SQL (dynamic SQL is too slow for use in a trigger).
I finally found this solution, which works fine:
SELECT
insRowTbl.PK,
insRowTbl.Username,
attr.insRow.value('local-name(.)', 'nvarchar(128)') as FieldName,
attr.insRow.value('.', 'nvarchar(max)') as FieldValue
FROM ( Select
i.ID as PK,
i.LastModifiedBy as Username,
convert(xml, (select i.* for xml raw)) as insRowCol
FROM inserted as i
) as insRowTbl
CROSS APPLY insRowTbl.insRowCol.nodes('/row/@*') as attr(insRow)
As you can see, I convert the row into XML (the subquery select i.* for xml raw converts all columns into one XML column).
Then I CROSS APPLY a function to each XML attribute of this column, so that I get one row per attribute.
Overall, this converts columns into rows, without knowing the column names and without using dynamic SQL. It is fast enough for my purpose.
(Edit: I just saw Roman Pekar's answer above, which does the same thing.
I first used a dynamic SQL trigger with cursors, which was 10 to 100 times slower than this solution, but maybe that was caused by the cursor rather than the dynamic SQL. Anyway, this solution is very simple and universal, so it's definitely an option.)
I am leaving this comment at this place, because I want to reference this explanation in my post about the full audit trigger, that you can find here: https://stackoverflow.com/a/43800286/4160788
DECLARE @TableName varchar(max)=NULL
SELECT @TableName=COALESCE(@TableName+',','')+t.TABLE_CATALOG+'.'+ t.TABLE_SCHEMA+'.'+o.Name
FROM sysindexes AS i
INNER JOIN sysobjects AS o ON i.id = o.id
INNER JOIN INFORMATION_SCHEMA.TABLES T ON T.TABLE_NAME=o.name
WHERE i.indid < 2
AND OBJECTPROPERTY(o.id,'IsMSShipped') = 0
AND i.rowcnt >350
AND o.xtype !='TF'
ORDER BY o.name ASC
print @TableName
This gets you the list of tables with row counts > 350, collapsed into a single comma-separated value.
The opposite of this is to collapse rows back into a CSV, e.g.:
SELECT STRING_AGG ([value],',') FROM STRING_SPLIT('Akio,Hiraku,Kazuo', ',')

How to delete Duplicate records in snowflake database table

How do I delete duplicate records from a Snowflake table? Thanks.
ID Name
1 Apple
1 Apple
2 Apple
3 Orange
3 Orange
Result should be:
ID Name
1 Apple
2 Apple
3 Orange
Adding here a solution that doesn't recreate the table. This because recreating a table can break a lot of existing configurations and history.
Instead we are going to delete only the duplicate rows and insert a single copy of each, within a transaction:
-- find all duplicates
create or replace transient table duplicate_holder as (
select $1, $2, $3
from some_table
group by 1,2,3
having count(*)>1
);
-- time to use a transaction to insert and delete
begin transaction;
-- delete duplicates
delete from some_table a
using duplicate_holder b
where (a.$1,a.$2,a.$3)=(b.$1,b.$2,b.$3);
-- insert single copy
insert into some_table
select *
from duplicate_holder;
-- we are done
commit;
Advantages:
Doesn't recreate the table
Doesn't drop or alter the original table's structure
Only deletes and inserts duplicated rows (good for time travel storage costs, avoids unnecessary reclustering)
All in a transaction
If you have some primary key as such:
CREATE TABLE fruit (key number, id number, name text);
insert into fruit values (1,1, 'Apple'), (2,1,'Apple'),
(3,2, 'Apple'), (4,3, 'Orange'), (5,3, 'Orange');
then:
DELETE FROM fruit
WHERE key in (
SELECT key
FROM (
SELECT key
,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY key) AS rn
FROM fruit
)
WHERE rn > 1
);
But if you do not have a unique key then you cannot delete that way. At that point, a
CREATE TABLE new_table_name AS
SELECT id, name FROM (
SELECT id
,name
,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id, name) AS rn
FROM table_name
)
WHERE rn = 1
and then swap them
ALTER TABLE table_name SWAP WITH new_table_name
Here's a very simple approach that doesn't need any temporary tables. It will work very nicely for small tables, but might not be the best approach for large tables.
insert overwrite into some_table
select distinct * from some_table
;
The OVERWRITE keyword means that the table will be truncated before the insert takes place.
Snowflake does not enforce primary keys; their use is primarily for ERD tools.
Snowflake does not have anything like a ROWID either, so there is no direct way to identify duplicates for deletion.
It is possible to temporarily add an "is_duplicate" column, e.g. by numbering all the duplicates with the ROW_NUMBER() function, then deleting all records with "is_duplicate" > 1 and finally dropping the utility column.
Another way is to create a duplicate table and swap, as others have suggested.
However, constraints and grants must be kept. One way to do this is:
CREATE TABLE new_table LIKE old_table COPY GRANTS;
INSERT INTO new_table SELECT DISTINCT * FROM old_table;
ALTER TABLE old_table SWAP WITH new_table;
The code above removes exact duplicates. If you want to end up with one row per "PK", you need to include logic to select which copy you want to keep (a sketch follows).
This illustrates the importance of adding update timestamp columns in a Snowflake data warehouse.
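A hedged sketch of that keep-one-copy logic, assuming an update timestamp column exists (the id and updated_at column names are illustrative):
CREATE TABLE new_table LIKE old_table COPY GRANTS;
INSERT INTO new_table
SELECT * FROM old_table
-- keep the most recently updated copy of each id
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1;
ALTER TABLE old_table SWAP WITH new_table;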
This has been bothering me for some time as well. As Snowflake has added support for QUALIFY, you can now create a deduplicated table with a single statement, without subselects:
CREATE TABLE fruit (id number, nam text);
insert into fruit values (1, 'Apple'), (1,'Apple'),
(2, 'Apple'), (3, 'Orange'), (3, 'Orange');
CREATE OR REPLACE TABLE fruit AS
SELECT * FROM
fruit
qualify row_number() OVER (PARTITION BY id, nam ORDER BY id, nam) = 1;
SELECT * FROM fruit;
Of course you are left with a new table, and you lose table history, primary keys, foreign keys and such.
Based on the above ideas, the following query worked perfectly in my case:
CREATE OR REPLACE TABLE SCHEMA.table
AS
SELECT
DISTINCT *
FROM
SCHEMA.table
;
Your question boils down to: how can I delete one of two perfectly identical rows? You can't. You can only do a DELETE FROM fruit WHERE ID = 1 AND Name = 'Apple';, and then both rows will go away. Or you don't, and keep both.
For some databases, there are workarounds using internal row IDs, but there isn't one in Snowflake, see https://support.snowflake.net/s/question/0D50Z00008FQyGqSAL/is-there-an-internalmetadata-unique-rowid-in-snowflake-that-i-can-reference . You cannot limit deletes either, so your only option is to create a new table and swap.
Additional note on Hans Henrik Eriksen's remark on the importance of update timestamps: these are a real help when the duplicates were added later. If, for example, you want to keep the newer values, you can then do this:
-- setup
create table fruit (ID Integer, Name VARCHAR(16777216), "UPDATED_AT" TIMESTAMP_NTZ);
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (2, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- wait > 1 nanosecond
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- delete older duplicates (DESC)
DELETE FROM fruit
WHERE (ID
, UPDATED_AT) IN (
SELECT ID
, UPDATED_AT
FROM (
SELECT ID
, UPDATED_AT
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS rn
FROM fruit
)
WHERE rn > 1
);
A simple UNION eliminates duplicates for the use case of deduplicating on all columns (no PKs).
Anyway, the problem should be solved as early as possible in the ingestion pipeline, and/or by using SCD techniques etc.
Hunting for one magic best way to delete is wrong in principle; use SCD with a high-resolution timestamp and it solves any such problem.
Do you want to fix a massive duplicate load? Then add a column like a batch ID and remove all the records loaded in that batch (a sketch follows).
It's like staying healthy; you have two approaches:
eat a lot > get fat > go to a gym to burn it off
eat well > have a healthy lifestyle and no need for the gym.
So before discussing the best gym, try changing the lifestyle.
Hope this helps; learn to put pressure upstream on data producers instead of spending your life cleaning up everyone else's mess.
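A minimal sketch of that batch-ID idea; the column name and value are illustrative assumptions:
-- assumes every load stamps its rows with a batch_id at ingestion time
DELETE FROM some_table
WHERE batch_id = 'load_20210601';  -- the batch that was loaded twice
-- then re-run the corrected load for that batch only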
The following solution is effective if you are looking at one or a few columns as the primary-key reference for the table.
-- Create a temp table to hold our duplicates (only second occurrence)
CREATE OR REPLACE TRANSIENT TABLE temp_table AS (
SELECT [col1], [col2], .. [coln]
FROM (
SELECT *, ROW_NUMBER () OVER(
PARTITION BY [pk]1, [pk]2, .. [pk]m
ORDER BY [pk]1, [pk]2, .. [pk]m) AS duplicate_count
FROM [schema].[table]
) WHERE duplicate_count = 2
);
-- Delete all the duplicate records from the table
DELETE FROM [schema].[table] t1
USING temp_table t2
WHERE
t1.[pk]1 = t2.[pk]1 AND
t1.[pk]2 = t2.[pk]2 AND
..
t1.[pk]m = t2.[pk]m;
-- Insert single copy using the temp_table in the original table
INSERT INTO [schema].[table]
SELECT *
FROM temp_table;
This is inspired by @Felipe Hoffa's answer:
-- create a table with the dupes and take the max ID
create or replace transient table duplicate_holder as (
select max(S.ID) ID, some_field, count(some_field) numberAssets
from some_table S
group by some_field
having count(some_field)>1
)
-- join back to the original table on the field (excluding the ID kept in the duplicate table) and delete
delete from some_table as t
USING duplicate_holder as d
WHERE t.some_field=d.some_field
and t.id <> d.id
Not sure if people are still interested in this, but I've used the query below, which is more elegant and seems to have worked:
create or replace table {{your_table}} as
select * from {{your_table}}
qualify row_number() over (partition by {{criteria_columns}} order by 1) = 1

Naming a group_concat column in a select

sqlite3
I have two tables. One contains some lists and the other contains each list's items.
I want to create a select statement that grabs the rows in the lists table, but also creates a column which is a comma-delimited summary of the items in each list.
I have this working as follows:
select
master._id as _id,
master.name as name,
master.created_on as created_on,
group_concat(items.name, ', ')
from
tablea master
join
tableb items
on
master._id = items.master_id
group by
master._id
However, I would like to name the column returned by the group_concat as "summary" like so:
select
master._id as _id,
master.name as name,
master.created_on as created_on,
group_concat(items.name, ', ') as summary
from
tablea master
join
tableb items
on
master._id = items.master_id
group by
master._id
When I do this, I get an SQL error:
"SQL error: near "summary": syntax error"
How can I achieve what I'm wanting to do?
I would also like to order the items in the group_concat alphabetically descending, but naming the column is my first priority.
"AS" is optional. However, both with and without "AS" works fine for me (using SQLite version 3.6.14.2):
drop table tablea;
drop table tableb;
create table tablea(_id int, name varchar, created_on varchar);
create table tableb(master_id int, name varchar);
insert into tablea values(0, 'Hello', '2010');
insert into tableb values(0, 'x');
select
master._id as _id,
master.name as name,
master.created_on as created_on,
group_concat(items.name, ', ') as summary
from
tablea master
join
tableb items
on
master._id = items.master_id
group by
master._id
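On the second part of the question (ordering the concatenated items): group_concat in older SQLite versions has no ORDER BY clause of its own, and the aggregation order is technically undefined. A commonly used workaround, which usually works in practice but is not guaranteed, is to sort in a subquery first:
select
_id,
name,
created_on,
group_concat(item_name, ', ') as summary
from (
select
master._id as _id,
master.name as name,
master.created_on as created_on,
items.name as item_name
from
tablea master
join
tableb items
on
master._id = items.master_id
order by
items.name desc
)
group by
_id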

SQL Server: Examples of PIVOTing String data

Trying to find some simple SQL Server PIVOT examples. Most of the examples that I have found involve counting or summing up numbers. I just want to pivot some string data. For example, I have a query returning the following.
Action1 VIEW
Action1 EDIT
Action2 VIEW
Action3 VIEW
Action3 EDIT
I would like to use PIVOT (if even possible) to make the results like so:
Action1 VIEW EDIT
Action2 VIEW NULL
Action3 VIEW EDIT
Is this even possible with the PIVOT functionality?
Remember that the MAX aggregate function will work on text as well as numbers. This query will only require the table to be scanned once.
SELECT Action,
MAX( CASE data WHEN 'View' THEN data ELSE '' END ) ViewCol,
MAX( CASE data WHEN 'Edit' THEN data ELSE '' END ) EditCol
FROM t
GROUP BY Action
Table setup:
CREATE TABLE dbo.tbl (
action VARCHAR(20) NOT NULL,
view_edit VARCHAR(20) NOT NULL
);
INSERT INTO dbo.tbl (action, view_edit)
VALUES ('Action1', 'VIEW'),
('Action1', 'EDIT'),
('Action2', 'VIEW'),
('Action3', 'VIEW'),
('Action3', 'EDIT');
Your table:
SELECT action, view_edit FROM dbo.tbl
Query without using PIVOT:
SELECT Action,
[View] = (Select view_edit FROM tbl WHERE t.action = action and view_edit = 'VIEW'),
[Edit] = (Select view_edit FROM tbl WHERE t.action = action and view_edit = 'EDIT')
FROM tbl t
GROUP BY Action
Query using PIVOT:
SELECT [Action], [View], [Edit] FROM
(SELECT [Action], view_edit FROM tbl) AS t1
PIVOT (MAX(view_edit) FOR view_edit IN ([View], [Edit]) ) AS t2
Both queries produce the same result:
Action1 VIEW EDIT
Action2 VIEW NULL
Action3 VIEW EDIT
If you specifically want to use the SQL Server PIVOT function, then this should work, assuming your two original columns are called act and cmd. (Not that pretty to look at though.)
SELECT act AS 'Action', [View] as 'View', [Edit] as 'Edit'
FROM (
SELECT act, cmd FROM data
) AS src
PIVOT (
MAX(cmd) FOR cmd IN ([View], [Edit])
) AS pvt
From http://blog.sqlauthority.com/2008/06/07/sql-server-pivot-and-unpivot-table-examples/:
SELECT CUST, PRODUCT, QTY
FROM (
SELECT CUST, VEG, SODA, MILK, BEER, CHIPS
FROM (
SELECT CUST, PRODUCT, QTY
FROM Product) up
PIVOT
( SUM(QTY) FOR PRODUCT IN (VEG, SODA, MILK, BEER, CHIPS)) AS pvt) p
UNPIVOT
(QTY FOR PRODUCT IN (VEG, SODA, MILK, BEER, CHIPS)
) AS Unpvt
GO
Well, for your sample, and any case with a limited number of unique columns, this should do it:
select
distinct a,
(select distinct t2.b from t t2 where t1.a=t2.a and t2.b='VIEW'),
(select distinct t2.b from t t2 where t1.a=t2.a and t2.b='EDIT')
from t t1
With pivot_data as
(
select
action, -- grouping column
view_edit -- spreading column
from tbl
)
select action, [view], [edit]
from pivot_data
pivot ( max(view_edit) for view_edit in ([view], [edit]) ) as p;
I had a situation where I was parsing strings, and the first two positions of the string in question would be the field names of a healthcare claims coding standard. So I would strip out the strings and get values for F4, UR, UQ and so on. This was great on one record or a few records for one user. But when I wanted to see hundreds of records and the values for all users, it needed to be a PIVOT. This was wonderful, especially for exporting lots of records to Excel. The specific reporting request I had received was "every time someone submitted a claim for Benadryl, what value did they submit in fields F4, UR, and UQ?". I had an OUTER APPLY that created the ColTitle and value fields used by the PIVOT below:
PIVOT(
min(value)
FOR ColTitle in([F4], [UR], [UQ])
)
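For readers who want to see that fragment in context, here is a hedged sketch of how such a query might be assembled; the claims table and its columns are illustrative assumptions, not the original schema:
SELECT UserName, [F4], [UR], [UQ]
FROM (
    -- OUTER APPLY turns each parsed field into a (ColTitle, value) pair
    SELECT c.UserName, x.ColTitle, x.value
    FROM claims c
    OUTER APPLY (VALUES ('F4', c.F4_value),
                        ('UR', c.UR_value),
                        ('UQ', c.UQ_value)) x (ColTitle, value)
) src
PIVOT(
    min(value)
    FOR ColTitle in([F4], [UR], [UQ])
) p;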
