sql server pivot string from one column to three columns - sql-server

I may be approaching this problem the wrong way. I've researched pivot examples such as
http://www.codeproject.com/Tips/500811/Simple-Way-To-Use-Pivot-In-SQL-Query
How to create a pivot query in sql server without aggregate function
but they aren't the type I'm looking for. I'm also new to SQL Server.
I want to transform:
Student:
studid | firstname | lastname | school
-----------------------------------------
1      | mike      | lee      | harvard
1      | mike      | lee      | ucdavis
1      | mike      | lee      | sfsu
2      | peter     | pan      | chico
2      | peter     | pan      | ulloa
3      | peter     | smith    | ucb
Desired output (note that I want at most 3 school columns):
studid | firstname | lastname | school1 | school2 | school3
---------------------------------------------------------------------
1      | mike      | lee      | harvard | ucdavis | sfsu
2      | peter     | pan      | chico   | ulloa   |
3      | peter     | smith    | ucb     |         |
The tutorials I've seen show the use of SUM(), COUNT() ... but I have no idea how to pivot the string values of one column into three columns.

You can get the result you want by taking max(school) for each pivot value. I'm guessing the pivot value you want is the rank of each school within a student, ordered by school name. This would be the query for that:
select studid, firstname, lastname, [1] as school1, [2] as school2, [3] as school3
from
(select *, rank() over (partition by studid order by school) as rnk from student) r
pivot (max(school) for rnk in ([1],[2],[3])) pv
Note that max doesn't really do anything here: each student/rank cell holds at most one distinct school, so the query returns the same results if you replace it with min. The pivot syntax simply requires an aggregate function. Also note that the schools come out in alphabetical order, because the table has no other column to order them by.
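The same reshaping can be tried without a SQL Server instance. The following is only a sketch using Python's built-in sqlite3 module, under a few assumptions: SQLite has no PIVOT operator, so MAX(CASE ...) conditional aggregation stands in for it; row_number() is swapped in for rank() so that duplicate schools could not tie; and the bundled SQLite is assumed to be 3.25 or newer so that window functions are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (studid INTEGER, firstname TEXT, lastname TEXT, school TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "mike", "lee", "harvard"),
        (1, "mike", "lee", "ucdavis"),
        (1, "mike", "lee", "sfsu"),
        (2, "peter", "pan", "chico"),
        (2, "peter", "pan", "ulloa"),
        (3, "peter", "smith", "ucb"),
    ],
)

# Number each student's schools, then spread numbers 1..3 across three columns.
# MAX() is only there because GROUP BY needs an aggregate; each CASE sees at
# most one non-NULL school per student.
query = """
SELECT studid, firstname, lastname,
       MAX(CASE WHEN rn = 1 THEN school END) AS school1,
       MAX(CASE WHEN rn = 2 THEN school END) AS school2,
       MAX(CASE WHEN rn = 3 THEN school END) AS school3
FROM (
    SELECT s.*,
           row_number() OVER (PARTITION BY studid ORDER BY school) AS rn
    FROM student AS s
) AS ranked
GROUP BY studid, firstname, lastname
ORDER BY studid
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Missing schools come back as None (NULL), and a fourth school for any student would simply be dropped, which matches the "3 columns max" requirement.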

Related

Using PIVOT with SQL Server without Aggregate function

I'm stuck on using PIVOT in a simple example (which I give in entirety below). Full disclosure, I got this from https://www.hackerrank.com/. I picked it precisely because I want to get more familiar with PIVOT and this looked like a simple example! I've looked at numerous posts on the subject, and have been using this to crib off: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b76a4668-d0c3-4c51-8d86-117d5c181e69/pivot-without-aggregate-function?forum=transactsql but don't seem to be able to get things quite right. Here is the table:
TABLE OCCUPATIONS
Name Occupation
Samantha Doctor
Julia Actor
Maria Actor
Meera Singer
Ashley Professor
Ketty Professor
Christeen Professor
Jane Actor
Jenny Doctor
Priya Singer
The task is to have the output with columns Doctor, Professor, Singer or Actor (in that order). If you run out of data for one or more columns, put NULL. Here is the expected output (copied directly from the site).
Jenny Ashley Meera Jane
Samantha Christeen Priya Julia
NULL Ketty NULL Maria
As an aside, it appears they want the results without column headers (I'm not sure!).
Here is the latest iteration of what I have tried:
SELECT [Doctor], [Professor],[Singer], [Actor]
FROM
(SELECT [Name], [Occupation] from OCCUPATIONS) as pvtsource
PIVOT
( MAX([Name]) FOR [Occupation] IN ([Doctor], [Professor],[Singer], [Actor]) ) AS p
and it yields:
Doctor Professor Singer Actor
Samantha Ketty Priya Maria
I'm not surprised by this incorrect result. After all, I did say in my query MAX. I assume it's just picking the MAX name for each profession based on the alphabetical sort. Maria is a "bigger" actor than Julia or Jane for example if you based it on the alphabet. But when I remove the MAX, I get an error ("Incorrect syntax..."). How does one do this?
Thanks!
Bonus questions
1. Good, gentle, articles to PIVOT? I clearly haven't gotten it through my thick head. Eventually, I do want to be able to do more complicated pivots where I SUM or take MAX.
2. How to display results without column headers?
3. I'd also be interested in how to do this without PIVOT if there is a simple way.
You need to "FEED" the pivot with an X-Axis,Y-Axis and a Value. We create a row key via dense_rank()
Example
Declare @YourTable Table ([Name] varchar(50),[Occupation] varchar(50))
Insert Into @YourTable Values
 ('Samantha','Doctor')
,('Julia','Actor')
,('Maria','Actor')
,('Meera','Singer')
,('Ashley','Professor')
,('Ketty','Professor')
,('Christeen','Professor')
,('Jane','Actor')
,('Jenny','Doctor')
,('Priya','Singer')
Select *
from (Select *
            ,RN = dense_rank() over (partition by occupation order by name)
      From @YourTable
     ) src
Pivot (max(Name) for Occupation in ([Doctor], [Professor],[Singer], [Actor]) ) pvt
Returns
RN Doctor Professor Singer Actor
1 Jenny Ashley Meera Jane
2 Samantha Christeen Priya Julia
3 NULL Ketty NULL Maria
NOTE:
If you don't want RN in your results, rather than the top SELECT *, you can specify the desired columns
SELECT [Doctor], [Professor],[Singer], [Actor]
From (...) src
Pivot (...) pvt
EDIT - Commentary
If you run the inner query
Select *
      ,RN = dense_rank() over (partition by occupation order by name)
From @YourTable
Order By RN
You'll get
Name Occupation RN
Jane Actor 1
Jenny Doctor 1
Ashley Professor 1
Meera Singer 1
Priya Singer 2
Christeen Professor 2
Samantha Doctor 2
Julia Actor 2
Maria Actor 3
Ketty Professor 3
RN becomes the Y-Axis, Occupation becomes the X-Axis and Name is the value.
Pivots by design are aggregates, therefore we just need a Y-Axis to perform the group by.
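For the bonus question about doing this without PIVOT, the Y-Axis idea above can be sketched with a plain GROUP BY on the row key. This is an illustrative sketch in Python's sqlite3 module rather than T-SQL (SQLite has no PIVOT, and window functions are assumed to be available, i.e. SQLite 3.25+); the table name occupations follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occupations (name TEXT, occupation TEXT)")
conn.executemany(
    "INSERT INTO occupations VALUES (?, ?)",
    [
        ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
        ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
        ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
        ("Priya", "Singer"),
    ],
)

# rn is the Y-Axis (one output row per rn), occupation is the X-Axis
# (one CASE per output column), and name is the value.
query = """
SELECT MAX(CASE WHEN occupation = 'Doctor'    THEN name END) AS Doctor,
       MAX(CASE WHEN occupation = 'Professor' THEN name END) AS Professor,
       MAX(CASE WHEN occupation = 'Singer'    THEN name END) AS Singer,
       MAX(CASE WHEN occupation = 'Actor'     THEN name END) AS Actor
FROM (
    SELECT name, occupation,
           dense_rank() OVER (PARTITION BY occupation ORDER BY name) AS rn
    FROM occupations
) AS src
GROUP BY rn
ORDER BY rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because rn is grouped on but not selected, it never appears in the output, which is the same effect as listing the four occupation columns instead of SELECT * in the PIVOT version.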

Assign Numbers to Groups in Calculated Column SQL Server

This is probably extremely easy however for some reason I am having difficulty in pinpointing exactly how to do this. I have a list of names and cities associated with those names. I want to assign a number in a calculated column based on 1) name and 2) city. Example code below:
Name | City | Calculated Column
John NYC 1
John NYC 1
John NYC 1
John LA 2
John LA 2
Chris NYC 1
Chris SF 2
Christ SF 2
Chris LA 3
I am assuming I need to use an over and partition function, but have not been able to properly calculate the 'Calculated Column' above. Any assistance would be greatly appreciated. Thanks so much in advance!
I think you can use dense_rank as below:
Select *, [Computed Column]= dense_rank() over(Partition by [Name] order by City)
from yourtable
CREATE TABLE TEST
(
NAME VARCHAR(10),
CITY VARCHAR(5)
)
INSERT INTO TEST
VALUES
('John','NYC'),
('John','NYC'),
('John','NYC'),
('John','LA'),
('John','LA'),
('Chris','NYC'),
('Chris','SF'),
('Christ','SF'),
('Chris','LA')
SELECT NAME,CITY,dense_rank() OVER(partition by name ORDER BY city desc) AS [calculated column]
FROM TEST
---output---
NAME CITY calculated column
Chris SF 1
Chris NYC 2
Chris LA 3
Christ SF 1
John NYC 1
John NYC 1
John NYC 1
John LA 2
John LA 2
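To see why dense_rank() is suggested here rather than rank(), the two can be compared side by side. The sketch below uses Python's sqlite3 module with the same test rows (window functions assumed available, SQLite 3.25+); the column aliases group_no and rank_no are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        ("John", "NYC"), ("John", "NYC"), ("John", "NYC"),
        ("John", "LA"), ("John", "LA"),
        ("Chris", "NYC"), ("Chris", "SF"), ("Christ", "SF"), ("Chris", "LA"),
    ],
)

# dense_rank() numbers the distinct cities 1, 2, 3, ... within each name;
# rank() leaves gaps after ties, so John's LA rows get 4 instead of 2.
query = """
SELECT DISTINCT name, city,
       dense_rank() OVER (PARTITION BY name ORDER BY city DESC) AS group_no,
       rank()       OVER (PARTITION BY name ORDER BY city DESC) AS rank_no
FROM test
ORDER BY name, group_no
"""
groups = conn.execute(query).fetchall()
for row in groups:
    print(row)
```

Note that the numbers follow the sort order of city, not the order in which the rows were inserted; reproducing an order of first appearance would need an extra column to sort on, which the sample data does not have.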

Dynamic Sql Filter using tables

Any ideas on how this should be done with T-SQL queries?
I have two tables. Table A contains the records I want to return and filter. Table B contains the list of filters and class categories. New records are added to Table A all the time. The goal is to dynamically categorize the records in Table A based on the filters listed in Table B.
Example:
Table A
Name
------------
John Doe
Mary Lamb
Peter Pan
Tom Sawyer
Suzie Lamb
Nancy Lamb
Josh Reddin
Table B:
Filter | Category
----------------------
John%Doe% | Team 1
%Lamb% | Team 2
Tom% | Team 1
Desired output:
Name | Category
John Doe | Team 1
Tom Sawyer | Team 1
Mary Lamb | Team 2
Suzie Lamb | Team 2
Nancy Lamb | Team 2
Peter Pan |
Josh Reddin |
I thought about doing the following, but I'm not sure it's the best solution:
1. SELECT Filter, Category FROM TableB (get the list of filters)
2. Using SQL, loop through the filters returned in (1.) and find matches in Table A using LIKE. Example:
   SELECT name, Category
   FROM Table A, Table B
   WHERE Table A.Name Like (CURRENT filter FROM B)
3. Insert/append the record(s) returned in (2.) into TempTable
4. SELECT *
   FROM TempTable (this returns names and categories as shown in the desired output)
   UNION
   SELECT *
   FROM Table A
   RIGHT OUTER JOIN TempTable on NAME
   WHERE Category is null
   (this returns rows with no categories found... Peter Pan and Josh Reddin)
Any ideas?
How about performance?
Thanks.
You can use a combination of LIKE and LEFT JOIN, which avoids the loop and the temp table entirely:
select a.Name,b.Category
from tableA a left join tableB b on a.name like b.Filter
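The join can be checked against the sample data with a small sketch in Python's sqlite3 module. This is illustrative only: the filter column is renamed to pattern here (a hypothetical name, chosen to stay clear of SQLite keywords), and the ORDER BY is added just to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (name TEXT)")
conn.execute("CREATE TABLE tableB (pattern TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?)",
    [("John Doe",), ("Mary Lamb",), ("Peter Pan",), ("Tom Sawyer",),
     ("Suzie Lamb",), ("Nancy Lamb",), ("Josh Reddin",)],
)
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?)",
    [("John%Doe%", "Team 1"), ("%Lamb%", "Team 2"), ("Tom%", "Team 1")],
)

# LEFT JOIN keeps every name; names matching no pattern get a NULL category.
query = """
SELECT a.name, b.category
FROM tableA AS a
LEFT JOIN tableB AS b ON a.name LIKE b.pattern
ORDER BY b.category IS NULL, b.category, a.name
"""
categorized = conn.execute(query).fetchall()
for row in categorized:
    print(row)
```

One thing to keep in mind: a name that matches more than one filter comes back once per matching filter, so overlapping patterns would need a tie-breaking rule.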

SQL Server 2008 Perform a draw between 2 tables

I have 2 tables on SQL Server 2008. Each one has a single column, and both have the same number of rows:
USERS OPERATION
Name Operation
----------- -----------
John W383
William R823
Karen X933
Peter M954
Alex S744
Every week I need to perform a random draw between the 2 tables to get something like the following and save it into a 3rd table:
DRAW_RESULT:
Name Operation_Assigned Week_Number
----------------------------------------------
Peter M954 2
William W383 2
John S744 2
Alex X933 2
Karen R823 2
Name Operation_Assigned Week_Number
----------------------------------------------
William R823 3
Alex M954 3
Karen X933 3
John S744 3
Peter W383 3
How can I do this using T-SQL?
If I understood correctly what you're doing, something like this should work:
select name, operation from (
select
row_number() over (order by (select null)) as RN,
name
from
users
) U join (
select
row_number() over (order by newid()) as RN,
operation
from
operation
) O on U.RN = O.RN
Edit: row_number with newid() works, so removed the extra derived table.
Here's also SQL Fiddle to test this.
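The query above returns the pairs but does not store them. A possible way to also save each week's draw is sketched below in Python's sqlite3 module, not T-SQL: random() plays the role of newid(), the helper run_draw is a made-up name, and the draw_result column names are taken from the question. Window functions are assumed to be available (SQLite 3.25+).

```python
import sqlite3

names = ["John", "William", "Karen", "Peter", "Alex"]
operations = ["W383", "R823", "X933", "M954", "S744"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (name TEXT);
CREATE TABLE operation (operation TEXT);
CREATE TABLE draw_result (name TEXT, operation_assigned TEXT, week_number INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?)", [(n,) for n in names])
conn.executemany("INSERT INTO operation VALUES (?)", [(o,) for o in operations])


def run_draw(conn, week_number):
    """Pair every user with one randomly chosen operation and store the result."""
    # Users get a fixed numbering, operations get a shuffled numbering;
    # joining on the two numberings yields a random one-to-one assignment.
    conn.execute(
        """
        INSERT INTO draw_result (name, operation_assigned, week_number)
        SELECT u.name, o.operation, ?
        FROM (SELECT name, row_number() OVER (ORDER BY name) AS rn
              FROM users) AS u
        JOIN (SELECT operation, row_number() OVER (ORDER BY random()) AS rn
              FROM operation) AS o
          ON u.rn = o.rn
        """,
        (week_number,),
    )


run_draw(conn, 2)
week2 = conn.execute(
    "SELECT name, operation_assigned FROM draw_result WHERE week_number = 2"
).fetchall()
for row in week2:
    print(row)
```

Only one side needs to be shuffled: pairing a fixed list with a randomly ordered list is already a uniform random assignment, as long as both tables have the same number of rows.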

Detecting Correlated Columns in Data

Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that Sql Server data mining is probably the right tool for the job, but I don't have too much experience with that.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like :
Column 1 | Column 2 | Certainty
CustomerName | CustomerAddress | 100%
CustomerAddress | CustomerCode | 100%
There is a 'functional dependency' test built in to the SQL Server Data Profiling component (which is an SSIS component that ships with SQL Server 2008). It is described pretty well on this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with the distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
This would return you:
FirstOrderNumber | SecondOrderNumber
1 | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers whose customerAddress is not uniquely determined by customerName (multiply by 100 for a percentage):
SELECT AVG(CAST(imperfect AS float))
FROM (
    SELECT
        customerName,
        -- 0 when the name maps to a single address, 1 otherwise
        CASE
            WHEN COUNT(DISTINCT customerAddress) <= 1
            THEN 0
            ELSE 1
        END AS imperfect
    FROM orders
    GROUP BY
        customerName
) q
Substitute other columns instead of customerAddress and customerName into this query to find discrepancies between them.
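The substitution can be automated over every pair of columns to produce something like the "Column 1 | Column 2 | Certainty" table the question asks for. The following is a rough sketch in Python's sqlite3 module, not the SSIS profiling task: the helper dependency_certainty is a hypothetical name, and "certainty" is taken here to mean the fraction of distinct values in the first column that map to a single value in the second.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderNumber INTEGER, CustomerName TEXT, "
    "CustomerAddress TEXT, CustomerCode INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Chris", "1234 Test Drive", 123), (2, "Chris", "1234 Test Drive", 123)],
)


def dependency_certainty(conn, table, determinant, dependent):
    """Fraction of distinct determinant values that map to at most one dependent value."""
    # Identifiers are interpolated into the SQL text, so they must come from a
    # trusted list of table and column names, never from user input.
    sql = f"""
    SELECT AVG(CASE WHEN n_values <= 1 THEN 1.0 ELSE 0.0 END)
    FROM (
        SELECT COUNT(DISTINCT "{dependent}") AS n_values
        FROM "{table}"
        GROUP BY "{determinant}"
    )
    """
    return conn.execute(sql).fetchone()[0]


columns = ["OrderNumber", "CustomerName", "CustomerAddress", "CustomerCode"]
certainty = {
    (a, b): dependency_certainty(conn, "orders", a, b)
    for a, b in permutations(columns, 2)
}
for (a, b), value in certainty.items():
    print(f"{a} | {b} | {value:.0%}")
```

A unique column such as OrderNumber trivially determines every other column, so a 100% score in that direction says little; the interesting pairs are those that score 100% in both directions, like CustomerName and CustomerAddress in the sample data.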
