Getting ROW_NUMBER to repeat if field meets a condition - sql-server

I need ROW_NUMBER to assign data to a specific user if a condition is met.
ROW_NUMBER will increment normally until a duplicate value is found. When the duplicate value is found, I need it to use the same ROW_NUMBER until a new value is found.
For instance...
When using
SELECT ROW_NUMBER() OVER (ORDER BY COMPANY) AS rownum
,Company
,Contact
FROM TABLE
We can obviously expect this result
rownum Company Contact
1 BOB'S BURGERS BOB
2 STEVE'S SARDINES STEVE
3 STEVE'S SARDINES JERRY
4 STEVE'S SARDINES MARY
5 LARRY'S LOBSTER LARRY
6 CHRIS' COWS CHRIS
What I'm trying to get is this. Whenever the Company name doesn't change, repeat the ROW_NUMBER and continue to increment the number when the company does change
rownum Company Contact
1 BOB'S BURGERS BOB
2 STEVE'S SARDINES STEVE
2 STEVE'S SARDINES JERRY
2 STEVE'S SARDINES MARY
3 LARRY'S LOBSTER LARRY
4 CHRIS' COWS CHRIS
I'm using this condition to see if the company matches the previous company name. It returns a 2 if the condition is true
ROW_NUMBER() OVER (PARTITION BY COMPANY ORDER BY COMPANY) AS SameCompany

You want DENSE_RANK not ROW_NUMBER. Try this:
SELECT DENSE_RANK() OVER (ORDER BY COMPANY) AS rownum
,Company
,Contact
FROM TABLE
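To see the difference between the two functions side by side, here is a minimal sketch that runs both over a few of the sample rows. It assumes SQLite (3.25 or later, via Python's sqlite3 module) as a stand-in for SQL Server, and the table name contacts and the function rank_companies are made up for the illustration.

```python
import sqlite3

def rank_companies():
    """Return (row_num, company_num, company, contact) for a few sample rows."""
    # SQLite stands in for SQL Server; "contacts" is a hypothetical table name.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (company TEXT, contact TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [
            ("BOB'S BURGERS", "BOB"),
            ("STEVE'S SARDINES", "STEVE"),
            ("STEVE'S SARDINES", "JERRY"),
            ("LARRY'S LOBSTER", "LARRY"),
        ],
    )
    rows = conn.execute(
        """
        SELECT ROW_NUMBER() OVER (ORDER BY company, contact) AS row_num,
               DENSE_RANK() OVER (ORDER BY company)          AS company_num,
               company,
               contact
        FROM contacts
        ORDER BY company, contact
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in rank_companies():
        print(row)
```

ROW_NUMBER keeps counting on every row, while DENSE_RANK gives tied companies the same number and leaves no gaps afterwards, which is the behaviour asked for.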

Related

Using PIVOT with SQL Server without Aggregate function

I'm stuck on using PIVOT in a simple example (which I give in entirety below). Full disclosure, I got this from https://www.hackerrank.com/. I picked it precisely because I want to get more familiar with PIVOT and this looked like a simple example! I've looked at numerous posts on the subject, and have been using this to crib off: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b76a4668-d0c3-4c51-8d86-117d5c181e69/pivot-without-aggregate-function?forum=transactsql but don't seem to be able to get things quite right. Here is the table:
TABLE OCCUPATIONS
Name Occupation
Samantha Doctor
Julia Actor
Maria Actor
Meera Singer
Ashley Professor
Ketty Professor
Christeen Professor
Jane Actor
Jenny Doctor
Priya Singer
The task is to have the output with columns Doctor, Professor, Singer or Actor (in that order). If you run out of data for one or more columns, put NULL. Here is the expected output (copied directly from the site).
Jenny Ashley Meera Jane
Samantha Christeen Priya Julia
NULL Ketty NULL Maria
As an aside, it appears they want the results without column headers (I'm not sure!).
Here is the latest iteration of what I have tried:
SELECT [Doctor], [Professor],[Singer], [Actor]
FROM
(SELECT [Name], [Occupation] from OCCUPATIONS) as pvtsource
PIVOT
( MAX([Name]) FOR [Occupation] IN ([Doctor], [Professor],[Singer], [Actor]) ) AS p
and it yields:
Doctor Professor Singer Actor
Samantha Ketty Priya Maria
I'm not surprised by this incorrect result. After all, I did say in my query MAX. I assume it's just picking the MAX name for each profession based on the alphabetical sort. Maria is a "bigger" actor than Julia or Jane for example if you based it on the alphabet. But when I remove the MAX, I get an error ("Incorrect syntax..."). How does one do this?
Thanks!
Bonus questions
1. Good, gentle, articles to PIVOT? I clearly haven't gotten it through my thick head. Eventually, I do want to be able to do more complicated pivots where I SUM or take MAX.
2. How to display results without column headers?
3. I'd also be interested in how to do this without PIVOT if there is a simple way.
You need to "FEED" the pivot with an X-Axis,Y-Axis and a Value. We create a row key via dense_rank()
Example
Declare @YourTable Table ([Name] varchar(50),[Occupation] varchar(50))
Insert Into @YourTable Values
('Samantha','Doctor')
,('Julia','Actor')
,('Maria','Actor')
,('Meera','Singer')
,('Ashley','Professor')
,('Ketty','Professor')
,('Christeen','Professor')
,('Jane','Actor')
,('Jenny','Doctor')
,('Priya','Singer')
Select *
from (Select *
,RN = dense_rank() over (partition by occupation order by name)
From @YourTable
) src
Pivot (max(Name) for Occupation in ([Doctor], [Professor],[Singer], [Actor]) ) pvt
Returns
RN Doctor Professor Singer Actor
1 Jenny Ashley Meera Jane
2 Samantha Christeen Priya Julia
3 NULL Ketty NULL Maria
NOTE:
If you don't want RN in your results, rather than the top SELECT *, you can specify the desired columns
SELECT [Doctor], [Professor],[Singer], [Actor]
From (...) src
Pivot (...) pvt
EDIT - Commentary
If you run the inner query
Select *
,RN = dense_rank() over (partition by occupation order by name)
From @YourTable
Order By RN
You'll get
Name Occupation RN
Jane Actor 1
Jenny Doctor 1
Ashley Professor 1
Meera Singer 1
Priya Singer 2
Christeen Professor 2
Samantha Doctor 2
Julia Actor 2
Maria Actor 3
Ketty Professor 3
RN becomes the Y-Axis, Occupation becomes the X-Axis and Name is the value.
Pivots by design are aggregates, therefore we just need a Y-Axis to perform the group by.
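On the bonus question about doing this without PIVOT: the same X-Axis / Y-Axis / Value idea can be written as conditional aggregation, grouping by the row key. The sketch below is one possible way, not the only one. It assumes SQLite (3.25 or later) through Python's sqlite3 as a stand-in for SQL Server, and the function name pivot_occupations is hypothetical.

```python
import sqlite3

OCCUPATIONS = [
    ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
    ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
    ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
    ("Priya", "Singer"),
]

def pivot_occupations():
    """Pivot names into Doctor/Professor/Singer/Actor columns without PIVOT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occupations (name TEXT, occupation TEXT)")
    conn.executemany("INSERT INTO occupations VALUES (?, ?)", OCCUPATIONS)
    rows = conn.execute(
        """
        SELECT MAX(CASE WHEN occupation = 'Doctor'    THEN name END) AS doctor,
               MAX(CASE WHEN occupation = 'Professor' THEN name END) AS professor,
               MAX(CASE WHEN occupation = 'Singer'    THEN name END) AS singer,
               MAX(CASE WHEN occupation = 'Actor'     THEN name END) AS actor
        FROM (
            -- rn is the row key (Y-Axis): position of the name within its occupation
            SELECT name,
                   occupation,
                   ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) AS rn
            FROM occupations
        ) AS src
        GROUP BY rn
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in pivot_occupations():
        print(row)
```

Each group holds at most one name per occupation, so MAX simply returns that name, or NULL (None in Python) when the occupation has run out of rows.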

SQL Server 2008 Perform a draw between 2 tables

I have 2 tables on SQL Server 2008; each one has a single column and the same number of rows:
USERS OPERATION
Name Operation
----------- -----------
John W383
William R823
Karen X933
Peter M954
Alex S744
Every week I need to perform a random draw between the 2 tables to get something like the following and save it into a 3rd table:
DRAW_RESULT:
Name Operation_Assigned Week_Number
----------------------------------------------
Peter M954 2
William W383 2
John S744 2
Alex X933 2
Karen R823 2
Name Operation_Assigned Week_Number
----------------------------------------------
William R823 3
Alex M954 3
Karen X933 3
John S744 3
Peter W383 3
How can I do this using T-SQL?
If I understood correctly what you're doing, something like this should work:
select name, operation from (
select
row_number() over (order by (select null)) as RN,
name
from
users
) U join (
select
row_number() over (order by newid()) as RN,
operation
from
operation
) O on U.RN = O.RN
Edit: row_number with newid() works, so removed the extra derived table.
Here's also SQL Fiddle to test this.
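For the "save it into a 3rd table" part, the same join can feed an INSERT ... SELECT. Below is a rough sketch under a few assumptions: SQLite (3.25 or later) through Python's sqlite3 stands in for SQL Server, RANDOM() plays the role of newid(), and the draw_result column names and the weekly_draw function are invented for the example.

```python
import sqlite3

def weekly_draw(week_number):
    """Pair every user with one randomly chosen operation and store the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (name TEXT);
        CREATE TABLE operation (operation TEXT);
        CREATE TABLE draw_result (
            name TEXT, operation_assigned TEXT, week_number INTEGER
        );
        INSERT INTO users VALUES ('John'), ('William'), ('Karen'), ('Peter'), ('Alex');
        INSERT INTO operation VALUES ('W383'), ('R823'), ('X933'), ('M954'), ('S744');
        """
    )
    # Number the users in a fixed order and the operations in random order,
    # then match them up on the row number.
    conn.execute(
        """
        INSERT INTO draw_result (name, operation_assigned, week_number)
        SELECT u.name, o.operation, ?
        FROM (SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
              FROM users) AS u
        JOIN (SELECT operation, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn
              FROM operation) AS o
          ON u.rn = o.rn
        """,
        (week_number,),
    )
    rows = conn.execute(
        "SELECT name, operation_assigned, week_number FROM draw_result"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in weekly_draw(2):
        print(row)
```

Because both sides are numbered 1 to N and joined one to one, every user gets exactly one operation and no operation is handed out twice.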

Delete latest entry in SQL Server without using datetime or ID

I have a basic SQL Server delete script that goes:
Delete from tableX
where colA = ? and colB = ?;
In tableX, I do not have any columns indicating sequential IDs or timestamp; just varchar. I want to delete the latest entry that was inserted, and I do not have access to the row number from the insert script. TOP is not an option because it's random. Also, this particular table does not have a primary key, and it's not a matter of poor design. Is there any way I can do this? I recall mysql being able to call something like max(row_number) and also something along the lines of limit one.
ROW_NUMBER exists in SQL Server, too, but it must be used with an OVER (order_by_clause). So... in your case it's impossible for you unless you come up with another sorting algo.
MSDN
Edit: (Examples for George from MSDN ... I'm afraid his company has a Firewall rule that blocks MSDN)
SQL-Code
USE AdventureWorks2012;
GO
SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName, LastName, ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0;
Output
Row FirstName LastName SalesYTD
--- ----------- ---------------------- -----------------
1 Linda Mitchell 4251368.54
2 Jae Pak 4116871.22
3 Michael Blythe 3763178.17
4 Jillian Carson 3189418.36
5 Ranjit Varkey Chudukatil 3121616.32
6 José Saraiva 2604540.71
7 Shu Ito 2458535.61
8 Tsvi Reiter 2315185.61
9 Rachel Valdez 1827066.71
10 Tete Mensa-Annan 1576562.19
11 David Campbell 1573012.93
12 Garrett Vargas 1453719.46
13 Lynn Tsoflias 1421810.92
14 Pamela Ansman-Wolfe 1352577.13
Returning a subset of rows
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
Using ROW_NUMBER() with PARTITION
USE AdventureWorks2012;
GO
SELECT FirstName, LastName, TerritoryName, ROUND(SalesYTD,2,1),
ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS Row
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0
ORDER BY TerritoryName;
Output
FirstName LastName TerritoryName SalesYTD Row
--------- -------------------- ------------------ ------------ ---
Lynn Tsoflias Australia 1421810.92 1
José Saraiva Canada 2604540.71 1
Garrett Vargas Canada 1453719.46 2
Jillian Carson Central 3189418.36 1
Ranjit Varkey Chudukatil France 3121616.32 1
Rachel Valdez Germany 1827066.71 1
Michael Blythe Northeast 3763178.17 1
Tete Mensa-Annan Northwest 1576562.19 1
David Campbell Northwest 1573012.93 2
Pamela Ansman-Wolfe Northwest 1352577.13 3
Tsvi Reiter Southeast 2315185.61 1
Linda Mitchell Southwest 4251368.54 1
Shu Ito Southwest 2458535.61 2
Jae Pak United Kingdom 4116871.22 1
Your current table design does not allow you to determine the latest entry. You have no field to sort on to indicate which record was added last.
You need to redesign or pull that information from the audit tables. If you have a database without audit tables, you might have to find a tool to read the transaction logs and it will be a very time-consuming and expensive process. Or if you know the date the records you want to remove were added, you could possibly use a backup from just before this happened to find the records that were added. Just be aware that you might be looking at records changed after this date that you want to keep.
If you need to do this on a regular basis instead of one-time to fix some bad data, then you need to properly design your database to include an identity field and possibly a dateupdated field (maintained through a trigger) or audit tables. (In my opinion no database containing information your company is depending on should be without audit tables, one of the many reasons why you should never allow an ORM to design a database, but I digress.) If you need to know the order records were added to a table, it is your responsibility as the developer to create that structure. Databases only store what they are designed to store; if you didn't design it in, then it is not available easily or at all.
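As a rough illustration of the identity-field redesign described above, here is a small sketch. It assumes SQLite through Python's sqlite3, with INTEGER PRIMARY KEY AUTOINCREMENT standing in for a SQL Server IDENTITY column. The names table_x, id, col_a, col_b and delete_latest are hypothetical.

```python
import sqlite3

def delete_latest(col_a, col_b):
    """Delete the most recently inserted row matching (col_a, col_b)."""
    conn = sqlite3.connect(":memory:")
    # The auto-incrementing id records insertion order, which the original
    # varchar-only table cannot do.
    conn.execute(
        """
        CREATE TABLE table_x (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            col_a TEXT,
            col_b TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO table_x (col_a, col_b) VALUES (?, ?)",
        [("x", "y"), ("p", "q"), ("x", "y")],
    )
    # Among the matching rows, the highest id is the latest insert.
    conn.execute(
        """
        DELETE FROM table_x
        WHERE id = (SELECT MAX(id) FROM table_x
                    WHERE col_a = ? AND col_b = ?)
        """,
        (col_a, col_b),
    )
    remaining = conn.execute(
        "SELECT id, col_a, col_b FROM table_x ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining

if __name__ == "__main__":
    print(delete_latest("x", "y"))
```

Of the two identical ("x", "y") rows, only the later one is removed, which is exactly what cannot be expressed without a column that captures insertion order.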
If (colA +'_'+ colB) cannot contain duplicates, try this.
declare @delColumn nvarchar(250)
set @delColumn = (select top 1 DeleteColumn from (
select (colA +'_'+ colB) as DeleteColumn ,
ROW_NUMBER() OVER(ORDER BY colA DESC) as Id from tableX
)b
order by Id desc
)
delete from tableX where (colA +'_'+ colB) = @delColumn

Multiple to Multiple, Junction Tables, Newbie on Database Structure

Please feel free to comment on this as I am new and very confused on how to structure this.
I want to create a database of people with interests. I want to record their interests and then see what people have common interests and display them.
I have 3 tables: Person, Interest, InterestType
Person is a table of people
Interest is an interest that a person can have.
InterestType is the name of the interest, say Skiing or Biking. (I separated it because I want all people to use a common set of interest types.)
My setup is as follows:
personTable: id, name, interestID
interestTable: id, interestType, personID
interestType: id, name
How do I get the list of people with the same interest?
I have made a simple model in Access, but you should be able to "translate" this to SQLite without too many problems.
Given:
PersonTable
personId Name
1 Paolo
2 Carla
3 Angelo
4 Franco
5 John
6 Lisa
InterestType
interestId Name
1 Calligraphy
2 Karate
3 Chess
4 Movies
5 Hiking
InterestTable
interestId personId
1 1
2 1
3 1
2 2
3 2
4 2
1 3
2 3
1 5
A simple query sorted by Interest Name and then by Person Name should do the trick:
SELECT interestType.Name, personTable.Name
FROM personTable INNER JOIN
(interestType INNER JOIN interestTable ON
interestType.interestId=interestTable.interestId)
ON personTable.personId=interestTable.personId
ORDER BY 1, 2;
will return:
interestType.Name personTable.Name
Calligraphy Angelo
Calligraphy John
Calligraphy Paolo
Chess Carla
Chess Paolo
Karate Angelo
Karate Carla
Karate Paolo
Movies Carla
If you want to look for a specific interest, just add a where clause:
SELECT interestType.Name, personTable.Name
FROM personTable INNER JOIN
(interestType INNER JOIN interestTable ON interestType.interestId=interestTable.interestId)
ON personTable.personId=interestTable.personId
WHERE interestType.Name="Karate"
ORDER BY 1, 2;
interestType.Name personTable.Name
Karate Angelo
Karate Carla
Karate Paolo
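If the goal is to see which people share an interest with each other, rather than just listing people per interest, a self-join on the junction table is one option. The sketch below reuses the sample data above and assumes SQLite through Python's sqlite3. The table names person, interest_type and person_interest and the function shared_interests are made up for the example.

```python
import sqlite3

def shared_interests():
    """Return (interest, person_a, person_b) for every pair sharing an interest."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE interest_type (interest_id INTEGER PRIMARY KEY, name TEXT);
        -- Junction table: one row per (interest, person) combination.
        CREATE TABLE person_interest (interest_id INTEGER, person_id INTEGER);

        INSERT INTO person VALUES
            (1, 'Paolo'), (2, 'Carla'), (3, 'Angelo'),
            (4, 'Franco'), (5, 'John'), (6, 'Lisa');
        INSERT INTO interest_type VALUES
            (1, 'Calligraphy'), (2, 'Karate'), (3, 'Chess'),
            (4, 'Movies'), (5, 'Hiking');
        INSERT INTO person_interest VALUES
            (1, 1), (2, 1), (3, 1), (2, 2), (3, 2),
            (4, 2), (1, 3), (2, 3), (1, 5);
        """
    )
    rows = conn.execute(
        """
        SELECT it.name, p1.name, p2.name
        FROM person_interest AS a
        JOIN person_interest AS b
          ON a.interest_id = b.interest_id
         AND a.person_id < b.person_id      -- each pair once, never self-paired
        JOIN person AS p1 ON p1.person_id = a.person_id
        JOIN person AS p2 ON p2.person_id = b.person_id
        JOIN interest_type AS it ON it.interest_id = a.interest_id
        ORDER BY it.name, p1.name, p2.name
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in shared_interests():
        print(row)
```

People with no interests (Franco, Lisa) and interests held by only one person (Movies) or nobody (Hiking) drop out of the result, since there is no pair to report.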
Try this..
SELECT * FROM personTable pt
INNER JOIN interestTable it
ON pt.id = it.personID
WHERE it.interestType = 'theInterestType';

T-SQL - Getting most recent date and most recent future date

Assume the table of records below
ID Name AppointmentDate
-- -------- ---------------
1 Bob 1/1/2010
1 Bob 5/1/2010
2 Henry 5/1/2010
2 Henry 8/1/2011
3 John 8/1/2011
3 John 12/1/2011
I want to retrieve the most recent appointment date by person. So I need a query that will give the following result set.
1 Bob 5/1/2010 (5/1/2010 is most recent)
2 Henry 8/1/2011 (8/1/2011 is most recent)
3 John 8/1/2011 (has 2 future dates but 8/1/2011 is most recent)
Thanks!
Assuming that where you say "most recent" you mean "closest", as in "stored date is the fewest days away from the current date and we don't care if it's before or after the current date", then this should do it (trivial debugging might be required):
SELECT ID, Name, AppointmentDate
from (select
ID
,Name
,AppointmentDate
,row_number() over (partition by ID order by abs(datediff(dd, AppointmentDate, getdate()))) Ranking
from MyTable) xx
where Ranking = 1
This uses the row_number() function from SQL 2005 and up. The subquery "orders" the data as per the specifications, and the main query picks the best fit.
Note also that:
The search is based on the current date
We're only calculating the difference in days; time (hours, minutes, etc.) is ignored
If two dates are equidistant (say, 2 days before and 2 days after), one of them is picked arbitrarily
All of which could be adjusted based on your final requirements.
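As an example of such an adjustment, the sketch below takes the reference date as a parameter instead of using the current date, and breaks ties by preferring the earlier appointment. It assumes SQLite (3.25 or later) through Python's sqlite3 as a stand-in for SQL Server, with julianday() replacing datediff. The appointments table, the ISO date format and the closest_appointments function are assumptions for the illustration.

```python
import sqlite3

APPOINTMENTS = [
    (1, "Bob", "2010-01-01"), (1, "Bob", "2010-05-01"),
    (2, "Henry", "2010-05-01"), (2, "Henry", "2011-08-01"),
    (3, "John", "2011-08-01"), (3, "John", "2011-12-01"),
]

def closest_appointments(today):
    """Return each person's appointment closest to `today` (ISO date string)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appointments (id INTEGER, name TEXT, appointment_date TEXT)"
    )
    conn.executemany("INSERT INTO appointments VALUES (?, ?, ?)", APPOINTMENTS)
    rows = conn.execute(
        """
        SELECT id, name, appointment_date
        FROM (
            SELECT id, name, appointment_date,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       -- distance in days first, earlier date wins a tie
                       ORDER BY ABS(julianday(appointment_date) - julianday(?)),
                                appointment_date
                   ) AS ranking
            FROM appointments
        ) AS ranked
        WHERE ranking = 1
        ORDER BY id
        """,
        (today,),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in closest_appointments("2011-06-01"):
        print(row)
```

With a reference date of 2011-06-01 (chosen here only so that the sample data reproduces the result set from the question), Bob gets 5/1/2010 and both Henry and John get 8/1/2011.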
(Phillip beat me to the punch, and windowing functions are an excellent choice. Here's an alternative approach:)
Assuming I correctly understand your requirement as getting the date closest to the present date, whether in the past or future, consider this query:
SELECT t.Name, t.AppointmentDate
FROM
(
SELECT Name, AppointmentDate, ABS(DATEDIFF(d, GETDATE(), AppointmentDate)) AS Distance
FROM Table
) t
JOIN
(
SELECT Name, MIN(ABS(DATEDIFF(d, GETDATE(), AppointmentDate))) AS MinDistance
FROM Table
GROUP BY Name
) d ON t.Name = d.Name AND t.Distance = d.MinDistance
