If I have a table of data like this
tableid  author  book                pubdate
1        1       The Hobbit          1923
2        1       Fellowship          1925
3        2       Foundation Trilogy  1947
4        2       I Robot             1942
5        3       Frankenstein        1889
6        3       Frankenstein 2      1894
Is there a query that would get me the following without having to use a temp table, table variable or cte?
tableid  author  book          pubdate
1        1       The Hobbit    1923
4        2       I Robot       1942
5        3       Frankenstein  1889
So I want min(ranking) grouping by person and ending up with book for that min(ranking) value.
OK, the data I gave initially was flawed. Instead of a ranking column I'll have a date column. I need the book published earliest by author.
Missed that a CTE was not valid (but not sure why). How about as a subquery?
SELECT tableid, author, book, pubdate
FROM
(
SELECT
tableid, author, book, pubdate,
rn = ROW_NUMBER() OVER
(
PARTITION BY author
ORDER BY pubdate
)
FROM dbo.src -- replace this with the real table name
) AS x
WHERE rn = 1
ORDER BY tableid;
Original:
;WITH x AS
(
SELECT
tableid, author, book, pubdate,
rn = ROW_NUMBER() OVER
(
PARTITION BY author
ORDER BY pubdate
)
FROM dbo.src -- replace this with the real table name
)
SELECT tableid, author, book, pubdate
FROM x
WHERE rn = 1
ORDER BY tableid;
If you want to return multiple rows when there is a tie for earliest book, use RANK() in place of ROW_NUMBER(). If there is a tie and you only want to return one row, add tie-breaker columns to the ORDER BY within OVER().
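To see the difference between ROW_NUMBER() and RANK() on a tie, here is a minimal sketch that runs the same subquery pattern through Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, so the `rn = ...` alias becomes `AS rn`. The table name `src` comes from the query above; the seventh row and the function name `earliest_books` are hypothetical, added just to create a tie.

```python
import sqlite3

ROWS = [
    (1, 1, "The Hobbit", 1923),
    (2, 1, "Fellowship", 1925),
    (3, 2, "Foundation Trilogy", 1947),
    (4, 2, "I Robot", 1942),
    (5, 3, "Frankenstein", 1889),
    (6, 3, "Frankenstein 2", 1894),
    (7, 3, "Hypothetical Tie", 1889),  # invented row: ties with tableid 5
]


def earliest_books(window_function):
    """Return the rows numbered 1 per author by ROW_NUMBER() or RANK()."""
    # Whitelist the function name because it is spliced into the SQL text.
    if window_function not in ("ROW_NUMBER", "RANK"):
        raise ValueError("expected ROW_NUMBER or RANK")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src (tableid INTEGER, author INTEGER, book TEXT, pubdate INTEGER)"
    )
    conn.executemany("INSERT INTO src VALUES (?, ?, ?, ?)", ROWS)
    sql = f"""
        SELECT tableid, author, book, pubdate
        FROM (
            SELECT tableid, author, book, pubdate,
                   {window_function}() OVER (
                       PARTITION BY author
                       ORDER BY pubdate
                   ) AS rn
            FROM src
        ) AS x
        WHERE rn = 1
        ORDER BY tableid
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

Calling `earliest_books("RANK")` returns both 1889 rows for author 3, while `earliest_books("ROW_NUMBER")` returns only one of them, and which one is arbitrary unless the ORDER BY inside OVER() gets a tie breaker such as tableid.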
select * from table where ranking = 1
EDIT
Are you looking for this query to work in situations where there is no value of rank=1 for a given table and person? in that case, try this:
select *
from (
select *, RANK() OVER (Partition By personid order by ranking asc) as sqlrank
from table -- replace "table" with the real table name
) as ranked
where sqlrank = 1
EDIT OF MY EDIT:
This will work for the earliest pub date:
select *
from (
select *, RANK() OVER (Partition By author order by pubdate asc) as sqlrank
from table -- replace "table" with the real table name
) as ranked
where sqlrank = 1
SELECT tableid,author,book,pubdate FROM my_table as my_table1 WHERE pubdate =
(SELECT MIN(pubdate) FROM my_table as my_table2 WHERE my_table1.author = my_table2.author);
WITH min_table as
(
SELECT author, min(pubdate) as min_pubdate
FROM table
GROUP BY author
)
SELECT t.tableid, t.author, t.book, t.pubdate
FROM table t INNER JOIN min_table mt on t.author = mt.author and t.pubdate = mt.min_pubdate
Your sample data may be overly simplistic. You talk about 'min(ranking)', but for all your examples, the minimum ranking for each personid is 1. So the answers you have received so far short-circuit the issue and simply select for ranking = 1. You don't state it in your "requirements", but it sounds like the minimum rank value for any particular personid may not necessarily be 1, correct? Also, you don't mention if a person can rank two or more books with the same minimum rank, so answers will be incomplete due to this missing requirement.
If my psychic abilities are accurate, then you might want to try something like this (untested obviously):
SELECT tableid, personid, book, ranking
FROM UnknownTable UNKTBL INNER JOIN
(SELECT personid, min(ranking) as ranking
FROM UnknownTable GROUP BY personid) MINRANK
ON UNKTBL.personid = MINRANK.personid AND UNKTBL.ranking = MINRANK.ranking
This will return all the rows for each person where the ranking value is the minimum value for that person. So if the minimum ranking for person 6 is 2, and there are two books for that person with that ranking, then both book rows will be returned.
If these are not, in fact your requirements, then please edit your question with more details/example data. Thanks!
Edit
Based on your change in requirements/example data, the SQL above should still work, if you change the column names appropriately. You still don't mention if an author can have two books in the same year (i.e. a prolific author such as Stephen King), so the SQL I have here will give multiple rows if the same author publishes two books in the same year, and that year is the earliest year of publication for that author.
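For what it's worth, the tie behaviour described above can be checked with a small sketch using Python's built-in sqlite3 module (SQLite standing in for SQL Server). The join is the same shape as the query above with the column names swapped to author/pubdate; the table name `books`, the function name, and the changed year on the last row are assumptions made only to force a tie.

```python
import sqlite3


def earliest_by_author(rows):
    """Join each row to its author's MIN(pubdate); ties yield several rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (tableid INTEGER, author INTEGER, book TEXT, pubdate INTEGER)"
    )
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT b.tableid, b.author, b.book, b.pubdate
        FROM books AS b
        INNER JOIN (
            SELECT author, MIN(pubdate) AS pubdate
            FROM books
            GROUP BY author
        ) AS earliest
            ON b.author = earliest.author
           AND b.pubdate = earliest.pubdate
        ORDER BY b.tableid
    """).fetchall()
    conn.close()
    return result


sample = [
    (1, 1, "The Hobbit", 1923),
    (2, 1, "Fellowship", 1925),
    (3, 2, "Foundation Trilogy", 1947),
    (4, 2, "I Robot", 1942),
    (5, 3, "Frankenstein", 1889),
    (6, 3, "Frankenstein 2", 1889),  # year changed from 1894 to force a tie
]
earliest = earliest_by_author(sample)
```

With this data, author 3 comes back twice because both books share the earliest year; authors 1 and 2 come back once each.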
SELECT * FROM my_table WHERE ranking = 1
ZING!
Seriously though I don't follow your question - can you provide a more elaborate or complicated example? I think I'm missing something obvious.
Related
I have 3 tables as follows:
Product: (product_id, product_description)
Seller: (seller_id, seller_name)
Association: (seller_id, product_id, price)
Many sellers sell many products. I need to find the two cheapest prices for each product (ordered by increasing price) and their corresponding vendors. The ideal column outputs are:
product_id, product_description, seller_id, seller_name, price
p01, milk, s04, walmart, 1.50
p01, milk, s02, target, 2.25
p02, rice, s05, safeway, 1.30
p02, rice, s03, dillons, 1.75
Here's what I've tried on SQL-server; it's an intermediate step towards the answer. I'm triggering an error but don't understand why:
SELECT TOP 2 *
FROM
(SELECT A.seller_id, A.product_id, min(price) AS A.price
FROM Association A
GROUP BY A.seller_ID, A.product_id)
ORDER BY A.price ASC
And the error:
Msg 102, Level 15, State 1, Line 8
Incorrect syntax near '.'.
Edit: I used the solution proposed by Benjamin; it's near correct. Here's the query output:
seller_id  product_id  price  m
1          1           7.89   1
3          1           8.00   1
6          1           8.50   1
1          2           12.05  1
6          2           12.50  1
1          3           13.67  1
6          3           15.00  1
1          4           7.66   1
3          4           7.50   1
6          4           8.24   1
Of note, some product_id values, such as 1 and 4, occurred 3 times, where I only need the two lowest prices, not the third (or higher.) So I believe that this code is ordering by price, but not removing entries with a price higher than the second lowest.
It's easier to do it with a CTE:
with min1 as (
SELECT A.seller_id, A.product_id, A.price,
row_number() over (partition by A.product_id order by A.price asc) as rn -- number the offers within each product
FROM Association A
)
select * from min1
where rn < 3
order by product_id, price;
To answer your question, you have a derived table in your query (which some might call a subquery). It must have an alias - so give it one.
In the query used to form the derived table, you have used incorrect syntax. You should not attempt to give the aggregated column the name of a.price (and you SHOULD be consistent with your names and their spelling - one day this inconsistency will bite you). Why? Well, first it is the source of your error. If you want the column to be named "a.price", then you need to delimit it since it violates the rules for regular identifiers. But don't - the period (or dot) has a specific meaning / usage and using it in the column name is very, VERY misleading. So just give it an alias without the period.
... (select A.seller_id, A.product_id, min(A.price) as minprice
from ... ) as MinAssoc
As you can see in this snippet, I gave the derived table the alias "MinAssoc" - which is the first thing I mentioned. If you leave it out, you will encounter an error if you just fix the column alias problem.
Next, stop using single letter aliases. That is just lazy. Sure, this is a short example and it is easy to see what your code does NOW. But you are building and reinforcing habits that will not serve you well and it reflects poorly on your work when others see it and need to decipher more complicated queries (because a single letter doesn't provide any help to understanding the "thing" a row represents).
These will fix your errors but you will need to use a different approach to your goal - as already suggested.
You can achieve it using ROW_NUMBER() with an OVER clause, either in a subquery or in a CTE. The error is due to AS A.price, which should be AS price in your example. The following is the subquery example:
SELECT *
FROM
(SELECT A.seller_id,
A.product_id,
A.price,
ROW_NUMBER() OVER (PARTITION BY A.product_id ORDER BY A.price) as RN -- restart the numbering for each product
FROM Association A
) as T
Where T.RN <= 2
ORDER BY T.product_id, T.price ASC
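As a rough sketch of the full result the question asks for (two cheapest prices per product, with descriptions and seller names), the snippet below partitions by product_id only and joins the ranked rows back to Product and Seller. It runs through Python's built-in sqlite3 module, with SQLite 3.25+ standing in for SQL Server. The first four offers are taken from the question's ideal output; the two extra offers and the function name are invented so that there is something to filter out.

```python
import sqlite3


def two_cheapest_per_product():
    """Return the two lowest-priced offers for every product, with names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (product_id TEXT, product_description TEXT);
        CREATE TABLE Seller (seller_id TEXT, seller_name TEXT);
        CREATE TABLE Association (seller_id TEXT, product_id TEXT, price REAL);
        INSERT INTO Product VALUES ('p01', 'milk'), ('p02', 'rice');
        INSERT INTO Seller VALUES ('s02', 'target'), ('s03', 'dillons'),
                                  ('s04', 'walmart'), ('s05', 'safeway');
        INSERT INTO Association VALUES
            ('s04', 'p01', 1.50), ('s02', 'p01', 2.25),
            ('s05', 'p02', 1.30), ('s03', 'p02', 1.75),
            ('s05', 'p01', 2.60),  -- invented third offer, should be dropped
            ('s04', 'p02', 1.99);  -- invented third offer, should be dropped
    """)
    rows = conn.execute("""
        SELECT ranked.product_id, p.product_description,
               ranked.seller_id, s.seller_name, ranked.price
        FROM (
            SELECT seller_id, product_id, price,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id  -- one numbering per product
                       ORDER BY price
                   ) AS rn
            FROM Association
        ) AS ranked
        JOIN Product AS p ON p.product_id = ranked.product_id
        JOIN Seller AS s ON s.seller_id = ranked.seller_id
        WHERE ranked.rn <= 2
        ORDER BY ranked.product_id, ranked.price
    """).fetchall()
    conn.close()
    return rows
```

Partitioning by seller_id as well would give every (seller, product) pair its own partition, so every row would be numbered 1 and nothing would be filtered, which matches the symptom in the question's edit.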
I'm trying to understand the behavior of
select ..... ,MIN(count(*)) over (partition by hotelid)
VS
select ..... ,count(*) over (partition by hotelid)
Ok.
I have a list of hotels (1,2,3)
Each hotel has departments.
On each departments there are workers.
My Data looks like this :
select * from data
Ok. Looking at this query :
select hotelid,departmentid , cnt= count(*) over (partition by hotelid)
from data
group by hotelid, departmentid
ORDER BY hotelid
I can perfectly understand what's going on here. On that result set, partitioning by hotelId , we are counting visible rows.
But look what happens with this query :
select hotelid,departmentid , min_cnt = min(count(*)) over (partition by hotelid)
from data
group by hotelid, departmentid
ORDER BY hotelid
Question:
Where did those numbers come from? I don't understand how adding min caused that result. Min of what?
Can someone please explain how's the calculation being made?
fiddle
The two statements are very different. The first query counts the rows that remain after the grouping and then applies the PARTITION. So, for example, hotel 1 returns 1 row (as all rows for hotel 1 have the same department, A), and so COUNT(*) OVER (PARTITION BY hotelid) returns 1. Hotel 2, however, has 2 departments, 'B' and 'C', and hence returns 2.
For your second query, you first have the COUNT(*), which is not within the OVER clause. That means it counts all the rows within each group specified by your query's GROUP BY hotelid, departmentid. For hotel 1, there are 4 rows for department A, hence 4. The windowed MIN then takes the minimum of those per-group counts across the hotel; with a single group, the minimum of 4 is unsurprisingly 4. Each of the other hotels has at least one department with only 1 row, so the minimum there is 1.
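One way to make the two stages visible is to write the grouping as a derived table and apply the window functions to its output. The sketch below does that with Python's built-in sqlite3 module (SQLite 3.25+ standing in for SQL Server). It is an equivalent rewrite rather than the original statement, and the sample rows are hypothetical, chosen only to match the counts described above (hotel 1 has 4 rows in department A, hotel 2 has departments B and C).

```python
import sqlite3

# Hypothetical worker rows: (hotelid, departmentid), one row per worker.
SAMPLE = (
    [(1, "A")] * 4
    + [(2, "B")] + [(2, "C")] * 2
    + [(3, "D")] + [(3, "E")] * 3
)


def grouped_window_demo(rows):
    """Group first, then run COUNT(*) and MIN(...) as window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (hotelid INTEGER, departmentid TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT hotelid, departmentid, workers,
               -- counts the grouped rows (departments) per hotel
               COUNT(*)     OVER (PARTITION BY hotelid) AS cnt,
               -- smallest per-department worker count per hotel
               MIN(workers) OVER (PARTITION BY hotelid) AS min_cnt
        FROM (
            SELECT hotelid, departmentid, COUNT(*) AS workers
            FROM data
            GROUP BY hotelid, departmentid
        ) AS grouped
        ORDER BY hotelid, departmentid
    """).fetchall()
    conn.close()
    return result


demo_rows = grouped_window_demo(SAMPLE)
```

Here `cnt` corresponds to `count(*) over (partition by hotelid)` and `min_cnt` to `min(count(*)) over (partition by hotelid)` in the grouped queries from the question: the inner COUNT(*) belongs to the GROUP BY, and only the outer function is windowed.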
I have an exercise that I've been trying to complete for hours, but still no luck.
Question:
List the name, address and number of borrowed books for all the clients that have more than 5 books on loan.
The schema is in the picture below. So far I have written this query, which is missing the number of books loaned:
select
Name, Address
from
borrower
where
cardno in (select cardno
from book_loans
group by cardno
having count(cardno) in (select emprestimo.emprestimo
from
(select COUNT(cardno) as emprestimo
from book_loans
group by cardno
having COUNT(cardno) >= 5) emprestimo));
Since you are learning, I'll give you pseudocode. You know about the having clause - that forms the basis for finding the clients. You apply that set of IDs (CardNo) by joining to borrower. Something like:
select <columns>
from (<query to find cardnos with more than 5 books>) as cards
inner join Borrower as brw on ...
order by ...;
And learn the next lesson well - a resultset is not guaranteed to be ordered without an order by clause.
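If it helps to see that outline filled in, here is one possible sketch, run through Python's built-in sqlite3 module with SQLite standing in for SQL Server. The table names and the cardno, name and address columns come from the question's query; the `bookid` column, the sample borrowers and the function names are assumptions, since the schema picture is not available.

```python
import sqlite3


def heavy_borrowers(conn, min_loans=5):
    """Name, address and loan count for borrowers with more than min_loans loans."""
    return conn.execute("""
        SELECT brw.name, brw.address, cards.loans
        FROM (
            SELECT cardno, COUNT(*) AS loans
            FROM book_loans
            GROUP BY cardno
            HAVING COUNT(*) > ?
        ) AS cards
        INNER JOIN borrower AS brw ON brw.cardno = cards.cardno
        ORDER BY brw.name
    """, (min_loans,)).fetchall()


def build_sample_db():
    """Create an in-memory database with invented borrowers and loans."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE borrower (cardno INTEGER, name TEXT, address TEXT)")
    conn.execute("CREATE TABLE book_loans (cardno INTEGER, bookid INTEGER)")
    conn.executemany(
        "INSERT INTO borrower VALUES (?, ?, ?)",
        [(1, "Ana", "1 Main St"), (2, "Rui", "2 Side St")],
    )
    # Ana has 6 loans; Rui has exactly 5, which is not "more than 5".
    loans = [(1, book) for book in range(6)] + [(2, book) for book in range(5)]
    conn.executemany("INSERT INTO book_loans VALUES (?, ?)", loans)
    return conn
```

Because the count is computed in the derived table, it can be selected next to the name and address, which is the piece the original query was missing. Note the strict `>`: the exercise says "more than 5", whereas the question's query uses `>= 5`.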
I have a basic SQL Server delete script that goes:
Delete from tableX
where colA = ? and colB = ?;
In tableX, I do not have any columns indicating sequential IDs or timestamp; just varchar. I want to delete the latest entry that was inserted, and I do not have access to the row number from the insert script. TOP is not an option because it's random. Also, this particular table does not have a primary key, and it's not a matter of poor design. Is there any way I can do this? I recall mysql being able to call something like max(row_number) and also something along the lines of limit one.
ROW_NUMBER exists in SQL Server, too, but it must be used with an OVER (order_by_clause). So... in your case it's impossible for you unless you come up with another sorting algo.
MSDN
Edit: (Examples for George from MSDN ... I'm afraid his company has a Firewall rule that blocks MSDN)
SQL-Code
USE AdventureWorks2012;
GO
SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName, LastName, ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0;
Output
Row FirstName LastName SalesYTD
--- ----------- ---------------------- -----------------
1 Linda Mitchell 4251368.54
2 Jae Pak 4116871.22
3 Michael Blythe 3763178.17
4 Jillian Carson 3189418.36
5 Ranjit Varkey Chudukatil 3121616.32
6 José Saraiva 2604540.71
7 Shu Ito 2458535.61
8 Tsvi Reiter 2315185.61
9 Rachel Valdez 1827066.71
10 Tete Mensa-Annan 1576562.19
11 David Campbell 1573012.93
12 Garrett Vargas 1453719.46
13 Lynn Tsoflias 1421810.92
14 Pamela Ansman-Wolfe 1352577.13
Returning a subset of rows
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
Using ROW_NUMBER() with PARTITION
USE AdventureWorks2012;
GO
SELECT FirstName, LastName, TerritoryName, ROUND(SalesYTD,2,1),
ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS Row
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0
ORDER BY TerritoryName;
Output
FirstName LastName TerritoryName SalesYTD Row
--------- -------------------- ------------------ ------------ ---
Lynn Tsoflias Australia 1421810.92 1
José Saraiva Canada 2604540.71 1
Garrett Vargas Canada 1453719.46 2
Jillian Carson Central 3189418.36 1
Ranjit Varkey Chudukatil France 3121616.32 1
Rachel Valdez Germany 1827066.71 1
Michael Blythe Northeast 3763178.17 1
Tete Mensa-Annan Northwest 1576562.19 1
David Campbell Northwest 1573012.93 2
Pamela Ansman-Wolfe Northwest 1352577.13 3
Tsvi Reiter Southeast 2315185.61 1
Linda Mitchell Southwest 4251368.54 1
Shu Ito Southwest 2458535.61 2
Jae Pak United Kingdom 4116871.22 1
Your current table design does not allow you to determine the latest entry. You have no field to sort on to indicate which record was added last.
You need to redesign or pull that information from the audit tables. If you have a database without audit tables, you might have to find a tool to read the transaction logs, and it will be a very time-consuming and expensive process. Or if you know the date the records you want to remove were added, you could possibly use a backup from just before this happened to find the records that were added. Just be aware that you might be looking at records changed after this date that you want to keep.
If you need to do this on a regular basis instead of one-time to fix some bad data, then you need to properly design your database to include an identity field and possibly a dateupdated field (maintained through a trigger) or audit tables. (In my opinion no database containing information your company is depending on should be without audit tables, one of the many reasons why you should never allow an ORM to design a database, but I digress.) If you need to know the order records were added to a table, it is your responsibility as the developer to create that structure. Databases only store what they are designed to store; if you didn't design it in, then it is not available easily or at all.
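As an illustration of the redesign, here is a minimal sketch of what "delete the latest matching row" looks like once the table has an auto-incrementing key. It uses Python's built-in sqlite3 module, where INTEGER PRIMARY KEY AUTOINCREMENT plays the role an IDENTITY column would play in SQL Server. The `id` and `note` columns, the sample rows and the function names are assumptions; only tableX, colA and colB come from the question.

```python
import sqlite3


def delete_latest(conn, col_a, col_b):
    """Delete only the most recently inserted row matching colA and colB."""
    cur = conn.execute("""
        DELETE FROM tableX
        WHERE id = (
            SELECT MAX(id)          -- highest id = last inserted match
            FROM tableX
            WHERE colA = ? AND colB = ?
        )
    """, (col_a, col_b))
    conn.commit()
    return cur.rowcount  # 1 if a row was removed, 0 if nothing matched


def build_sample_db():
    """Create tableX with an auto-incrementing id and three invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tableX (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            colA TEXT,
            colB TEXT,
            note TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO tableX (colA, colB, note) VALUES (?, ?, ?)",
        [("x", "y", "first"), ("x", "y", "second"), ("p", "q", "other")],
    )
    return conn
```

With the sample rows, `delete_latest(conn, "x", "y")` removes the "second" row and leaves "first" untouched, because insertion order is now recorded in `id` rather than guessed.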
If (colA + '_' + colB) cannot contain duplicates, try this:
declare @delColumn nvarchar(250)
set @delColumn = (select top 1 DeleteColumn from (
select (colA +'_'+ colB) as DeleteColumn ,
ROW_NUMBER() OVER(ORDER BY colA DESC) as Id from tableX
)b
order by Id desc
)
delete from tableX where (colA +'_'+ colB) = @delColumn
Assume the table of records below
ID Name AppointmentDate
-- -------- ---------------
1 Bob 1/1/2010
1 Bob 5/1/2010
2 Henry 5/1/2010
2 Henry 8/1/2011
3 John 8/1/2011
3 John 12/1/2011
I want to retrieve the most recent appointment date by person. So I need a query that will give the following result set.
1 Bob 5/1/2010 (5/1/2010 is most recent)
2 Henry 8/1/2011 (8/1/2011 is most recent)
3 John 8/1/2011 (has 2 future dates but 8/1/2011 is most recent)
Thanks!
Assuming that where you say "most recent" you mean "closest", as in "stored date is the fewest days away from the current date and we don't care if it's before or after the current date", then this should do it (trivial debugging might be required):
SELECT ID, Name, AppointmentDate
from (select
ID
,Name
,AppointmentDate
,row_number() over (partition by ID order by abs(datediff(dd, AppointmentDate, getdate()))) Ranking
from MyTable) xx
where Ranking = 1
This uses the row_number() function, available in SQL Server 2005 and up. The subquery "orders" the data as per the specifications, and the main query picks the best fit.
Note also that:
The search is based on the current date
We're only calculating difference in days, time (hours, minutes, etc.) is ignored
If two dates are equidistant (say, 2 days before and 2 days after), we pick one arbitrarily
All of which could be adjusted based on your final requirements.
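To try the "closest date" idea against the question's sample rows, here is a rough sketch using Python's built-in sqlite3 module (SQLite 3.25+ standing in for SQL Server). Several things are assumptions: the reference date is passed in as a parameter instead of calling getdate(), so the result is repeatable; julianday() differences replace datediff(); the sample dates are read as month/day/year and stored in ISO format; and an explicit tie breaker is added that the query above does not have.

```python
import sqlite3

# Question's sample data, assuming M/D/YYYY, rewritten as ISO dates.
APPOINTMENTS = [
    (1, "Bob", "2010-01-01"),
    (1, "Bob", "2010-05-01"),
    (2, "Henry", "2010-05-01"),
    (2, "Henry", "2011-08-01"),
    (3, "John", "2011-08-01"),
    (3, "John", "2011-12-01"),
]


def closest_appointments(reference_date):
    """Return, per ID, the appointment nearest to reference_date (ISO string)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (ID INTEGER, Name TEXT, AppointmentDate TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", APPOINTMENTS)
    rows = conn.execute("""
        SELECT ID, Name, AppointmentDate
        FROM (
            SELECT ID, Name, AppointmentDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY ID
                       ORDER BY ABS(julianday(AppointmentDate) - julianday(?)),
                                AppointmentDate  -- tie breaker: earlier date wins
                   ) AS Ranking
            FROM MyTable
        ) AS xx
        WHERE Ranking = 1
        ORDER BY ID
    """, (reference_date,)).fetchall()
    conn.close()
    return rows
```

With a hypothetical reference date of 2011-06-01 this picks 2010-05-01 for Bob and 2011-08-01 for both Henry and John, which lines up with the result set the question asks for.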
(Phillip beat me to the punch, and windowing functions are an excellent choice. Here's an alternative approach:)
Assuming I correctly understand your requirement as getting the date closest to the present date, whether in the past or future, consider this query:
SELECT t.Name, t.AppointmentDate
FROM
(
SELECT Name, AppointmentDate, ABS(DATEDIFF(d, GETDATE(), AppointmentDate)) AS Distance
FROM Table
) t
JOIN
(
SELECT Name, MIN(ABS(DATEDIFF(d, GETDATE(), AppointmentDate))) AS MinDistance
FROM Table
GROUP BY Name
) d ON t.Name = d.Name AND t.Distance = d.MinDistance