Hackerrank SQL challenge - sql-server

This T-SQL query
SELECT city, Len(city)
FROM station
ORDER BY Len(city)
returns a table sorted by city, not by Len(city). Is this the expected behavior?
Acme 4
Addison 7
Agency 6
Aguanga 7
Alanson 7
Alba 4
...
The challenge is :
https://www.hackerrank.com/challenges/weather-observation-station-5

Since you want the first and the last, I'd probably just use a union and TOP 1. It makes clear what you're after and is easy to maintain.
And since you can use an alias in ORDER BY, I'd alias len(city):
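-- Each TOP 1 ... ORDER BY sits in a derived table, because T-SQL allows
-- only one ORDER BY per UNION and it applies to the combined result.
-- city is the tie-breaker the challenge asks for (alphabetical order).
SELECT city, LenCity
FROM (
SELECT TOP 1
city, LEN(city) AS LenCity
FROM
station
ORDER BY
LenCity ASC, city
) AS shortest
UNION ALL
SELECT city, LenCity
FROM (
SELECT TOP 1
city, LEN(city) AS LenCity
FROM
station
ORDER BY
LenCity DESC, city
) AS longest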

Here's a link to my GitHub if you have any problems with the other questions. These are the answers to all the Basic questions. Feel free to join in!
https://github.com/jaymoore3/SolvingHackerRank/tree/main/SQL/Basic
Buuuuutttt... If you just need the code:
select city,len(city) as LengthOf
from station
group by city,len(city)
having len(city)=(select max(len(city)) from station)
union
select top 1 city,len(city) as LengthOf
from station
group by city,len(city)
having len(city)=(select min(len(city)) from station)

Related

Top vs Rank/Row Number functions - Which performs higher?

I attempted to Google the Cost of using Top in a query vs using a Ranking or Row_Number type function.
Does the cost of each depend on the situation or can the cost of these two features be determined across the board for all situations?
Some mock SQL using a simple CTE to demonstrate my question is below:
WITH fData AS
(
SELECT 1 AS ID, 'John' AS fName, 'Black' AS lName, CAST('05/19/1975' AS DATE) AS birthDate UNION ALL
SELECT 2 AS ID, 'John' AS fName, 'Black' AS lName, CAST('04/1/1989' AS DATE) AS birthDate UNION ALL
SELECT 3 AS ID, 'John' AS fName, 'Black' AS lName, CAST('11/16/1995' AS DATE) AS birthDate UNION ALL
SELECT 4 AS ID, 'John' AS fName, 'Black' AS lName, CAST('01/16/1968' AS DATE) AS birthDate UNION ALL
SELECT 5 AS ID, 'John' AS fName, 'Black' AS lName, CAST('01/16/1968' AS DATE) AS birthDate
)
/* Using TOP 1 vs Row_Number() - Uncomment this and comment the below to VIEW TOP version */
--SELECT TOP 1 d.ID, d.fName, d.lName, d.birthDate
--FROM fData d
--ORDER BY d.birthDate
/* Using the below vs TOP 1 */
SELECT * FROM
( SELECT d.ID, d.fName, d.lName, d.birthDate, Row_Number() OVER (ORDER BY d.birthDate) AS ranker
FROM fData d
) r
WHERE r.ranker = 1
When using TOP there is no need to wrap the query in a second, outer query, and it looks cleaner. After applying a Row_Number or ranking function you must then wrap it to tell the query which rows you want, by applying WHERE ranker = 1 or ranker <= 5 to achieve the same as TOP 1 or TOP 5.
Which is better or faster, if this is even something that can be determined?
In the case of your example the TOP is somewhat more efficient.
The execution plan for TOP uses a TOP N Sort operator. With N=1, the sort just needs to keep track of the row with the lowest birthDate that it sees.
For the row_number query it recognises that the row number is always ascending and does itself add a TOP 1 to the plan but it doesn't combine the separated TOP and SORT into a TOP N Sort - so it does a full sort of all 5 rows.
In the case that an index supplies rows in the desired order without the need for a sort there won't be much in it. The row_number query will have an extra couple of operators that are fairly inexpensive anyway.
WHY use ranking functions in SQL Server when it has TOP
Ranking functions in general are more powerful than TOP.
For the cases where both would work consider that TOP is a fairly ancient proprietary syntax and not standard SQL. It was in the product a long time before window functions were added. If portable SQL is a concern you should not use TOP.
Though you might not use ranking functions either, as another (standard SQL) alternative is
SELECT d.ID, d.fName, d.lName, d.birthDate
FROM fData d
ORDER BY d.birthDate
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY
which gives the same plan as TOP 1
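As a quick sanity check that the TOP 1 and ROW_NUMBER formulations from the question pick the same row, here is a minimal sketch in Python using the standard sqlite3 module. SQLite only stands in for SQL Server here, so LIMIT 1 plays the role of TOP 1, the dates are rewritten as ISO strings so they sort correctly as text, and the ID tie-breaker is an assumption added because IDs 4 and 5 share the earliest birthDate. It compares results only and says nothing about SQL Server execution plans.

```python
import sqlite3

# Same five rows as the fData CTE in the question, with ISO-formatted dates.
ROWS = [
    (1, "John", "Black", "1975-05-19"),
    (2, "John", "Black", "1989-04-01"),
    (3, "John", "Black", "1995-11-16"),
    (4, "John", "Black", "1968-01-16"),
    (5, "John", "Black", "1968-01-16"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fData (ID INTEGER, fName TEXT, lName TEXT, birthDate TEXT)"
)
conn.executemany("INSERT INTO fData VALUES (?, ?, ?, ?)", ROWS)

# TOP 1 equivalent (LIMIT 1 in SQLite). ID is an added tie-breaker,
# not part of the original query.
top_row = conn.execute(
    """
    SELECT ID, fName, lName, birthDate
    FROM fData
    ORDER BY birthDate, ID
    LIMIT 1
    """
).fetchone()

# ROW_NUMBER version: the inner query ranks, the outer query filters.
ranked_row = conn.execute(
    """
    SELECT ID, fName, lName, birthDate
    FROM (
        SELECT ID, fName, lName, birthDate,
               ROW_NUMBER() OVER (ORDER BY birthDate, ID) AS ranker
        FROM fData
    ) AS r
    WHERE r.ranker = 1
    """
).fetchone()

print(top_row)
print(ranked_row)
```

Without the tie-breaker, either ID 4 or ID 5 could legitimately come back from both forms, which is worth keeping in mind when comparing them.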

Is there a way to combine these queries?

I have begun working some of the programming problems on HackerRank as a "productive distraction".
I was working on the first few in the SQL section and came across this problem (link):
Query the two cities in STATION with the shortest and
longest CITY names, as well as their respective lengths
(i.e.: number of characters in the name). If there is
more than one smallest or largest city, choose the one
that comes first when ordered alphabetically.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is
the western longitude.
Sample Input
Let's say that CITY only has four entries:
1. DEF
2. ABC
3. PQRS
4. WXY
Sample Output
ABC 3
PQRS 4
Explanation
When ordered alphabetically, the CITY names are listed
as ABC, DEF, PQRS, and WXY, with the respective lengths
3, 3, 4 and 3. The longest-named city is obviously PQRS,
but there are options for shortest-named city; we choose
ABC, because it comes first alphabetically.
I agree that this requirement could be written much more clearly, but the basic gist is pretty easy to get, especially with the clarifying example. The question I have, though, occurred to me because the instructions given in the comments for the question read as follows:
/*
Enter your query here.
Please append a semicolon ";" at the end of the query and
enter your query in a single line to avoid error.
*/
Now, writing a query on a single line doesn't necessarily imply a single query, though that seems to be the intended thrust of the statement. However, I was able to pass the test case using the following submission (submitted on 2 lines, with a carriage return in between):
SELECT TOP 1 CITY, LEN(CITY) FROM STATION ORDER BY LEN(CITY), CITY;
SELECT TOP 1 CITY, LEN(CITY) FROM STATION ORDER BY LEN(CITY) DESC, CITY;
Again, none of this is advanced SQL. But it got me thinking. Is there a non-trivial way to combine this output into a single results set? I have some ideas in mind where the WHERE clause basically adds some sub-queries in an OR statement to combine the two queries into one. Here is another submission I had that passed the test case:
SELECT
CITY,
LEN(CITY)
FROM
STATION
WHERE
ID IN (SELECT TOP 1 ID FROM STATION ORDER BY LEN(CITY), CITY)
OR
ID IN (SELECT TOP 1 ID FROM STATION ORDER BY LEN(CITY) DESC, CITY)
ORDER BY
LEN(CITY), CITY;
And, yes, I realize that the final , CITY in the final ORDER BY clause is superfluous, but it kind of makes the point that this query hasn't really saved that much effort, especially against returning the query results separately.
Note: This isn't a true MAX and MIN situation. Given the following input, you aren't actually taking the first and last rows:
Sample Input
1. ABC
2. ABCD
3. ZYXW
Based on the requirements as written, you'd take #1 and #2, not #1 and #3.
This makes me think that my solutions actually might be the most efficient way to accomplish this, but my set-based thinking could always use some strengthening, and I'm not sure if that might play in here or not.
Here's another alternative. I think it's pretty straightforward and easy to understand what's going on. Performance is good.
Still has a couple of sub-queries though.
select
min(City), len(City)
from Station
group by
len(City)
having
len(City) = (select min(len(City)) from Station)
or
len(City) = (select max(len(City)) from Station)
Untested as well, but I don't see a reason for it not to work:
SELECT *
FROM (
SELECT TOP (1) CITY, LEN(CITY) AS CITY_LEN
FROM STATION
ORDER BY CITY_LEN, CITY
) AS T
UNION ALL
SELECT *
FROM (
SELECT TOP (1) CITY, LEN(CITY) AS CITY_LEN
FROM STATION
ORDER BY CITY_LEN DESC, CITY
) AS T2;
You can't have UNION ALL with an ORDER BY for each SELECT statement, but you can work around it by using subqueries together with the TOP (1) clause and ORDER BY.
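The subquery workaround can be tried against the sample input from the question with the sketch below. It is written in Python with the standard sqlite3 module, so it is an approximation of the T-SQL above rather than the same statement: SQLite uses length() instead of LEN() and LIMIT 1 instead of TOP (1). The two-column STATION table is a cut-down assumption that keeps only what the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down STATION table (assumption: only ID and CITY matter here).
conn.execute("CREATE TABLE STATION (ID INTEGER, CITY TEXT)")
conn.executemany(
    "INSERT INTO STATION VALUES (?, ?)",
    [(1, "DEF"), (2, "ABC"), (3, "PQRS"), (4, "WXY")],
)

# Each branch orders and limits inside its own subquery, then the two
# single-row results are combined with UNION ALL.
query = """
SELECT * FROM (
    SELECT CITY, length(CITY) AS CITY_LEN
    FROM STATION
    ORDER BY CITY_LEN, CITY
    LIMIT 1
) AS T
UNION ALL
SELECT * FROM (
    SELECT CITY, length(CITY) AS CITY_LEN
    FROM STATION
    ORDER BY CITY_LEN DESC, CITY
    LIMIT 1
) AS T2
"""

result = conn.execute(query).fetchall()
# The result contains ('ABC', 3) and ('PQRS', 4), matching the sample output.
print(result)
```

UNION ALL by itself does not promise a row order, so a caller that needs the shortest city first would still add an outer ORDER BY.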
UNTESTED:
WITH CTE AS (
Select City, len(City) as CityLen,
-- number the cities alphabetically within each name length
row_number() over (partition by len(City) order by City) as AlphaRN
from Station)
Select City, CityLen from cte
Where AlphaRN = 1 and (CityLen = (select max(CityLen) from cte) or
CityLen = (Select min(CityLen) from cte))
Here's the best I could come up with:
with Ordering as
(
select
City,
Forward = row_number() over (order by len(City), City),
Backward = row_number() over (order by len(City) desc, City)
from
Station
)
select City, len(City) from Ordering where 1 in (Forward, Backward);
There are definitely a lot of ways to approach this as evidenced by the variety of answers, but I don't think anything beats your original two-query solution in terms of cleanly and concisely expressing the intended behavior. Interesting question, though!
This is what I came up with. I tried to use only one query, without CTEs or sub-queries.
;WITH STATION AS ( --Dummy table
SELECT *
FROM (VALUES
(1,'DEF','EU',1,9),
(2,'ABC','EU',1,6), -- This is shortest
(3,'PQRS','EU',1,5),
(4,'WXY','EU',1,4),
(5,'FGHA','EU',1,2),
(6,'ASDFHG','EU',1,3) --This is longest
) as t(ID, CITY, [STATE], LAT_N,LONG_W)
)
SELECT TOP 1 WITH TIES CITY,
LEN(CITY) as CITY_LEN
FROM STATION
ORDER BY ROW_NUMBER() OVER(PARTITION BY LEN(CITY) ORDER BY CITY ASC), -- alphabetical within each length
CASE WHEN MAX(LEN(CITY)) OVER (ORDER BY (SELECT NULL)) = LEN(CITY)
OR MIN(LEN(CITY)) OVER (ORDER BY (SELECT NULL))= LEN(CITY)
THEN 0 ELSE 1 END
Output:
CITY CITY_LEN
ABC 3
ASDFHG 6
select min(CITY), length(CITY)
from STATION
group by length(CITY)
having length(CITY) = (select min(length(CITY)) from STATION)
or length(CITY) = (select max(length(CITY)) from STATION);

get first row for each group

I want to transform this data
account; docdate; docnum
17700; 9/11/2015; 1
17700; 9/12/2015; 2
70070; 9/1/2015; 4
70070; 9/2/2015; 6
70070; 9/3/2015; 9
into this
account; docdate; docnum
17700; 9/12/2015; 2
70070; 9/3/2015; 9
For each account, I want one row with the most recent docdate (= max(docdate)). I have already tried different approaches with CROSS APPLY and ROW_NUMBER but couldn't achieve the desired results.
Use ROW_NUMBER:
SELECT account, docdate, docnum
FROM (
SELECT account, docdate, docnum,
ROW_NUMBER() OVER (PARTITION BY account
ORDER BY docdate DESC) AS rn
FROM mytable ) AS t
WHERE t.rn = 1
PARTITION BY account clause creates slices of rows sharing the same account value. ORDER BY docdate DESC places the record having the maximum docdate value at the top of its related slice. Hence rn = 1 points to the record with the maximum docdate value within each account partition.
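To see the partitioning at work on the data from the question, here is a small sketch in Python with the standard sqlite3 module. SQLite is only a stand-in for SQL Server, and the dates are stored as ISO strings (an assumption made so that text ordering matches date ordering). The outer ORDER BY account is added just to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (account INTEGER, docdate TEXT, docnum INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (17700, "2015-09-11", 1),
        (17700, "2015-09-12", 2),
        (70070, "2015-09-01", 4),
        (70070, "2015-09-02", 6),
        (70070, "2015-09-03", 9),
    ],
)

# rn restarts at 1 for every account; the newest docdate gets rn = 1.
query = """
SELECT account, docdate, docnum
FROM (
    SELECT account, docdate, docnum,
           ROW_NUMBER() OVER (PARTITION BY account
                              ORDER BY docdate DESC) AS rn
    FROM mytable
) AS t
WHERE t.rn = 1
ORDER BY account
"""

latest = conn.execute(query).fetchall()
print(latest)  # [(17700, '2015-09-12', 2), (70070, '2015-09-03', 9)]
```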

Selecting top 3 distinct records from a table

I have a table like this which contains the columns: country, woeid, sometopic, LastDateTime
Venezuela 23424982 Metoo
Venezuela 23424982 Chaderton
India 25424282 BossAgain
World 1 EL AVIADOR
Venezuela 23424982 ChicagoBurning and so on...
I want distinct country to be selected and only top 3 rows.
I want the result to be like this ordered by LastDateTime.
Venezuela 23424982 Chaderton
India 25424282 BossAgain
World 1 EL AVIADOR
I tried like this:
select distinct(country), woeid, sometopic, * from PopularTrends order by LastModifiedTime desc
but this did not work.
Any pointers?
Try this,
SELECT COUNTRY,WOEID,SOMETOPIC,LASTDATETIME
FROM (SELECT *,
Row_number()
OVER(
PARTITION BY COUNTRY
ORDER BY LASTDATETIME) AS RN
FROM #Yourtable)A
WHERE RN = 1
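The query above keeps one row per country. Limiting the result to three countries, newest first, is a further step that could be sketched as follows. This is Python with the standard sqlite3 module standing in for SQL Server (LIMIT 3 in place of TOP 3), and the LastDateTime values are invented purely for illustration, since the question does not show them. The inner ordering is descending here so that each country's most recent topic is kept, which is one reading of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PopularTrends "
    "(country TEXT, woeid INTEGER, sometopic TEXT, LastDateTime TEXT)"
)
# Timestamps are hypothetical; only the countries and topics come from the question.
conn.executemany(
    "INSERT INTO PopularTrends VALUES (?, ?, ?, ?)",
    [
        ("Venezuela", 23424982, "Metoo", "2015-01-01 10:00"),
        ("Venezuela", 23424982, "Chaderton", "2015-01-01 13:00"),
        ("India", 25424282, "BossAgain", "2015-01-01 12:00"),
        ("World", 1, "EL AVIADOR", "2015-01-01 11:00"),
        ("Venezuela", 23424982, "ChicagoBurning", "2015-01-01 09:00"),
    ],
)

# Inner query: rank rows per country, newest first.
# Outer query: keep the newest row per country, then the three newest overall.
query = """
SELECT country, woeid, sometopic, LastDateTime
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY country
                              ORDER BY LastDateTime DESC) AS rn
    FROM PopularTrends
) AS a
WHERE rn = 1
ORDER BY LastDateTime DESC
LIMIT 3
"""

top_three = conn.execute(query).fetchall()
for row in top_three:
    print(row)
```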

SQL Update sequence data based upon date field

I am attempting to update a table that contains deed information, specifically property ID, sale sequence, and deed date. The program generates the sale sequence data sequentially regardless of the deed date or prior deed information for the property in question.
[property_ID] [sale_number] [sale_deed_date]
1 1 01/15/1990
1 2 06/25/1970
1 3 08/12/1930
What I would like to accomplish is re-sequence sale_number data so they are in chronological order. Similar to this:
[property_ID] [sale_number] [sale_deed_date]
1 1 08/12/1930
1 2 06/25/1970
1 3 01/15/1990
Any help with this would be greatly appreciated.
You can do this by grabbing the correct order in a cte:
;WITH cte AS (
SELECT property_ID, sale_number, sale_deed_date,
rn = ROW_NUMBER() OVER (PARTITION BY property_ID ORDER BY sale_deed_date)
FROM tablename
)
UPDATE t
SET t.sale_number = cte.rn
FROM tablename t
INNER JOIN cte ON t.property_ID = cte.property_ID AND t.sale_deed_date = cte.sale_deed_date
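The same re-sequencing idea can be tried end to end with the sketch below, written in Python with the standard sqlite3 module as a stand-in for SQL Server. Instead of a joined UPDATE it computes the row numbers first and then writes them back by SQLite's built-in rowid, which is a swapped-in technique chosen to keep the example portable across SQLite versions. The table name deeds is hypothetical and the dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "deeds" is a hypothetical table name; columns follow the question.
conn.execute(
    "CREATE TABLE deeds "
    "(property_ID INTEGER, sale_number INTEGER, sale_deed_date TEXT)"
)
conn.executemany(
    "INSERT INTO deeds VALUES (?, ?, ?)",
    [(1, 1, "1990-01-15"), (1, 2, "1970-06-25"), (1, 3, "1930-08-12")],
)

# Step 1: compute the chronological position of every sale per property.
numbered = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (PARTITION BY property_ID
                              ORDER BY sale_deed_date) AS rn,
           rowid
    FROM deeds
    """
).fetchall()

# Step 2: write each position back to the row it was computed for.
conn.executemany("UPDATE deeds SET sale_number = ? WHERE rowid = ?", numbered)

resequenced = conn.execute(
    "SELECT property_ID, sale_number, sale_deed_date "
    "FROM deeds ORDER BY property_ID, sale_number"
).fetchall()
print(resequenced)
# [(1, 1, '1930-08-12'), (1, 2, '1970-06-25'), (1, 3, '1990-01-15')]
```

Matching on rowid also sidesteps the case where two sales of one property share a deed date, which a join on property_ID and sale_deed_date would match more than once.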

Resources