I'm trying to collect the highest mountain for each country on the American continent from this database: http://www.semwebtech.org/sqlfrontend/
When I execute this query:
SELECT DISTINCT Country.Name AS Country, Mountain.Name AS Mountain, Elevation FROM Mountain
JOIN Geo_Mountain ON Mountain=Name
JOIN Encompasses ON Geo_Mountain.Country=Encompasses.Country
JOIN Country ON Geo_Mountain.Country=Country.Code
WHERE Continent='North America' OR Continent='South America'
ORDER BY Country.Name, Elevation DESC
This is an extract of what I get:
COUNTRY MOUNTAIN ELEVATION
Argentina Aconcagua 6962
Argentina Ojos del Salado 6893
Argentina Monte Pissis 6795
Bolivia Alto Toroni 5982
Bolivia Licancabur 5920
Bolivia Ollagüe 5870
Bolivia Zapaleri 5653
The problem is that I get every mountain listed in the database, and I'm unable to select only the highest mountain for each country.
I've tried adding GROUP BY Country.Name between WHERE and ORDER BY, but received an error message:
A database error occured: ORA-00979: not a GROUP BY expression
Here is the Referential Dependencies Diagram: http://www.dbis.informatik.uni-goettingen.de/Mondial/mondial-abh.pdf
GROUP BY Country.Name is correct, but you must make more changes. In the SELECT clause you can keep Country.Name, since you are grouping by it. So far so good.
Elevation can't stay as it is, and it is not what you want anyway: you want MAX(Elevation) AS Elevation, that is, only the maximum for each group.
The more interesting part is the mountain name. It is not an aggregate, and grouping by it is not what you want either. You only want the name of the highest mountain in each group (that is, in each country). There are a few ways to do that; in Oracle, an efficient one is
max(mountain.name) keep (dense_rank last order by elevation) as mountain_name
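The KEEP (DENSE_RANK ...) clause above is Oracle-specific. As a rough, more portable sketch of the same "top row per group" idea, the example below uses ROW_NUMBER() in a subquery. It runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the peaks table is a hypothetical, flattened stand-in for the joined Mondial tables, not the real schema.

```python
import sqlite3

# Hypothetical, simplified table standing in for the result of the joins
# in the question (country name, mountain name, elevation).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peaks (country TEXT, mountain TEXT, elevation INTEGER)")
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    [
        ("Argentina", "Aconcagua", 6962),
        ("Argentina", "Ojos del Salado", 6893),
        ("Argentina", "Monte Pissis", 6795),
        ("Bolivia", "Alto Toroni", 5982),
        ("Bolivia", "Licancabur", 5920),
    ],
)

# Number the mountains of each country from highest to lowest,
# then keep only the first one per country.
highest = conn.execute(
    """
    SELECT country, mountain, elevation
    FROM (
        SELECT country, mountain, elevation,
               ROW_NUMBER() OVER (
                   PARTITION BY country ORDER BY elevation DESC
               ) AS rn
        FROM peaks
    )
    WHERE rn = 1
    ORDER BY country
    """
).fetchall()

for row in highest:
    print(row)
```

If two mountains in a country share the maximum elevation, ROW_NUMBER() keeps just one of them; RANK() would keep both.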
I have a table like this in Snowflake. It supports ANSI SQL, so don't worry if this DB isn't familiar to you.
Salesman  Customer          Country
Brown     Super Company     UK
Brown     Another customer  UK
Smith     Contoso           US
Brown     Test company      US
I need to find the country where each salesman has the most customers. The desired result of the query would look like this:
Salesman  Country  cnt(country)
Brown     UK       2
Smith     US       1
I've come up with this
SELECT
salesman,
country,
max(count(country))
FROM
customertable
GROUP BY
salesman, country
But nested aggregate functions aren't supported. I've already read some quite good reasons for that, but I just cannot find another way to do it.
QUALIFY could be used to filter the highest value per salesman:
SELECT salesman,
country,
count(country) AS cnt
FROM customertable
GROUP BY salesman, country
QUALIFY RANK() OVER(PARTITION BY salesman ORDER BY cnt DESC) = 1
Regarding your question, I guess you want to count customers per country rather than count the country column.
This should do the job using window functions and QUALIFY (see the Window Functions documentation):
CREATE OR REPLACE TABLE customers (salesman STRING, customer STRING, country STRING);
INSERT INTO customers
VALUES
('Brown', 'Super Company', 'UK'),
('Brown', 'Another customer', 'UK'),
('Smith', 'Contoso', 'US'),
('Brown', 'Test company', 'US')
;
SELECT
salesman,
country,
COUNT(customer) AS nb_customer
FROM customers
GROUP BY
salesman,
country
QUALIFY RANK() OVER (PARTITION BY salesman ORDER BY nb_customer DESC) = 1
;
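QUALIFY is not available in every database. A minimal sketch of the same logic without it, assuming an engine with window functions (SQLite 3.25 or later here, driven from Python's sqlite3 module): compute the counts first, rank them per salesman, then filter on the rank in an outer query. The CTE names counts and ranked are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (salesman TEXT, customer TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Brown", "Super Company", "UK"),
        ("Brown", "Another customer", "UK"),
        ("Smith", "Contoso", "US"),
        ("Brown", "Test company", "US"),
    ],
)

# Step 1 (counts): customers per salesman and country.
# Step 2 (ranked): rank the countries of each salesman by that count.
# Step 3: keep rank 1, which is what QUALIFY ... = 1 does in one step.
top = conn.execute(
    """
    WITH counts AS (
        SELECT salesman, country, COUNT(customer) AS nb_customer
        FROM customers
        GROUP BY salesman, country
    ),
    ranked AS (
        SELECT salesman, country, nb_customer,
               RANK() OVER (
                   PARTITION BY salesman ORDER BY nb_customer DESC
               ) AS rnk
        FROM counts
    )
    SELECT salesman, country, nb_customer
    FROM ranked
    WHERE rnk = 1
    ORDER BY salesman
    """
).fetchall()

for row in top:
    print(row)
```

Because RANK() is used, a salesman with two countries tied for first place would appear once per tied country.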
Apologies if something like this was answered, but I'm not seeing it. This is SQL Server. I'm looking at a list of trucks with unique identifiers (Unique ID). They go to various cities. They arrive (ArrivalDate) in the City (City) and then leave (SentDate). There is no destination column. The movement column is a sort of unique identifier for the movement; the first part is the UniqueID but the second is nothing.
UniqueID  City               ArrivalDate  SentDate    Movement
97841     Los Angeles North  25/01/2021   27/01/2021  97841 : 949814
93621     Baltimore          21/01/2021   22/01/2021  93621 : 646946
96872     Los Angeles South  19/01/2021   19/01/2021  96872 : 685469
97842     Boston             12/12/2020   20/12/2020  97841 : 646488
I'd like to write a query that shows a Unique ID that leaves from a particular city and goes to another. So, it would look something like this:
UniqueID  City 1             City 2  SentDate    ArrivalDate
97841     Los Angeles South  Boston  12/12/2020  15/12/2020
97841     Los Angeles North  Boston  01/01/2021  05/01/2021
I ran the code below. I'm using an INNER JOIN on the same table to match on the UniqueID. I get the right UniqueID and the right cities, but the dates obviously aren't linking correctly. I'm getting valid dates for the cities, but they are out of order. For example, the truck did arrive in Boston, but the date shown is six months later, when it should have been the proper arrival date 4 or 5 days later. Any ideas for the right solution? Thanks!
SELECT DISTINCT
aa.UniqueID,
aa.City AS 'City 1',
bb.City AS 'City 2',
aa.SentDate,
bb.ArrivalDate
FROM Movedata aa
INNER JOIN Movedata bb on bb.UniqueID = aa.UniqueID
WHERE (aa.City LIKE '%Los Angeles South%' AND bb.City LIKE '%Los Angeles North%') AND
(aa.SentDate < bb.ArrivalDate) AND
aa.SentDate BETWEEN '2020-01-28 00:00:00.000' AND '2021-01-29 00:00:00.000'
ORDER by aa.SentDate DESC;
SELECT
first_name + ' ' + last_name AS name,
country,
birthdate,
-- Retrieve the birthdate of the oldest voter per country
FIRST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
-- Retrieve the birthdate of the youngest voter per country
LAST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
The above query results in the following data:
name country birthdate oldest_voter youngest_voter
Caroline Griffin Spain 1981-03-20 1981-03-20 1988-03-21
Christopher Jackson Spain 1981-04-15 1981-03-20 1988-03-21
Raul Raji Spain 1981-04-25 1981-03-20 1988-03-21
Karen Cai Spain 1981-05-03 1981-03-20 1988-03-21
If we remove the window frame clause (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from LAST_VALUE(birthdate), the result changes as shown below:
SELECT
first_name + ' ' + last_name AS name,
country,
birthdate,
-- Retrieve the birthdate of the oldest voter per country
FIRST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS oldest_voter,
-- Retrieve the birthdate of the youngest voter per country
LAST_VALUE(birthdate)
OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
FROM voters
WHERE country IN ('Spain', 'USA');
name country birthdate oldest_voter youngest_voter
Caroline Griffin Spain 1981-03-20 1981-03-20 1981-03-20
Christopher Jackson Spain 1981-04-15 1981-03-20 1981-04-15
Raul Raji Spain 1981-04-25 1981-03-20 1981-04-25
Karen Cai Spain 1981-05-03 1981-03-20 1981-05-03
The question is:
FIRST_VALUE(birthdate) gives the first value of the partition ordered by birthdate, which we use for oldest_voter.
Why do we need a window frame clause on LAST_VALUE(birthdate) to get the analogous result for youngest_voter?
When I remove the clause, youngest_voter simply copies the birthdate column instead of returning the last value of the partition, unlike FIRST_VALUE(birthdate).
LAST_VALUE (and FIRST_VALUE) are somewhat strange because they're analytic functions.
Analytic functions deal with windows differently than normal aggregate functions do.
To demonstrate this, I'll take a detour and use a running total with SUM as a first example of the difference between aggregate functions and analytic functions.
SUM is normally an aggregate function (that is, when used without an OVER clause).
However, it behaves as an analytic function when you include an ORDER BY in the OVER clause.
Say you have the following table
id num_items
1 5
2 8
3 3
4 5
If you then ran SELECT SUM(num_items) AS Total FROM mytable the result is 21, as expected. This is the typical 'aggregate' version of the SUM function.
However, if you add an OVER clause with ORDER BY to the SUM, it becomes an analytic function.
Running SELECT SUM(Num_items) OVER (ORDER BY id) AS Total FROM mytable; gives you the following - a running total.
Total
5
13
16
21
With an ORDER BY in the OVER clause, the default window frame runs from the start of the partition up to the current row only, unless you specify otherwise with the ROWS BETWEEN clause.
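As a small runnable sketch of the three behaviours described here (plain aggregate, running total with the default frame, and an explicit full frame), the following uses SQLite through Python's sqlite3 module with the same four rows. SQLite is only an assumption for the demo; the original thread is about SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, num_items INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 5), (2, 8), (3, 3), (4, 5)])

# Plain aggregate: one row holding the grand total.
total = conn.execute("SELECT SUM(num_items) FROM mytable").fetchone()[0]

# ORDER BY inside OVER: the default frame ends at the current row,
# so each row sees the sum of itself and everything before it.
running = [
    r[0]
    for r in conn.execute(
        "SELECT SUM(num_items) OVER (ORDER BY id) FROM mytable ORDER BY id"
    )
]

# Explicit frame covering the whole partition: every row sees the grand total.
whole = [
    r[0]
    for r in conn.execute(
        """
        SELECT SUM(num_items) OVER (
            ORDER BY id
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        )
        FROM mytable
        ORDER BY id
        """
    )
]

print(total, running, whole)
```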
Now, in your example (birthdates) without the ROWS BETWEEN clause, we can run through the processing.
Let's take the first row to start.
The analytic functions will operate on that one row only
So both the first_value (ordered by birthdate) and last_value (ordered by birthdate) will be this row's value
Let's take the second row
The analytic functions will operate on the first two rows
The first_value will be from the first row, but the last_value will be from the second row
Only at the last row will the results be as you expect. For first_value, it is typically not a problem (as you have demonstrated) but it is a 'gotcha!' for last_value.
UPDATE: To overcome the issue, instead of specifying the ROWS BETWEEN component, you can sort it the other way and use First_Value e.g.,
instead of LAST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate) AS youngest_voter
use FIRST_VALUE(birthdate) OVER (PARTITION BY country ORDER BY birthdate DESC) AS youngest_voter
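To compare the three variants side by side, here is a minimal sketch on SQLite via Python's sqlite3 module. The voters table is cut down to a hypothetical name/country/birthdate layout and contains only the four Spanish rows shown in the question's extract, so the youngest birthdate here is 1981-05-03 rather than the value from the full data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (name TEXT, country TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO voters VALUES (?, ?, ?)",
    [
        ("Caroline Griffin", "Spain", "1981-03-20"),
        ("Christopher Jackson", "Spain", "1981-04-15"),
        ("Raul Raji", "Spain", "1981-04-25"),
        ("Karen Cai", "Spain", "1981-05-03"),
    ],
)

rows = conn.execute(
    """
    SELECT name,
           -- default frame: stops at the current row
           LAST_VALUE(birthdate) OVER (
               PARTITION BY country ORDER BY birthdate
           ) AS default_frame,
           -- explicit frame: spans the whole partition
           LAST_VALUE(birthdate) OVER (
               PARTITION BY country ORDER BY birthdate
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame,
           -- reversed ordering: the first row is now the youngest voter
           FIRST_VALUE(birthdate) OVER (
               PARTITION BY country ORDER BY birthdate DESC
           ) AS reversed_first
    FROM voters
    ORDER BY birthdate
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame, the second column just repeats each row's own birthdate, while the other two columns return the youngest birthdate on every row.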
I'm trying to figure out the best way to do something - basically I'm looking for advice before I do it the long/hard way!
I have the following model associations:
Seller hasMany Invoices
Invoice hasOne Supplier
Supplier belongsTo SupplierType
Each invoice is for a certain amount and is from a certain date. I want to be able to retrieve Sellers who have spent within a certain amount in the past 'full' month for which we have data. So, I need to get the date 1 month before the most recent invoice, find the total on all invoices for that Seller since that date, and then retrieve only those where the total lies between, say, £10000 and £27000 (or whatever range the user has set).
Secondly, I want to be able to do the same thing, but with the SupplierType included. So, the user may say that they want Sellers who have spent between £1000 & £5000 from Equipment Suppliers, and between £1000 & £7000 from Meat Suppliers.
My plan here is to do an initial search for the appropriate supplier type id, and then I can filter the invoices based on whether each one is from a supplier of an appropriate type.
I'm mainly not sure whether there is a way to work out the monthly total and then filter on it in one step. Or am I going to have to do it in several steps? I looked at Virtual Fields, but I don't think they do what I need - they seem to be mainly used to combine fields from the same record - is that correct?
(Posted on behalf of the question author).
I'm posting the eventual solution here in case it helps anyone else:
SELECT seller_id FROM
(SELECT i.seller_id, SUM(price_paid) AS totalamount FROM invoices i
JOIN
(SELECT seller_id, MAX(invoice_date) AS maxdate FROM invoices GROUP BY seller_id) sm
ON i.seller_id = sm.seller_id
WHERE i.invoice_date > (sm.maxdate - 30) GROUP BY i.seller_id) t
WHERE t.totalamount BETWEEN 0 AND 1000
This can be done in a single query that will look something like:
select * from (
select i.seller, sum(i.amount) as totalamount
from invoices i join
(select seller, max(invoicedate) as maxdate from invoices group by seller) sm
on i.seller=sm.seller
and i.invoicedate>(sm.maxdate-30)
group by i.seller
) t where t.totalamount between 1000 and 50000
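The single-query approach can be tried end to end with a small sketch like the one below. It assumes SQLite (through Python's sqlite3 module), a hypothetical invoices table with seller_id, price_paid and invoice_date columns, and made-up sample rows. Date arithmetic differs between databases, so SQLite's date(maxdate, '-30 days') stands in for the maxdate - 30 expression used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (seller_id INTEGER, price_paid INTEGER, invoice_date TEXT)"
)
# Made-up sample data, for illustration only.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        (1, 500, "2021-01-31"),
        (1, 300, "2021-01-15"),
        (1, 900, "2020-11-01"),  # older than 30 days before seller 1's latest invoice
        (2, 2000, "2021-01-20"),
        (2, 1500, "2021-01-10"),
        (3, 50, "2021-01-05"),
    ],
)

def sellers_in_range(low, high):
    """Sellers whose spend in the 30 days before their latest invoice is in [low, high]."""
    return conn.execute(
        """
        SELECT t.seller_id, t.totalamount
        FROM (
            SELECT i.seller_id, SUM(i.price_paid) AS totalamount
            FROM invoices i
            JOIN (
                SELECT seller_id, MAX(invoice_date) AS maxdate
                FROM invoices
                GROUP BY seller_id
            ) sm ON i.seller_id = sm.seller_id
            WHERE i.invoice_date > date(sm.maxdate, '-30 days')
            GROUP BY i.seller_id
        ) t
        WHERE t.totalamount BETWEEN ? AND ?
        ORDER BY t.seller_id
        """,
        (low, high),
    ).fetchall()

print(sellers_in_range(100, 1000))
```

The bounds are passed as query parameters so the user-selected range never has to be concatenated into the SQL string.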
I have a table named locations, and I want to select from it so that each distinct value of one column appears only once while all the values of the other column are still listed.
table name: locations
column 1: country, values: America, India, India, India
column 2: state/province, values: Newyork, Punjab, Karnataka, kerala
When I select, I should get India only once, with all three states listed under it. Is there any way to do this? Please help.
You could do this:
SELECT country, GROUP_CONCAT(state SEPARATOR ', ')
FROM locations
GROUP BY country
But this sort of thing is often best done in the presentation layer.
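A quick way to try the GROUP_CONCAT idea is the sketch below, which assumes SQLite through Python's sqlite3 module. SQLite spells the separator as a second argument, GROUP_CONCAT(state, ', '), instead of MySQL's SEPARATOR keyword, and it does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (country TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [
        ("America", "Newyork"),
        ("India", "Punjab"),
        ("India", "Karnataka"),
        ("India", "kerala"),
    ],
)

# One row per country, with all of its states joined into a single string.
states_by_country = dict(
    conn.execute(
        "SELECT country, GROUP_CONCAT(state, ', ') FROM locations GROUP BY country"
    ).fetchall()
)

for country, states in states_by_country.items():
    print(country, "->", states)
```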
Do you want it displayed that way, rather than selected that way?
In that case, add a condition inside your loop that checks the country and prints it only when it has changed.
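The loop described above might look like the following sketch. It is written in Python purely for illustration (the thread does not say which language the page uses), the function name group_for_display is hypothetical, and it assumes the rows arrive already sorted by country, for example from a query with ORDER BY country.

```python
def group_for_display(rows):
    """Return display lines, emitting each country once above its states.

    Assumes rows are (country, state) pairs already sorted by country.
    """
    lines = []
    previous = None
    for country, state in rows:
        if country != previous:
            # The country changed, so print it as a new heading.
            lines.append(country)
            previous = country
        lines.append("  " + state)
    return lines

rows = [
    ("America", "Newyork"),
    ("India", "Punjab"),
    ("India", "Karnataka"),
    ("India", "kerala"),
]
lines = group_for_display(rows)
print("\n".join(lines))
```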