Get unique values query with latest entries - database

I am trying to write a query that returns each unique transaction with its sale value and the latest date on which a sale took place.
Here is my query:
select transaction, sales, max(sale_date) from xyz_table where report_date = 20160718 group by transaction, sales;
This is the result that I get:
This is the sample data:
|transaction | sales| sale_date| report_date|
|1397115220084030| 0.000144| 20160714|20160718|
|13971230534538500| 0 | 20160716|20160718|
|13973937437448300| 0.000001| 20160716|20160718|
|13976744119997000| 0.008563| 20160714|20160718|
|13976744119997000| 0.002392| 20160715|20160718|
What I wanted was unique transactions with the latest sale date:
This is the required data:
|transaction | sales| sale_date| report_date|
|1397115220084030| 0.000144| 20160714|20160718|
|13971230534538500| 0 | 20160716|20160718|
|13973937437448300| 0.000001| 20160716|20160718|
|13976744119997000| 0.002392| 20160715|20160718|
I have tried to do max of sales but that still does not give the correct result:
select transaction, Max(sales), max(sale_date) from xyz_table where report_date = 20160718 group by transaction;
Wrong result:
|transaction | sales| sale_date| report_date|
|1397115220084030| 0.000144| 20160714|20160718|
|13971230534538500| 0 | 20160716|20160718|
|13973937437448300| 0.000001| 20160716|20160718|
|13976744119997000| 0.008563| 20160715|20160718|
Please can someone help me.
Thanks

In Hive, you would use window functions:
select t.*
from (select t.*,
             row_number() over (partition by transaction order by sale_date desc) as seqnum
      from xyz_table t
      where report_date = 20160718
     ) t
where seqnum = 1;
In MySQL before version 8.0, the query would be quite different, because those versions do not support this ANSI standard functionality.
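For engines without window functions, one common alternative is to join the table against a per-transaction MAX(sale_date). The sketch below is only an illustration: it runs that idea in Python against an in-memory SQLite database loaded with the sample rows from the question. The column name txn is an assumption (TRANSACTION is a keyword in SQLite), and the helper name is hypothetical.

```python
import sqlite3


def latest_sale_per_transaction(rows, report_date):
    """Return one (txn, sales, sale_date) row per transaction: its latest sale."""
    conn = sqlite3.connect(":memory:")
    # txn stands in for the question's "transaction" column (assumed rename).
    conn.execute(
        "CREATE TABLE xyz_table "
        "(txn INTEGER, sales REAL, sale_date INTEGER, report_date INTEGER)"
    )
    conn.executemany("INSERT INTO xyz_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT t.txn, t.sales, t.sale_date
        FROM xyz_table AS t
        JOIN (SELECT txn, MAX(sale_date) AS max_date
              FROM xyz_table
              WHERE report_date = ?
              GROUP BY txn) AS m
          ON m.txn = t.txn AND m.max_date = t.sale_date
        WHERE t.report_date = ?
        ORDER BY t.txn
    """
    result = conn.execute(query, (report_date, report_date)).fetchall()
    conn.close()
    return result


# Sample rows copied from the question.
sample_rows = [
    (1397115220084030, 0.000144, 20160714, 20160718),
    (13971230534538500, 0, 20160716, 20160718),
    (13973937437448300, 0.000001, 20160716, 20160718),
    (13976744119997000, 0.008563, 20160714, 20160718),
    (13976744119997000, 0.002392, 20160715, 20160718),
]

latest = latest_sale_per_transaction(sample_rows, 20160718)
for row in latest:
    print(row)
```

Unlike row_number(), this join returns more than one row for a transaction if several rows share the same latest sale_date.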

Related

SQL - Return first non-empty value for previous days

I'm currently working with an exchange rates table in SQL that has these fields:
| Country | ExchangeRateDt | ExchangeRateValue |
| DK | 20200601 | 0.2 |
| DK | 20200603 | 0.21 |
| HR | 202000601 | 0.10 |
| HR | 202000602 | 0.12 |
I don't have a value for every day of the year for each currency, because of bank holidays or simply weekends.
I need to join it with an order table where some orders are placed on weekends, so on a given day I might not have an exchange rate to calculate taxes.
I need to take the first non-missing value from the previous days (so, in the example, if I have an order for 2020-06-02 in Denmark, I should convert it using the rate 0.2).
I thought about using a calendar table but I can't manage to get the job done.
Can someone help me?
Thanks in advance,
R
To get the most recent value less than or equal to the current day:
SELECT
    <whatever columns you need from order>
    ,er.ExchangeRateValue
FROM
    <order table> o
LEFT JOIN
    <exchange rate table> er
    ON er.Country = o.Country
    AND er.ExchangeRateDt =
    (
        SELECT
            MAX(ExchangeRateDt)
        FROM
            <exchange rate table>
        WHERE
            Country = o.Country
            AND ExchangeRateDt <= o.OrderDt
    )
Ensure the clustered index on the exchange rate table is (Country, ExchangeRateDt).
I have this as a left join so you will still return order results if the currency information is somehow missing. You would have to refer to business rules on how to proceed if no exchange rate was available.
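To see how the correlated MAX subquery behaves, here is a small sketch that runs the same join in Python against an in-memory SQLite database. The rates are the ones from the question, with dates written as YYYYMMDD integers; the orders table, its columns, and its three rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exchange_rates (
        Country TEXT, ExchangeRateDt INTEGER, ExchangeRateValue REAL,
        PRIMARY KEY (Country, ExchangeRateDt)
    );
    INSERT INTO exchange_rates VALUES
        ('DK', 20200601, 0.2), ('DK', 20200603, 0.21),
        ('HR', 20200601, 0.10), ('HR', 20200602, 0.12);

    -- Hypothetical orders table: the question does not show its layout.
    CREATE TABLE orders (OrderId INTEGER, Country TEXT, OrderDt INTEGER);
    INSERT INTO orders VALUES
        (1, 'DK', 20200602),   -- no rate that day, falls back to 20200601
        (2, 'HR', 20200606),   -- weekend, falls back to 20200602
        (3, 'DK', 20200531);   -- earlier than any known rate
""")

order_rates = conn.execute("""
    SELECT o.OrderId, o.Country, o.OrderDt, er.ExchangeRateValue
    FROM orders AS o
    LEFT JOIN exchange_rates AS er
      ON er.Country = o.Country
     AND er.ExchangeRateDt = (SELECT MAX(x.ExchangeRateDt)
                              FROM exchange_rates AS x
                              WHERE x.Country = o.Country
                                AND x.ExchangeRateDt <= o.OrderDt)
    ORDER BY o.OrderId
""").fetchall()

for row in order_rates:
    print(row)
```

Order 3 keeps its row but gets a NULL rate (None in Python), which is the case the left join is there to preserve.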
You would typically create a calendar table that stores all the days you are interested in, say dates, with each date on a separate row.
You would also probably have a table that lists the countries: I assumed countries.
Then, one option is a lateral join:
select c.country, d.date, t.ExchangeRateValue
from dates d
cross join countries c
outer apply (
select top (1) t.*
from mytable t
where t.country = c.country and t.ExchangeRateDt <= d.date
order by t.ExchangeRateDt desc
) t
If you don't have these two tables, or can't create them, then one option is a recursive query to generate the dates and a subquery to list the countries. For example, this would generate the data for the month of June:
with dates as (
select cast('20200601' as date) as date
union all
select dateadd(day, 1, date) from dates where date < '20200630'
)
select c.country, d.date, t.ExchangeRateValue
from dates d
cross join (select distinct country from mytable) c
outer apply (
select top (1) t.*
from mytable t
where t.country = c.country and t.ExchangeRateDt <= d.date
order by t.ExchangeRateDt desc
) t
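The recursive calendar idea can be tried out quickly outside SQL Server. The sketch below is an adaptation, not the query above: it uses SQLite from Python, where OUTER APPLY is not available, so a correlated scalar subquery with LIMIT 1 stands in for it. Dates are stored as ISO text ('YYYY-MM-DD'), and the function name and date range are assumptions for the example.

```python
import sqlite3


def daily_rates(rates, first_day, last_day):
    """Return (Country, day, rate) for every day in the range, carrying the
    most recent known rate forward over days that have no entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(Country TEXT, ExchangeRateDt TEXT, ExchangeRateValue REAL)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rates)
    query = """
        WITH RECURSIVE dates(d) AS (
            SELECT ?
            UNION ALL
            SELECT date(d, '+1 day') FROM dates WHERE d < ?
        )
        SELECT c.Country, dates.d,
               (SELECT t.ExchangeRateValue
                FROM mytable AS t
                WHERE t.Country = c.Country
                  AND t.ExchangeRateDt <= dates.d
                ORDER BY t.ExchangeRateDt DESC
                LIMIT 1) AS rate
        FROM dates
        CROSS JOIN (SELECT DISTINCT Country FROM mytable) AS c
        ORDER BY c.Country, dates.d
    """
    rows = conn.execute(query, (first_day, last_day)).fetchall()
    conn.close()
    return rows


# Rates from the question, with dates rewritten as ISO strings.
rates = [
    ("DK", "2020-06-01", 0.2),
    ("DK", "2020-06-03", 0.21),
    ("HR", "2020-06-01", 0.10),
    ("HR", "2020-06-02", 0.12),
]

calendar_rates = daily_rates(rates, "2020-06-01", "2020-06-04")
for row in calendar_rates:
    print(row)
```

Days before the first known rate for a country would come back with a NULL rate, matching what OUTER APPLY does when the inner query returns no row.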
You should be able to do the mapping between the transaction date and the exchange rate date with this query:
select TAB.primary_key, TAB.TransationDate, max(EXR.ExchangeRateDt)
from yourtable TAB
inner join exchangerate EXR
on TAB.Country = EXR.Country and TAB.TransationDate >= EXR.ExchangeRateDt
group by TAB.primary_key, TAB.TransationDate

Select rows where a value is maximum, and a column is null

I have a table, products, that looks along these lines:
productID | version | done
1 | 1 | 2000-01-01
1 | 2 | NULL
2 | 1 | NULL
2 | 2 | 2000-01-01
Version is assumed to be increasing.
What I want is a query that returns a ProductID and its highest / current Version, if the Done column for that version is NULL. In plain English, I want all products where the latest version is not Done, and the corresponding version. The goal: among products, find the ones with a new version that have not been "done" / processed yet.
Note: in the example above, I would expect the query to return ProductID 1, Version 2 only. I do not want the highest not-done version of a product, I want the highest version of a product, if it is not-done. Sorry if the clarification is overkill.
I wrote a query which appears to do what I want:
SELECT productID ProductID, version Version
FROM products
WHERE done IS NULL
AND version IN (
SELECT MAX(version)
FROM products
GROUP BY productID
)
However, it also appears to not be very efficient. So my question is, is there a better way to approach this query?
We can try using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY productID ORDER BY version DESC) rn
FROM products
)
SELECT productID, version
FROM cte
WHERE rn = 1 AND done IS NULL;
The CTE above assigns a row number, starting with 1, to the latest record for each product, according to version. Then we query the CTE and retain only the products whose latest record has no value assigned to the done column.
Seems you are almost correct with your query, what's missing is the correlation between the productID of your subquery and your main table.
SELECT t.productID ProductID, t.version Version
FROM products t
WHERE t.done IS NULL
AND version IN (
SELECT MAX(p.version)
FROM products p
WHERE p.productID = t.productID
GROUP BY p.productID
)
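The missing correlation matters for correctness, not only for speed. The following sketch, run with Python and an in-memory SQLite database, compares the two versions of the query. It uses the rows from the question plus one extra product (productID 3, an assumption added only to expose the difference), whose highest version is 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productID INTEGER, version INTEGER, done TEXT);
    INSERT INTO products VALUES
        (1, 1, '2000-01-01'), (1, 2, NULL),
        (2, 1, NULL),         (2, 2, '2000-01-01'),
        (3, 1, '2000-01-01');  -- hypothetical product, not in the question
""")

# Uncorrelated: version only has to equal the maximum of *any* product.
uncorrelated = conn.execute("""
    SELECT productID, version
    FROM products
    WHERE done IS NULL
      AND version IN (SELECT MAX(version) FROM products GROUP BY productID)
    ORDER BY productID
""").fetchall()

# Correlated: version has to equal the maximum of the *same* product.
correlated = conn.execute("""
    SELECT t.productID, t.version
    FROM products AS t
    WHERE t.done IS NULL
      AND t.version = (SELECT MAX(p.version)
                       FROM products AS p
                       WHERE p.productID = t.productID)
    ORDER BY t.productID
""").fetchall()

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

With the extra product present, the uncorrelated query also returns product 2 at version 1, because 1 happens to be the maximum version of product 3.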
Another solution is to use a join:
select t1.* from products t1
inner join
(select max(version) as versionId, productID
from products
group by productID) t2 on t2.productID = t1.productID and t2.versionId = t1.version
where t1.done is null

SQL MAX Date Does Not Decipher Seconds

I have a table which contains the following data:
ID | ObjectID | ActionDate
=======================================
12345 | 422107 | 2016-10-05 11:24:23.790
12346 | 422107 | 2016-10-05 11:24:28.797
I want to return the ID and max date, but the MAX function does not seem to be calculating down to seconds value (SS). Am I missing something, or is this a limitation with the MAX function? Here is the code I am using:
SELECT
TMOA.ObjectID AS [ObjID]
, TMOA.ID AS [ObjActionID]
, MAX(TMOA.ActionDate) AS [PrepDate]
FROM
TM_Procedure AS TMPRD
left join TM_ObjectAction AS TMOA ON TMPRD.ID = TMOA.ObjectID
GROUP BY
TMOA.ObjectID
, TMPRD.ID
, TMOA.ID
Looks like you're grouping by the ID of the table which is UNIQUE. More than likely that's why you're getting a record that you don't want. Just select the MAX(ActionDate) and see what you get.
If you get the records you want, then you have to figure out which column you are selecting/grouping by that is causing the records you don't want. My guess is that it's either TMOA.ObjectID or TMOA.ID
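The effect of the grouping columns can be reproduced with the two rows from the question. This is only a sketch in Python with an in-memory SQLite database; it leaves out the TM_Procedure join and stores ActionDate as text, both simplifications assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TM_ObjectAction (
        ID INTEGER PRIMARY KEY, ObjectID INTEGER, ActionDate TEXT
    );
    INSERT INTO TM_ObjectAction VALUES
        (12345, 422107, '2016-10-05 11:24:23.790'),
        (12346, 422107, '2016-10-05 11:24:28.797');
""")

# Grouping by the unique ID: every row is its own group, so MAX sees one row.
grouped_by_id = conn.execute("""
    SELECT ObjectID, ID, MAX(ActionDate)
    FROM TM_ObjectAction
    GROUP BY ObjectID, ID
    ORDER BY ID
""").fetchall()

# Grouping by ObjectID only: MAX compares both rows, down to the milliseconds.
grouped_by_object = conn.execute("""
    SELECT ObjectID, MAX(ActionDate)
    FROM TM_ObjectAction
    GROUP BY ObjectID
""").fetchall()

print(grouped_by_id)
print(grouped_by_object)
```

Both rows survive in the first result, which is why it looks as if MAX ignores the seconds; the second result keeps only the later timestamp.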
One option is to use the window function Row_Number()
Select *
From (
Select *
,RowNr=Row_Number() over (Partition By ObjectID Order by ActionDate Desc)
From YourTable
) A
Where RowNr=1

Laravel - query builder to select multiple rows with unique column value (which has max value from another column)

I have a table like this
data | usage_date | usage_hour
x | 03/03/2016 | 05:30:30
y | 03/02/2016 | 11:30:30
z | 03/03/2016 | 07:30:30
p | 03/02/2016 | 05:30:30
When I run the Laravel query, I would like to see the following rows selected:
y | 03/02/2016 | 11:30:30
z | 03/03/2016 | 07:30:30
So basically I want to build a query which will give unique values for 'usage_date', with max 'usage_hour'. How can I build this query?
First of all, you need to know how to do this in plain SQL, because Laravel's QueryBuilder is just a tool for building SQL-queries.
The task you described is kind of tricky and unfortunately there's no short and easy SQL query for that.
It can be done with window functions:
SELECT DISTINCT usage_date,
first_value(usage_hour) OVER (PARTITION BY usage_date ORDER BY usage_hour DESC) as usage_hour,
first_value(data) OVER (PARTITION BY usage_date ORDER BY usage_date DESC, usage_hour DESC) as data
FROM t
With QueryBuilder it will look like this:
DB::table('t')
->distinct()
->select([
'usage_date',
DB::raw('first_value(usage_hour) OVER (PARTITION BY usage_date ORDER BY usage_hour DESC) as usage_hour'),
DB::raw('first_value(data) OVER (PARTITION BY usage_date ORDER BY usage_date DESC, usage_hour DESC) as data')
])
->get()
Or it can be done with a subquery (which is inefficient and not cool):
SELECT usage_date, max(usage_hour) as usage_hour,
(SELECT data FROM t AS t2
WHERE t2.usage_date = t.usage_date AND t2.usage_hour = max(t.usage_hour)
LIMIT 1) AS data
FROM t
GROUP BY usage_date
If anyone knows a method without subqueries and window functions, please let me know.
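One candidate that needs neither a subquery nor a window function is a self LEFT JOIN that keeps only the rows for which no later usage_hour exists on the same usage_date. The sketch below tries it in Python on an in-memory SQLite database with the rows from the question; it is an illustration of the idea, not a tested Laravel solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (data TEXT, usage_date TEXT, usage_hour TEXT);
    INSERT INTO t VALUES
        ('x', '03/03/2016', '05:30:30'),
        ('y', '03/02/2016', '11:30:30'),
        ('z', '03/03/2016', '07:30:30'),
        ('p', '03/02/2016', '05:30:30');
""")

# t2 matches any row of the same date with a later hour; rows of t1 that
# find no such match (t2 columns are NULL) are the latest for their date.
latest_per_date = conn.execute("""
    SELECT t1.data, t1.usage_date, t1.usage_hour
    FROM t AS t1
    LEFT JOIN t AS t2
      ON t2.usage_date = t1.usage_date
     AND t2.usage_hour > t1.usage_hour
    WHERE t2.usage_date IS NULL
    ORDER BY t1.usage_date
""").fetchall()

for row in latest_per_date:
    print(row)
```

If two rows share the same date and the same maximum hour, both are returned, so this is not a drop-in replacement when exactly one row per date is required.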
Try this and let's see how it goes:
$table = \DB::table('table name')->distinct('usage_date')->groupBy('usage_date')->orderBy('usage_hour','desc')->get();

Max Value with unique values in more than one column

I feel like I'm missing something really obvious here.
Using T-SQL/SQL-Server:
I have unique values in more than one column but want to select the max version based on one particular column.
Dataset:
Example
ID | Name| Version | Code
------------------------
1 | Car | 3 | NULL
1 | Car | 2 | 1000
1 | Car | 1 | 2000
Target status: I want my query to only select the row with the highest version value. Running a MAX on the version column pulls all three because of the distinct values in the 'Code' column:
SELECT ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
The net result is that I get all three entries as per the data set due to the unique values in the Code column, but I only want the top row (Version 3).
Any help would be appreciated.
You need to identify the row with the highest version in one query and use an outer query to pull out all the fields for that row, like so:
SELECT t.ID, t.Name, GRP.Version, t.Code
FROM (
SELECT ID
,Name
,MAX(Version) as Version
FROM Table
GROUP BY ID, Name
) GRP
INNER JOIN Table t on GRP.ID = t.ID and GRP.Name = t.Name and GRP.Version = t.Version
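The difference between the two groupings can be checked on the three rows from the question. The sketch below uses Python with an in-memory SQLite database; the table is called items here (an assumed name, since Table is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- items is a stand-in name for the question's table.
    CREATE TABLE items (ID INTEGER, Name TEXT, Version INTEGER, Code INTEGER);
    INSERT INTO items VALUES
        (1, 'Car', 3, NULL),
        (1, 'Car', 2, 1000),
        (1, 'Car', 1, 2000);
""")

# Original attempt: Code is in the GROUP BY, so each row is its own group.
grouped_with_code = conn.execute("""
    SELECT ID, Name, MAX(Version), Code
    FROM items
    GROUP BY ID, Name, Code
""").fetchall()

# Find the highest version per (ID, Name), then join back for the other columns.
highest_version = conn.execute("""
    SELECT t.ID, t.Name, g.Version, t.Code
    FROM (SELECT ID, Name, MAX(Version) AS Version
          FROM items
          GROUP BY ID, Name) AS g
    INNER JOIN items AS t
       ON g.ID = t.ID AND g.Name = t.Name AND g.Version = t.Version
""").fetchall()

print(len(grouped_with_code), "rows when grouping by Code as well")
print(highest_version)
```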
You can also use row_number() to do this kind of logic, for example like this:
select ID, Name, Version, Code
from (
select *, row_number() over (partition by ID, Name order by Version desc) as RN
from Table1
) X where RN = 1
Add a TOP clause to force the return of a single row, and add an ORDER BY:
SELECT top 1 ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
order by max(version) desc
