RunningDifference(x) for multiple x values - yandex

My Table tbl_data(event_time, monitor_id,type,event_date,status)
select status,sum(runningDifference(event_time)) as delta from (SELECT status,event_date,event_time FROM tbl_data WHERE event_date >= '2018-05-01' AND monitor_id =3 ORDER BY event_time ASC) group by status
Result will be
status delta
1 4665465
2 965
This query result give me right answer for single monitor_id, Now I required it for multiple monitor_id,
How can I achieve it in single/same query??

Usually this is achieved with conditional expressions. Like SELECT ...,if(monitor_id = 1, status, NULL) AS status1,... and then you do your aggregate function that, as you might know, skips NULL values. But I did some testing and it turns out that because of clickhouse internals runningDifference() can't distinguish columns originated from the same source. At the same time it distinguishes columns that came from different sources just fine. It is a bug.
I opened an issue on Github: https://github.com/yandex/ClickHouse/issues/2590
UPDATE: Devs reacted incredibly fast and with the latest source from master you can get what you want with the strategy I described. See the issue for code example.

Related

How to write where clause to consider the earlier date given 2 dates?

Control table:
ControlID, Date1, Date2, Date3
Sale table:
ID, ControlID, SaleDate
I want to get the sales from Date1 to which ever date is earlier amongst Date2 and Date3.
SELECT *
FROM SALE S
JOIN CONTROL C ON S.CONTROLID=C.ID
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE<EARLIER(DATE2, DATE3)
What is the correct way to write the EARLIER(DATE2, DATE3) logic? For example - implement this as a new scalar function?
Or maybe:
AND S.SALEDATE<C.DATE2 AND S.SALEDATE<C.DATE3
LEAST may be available to you (SQL Server 2022)
SELECT *
FROM SALE S
JOIN CONTROL C ON S.CONTROLID=C.ID
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE<LEAST(DATE2, DATE3)
otherwise try
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE< DATE2 AND S.SALEDATE< DATE3
In order to be able to SARG saledate, I'd use a case:
WHERE S.SALEDATE>=C.DATE1
AND S.SALEDATE< CASE WHEN DATE2<DATE3 THEN DATE2 ELSE DATE3 END
An alternative which is probably less performant but arguably easier to read (and capable of accepting more values) is to make a "mini" unpivoted subquery:
WHERE S.SALEDATE>=C.DATE1
AND S.SALEDATE< (select min(datex) from (values (date2),(date3)) as t1(datex))
Post comment addendum:
If you have one less_than condition and one greater_than condition, an index can be used once to satisfy both. For ease's sake, let's say your dates are integers (they actually are anyway). Let's say date1=5, date2=20, date3=17.
If you use my case solution (or you are lucky enough to be able to use earlier), then the engine will:
Calculate once that starting point is date1=5 and ending point= case/earlier(20,17)=17
If there is an index on sale.saledate, it will index seek to 5 and 17. This could be very fast if [sale] is a large table.
Now that it quickly found the starting and ending point, it returns all possible rows on the output/next operator
If you use AND S.SALEDATE<C.DATE2 AND S.SALEDATE<C.DATE3, what probably will happen is that it will start fastly on 5 like before, but then create an expression aliased somewhat like expr01 which includes both of these conditions. It will then evaluate this expr01 beginning at 5 and stopping not at 17, but at the end of the table.
This does have some speculation on my part, that's why it would be helpful for you to run both and then pastetheplan.com.
Note: It is highly probable that either 1) such an index does not exist, or 2) the optimizer wouldn't use it, or 3) your query is fast anyway, which makes all this analysis partly a waste of time, much like XKCD suggests:
However, even in these cases, understanding the points and creating a good programming habit of SARGable queries is just good business.

Nested Query in SOQL

Can someone please help me generate SOQL for the below query.
Getting this error - Nesting of semi join sub-selects is not supported
SELECT External_ID_vod__c, FirstName, LastName, Middle_vod__c
FROM Account
where Id IN (select Account_vod__c from EM_Attendee_vod__c WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod'))
Yes, with WHERE clauses you can go "down" the related list only 1 level, looks like you'd need 2 levels.
Couple ideas.
Can you do it in 2 steps? First select Account_vod__c from EM_Attendee_vod__c..., then pass the results to 2nd query.
See if you can eliminate a level by using rollup summary fields - but with this case might be tricky, rollup of all payments in 2020 might be not possible.
See if you can run a report that's close to what you need (even if it'd only grab these Account_vod__c) and you could use "reporting snapshot" - save the intermediate results of the report in a helper custom object. That could make it easier to query.
See if you can run the query by going "up". For example Account_vod__c is a real lookup/master-detail you could try with something like
select Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c
from EM_Attendee_vod__c
WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod')
It's not perfect, it'd give you duplicate accounts if they have multiple attendees but it could work good enough. And in a pinch you could always try to deduplicate it with a GROUP BY Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c (although GROUP BY doesn't like to have more than 200 results... you could then cheat with LIMIT + OFFSET if you expect to have 2K accounts max)

Is there a way to sum an entire quantity in SQL with unique values

I am trying to get a total summation of both the ItemDetail.Quantity column and ItemDetail.NetPrice column. For sake of example, let's say the quantity that is listed is for each individual item is 5, 2, and 4 respectively. I am wondering if there is a way to display quantity as 11 for one single ItemGroup.ItemGroupName
The query I am using is listed below
select Location.LocationName, ItemDetail.DOB, SUM (ItemDetail.Quantity) as "Quantity",
ItemGroup.ItemGroupName, SUM (ItemDetail.NetPrice)
from ItemDetail
Join ItemGroupMember
on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup
on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location
on ItemDetail.LocationID = Location.LocationID
Inner Join Item
on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, Item.ItemName,
ItemDetail.NetPrice, ItemGroup.ItemGroupName
If you are using SQL Server 2012 , you can use the summation on partition to display the
details and aggregates in the same query.
SUM(SalesYTD) OVER (ORDER BY DATEPART(yy,ModifiedDate)),1)
Link :
https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15
We can't be certain without seeing sample data. But I suspect you need to remove some fields from you GROUP BY clause -- probably Item.ItemName and ItemDetail.NetPrice.
Generally, you won't GROUP BY a column that you are applying an aggregate function to in the SELECT -- as in SUM(ItemDetail.NetPrice). And it is not very common, in my experience, to GROUP BY columns that aren't included in the SELECT list - as you are doing with Item.ItemName.
I think you need to go back to basics and read about what GROUP BY does.
First of all welcome to the overflow...
Second: The answer is going to be "It depends"
Any time you aggregate data you will need to Group by the other fields in the query, and you have that in the query. The gotcha is what happens when data is spread across multiple locations.
My suggestion is to rethink your problem and see if you really need these other fields in the query. This will depend on what the person using the data really wants to know.
Do they need to know how many of item X there are, or do they really need to know that item X is spread out over three sites?
You might find you are better off with two smaller queries.

SQL column update to solve data quality issue

I am unable to figure out how to write 'smart code' for this.
In this case I would like the end result for the first two case to be:
product_cat_name
A_SEE
A_BEE
Business is rule is such that one product_cat_name can belong to only one group but due to data quality issues we sometimes have a product_cat_name belonging to 2 different groups. As a special case in such a situation we would like to append group to the product_cat_name so that product_cat_name becomes unique.
It sounds so simple yet I am cracking my head over this.
Any help much appreciated.
Something like this:
with names as (
select prod_cat_nm , prod_cat_nm+group as new_nm from (query that joins 3 tables together) as qry
join
(Select prod_cat_nm, count(distinct group)
from (query that joins 3 tables together) as x
group by
prod_cat_nm
having count(distinct group) > 1) dups
on dups.prod_cat_nm = qry.prod_cat_nm
)
SELECT prod_cat_nm, STRING_AGG(New_nm, '') WITHIN GROUP (ORDER BY New_Nm ASC) AS new_prod_cat_nm
FROM names
GROUP BY prod_cat_nm;
I've used the 2017 STRING_AGG() here as its shortest to write - But you could easily change this to use Recursions or XML path
It is simple if you break it down into small pieces.
You need to UPDATE the table obviously, and change the value of product_cat_name. That's easy.
The new value should be group + product_cat_name. That's easy.
You only want to do this when a product_cat_name is associated with more than one group. That's probably the tricky part, but it can also be broken down into small pieces that are easy.
You need to identify which product_cat_names have more than one group. That's easy. GROUP BY product_cat_name HAVING COUNT(DISTINCT Group) > 1.
Now you need to use that to limit your UPDATE to only those product_cat_names. That's easy. WHERE product_cat_name IN (Subquery using above logic to get PCNs that have more than one Group).
All easy steps. Put them together and you've got your solution.

Windowing function: Finding max value from LAG()

I'm currently working my way through the exam study book, Querying Microsoft SQL Server 2012. I've been learning SQL over the last few months and I am currently looking over windowing functions. I came to this application question and it got me thinking about another question, which I'll list below:
So in the columns diffprev and diffnext it only lists the difference between the previous and the next value. How could I list the maximum difference between subsequent values across all of the rows (partitioned by custid)? So just scanning the table, I see that in custid 1's history, the greatest difference between subsequent rows is $548. Then for custid 2, the greatest difference is $390.95. I could see these values appearing in a maxdiff column across all the rows pertaining to the partition.
Thank you for aiding my studying!
If you're just looking for the value, this should work:
with cte as (
select custid, val - lag(val)
over (partition by custid order by orderdate, orderid) as prevVal
from Sales.OrderValues
)
select custid, max(abs(val))
from cte
group by custid
If you want the details of the rows that attain that maximum, it'll be a bit more work.
Bonus tip - pictures of text are the worst. You're more likely to get help if the people helping don't need to type your code out. Even better though would be a fully functioning example (complete with table definitions and sample data) so we can verify against your data!

Resources