How to relax GROUP BY restrictions in SQL Server? - sql-server

Consider this query:
SELECT F1,F2 FROM TABLE GROUP BY F1
Selecting F1 is valid, but selecting F2 looks incorrect (after all, it can change from row to row). However, SQL Server does not check any of the logic involved here -- F2 could, for example, be dependent on F1 (because of a JOIN clause, say).
I know the workarounds, but my question here is:
How to RELAX this "group by" restriction (directly)?
Something like:
RELAX_GROUPBY
SELECT F1,F2 ....
begin of edit 1
So it would be something similar to MySQL's ability to get data from a grouped dataset without any workarounds.
Example of data:
F1 | F2
1 | 2
1 | 2
Output (after executing the query given above):
F1 | F2
1 | 2
end of edit 1
Remark: yes, I do know the workarounds -- aggregate functions, creating a view, an on-the-fly table, and others (depending on the scenario). I am not interested in another workaround. If you know the solution to the question, please answer -- thank you very much.

Assuming F2 is the same for every F1 (which is the only case where your query makes sense), the easiest way is to do something like
SELECT F1, MAX(F2) AS F2
FROM TABLE
GROUP BY F1
assuming F2 is a field that can have aggregate functions applied to it, of course.
There's no way to relax the GROUP BY in the way you describe, short of rewriting the whole thing. I know MySQL does something a bit different (you can group by one field and SELECT all the others), but it's inconsistent with other implementations.

If you are so sure that F2 is dependent on F1, just add it to the GROUP BY (how difficult is that?):
SELECT F1,F2 FROM TABLE GROUP BY F1, F2
The "do what I mean and not what I code" portion of SQL Server will never be good enough to read your mind, tell it how to group the columns and it will do it. There is no facility within SQL Server to "relax" the group by restrictions, and I'm glad.

If you use GROUP BY, every non-grouped column must go through an aggregate function; only then will it work.
e.g.:
SELECT F1,count(F2) FROM TABLE GROUP BY F1

Related

Nested Query in SOQL

Can someone please help me generate SOQL for the below query.
Getting this error - Nesting of semi join sub-selects is not supported
SELECT External_ID_vod__c, FirstName, LastName, Middle_vod__c
FROM Account
where Id IN (select Account_vod__c from EM_Attendee_vod__c WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod'))
Yes, with WHERE clauses you can go "down" the related lists only 1 level; it looks like you'd need 2 levels.
A couple of ideas:
Can you do it in 2 steps? First select Account_vod__c from EM_Attendee_vod__c..., then pass the results to the 2nd query (see the sketch at the end of this answer).
See if you can eliminate a level by using rollup summary fields - although in this case it might be tricky; a rollup of all payments in 2020 might not be possible.
See if you can run a report that's close to what you need (even if it'd only grab these Account_vod__c) and you could use "reporting snapshot" - save the intermediate results of the report in a helper custom object. That could make it easier to query.
See if you can run the query by going "up". For example, if Account_vod__c is a real lookup/master-detail relationship, you could try something like
select Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c
from EM_Attendee_vod__c
WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod')
It's not perfect: it'd give you duplicate accounts if they have multiple attendees, but it could work well enough. And in a pinch you could always try to deduplicate it with a GROUP BY Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c (although GROUP BY doesn't like to have more than 200 results... you could then cheat with LIMIT + OFFSET if you expect to have 2K accounts max).
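For the two-step idea, a rough sketch (reusing only the object and field names from the question; the bind variable name accountIds is made up) might look like this. Step 1 collects the attendee accounts with a single level of semi-join, which is allowed:
SELECT Account_vod__c
FROM EM_Attendee_vod__c
WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
             FROM Expense_Header_vod__c
             WHERE CALENDAR_YEAR(CreatedDate) > 2020 AND Status_vod__c = 'Paid_in_Full_vod')
Step 2 gathers the Account_vod__c values returned by step 1 (for example into a Set<Id> in Apex) and binds them into the second query:
SELECT External_ID_vod__c, FirstName, LastName, Middle_vod__c
FROM Account
WHERE Id IN :accountIds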

Is there a way to sum an entire quantity in SQL with unique values

I am trying to get a total summation of both the ItemDetail.Quantity and ItemDetail.NetPrice columns. For the sake of example, let's say the quantity listed for each individual item is 5, 2, and 4 respectively. I am wondering if there is a way to display the quantity as 11 for one single ItemGroup.ItemGroupName.
The query I am using is listed below
select Location.LocationName, ItemDetail.DOB, SUM (ItemDetail.Quantity) as "Quantity",
ItemGroup.ItemGroupName, SUM (ItemDetail.NetPrice)
from ItemDetail
Join ItemGroupMember
on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup
on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location
on ItemDetail.LocationID = Location.LocationID
Inner Join Item
on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, Item.ItemName,
ItemDetail.NetPrice, ItemGroup.ItemGroupName
If you are using SQL Server 2012, you can use summation over a partition to display the details and aggregates in the same query, e.g.:
SUM(SalesYTD) OVER (ORDER BY DATEPART(yy, ModifiedDate))
Link: https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15
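Applied to the question's tables, a rough sketch of that windowed-SUM idea (joins and filters copied from the original query, column choices assumed, not tested) could be:
SELECT Location.LocationName, ItemDetail.DOB, ItemGroup.ItemGroupName,
       ItemDetail.Quantity, ItemDetail.NetPrice,
       SUM(ItemDetail.Quantity) OVER (PARTITION BY ItemGroup.ItemGroupName) AS TotalQuantity,
       SUM(ItemDetail.NetPrice) OVER (PARTITION BY ItemGroup.ItemGroupName) AS TotalNetPrice
FROM ItemDetail
JOIN ItemGroupMember ON ItemDetail.ItemID = ItemGroupMember.ItemID
JOIN ItemGroup ON ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
JOIN Location ON ItemDetail.LocationID = Location.LocationID
WHERE ItemGroup.ItemGroupID = '78' AND ItemDetail.DOB = '11/20/2019'
This keeps every detail row but shows the per-group totals alongside it.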
We can't be certain without seeing sample data. But I suspect you need to remove some fields from your GROUP BY clause -- probably Item.ItemName and ItemDetail.NetPrice.
Generally, you won't GROUP BY a column that you are applying an aggregate function to in the SELECT -- as in SUM(ItemDetail.NetPrice). And it is not very common, in my experience, to GROUP BY columns that aren't included in the SELECT list - as you are doing with Item.ItemName.
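As a concrete sketch of that suggestion (simply the original query with Item.ItemName and ItemDetail.NetPrice dropped from the GROUP BY; not verified against real data):
select Location.LocationName, ItemDetail.DOB, ItemGroup.ItemGroupName,
       SUM(ItemDetail.Quantity) as "Quantity", SUM(ItemDetail.NetPrice) as "NetPrice"
from ItemDetail
Join ItemGroupMember on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location on ItemDetail.LocationID = Location.LocationID
Inner Join Item on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and ItemDetail.DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, ItemGroup.ItemGroupName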
I think you need to go back to basics and read about what GROUP BY does.
First of all welcome to the overflow...
Second: The answer is going to be "It depends"
Any time you aggregate data you will need to GROUP BY the other fields in the query, and you do have that in your query. The gotcha is what happens when the data is spread across multiple locations.
My suggestion is to rethink your problem and see if you really need these other fields in the query. This will depend on what the person using the data really wants to know.
Do they need to know how many of item X there are, or do they really need to know that item X is spread out over three sites?
You might find you are better off with two smaller queries.

RunningDifference(x) for multiple x values

My table is tbl_data(event_time, monitor_id, type, event_date, status).
select status, sum(runningDifference(event_time)) as delta
from (
    SELECT status, event_date, event_time
    FROM tbl_data
    WHERE event_date >= '2018-05-01' AND monitor_id = 3
    ORDER BY event_time ASC
)
group by status
Result will be
status delta
1 4665465
2 965
This query gives me the right answer for a single monitor_id. Now I need it for multiple monitor_ids.
How can I achieve that in a single query?
Usually this is achieved with conditional expressions, like SELECT ..., if(monitor_id = 1, status, NULL) AS status1, ..., and then you apply your aggregate function, which, as you might know, skips NULL values. But I did some testing and it turns out that, because of ClickHouse internals, runningDifference() can't distinguish columns originating from the same source. At the same time it distinguishes columns that came from different sources just fine. It is a bug.
I opened an issue on Github: https://github.com/yandex/ClickHouse/issues/2590
UPDATE: Devs reacted incredibly fast and with the latest source from master you can get what you want with the strategy I described. See the issue for code example.

Should the database contain some business logic?

Let's say I have items A, B, and C in Table1.
They all have attribute f1. However, A and B have f2, which does not apply to C.
Table1 would be designed as:
itemName f1 f2
------------------------------------
A 100 50
A 43 90
B 66 10
C 23
There would be another table, Table2, containing all the possible values of f2:
itemName f2(possible value)
------------------------------------
A 50
A 90
A 77
B 10
Let's say now I want to add a record with the highest value of f2 into Table1, depending on the itemName. Things work fine for A and B. But in the case of C, when I loop through Table2, since there is no record for C in Table2, I cannot distinguish whether the table is corrupted or whether C simply does not have attribute f2.
The only two ways I can think of to solve this issue are:
1. Adding a constraint in the code, like:
if (itemName == C)
    do not search Table2
else
    search Table2
    if (no record)
        return "Corrupted Table"
Or
2. Adding another bool field "having_f2" in Table1 to help identify that f2 does not apply to C.
The above is just an example of where to put such business logic constraints: in the DB or in the code.
Can you give me more opinions on the tradeoff between the above two approaches? In other words, which one makes more sense?
Since this is basically field validation ("can MyModel have property f2 set to NULL (non-existent)"), I would say you must do that in a validator of your model.
Only if that is impossible should you add extra columns to the model tables.
The rule I use is the following: the database is used to store model data. You should try to store nothing else but data, if possible. In your case has_f2 is not data, but a business rule.
Of course, there are exceptions to this rule. For example, sometimes business logic must be controlled by the user and in this case it is perfectly ok to store it in the database.
Regarding your second proposal: you can typically also just query for a NULL value in the table, which would be the same as adding and setting a boolean attribute (and would be better with regard to redundancy). This would also be the way to detect whether the table is "corrupt". However, you can also start your query by collecting all "itemName" entries from table2, possibly building an intersection with table1 and inserting the cases of interest into table1:
1.) Intersect the "itemName" from table1 and table2 => table3
2.) Join the table3 and table2 on "itemName", "f2" => insert each tuple into table1
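A minimal SQL sketch of those two steps, combined with the "highest f2" requirement from the question (table and column names are taken from the question; f1 is left NULL just to keep the sketch short):
INSERT INTO Table1 (itemName, f2)
SELECT t2.itemName, MAX(t2.f2)                         -- pick the highest f2 per item
FROM Table2 t2
WHERE t2.itemName IN (SELECT itemName FROM Table1)     -- step 1: intersect the itemName sets
GROUP BY t2.itemName;
Items such as C, which have no rows in Table2, simply produce no insert here; detecting a "corrupted" table would still require the NULL check described above.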
Alternatively, you can also split table1 into two tables, { "itemName", "f1" } and { "itemName", "f2" }, which would eliminate your problem.

GROUP_CONCAT and DISTINCT are great, but how do I get rid of these duplicates I still have?

I have a MySQL table set up like so:
id uid keywords
-- --- ---
1 20 corporate
2 20 corporate,business,strategy
3 20 corporate,bowser
4 20 flowers
5 20 battleship,corporate,dungeon
What I WANT my output to look like is:
20 corporate,business,strategy,bowser,flowers,battleship,dungeon
but the closest I've gotten is:
SELECT DISTINCT uid, GROUP_CONCAT(DISTINCT keywords ORDER BY keywords DESC) AS keywords
FROM mytable
WHERE uid !=0
GROUP BY uid
which outputs:
20 corporate,corporate,business,strategy,corporate,bowser,flowers,battleship,corporate,dungeon
Does anyone have a solution? Thanks a ton in advance!
What you're doing isn't possible with pure SQL the way you have your data structured.
No SQL implementation is going to look at "Corporate" and "Corporate, Business" and see them as equal strings. Therefore, DISTINCT won't work.
If you can control the database, the first thing I would do is change the data setup to:
id uid keyword <- note: keyword, not keywords - ONE value in this column, not a comma-delimited list
1 20 corporate
2 20 corporate
2 20 business
2 20 strategy
Better yet would be
id uid keywordId
1 20 1
2 20 1
2 20 2
2 20 3
with a separate table for keywords
KeywordID KeywordText
1 Corporate
2 Business
Otherwise you'll need to massage the data in code.
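With the keywords split out one per row as above (say the table is called mytable_keywords, a made-up name, with columns uid and keyword), the original idea then works as intended, for example:
SELECT uid, GROUP_CONCAT(DISTINCT keyword ORDER BY keyword) AS keywords
FROM mytable_keywords
WHERE uid != 0
GROUP BY uid;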
Mmm, your keywords need to be in their own table (one record per keyword). Then you'll be able to do it, because the keywords will then GROUP properly.
Not sure if MySQL has this, but SQL Server has RANK() OVER (PARTITION BY ...) that you can use to assign each result a rank... doing so would allow you to select only those with rank 1 and discard the rest.
You have two options as I see it.
Option 1:
Change the way you store your data (keywords in their own table, join the existing table with the keywords table using a many-to-many relationship). This will allow you to use DISTINCT. DISTINCT doesn't work currently because the query sees "corporate" and "corporate,business,strategy" as two different values.
Option 2:
Write some 'interesting' SQL to split up the keywords strings. I don't know what the limits are in MySQL, but SQL in general is not designed for this.
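For completeness, here is one hedged sketch of that 'interesting' SQL, assuming at most 5 comma-separated keywords per row (the derived numbers table can be extended if needed); it splits each list with SUBSTRING_INDEX and then lets GROUP_CONCAT(DISTINCT ...) de-duplicate:
SELECT uid,
       GROUP_CONCAT(DISTINCT keyword ORDER BY keyword) AS keywords
FROM (
    SELECT t.uid,
           SUBSTRING_INDEX(SUBSTRING_INDEX(t.keywords, ',', n.n), ',', -1) AS keyword
    FROM mytable t
    JOIN (SELECT 1 AS n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) n
      ON n.n <= 1 + LENGTH(t.keywords) - LENGTH(REPLACE(t.keywords, ',', ''))
    WHERE t.uid != 0
) split
GROUP BY uid;
Restructuring the data (Option 1) is still the cleaner long-term fix.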
