SQL column update to solve data quality issue - sql-server

I am unable to figure out how to write 'smart code' for this.
In this case I would like the end result for the first two case to be:
product_cat_name
A_SEE
A_BEE
Business is rule is such that one product_cat_name can belong to only one group but due to data quality issues we sometimes have a product_cat_name belonging to 2 different groups. As a special case in such a situation we would like to append group to the product_cat_name so that product_cat_name becomes unique.
It sounds so simple yet I am cracking my head over this.
Any help much appreciated.

Something like this:
with names as (
select prod_cat_nm , prod_cat_nm+group as new_nm from (query that joins 3 tables together) as qry
join
(Select prod_cat_nm, count(distinct group)
from (query that joins 3 tables together) as x
group by
prod_cat_nm
having count(distinct group) > 1) dups
on dups.prod_cat_nm = qry.prod_cat_nm
)
SELECT prod_cat_nm, STRING_AGG(New_nm, '') WITHIN GROUP (ORDER BY New_Nm ASC) AS new_prod_cat_nm
FROM names
GROUP BY prod_cat_nm;
I've used the 2017 STRING_AGG() here as its shortest to write - But you could easily change this to use Recursions or XML path

It is simple if you break it down into small pieces.
You need to UPDATE the table obviously, and change the value of product_cat_name. That's easy.
The new value should be group + product_cat_name. That's easy.
You only want to do this when a product_cat_name is associated with more than one group. That's probably the tricky part, but it can also be broken down into small pieces that are easy.
You need to identify which product_cat_names have more than one group. That's easy. GROUP BY product_cat_name HAVING COUNT(DISTINCT Group) > 1.
Now you need to use that to limit your UPDATE to only those product_cat_names. That's easy. WHERE product_cat_name IN (Subquery using above logic to get PCNs that have more than one Group).
All easy steps. Put them together and you've got your solution.

Related

Nested Query in SOQL

Can someone please help me generate SOQL for the below query.
Getting this error - Nesting of semi join sub-selects is not supported
SELECT External_ID_vod__c, FirstName, LastName, Middle_vod__c
FROM Account
where Id IN (select Account_vod__c from EM_Attendee_vod__c WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod'))
Yes, with WHERE clauses you can go "down" the related list only 1 level, looks like you'd need 2 levels.
Couple ideas.
Can you do it in 2 steps? First select Account_vod__c from EM_Attendee_vod__c..., then pass the results to 2nd query.
See if you can eliminate a level by using rollup summary fields - but with this case might be tricky, rollup of all payments in 2020 might be not possible.
See if you can run a report that's close to what you need (even if it'd only grab these Account_vod__c) and you could use "reporting snapshot" - save the intermediate results of the report in a helper custom object. That could make it easier to query.
See if you can run the query by going "up". For example Account_vod__c is a real lookup/master-detail you could try with something like
select Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c
from EM_Attendee_vod__c
WHERE Id IN (SELECT Incurred_Expense_Attendee_vod__c
FROM Expense_Header_vod__c
where CALENDAR_YEAR(CreatedDate) > 2020 and Status_vod__c = 'Paid_in_Full_vod')
It's not perfect, it'd give you duplicate accounts if they have multiple attendees but it could work good enough. And in a pinch you could always try to deduplicate it with a GROUP BY Account_vod__r.External_ID_vod__c, Account_vod__r.FirstName, Account_vod__r.LastName, Account_vod__r.Middle_vod__c (although GROUP BY doesn't like to have more than 200 results... you could then cheat with LIMIT + OFFSET if you expect to have 2K accounts max)

Is there a way to sum an entire quantity in SQL with unique values

I am trying to get a total summation of both the ItemDetail.Quantity column and ItemDetail.NetPrice column. For sake of example, let's say the quantity that is listed is for each individual item is 5, 2, and 4 respectively. I am wondering if there is a way to display quantity as 11 for one single ItemGroup.ItemGroupName
The query I am using is listed below
select Location.LocationName, ItemDetail.DOB, SUM (ItemDetail.Quantity) as "Quantity",
ItemGroup.ItemGroupName, SUM (ItemDetail.NetPrice)
from ItemDetail
Join ItemGroupMember
on ItemDetail.ItemID = ItemGroupMember.ItemID
Join ItemGroup
on ItemGroupMember.ItemGroupID = ItemGroup.ItemGroupID
Join Location
on ItemDetail.LocationID = Location.LocationID
Inner Join Item
on ItemDetail.ItemID = Item.ItemID
where ItemGroup.ItemGroupID = '78' and DOB = '11/20/2019'
GROUP BY Location.LocationName, ItemDetail.DOB, Item.ItemName,
ItemDetail.NetPrice, ItemGroup.ItemGroupName
If you are using SQL Server 2012 , you can use the summation on partition to display the
details and aggregates in the same query.
SUM(SalesYTD) OVER (ORDER BY DATEPART(yy,ModifiedDate)),1)
Link :
https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15
We can't be certain without seeing sample data. But I suspect you need to remove some fields from you GROUP BY clause -- probably Item.ItemName and ItemDetail.NetPrice.
Generally, you won't GROUP BY a column that you are applying an aggregate function to in the SELECT -- as in SUM(ItemDetail.NetPrice). And it is not very common, in my experience, to GROUP BY columns that aren't included in the SELECT list - as you are doing with Item.ItemName.
I think you need to go back to basics and read about what GROUP BY does.
First of all welcome to the overflow...
Second: The answer is going to be "It depends"
Any time you aggregate data you will need to Group by the other fields in the query, and you have that in the query. The gotcha is what happens when data is spread across multiple locations.
My suggestion is to rethink your problem and see if you really need these other fields in the query. This will depend on what the person using the data really wants to know.
Do they need to know how many of item X there are, or do they really need to know that item X is spread out over three sites?
You might find you are better off with two smaller queries.

SalesForce SOQL highest number from size column

i'm new to SOQL and SF, so bear with me :)
I have Sales_Manager__c object tied to Rent__c via Master-Detail relationship. What i need is to get Manager with highest number of Rent deals for a current year. I figured out that the Rents__r.size column stores that info. The question is how can i gain access to the size column and retrieve highest number out of it?
Here is the picture of query i have
My SOQL code
SELECT (SELECT Id FROM Rents__r WHERE StartDate__c=THIS_YEAR) FROM Sales_Manager__c
Try other way around, by starting the query from Rents and then going "up". This will let you sort by the count and stop after say top 5 men (your way would give you list of all managers, some with rents, some without and then what, you'd have to manually go through the list and find the max).
SELECT Manager__c, Manager__r.Name, COUNT(Id)
FROM Rent__c
WHERE StartDate__c=THIS_YEAR
GROUP BY Manager__c, Manager__r.Name
ORDER BY COUNT(Id) DESC
LIMIT 10
(I'm not sure if Manager__c is the right name for your field, experiment a bit)

RunningDifference(x) for multiple x values

My Table tbl_data(event_time, monitor_id,type,event_date,status)
select status,sum(runningDifference(event_time)) as delta from (SELECT status,event_date,event_time FROM tbl_data WHERE event_date >= '2018-05-01' AND monitor_id =3 ORDER BY event_time ASC) group by status
Result will be
status delta
1 4665465
2 965
This query result give me right answer for single monitor_id, Now I required it for multiple monitor_id,
How can I achieve it in single/same query??
Usually this is achieved with conditional expressions. Like SELECT ...,if(monitor_id = 1, status, NULL) AS status1,... and then you do your aggregate function that, as you might know, skips NULL values. But I did some testing and it turns out that because of clickhouse internals runningDifference() can't distinguish columns originated from the same source. At the same time it distinguishes columns that came from different sources just fine. It is a bug.
I opened an issue on Github: https://github.com/yandex/ClickHouse/issues/2590
UPDATE: Devs reacted incredibly fast and with the latest source from master you can get what you want with the strategy I described. See the issue for code example.

Data model for unique sets

I am hunting for the best way to implement a data model for "recipes"
think like a pizza app where you can compose your own pizza. you select maybe 5 out of 100 ingredients and you select an amount for each. I need to check if I've "seen" that pizza combination before, assign ID if I have not, and retrieve ID if I have.
We have n ingredients.
A recipe is defined by a set of ingredients and a corresponding amount.
Could look like:
Ingr1 90
Ingr2 10
or
Ingr1 90
Ingr2 10
Ingr3 10
I want to store this in a structure where I give each unique recipe an ID, and so it's possible for me to query for the ID given the recipe data set.
I want a stored procedure that takes a data set as a parameter and returns an ID that is new if the recipe was unknown and existing if the recipe already exists.
I am looking for the most efficient way of doing this. My best idea so far is to either encode the recipe as a string (json) and use this as a unique constraint, or have a stored procedure that iterates through the recipe data set and constructs a n level deep if exists statement.
So, I'm confident I can solve the problem, but am looking for a beautiful method.
As far as I can see, you have entities Recipe and Ingredient and M:M relation between them. Data model can look like this (PK in bold):
Recipe (RecipeID, RecipeName)
Ingredient(IngredientID, IngredientName)
RecipeIngredients(RecipeID, IngredientID, Amount)
You can solve task of finding out if same recipe is already present in a database using query but this query wouldn't be simple. It is well-know problem, relational division. There are several approaches. One of the most popular is counting. If some recipe has same amount of ingredients as target one and all ingredients are the same, then they are equal. Such queries often involves data aggregations and perform not very fast on big amount of data.
You can help to solve this problem from application side and you are thinking in right direction. Represent recipe as a string, ordering values by IngredientID (to get same string even if ingredients were added in different order), converting Amount in some stable form (not to get 0.499999 instead of 0.5), calculate some hash out of string, and store this value in Recipe. In simple form hash is an integer value, so you can find doubles very fast.
So it is your call. Every approach has it's own issues. Heavy query in first case and hassle to keep hash in actual state in second case (and possible collisions too). I'd stick with first option until it works OK and start any optimizations only when they are unavoidable.
Query example (new recipe is in #tmp):
;with totals as
(
select RecipeID, count(*) totals
from RecipeIngredients
group by RecipeID
), matched_totals as
(
select i.RecipeID, count(*) matched_totals
from RecipeIngredients i
join #tmp t
on i.IngredientID = t.IngredientID
and i.Amount = t.Amount
group by i.RecipeID
)
select t.*
from totals t
join matched_totals m
on m.RecipeID = t.RecipeID
where
totals = matched_totals
and totals = (select count(*) from #tmp)
This solution is more elegant but much less intuitive:
select *
from Recipe r
where
not exists
( select 1
from RecipeIngredients ri
where
r.RecipeID = ri.RecipeID
and not exists
(select 1 from #tmp t where t.IngredientID = ri.IngredientID)
)

Resources