Way to prevent case statement from excluding values? - google-data-studio

I'm pretty new to the SQL world and I seem to have run into a roadblock. Multiple conditions seem to be a common question, but I couldn't find a thread on my current problem. I'm using Google Data Studio and have a dataset in Google Sheets that includes multiple terms, and I want to know how many times each term shows up. Here's an example of my dataset and my current statement:
Sample Dataset from the Column labeled "Answer"
mini
profanity
mini, profanity
mini, recorded
credit bureau
Suspicious
CASE
WHEN CONTAINS_TEXT(Answer,"Credit Bureau") THEN "Credit"
WHEN CONTAINS_TEXT(Answer,"Mini") THEN "Mini"
WHEN CONTAINS_TEXT(Answer,"Profanity") THEN "Profanity"
WHEN CONTAINS_TEXT(Answer,"Bankruptcy") THEN "Bank"
WHEN CONTAINS_TEXT(Answer,"Recorded") THEN "Recorded"
WHEN CONTAINS_TEXT(Answer,"Suspicious") THEN "SAR"
ELSE "Other"
END
This statement works perfectly for cells that contain ONLY one of these terms. The problem is that some cells in the dataset contain several terms. The statement counts only the first matching term and ignores the others, giving me inaccurate totals. For example, in the mini dataset I provided, even though profanity appears twice, it would only be counted once. Does anyone know any potential workarounds or suggestions? Any help would be appreciated.

Count them in separate columns for each category:
credit: case when contains_text(answer, 'credit bureau') then 1 else 0 end
mini: case when contains_text(answer, 'mini') then 1 else 0 end
profanity: case when contains_text(answer, 'profanity') then 1 else 0 end
bank: case when contains_text(answer, 'bankruptcy') then 1 else 0 end
recorded: case when contains_text(answer, 'recorded') then 1 else 0 end
sar: case when contains_text(answer, 'suspicious') then 1 else 0 end
other: case when not contains_text(answer, 'credit bureau')
    and not contains_text(answer, 'mini')
    and not contains_text(answer, 'profanity')
    and not contains_text(answer, 'bankruptcy')
    and not contains_text(answer, 'recorded')
    and not contains_text(answer, 'suspicious') then 1 else 0 end
Then sum each of those columns to get independent totals for each type.
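To illustrate the one-flag-per-term idea outside Data Studio, here is a minimal sketch in Python using SQLite as a stand-in. The table name `responses` and the helper `count_terms` are made up for the example, and `instr(lower(...))` is only an approximation of `CONTAINS_TEXT`, so treat it as a demonstration of the counting logic rather than a Data Studio formula.

```python
import sqlite3

# Sample rows from the question's "Answer" column.
ANSWERS = ["mini", "profanity", "mini, profanity", "mini, recorded",
           "credit bureau", "Suspicious"]

# (label, search term) pairs taken from the original CASE statement.
TERMS = [("credit", "credit bureau"), ("mini", "mini"),
         ("profanity", "profanity"), ("bank", "bankruptcy"),
         ("recorded", "recorded"), ("sar", "suspicious")]


def count_terms(answers):
    """Return independent totals per term, plus rows matching no term."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (answer TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO responses VALUES (?)", [(a,) for a in answers])

    # One 0/1 flag per term; each SUM is independent of the others,
    # so a cell with two terms contributes to two totals.
    flags = ["SUM(CASE WHEN instr(lower(answer), ?) > 0 THEN 1 ELSE 0 END)"] * len(TERMS)
    # "other" counts rows in which none of the terms appears.
    none_match = " AND ".join(["instr(lower(answer), ?) = 0"] * len(TERMS))
    flags.append(f"SUM(CASE WHEN {none_match} THEN 1 ELSE 0 END)")

    params = [term for _, term in TERMS] * 2
    row = conn.execute(f"SELECT {', '.join(flags)} FROM responses", params).fetchone()
    conn.close()
    return dict(zip([label for label, _ in TERMS] + ["other"], row))


totals = count_terms(ANSWERS)
print(totals)
```

With the sample data, "mini" is counted three times and "profanity" twice, which is what the single CASE statement could not do.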

Related

Dealing with errors while parsing strings

I'm tasked with pulling relevant data out of a field which is essentially free text. I have been able to get the information I need 98% of the time by looking for keywords and using CASE statements to break the field down into 5 different fields.
My issue is I can't get around the last 2% because the errors don't follow any logical order - they are mostly misspellings.
I could bypass the field with a TRY CATCH, but I don't like giving up 4 good pieces of data when the routine is choking on one.
Is there any way to handle blanket errors within a CASE statement, or is there another option?
Current code; the 'b' next to the commented-out section marks where it's choking right now:
CASE WHEN @Location = 0 THEN
    CASE WHEN @Duration = 0 THEN
        CASE WHEN @Timing = 0 THEN
            SUBSTRING(@Comment,@Begin, @Context-@Begin)
        ELSE
            SUBSTRING(@Comment,@Begin, @Timing-@Begin)
        END
    ELSE SUBSTRING(@Comment,@Begin, @Duration-@Begin)
    END
ELSE SUBSTRING(@Comment,@Begin, @Location-@Begin)
END AS Complaint
,CASE WHEN @Location = 0 THEN ''
ELSE
    CASE WHEN @Duration = 0 THEN
        CASE WHEN @Timing = 0 THEN SUBSTRING(@Comment,@Location+10, (@CntBegin-11))
        ELSE SUBSTRING(@Comment,@Location+10, @Timing-(@Location+10))
        END
    ELSE SUBSTRING(@Comment,@Location+10, @Duration-(@Location+10))
    END
END AS Location
,CASE WHEN @Timing = 0 THEN ''
ELSE
    CASE WHEN @CntBegin = 0 THEN
        SUBSTRING(@Comment,@Timing+@TimingEnd, (@Location+@Context)-(@Timing+@TimingEnd))
    ELSE
        'b'--SUBSTRING(@Comment,@Timing+@TimingEnd, (@Location+@CntBegin-1)-(@Timing+@TimingEnd))
    END
END AS Timing
It chokes on this statement, which has a comma in an odd spot. I usually have to reference the comma for @CntBegin, but in this case it makes (@Location+@CntBegin-1) smaller than (@Timing+@TimingEnd):
'Pt also presents with/for mild check MGP/MGD located in OU, since 12/2015 ? Stability.'
Please note that I'm not necessarily trying to fix this particular error; I'm looking for a way to handle any error that comes up, since who knows what someone is going to type. I'd like to just display 'ERR' in that particular field when the code runs into something it can't handle. I just don't want the routine to die.
I'm assuming your error is due to the length parameter in SUBSTRING being less than 0. I always alias my parameters using CROSS APPLY and then validate the input before calling SUBSTRING(). Something like this should work:
SELECT
    CASE WHEN CA.StringLen > 0 /*Ensure valid length*/
        THEN SUBSTRING(@Comment,@Timing+@TimingEnd,CA.StringLen)
        ELSE 'Error'
    END
FROM YourTable
CROSS APPLY (SELECT StringLen = (@Location+@CntBegin-1)-(@Timing+@TimingEnd)) AS CA
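The same "compute the length once, check it, then cut" pattern can be tried out with a small Python/SQLite sketch. The table `notes`, its columns and the helper `extract` are hypothetical, and a subquery stands in for CROSS APPLY. One caveat: SQLite's `substr` does not raise an error on a negative length the way SQL Server's SUBSTRING does, so this only demonstrates the guard, not the failure.

```python
import sqlite3


def extract(rows):
    """rows: (comment, begin_pos, end_pos) tuples; positions are 1-based."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes (comment TEXT, begin_pos INTEGER, end_pos INTEGER)"
    )
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT CASE WHEN span_len > 0                 -- ensure a valid length
                    THEN substr(comment, begin_pos, span_len)
                    ELSE 'ERR'                        -- placeholder instead of failing
               END
        FROM (SELECT rowid AS id, comment, begin_pos,
                     end_pos - begin_pos AS span_len  -- computed once, reused above
              FROM notes)
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return [value for (value,) in result]


text = "located in OU, since 12/2015"
# First row: markers in the expected order. Second row: end marker before start.
fields = extract([(text, 12, 14), (text, 16, 14)])
print(fields)
```

The row with the markers in the wrong order comes back as 'ERR' while the other row is still parsed, which is the behaviour the question asks for.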

Average of a player

I have a table that contains stats for a certain cricket player.
It has columns for dates, oppositions, Runs, Balls, Dismissals, Match_Number.
I want to write a query (SQL Server) that finds the batting average grouped by opposition: all runs are summed, the innings count includes every innings except DNB, and the dismissal count excludes "Not Out", "Retired Hurt" and "DNB".
Note: DNB means Did Not Bat.
My current query doesn't produce the innings counts needed to calculate the average.
So the problem is that I can't get a count of innings for a single entity under two different sets of conditions:
Without DNB
Without DNB, Not Out, Retired Hurt.
Please suggest.
You can put a CASE expression within an aggregate to exclude certain rows from a count/sum/average etc. So you could use something like this:
SELECT a.Opposition,
Matches = COUNT(*),
Innings = COUNT(CASE WHEN a.Dismissal <> 'DNB' THEN 1 END),
Runs = SUM(a.Runs),
Average = 1.0 * SUM(a.Runs) / NULLIF(COUNT(CASE WHEN a.Dismissal NOT IN ('DNB', 'Not Out', 'Retired Hurt') THEN 1 END), 0)
FROM dbo.SRTundlkarODI AS a
GROUP BY a.Opposition;
N.B. I have wrapped the COUNT for the average in NULLIF(<exp>, 0) so that, should the batsman never have been dismissed, you avoid a divide-by-zero error. Multiplying by 1.0 avoids integer division when Runs is an integer column.
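For anyone who wants to see the two different counts side by side, here is a rough Python/SQLite sketch of the same conditional aggregation. The table `innings`, the helper `batting_summary` and the rows are invented purely for illustration (they are not real match data), and the dismissal labels follow the question's wording.

```python
import sqlite3

# Made-up rows for illustration: (opposition, runs, dismissal)
ROWS = [
    ("Australia", 50, "Caught"),
    ("Australia", 30, "Not Out"),
    ("Australia", 20, "Bowled"),
    ("Australia", 0, "DNB"),
    ("Kenya", 40, "Not Out"),
    ("Kenya", 0, "DNB"),
]


def batting_summary(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE innings (opposition TEXT, runs INTEGER, dismissal TEXT)")
    conn.executemany("INSERT INTO innings VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT opposition,
               COUNT(*)                                          AS matches,
               -- COUNT skips NULLs, so rows failing the CASE are not counted
               COUNT(CASE WHEN dismissal <> 'DNB' THEN 1 END)    AS innings,
               SUM(runs)                                         AS runs,
               -- NULLIF turns a zero dismissal count into NULL (no divide by zero)
               1.0 * SUM(runs) / NULLIF(
                   COUNT(CASE WHEN dismissal NOT IN ('DNB', 'Not Out', 'Retired Hurt')
                              THEN 1 END), 0)                    AS average
        FROM innings
        GROUP BY opposition
        ORDER BY opposition
        """
    ).fetchall()
    conn.close()
    return result


summary = batting_summary(ROWS)
for line in summary:
    print(line)
```

In this toy data the Kenya row has no dismissals, so its average comes back as NULL (None in Python) rather than raising an error.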

SQLite json_array_length

Trying to run a query on a column containing JSON data.
When I run the following:
SELECT json_array_length(threads.participants)
FROM threads
I get results of 2 and 119 (most records hold a JSON array of 2 elements; just one holds an array of 119).
I tried another query using a CASE expression, as follows:
SELECT
CASE threads.participants
WHEN json_array_length(threads.participants) = 2 THEN "2 Participants"
ELSE "More than 2 Participants"
END AS "Number of Participants"
FROM threads
I get all NULL results for the second query. I expected to see the same number of results as the first query, just changed to "2 Participants" and "More than 2 Participants". What am I missing here?
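The syntax you use in the CASE expression is wrong. It mixes the simple form (CASE x WHEN value ...) with the searched form (CASE WHEN condition ...), so threads.participants itself is compared with the result of the condition instead of the condition being tested.
Try this: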
SELECT CASE json_array_length(participants)
WHEN 2 THEN '2 Participants'
ELSE 'More than 2 Participants'
END AS "Number of Participants"
FROM threads
Just to be on the safe side, you should include all possible cases (even if they don't currently exist):
SELECT CASE json_array_length(participants)
WHEN 0 THEN 'No Participants'
WHEN 1 THEN '1 Participant'
WHEN 2 THEN '2 Participants'
ELSE 'More than 2 Participants'
END AS "Number of Participants"
FROM threads
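Here is a small Python sketch that runs both valid forms of CASE next to each other, in case it helps to see that they give the same labels. The helper `label_threads` and the sample participant lists are made up. The sketch also registers a Python version of `json_array_length` as a fallback, which is an assumption on my part so that it runs on SQLite builds without JSON support; on most builds it simply shadows the built-in function.

```python
import json
import sqlite3


def label_threads(participant_lists):
    conn = sqlite3.connect(":memory:")
    # Fallback (assumption): Python implementation for builds lacking JSON support.
    conn.create_function("json_array_length", 1, lambda s: len(json.loads(s)))
    conn.execute("CREATE TABLE threads (participants TEXT)")
    conn.executemany(
        "INSERT INTO threads VALUES (?)",
        [(json.dumps(p),) for p in participant_lists],
    )
    rows = conn.execute(
        """
        SELECT
            -- searched form: each WHEN holds a full condition
            CASE WHEN json_array_length(participants) = 2
                 THEN '2 Participants'
                 ELSE 'More than 2 Participants'
            END,
            -- simple form: the expression after CASE is compared with each WHEN value
            CASE json_array_length(participants)
                 WHEN 2 THEN '2 Participants'
                 ELSE 'More than 2 Participants'
            END
        FROM threads
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows


labels = label_threads([["ann", "bob"], ["ann", "bob", "cy"]])
print(labels)
```

Either form works; the thing to avoid is putting an expression after CASE and a full condition after WHEN in the same expression.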

How to count the number of cells with a certain value in Google Data Studio

I have a Google sheet with customers that came from different sources and bought different products. Table looks like this:
utm_campaign user_product
campaign_1 1st_product
campaign_2 2nd_product
campaign_3 1st_product
campaign_1 2nd_product
campaign_2 1st_product
I want to count the number of cells in the column "user_product" for each distinct value. What formula should I use in Data Studio to transform it into this:
utm_campaign 1st_product 2nd_product
campaign_1 1 1
campaign_2 1 1
campaign_3 1 0
I have tried this formula
SUM(
CASE
WHEN user_product = "1st_product"
THEN "1"
ELSE "0"
END
)
but something went wrong
Field name contains invalid table alias: t0._339410717_
That seems right, except that you are turning the numbers into strings by wrapping them in quotation marks, so they can't be added up:
SUM(
CASE
WHEN user_product = "1st_product"
THEN 1
ELSE 0
END
)
Alternatively, you could put the two dimensions into a pivot table with a count to get the result you want.
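To check the numeric SUM(CASE ...) approach against the expected table, here is a quick Python/SQLite sketch using the rows from the question. The table name `customers` and the helper `pivot_counts` are made up, and SQLite stands in for Data Studio, so it only verifies the logic, not the Data Studio formula syntax.

```python
import sqlite3

# Rows from the question: (utm_campaign, user_product)
ROWS = [
    ("campaign_1", "1st_product"),
    ("campaign_2", "2nd_product"),
    ("campaign_3", "1st_product"),
    ("campaign_1", "2nd_product"),
    ("campaign_2", "1st_product"),
]


def pivot_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (utm_campaign TEXT, user_product TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT utm_campaign,
               -- numeric 1/0 (not '1'/'0') so the flags can be summed
               SUM(CASE WHEN user_product = '1st_product' THEN 1 ELSE 0 END),
               SUM(CASE WHEN user_product = '2nd_product' THEN 1 ELSE 0 END)
        FROM customers
        GROUP BY utm_campaign
        ORDER BY utm_campaign
        """
    ).fetchall()
    conn.close()
    return result


table = pivot_counts(ROWS)
for line in table:
    print(line)
```

The result matches the target table in the question: one column of counts per product, one row per campaign.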

I'm trying to determine how to do a multi level count

I am relatively new to SQL, and I'm probably overlooking a simple answer.
For example, I have a table with the following columns
CustomerID
Hair_Color
Eye_Color
Skin_Color
Braces (Y/N)
I want 4 counts of unique CustomerIDs
I want a count of those that are blonde
Then I want a count of all those that have blue eyes, but if they were blonde, I don’t want them included
Then I want a count of all those that are Caucasian, but if they were blonde or had blue eyes, I don’t want them included
Then I want a count of all those with braces, but if they were blonde, had blue eyes, or were Caucasian, I don’t want them included.
CASE Statements to the rescue:
SELECT
SUM(CASE when hair_color = 'blonde' then 1 ELSE 0 END) as blondes,
SUM(CASE WHEN eye_color = 'blue' and hair_color <> 'blonde' THEN 1 ELSE 0 END) as 'Blue_eyed_non_blondes',
etc..
FROM table
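A possible way to spell out all four counts is sketched below in Python with SQLite. The table `customers`, the helper `tiered_counts` and the sample rows are invented, and the sketch assumes one row per CustomerID (so counting rows equals counting unique customers) and no NULLs in the colour columns.

```python
import sqlite3

# Invented rows: (customer_id, hair_color, eye_color, skin_color, braces)
ROWS = [
    (1, "blonde", "blue", "Caucasian", "Y"),
    (2, "brown", "blue", "Caucasian", "N"),
    (3, "black", "brown", "Caucasian", "Y"),
    (4, "black", "brown", "Asian", "Y"),
    (5, "red", "green", "Asian", "N"),
    (6, "blonde", "green", "Asian", "N"),
]


def tiered_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, hair_color TEXT,"
        " eye_color TEXT, skin_color TEXT, braces TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)
    # Each tier repeats the exclusions of all earlier tiers,
    # so a customer lands in at most one count.
    result = conn.execute(
        """
        SELECT
            SUM(CASE WHEN hair_color = 'blonde' THEN 1 ELSE 0 END),
            SUM(CASE WHEN eye_color = 'blue'
                      AND hair_color <> 'blonde' THEN 1 ELSE 0 END),
            SUM(CASE WHEN skin_color = 'Caucasian'
                      AND hair_color <> 'blonde'
                      AND eye_color <> 'blue' THEN 1 ELSE 0 END),
            SUM(CASE WHEN braces = 'Y'
                      AND hair_color <> 'blonde'
                      AND eye_color <> 'blue'
                      AND skin_color <> 'Caucasian' THEN 1 ELSE 0 END)
        FROM customers
        """
    ).fetchone()
    conn.close()
    return result


counts = tiered_counts(ROWS)
print(counts)  # (blondes, blue-eyed non-blondes, remaining Caucasian, remaining braces)
```

Customer 1 is blonde, blue-eyed, Caucasian and has braces, but is only counted in the first total, which is the exclusion behaviour described in the question.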
