How to count the number of words in a string - snowflake-cloud-data-platform

How do I count the number of words in a string with Snowflake? I can't seem to find anything on Google directly.

Try doing this. It assumes that a space separates every word:
SELECT ARRAY_SIZE(SPLIT(string,' '));

This UDF will handle punctuation, multiple spaces between words, line breaks, etc. It's not completely bulletproof, but should work for most use cases.
create or replace function WORD_COUNT(SENTENCE string)
returns number
language sql
as
$$
array_size(split(trim(regexp_replace(regexp_replace(SENTENCE, '[^A-Za-z0-9]', ' '), '[ ]{2,}', ' ')), ' '))
$$;
select word_count('The quick brown--fox jumps over.the;lazy?dog.');
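To sanity-check what the nested calls in the UDF do, here is a rough Python mirror of the same three steps. It is only a sketch for reasoning about the logic outside Snowflake: the function name word_count is reused for illustration, and Python's re module is assumed to behave like Snowflake's regex engine for these two simple patterns.

```python
import re


def word_count(sentence: str) -> int:
    # Step 1: replace every non-alphanumeric character with a space.
    cleaned = re.sub(r"[^A-Za-z0-9]", " ", sentence)
    # Step 2: collapse runs of two or more spaces into a single space.
    cleaned = re.sub(r"[ ]{2,}", " ", cleaned)
    # Step 3: trim the ends, split on a single space, count the pieces.
    return len(cleaned.strip(" ").split(" "))


print(word_count("The quick brown--fox jumps over.the;lazy?dog."))  # 9
print(word_count("don't stop"))  # 3, the apostrophe splits "don't" in two
```

The second call shows one of the "not bulletproof" cases: any character outside A-Z, a-z and 0-9 is treated as a word separator, so contractions and accented words are counted as several words.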

Related

splitting text into array or splitting an array into more arrays

I am quite new to working with arrays in snowflake (and dbt).
I am splitting out vendor names that I have when they are filled in as join names in our front end. Now, I get them split (see the example below), but I would like to know if it is possible to split an already split array or another function I can use to split the text.
In the example below, the first three records are split correctly, but the following three are not, and this is where I would appreciate input.
The simple case statement I am using at the moment is:
,case
when upper(vendor_name) like '% AND %' then split(upper(vendor_name), ' AND ')
when vendor_name like '%&%' then split(upper(vendor_name), ' & ')
else array_construct(upper(vendor_name))
end as arr_vendor_name
You can do this in two steps:
Replace " & ", " AND ", and ", " with ",".
Then split on ",".
split(
replace(replace(replace(vendor_name, ' AND ', ','), ' & ', ','), ', ', ',')
, ','
)
Bonus: You won't need the case anymore, and the array_construct goes away too.
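The replace-then-split idea can be tried out quickly in Python. This is just a sketch of the same logic, not Snowflake code; the function name and the sample vendor string are made up for illustration.

```python
from typing import List


def split_vendor_names(vendor_name: str) -> List[str]:
    # Normalise every supported separator to a plain comma,
    # in the same order as the nested replace() calls.
    normalized = (
        vendor_name.replace(" AND ", ",")
        .replace(" & ", ",")
        .replace(", ", ",")
    )
    # A single split now handles all separator styles at once.
    return normalized.split(",")


print(split_vendor_names("ACME AND BOLT & CO, DELTA"))
# ['ACME', 'BOLT', 'CO', 'DELTA']
print(split_vendor_names("ACME"))
# ['ACME']
```

Note that the replacements are case-sensitive, so " and " in lower case is not treated as a separator here. Uppercasing the name first, as the original case statement does, avoids that.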

How to replace string in "select" statement

I need to add a comma after every six digits, but I don't know the string's length and I can't use loops.
Thanks in advance.
I've tried the DB2 REGEXP_REPLACE function, but it doesn't recognize my column as a string.
For example, I need to replace "123456123456" with "123456, 123456".
Try this:
select rtrim(xmlcast(xmlquery('fn:replace($s, "([0-9]{6})", "$1, ")' passing str as "s") as varchar(4000)), ', ')
from table(values ('123456123456')) t(str);
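The pattern and the final trim can be checked outside the database. The Python sketch below mirrors the two steps (regex replace, then strip the trailing ", "); the function name is hypothetical and it assumes the column holds only digits.

```python
import re


def group_by_six(digits: str) -> str:
    # Append ", " after every complete run of six digits.
    with_commas = re.sub(r"([0-9]{6})", r"\1, ", digits)
    # Drop the separator left over after the last group.
    return with_commas.rstrip(", ")


print(group_by_six("123456123456"))  # 123456, 123456
print(group_by_six("1234561"))       # 123456, 1
```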

How to use IN condition in SQL Server when comparing Varchars with trailing spaces?

Here is a sample of the issue:
SELECT 'True'
WHERE 'Hello ' IN ('Hello', 'Goodbye', 'Hello ')
Currently this returns 'True' because SQL Server ignores trailing spaces when comparing VARCHAR values. A related question solves this by using LIKE, but that won't work with an IN condition.
How can I ensure that the comparison takes into account the trailing spaces, specifically when using the IN condition?
Edit:
My list of acceptable values can include items with trailing spaces. Looking to compare exact values essentially (i.e 'Hello ' won't match with 'Hello ')
Assuming that your list of acceptable values does not have trailing spaces, perhaps you could use:
SELECT 'True'
WHERE 'Hello ' IN ('Hello', 'Goodbye') AND 'Hello ' NOT LIKE '% '
You could add a non-space character to the end of your search terms:
DECLARE @Terminator char(1) = '|';
SELECT 'True'
WHERE 'Hello ' + @Terminator IN ('Hello' + @Terminator , 'Goodbye' + @Terminator)
This forces the comparison to take the trailing spaces into account while keeping everything sargable. (I assume you want to use columns on either the left or the right side of the IN operator.)
I can think of this solution off the top of my head:
SELECT 'True'
WHERE reverse('Hello ') IN (reverse('Hello'), reverse('Goodbye'))
Reversing the strings turns the trailing spaces into leading spaces, which the comparison does not ignore.
That said, Zohar's terminator approach is the better-performing solution:
SELECT 'True'
WHERE 'Hello '+'|' IN ('Hello'+'|', 'Goodbye'+'|')
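To see why the terminator trick works, here is a small Python model of the comparison. It is only an illustration under a stated assumption: sql_equal imitates the trailing-space rule by stripping trailing spaces before comparing, which is a simplification and not SQL Server's actual implementation. All function names are made up.

```python
TERMINATOR = "|"


def sql_equal(a: str, b: str) -> bool:
    # Assumption: model the VARCHAR comparison by ignoring trailing spaces.
    return a.rstrip(" ") == b.rstrip(" ")


def sql_in(value: str, candidates: list) -> bool:
    # IN is treated as a chain of equality comparisons.
    return any(sql_equal(value, c) for c in candidates)


def exact_in(value: str, candidates: list) -> bool:
    # With a terminator appended, the original trailing spaces are no
    # longer at the end of the string, so they are compared as-is.
    return sql_in(value + TERMINATOR, [c + TERMINATOR for c in candidates])


print(sql_in("Hello ", ["Hello", "Goodbye"]))              # True
print(exact_in("Hello ", ["Hello", "Goodbye"]))            # False
print(exact_in("Hello ", ["Hello", "Goodbye", "Hello "]))  # True
```

The last call covers the case from the question's edit: a list value that itself ends with a space still matches exactly.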

SQL Server: Trim Character + Integer

Trying to trim the "+1" from the beginning of phone numbers. For example, after running the query I'm pulling: +12223334444 but need 2223334444. I've tried several trim functions but get an error saying "The trim function requires 1 argument(s)".
Sample portion of query:
Select
Ef.Name EForm,
C.Id Contact_Id,
P.Firstname + ' ' + P.Lastname Agent_Name,
P.Username Username,
C.Duration/1000 Call_Duration,
T.Name Team,
row_number()over(partition by c.id order by q2.text) Rank,
rtrim(c.ani,10) calling_number,
rtrim(c.dnis,10) called_number,
Thank you!
Phone numbers are not integers. Never treat them as such, or their leading zeroes may be lost.
The trim functions used here take a single argument and remove only space characters, nothing else.
You can TRIM(' Hello World ') and the result will be 'Hello World'.
If you want to remove the '+' character, you need to use REPLACE.
I think this resolves your initial thought process, but I would probably REPLACE the +1 like you found out.
RIGHT(c.ani,10) calling_number, RIGHT(c.dnis,10) called_number

Check if a string contains a substring in SQL Server 2005, using a stored procedure

I have a string, @mainString = 'CATCH ME IF YOU CAN'. I want to check whether the word ME is inside @mainString.
How do I check if a string has a specific substring in SQL?
CHARINDEX() searches for a substring within a larger string and returns the position of the match, or 0 if no match is found:
if CHARINDEX('ME',@mainString) > 0
begin
--do something
end
Edit: or, from daniel's answer, if you want to find a whole word (and not a fragment of a longer word), your CHARINDEX call would look like:
CHARINDEX(' ME ',' ' + REPLACE(REPLACE(@mainString,',',' '),'.',' ') + ' ')
(Add more nested REPLACE() calls for any other punctuation that may occur.)
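The whole-word variant is easier to follow when run step by step. This Python sketch mirrors the same idea (pad with spaces, turn punctuation into spaces, search for the space-delimited word); the function name is hypothetical, and unlike CHARINDEX under a case-insensitive collation, Python's "in" is case-sensitive.

```python
def contains_word(main_string: str, word: str) -> bool:
    # Replace the punctuation we know about with spaces, then pad both ends
    # so a word at the very start or end is still surrounded by spaces.
    normalized = " " + main_string.replace(",", " ").replace(".", " ") + " "
    # Searching for " word " rejects matches inside longer words.
    return f" {word} " in normalized


print(contains_word("CATCH ME IF YOU CAN", "ME"))  # True
print(contains_word("CATCH SOME FISH", "ME"))      # False
print(contains_word("FOLLOW ME.", "ME"))           # True
```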
You can just use wildcards in the predicate (after IF, WHERE or ON):
@mainstring LIKE '%' + @substring + '%'
or in this specific case
' ' + @mainstring + ' ' LIKE '% ME[., ]%'
(Put the spaces in the quoted string if you're looking for the whole word, or leave them out if ME can be part of a bigger word).
