Add Comma to a String in Snowflake - snowflake-cloud-data-platform

I need to add a comma to a string using Snowflake
EX: Austin TX --> Austin, TX
I already tried (b2_loc ||', '|| (RIGHT(b2_loc, 2))) AS b2_loc which gave me Austin TX, TX

Sounds like a job for String Functions (Regular Expressions)
with cities(city) as (
select * from values
('Austin TX'),
('Nashville TN'),
('Minneapolis MN'),
('St. Louis MO')
)
select regexp_replace(city, '(.*) (.*)', '\\1, \\2') as city
from cities;
CITY
Austin, TX
Nashville, TN
Minneapolis, MN
St. Louis, MO
A good site for learning more about regular expressions is Regular Expressions Info.
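To see why the pattern splits at the last space rather than the first, here is a minimal sketch of the same replacement in Python instead of Snowflake SQL. The function name add_comma_before_state is made up for illustration; only the pattern and replacement come from the query above.

```python
import re

def add_comma_before_state(city: str) -> str:
    # The first (.*) is greedy, so it grabs as much as possible and the
    # literal space in the pattern ends up matching the LAST space.
    return re.sub(r"(.*) (.*)", r"\1, \2", city, count=1)

for city in ["Austin TX", "Nashville TN", "Minneapolis MN", "St. Louis MO"]:
    print(add_comma_before_state(city))
```

This is why 'St. Louis MO' keeps its first space untouched: only the space before the state code is consumed by the pattern.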

I would be inclined to use Dave's answer, but your answer can be fixed:
select
column1 as b2_loc
,(b2_loc ||', '|| (RIGHT(b2_loc, 2))) AS wrong
,trim(substring(b2_loc,0, length(b2_loc)-2)) ||', '|| (RIGHT(b2_loc, 2)) AS correct
from values
('Austin TX')
;
B2_LOC | WRONG | CORRECT
Austin TX | Austin TX, TX | Austin, TX
This happens because you take the whole input string and then append the comma and the last two characters. You don't want all of the input, just everything except the last two characters, with the trailing whitespace trimmed as well. Unfortunately, you cannot pass -2 as the SUBSTRING length to make it relative to the right side (as you can in some other languages), so LENGTH is needed too, followed by TRIM to remove the whitespace.
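The difference between the two expressions can be sketched in Python (not Snowflake SQL). The function names wrong and correct simply mirror the column aliases in the query above and are otherwise hypothetical.

```python
def wrong(loc: str) -> str:
    # Mirrors the original attempt: the WHOLE string, then the last two chars.
    return loc + ", " + loc[-2:]

def correct(loc: str) -> str:
    # Everything except the last two characters, trimmed, then the suffix.
    head = loc[: len(loc) - 2].strip()
    return head + ", " + loc[-2:]

print(wrong("Austin TX"))
print(correct("Austin TX"))
```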

Assuming state abbreviations are always 2 characters preceded by a single space, you could use INSERT:
set str='Austin TX';
select insert($str,length($str)-2,0,',');
Alternatively, you could reverse the string, insert your comma, and reverse it back:
select reverse(insert(reverse($str),4,0,','));
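To check the position arithmetic in both variants, here is a small Python sketch. sql_insert is a hypothetical helper that imitates a 1-based INSERT(base, pos, len, new); it is not a real library function and ignores NULL handling.

```python
def sql_insert(base: str, pos: int, length: int, new: str) -> str:
    """Hypothetical stand-in for INSERT(base, pos, len, new) with 1-based pos."""
    i = pos - 1  # convert the 1-based position to a 0-based index
    return base[:i] + new + base[i + length:]

s = "Austin TX"

# Variant 1: insert the comma at position length - 2 (the space before the state).
direct = sql_insert(s, len(s) - 2, 0, ",")

# Variant 2: reverse, insert at position 4 (after "XT "), reverse back.
via_reverse = sql_insert(s[::-1], 4, 0, ",")[::-1]

print(direct)
print(via_reverse)
```

Both variants depend on the stated assumption of a 2-character state code preceded by exactly one space.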

Related

Snowflake regexp & split

I have column values where the part of interest starts at an underscore (_) and ends at an underscore (_).
I am trying to see how to extract a value between 2 underscores (_). for example, "xxxx_Whats your number 23345_xxxxx".
I want to discard everything before and after underscore(_).
Any help is greatly appreciated.
Use REGEXP_SUBSTR with a capture group ( ) and the 'e' parameter to return sub-matches, selecting the first match. The pattern asks for an underscore, then any number of non-underscore characters, then another underscore.
select
column1,
regexp_substr(column1, '_([^_]*)_',1,1,'e')
from values
('xxxx_Whats your number 23345_xxxxx')
gives:
COLUMN1 | REGEXP_SUBSTR(COLUMN1, '_([^_]*)_',1,1,'E')
xxxx_Whats your number 23345_xxxxx | Whats your number 23345
Hmm, you mention discarding everything before and after the underscores, so if you want the result to include the underscores you will need to move them inside the capture group:
select
column1,
regexp_substr(column1, '_([^_]*)_',1,1,'e') as exclude_underscore,
regexp_substr(column1, '(_[^_]*_)',1,1,'e') as include_underscore
from values
('xxxx_Whats your number 23345_xxxxx'),
('has no first underscore_xxxxx'),
('xxx_has no last underscore'),
('nothing between__the underscores');
COLUMN1 | EXCLUDE_UNDERSCORE | INCLUDE_UNDERSCORE
xxxx_Whats your number 23345_xxxxx | Whats your number 23345 | _Whats your number 23345_
has no first underscore_xxxxx | null | null
xxx_has no last underscore | null | null
nothing between__the underscores | | __
You might also want to require at least 1 character between the underscores, in which case you should change the * to a + or a {n,}.
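The effect of switching * to + can be tried out with a short Python sketch (Python's re module rather than Snowflake's REGEXP_SUBSTR). The function name between_underscores and its allow_empty flag are invented for this illustration.

```python
import re
from typing import Optional

def between_underscores(text: str, allow_empty: bool = True) -> Optional[str]:
    # "*" accepts an empty value between the underscores, "+" requires
    # at least one character.
    quantifier = "*" if allow_empty else "+"
    pattern = "_([^_]" + quantifier + ")_"
    match = re.search(pattern, text)
    return match.group(1) if match else None

samples = [
    "xxxx_Whats your number 23345_xxxxx",
    "has no first underscore_xxxxx",
    "xxx_has no last underscore",
    "nothing between__the underscores",
]
for text in samples:
    print(repr(between_underscores(text)), repr(between_underscores(text, allow_empty=False)))
```

With allow_empty=False the last sample no longer matches, which is the behaviour the + quantifier is meant to give.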

String concatenation based of column length

I have telephone numbers like this in one table:
ID Telephone extention
------------------------------
1 9986323422 4
2 9992108 2222
3 9962718 241
The final result I want: take the number of digits in the extention column and replace that many trailing digits of the Telephone column with the extention value.
I want my result to be:
ID Telephone extention result
-----------------------------------------
1 9986323422 4 9986323424
2 9992108 2222 9992222
3 9962718 241 9962241
I have 100k records like this. What is the best and quick way to achieve this? Thanks.
This may be a little too cute [1], but it is an alternative to the STUFF approaches:
SELECT ID,Telephone,Extension,
SUBSTRING(Telephone,1-LEN(Extension),LEN(Telephone)) + Extension as Result
It works because a negative start argument for SUBSTRING lets you truncate the end of the string by that amount.
[1] It avoids repeated calls to LEN() (though the optimizer should be able to avoid that duplication anyway) and avoids having to reverse the entire string, but it does come at a readability cost.
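The truncation trick is easier to follow with the window arithmetic written out. Below is a Python sketch, assuming SUBSTRING returns the characters at positions start through start + length - 1 and silently drops positions before 1; tsql_substring and replace_tail are hypothetical names, not part of any library.

```python
def tsql_substring(s: str, start: int, length: int) -> str:
    """Hypothetical stand-in for T-SQL SUBSTRING with a 1-based start."""
    end = start + length - 1   # last requested position (1-based, inclusive)
    first = max(start, 1)      # positions before 1 do not exist
    if end < first:
        return ""
    return s[first - 1:end]

def replace_tail(telephone: str, extension: str) -> str:
    # start = 1 - len(extension) shifts the window left, so the result
    # loses len(extension) characters from its right-hand end.
    kept = tsql_substring(telephone, 1 - len(extension), len(telephone))
    return kept + extension

for tel, ext in [("9986323422", "4"), ("9992108", "2222"), ("9962718", "241")]:
    print(tel, ext, replace_tail(tel, ext))
```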
You can use STUFF() together with some calculations with LEN():
DECLARE @dummyTable TABLE(ID INT,Telephone VARCHAR(100), extention VARCHAR(100));
INSERT INTO @dummyTable VALUES
(1,'9986323422','4')
,(2,'9992108','2222')
,(3,'9962718','241');
SELECT *
,STUFF(t.Telephone,LEN(t.Telephone)-LEN(t.extention)+1,LEN(t.extention),t.extention) AS result
FROM @dummyTable AS t;
You might have to add some validations to avoid errors (e.g. length of extension should be smaller than of phone number)
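As a rough illustration of why that validation matters, here is a Python sketch of the STUFF calculation. It assumes STUFF yields NULL (None here) when the start position is zero or negative, or lies beyond the end of the string; tsql_stuff and replace_tail_with_stuff are hypothetical helper names.

```python
from typing import Optional

def tsql_stuff(s: str, start: int, length: int, repl: str) -> Optional[str]:
    """Hypothetical stand-in for STUFF(s, start, length, repl), 1-based start."""
    # Assumed behaviour: an out-of-range start or a negative length gives NULL.
    if start <= 0 or start > len(s) or length < 0:
        return None
    i = start - 1
    return s[:i] + repl + s[i + length:]

def replace_tail_with_stuff(telephone: str, extension: str) -> Optional[str]:
    start = len(telephone) - len(extension) + 1
    return tsql_stuff(telephone, start, len(extension), extension)

print(replace_tail_with_stuff("9992108", "2222"))
print(replace_tail_with_stuff("123", "4567"))  # extension longer than the number
```

An extension longer than the phone number pushes the start position to zero or below, so such rows would need to be filtered or handled separately.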
In a similar way, use the reverse() function with stuff() to replace the ending digits of the Telephone value with the extention value:
select *, reverse(stuff(reverse(Telephone), 1, len(extention), reverse(extention)))
from table

SQL Server - Split string on last occuring number

I have following column (with fictional data):
Location
---------------
15630London
45680Edinburg
138739South Wales
This column contains both zip codes and city names. I want to split them into 2 separate columns.
So in this case, my output would be:
Zip | City
-------|---------
15630 | London
45680 | Edinburg
138739 | South Wales
I tried the zipcode with
LEFT(location,LEN(location)-CHARINDEX('L',location))
But I couldn't find out how to set the CHARINDEX to work on all letters.
Any suggestions / other ideas?
Here is one way using PATINDEX and some string functions
SELECT LEFT(Location, Patindex('%[a-z]%', Location) - 1),
Substring(Location, Patindex('%[a-z]%', Location), Len(Location))
FROM (VALUES ('15630London'),
('45680Edinburg'),
('138739South Wales'))tc(Location)
Note: the code above assumes that zip codes are always numeric, that letters first appear in the city name, and that a city is present in every row.
Detect the first non-numeric character and take everything before it from the left, then read from that point to the end:
select
left(Location, patindex('%[^0-9]%', Location) - 1),
substring(Location, patindex('%[^0-9]%', Location), len(Location))
from t
declare @string varchar(200)='15630London'
select substring(@string,1,patindex('%[a-z,A-Z]%',@string)-1),
substring(@string,patindex('%[a-z,A-Z]%',@string),len(@string))
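All three answers split at the first non-digit character. That logic can be sketched in Python with a regular expression instead of PATINDEX. split_zip_city is a made-up name, and returning the whole string as the zip when no letter is found is an assumption of this sketch, not something the SQL above does.

```python
import re
from typing import Tuple

def split_zip_city(location: str) -> Tuple[str, str]:
    match = re.search(r"[^0-9]", location)  # first non-digit character
    if match is None:
        # Assumption: treat an all-digit value as a zip with no city.
        return location, ""
    i = match.start()
    return location[:i], location[i:]

for loc in ["15630London", "45680Edinburg", "138739South Wales"]:
    print(split_zip_city(loc))
```

In the SQL versions, a row without any non-digit character makes PATINDEX return 0, so the LEFT(..., -1) call would need a similar guard.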

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightfully wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that wrapping the columns in functions prevents SQL Server from using any indexes on those columns that it could otherwise use.
Therefore, I would suggest adding another column to the imp table that holds the actual ReceiptId and is calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using substring with patindex, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9| ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9| ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9| ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
This returns NULL if there is no match.
A match is a substring consisting of A followed by 4 or 5 digits, separated by spaces from the rest of the string; it can be found at the start, middle or end of the string.
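The matching rule described here (A plus 4 or 5 digits, delimited by spaces or by the ends of the string) can be prototyped in Python before it goes into the import process. Note that this sketch swaps in a regular expression with lookarounds instead of PATINDEX, so it is not the query above; extract_receipt_id is a hypothetical name.

```python
import re
from typing import Optional

# "A" plus 4 or 5 digits, with whitespace or a string boundary on both sides.
RECEIPT_ID = re.compile(r"(?<!\S)A[0-9]{4,5}(?!\S)")

def extract_receipt_id(ref: str) -> Optional[str]:
    match = RECEIPT_ID.search(ref)
    return match.group(0) if match else None

refs = [
    "SHORT DESC A1234 - BZ-0987654321",
    "LONGER DESCRIPTION A9876 - BZ-123",
    "REALLY LONG DESCRIPTION A2345 - B",
    "REALLY REALLY LONG DESCRIPTION A23456",
]
for ref in refs:
    print(extract_receipt_id(ref))
```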
Try this, it will remove all characters before the A[number][number][number][number] and take the first 6 characters after that:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with =, trailing spaces are not evaluated.

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1 addr1
2 customer2 addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.
Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + cust_name, 60),
RIGHT(space(60) + cust_addr, 60)
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
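The pad-then-cut idea behind both queries can be sketched in Python. pad_left and pad_right are hypothetical names, and the sketch assumes a width greater than zero.

```python
def pad_right(value: str, width: int = 60) -> str:
    # Same idea as LEFT(value + SPACE(width), width): append spaces, keep the left part.
    return (value + " " * width)[:width]

def pad_left(value: str, width: int = 60) -> str:
    # Same idea as RIGHT(SPACE(width) + value, width): prepend spaces, keep the right part.
    # Assumes width > 0.
    return (" " * width + value)[-width:]

print(repr(pad_right("cust1", 10)))
print(repr(pad_left("cust1", 10)))
print(repr(pad_right("a very long customer name", 10)))
```

A value longer than the width is truncated rather than raising an error, which matches the "chopped off is ok" trade-off mentioned in the question.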
The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns 'Test' followed by 16 trailing spaces (20 characters in total).
This is based on Jim's answer,
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More Straight Forward
Slightly Cleaner (IMO)
Faster (Maybe?)
Easily Modified to either double pad for displaying in non-fixed width fonts or split padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
Based on KMier's answer, addresses the comment that this method poses a problem when the field to be padded is not a field, but the outcome of a (possibly complicated) function; the entire function has to be repeated.
Also, this allows for padding a field to the maximum length of its contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;
declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
result
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo
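The same padding-to-the-widest-value idea can be checked quickly in Python. The align function and its gap parameter are invented for this sketch. As in the query, the last column is left unpadded.

```python
from typing import List, Sequence

def align(rows: Sequence[Sequence[str]], gap: int = 3) -> List[str]:
    # Widest value per column, like MAX(LEN(col)) OVER () in the query.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        # Pad every column except the last one to its column width.
        cells = [cell.ljust(width) for cell, width in zip(row[:-1], widths)]
        cells.append(row[-1])
        lines.append((" " * gap).join(cells))
    return lines

rows = [
    ("foooo", "fooooooo", "foo"),
    ("foo", "fooooooo", "fooo"),
    ("foooooooo", "fooooooo", "foooooo"),
]
for line in align(rows):
    print(line)
```

The columns only line up visually in a fixed-width font, which is worth keeping in mind for an SSRS parameter list.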
