T SQL Extract String between penultimate and last comma

I'm using MS SQL Server 2014 SP3.
I have a column called person_loader that contains one large string. I have no control over this, as it comes from a 3rd-party system.
Sample data:
1. Bob Smith, 01/01/1980, "email: bob@test.com, mobile: 012345687",USA, Joiner, 05/04/2022
2. Dolly Smith, 02/03/1978, "email: dolly@test.com", UK, Singer,
3. Dave Smith, 09/08/78,"mobile: 98745632", USA, Unemployed, 04/04/2022
4. Bud Smith, 07/07/80,"email:bud.smith@test.com, mobile: 0147852369", UK, Dr,
I want to extract the string between the penultimate and last ','. Here is the desired result:
1. Joiner
2. Singer
3. Unemployed
4. Dr
Sometimes the string won't end with the date, but there will always be a comma.
I can extract everything right of the last comma, but how do I build on this?
SELECT RIGHT([person_loader], CHARINDEX(',', REVERSE([person_loader])) - 1)
FROM tblCustomer;

In newer (and supported) versions of SQL Server, this can be much easier with, say, OPENJSON. In older, unsupported versions, you're stuck with ugly string parsing... not one of SQL Server's strong suits.
;WITH level1 AS
(
  SELECT pl = SUBSTRING
  (
    person_loader,
    1,
    LEN(person_loader) - CHARINDEX(',', REVERSE(person_loader))
  )
  FROM dbo.tblCustomer
)
SELECT LTRIM(RIGHT(pl, CHARINDEX(',', REVERSE(pl))-1)) FROM level1;
Output:
(No column name)
Joiner
Singer
Unemployed
Dr
This will fail, of course, if there are strings in the table with one or zero commas. You can deal with this (along with empty strings and NULL) using a bunch of additional (and even uglier) COALESCE(NULLIF( handling:
;WITH level1 AS
(
  SELECT pl = SUBSTRING
  (
    person_loader,
    1,
    LEN(person_loader)
      - COALESCE(NULLIF(CHARINDEX(',',
        REVERSE(person_loader)),0),0)
  )
  FROM dbo.tblCustomer
)
SELECT COALESCE(LTRIM(RIGHT(pl,
    COALESCE(NULLIF(CHARINDEX(',',
    REVERSE(pl)), 0),1)-1)), '')
FROM level1;
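To try the guarded query without touching the real table, a self-contained sketch along these lines may help. The table variable @customer is hypothetical and stands in for dbo.tblCustomer; the first two rows are adapted from the question's sample data, and the last two are made-up edge cases.

```sql
-- Hypothetical stand-in for dbo.tblCustomer (name and column size are assumptions)
DECLARE @customer TABLE (person_loader varchar(500));

INSERT INTO @customer (person_loader) VALUES
('Bob Smith, 01/01/1980, "email: bob@test.com, mobile: 012345687",USA, Joiner, 05/04/2022'),
('Dolly Smith, 02/03/1978, "email: dolly@test.com", UK, Singer,'),
('no commas here'),   -- made-up edge case: zero commas
(NULL);               -- made-up edge case: NULL input

;WITH level1 AS
(
  -- Strip everything from the last comma onward
  SELECT pl = SUBSTRING
  (
    person_loader,
    1,
    LEN(person_loader)
      - COALESCE(NULLIF(CHARINDEX(',', REVERSE(person_loader)), 0), 0)
  )
  FROM @customer
)
-- Take whatever follows the (new) last comma; fall back to '' when there is none
SELECT COALESCE(LTRIM(RIGHT(pl,
    COALESCE(NULLIF(CHARINDEX(',', REVERSE(pl)), 0), 1) - 1)), '') AS extracted
FROM level1;
```

The first two rows should yield Joiner and Singer, while the row without commas and the NULL row come back as empty strings instead of raising an invalid-length error.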

Related

Snowflake: Trouble getting numbers to return from a PIVOT function

I am moving a query from SQL Server to Snowflake. Part of the query creates a pivot table. The pivot table part works fine (I have run it in isolation, and it pulls numbers I expect).
However, the following parts of the query rely on the pivot table, and those parts fail. Some of the fields are returned as strings. I believe the problem is that Snowflake has trouble converting string data to numeric data. I have tried CAST and TRY_TO_DOUBLE/TRY_TO_NUMBER, but these just return 0.
I will put the code down below, and I appreciate any insight as to what I can do!
CREATE OR REPLACE TEMP TABLE ATTR_PIVOT_MONTHLY_RATES AS (
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership')) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE('Member Adds')) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE('Member Attrition')) AS Member_Attrition,
((ZEROIFNULL(CAST('Starting Membership' AS FLOAT))
+ ZEROIFNULL(CAST('Member Adds' AS FLOAT))
+ ZEROIFNULL(CAST('Member Attrition' AS FLOAT)))-ZEROIFNULL(CAST('Starting Membership' AS FLOAT)))
/ZEROIFNULL(CAST('Starting Membership' AS FLOAT)) AS "% Change"
FROM
(SELECT * FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL) PT)
I realize this is a VERY big question with a lot of moving parts... So my main question is: How can I successfully change the data type to numeric value, so that hopefully the formulas work in the second half of the query?
Thank you so much for reading through it all!
Edited to shorten the query by removing unneeded syntax.
What I have tried: CAST(), TRY_TO_DOUBLE(), TRY_TO_NUMBER(). I have also put the fields (Starting Membership, Member Adds) in single and double quotation marks.

Unless you are quoting your field names in this post just to highlight them for some reason, the way you've written this query would indicate that you are trying to cast a string value to a number.
For example:
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership'))
This is simply trying to cast a string literal value of Starting Membership to a double. This will always be NULL. And then your ZEROIFNULL() function is turning your NULL into a 0 (zero).
Without seeing the rest of your query that defines the column names, I can't provide you with a correction, but try using field names, not quoted string values, in your query and see if that gives you what you need.

Your first mistake is that all your single-quoted column names are being treated as string literals.
For example, take your inner select:
with ATTR_PIVOT(id, studentname) as (
select * from values
(1, 'student_a'),
(1, 'student_b'),
(1, 'student_c'),
(2, 'student_z'),
(2, 'student_a')
)
SELECT *
FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL
There is no "Starting Membership" column here, yet the query runs and we get all the rows, because the string literal 'Starting Membership' is never NULL:
ID | STUDENTNAME
---|------------
1  | student_a
1  | student_b
1  | student_c
2  | student_z
2  | student_a
So you need to change 'Starting Membership' to "Starting Membership", and likewise for the other column names.
As Mike mentioned, the 0 results are because TRY_TO_DOUBLE always fails on the string literal, so the NULL is always turned into zero.
Now, with real "string" values in properly named columns:
with ATTR_PIVOT(Market, Coverage_Mo, "Starting Membership", "Member Adds", "Member Attrition") as (
select * from values
(1, 10 ,'student_a', '23', '150' )
)
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE("Member Adds")) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition")) AS Member_Attrition
FROM ATTR_PIVOT
WHERE "Starting Membership" IS NOT NULL
we get what we would expect:
MARKET | COVERAGE_MO | STARTING_MEMBERSHIP | MEMBER_ADDS | MEMBER_ATTRITION
-------|-------------|---------------------|-------------|-----------------
1      | 10          | 0                   | 23          | 150
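Carrying the same fix through to the "% Change" expression from the question might look like the sketch below. The sample values ('200', '23', '-15') are made up for illustration, and the NULLIF guard on the denominator is an addition that is not in the original query; it is there because ZEROIFNULL can produce a zero divisor.

```sql
-- Hypothetical one-row ATTR_PIVOT; the real table comes from the PIVOT step
WITH ATTR_PIVOT (Market, Coverage_Mo, "Starting Membership", "Member Adds", "Member Attrition") AS (
    SELECT * FROM VALUES
        (1, 10, '200', '23', '-15')
)
SELECT
    Market,
    Coverage_Mo,
    -- Double quotes reference the column; single quotes would be a string literal
    ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS Starting_Membership,
    ZEROIFNULL(TRY_TO_DOUBLE("Member Adds"))         AS Member_Adds,
    ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition"))    AS Member_Attrition,
    ((ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership"))
      + ZEROIFNULL(TRY_TO_DOUBLE("Member Adds"))
      + ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition")))
     - ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")))
    -- NULLIF turns a zero denominator into NULL instead of a division error
    / NULLIF(ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")), 0) AS "% Change"
FROM ATTR_PIVOT
WHERE "Starting Membership" IS NOT NULL;
```

With these sample values the change works out to (200 + 23 - 15 - 200) / 200 = 0.04. Rows whose starting membership is zero or non-numeric get NULL for "% Change" rather than failing the whole query.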

Changing character in a string of characters

I would like to know how to edit the following column value in an Oracle database:
PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP
I want to change only the 5th character from F to P, without affecting the rest of the string.
I have around 700 records, and I want to set that position (the 5th) to P for all users.
I was thinking of using PL/SQL instead of a query, so could you please advise?
Thanks

Use REGEXP_REPLACE:
> SELECT REGEXP_REPLACE('PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP', '^(\w{4}).(.*)', '\1P\2') AS COL_REGX FROM dual
COL_REGX
--------------------------------------
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP

Klashxx's answer is a good one: REGEXP_REPLACE is the way to go. Here is the old-fashioned way, built up bit by bit so you can see what's going on. Note that it targets the fifth occurrence of F rather than the fifth character:
WITH
test_data (text)
AS (SELECT '1234F1234F1234F1234F1234F1234F1234' FROM DUAL
)
SELECT
text
,INSTR(text,'F',1,5) --fifth occurrence
,SUBSTR(text,1,INSTR(text,'F',1,5)-1) --substr up to that point
,SUBSTR(text,1,INSTR(text,'F',1,5)-1)||'P' --add P
,SUBSTR(text,1,INSTR(text,'F',1,5)-1)||'P'||SUBSTR(text,INSTR(text,'F',1,5)+1) --add remainder of string
FROM
test_data
;
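So what you're trying to do would be something like
UPDATE <your table>
SET <your column> = SUBSTR(<your column>,1,INSTR(<your column>,'F',1,5)-1)||'P'||SUBSTR(<your column>,INSTR(<your column>,'F',1,5)+1)
WHERE INSTR(<your column>,'F',1,5) > 0
...assuming you want to update every row that contains a fifth F. Without the WHERE clause, INSTR returns 0 for rows with fewer than five Fs, and the expression would prepend a P to them.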

The solution below looks for the first five characters at the beginning of the input string. If found, it keeps the first four unchanged and it replaces the fifth with the letter P. Note that if the input string is four characters or less, it is left unchanged. (This includes NULL as the input string, shown in the WITH clause which creates sample strings and also in the output - note that the output has FIVE rows, even though there is nothing visible in the last one.)
with
test_data ( str ) as (
select 'ABCDEFGH' from dual union all
select 'PPPPF' from dual union all
select 'PPPPP' from dual union all
select '1234' from dual union all
select null from dual
)
select str, regexp_replace(str, '^(.{4}).', '\1P') as replaced
from test_data
;
STR      REPLACED
-------- --------
ABCDEFGH ABCDPFGH
PPPPF    PPPPP
PPPPP    PPPPP
1234     1234

5 rows selected.

Flip the 5th 'bit' to a 'P' where it's currently an 'F'.
update table
set column = regexp_replace(column , '^(.{4}).', '\1P')
where regexp_like(column , '^.{4}F');
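For comparison, the same position-based change can be sketched without regular expressions, using only SUBSTR. This is an illustrative alternative rather than anything from the answers above; the test_data CTE and its two sample strings are assumptions.

```sql
-- Hypothetical sample rows: one with F in position 5, one too short to have a 5th character
WITH test_data (str) AS (
    SELECT 'PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP' FROM dual UNION ALL
    SELECT '1234' FROM dual
)
SELECT str,
       CASE
           -- Only touch rows whose 5th character is currently F
           WHEN SUBSTR(str, 5, 1) = 'F'
           THEN SUBSTR(str, 1, 4) || 'P' || SUBSTR(str, 6)
           ELSE str
       END AS replaced
FROM test_data;
```

The F later in the first string is left alone because only position 5 is inspected, and the four-character string passes through unchanged because SUBSTR(str, 5, 1) is NULL there.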

SQL Server - Split string on last occuring number

I have following column (with fictional data):
Location
---------------
15630London
45680Edinburg
138739South Wales
This column contains both zip codes and city names. I want to split them into 2 separate columns.
So in this case, my output would be:
Zip | City
-------|---------
15630 | London
45680 | Edinburg
138739 | South Wales
I tried the zipcode with
LEFT(location,LEN(location)-CHARINDEX('L',location))
But I couldn't find out how to set the CHARINDEX to work on all letters.
Any suggestions / other ideas?

Here is one way using PATINDEX and some string functions
SELECT LEFT(Location, Patindex('%[a-z]%', Location) - 1),
Substring(Location, Patindex('%[a-z]%', Location), Len(Location))
FROM (VALUES ('15630London'),
('45680Edinburg'),
('138739South Wales'))tc(Location)
Note: the code above assumes that zip codes are always numeric, that letters appear only in the city name, and that every row contains a city name.

Detect the first non-numeric character and pull off that many characters from the left, then read from that point to the end:
select
left(Location, patindex('%[^0-9]%', Location) - 1),
substring(Location, patindex('%[^0-9]%', Location), len(Location))
from t
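One edge case worth guarding against: if a row holds only digits, PATINDEX returns 0 and LEFT receives a length of -1, which raises an error. A possible workaround, sketched here with made-up inline rows, is to append a non-digit sentinel character inside the PATINDEX call so that a match is always found.

```sql
SELECT
    -- 'x' is an arbitrary sentinel; it guarantees PATINDEX finds a non-digit
    LEFT(Location, PATINDEX('%[^0-9]%', Location + 'x') - 1)                 AS Zip,
    SUBSTRING(Location, PATINDEX('%[^0-9]%', Location + 'x'), LEN(Location)) AS City
FROM (VALUES ('15630London'),
             ('138739South Wales'),
             ('99999')               -- hypothetical row with no city name
     ) AS t(Location);
```

For the digits-only row the sentinel sits one position past the end of the value, so Zip is the whole string and City is an empty string.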

declare @string varchar(200)='15630London'
select substring(@string,1,patindex('%[a-zA-Z]%',@string)-1),
substring(@string,patindex('%[a-zA-Z]%',@string),len(@string))

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?

As ughai rightly wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that wrapping the columns in functions prevents SQL Server from using any indexes on them that it could otherwise use.
Therefore, I would suggest adding another column to the imp table that holds the actual ReceiptId and is calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using SUBSTRING with PATINDEX:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9 ]%', ref + ' ') > 0 -- ref + ' ' lets a 4-digit id at the very end match
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9 ]%', ref + ' ') = 1 THEN
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9 ]%', ref + ' '), 6))
ELSE
NULL
END As ReceiptId
FROM imp
This will return NULL if there is no match.
A match is a substring consisting of A followed by 4 or 5 digits, separated by spaces from the rest of the string; it can be found at the start, middle or end of the string.

Try this, it will remove all characters before the A[number][number][number][number] and take the first 6 characters after that:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with =, trailing spaces are not taken into account.
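The extraction logic can be tried in isolation before wiring it into the join. The sketch below is one possible variant, not the code from either answer: it pads ref with a space on both sides so that the ReceiptId must be delimited by spaces wherever it sits, and checks the 5-digit pattern before the 4-digit one. The inline rows are the examples from the question; the names p4, p5 and pos are made up.

```sql
SELECT imp.ref,
       CASE
           WHEN pos.p5 > 0 THEN SUBSTRING(imp.ref, pos.p5, 6)  -- 'A' plus 5 digits
           WHEN pos.p4 > 0 THEN SUBSTRING(imp.ref, pos.p4, 5)  -- 'A' plus 4 digits
       END AS ReceiptId
FROM (VALUES ('SHORT DESC A1234 - BZ-0987654321'),
             ('LONGER DESCRIPTION A9876 - BZ-123'),
             ('REALLY LONG DESCRIPTION A2345 - B'),
             ('REALLY REALLY LONG DESCRIPTION A23456')
     ) AS imp(ref)
CROSS APPLY (
    -- PATINDEX finds the space before 'A' in the padded string,
    -- which is the same position as 'A' itself in the unpadded ref
    SELECT PATINDEX('% A[0-9][0-9][0-9][0-9][0-9] %', ' ' + imp.ref + ' ') AS p5,
           PATINDEX('% A[0-9][0-9][0-9][0-9] %',      ' ' + imp.ref + ' ') AS p4
) AS pos;
```

Rows with no delimited match get NULL from the CASE expression. If the result is reliable on real data, it could be stored in a separate column during import, as suggested earlier, so the join can compare plain columns.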

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1     addr1
2 customer2 addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.

Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + cust_name, 60),
RIGHT(space(60) + cust_addr, 60)
FROM customers
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
FROM customers

The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns the value 'Test' followed by 16 trailing spaces (20 characters in total).

This is based on Jim's answer,
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More Straight Forward
Slightly Cleaner (IMO)
Faster (Maybe?)
Easily Modified to either double pad for displaying in non-fixed width fonts or split padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
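That last disadvantage can be worked around with a CASE guard, roughly as sketched below. The variable values are made up to trigger the problem; the sketch relies on SPACE being documented to return NULL for a negative argument, which makes the unguarded concatenation NULL under default settings.

```sql
DECLARE @field_text varchar(50) = 'customer2';
DECLARE @pad_length int = 5;   -- deliberately shorter than the text

SELECT
    -- Unguarded: SPACE(-4) is NULL, so the concatenation is NULL as well
    @field_text + SPACE(@pad_length - LEN(@field_text)) AS Unguarded,
    -- Guarded: never ask SPACE for a negative count; the text is returned untruncated
    @field_text + SPACE(CASE WHEN @pad_length > LEN(@field_text)
                             THEN @pad_length - LEN(@field_text)
                             ELSE 0
                        END) AS RightPad;
```

If truncating over-long values to the pad length is acceptable, the LEFT(... + SPACE(n), n) form covers both cases in one expression.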

Based on KMier's answer, addresses the comment that this method poses a problem when the field to be padded is not a field, but the outcome of a (possibly complicated) function; the entire function has to be repeated.
Also, this allows for padding a field to the maximum length of its contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;

declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
result
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo
