SQL Server - Split string on last occurring number

I have the following column (with fictional data):
Location
---------------
15630London
45680Edinburg
138739South Wales
This column contains both zip codes and city names. I want to split these into two separate columns.
So in this case, my output would be:
Zip | City
-------|---------
15630 | London
45680 | Edinburg
138739 | South Wales
I tried to extract the zip code with
LEFT(location,LEN(location)-CHARINDEX('L',location))
But I couldn't find out how to set the CHARINDEX to work on all letters.
Any suggestions / other ideas?

Here is one way using PATINDEX and some string functions
SELECT LEFT(Location, Patindex('%[a-z]%', Location) - 1),
Substring(Location, Patindex('%[a-z]%', Location), Len(Location))
FROM (VALUES ('15630London'),
('45680Edinburg'),
('138739South Wales'))tc(Location)
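Note: the code above assumes that zip codes consist only of digits, that the first letter in the string marks the start of the city name, and that a city name is present in every row. If a row contains no letter, PATINDEX returns 0 and LEFT fails because of the negative length.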

Detect the first non-numeric character and pull the characters before it from the left, then read from that point to the end:
select
left(Location, patindex('%[^0-9]%', Location) - 1),
substring(Location, patindex('%[^0-9]%', Location), len(Location))
from t
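Both PATINDEX variants above return 0 when the pattern is not found, and LEFT then raises an error because the computed length is -1. Here is a minimal sketch of one way to guard against that, assuming rows may consist of digits only or of letters only (the last two sample rows are hypothetical): append a non-digit sentinel before searching, so PATINDEX always finds a position.

```sql
-- Sketch only: the sample rows are made up; 'x' is an arbitrary non-digit sentinel.
SELECT Location,
       -- Everything before the first non-digit character
       LEFT(Location, PATINDEX('%[^0-9]%', Location + 'x') - 1) AS Zip,
       -- Everything from the first non-digit character onwards
       SUBSTRING(Location, PATINDEX('%[^0-9]%', Location + 'x'), LEN(Location)) AS City
FROM (VALUES ('15630London'),
             ('138739South Wales'),
             ('99999'),   -- digits only: PATINDEX finds the sentinel
             ('Paris'))   -- letters only: PATINDEX returns 1
     AS t (Location);
```

With the sentinel in place, the digits-only row should come back with the whole value as Zip and an empty City, and the letters-only row with an empty Zip.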

declare @string varchar(200)='15630London'
select substring(@string,1,patindex('%[a-zA-Z]%',@string)-1),
substring(@string,patindex('%[a-zA-Z]%',@string),len(@string))

Related

String concatenation based on column length

I have telephone numbers like this in one table:
ID  Telephone   extention
-------------------------
1   9986323422  4
2   9992108     2222
3   9962718     241
In the final result, the extention value should replace the last digit(s) of the "Telephone" column, as many digits as the extention contains.
I want my result to be:
ID  Telephone   extention  result
-----------------------------------
1   9986323422  4          9986323424
2   9992108     2222       9992222
3   9962718     241        9962241
I have 100k records like this. What is the best and quickest way to achieve this? Thanks.
This may be a little too cute [1], but it is an alternative to the STUFF approaches:
SELECT ID,Telephone,Extension,
SUBSTRING(Telephone,1-LEN(Extension),LEN(Telephone)) + Extension as Result
It works because a start argument below 1 makes SUBSTRING begin at the first character while still counting the requested length from that earlier position, so the end of the string is truncated by the corresponding amount.
[1] It avoids repeated calls to LEN() (although the optimizer should be able to avoid that duplication anyway) and avoids having to reverse the entire string, but this does come at a readability cost.
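To see what the SUBSTRING call does before the extension is appended, the intermediate values can be selected side by side. This is only an illustrative sketch using the sample rows from the question; the aliases start_arg and kept_prefix are made up for the example.

```sql
-- Sketch: shows the start argument and the prefix that SUBSTRING keeps.
SELECT ID,
       Telephone,
       extention,
       1 - LEN(extention) AS start_arg,   -- 0, -3 and -2 for the three rows
       SUBSTRING(Telephone, 1 - LEN(extention), LEN(Telephone)) AS kept_prefix,
       SUBSTRING(Telephone, 1 - LEN(extention), LEN(Telephone)) + extention AS result
FROM (VALUES (1, '9986323422', '4'),
             (2, '9992108',    '2222'),
             (3, '9962718',    '241'))
     AS t (ID, Telephone, extention);
```

For the second row, the start argument is -3 and the requested length is 7, so only positions 1 to 3 ('999') fall inside the string before '2222' is appended.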
You can use STUFF() together with some calculations with LEN()
DECLARE @dummyTable TABLE(ID INT,Telephone VARCHAR(100), extention VARCHAR(100));
INSERT INTO @dummyTable VALUES
(1,'9986323422','4')
,(2,'9992108','2222')
,(3,'9962718','241');
SELECT *
,STUFF(t.Telephone,LEN(t.Telephone)-LEN(t.extention)+1,LEN(t.extention),t.extention) AS result
FROM @dummyTable AS t;
You might have to add some validations to avoid errors (e.g. length of extension should be smaller than of phone number)
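As a sketch of such a validation: STUFF returns NULL when the start position is zero or negative, which is what happens when the extension is longer than the phone number. The fallback chosen here (keep the Telephone value unchanged) is an assumption, and the table variable @phones with its second and third rows is invented test data.

```sql
-- Sketch only: @phones and its edge-case rows are hypothetical.
DECLARE @phones TABLE (ID INT, Telephone VARCHAR(100), extention VARCHAR(100));
INSERT INTO @phones VALUES
 (1, '9986323422', '4')     -- normal case
,(2, '108',        '2222')  -- extension longer than the number
,(3, '9962718',    NULL);   -- no extension at all

SELECT p.ID,
       p.Telephone,
       p.extention,
       CASE
           -- Assumed fallback: leave the number untouched when it cannot be patched
           WHEN p.extention IS NULL OR LEN(p.extention) > LEN(p.Telephone)
               THEN p.Telephone
           ELSE STUFF(p.Telephone,
                      LEN(p.Telephone) - LEN(p.extention) + 1,
                      LEN(p.extention),
                      p.extention)
       END AS result
FROM @phones AS p;
```

Whether an unpatched number, NULL, or an error is the right outcome for invalid rows depends on the data, so the CASE branch is the part to adapt.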
In a similar way, you can use reverse() together with stuff() to replace the last digits of the Telephone value with the extention value:
select *, reverse(stuff(reverse(Telephone), 1, len(extention), reverse(extention)))
from table

Find valid combinations based on matrix

I have the following matrix in Calc: the first row (1) contains employee numbers, the first column (A) contains product codes.
Wherever there is an X, that product was sold by the corresponding employee above.
     | 0302 | 0303 | 0304 | 0402 |
1625 |  X   |      |  X   |  X   |
1643 |      |  X   |  X   |      |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you can handle them separately.
1.
To replace the X marks with the respective employee numbers, you could use an array formula to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula so that it covers your complete input data (if your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as an array formula.
2.
Now you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying the header row and pasting it transposed. After that I used the concatenate function to build the column list plus data types for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows and columns. Since the column names were identical, the mapping was correct for 99% of the columns.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names, and pasted them in:
),cte
AS (SELECT prod_code, -- had to replace col1 with prod_code
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table, and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of prodcode + employee number, and which is a lot faster and much more practical to query.
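A sketch of what that two-column variant might look like, using the two sample rows from the question instead of dbo.mydata. The aliases employee and mark are illustrative, and the real query would list all employee columns.

```sql
-- Sketch: unpivot the X matrix into one row per (product, employee) pair.
SELECT d.prod_code,
       v.employee
FROM (VALUES ('1625', 'X',  NULL, 'X', 'X'),
             ('1643', NULL, 'X',  'X', NULL))
     AS d (prod_code, [0302], [0303], [0304], [0402])
CROSS APPLY (VALUES ('0302', d.[0302]),
                    ('0303', d.[0303]),
                    ('0304', d.[0304]),
                    ('0402', d.[0402]))
     AS v (employee, mark)
WHERE v.mark IS NOT NULL          -- keep only the cells that contain an X
ORDER BY d.prod_code, v.employee;
```

For the sample data this should yield five rows: three for product 1625 and two for product 1643.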

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightly wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that wrapping the columns in functions prevents SQL Server from using any indexes that it could otherwise use on those columns.
Therefore, I would suggest adding another column to the imp table that holds the actual ReceiptId and is calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using substring with patindex, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
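Following the suggestion to store the extracted ReceiptId in its own column, a sketch could look like this. The table variable @imp stands in for the real imp table, the column name ReceiptId and its length are assumptions, and the CASE guard is an addition so that rows without a match get NULL instead of the first characters of ref.

```sql
-- Sketch only: @imp is a stand-in for the imported table.
DECLARE @imp TABLE (ref VARCHAR(200), ReceiptId VARCHAR(6) NULL);
INSERT INTO @imp (ref) VALUES
 ('SHORT DESC A1234 - BZ-0987654321')
,('REALLY REALLY LONG DESCRIPTION A23456')
,('NO RECEIPT HERE');   -- hypothetical row without a ReceiptId

-- Populate the extra column once, right after the import
UPDATE @imp
SET ReceiptId = CASE
                    WHEN PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref) > 0
                        THEN RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6))
                END;    -- no ELSE branch: unmatched rows stay NULL

SELECT ref, ReceiptId FROM @imp;
```

With a real table, the new column can be indexed and the join condition reduces to a plain comparison of app.ReceiptId with imp.ReceiptId.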
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9| ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9| ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9| ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
fiddle here
This returns NULL if there is no match.
A match is a substring consisting of A followed by 4 or 5 digits, separated by spaces from the rest of the string; it can be found at the start, in the middle or at the end of the string.
Try this. It removes all characters before the A[number][number][number][number] and then takes the first 5 or 6 characters from that point:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with =, trailing spaces are not taken into account.

TSQL search exact match into a string

I am stumbling on an issue with string parsing; what I'm trying to achieve is to substitute a marker string with a value, but the string match needs to be exact.
Keep in mind that before the comparison I split the entire string into a table (rowID int, segment nvarchar(max)) wherever I find a space, so something like 'The deltaT_s is §deltaT_s' will look like:
rowID | segment
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s
After this I loop over each row with my table of "replacements" (idString nvarchar(max), val float); example:
Marker string (@segment): '§deltaT_s'
String to replace (@idString): '§deltaT_s'
The instruction I am using (since "like" is a lost cause as far as I can see):
SELECT STUFF(@segment, PATINDEX('%'+@idString+'[^a-z]%', @segment), LEN(@idString), CAST(@val AS NVARCHAR(MAX)))
with @val being the number to substitute, taken from the "replacements" table.
Now, in my table of "replacements" I have two delta-like markers:
1) §deltaT_s
2) §deltaT
My issue is that when the loop starts comparing the segments with the markers and §deltaT comes up, it substitutes the first part of the string like this:
'§deltaT_s' -> '10_s'
I don't understand what I am doing wrong with the pattern. Can anyone give me a hand with this?
I am available in case more info is required.
Thank you,
F.
If possible, you should change the marking style by putting a paragraph symbol (§) on both sides of the token, turning one of the examples in your comment into
the deltaT_s is §deltaT_s§, see ya!
With that, the sentence will be split as
rowID | segment
--------------------
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s§,
5 | see
6 | ya!
If the replacement values are stored in a fact table, you will have something like
token | value
------------------
§deltaT§ | foo
§deltaT_s§ | 10
or you can fake it by putting the symbol at the end of the token in your query.
Then it's possible to search for the substitution with a LIKE and a LEFT JOIN between the two tables
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '%'
SQLFiddle demo
If you cannot change the fact table, you can fake the change by adding the symbol after the token
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '§%'
This may not be an option for you, but it helped me once.
If you can use regex in SQL or create CLR functions, look at the last two options described at http://www.sqllion.com/2010/12/pattern-matching-regex-in-t-sql/.
For you, the best choice is probably the last one, using a CLR function.
Then you can do something like this:
Text: the deltaT_s is §delta, see ya!
Regex: (?<=[^a-z])§delta(?![a-z_]) - here (?<=[^a-z]) requires a preceding non-letter character without including it in the match, and (?![a-z_]) requires that the marker is not followed by a letter or an underscore.
Replace with: 10
I also tried the regex \b§delta\b (\b: start or end of word), but it seems it doesn't like §.
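If neither the markers nor the server configuration can be changed, a plain T-SQL variant of the same "not followed by a letter or underscore" idea might look like the sketch below. It assumes that a marker always starts its segment and that a case-insensitive collation is in use; the table variables, the sample rows and the value 20 are invented for the example. The underscore in a marker is wrapped in brackets because LIKE would otherwise treat it as a single-character wildcard.

```sql
-- Sketch only: @Sentence and @Replacements are hypothetical stand-ins.
DECLARE @Sentence     TABLE (rowID INT, segment NVARCHAR(100));
DECLARE @Replacements TABLE (idString NVARCHAR(100), val FLOAT);

INSERT INTO @Sentence VALUES
 (1, N'the'), (2, N'deltaT_s'), (3, N'is'), (4, N'§deltaT_s'), (5, N'§deltaT,');
INSERT INTO @Replacements VALUES (N'§deltaT_s', 10), (N'§deltaT', 20);

SELECT s.rowID,
       s.segment,
       -- Replace the leading marker; unmatched segments fall through unchanged
       COALESCE(STUFF(s.segment, 1, LEN(r.idString), CAST(r.val AS NVARCHAR(50))),
                s.segment) AS replaced
FROM @Sentence AS s
LEFT JOIN @Replacements AS r
       -- The appended space lets a marker at the very end of a segment match too
       ON s.segment + N' ' LIKE REPLACE(r.idString, N'_', N'[_]') + N'[^a-z0-9_]%'
ORDER BY s.rowID;
```

Because the character after §deltaT in '§deltaT_s' is an underscore, the shorter marker does not match that segment, which is the behaviour the question asks for.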

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1      addr1
2 customer2  addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.
Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + cust_name, 60),
RIGHT(space(60) + cust_addr, 60)
FROM customers
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
FROM customers
The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns 'Test' followed by 16 trailing spaces, 20 characters in total.
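Since trailing spaces are hard to see in a result grid, a small sketch like the following can make the padding visible. The brackets and the column aliases are only there for illustration.

```sql
-- Sketch: make the padding produced by CAST(... AS CHAR(20)) visible.
SELECT '[' + CAST('Test' AS CHAR(20)) + ']'   AS padded_in_brackets,
       LEN(CAST('Test' AS CHAR(20)))          AS len_value,        -- LEN ignores trailing spaces
       DATALENGTH(CAST('Test' AS CHAR(20)))   AS datalength_value; -- counts the stored bytes
```

LEN reports 4 here while DATALENGTH reports 20, which is worth keeping in mind when padding widths are computed with LEN.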
This is based on Jim's answer:
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More straightforward
Slightly cleaner (IMO)
Faster (maybe?)
Easily modified to either double pad for displaying in non-fixed-width fonts or split the padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
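The disadvantage comes from SPACE returning NULL for a negative argument, which makes the whole concatenation NULL under the default CONCAT_NULL_YIELDS_NULL setting. A sketch of two possible ways around it, with made-up variable values; which behaviour is wanted (truncate or keep the full value) is an assumption to decide per use case.

```sql
-- Sketch only: the variable values are invented to trigger the problem case.
DECLARE @field_text VARCHAR(50) = 'a rather long value';
DECLARE @pad_length INT = 10;

SELECT
    -- Original expression: NULL here, because SPACE gets a negative argument
    @field_text + SPACE(@pad_length - LEN(@field_text)) AS naive_right_pad,
    -- Option 1: pad generously, then cut to the target width (truncates long values)
    LEFT(@field_text + SPACE(@pad_length), @pad_length) AS truncating_right_pad,
    -- Option 2: never pad by a negative amount (keeps long values intact)
    @field_text + SPACE(CASE WHEN LEN(@field_text) < @pad_length
                             THEN @pad_length - LEN(@field_text)
                             ELSE 0
                        END) AS non_truncating_right_pad;
```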
Based on KMier's answer, this addresses the comment that the method poses a problem when the value to be padded is not a column but the outcome of a (possibly complicated) function: the entire function has to be repeated.
It also allows for padding a field to the maximum length of its contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;
declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
Result:
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo
