SQL Server String manipulation column to new column - sql-server

I need to create a new column from an existing column, with the string manipulation done on the fly.
The string stored in the existing column is nvarchar.
Example value in the existing column: 0001134564444
The first three positions '000' need to become 'AB' (there is a good number of variations),
the following 11 should become 1 (there are eight possible results, 1 to 8),
the following 3456 should keep only its last two characters, 56,
and the last 4444 should not change.
The new string should then be AB5614444
I now need this done in SQL Server.
I have tried substring, stuff, replace and charindex.
Have reached the point...never too late to give up, this is my last chance.
Oracle version,
select column NewColumn, decode(substr(column, 1,3),'000','AB', '010', 'BC' ...)||substr(column,8,2)
||substr(column,5,1)||substr(column,10,5) NewStringColumn from table
I am getting absolutely nowhere with SQL Server for this task, and Oracle is no longer an option.
Any help is appreciated.

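select
case substring(c, 1, 3) when '000' then 'AB' when '010' then 'BC' ... end -- positions 1-3: prefix code
+
substring(c, 8, 2) -- positions 8-9: the last two characters of 3456
+
case substring(c, 4, 2) when '11' then '1' ... end -- positions 4-5: 11 becomes 1
+
substring(c, 10, 4) -- positions 10-13: unchanged tail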
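Alternatively, with the mappings kept in CTEs:
with first_mapping as (
select k, v
from (values
('000', 'AB'),
('010', 'BC'),
...
) v(k, v)
),
second_mapping as (
select k, v
from (values
('11', '1'),
...
) v(k, v)
)
select
(select v from first_mapping where k = substring(c, 1, 3)) -- positions 1-3: prefix code
+
substring(c, 8, 2) -- positions 8-9: the last two characters of 3456
+
(select v from second_mapping where k = substring(c, 4, 2)) -- positions 4-5: 11 becomes 1
+
substring(c, 10, 4) -- positions 10-13: unchanged tail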
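As a self-contained sketch of the mapping-table idea, the following can be pasted into a query window as is. The source table is replaced by a one-row VALUES list, the names src, m and NewColumn are made up for the illustration, and only the mappings actually named in the question are included, so the real lists would be longer.

```sql
-- Assumed sample row; in practice src would be the real table and c the real column.
with src as (
    select c from (values (N'0001134564444')) s(c)
),
first_mapping as (
    -- Only the two prefix mappings mentioned in the question.
    select k, v from (values ('000', 'AB'), ('010', 'BC')) m(k, v)
),
second_mapping as (
    -- Only the one two-digit mapping mentioned in the question.
    select k, v from (values ('11', '1')) m(k, v)
)
select
    s.c,
    f.v                        -- positions 1-3 mapped to the prefix code
    + substring(s.c, 8, 2)     -- positions 8-9
    + g.v                      -- positions 4-5 mapped to a single digit
    + substring(s.c, 10, 4)    -- positions 10-13, unchanged
    as NewColumn
from src s
left join first_mapping f on f.k = substring(s.c, 1, 3)
left join second_mapping g on g.k = substring(s.c, 4, 2);
-- 0001134564444 -> AB5614444
```

With the left joins, a value that has no entry in either mapping yields NULL for NewColumn instead of dropping the row, which makes unmapped codes easy to spot.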

Related

Count 0's between 1's - SQL

I need a query or function to count the 0's between 1's in a string.
For example:
String1 = '10101101' -> Result=3
String2 = '11111001101' -> Result=1
String3 = '01111111111' -> Result=1
I only need to search for 101 pattern or 01 pattern if its at the beginning of the string.
You can decompose the input strings using SUBSTRING() and a number table:
SELECT
String, COUNT(*) AS [101Count]
FROM (
SELECT
v.String,
SUBSTRING(v.String, t.No - 1, 1) AS PreviousChar,
SUBSTRING(v.String, t.No, 1) AS CurrentChar,
SUBSTRING(v.String, t.No + 1, 1) AS NextChar
FROM (VALUES
('10101101'),
('11111001101'),
('01111111111')
) v (String)
CROSS APPLY (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t (No)
) cte
WHERE
CASE WHEN PreviousChar = '' THEN '1' ELSE PreviousChar END = '1' AND
CurrentChar = '0' AND
NextChar = '1'
GROUP BY String
Result:
String 101Count
10101101 3
11111001101 1
01111111111 1
Notes:
The table with alias v is the source table, the table with alias t is the number table. If the input strings have more than 10 characters, use an appropriate number (tally) table.
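For longer strings, one possible way to get a number table without typing the values out is to number the rows of a system view. This is only a sketch: the upper bound of 100 is an assumption, and sys.all_objects is used simply because it reliably has more rows than that.

```sql
WITH src AS (
    SELECT String
    FROM (VALUES ('10101101'), ('11111001101'), ('01111111111')) v (String)
),
tally AS (
    -- Numbers 1..100; raise the TOP value for longer strings.
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS No
    FROM sys.all_objects
)
SELECT s.String, COUNT(*) AS [101Count]
FROM src s
JOIN tally t ON t.No <= LEN(s.String)
WHERE SUBSTRING(s.String, t.No, 1) = '0'                            -- current character is 0
  AND SUBSTRING(s.String, t.No + 1, 1) = '1'                        -- next character is 1
  AND (t.No = 1 OR SUBSTRING(s.String, t.No - 1, 1) = '1')          -- previous is 1, or start of string
GROUP BY s.String;
-- 10101101 -> 3, 11111001101 -> 1, 01111111111 -> 1
```

As in the query above, a string with no match at all does not appear in the result, because no row survives the WHERE clause for it.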
-- This converts "111101101010111" into "01101010" and "011101000" into "01110"
-- (the group is non-greedy, so the trailing 1s and 0s stay outside it)
regexp_replace(field, '^1*(.*?)1*0*$', '\1')
-- This converts "01101010" into "0000"
regexp_replace(field, '1', '')
-- This counts the string length, returning 4 for '0000':
LENGTH(field)
-- Put all together:
LENGTH(
regexp_replace(
regexp_replace(field, '^1*(.*?)1*0*$', '\1')
, '1', '')
)
Different or more complicated cases require a modification of the regular expression.
Update
For "zeros between 1s" I see now that you mean "101" sequences. This is more complicated because the sequences can overlap, as in "10101". Suppose you want to count this as 2:
1. Replace 101 with 11011. Now 10101 becomes either 1101101 or 1101111011. In either case the "101" sequences are well apart, and there are still only two of them.
2. Replace all 101s with 'X'. You now have 1XX or 1X11X1 respectively.
3. Replace [01] with the empty string. You now have XX.
4. Use LENGTH to count the X's.
Any extra special sequence, like "01" at the beginning, can be converted first to "X1" ("10" at the end would become "1X"), which then folds neatly into the workflow above.
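The same steps can be sketched in T-SQL with nested REPLACE calls, since older SQL Server versions have no regexp_replace. The column aliases and the two sample strings are made up for the illustration, and the sketch covers only the "101" part, not the special case of a leading "01".

```sql
SELECT s,
       REPLACE(s, '101', '11011')                       AS step1_spread,   -- pull overlapping 101s apart
       REPLACE(REPLACE(s, '101', '11011'), '101', 'X')  AS step2_marked,   -- mark every 101 with X
       LEN(REPLACE(REPLACE(
               REPLACE(REPLACE(s, '101', '11011'), '101', 'X'),
           '0', ''), '1', ''))                          AS x_count         -- drop 0s and 1s, count the X's
FROM (VALUES ('10101'), ('10101101')) v (s);
-- 10101    -> step1 1101101,      step2 1XX,    x_count 2
-- 10101101 -> step1 110110111011, step2 1XX1X1, x_count 3
```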
With the LIKE operator and % you can decide how to match a specific string. This query returns every record that starts with 101 or 01.
SELECT ColumnsYouWant FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
You can simply COUNT the ColumnYouWant, like this:
SELECT COUNT(ColumnYouWant) FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
Or you can use a method of your backend language to count the rows that the first query returns. The count method depends on the language you are working with.
SQL Documentation for LIKE: https://www.w3schools.com/sql/sql_like.asp
SQL Documentation for COUNT: https://www.w3schools.com/sql/sql_count_avg_sum.asp
The other solutions do not account for all of the characters (the longest example shown has 11).
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '10101101'),
(2, '11111001101'),
(3, '01111111111')) V(id, string);
Query
;with
split_cte as (
select id, n, substring(t.string, v.n, 1) subchar
from #tTEST t
cross apply (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)
where v.n<=len(t.string)),
lead_lag_cte as (
select id, n, lead(subchar, 1, 9) over (partition by id order by n) lead_c, subchar,
lag(subchar, 1, 9) over (partition by id order by n) lag_c
from split_cte)
select id, sum(case when (lead_c=1 and lag_c=9) then 1 else
case when (lead_c=1 and lag_c=1) then 1 else 0 end end) zero_count
from lead_lag_cte
where subchar=0
group by id;
Results
id zero_count
1 3
2 1
3 1
Another way, perhaps quicker:
DECLARE @T TABLE (ID INT, STRING VARCHAR(32));
INSERT INTO @T
VALUES (1, '10101101'),
(2, '11111001101'),
(3, '01111111111');
-- Length with the zeros removed, subtracted from the full length, gives the number of zeros
SELECT *, LEN(STRING) - LEN(REPLACE(STRING, '0', '')) AS NUMBER_OF_ZERO
FROM @T
Result:
ID STRING NUMBER_OF_ZERO
----------- -------------------------------- --------------
1 10101101 3
2 11111001101 3
3 01111111111 1
select (len(replace('1'+x, '101', '11011')) - len(replace(replace('1'+x, '101', '11011'), '101', '')))/3
from
(
values
('10101101'),
('11111001101'),
('01111111111'),
('01010101010101010101')
) v(x);

How to convert this regex Query in Oracle to SQL Server?

I am a beginner with regex queries, so I want to ask how to convert this Oracle regex query to SQL Server.
select *
from Sales
where regexp_like(productname,'^[A-Z]{3}[0-9]+$')
I converted it to this query:
select *
from Sales
where substr(productname, 1, 3) in ([A-Z])
Is that correct?
Thank you.
You can use the following query:
SELECT *
FROM Sales
WHERE LEFT(productname, 3) LIKE '[A-Z][A-Z][A-Z]' -- three chars ([A-Z]{3})
AND NOT RIGHT(productname, LEN(productname) - 3) LIKE '%[^0-9]%' -- only numbers after the first three chars
AND NOT LEN(RIGHT(productname, LEN(productname) - 3)) = 0 -- at least one number
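To try the LIKE translation without a Sales table, here is a small sketch with inline sample rows (the sample values are invented). It also adds a binary collation to the letter check, on the assumption that the Oracle pattern is meant to be case sensitive; under a case-insensitive collation [A-Z] would accept lowercase letters as well.

```sql
SELECT productname
FROM (VALUES ('ABC123'), ('ABC'), ('AB123'), ('ABC12X'), ('ABCD12'), ('abc123'))
     AS Sales (productname)
WHERE productname COLLATE Latin1_General_BIN2
          LIKE '[A-Z][A-Z][A-Z][0-9]%'                                  -- three capitals, then at least one digit
  AND SUBSTRING(productname, 4, LEN(productname)) NOT LIKE '%[^0-9]%';  -- nothing but digits from position 4 on
-- returns only ABC123
```

If lowercase prefixes should also match, dropping the COLLATE clause is enough on a case-insensitive database.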
Based on regexp '^[A-Z]{3}[0-9]+$':
-- exactly 3 letters
-- at least one trailing digit
You could use TRIM:
SELECT *
FROM Sales
WHERE productname LIKE '[A-Z][A-Z][A-Z][0-9]%'
AND TRIM( '0123456789' FROM productname) LIKE '[A-Z][A-Z][A-Z]';
-- warning! expression is not SARGable
I think the simplest would be to use
WHERE
SUBSTRING(<column>, 1, 1) IN('A', 'B', 'C')
AND
SUBSTRING(<column>, 2, 1) IN('A', 'B', 'C')
AND
SUBSTRING(<column>, 3, 1) IN('A', 'B', 'C')
AND
TRY_CONVERT(
INT
, SUBSTRING(<column>, 4, LEN(<column>))
) <> 0
as you don't really need regex to do this.

Error with SQL Server 2008 R2 Japanese (NVARCHAR) String Comparison?

Problem
When I run the following query on SQL Server 2008 R2, two distinct Japanese Unicode strings are treated as being equal:
SELECT
CASE
WHEN N'食料' = N'食料ㇰ ㇱ ㇲ ㇳ'
THEN 1
ELSE 0
END;
--result: 1
I know that the kana following the kanji are half-width but since there are no similar full-width kana I wouldn't expect width sensitivity or kana sensitivity to matter. However, if the kana are replaced with full-width versions the comparison behaves as expected:
SELECT
CASE
WHEN N'食料' = N'食料ク シ ス ト'
THEN 1
ELSE 0
END;
--result: 0
Attempted Solutions
This led me to think the issue might be related to my collation which is SQL_Latin1_General_CP1_CI_AS.
First, I tried Latin1_General_CI_AS in case it was a quirk of SQL Unicode comparison but that did not solve the issue.
Then, I figured I would use the most restrictive collation possible (all sensitivities on) but other collations including Latin1_General_CS_AS_KS_WS and Japanese_Unicode_CS_AS_KS_WS did not change the result when using half-width trailing kana (all correctly identified the difference with full-width trailing kana).
To verify that the strings are different at a byte level, I ran the query with half-width trailing kana after removing the N(nvarchar) designation for the strings and verified it returns the expected result of 0.
Questions
What is going on here? Am I simply not trying the right collation? Is this an error in SQL Server 2008 R2? Is there something specific about Japanese Unicode that I am not aware of? Why would the presence of half-width trailing kana not make these strings different?
PS I don't know Japanese so if I messed up my description of the characters I apologize.
The long and short of it is that, in a large number of collations, the characters in your first example are equal to a space.
When doing string comparisons, SQL Server ignores trailing spaces at the end of a string (one exception being when you use LIKE, but you're not doing that here).
So, for example, in the string N'食料ㇰ ㇱ ㇲ ㇳ', every character after 料 is treated as a trailing space and removed when doing your string comparison.
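The trailing-space rule itself is easy to see with plain ASCII, independent of the Japanese characters. A minimal illustration (the column aliases are made up):

```sql
SELECT
    CASE WHEN N'abc' = N'abc   '    THEN 1 ELSE 0 END AS EqualsResult,  -- 1: trailing spaces are ignored by =
    CASE WHEN N'abc' LIKE N'abc   ' THEN 1 ELSE 0 END AS LikeResult,    -- 0: trailing spaces in a LIKE pattern count
    DATALENGTH(N'abc')    AS BytesShort,                                -- 6
    DATALENGTH(N'abc   ') AS BytesPadded;                               -- 12
```

DATALENGTH shows that the two values are stored differently even though = reports them as equal.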
To do a quick check with a given collation, you could run the following query:
WITH
Vals AS (SELECT FullString, StringNum FROM (VALUES (N'食料', 1), (N'食料ㇰ ㇱ ㇲ ㇳ', 2), (N'食料ク シ ス ト', 3)) AS T(FullString, StringNum)),
CTE AS -- A recursive CTE to split the characters up in your strings and check the individual characters.
(
SELECT FullString,
StringNum,
IndividualCharacter = SUBSTRING(FullString, 1, 1),
UnicodeNumber = UNICODE(SUBSTRING(FullString, 1, 1)),
UnicodeBinary = CAST(SUBSTRING(FullString, 1, 1) AS VARBINARY(2)),
CharPosition = 1
FROM Vals
UNION ALL
SELECT V.FullString,
V.StringNum,
IndividualCharacter = SUBSTRING(V.FullString, C.CharPosition + 1, 1),
UnicodeNumber = UNICODE(SUBSTRING(V.FullString, C.CharPosition + 1, 1)),
UnicodeBinary = CAST(SUBSTRING(V.FullString, C.CharPosition + 1, 1) AS VARBINARY(2)),
CharPosition = C.CharPosition + 1
FROM Vals AS V
JOIN CTE AS C
ON C.StringNum = V.StringNum
WHERE C.CharPosition + 1 <= LEN(V.FullString)
)
SELECT C.*,
CharacterEqualToSpace = CASE WHEN NCHAR(C.UnicodeNumber) COLLATE Japanese_Unicode_CS_AS_KS_WS = NCHAR(32) THEN 1 ELSE 0 END,
FullStringWithoutSpace = SUBSTRING(C.FullString, 1, (SELECT MAX(CharPosition) FROM CTE AS C2 WHERE C2.StringNum = C.StringNum AND NCHAR(C2.UnicodeNumber) COLLATE Japanese_Unicode_CS_AS_KS_WS != NCHAR(32))) -- Eliminate white space on the end for this collation, with a substring ending at the last character that does not equal white space.
FROM CTE AS C
ORDER BY StringNum, CharPosition;
From doing some quick tests...
- Japanese collations that will not treat those specific characters as a space: Any BIN collation, Japanese_Bushu_Kakusu, Japanese_XJIS
- Japanese collations that will treat those specific characters as a space: Japanese, Japanese90, Japanese_Unicode
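Following the list above, one way to force a difference is to apply a binary collation to the comparison. The sketch below uses Latin1_General_BIN2 as an example of a BIN collation; which binary collation suits a real system is a separate decision.

```sql
SELECT
    -- Linguistic collation: the question reports 1 for this comparison.
    CASE WHEN N'食料' = N'食料ㇰ ㇱ ㇲ ㇳ' COLLATE Japanese_Unicode_CS_AS_KS_WS
         THEN 1 ELSE 0 END AS LinguisticCompare,
    -- Binary collation: code points are compared directly, so this is 0.
    CASE WHEN N'食料' = N'食料ㇰ ㇱ ㇲ ㇳ' COLLATE Latin1_General_BIN2
         THEN 1 ELSE 0 END AS BinaryCompare;
```

A binary collation also makes the comparison case, accent, kana and width sensitive, so it changes more than just the handling of these characters.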
Note: There are over 21000 characters in Japanese_Unicode_CS_AS_KS_WS that are treated as white space. You can check this by running a query like the following for a given collation:
WITH T(N) AS (SELECT 1 FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS A(B)), -- 16
T2(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM T AS A CROSS JOIN T AS B CROSS JOIN T AS C CROSS JOIN T) -- 16^4.
SELECT WhiteSpaceCharacters = NCHAR(N)
FROM T2
WHERE NCHAR(N) COLLATE Japanese_Unicode_CS_AS_KS_WS = NCHAR(32);

sort float numbers as a natural numbers in SQL Server

I asked the same question for jQuery on here; now I have the same question for a SQL Server query :) This time the values are not comma separated; each one is a separate row in the database.
I have separate rows holding float-like numbers.
Name
K1.1
K1.10
K1.2
K3.1
K3.14
K3.5
and I want to sort these numbers like this:
Name
K1.1
K1.2
K1.10
K3.1
K3.5
K3.14
In my case the digits after the decimal point are treated as natural numbers, so 1.2 counts as '2' and 1.10 counts as '10'; that is why 1.2 comes before 1.10.
You can remove the 'K' because it is common to almost every value. A suggestion or an example would be great, thanks.
You can use PARSENAME (which is more of a hack) or string functions like CHARINDEX, STUFF and LEFT to achieve this.
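PARSENAME is meant for splitting multi-part object names such as server.database.schema.object, and it numbers the parts from the right. That is why it can be borrowed for dotted values with up to four parts. A quick look at what it returns for one of the values, with the 'K' already removed:

```sql
SELECT PARSENAME('1.10', 2) AS BeforeDot,  -- '1'
       PARSENAME('1.10', 1) AS AfterDot;   -- '10'
```

Both results are strings, so they still need a CONVERT or CAST to INT before they sort numerically.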
Input data
;WITH CTE AS
(
SELECT 'K1.1' Name
UNION ALL SELECT 'K1.10'
UNION ALL SELECT 'K1.2'
UNION ALL SELECT 'K3.1'
UNION ALL SELECT 'K3.14'
UNION ALL SELECT 'K3.5'
)
Using PARSENAME
SELECT Name,PARSENAME(REPLACE(Name,'K',''),2),PARSENAME(REPLACE(Name,'K',''),1)
FROM CTE
ORDER BY CONVERT(INT,PARSENAME(REPLACE(Name,'K',''),2)),
CONVERT(INT,PARSENAME(REPLACE(Name,'K',''),1))
Using String Functions
SELECT Name,LEFT(Name,CHARINDEX('.',Name) - 1), STUFF(Name,1,CHARINDEX('.',Name),'')
FROM CTE
ORDER BY CONVERT(INT,REPLACE((LEFT(Name,CHARINDEX('.',Name) - 1)),'K','')),
CONVERT(INT,STUFF(Name,1,CHARINDEX('.',Name),''))
Output
K1.1 K1 1
K1.2 K1 2
K1.10 K1 10
K3.1 K3 1
K3.5 K3 5
K3.14 K3 14
This works if there is always one char before the first number and the number is not higher than 9:
SELECT name
FROM YourTable
ORDER BY CAST(SUBSTRING(name,2,1) AS INT), --Get the number before dot
CAST(RIGHT(name,LEN(name)-CHARINDEX('.',name)) AS INT) --Get the number after the dot
Perhaps more verbose, but this should do the trick:
declare @source as table(num varchar(12));
insert into @source(num) values('K1.1'),('K1.10'),('K1.2'),('K3.1'),('K3.14'),('K3.5');
-- create helper table
with data as
(
select num,
cast(SUBSTRING(replace(num, 'K', ''), 1, CHARINDEX('.', num) - 2) as int) as [first],
cast(SUBSTRING(replace(num, 'K', ''), CHARINDEX('.', num), LEN(num)) as int) as [second]
from @source
)
-- Select and order accordingly
select num
from data
order by [first], [second]
sqlfiddle:
http://sqlfiddle.com/#!6/a9b06/2
A shorter solution is this one:
Select Num
from yourtable
order by cast(replace(Parsename(Num, 2), 'K', '') as Int), -- number before the dot
cast(Parsename(Num, 1) as Int) -- number after the dot

String functions in a SQL where clause

Let's say I have a field; let's call it Barcode1. Right now every Barcode1 value is 22 characters long, and each character is a digit.
Suppose there is a second field, Barcode2. Both of these are varchar(22).
My condition in plain English terms is:
Barcode1 is identical to Barcode2 except in digits 7 and 8, where the value in Barcode2 is the value in Barcode1 plus 20
so
001214**54**54545654521523
549462**74**48634842135782
I would also like the negation of the where clause, so that the rows that do NOT match the condition are returned.
Thank you.
I think this is what you want:
Example Data:
DECLARE @table TABLE ( barcode VARCHAR(22) )
INSERT INTO @table
(
barcode
)
SELECT '0012145454545654521523'
UNION ALL
SELECT '0012142454545654521523'
UNION ALL
SELECT '5494627448634842135782'
UNION ALL
SELECT '5494625448634842135782'
First Condition - meets 7,8 + 20
SELECT a.barcode,
b.barcode,
SUBSTRING(a.barcode, 7, 2) a,
SUBSTRING(b.barcode, 7, 2) b
FROM @table a
INNER JOIN @table b
ON SUBSTRING(a.barcode, 7, 2) + 20 = SUBSTRING(b.barcode, 7, 2)
AND a.barcode != b.barcode
returns:
barcode barcode a b
0012145454545654521523 5494627448634842135782 54 74
5494625448634842135782 5494627448634842135782 54 74
Negation where 7,8 + 20 doesn't exist
SELECT *
FROM @table a
WHERE NOT EXISTS ( SELECT TOP 1 1
FROM @table b
WHERE SUBSTRING(a.barcode, 7, 2) + 20 = SUBSTRING(b.barcode, 7, 2) )
returns:
0012142454545654521523
5494627448634842135782
You'll have to break that barcode up using string operations, something like:
WHERE
substring(barcode1, 1, 6) = substring(barcode2, 1, 6) AND
substring(barcode1, 9, 14) = substring(barcode2, 9, 14) AND
etc...
And since you'll be doing these comparisons on function results, indexes aren't going to be used. If this is a frequent operation, you'd be better off splitting up the barcode strings into individual fields so you can compare the individual chunks as fully separate indexable fields.
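Put together, the condition from the question and its negation might look like the sketch below. The table variable @pairs and its sample rows are invented for the illustration; STUFF is used to cut digits 7 and 8 out of both values so that the rest can be compared in one go.

```sql
-- Assumed sample rows: one matching pair and two non-matching pairs.
DECLARE @pairs TABLE (Barcode1 varchar(22), Barcode2 varchar(22));
INSERT INTO @pairs (Barcode1, Barcode2) VALUES
    ('0012145454545654521523', '0012147454545654521523'),  -- 54 + 20 = 74, rest identical
    ('0012145454545654521523', '0012145454545654521523'),  -- fully identical, so no +20
    ('5494625448634842135782', '5494627448634842135783');  -- 54 + 20 = 74, but the last digit differs

-- Rows that meet the condition (the first pair only).
SELECT Barcode1, Barcode2
FROM @pairs
WHERE STUFF(Barcode1, 7, 2, '') = STUFF(Barcode2, 7, 2, '')   -- everything except digits 7-8 is equal
  AND CAST(SUBSTRING(Barcode1, 7, 2) AS int) + 20
      = CAST(SUBSTRING(Barcode2, 7, 2) AS int);               -- digits 7-8 differ by exactly 20

-- Negation: rows that do not meet the condition (the other two pairs).
SELECT Barcode1, Barcode2
FROM @pairs
WHERE NOT (
        STUFF(Barcode1, 7, 2, '') = STUFF(Barcode2, 7, 2, '')
    AND CAST(SUBSTRING(Barcode1, 7, 2) AS int) + 20
        = CAST(SUBSTRING(Barcode2, 7, 2) AS int)
);
```

The CAST relies on every character being a digit, as stated in the question. If either column can be NULL, the negated query skips those rows, so an extra IS NULL check would be needed to include them.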

Resources