Weird SQL query behavior - sql-server

I am using a SELECT query to parse addresses, but for some reason neither LIKE nor CHARINDEX works.
The data rows are like below:
Address
204‐101 xx CREEK DR
168‐906 xx RD
168‐906 xx RD
I tried query like
select top 10 ascii(substring(address,4,1)),
ascii('-'),
charindex(substring(address,4,1) collate SQL_Latin1_General_CP1_CI_AS,address),
charindex('-',address collate SQL_Latin1_General_CP1_CI_AS),
addr1=case when charindex('‐',address)>0 then substring(address,charindex('‐',address)+1,200) else address end
The result is like below
45 45 4 0 204‐101 xx CREEK DR
45 45 4 0 168‐906 xx RD
45 45 4 0 168‐906 xx RD
So the dash in the address has ASCII code 45. But for some reason CHARINDEX with a typed dash returns 0, while CHARINDEX with the dash taken from the address via SUBSTRING works.
I suspect the collation, but I have already converted the addresses to my server's collation.
Any idea? thanks!

They are different characters. The one in the data is NCHAR(8208) (U+2010, HYPHEN) and the one in the string literal is NCHAR(45) (the hyphen-minus on the keyboard). Use the UNICODE function, not ASCII, to see this. (DB Fiddle)
Calling ASCII implicitly converts the string to varchar, and during that conversion characters can be silently changed to close equivalents, which is why both show up as 45.
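A minimal sketch of that check, assuming the column is nvarchar. The variable @address and its sample value are made up for illustration; in the real query you would use the address column instead.

```sql
-- Hypothetical sample value standing in for one row of the Address column.
DECLARE @address nvarchar(100) = N'204' + NCHAR(8208) + N'101 xx CREEK DR';

SELECT
    UNICODE(SUBSTRING(@address, 4, 1)) AS code_in_data,       -- 8208 = U+2010 HYPHEN
    UNICODE(N'-')                      AS code_typed,         -- 45 = hyphen-minus
    CHARINDEX(NCHAR(8208), @address)   AS pos_unicode_hyphen, -- search for the character that is really stored
    SUBSTRING(@address, CHARINDEX(NCHAR(8208), @address) + 1, 200) AS addr1; -- text after the hyphen
```

Searching for NCHAR(8208) rather than a typed '-' is what makes the CASE expression in the question take the first branch.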

Related

Defeat these dashed dashes in SQL server

I have a table that contains the names of various recording artists. One of them has a dash in their name. If I run the following:
Select artist
, substring(artist,8,1) as substring_artist
, ascii(substring(artist,8,1)) as ascii_table
, ascii('-') as ascii_dash_key /*The dash key next to zero */
, len(artist) as len_artist
From [dbo].[mytable] where artist like 'Sleater%'
Then the following is returned. This seems to indicate that a dash (ascii 45) is being stored in the artist column
However, if I change the where clause to:
From [dbo].[mytable] where artist like 'Sleater' + char(45) + '%'
I get no results returned. If I copy and paste the output from the artist column into a hex editor, I can see that the dash is actually stored as E2 80 90, the Unicode byte sequence for the multi-byte hyphen character.
So, I'd like to find and replace such occurrences with a standard ASCII hyphen, but I am at a loss as to what criteria to use to find these E2 80 90 hyphens.
Your character is the Unicode hyphen; information on it is here:
https://www.charbase.com/2010-unicode-hyphen
You can see that the UTF-16 code is hex 2010 (decimal 8208), so in T-SQL you can build it with
SELECT NCHAR(0x2010)
From there you can use that character in any SQL command, for example in a SELECT:
Select artist
From [dbo].[mytable] where artist like N'Sleater' + NCHAR(0x2010) + N'%'
or, as you want, in a
REPLACE( artist, NCHAR(0x2010), '-' )
with a "real" dash.
EDIT:
If the collation of your DB gives you trouble with NCHAR(0x2010), you can also use the character N'‐' itself, copied and pasted from the charbase link above:
REPLACE( artist , N'‐' , '-' )
You can even take it from the statement here (written with the special character), so it is all done for you:
update mytable set artist=REPLACE( artist, N'‐' , '-' )
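As a rough sketch, the affected rows can be previewed before the update is run. NCHAR(8208) is the decimal form of hex 2010. The table and column names come from the question; the assumption that artist is an nvarchar column, and the LIKE filter used to narrow the update, are mine.

```sql
-- Preview the rows that contain U+2010 and what they would become.
SELECT artist,
       REPLACE(artist, NCHAR(8208), N'-') AS fixed_artist
FROM dbo.mytable
WHERE artist LIKE N'%' + NCHAR(8208) + N'%';

-- Apply the change only to those rows.
UPDATE dbo.mytable
SET artist = REPLACE(artist, NCHAR(8208), N'-')
WHERE artist LIKE N'%' + NCHAR(8208) + N'%';
```

Restricting the UPDATE with the same WHERE clause keeps it from rewriting rows that do not need to change.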
I don't know your table definition and COLLATION, but I'm almost sure that you are mixing NCHAR and CHAR types and converting Unicode multi-byte characters to single-byte representations. Take a look at this demo:
WITH Demo AS
(
SELECT N'ABC'+NCHAR(0x2010)+N'DEF' T
)
SELECT
T,
CASE WHEN T LIKE 'ABC'+CHAR(45)+'%' THEN 1 ELSE 0 END [Char],
CASE WHEN T LIKE 'ABC-%' THEN 1 ELSE 0 END [Hyphen],
CASE WHEN T LIKE N'ABC‐%' THEN 1 ELSE 0 END [Unicode-Hyphen],--unicode hyphen is used here
CASE WHEN T LIKE N'ABC'+NCHAR(45)+N'%' THEN 1 ELSE 0 END [NChar],
CASE WHEN CAST(T AS varchar(MAX)) LIKE 'ABC-%' THEN 1 ELSE 0 END [ConvertedToAscii],
ASCII(NCHAR(0x2010)) ConvertedToAscii,
CAST(SUBSTRING(T, 4, 1) AS varbinary) VarbinaryRepresentation
FROM Demo
My results:
T       Char Hyphen Unicode-Hyphen NChar ConvertedToAscii ConvertedToAscii VarbinaryRepresentation
------- ---- ------ -------------- ----- ---------------- ---------------- -----------------------
ABC‐DEF 0    0      1              0     1                45               0x1020
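The 3-byte UTF-8 sequence E2 80 90 from the hex editor and the 0x1020 shown here are the same code point, U+2010: the first is its UTF-8 encoding, the second is its UTF-16 encoding with the low byte first, which is how nvarchar stores it.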
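A short sketch of how the two byte sequences relate, using only the functions from the demo above. The bit layout in the comments is the standard UTF-8 three-byte form, worked by hand.

```sql
SELECT
    UNICODE(NCHAR(0x2010))              AS code_point,  -- 8208 (hex 2010)
    CAST(NCHAR(0x2010) AS varbinary(2)) AS utf16_bytes; -- 0x1020: low byte 10 first, then 20

-- UTF-8 spreads the same 16 bits (0010 0000 0001 0000) over three bytes:
--   1110 0010   10 000000   10 010000
--      E2          80          90
```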

how does SQL Server actually store russian symbols in char?

I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
which integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
But CHAR implies that each character takes 8 bits. How does SQL Server store values like '1056,1091,1083,1086,1085', which are Unicode symbols?
Also, the ? symbol is actually × (215), the multiplication sign.
If SQL Server can represent '1056', why can't it represent '215'?
What the 256 possible values of a char mean is determined by the database collation. For Russia this is typically Cyrillic_General_CI_AS (where CI means Case Insensitive and AS means Accent Sensitive).
There's a good chance this matches Windows code page 1251, so л is stored as hex EB or decimal 235. You can verify this with T-SQL:
create database d1 collate Cyrillic_General_CI_AS;
use d1
select ascii('л')
-->
235
In the Cyrillic code page, decimal 215 means Ч, not the multiplication sign. Because SQL Server can't match the multiplication sign to the Cyrillic code page, it replaces it with a question mark:
select ascii('×'), ascii('?')
-->
63 63
So once converted to char under the Cyrillic code page, the multiplication sign and the question mark are stored as the same byte, decimal 63, and the original character cannot be recovered.
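A small sketch of the same effect without creating a database, assuming that an explicit COLLATE on the nvarchar value decides which code page the conversion to varchar uses. The two code-page values in the comments are the ones reported above, not independently measured.

```sql
SELECT
    UNICODE(N'л') AS unicode_l,      -- 1083, the code point an nvarchar value keeps
    UNICODE(N'×') AS unicode_times,  -- 215
    ASCII(CAST(N'л' COLLATE Cyrillic_General_CI_AS AS varchar(1))) AS cp_l,     -- 235 per the answer above
    ASCII(CAST(N'×' COLLATE Cyrillic_General_CI_AS AS varchar(1))) AS cp_times; -- 63 ('?') per the answer above
```

The first two columns show why the question's list contains values like 1056: those are Unicode code points, which only an NCHAR/NVARCHAR value can hold as such.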
I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
which integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
The integer representation quoted above is wrong.
I ran a test in a database with a Cyrillic collation and the integer representation is different from what you showed us, so either your data type is not char or your integer representation is wrong. And yes, "but CHAR implies that it contains 8 bit" is correct; here is how you can prove it to yourself:
--create table dbo.t (name char(50));
--insert into dbo.t values ('Рулон комбинированный СТЕРИТ 50мм ? 200 м')
-- the whole value as raw bytes
select cast (name as binary(50))
from dbo.t;
-- one row per byte: the byte, its integer value, and the character it maps to
select substring(cast (name as binary(50)), n, 1) as bin_substr,
cast(substring(cast (name as binary(50)), n, 1) as int) as int_,
char(cast(substring(cast (name as binary(50)), n, 1) as int)) as cyr_char
from dbo.t cross join nums.dbo.nums
where n between 1 and 50
order by n;
Here nums.dbo.nums is an auxiliary table containing integers (column n). I convert the string from the char column to binary, split it byte by byte, and convert each byte to int and to char.

SQL Server - How to get last numeric value in the given string

I am trying to get last numeric part in the given string.
For Example, below are the given strings and the result should be last numeric part only
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How to achieve this functionality. Please help.
Try this:
select right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as a search pattern you get the starting position of the first occurrence of a character that is not a number.
Using REVERSE you get the position of the first non numeric character starting from the back of the string.
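To make the two steps concrete, here is a small worked sketch on one of the sample values from the question. The variable @str is declared only for this illustration.

```sql
DECLARE @str varchar(50) = 'R009321743-16';

SELECT
    REVERSE(@str)                                        AS reversed,        -- '61-347123900R'
    PATINDEX('%[^0-9]%', REVERSE(@str))                  AS first_non_digit, -- 3 (the '-')
    RIGHT(@str, PATINDEX('%[^0-9]%', REVERSE(@str)) - 1) AS last_number;     -- '16'
```

The first non-digit sits at position 3 of the reversed string, so the trailing number is 3 - 1 = 2 characters long.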
Edit:
To handle strings that contain no non-numeric characters at all, you can use:
select case
when patindex('%[^0-9]%', @str) = 0 then @str
else right(@str, patindex('%[^0-9]%',reverse(@str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1)) as mynum
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol) )) as x(i)
Output:
mynum
------
124197
92
16
123456
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
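After reversing the string, we find the position of the first non-numeric character; that position minus one is the number of trailing digits to take from the right of the original string. If there is no non-numeric character, the whole string is returned.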
Result :
-----
124197
92
16

select integer before a certain character

Hi, I am trying to select the integer value before the character C in my SQL database table, which contains the information below.
240mm2 X 15C WIRING CABLE
150mm2 X 3C flex
10mm2 x 4C swa
So far I have used the query
select left ('C',CHARINDEX ('C',product_name)) from product
and I get 'C' in my results, which is correct. Now I am stuck. Does anyone know how I can modify the above select query to get a result which lists only the integers, e.g.
15
3
4
Two observations: the integer before "C" has a space before it and there is no space between the integer and "C".
If these are generally true, then in MySQL you can do what you want using substring_index():
select substring_index(substring_index(product_name, 'C', 1), ' ', -1) + 0 as thenumber
The + 0 simply converts the value to a number.
If you're doing this in SQL Server you could try the following:
Select Substring(product_name,
PATINDEX('% [0-9]%',product_name) + 1,
PATINDEX('%[0-9]C%',product_name) - PATINDEX('% [0-9]%',product_name)
) as num
from Product
This assumes that there is a space before the number and always a C after the number with no space.
It works out the starting point and then the length based on the start and end and performs a substring with the results.
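A self-contained sketch of that query run against the three sample rows from the question, using a VALUES list in place of the Product table (the alias p is made up for the example).

```sql
SELECT product_name,
       SUBSTRING(product_name,
                 PATINDEX('% [0-9]%', product_name) + 1,            -- first digit after the space
                 PATINDEX('%[0-9]C%', product_name)
                   - PATINDEX('% [0-9]%', product_name)) AS num     -- number of digits up to the C
FROM (VALUES ('240mm2 X 15C WIRING CABLE'),
             ('150mm2 X 3C flex'),
             ('10mm2 x 4C swa')) AS p (product_name);
-- num: 15, 3, 4
```

For the first row the space-then-digit match is at position 9 and the digit-then-C match is at position 11, so the substring starts at 10 and is 2 characters long. Note that the approach relies on the first "space followed by a digit" being the one just before the number you want.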
You could use a combination of INSTR and SUBSTRING.
First get the position of the C, then take the substring up to the C.
It goes like this:
SELECT INSTR('foobarbar', 'bar');
-- returns 4
You then select the substring from position 1 up to the position that INSTR returned.

What's after Z in SQL_Latin1_General_CP1_CI_AS?

I am trying to prove a table design flaw in a production db: a table should not have a clustered primary key on a column that can hold random data, in this case a code keyed in by the end user.
Though we know the solution is to make the PK non-clustered, I still need to add rows to its replica for testing purposes. Therefore, I need to know which character I can use as a prefix that sorts after 'Z'.
Also, the column is not Unicode, and it would be a mess to prefix my fake data with a series of Zs. The table now has hundreds of thousands of rows, and each insertion takes seconds.
Just run this and go down the list. I added the sandwiching dots for clarity, esp. when non-visible characters are involved.
select number, '.' + char(number) + '.' collate SQL_Latin1_General_CP1_CI_AS thechar
from master..spt_values
where type='p' and number between 28 and 255
order by thechar
There are only 4 characters coming after 'Z', since you say the column is not N(Var)Char.
121 .y.
89 .Y.
253 .ý.
221 .Ý.
255 .ÿ.
90 .Z.
122 .z.
208 .Ð.
240 .ð.
254 .þ.
222 .Þ.
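If you only want the candidates rather than the whole list, a possible variation is to filter on the comparison itself. This is a sketch that reuses master..spt_values and the collation from the query above; the explicit comparison against 'Z' is my addition.

```sql
SELECT number, CHAR(number) AS thechar
FROM master..spt_values
WHERE type = 'P'
  AND number BETWEEN 32 AND 255
  -- keep only characters that sort strictly after 'Z' in this collation
  AND CHAR(number) COLLATE SQL_Latin1_General_CP1_CI_AS > 'Z'
ORDER BY CHAR(number) COLLATE SQL_Latin1_General_CP1_CI_AS;
```

Because the collation is case-insensitive, 'z' compares equal to 'Z' and is excluded, which should leave the handful of characters at the end of the list above.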

Resources