How does SQL Server actually store Russian symbols in char? - sql-server

I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
whose integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
but CHAR implies that it contains 8 bits. How does SQL Server store values like '1056,1091,1083,1086,1085', which are Unicode symbols?
Also, the ? symbol is actually × (215), the multiplication sign.
If SQL Server can represent '1056', why can't it represent '215'?

What the 256 possible values of a char mean is determined by the database collation. For Russian this is typically Cyrillic_General_CI_AS (where CI means Case Insensitive and AS means Accent Sensitive).
There's a good chance this matches Windows code page 1251, so л is stored as hex EB or decimal 235. You can verify this with T-SQL:
create database d1 collate Cyrillic_General_CI_AS;
use d1
select ascii('л')
-->
235
In the Cyrillic code page, decimal 215 means Ч, not the multiplication sign. Because SQL Server can't match the multiplication sign to the Cyrillic code page, it replaces it with a question mark:
select ascii('×'), ascii('?')
-->
63 63
So after conversion to the Cyrillic code page, the multiplication sign and the question mark have the same 8-bit char representation: decimal 63, the question mark.
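The same mapping can be reproduced outside SQL Server. The sketch below uses Python's built-in cp1251 codec as a stand-in for the Cyrillic_General_CI_AS code page; that equivalence is an assumption, as it is in the answer above, and the variable names are only illustrative.

```python
# Model of the 8-bit conversion, assuming the collation maps to Windows code page 1251.
l_byte = 'л'.encode('cp1251')                    # Cyrillic letter el
byte_215 = bytes([215]).decode('cp1251')         # what byte 215 means in cp1251
times = '×'.encode('cp1251', errors='replace')   # U+00D7 has no slot in cp1251

print(l_byte[0])   # 235
print(byte_215)    # Ч
print(times[0])    # 63, the question mark
```

The `errors='replace'` argument mimics the silent substitution: a character with no slot in the target code page comes out as a question mark.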

I have a column NAME, which is CHAR(50).
It contains the value 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
whose integer representation is:
'1056,1091,1083,1086,1085,32,1082,1086,1084,1073,1080,1085,1080,1088,1086,1074,1072,1085,1085,1099,1081,32,1057,1058,1045,1056,1048,1058,32,53,48,1084,1084,32,63,32,50,48,48,32,1084'
What is quoted above is wrong.
I ran a test in a database with a Cyrillic collation, and the integer representation is different from what you showed us, so either your data type is not char or your integer representation is wrong. And yes, "but CHAR implies that it contains 8 bits" is correct, and here is how you can prove it to yourself:
--create table dbo.t (name char(50));
--insert into dbo.t values ('Рулон комбинированный СТЕРИТ 50мм ? 200 м')
-- the whole value as 50 raw bytes
select cast(name as binary(50))
from dbo.t;
-- one row per byte: the raw byte, its integer value, and the character it maps to
select substring(cast(name as binary(50)), n, 1) as bin_substr,
       cast(substring(cast(name as binary(50)), n, 1) as int) as int_,
       char(substring(cast(name as binary(50)), n, 1)) as cyr_char
from dbo.t cross join nums.dbo.nums
where n between 1 and 50
order by n;
Here dbo.Nums is an auxiliary table containing integers. I just convert your string from the char field into binary, split it byte by byte, and convert each byte into int and char.
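One plausible reading of the mismatch is that the numbers in the question are Unicode code points rather than stored bytes (1056 is the code point of Р). The Python sketch below models the byte-by-byte split, assuming the column is encoded in code page 1251 and padded with spaces to 50 bytes; `stored` is a hypothetical stand-in for the char(50) value, not data read from a real server.

```python
# Model of a char(50) value, assuming code page 1251 storage (an assumption).
text = 'Рулон комбинированный СТЕРИТ 50мм ? 200 м'
stored = text.encode('cp1251').ljust(50, b' ')   # char(50) pads with spaces

code_points = [ord(c) for c in text]   # the kind of numbers the question listed
byte_values = list(stored)             # one integer per stored byte

print(code_points[:5])   # [1056, 1091, 1083, 1086, 1085]
print(byte_values[:5])   # [208, 243, 235, 238, 237]
print(len(stored))       # 50
```

Every entry in `byte_values` fits in 8 bits, which is consistent with the claim that a char column holds one byte per character under this collation.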

Related

How to convert VARCHAR columns to DECIMAL without rounding in SQL Server?

In my SQL class, I'm working with a table that is all VARCHAR. I'm trying to convert each column to a more correct data type.
For example, I have a column called Item_Cost that has a value like:
1.25000000000000000000
I tried to run this query:
ALTER TABLE <table>
ALTER COLUMN Item_Cost DECIMAL
This query does run successfully, but it turns it into 1 instead of 1.25.
How do I prevent the rounding?
Check out the documentation for the data type decimal. The type is defined by optional parameters p (precision) and s (scale). The latter determines the number of digits to the right of the decimal point.
Extract from the documentation (the important bit is the default scale):
s (scale)
The number of decimal digits that are stored to the right of
the decimal point. This number is subtracted from p to determine the
maximum number of digits to the left of the decimal point. Scale must
be a value from 0 through p, and can only be specified if precision is
specified. The default scale is 0 and so 0 <= s <= p. Maximum storage
sizes vary, based on the precision.
Defining a suitable precision and scale fixes your issue.
Sample data
create table MyData
(
Item_Cost nvarchar(100)
);
insert into MyData (Item_Cost) values ('1.25000000000000000000');
Solution
ALTER TABLE MyData Alter Column Item_Cost DECIMAL(10, 3);
Result
Item_Cost
---------
1.250
Fiddle
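The effect of the scale can be modelled with Python's decimal module. This is only a sketch of the arithmetic: `quantize` stands in for the conversion to a given scale, and the ROUND_HALF_UP mode is an assumption about how the fractional digits are dropped, not something taken from the documentation extract above.

```python
from decimal import Decimal, ROUND_HALF_UP

raw = Decimal('1.25000000000000000000')

# Scale 0 (the default when DECIMAL is given no arguments) keeps no fractional digits.
scale_0 = raw.quantize(Decimal('1'), rounding=ROUND_HALF_UP)
# Scale 3, as in DECIMAL(10, 3), keeps three fractional digits.
scale_3 = raw.quantize(Decimal('0.001'), rounding=ROUND_HALF_UP)

print(scale_0)   # 1
print(scale_3)   # 1.250
```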

Converting TEXT to VARCHAR

I've noticed that when converting TEXT to VARCHAR the converted value is silently clipped at 30 characters.
CREATE TABLE foo (x TEXT)
-- insert a string that's 50 characters long
INSERT INTO foo(x) VALUES('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
SELECT CHAR_LENGTH(CONVERT(VARCHAR, x)) FROM foo -- returns 30
SELECT CHAR_LENGTH(CONVERT(VARCHAR(3000), x)) FROM foo -- returns 50
My questions are:
where is that limit documented / originate from?
what's an idiomatic way to make the conversion without having to add an arbitrarily high value? (as in the second SELECT statement above)
It is better to always specify the varchar length. The maximum length of a varchar in Sybase ASE 15.7 and 16.0 is 16384.
If you try to create a longer varchar, you'll get the following error:
Length or precision specification 16385 is not within the range of 1 to 16384.
Tim

Some doubts related to Microsoft SQL Server bigint

I have the following doubt related to Microsoft SQL Server. If a bigint column has a value such as 0E-9, does it mean that this cell can contain a value with 9 decimal digits, or what?
BIGINT: -9,223,372,036,854,775,808 TO 9,223,372,036,854,775,807
INT: -2,147,483,648 TO 2,147,483,647
SMALLINT: -32,768 TO 32,767
TINYINT: 0 TO 255
These are for storing non-decimal values. You need to use DECIMAL or NUMERIC to store values such as 112.455. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1.
0E-9 isn't a NUMERIC or INTEGER value. It's a VARCHAR, unless you mean something else, like scientific notation.
https://msdn.microsoft.com/en-us/library/ms187746.aspx
https://msdn.microsoft.com/en-us/library/ms187745.aspx
No, the value would be stored as an integer. Any decimal amounts will be dropped off and only values to the left of the decimal will be kept (no rounding).
More specifically, bigint stores whole numbers between -2^63 and 2^63-1.
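A small Python sketch of that behaviour follows. `to_bigint` is a hypothetical helper, not a SQL Server function, and reading 0E-9 as scientific notation for zero with nine decimal places is an assumption about where the value came from.

```python
from decimal import Decimal

BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

def to_bigint(value: str) -> int:
    """Hypothetical helper: drop the fractional part, then range-check like bigint."""
    whole = int(Decimal(value))   # int() truncates toward zero, no rounding
    if not BIGINT_MIN <= whole <= BIGINT_MAX:
        raise OverflowError(f'{value} does not fit in bigint')
    return whole

print(to_bigint('112.455'))   # 112
print(to_bigint('0E-9'))      # 0
print(BIGINT_MAX)             # 9223372036854775807
```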

Positive Integers and Unsigned in SQL Server 2012

I have read that SQL Server has the ability to create an unsigned integer column and I also read that SQL Server does not allow creation of integer unsigned column. So I'm confused as to which is actually correct.
I need to create a new column in my table called QuantityonHand. This column should be an integer of 5 characters and only accept positive numbers.
So do I create the column as:
(1) QuantityonHand [unsigned] int (5) which means the number can only be positive OR
(2) QuantityonHand int (5) default 0 - which means the number cannot be less than zero, the default 0 being the condition in the column.
I am leaning towards the second one, but I was hoping to get some guidance before I add the column and mess up my table.
Thanks everyone
Josie
There are no unsigned data types, but you can use a check constraint to allow only a certain range of values. You can find more information, for example, here.
First of all, there is no UNSIGNED version of INT; see: UNSIGNED INTEGER Data Type
I need to create a new column in my table called QuantityonHand. This
column should be an integer of 5 characters and only accept positive
numbers.
Use a standard INT and add a CHECK constraint.
QuantityonHand INT CHECK (QuantityonHand >= 0 AND QuantityonHand <= 99999)
LiveDemo
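To see the constraint reject out-of-range values without touching a real table, the sketch below runs the same column definition in an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, and the table name `stock` and the helper `try_insert` are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the table name "stock" is hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE stock ('
    ' QuantityonHand INT CHECK (QuantityonHand >= 0 AND QuantityonHand <= 99999))'
)

def try_insert(qty: int) -> bool:
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        conn.execute('INSERT INTO stock (QuantityonHand) VALUES (?)', (qty,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(500))      # True
print(try_insert(-1))       # False
print(try_insert(100000))   # False
```

A default of 0, as in option (2) of the question, only supplies a value when none is given; it does not stop a negative number from being inserted, which is why the constraint is needed.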

Why does the SQL binary convert function return a non-0101... value?

Why, when I use this command in SQL Server 2005,
select convert(varbinary(16),N'123')
is the result 0x310032003300 and not 1111011?
Basically, each character of '123' gets converted to its UCS-2 value (for these characters, the ASCII value padded to two bytes), giving the three double-byte values 0x3100, 0x3200 and 0x3300, which are concatenated together into a varbinary.
Hopefully that answers why you see this behavior. If you convert to an int first you may see what you were perhaps hoping for instead:
select convert(varbinary(16),cast(N'123' as int))
produces hex value 0x0000007B which in binary is 1111011
See http://www.asciitable.com/: for the entry for the digit 3, the hex representation is 0x33, which corresponds to the same entry in Unicode: http://www.ssec.wisc.edu/~tomw/java/unicode.html (this pattern holds for the 10 digits and the other ASCII characters, which keep the same code points in Unicode, but not for characters outside ASCII).
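Both byte sequences can be reproduced in Python as a quick check. In this sketch the utf-16-le codec stands in for UCS-2 (the two agree for these characters), and the 4-byte big-endian layout is chosen to match the 0x0000007B display above.

```python
text_bytes = '123'.encode('utf-16-le')   # two bytes per character, low byte first
int_bytes = (123).to_bytes(4, 'big')     # a 4-byte int, most significant byte first

print(text_bytes.hex())   # 310032003300
print(int_bytes.hex())    # 0000007b
print(bin(123))           # 0b1111011
```

The first line is the text '123' as three characters; the other two are the number 123, which is what the question was expecting to see.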

Resources