SQL is removing leading zeros - sql-server

Problem: I have large tables from the data warehouse with Course Numbers stored as nvarchar(20).
For the last 50 years, these course numbers have all been 8-digits, including many leading zeros for lower numbers.
But now, our cloud provider has decided to operate with 10-digit Course Numbers.
A numerically equivalent value with leading zeroes is NOT recognized by the cloud provider as being equivalent. They are using strict string logic. But SQL Server 2008 seems to be trying to do me "favors" by treating strings of digits as if they were numbers.
If I query a table with no WHERE clauses, I get only the 8-digit (8-character) records.
The records with 10-digit (10-character) CourseNbr have disappeared.
But if I explicitly ask for records WHERE (LEN([CourseNbr]) = 10), it returns the almost 200 missing records. This just seems intolerable. How can I do business this way? We are migrating to a new SQL Server 2017 server. Maybe this anomaly will disappear?
@@VERSION = Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)
EDIT: I changed the storage type in my local copy of the table [CourseNbr] from nvarchar(20) to varchar(10) and now the 10-digit string/numbers appear! And I can search for 10-character/digit records and they appear OK. Have I answered my own question? Why should that make a difference?

You can try this.
Return varchar without leading zeros (bigint rather than int, because a 10-digit value can exceed the int maximum of 2,147,483,647):
SELECT CONVERT(varchar(20), CONVERT(bigint, [CourseNbr])) AS CourseNbrWithNoLeadingZeros
FROM CourseNbrTable
Return a number without leading zeros:
SELECT CONVERT(bigint, [CourseNbr]) AS CourseNbrWithNoLeadingZeros
FROM CourseNbrTable
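To see why the distinction matters for the question, here is a minimal Python sketch, run outside SQL Server and using made-up course numbers. It contrasts strict string comparison (what the cloud provider is described as doing) with the numeric comparison that the conversions above produce. The values and variable names are hypothetical.

```python
# Hypothetical course numbers: the same numeric value in 8- and 10-character form.
old_style = "00012345"
new_style = "0000012345"

# Strict string logic: the two keys differ because leading zeros are characters.
same_as_strings = old_style == new_style

# Numeric logic: converting to an integer discards the leading zeros.
same_as_numbers = int(old_style) == int(new_style)

# Converting back to text, as the nested CONVERT above would, loses the padding.
stripped = str(int(new_style))

print(same_as_strings, same_as_numbers, stripped)
```

So the conversion is only useful if you want the numeric value. If the padded text is the key, keep comparing it as text.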

Related

Cannot store particular Unicode code points / characters in NVARCHAR fields

I'm doing some tests with SQL Server 2017.
I'm trying to store arbitrary Unicode code points in an NVARCHAR column.
I've tried different collations.
I have no problem with common characters in the BMP plane of Unicode.
For more exotic symbols, for example if I try to store the "𝌹" character (U+1D339), the following happens:
If I do it within Management Studio, I only see the infamous square symbol. But Management Studio has the proper font since I can paste it in the query editor.
If I send the text from Visual Studio, the value I see in Management Studio is "??", that's what I retrieve from Visual Studio, too, after performing a query.
My understanding is, for non-supplementary character collations, characters outside the UCS-2 subset shouldn't be interpreted correctly because NCHAR fields are limited to 2 bytes.
But, I tried with Latin1_General_100_CS_AS_KS_WS_SC, both at the DB level and column level, and it doesn't seem to work either.
Any ideas?
Thanks
I can't reproduce any data loss or encoding issue. I can reproduce a square that becomes 𝌹 when copied. It's probably caused by the font used to display results in the SSMS grid or the Visual Studio debugger windows.
SQL Server and Windows have used UTF-16 for some time now, not UCS-2. Few fonts cover the full UTF-16 range, though.
When I tried this in SSMS:
create table #tc(name nvarchar(20));
insert into #tc values (N'𝌹');
select name,len(name),DATALENGTH(name) from #tc;
I saw a square, 2 and 4 in the grid. This means the character was stored properly and took 4 bytes. When I tried to copy those results to SO, though, I saw:
name (No column name) (No column name)
𝌹 2 4
When I used Result to Text I got the actual character:
name
-------------------- ----------- -----------
𝌹 2 4
The correct character is there, but the SSMS grid's font can't display it.
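The 2 and 4 can be cross-checked outside SQL Server. This is a small Python sketch, assuming the symbol is U+1D339 and that NVARCHAR data is little-endian UTF-16. It counts the UTF-16 code units and bytes for that one character.

```python
# U+1D339 written as an escape so the example does not depend on the editor's font.
ch = "\U0001D339"

utf16 = ch.encode("utf-16-le")   # little-endian UTF-16, no byte-order mark

byte_count = len(utf16)          # total bytes needed to store the character
code_units = byte_count // 2     # each UTF-16 code unit is 2 bytes

print(code_units, byte_count)
```

Two code units and four bytes would match the len() and DATALENGTH() values reported in the grid.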
Update
As Dan Guzman noted, the font can be changed from Tools-->Options-->Environment-->Fonts and Colors-->Show settings for:-->Grid Results. The default font is Microsoft Sans Serif, a small font (855KB) used as the default font on Windows. It contains "only" 3000 glyphs. Chinese characters aren't included, which is why squares are displayed.
Chinese computers use SimSun as the default though, whose file is 17.1MB. They wouldn't have any problem displaying Chinese characters.
I'm trying to store arbitrary unicode points in an nvarchar column. I've tried different collations. I have no problem with common characters in the BMP plane of Unicode.
Collations have nothing to do with what code points you can store in an NVARCHAR / NCHAR / NTEXT (deprecated) column, variable, or literal. Those datatypes can store all 1,114,112 Unicode code points (even though most haven't been mapped to a character yet).
if I try to store 𝌹 character(U+1D339), ... within Management Studio, i only see the infamous square symbol. But management studio has the proper font since i can paste it in the query editor.
As others have explained already: this is merely a font issue. Fonts can hold a max of 65k characters, so you might need multiple fonts to cover all of the characters you are trying to use. I prefer Code2003 which you can find on FontSpace.com.
If i send the text from Visual Studio, the value i see in management studio is '??'
This should be due to forgetting to prefix the string literal with an upper-case "N" ;-).
SELECT '𝌹' AS [Oops], N'𝌹' AS [No Oops];
-- ?? 𝌹
My understanding is, for non supplementary character collations, characters outside the UCS-2 subset shouldn't be interpreted correctly because nchar fields are limited to 2 bytes.
The Supplementary Character-Aware (SCA) collations (those ending with _SC or with _140_ in their names) do support supplementary characters. BUT, "support" only means that the built-in functions handle the surrogate pair as a single, supplementary code point instead of a pair of surrogate code points. But, support for sorting and comparison of supplementary characters actually started in SQL Server 2005 with the introduction of the version 90 collations.
All code units in UCS-2 and UTF-16 are 16 bits / 2 bytes. Supplementary characters are merely two of those 2-byte code units. Hence, being able to store supplementary characters should have been available back in SQL Server 7.0 when NVARCHAR was introduced. Even though no supplementary characters were defined until years later (after SQL Server 2000 was released), the NVARCHAR types were still capable of storing and retrieving them. I don't have SQL Server 7.0 to test with, but I have confirmed this on SQL Server 2000.
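As an illustration of "two of those 2-byte code units", here is a short Python sketch of the standard UTF-16 surrogate arithmetic. It is applied to U+1D339, which I am assuming is the symbol from the question. The function name to_surrogates is made up for this example.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogates(0x1D339)
print(hex(high), hex(low))
```

Each surrogate is an ordinary 16-bit value, so a column that stores 16-bit code units can hold the pair without knowing what it means.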
For more info, please see:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide
Collations Info

RODBC ERROR: 'Calloc' could not allocate memory

I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4000 characters). Initially, I have imported some data from a csv into the SQL Azure database using Microsoft SQL Server Management Studio. I set up the text columns as ntext format, because when I tried using nvarchar the max was 4000 and some of the values got truncated even though they were about 1100 characters long.
In order to append to the database I am first saving the records in a temp table where I have predefined the varTypes:
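# Declare the first column as numeric and every other column as NTEXT
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
# Write the data frame to a temporary table (safer = FALSE lets sqlSave replace an existing table)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = FALSE, varTypes = varTypesNewFile, safer = FALSE)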
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k records. Isn't that too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).

SQL Server 2014 filter by length and value

I'm using SQL Server 2014 Management Studio.
I need a way to filter out values in my table column of any length, for example I want to filter out all the zeros of any length.
A select statement that can return '0' or '00' or '000' and so on.
SELECT *
FROM <my_table>
WHERE <my_column> LIKE <condition>
Returns
0
00
000
etc
For the specific case of filtering by one repeated character:
WHERE <my_column> LIKE '%0%' AND <my_column> NOT LIKE '%[^0]%'
That is, the column must contain at least one zero and must not contain non-zeroes anywhere.
For more complicated patterns, LIKE tends to break down quickly.
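For comparison, the same rule ("at least one zero, and nothing that is not a zero") can be checked in application code. This is a minimal Python sketch with made-up sample values. The helper name only_zeros is hypothetical.

```python
import re


def only_zeros(value):
    """True when value has at least one character and every character is '0'."""
    return re.fullmatch(r"0+", value) is not None


samples = ["0", "00", "000", "", "010", "0a0", "100", " 0"]
matches = [s for s in samples if only_zeros(s)]
print(matches)
```

The regular expression 0+ states both LIKE conditions at once, which is part of why more complicated patterns are usually easier outside LIKE.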

MS SQL server - convert HEX string to integer

This answer to what looks like the same question:
Convert integer to hex and hex to integer
..does not work for me.
I am not able to go from a HEX string to an integer using MS SQL server 2005 CAST or CONVERT. Am I missing something trivial? I have searched extensively, and the best I can find are long-winded user functions to go from a hex string value to something that looks like a decimal int. Surely there is a simple way to do this directly in a query using built in functions rather than writing a user function?
Thanks
Edit to include examples:
select CONVERT(INT, 0x89)
works as expected, but
select CONVERT(INT, '0x' + substring(msg, 66, 2)) from sometable
gets me:
"Conversion failed when converting the varchar value '0x89' to data type int."
An extra explicit CAST:
select CONVERT(INT, CAST('0x89' AS VARBINARY))
executes, but returns 813185081.
Substituting 'Int', 'Decimal', etc for 'Varbinary' results in an error. In general, strings that appear to be numeric are interpreted as numeric if required, but not in this case, and there does not appear to be a CAST that recognizes HEX. I would like to think there is something simple and obvious and I've just missed it.
Microsoft SQL Server Management Studio Express 9.00.3042.00
Microsoft SQL Server 2005 - 9.00.3080.00 (Intel X86) Sep 6 2009 01:43:32 Copyright (c) 1988-2005 Microsoft Corporation Express Edition with Advanced Services on Windows NT 5.1 (Build 2600: Service Pack 3)
To sum up: I want to take a hex string which is a value in a table, and display it as part of a query result as a decimal integer, using only system defined functions, not a UDF.
Thanks for giving some more explicit examples. As far as I can tell from the documentation and Googling, this is not possible in MSSQL 2005 without a UDF or other procedural code. In MSSQL 2008 the CONVERT() function's style parameter now supports binary data, so you can do it directly like this:
select convert(int, convert(varbinary, '0x89', 1))
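The 813185081 from the question looks like the bytes of the characters '0', 'x', '8', '9' read as one integer, rather than the hex digits being parsed. Here is a small Python sketch of the two interpretations. It illustrates the arithmetic only and makes no claim about how SQL Server implements the cast.

```python
hex_text = "0x89"

# What the question wants: interpret the text as hexadecimal digits.
# Python's int() accepts the 0x prefix when the base is 16.
as_number = int(hex_text, 16)

# What a plain cast to binary appears to do: keep the characters' own bytes
# (0x30, 0x78, 0x38, 0x39) and read those four bytes as one big-endian integer.
as_char_bytes = int.from_bytes(hex_text.encode("ascii"), "big")

print(as_number, as_char_bytes)
```

The first value is the 137 you are after. The second reproduces the unexpected number from the question.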
In previous versions, your choices are:
Use a UDF (TSQL or CLR; CLR might actually be easier for this)
Wrap the SELECT in a stored procedure (but you'll probably still have the equivalent of a UDF in it anyway)
Convert it in the application front end
Upgrade to MSSQL 2008
If converting the data is only for display purposes, the application might be the easiest solution: data formatting usually belongs there anyway. If you must do it in a query, then a UDF is easiest but the performance may not be great (I know you said you preferred not to use a UDF but it's not clear why). I'm guessing that upgrading to MSSQL 2008 just for this probably isn't realistic.
Finally, FYI the version number you included is the version of Management Studio, not the version number of your server. To get that, query the server itself with select @@version or select serverproperty('ProductVersion').

Some questions about HierarchyId (SQL Server 2008)

I am a newbie in SQL Server 2008 and just got introduced to HierarchyId's.
I am learning from SQL Server 2008 - HIERARCHYID - PART I. So basically I am following the article line by line and while practicing in SSMS I found that for every ChildId some hexadecimal values are generated like 0x,0x58,0x5AC0 etc.
My questions are
What are these hexadecimal values?
Why are these generated and what is their use? I mean where can I use those hexa values?
Do we have any control over those hexa values? I mean can we update etc.
How to determine the hierarchy by looking into those hexa values.. I mean how can I determine which is the parent and which is the child?
Those hex values are simply a binary representation of the node's position in the hierarchy. In general, you should not use them directly.
You may want to check out the following example, which I think should be self-explanatory. I hope it will get you going in the right direction.
Create a table with a hierarchyid field:
CREATE TABLE groups (
group_name nvarchar(100) NOT NULL,
group_hierarchy hierarchyid NOT NULL
);
Insert some values:
INSERT INTO groups (group_name, group_hierarchy)
VALUES
('root', hierarchyid::Parse('/')),
('domain-a', hierarchyid::Parse('/1/')),
('domain-b', hierarchyid::Parse('/2/')),
('sub-a-1', hierarchyid::Parse('/1/1/')),
('sub-a-2', hierarchyid::Parse('/1/2/'));
Query the table:
SELECT
group_name,
group_hierarchy.ToString() AS group_path
FROM
groups
WHERE
(group_hierarchy.IsDescendantOf(hierarchyid::Parse('/1/')) = 1);
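If it helps to picture what that query returns, here is a rough Python sketch that treats the string form of each path as a prefix. This is only a mental model built on the sample rows above, not how SQL Server evaluates IsDescendantOf. The function name is made up.

```python
# The same sample rows as in the INSERT above, using the string form of each path.
groups = [
    ("root", "/"),
    ("domain-a", "/1/"),
    ("domain-b", "/2/"),
    ("sub-a-1", "/1/1/"),
    ("sub-a-2", "/1/2/"),
]


def is_descendant_of(path, ancestor):
    """Prefix test on slash-terminated paths; a node counts as its own descendant."""
    return path.startswith(ancestor)


result = [name for name, path in groups if is_descendant_of(path, "/1/")]
print(result)
```

Note that '/1/' itself is included, because a node is treated as a descendant of itself. The trailing slash is what keeps a path like '/10/' from matching '/1/'.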
Adam Milazzo wrote a great article about the innards of hierarchyid here:
http://www.adammil.net/blog/view.php?id=100
In a nutshell, it's not meaningful to work with these values in straight hex; convert them to binary instead. The reason is that nodes are not cut up on even byte boundaries. Representing a single node can take as few as 5 bits if it's one of the first four nodes. It becomes longer as more nodes are used: 6 bits each for the next 4 nodes, 7 bits each for the next 8 nodes, and then it jumps to 12 bits each for the next 64 nodes, and then up to 18 bits each for the next 1024.
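To make the bit-packing concrete, here is a deliberately limited Python sketch that decodes only the two shortest forms (labels 0 to 3 in 5 bits, labels 4 to 7 in 6 bits). The bit layouts are my reading of the article linked above, checked against the 0x58 and 0x5AC0 values from the question, so treat them as an assumption rather than a specification. The function name decode_hierarchyid is made up, and anything longer raises an error instead of guessing.

```python
def decode_hierarchyid(hex_text):
    """Decode a hierarchyid hex string into its path form, e.g. '0x5AC0' -> '/1/1/'.

    Only labels 0 to 7 without dotted sub-labels are handled; anything else raises.
    """
    digits = hex_text[2:] if hex_text.lower().startswith("0x") else hex_text
    bits = "".join(format(int(d, 16), "04b") for d in digits)
    path = "/"
    pos = 0
    while "1" in bits[pos:]:                    # trailing zero bits are padding
        if bits.startswith("01", pos):          # 5-bit form: 01 + 2 value bits + end flag
            label, width = int(bits[pos + 2:pos + 4], 2), 5
        elif bits.startswith("100", pos):       # 6-bit form: 100 + 2 value bits + end flag
            label, width = 4 + int(bits[pos + 3:pos + 5], 2), 6
        else:
            raise NotImplementedError("longer label encodings are not covered by this sketch")
        if bits[pos + width - 1:pos + width] != "1":
            raise NotImplementedError("dotted labels such as /1.1/ are not covered")
        path += f"{label}/"
        pos += width
    return path


for value in ["0x", "0x58", "0x68", "0x5AC0"]:
    print(value, decode_hierarchyid(value))
```

In practice ToString() on the server does this for you. The sketch is only meant to show why the hex digits do not line up with node boundaries.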
I needed to convert a database to Postgres, and wrote a script which parses these hex values. You can check out a version I made for AdventureWorks here, search for "hierarchyid":
https://github.com/lorint/AdventureWorks-for-Postgres/blob/master/install.sql
I'll let others address your specific questions, but I will tell you that, IMO, HierarchyId in SQL Server 2008 isn't one of Microsoft's greatest contributions to SQL Server. It is complex and somewhat awkward. I think you will find that for many hierarchical needs, common table expressions (CTEs) work great.
Randy

Resources