Some questions about HierarchyId (SQL Server 2008) - sql-server

I am a newbie in SQL Server 2008 and have just been introduced to HierarchyId.
I am learning from the article "SQL Server 2008 - HIERARCHYID - PART I". I am following it line by line, and while practicing in SSMS I found that for every ChildId a hexadecimal value is generated, such as 0x, 0x58, 0x5AC0, etc.
My questions are
What are these hexadecimal values?
Why are these generated and what is their use? I mean, where can I use those hex values?
Do we have any control over those hex values? I mean, can we update them, etc.?
How can I determine the hierarchy by looking at those hex values? I mean, how can I determine which is the parent and which is the child?

Those hex values are simply the binary representation of each node's position in the hierarchy. In general, you should not use them directly.
You may want to check out the following example, which I think should be self-explanatory. I hope it will get you going in the right direction.
Create a table with a hierarchyid field:
CREATE TABLE groups (
group_name nvarchar(100) NOT NULL,
group_hierarchy hierarchyid NOT NULL
);
Insert some values:
INSERT INTO groups (group_name, group_hierarchy)
VALUES
('root', hierarchyid::Parse('/')),
('domain-a', hierarchyid::Parse('/1/')),
('domain-b', hierarchyid::Parse('/2/')),
('sub-a-1', hierarchyid::Parse('/1/1/')),
('sub-a-2', hierarchyid::Parse('/1/2/'));
Query the table:
SELECT
group_name,
group_hierarchy.ToString()
FROM
groups
WHERE
(group_hierarchy.IsDescendantOf(hierarchyid::Parse('/1/')) = 1);
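To see why that query returns domain-a together with both of its sub-groups, it can help to model the check on the string form of the paths. This is only a sketch in Python, not T-SQL: is_descendant_of is a hypothetical helper (not a SQL Server API), and it assumes canonical paths such as '/1/2/' that always end in a slash. Like IsDescendantOf, it treats a node as a descendant of itself.

```python
# Hypothetical model of hierarchyid.IsDescendantOf on canonical string paths.
# Assumption: every path starts and ends with "/", as ToString() produces.

def is_descendant_of(child: str, ancestor: str) -> bool:
    """True if `child` sits at or below `ancestor` in the tree."""
    return child.startswith(ancestor)

# Same rows as the INSERT in the answer.
groups = {
    "root": "/",
    "domain-a": "/1/",
    "domain-b": "/2/",
    "sub-a-1": "/1/1/",
    "sub-a-2": "/1/2/",
}

matches = [name for name, path in groups.items() if is_descendant_of(path, "/1/")]
print(matches)
```

Because every path ends in a slash, '/10/' is not mistaken for a descendant of '/1/'.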

Adam Milazzo wrote a great article about the innards of hierarchyid here:
http://www.adammil.net/blog/view.php?id=100
In a nutshell, it's not meaningful to work with these values in straight hex; convert them to binary instead. The reason is that the nodes are not cut up on even byte boundaries. Representing a single node can take as little as 5 bits if it's one of the first four nodes. The encoding becomes longer and longer as more nodes are used: 6 bits each for the next 4 nodes, 7 bits each for the next 8 nodes, and then it jumps to 12 bits each for the next 64 nodes, and then up to 18 bits each for the next 1024.
I needed to convert a database to Postgres, and wrote a script which parses these hex values. You can check out a version I made for AdventureWorks here, search for "hierarchyid":
https://github.com/lorint/AdventureWorks-for-Postgres/blob/master/install.sql
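As a rough illustration of the bit layout summarised above, here is a Python sketch that decodes only the shortest case: 5-bit nodes for ordinals 0 to 3. It assumes, based on my reading of the linked article, that such a node is stored as the prefix 01, two value bits, and a terminating 1 bit, with zero padding at the end. decode_small_hierarchyid is a hypothetical name, and anything outside that first range is deliberately rejected rather than guessed at.

```python
# Sketch only: handles the 5-bit encoding (ordinals 0-3) and nothing else.
# Assumed layout per node: "01" prefix + 2 value bits + "1" terminator.

def decode_small_hierarchyid(raw: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in raw)
    pos = 0
    labels = []
    # Trailing zero bits are padding up to the byte boundary.
    while "1" in bits[pos:]:
        if bits[pos:pos + 2] != "01" or len(bits) < pos + 5:
            raise NotImplementedError("only 5-bit nodes (ordinals 0-3) are handled")
        value = int(bits[pos + 2:pos + 4], 2)
        if bits[pos + 4] != "1":
            raise NotImplementedError("dotted labels such as /1.1/ are not handled")
        labels.append(str(value))
        pos += 5
    return "/" + "".join(label + "/" for label in labels)

# The three values mentioned in the question.
for hex_value in ("", "58", "5AC0"):
    print("0x" + hex_value, "->", decode_small_hierarchyid(bytes.fromhex(hex_value)))
```

For real work, ToString() on the column remains the supported way to get the path.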

I'll let others address your specific questions, but I will tell you, that, IMO, the HierarchyId in SQL Server 2008 isn't one of Microsoft's greatest contributions to SQL Server. They are complex and somewhat awkward. I think you will find that for many hierarchical needs, common table expressions (CTE) work great.
Randy

Related

How is the format of the geography data type built in SQL Server?

I'm not able to understand how the geography data type in SQL Server is laid out...
For example I have the following data:
0xE6100000010CCEAACFD556484340B2F336363BCA21C0
what I know:
0x is prefix for hexadecimal
last 16 numbers are longitude: B2F336363BCA21C0 (double of decimal format)
16 numbers before the last 16 are latitude: CEAACFD556484340 (double of decimal format)
4 first numbers are SRID: E610 (hexadecimal for WGS84)
what I don't understand:
numbers from 5 to 12 : 0000010C
what is this?
From what I read, this seems linked to WKB (Well-Known Binary) or EWKB (Extended Well-Known Binary); however, I was not able to find a definition for EWKB...
And for WKB this is supposed to be the geometry type (a 4-byte integer), but the value doesn't match the geometry type codes (this example is a single point coordinate).
Can you help me understand this format?
The spatial types (geometry and geography) in SQL Server are implemented as CLR data types. As with any such data type, you get a binary representation when you query the value directly. Unfortunately, it's not (as far as I know) WKB but rather whatever format Microsoft decided was best for their implementation. We, as users, should work with the interface of methods that MS has published (for instance, the geography method reference). Which is to say that you should only try to decipher the MS binary representation if you're curious, and not for actually working with it.
That said, if you need/want to work with WKB, you can! For example, you can use the STGeomFromWKB() static method to create a geography instance from WKB that you provide and STAsBinary() can be called on a geography instance to return WKB to you.
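For the curious, the sample value from the question can be pulled apart with a few lines of Python. This is a sketch under assumptions rather than a reference decoder: it assumes the layout for a single point is a 4-byte little-endian SRID, a 1-byte version, a 1-byte properties field, then latitude and longitude as little-endian doubles, and that bits 0x04 and 0x08 of the properties byte mean "valid" and "single point". decode_point and the dictionary keys are names made up for this example.

```python
import struct

# The value from the question, without the 0x prefix.
RAW = bytes.fromhex("E6100000010CCEAACFD556484340B2F336363BCA21C0")

def decode_point(raw: bytes) -> dict:
    """Decode a single-point geography value (assumed layout, see lead-in)."""
    # "<" = little-endian, no padding: int32, uint8, uint8, double, double.
    srid, version, props, lat, lon = struct.unpack("<iBBdd", raw)
    return {
        "srid": srid,
        "version": version,
        "properties": props,
        "is_valid": bool(props & 0x04),         # assumed flag meaning
        "is_single_point": bool(props & 0x08),  # assumed flag meaning
        "latitude": lat,
        "longitude": lon,
    }

point = decode_point(RAW)
print(point)
```

Under this reading, the SRID occupies the first four bytes (E6100000, i.e. 4326 for WGS 84), and the puzzling 010C is a version byte followed by a flags byte rather than a WKB geometry type code.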
The Format spec can be found here:
https://msdn.microsoft.com/en-us/library/ee320529(v=sql.105).aspx
As that page shows, it used to change very frequently, but has slowed down significantly over the past 2 years.
I currently need to dig into the spec to serialize from JVM code into a bcp file so that I can use SQLServerBulkCopy rather than plain JDBC to upload data into tables (it is about 7x faster to write a bcp file than to use JDBC), but this is proving to be more complicated than I originally anticipated.
After testing with bcp: you can upload geographies by specifying an off-row format (varchar(max)) and storing the well-known text. SQL Server will see this and assume you wanted a geography based on the WKT it sees.
In my case converting to nvarchar resolved the issue.

SQL Server varbinary trailing 0 interpretation

CREATE TABLE [sql_table1] ([c0] varbinary(25) NOT NULL primary key)
go
insert into sql_table1 values (0x3200),(0x32);
go
I get
Cannot insert duplicate key in object 'dbo.sql_table'. The duplicate
key value is (0x32).
Why? 0x32 does not equal 0x3200
It gets right padded. BINARY data gets tricky when you try to specify what should normally be equivalent numerically hex values. If you try this it will work:
insert into sql_table1 values (0x32),(CAST(50 as VARBINARY(25)));
-- inserts 0x32
-- and 0x00000032 (the int 50 cast to four bytes)
But these are numerically equivalent. Generally speaking, it's a bad idea to have a BINARY column of any sort be a primary key or try to put a unique index on it (moreso than a CHAR/VARCHAR/NVARCHAR column) - any application that inserts into it is going to almost certainly be CASTing from some native format/representation to binary, but there's no guarantee that that CAST actually works in a unique manner in either direction - did the application insert the value 50 (= 0x32), or did it try to insert the literal 0x32, or did it try to insert the ASCII value of 2 (= 0x32), or did it insert the first byte(s) of something else....? If one app submits 0x32 and another 0x0032 are they the same or different (SQL Server says different - see below)?
The bottom line is that SQL Server is going to behave unusually if you try to compare binary columns flat out without some context. What will work is comparing binary data using BINARY_CHECKSUM
SELECT BINARY_CHECKSUM(0x32) -- 50
SELECT BINARY_CHECKSUM(0x320) -- 16! -- it's treating this as having a different number or ordering of bytes
SELECT BINARY_CHECKSUM(0x3200) -- 50
SELECT BINARY_CHECKSUM(0x32000) -- 16
SELECT BINARY_CHECKSUM(0x0032) -- 50
SELECT BINARY_CHECKSUM(0x00032) -- 50
SELECT BINARY_CHECKSUM(0x000000000032) -- 50
but again, this only helps you see that the hexadecimal representation of the binary data isn't going to work exactly the way it would seem. The point is, your primary key is going to be based on the BINARY_CHECKSUMs of the data instead of any particular format/representation of the data. Normally that's a good thing, but with binary data (and padding) it becomes a lot trickier. Even then, in my example above the BINARY_CHECKSUM of both columns will be exactly the same (SELECT BINARY_CHECKSUM(c0) FROM sql_table1 will output 50 for both rows). Weird: a little further testing shows that any different number of leading 0s that fit into the column length will bypass the unique check even though the checksum is the same (e.g. VALUES (0x32), (0x032), (0x0032) etc.).
This only gets worse if you start throwing different versions of SQL Server into the mix (per MSDN documentation).
What you should do for PK/Unique design on a table is figure out what context will make sense of this data - an order number, a file reference, a timestamp, a device ID, some other business or logical identifier, etc.... If nothing else, pseudokey it with an IDENTITY column.
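The right-padding behaviour described at the top of this answer can be modelled in a few lines of Python. This is only a model of the rule as stated there (the shorter value is padded on the right with zero bytes before comparing), not SQL Server's actual comparison code, and sql_binary_equal is a hypothetical name.

```python
# Model of "it gets right padded": pad the shorter value with 0x00 bytes
# on the right, then compare byte for byte.

def sql_binary_equal(a: bytes, b: bytes) -> bool:
    width = max(len(a), len(b))
    return a.ljust(width, b"\x00") == b.ljust(width, b"\x00")

# The two values from the question collide under this rule ...
print(sql_binary_equal(bytes.fromhex("32"), bytes.fromhex("3200")))
# ... while a leading zero byte produces a different key.
print(sql_binary_equal(bytes.fromhex("32"), bytes.fromhex("0032")))
```

Under this model, trailing zero bytes never distinguish two keys, which is consistent with the duplicate key error for 0x32 and 0x3200, while leading zero bytes always do.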

Access linked tables truncating my Decimal values from the SQL server

Since migrating the Access data to a SQL server I have been having multiple problems with decimal values. In my SQL tables on the SQL 2012 server I am using the Decimal data type for multiple fields. A while ago I first tried to set the decimal values to 18,2, but Access acted weird on this by truncating all the values (55,55 became 50 and so on).
So after multiple changes it seemed that Access accepted the 30,2 decimal setting in the SQL server (now the values were linked correctly in the linked Access tables).
A few days ago, however, I stumbled back onto this problem because a user had problems editing a number in the Access form. So I checked the linked table data type, and there it seemed that Access converts the decimal 30,2 value to a Short Text data type, which is obviously wrong. So I did a bit of research and found out that Access cannot handle a 30,2 decimal, thus it is converted to text by the ODBC driver. (See my previous post: Access 2013 form field value gets cut off on changing the number before the point)
So to fix this latter error I tried, once again (forgetting that I had already messed around with it), to change the decimal value to 17,2 / 18,2 and some other decimal values, but with all these changes I get back to the truncating problem...
I found some posts about it but nothing concrete or answers on how to solve it.
Some additional information:
Using a SQL 2012 server
Using Access 2013
Got a SQL Server Native Client 10 and 11 installed.
Looking in the register key I found out that I am using ODBC driver version 02.50
The SQL native client 11 has/uses DriverODBC ver 03.80 and the native client 10 uses DriverODBC ver 10.00 (not sure this is relevant though).
UPDATE WITH IMAGES
In a access form I have multiple lines that have a linked table (sql table) as record source. These lines get populated with the data in the SQL server.
Below you can see a line with a specific example, the eenh. prijs is loaded from the linked (SQL) table.
Now when I change the 5 in front of the point (so making it 2555,00 instead of 5555,00) the value gets cut off:
So I did research on it and understand that my SQL decimal 30,2 isn't accepted by Access. So I looked in my access linked table to see what kind of data type the field is:
So the specific column (CorStukPrijs) is in the SQL server a decimal 30,2 but here a short text (sorry for the dutch words).
The other numerics (which are OK) are just normal integers by the way.
In my linked table on access - datasheet view the values look like this:
I also added a decimal value of how it looks in my linked table:
In my SQL server the (same) data looks like this:
Though, because of the changing number problem before the point (back in the form - first images) I changed the decimal type of 30,2 in the server to 18,2.
This is the result in the linked table on that same 5555 value:
It gives #Errors and the error message:
Scaling of decimal values has resulted in truncated values
(translated it so wont be probably exactly like that in English)
The previous 0,71 value results with the decimal 18,2 in:
Hope its a bit clearer now!
P.S. I just changed one decimal field to 18,2 now.
Recently I found a solution for this problem! It all had to do with language settings after all.. (and the decimal 30,2 which is not accepted as a decimal in Access 2013).
I changed the Native client from 10 to 11 and in my connection string I added one vital value: regional=no. This fixed the problem!
So now my connection string is:
szSQLConnectionString = "DRIVER=SQL Server Native Client 11.0;SERVER=" & szSQLServer & ";DATABASE=" & szSQLDatabase & ";UID=" & szSQLUsername & ";PWD=" & szSQLPassword & ";regional=no;Application Name=OPS-FE;MARS_Connection=yes;"
A few things:
There is no real good reason to use a decimal with 30 digits of precision.
Access only supports 28 digits for a packed decimal column, so going to 30 will force Access to see that value as a string.
If you keep the total number of digits at 28 or below, then you should be ok.
You also left out what driver you are using. (legacy, or native 10 or native 11). However, all 3 should have no trouble with decimal.
As a few noted here, after ANY change to the sql table, you have to refresh the linked table else such changes will not show up.
There is NO need to have some re-link code every time on startup. And it is not clear how your re-link code works. If the re-link code makes a copy of the tabledef object, and then re-instates the same tabledef, then changes to the back end may well not show up.
I would suggest during testing, you DO NOT use your re-link routines, but simply right click on the given linked table and choose the linked table manager. Then click on the one table, and ok to refresh.
Also, in Access during this testing, dump (remove) any formatting you have in the table settings for testing (the format setting).
I suggest you start over, and take the original tables and re-up-size them again.
Access should and can handle the decimal types with ease, but it is not clear what your original settings were. If the values never require more than 4 significant digits beyond the decimal point, then I would consider using currency, but decimal should also work.

Natural Sort in NHibernate

I've got a hibernate query that returns a list of objects and I want to order by a title. This is a user maintained field and some of our customers like prefixing their titles with numbers, this isn't something I can control. The data is something like this:
- 1 first thing
- 2 second thing
- 5 fifth thing
- 10 tenth thing
- 20 twentieth thing
- A thing with no number
A traditional
.AddOrder(Order.Asc("Name"))
Is resulting in a textual sort:
- 1 first thing
- 10 tenth thing
- 2 second thing
- 20 twentieth thing
- 5 fifth thing
- A thing with no number
This is correct, as this is an nvarchar field, but is there any way I can get the numbers to sort numerically as well?
There seem to be several workarounds involving prefixing all fields with leading 0's and such like but do any of these work through NHibernate?
This application runs interchangeably on Oracle and MsSQL.
Cheers,
Matt
You might consider creating a custom sort order implementation that allows you to specify the collation on the column to achieve your desired results. Something along the lines of:
ORDER BY Name COLLATE Latin1_General_BIN ASC
But with an appropriate collation (I don't believe Latin1_General_BIN is what you need specifically)
Add an additional field (e.g. named Name_Number) to the object definition and populate it in the query on the server side.
For Oracle, select to_number(Name) as Name_Number into this field.
For MS SQL, select cast(Name as numeric) as Name_Number into this field.
Then sort on the client side with .AddOrder(Order.Asc("Name_Number"))
P.S. I am not sure about this, because I don't have enough experience with Hibernate/NHibernate and hate ORMs in general :)
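If sorting the already-loaded list in application code is acceptable, the ordering the question asks for can be expressed as a sort key. The sketch below is Python rather than C#/NHibernate, and natural_key is a hypothetical helper: it assumes titles either start with a whole number or have no number at all, puts numbered titles first in numeric order, and falls back to plain text order for the rest.

```python
import re

def natural_key(title: str):
    """Sort key: numbered titles by their leading number, others as text."""
    match = re.match(r"\s*(\d+)(.*)", title, re.DOTALL)
    if match:
        # Group 0 = numbered titles; compare the number, then the remainder.
        return (0, int(match.group(1)), match.group(2))
    # Group 1 = titles without a leading number, sorted textually.
    return (1, 0, title)

titles = [
    "1 first thing",
    "10 tenth thing",
    "2 second thing",
    "20 twentieth thing",
    "5 fifth thing",
    "A thing with no number",
]

ordered = sorted(titles, key=natural_key)
for title in ordered:
    print(title)
```

The same idea carries over to a comparer applied after the query returns, at the cost of not being able to page on the database side.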

Ordering numbers that are stored as strings in the database

I have a bunch of records in several tables in a database that have a "process number" field, that's basically a number, but I have to store it as a string both because of some legacy data that has stuff like "89a" as a number and some numbering system that requires that process numbers be represented as number/year.
The problem arises when I try to order the processes by number. I get stuff like:
1
10
11
12
And the other problem is when I need to add a new process. The new process' number should be the biggest existing number incremented by one, and for that I would need a way to order the existing records by number.
Any suggestions?
Maybe this will help.
Essentially:
SELECT process_order FROM your_table ORDER BY process_order + 0 ASC
Can you store the numbers as zero padded values? That is, 01, 10, 11, 12?
I would suggest to create a new numeric field used only for ordering and update it from a trigger.
Can you split the data into two fields?
Store the 'process number' as an int and the 'process subtype' as a string.
That way:
• you can easily get the MAX processNumber and increment it when you need to generate a new number
• you can ORDER BY processNumber ASC, processSubtype ASC to get the correct order, even if multiple records have the same base number with different years/letters appended
• when you need the 'full' number you can just concatenate the two fields
Would that do what you need?
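The split can be sketched in Python to show how the two parts give both the ordering and the next number. split_process_number is a hypothetical helper, and it assumes every value starts with digits followed by an optional suffix such as "a" or "/2009"; values that don't fit that shape raise an error rather than being guessed at.

```python
import re

def split_process_number(value: str) -> tuple[int, str]:
    """Split e.g. '89a' into (89, 'a') and '12' into (12, '')."""
    match = re.fullmatch(r"(\d+)(.*)", value.strip())
    if match is None:
        raise ValueError(f"no leading number in {value!r}")
    return int(match.group(1)), match.group(2)

stored = ["10", "1", "89a", "11", "12", "89"]

# Equivalent of ORDER BY processNumber ASC, processSubtype ASC.
ordered = sorted(stored, key=split_process_number)

# Equivalent of MAX(processNumber) + 1 for the next new process.
next_number = max(number for number, _ in map(split_process_number, stored)) + 1

print(ordered)
print(next_number)
```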
Given that your process numbers don't seem to follow any fixed patterns (from your question and comments), can you construct/maintain a process number table that has two fields:
create table process_ordering ( processNumber varchar(N), processOrder int )
Then select all the process numbers from your tables and insert into the process number table. Set the ordering however you want based on the (varying) process number formats. Join on this table, order by processOrder and select all fields from the other table. Index this table on processNumber to make the join fast.
select my_processes.*
from my_processes
inner join process_ordering on my_processes.processNumber = process_ordering.processNumber
order by process_ordering.processOrder
It seems to me that you have two tasks here.
• Convert the strings to numbers by legacy format/strip off the junk
• Order the numbers
If you have a practical way of introducing string-parsing regular expressions into your process (and your issue has enough volume to be worth the effort), then I'd
• Create a reference table such as
CREATE TABLE tblLegacyFormatRegularExpressionMaster(
LegacyFormatId int,
LegacyFormatName varchar(50),
RegularExpression varchar(max)
)
• Then, given a way of invoking the regular expressions, such as the CLR integration in SQL Server 2005 and above (the .NET Common Language Runtime integration that allows compiled .NET methods to be called from within SQL Server as ordinary, Microsoft-extended, T-SQL), you should be able to solve your problem.
• See
http://www.codeproject.com/KB/string/SqlRegEx.aspx
I apologize if this is way too much overhead for your problem at hand.
Suggestion:
• Make your column a fixed width text (i.e. CHAR rather than VARCHAR).
• Pad the existing values with enough leading zeros to fill each column and a trailing space(s) where the values do not end in 'a' (or whatever).
• Add a CHECK constraint (or equivalent) to ensure new values conform to the pattern e.g. something like
CHECK (process_number LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][ab ]')
• In your insert/update stored procedures (or equivalent), pad any incoming values to fit the pattern.
• Remove the leading/trailing zeros/spaces as appropriate when displaying the values to humans.
Another advantage of this approach is that the incoming values '1', '01', '001', etc would all be considered to be the same value and could be covered by a simple unique constraint in the DBMS.
BTW I like the idea of splitting the trailing 'a' (or whatever) into a separate column, however I got the impression the data element in question is an identifier and therefore would not be appropriate to split it.
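The padding step in this suggestion could look roughly like the following Python sketch. normalize is a hypothetical helper, and the width of six digits plus one trailing character ('a', 'b' or a space) is taken from the example CHECK pattern above rather than from any real schema.

```python
import re

DIGITS = 6  # matches the six [0-9] positions in the example CHECK pattern

def normalize(value: str) -> str:
    """Pad '1' to '000001 ' and '89a' to '000089a'."""
    match = re.fullmatch(r"(\d+)([ab]?)", value.strip())
    if match is None:
        raise ValueError(f"unexpected process number {value!r}")
    digits = str(int(match.group(1)))  # drops any leading zeros first
    if len(digits) > DIGITS:
        raise ValueError(f"{value!r} does not fit in {DIGITS} digits")
    suffix = match.group(2) or " "  # trailing space when there is no letter
    return digits.zfill(DIGITS) + suffix

padded = sorted(normalize(v) for v in ["10", "2", "1", "89a", "89"])
print(padded)
```

Once stored this way, plain text ordering matches numeric ordering, and '1', '01' and '001' all normalize to the same value, so a simple unique constraint catches them.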
You need to cast your field as you're selecting. I'm basing this syntax on MySQL - but the idea's the same:
select * from table order by cast(field AS UNSIGNED);
Of course UNSIGNED could be SIGNED if required.
