I have an MSSQL database with IDs stored as hex values.
For example, when viewed in Management studio, a typical id column looks like
id                                 | userName
0x8189CF203DEA4A44B8ADEFF1C8246866 | John
0xAF4845C8A34A48EF8B6D481F2D20D561 | Peter
0x70B1F5E3B3F8417BBB99912640C54520 | Alan
To query the user table, I need to write something like
SELECT * FROM users Where Id = 0x8189CF203DEA4A44B8ADEFF1C8246866
I use a lot of sequelize.query to run a bunch of SQL statements directly.
When such a table is read in Sequelize, the id gets converted into a Buffer. So my question is: how can I keep the hex value? Is there a config option that keeps the string hex value of these ids, or do I have to convert the Buffers to hex strings by hand and attach a 0x in front?
For example, when viewed in Management studio, a typical id column looks like
A typical ID doesn't look like this to me, but I'm quite sure your hex values are actually UNIQUEIDENTIFIERs (i.e. GUIDs) -- see option 2.
Option 1: HEX-string
You might store the hex string as its string representation:
SELECT sys.fn_varbintohexstr(0x8189CF203DEA4A44B8ADEFF1C8246866)
returns "0x8189cf203dea4a44b8adeff1c8246866" (which is a string now)
However, the function meant to do the opposite truncates part of the value:
select sys.fn_cdc_hexstrtobin(N'0x8189CF203DEA4A44B8ADEFF1C8246866')
returns 0x8189CF203DEA4A44B8AD (which is too short!)
OPTION 2: GUID
I would cast these values to GUIDs (provided none of them is wider than 16 bytes) and store them type-safe. Getting a GUID as its string representation (e.g. to write it into XML) and casting it back to a GUID works fully out of the box.
SELECT CAST(0x8189CF203DEA4A44B8ADEFF1C8246866 AS uniqueidentifier);
returns 20CF8981-EA3D-444A-B8AD-EFF1C8246866
SELECT CAST('20CF8981-EA3D-444A-B8AD-EFF1C8246866' AS uniqueidentifier)
returns the same as above, just to show that this string value is cast back to a real GUID
SELECT CAST(CAST('20CF8981-EA3D-444A-B8AD-EFF1C8246866' AS uniqueidentifier) AS varbinary(max))
returns 0x8189CF203DEA4A44B8ADEFF1C8246866
Now you have your original HEX-string back again.
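To get back to the Sequelize part of the question: a Buffer returned for a binary column can be converted to the 0x-prefixed hex string by hand. A minimal sketch (bufferToHexId is a hypothetical helper, and the stand-in Buffer mimics what a raw query would return):

```javascript
// Convert a Node.js Buffer (as returned by Sequelize for binary columns)
// back into the 0x-prefixed uppercase hex string shown in Management Studio.
function bufferToHexId(buf) {
  return '0x' + buf.toString('hex').toUpperCase();
}

// Stand-in for a value coming out of a sequelize.query result row
const id = Buffer.from('8189CF203DEA4A44B8ADEFF1C8246866', 'hex');
console.log(bufferToHexId(id)); // → 0x8189CF203DEA4A44B8ADEFF1C8246866
```

The resulting string can be interpolated into raw SQL as a binary literal, since 0x... literals need no quoting.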
Related
We are using a software that has limited Oracle capabilities. I need to filter through a CLOB field by making sure it has a specific value. Normally, outside of this software I would do something like:
DBMS_LOB.SUBSTR(t.new_value) = 'Y'
However, this isn't supported, so I'm attempting to use CAST instead. I've made several different attempts; here is what I've found so far:
The software has a built-in query checker/validator and these are the ones it shows as invalid:
DBMS_LOB.SUBSTR(t.new_value)
CAST(t.new_value AS VARCHAR2(10))
CAST(t.new_value AS NVARCHAR2(10))
However, the validator does accept these:
CAST(t.new_value AS VARCHAR(10))
CAST(t.new_value AS NVARCHAR(10))
CAST(t.new_value AS CHAR(10))
Unfortunately, even though the validator lets these ones go through, when running the query to fetch data, I get ORA-22835: Buffer too small when using VARCHAR or NVARCHAR. And I get ORA-25137: Data value out of range when using CHAR.
Are there other ways I could try to check that my CLOB field has a specific value when filtering the data? If not, how do I fix my current issues?
The error you're getting indicates that Oracle is trying to apply the CAST(t.new_value AS VARCHAR(10)) to a row where new_value has more than 10 characters. That makes sense given your description that new_value is a generic audit field holding values from a large number of different tables with a variety of data lengths. Given that, you'd need to structure the query in a way that forces the optimizer to reduce the set of rows you're applying the cast to down to only those where new_value is a single character, before applying the cast.
Not knowing what sort of scope the software you're using provides for structuring your code, I'm not sure what options you have there. Be aware that depending on how robust you need this, the optimizer has quite a bit of flexibility to choose to apply predicates and functions on the projection in an arbitrary order. So even if you find an approach that works once, it may stop working in the future when statistics change or the database is upgraded and Oracle decides to choose a different plan.
Using this as sample data
create table tab1(col clob);
insert into tab1(col) values (rpad('x',3000,'y'));
You need to use dbms_lob.substr(col,1) to get the first character (from the default offset= 1)
select dbms_lob.substr(col,1) from tab1;
DBMS_LOB.SUBSTR(COL,1)
----------------------
x
Note that the default amount (= length) of the substring is 32767, so using only DBMS_LOB.SUBSTR(COL) will return more than you expect.
CAST for a CLOB does not cut the string to the cast length but (as you observed) raises the exception ORA-25137: Data value out of range if the original string is longer than the cast length.
As documented for the CAST statement
CAST does not directly support any of the LOB data types. When you use CAST to convert a CLOB value into a character data type or a BLOB value into the RAW data type, the database implicitly converts the LOB value to character or raw data and then explicitly casts the resulting value into the target data type. If the resulting value is larger than the target type, then the database returns an error.
To derive a content-based key for a longer text, I calculate HASHBYTES('SHA1', text). It returns a 20-byte varbinary; since I know the length of the result, I store it as binary(20).
To make it shorter (to be used as a key), I would like to follow Git's idea of a short hash -- as if taking the first (or last) characters of the hexadecimal representation. Instead of characters, though, I would like to get a binary(5) value from the binary(20).
When trying this with SQL Server 2016, it seems that the following simple approach:
DECLARE @hash binary(20) = HASHBYTES('SHA1', N'příšerně žluťoučký kůň úpěl ďábelské ódy')
DECLARE @short binary(5) = @hash
SELECT @hash, @short
Returns the leading bytes (higher order bytes):
(No column name) (No column name)
0xE02C3C55FBA0DF13ADA1B626B1E31746D57B4602 0xE02C3C55FB
However, the documentation (https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql?view=sql-server-ver15) warns that:
Conversions between any data type and the binary data types are not guaranteed to be the same between versions of SQL Server.
Well, this is not exactly a conversion. Still, does this uncertainty also hold for taking a shorter binary value from a longer one? What should I expect from future versions of SQL Server?
I have a problem with the encoding of string constants in queries against an NVARCHAR field in SQL Server v12.0.2. I need to use national characters (all from the same single code page, e.g. Cyrillic WIN1251) in queries without the N prefix.
Is it possible?
Example:
1. CREATE TABLE TEST (VALUE NVARCHAR(100) COLLATE Cyrillic_General_CI_AS);
2. INSERT INTO TEST VALUES (N'привет мир');
3. INSERT INTO TEST VALUES ('привет мир');
4. SELECT * FROM TEST;
This will return two rows:
| привет мир |
| ?????? ??? |
So the first insert works correctly; I expected the second to do the same because the TEST.VALUE column is collated in Cyrillic_General_CI_AS. But it looks like national characters ignore the column collation and use a code page from somewhere else.
I realize that in this case I won't be able to use characters from more than one code page, or languages that don't fit a 1-byte encoding, but that is fine for me. The other option is to modify all queries to use the N prefix before string constants, but that is not possible.
Without the N prefix, the string is converted to the default code page of the database, not the table you're inserting into (see MSDN for details)
So either you should change database collation to Cyrillic_General_CI_AS, or find all the string constants and insert N prefix.
I have a spreadsheet whose values all get loaded into SQL Server. One of the fields in the spreadsheet happens to be money. In order for everything to be displayed correctly, I added a field in my table with Money as the data type.
When I read the value from the spreadsheet, I store it as a String, such as "94259.4". When it gets inserted into SQL Server it looks like "94259.4000". Is there a way to get rid of the trailing zeros in the SQL Server value when I grab it from the DB? The issue I'm running across is that, even though these two values are the same, because they are both compared as Strings they are treated as different values.
I'm foreseeing another issue when the value might look like 94,259.40. I think what might work is limiting the number of digits to 2 after the decimal point, so as long as I select the value from the server in the format 94,259.40, I think I should be okay.
EDIT:
For Column = 1 To 34
Select Case Column
Case 1 'Field 1
If Not ([String].IsNullOrEmpty(CStr(excel.Cells(Row, Column).Value)) Or CStr(excel.Cells(Row, Column).Value) = "") Then
strField1 = CStr(excel.Cells(Row, Column).Value)
End If
Case 2 'Field 2
' and so on
End Select
Next
I go through each field and store the value as a string. Then I compare it against the DB and see if there is a record that has the same values. The only field in my way is the Money field.
You can use Format() to compare as strings, or even cast to Float. For example:
Declare @YourTable table (value money)
Insert Into @YourTable values
(94259.4000),
(94259.4500),
(94259.0000)
Select Original = value
,AsFloat = cast(value as float)
,Formatted = format(value,'0.####')
From @YourTable
Returns
Original AsFloat Formatted
94259.40 94259.4 94259.4
94259.45 94259.45 94259.45
94259.00 94259 94259
I should note that Format() has some great functionality, but it is NOT known for its performance
The core issue is that string data is being used to represent numeric information, hence the problems comparing "123.400" to "123.4" and getting mismatches. They should mismatch. They're strings.
The solution is to store the data in the spreadsheet in its proper form - numeric, and then select a proper format for the database - which is NOT the "Money" datatype (insert shudders and visions of vultures circling overhead). Otherwise, you are going to have an expanding kluge of conversions between types as you go back and forth between two improperly designed solutions, and finding more and more edge cases that "don't quite work," and require more special cases...and so on.
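If the comparison must happen in application code, parsing both sides to numbers avoids the trailing-zero and thousands-separator problems entirely. A minimal sketch (parseMoney is a hypothetical helper; plain floats are used here only for comparison, not for storage):

```javascript
// Parse a display string (possibly with thousands separators) into a
// number so that "94,259.40" and "94259.4000" compare equal.
function parseMoney(s) {
  return Number(s.replace(/,/g, ''));
}

console.log(parseMoney('94,259.40') === parseMoney('94259.4000')); // → true
console.log('94259.4' === '94259.4000');                           // → false (string compare)
```

For exact decimal arithmetic (rather than a simple equality check at this scale) a fixed-point or decimal library would be the safer choice, echoing the point above about choosing proper numeric types.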
CREATE TABLE [sql_table1] ([c0] varbinary(25) NOT NULL primary key)
go
insert into sql_table1 values (0x3200),(0x32);
go
I get
Cannot insert duplicate key in object 'dbo.sql_table'. The duplicate
key value is (0x32).
Why? 0x32 does not equal 0x3200
It gets right padded. BINARY data gets tricky when you try to specify what should normally be equivalent numerically hex values. If you try this it will work:
insert into sql_table1 values (0x32),(CAST(50 as VARBINARY(25)));
-- inserts 0x32
-- and 0x00000032
But these are numerically equivalent. Generally speaking, it's a bad idea to have a BINARY column of any sort be a primary key or try to put a unique index on it (moreso than a CHAR/VARCHAR/NVARCHAR column) - any application that inserts into it is going to almost certainly be CASTing from some native format/representation to binary, but there's no guarantee that that CAST actually works in a unique manner in either direction - did the application insert the value 50 (= 0x32), or did it try to insert the literal 0x32, or did it try to insert the ASCII value of 2 (= 0x32), or did it insert the first byte(s) of something else....? If one app submits 0x32 and another 0x0032 are they the same or different (SQL Server says different - see below)?
The bottom line is that SQL Server is going to behave unusually if you try to compare binary columns flat out without some context. What will work is comparing binary data using BINARY_CHECKSUM
SELECT BINARY_CHECKSUM(0x32) -- 50
SELECT BINARY_CHECKSUM(0x320) -- 16! -- it's treating this as having a different number or ordering of bytes
SELECT BINARY_CHECKSUM(0x3200) -- 50
SELECT BINARY_CHECKSUM(0x32000) -- 16
SELECT BINARY_CHECKSUM(0x0032) -- 50
SELECT BINARY_CHECKSUM(0x00032) -- 50
SELECT BINARY_CHECKSUM(0x000000000032) -- 50
but again, this only helps you see that the hexadecimal representation of the binary data isn't going to work exactly the way it would seem. The point is, your primary key is going to be based on the BINARY_CHECKSUMs of the data instead of any particular format/representation of the data. Normally that's a good thing, but with binary data (and padding) it becomes a lot trickier. Even then, in my example above the BINARY_CHECKSUM of both rows will be exactly the same (SELECT BINARY_CHECKSUM(c0) FROM sql_table1 will output 50 for both rows). Weird - a little further testing shows that any different number of leading 0s that fits into the column length will bypass the unique check even though the checksum is the same (e.g. VALUES (0x32), (0x032), (0x0032), etc.).
This only gets worse if you start throwing different versions of SQL Server into the mix (per MSDN documentation).
What you should do for PK/Unique design on a table is figure out what context will make sense of this data - an order number, a file reference, a timestamp, a device ID, some other business or logical identifier, etc.... If nothing else, pseudokey it with an IDENTITY column.