Encoding (?) issues fetching binary data (image type column) from SQL Server via pyodbc/Python3 [duplicate]

I'm executing this query
SELECT CMDB_ID FROM DB1.[dbo].[CDMID]
when I do this on SSMS 18 I get this:
I'm aware these are HEX values, although I'm not an expert on the topic.
I need to execute this exact query in Python so I can process that information through a script; this script needs as input the HEX values without any manipulation (as you see in the SSMS output).
So, through pyodbc library with a regular connection:
conn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=INSTANCE;"
                      "Database=DB1;"
                      "UID=USER;"
                      "PWD=PASS;")
I get this:
0 b'#\x12\x90\xb2\xbb\x92\xbbe\xa3\xf9:\xe2\x97#...
1 b'#"\xaf\x13\x18\xc9}\xc6\xb0\xd4\x87\xbf\x9e\...
2 b'#G\xc5rLh5\x1c\xb8h\xe0\xf0\xe4t\x08\xbb'
3 b'#\x9f\xe65\xf8tR\xda\x85S\xdcu\xd3\xf6*\xa2'
4 b'#\xa4\xcb^T\x06\xb2\xd0\x91S\x9e\xc0\xa7\xe543'
... ...
122 b'O\xa6\xe1\xd8\tA\xe9E\xa0\xf7\x96\x7f!"\xa3\...
123 b'O\xa9j,\x02\x89pF\xb9\xb4:G]y\xc4\xb6'
124 b'O\xab\xb6gy\xa2\x17\x1b\xadd\xc3\r\xa6\xee50'
125 b'O\xd7ogpWj\xee\xb0\xd8!y\xec\x08\xc7\xfa'
126 b"O\xf0u\x14\xcd\x8cT\x06\x9bm\xea\xddY\x08'\xef"
I have three questions:
How can this data be interpreted, and why am I getting this?
Is there a way to convert this data back to the original HEX value? And if not...
What can I do to receive the original HEX value?
I've been looking for a solution but haven't found anything yet. As you can see I'm not an expert on these topics, so if you are not able to provide a solution I would also really appreciate documents with some background knowledge I need so I can work out a solution myself.

I think your issue is simply due to the fact that SSMS and Python produce different hexadecimal representations of binary data. Your column is apparently a binary or varbinary column, and when you query it in SSMS you see a fairly standard hex representation of the binary values, e.g., 0x01476F726400.
When you retrieve the value using pyodbc you get a <class 'bytes'> object which is represented as b'hex_representation' with one twist: Instead of simply displaying b'\x01\x47\x6F\x72\x64\x00', Python will render any byte that corresponds to a printable ASCII character as that character, so we get b'\x01Gord\x00' instead.
That minor annoyance (IMO) aside, the good news is that you already have the correct bytes in a <class 'bytes'> object, ready to pass along to any Python function that expects to receive binary data.
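If you do want the SSMS-style "0x..." string rather than the raw bytes, a minimal sketch (using the example value from the answer above) is just bytes.hex() plus a prefix:

```python
# Example bytes object as returned by pyodbc for a binary column
row_value = b'\x01\x47\x6F\x72\x64\x00'

# Rebuild the hex representation SSMS displays (e.g. 0x01476F726400)
hex_repr = "0x" + row_value.hex().upper()
print(hex_repr)  # 0x01476F726400
```

This is purely cosmetic: the bytes object and the hex string carry the same information.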

Related

Convert DT_BYTES to Integer

I need to read a Microfocus CoBOL data file (on PC) containing COMP fields. FYI, a COMP stores an integer in binary format.
If I transfer the raw binary in SQL Server, I can convert it to a BigInt using
CONVERT(bigint,compField,1).
That way, CONVERT(bigint,0x0000002B17,1) will become 11031.
I also need to deal with negative values. In T-SQL it looks like this:
CONVERT(bigint,0xFFFFFFD4E9,1) - CONVERT(bigint,0xFFFFFFFFFF,1)-0x0000000001
will give -11031.
Is there a way to do this directly in the data flow? I'm sure the info is out there somewhere, but I'm too dumb to find it.
I'm working with SSIS 2019 btw.
Thank you!
Simon.
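For checking the conversion outside of SSIS/T-SQL, a quick Python sketch of the same big-endian, two's-complement interpretation (these are the example values from the question, not a data-flow component):

```python
# COMP fields are big-endian binary integers; 11031 stored in 5 bytes:
unsigned = int.from_bytes(bytes.fromhex("0000002B17"), "big")

# Negative values are two's complement; signed=True handles the
# subtraction that the T-SQL expression above performs by hand:
signed = int.from_bytes(bytes.fromhex("FFFFFFD4E9"), "big", signed=True)

print(unsigned, signed)  # 11031 -11031
```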

Converting binary data containing null (0x00) characters to ASCII in SQL Server

On SQL Server (2016+), I have data stored in a varbinary column, saved by some Java application, which contains a mixture of binary data and ASCII text. I want to search the column using a like operator or otherwise to look for certain ASCII strings, and then view the returned values as ASCII (so that I can read the surrounding text).
The data contains non-characters such as "00" (0x00), and these seem to stop SQL Server from converting the string as might otherwise be possible according to the answers at Hex to ASCII string conversion on the fly. In the example below, it can be seen that the byte "00" stops the parsing of the ASCII.
select convert(varchar(max),0x48454C4C004F205000455445,0) as v1 -- HELL
select convert(varchar(max),0x48454C4C4F205000455445,0) as v2 -- HELLO P
select convert(varchar(max),0x48454C4C4F2050455445,0) as v3 -- HELLO PETE
How can I have
select convert(varchar(max), 0x48454C4C004F205000455445, 0)
...return something like this?:
HELL?O P?ETE
(Or, less ideally, have an expression similar to
convert(varchar(max), 0x48454C4C004F205000455445, 0) like '%HE%ETE%'
...return the row?)
It works on the website https://www.rapidtables.com/convert/number/hex-to-ascii.html with 48454C4C004F205000455445 as input.
I'm not overly concerned about performance, but I want to stay within SQL Server, and ideally within the scope of T-SQL which can be copied and pasted easily.
I've tried using replace on "00", but this could cause problems with characters ending with 0, as in "5000" in the examples above. There may be bytes other than 0x00 which cause string conversion to stop as well.
To return the row (the more limited version of this question), a simple like operator on the value appears to work when run directly on the binary value, despite the intervening 0x00 values:
0x48454C4C004F205000455445 like 'HE%ETE%'
In other words, like can cope where convert can't.
To view the actual value, the best I've managed so far is this:
convert(varchar(max),
    convert(varbinary(max),
        REPLACE(
            convert(varchar(max), 0x48454C4C004F205000455445, 1)
            , '00', ''
        )
    , 1)
, 0)
This gives HELLO PETE, and works well enough on the actual data, getting to its end.
(It depends on the heuristic of not caring about converting e.g. 0x50 0x03 to 0x53 and similar, but I can live with that, as 0x0z, where z is 1 to f, represents control characters, which don't occur around the text I'm interested in).
(thanks to Panagiotis Kanavos for prodding me in a useful direction!)
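If stepping outside T-SQL is ever an option, the same idea is simpler and safer at the byte level, since replacing the byte 0x00 (rather than the substring "00" in a hex string) cannot accidentally merge adjacent bytes like 0x50 0x03. A sketch using the example value from the question:

```python
# The example varbinary value from the question
raw = bytes.fromhex("48454C4C004F205000455445")

# Drop embedded NUL bytes, then decode the remaining ASCII text
text = raw.replace(b"\x00", b"").decode("ascii")
print(text)  # HELLO PETE
```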

How is the format of the geography data type built in SQL Server?

I'm not able to understand the format of the geography data type in SQL Server...
For example I have the following data:
0xE6100000010CCEAACFD556484340B2F336363BCA21C0
what I know:
0x is prefix for hexadecimal
last 16 numbers are longitude: B2F336363BCA21C0 (double of decimal format)
16 numbers before the last 16 are latitude: CEAACFD556484340 (double of decimal format)
4 first numbers are SRID: E610 (hexadecimal for WGS84)
what I don't understand:
numbers from 5 to 12 : 0000010C
what is this?
From what I read this seems linked to WKB (Well Known Binary) or EWKB (Extended Well Known Binary); anyway, I was not able to find a definition for EWKB...
And for WKB this is supposed to be the geometry type (a 4-byte integer), but the value doesn't match the geometry type codes (this example is for a single point coordinate).
Can you help to understand this format?
The spatial types (geometry and geography) in SQL Server are implemented as CLR data types. As with any such data types, you get a binary representation when you query the value directly. Unfortunately, it's not (as far as I know) WKB but rather whatever format Microsoft decided was best for their implementation. We (the users) should work with the interface of methods that MS has published (for instance the geography method reference). Which is to say that you should only try to decipher the MS binary representation if you're curious (and not for actually working with it).
That said, if you need/want to work with WKB, you can! For example, you can use the STGeomFromWKB() static method to create a geography instance from WKB that you provide and STAsBinary() can be called on a geography instance to return WKB to you.
The Format spec can be found here:
https://msdn.microsoft.com/en-us/library/ee320529(v=sql.105).aspx
As that page shows, it used to change very frequently, but has slowed down significantly over the past 2 years
I am currently needing to dig into the spec to serialize from JVM code into a bcp file so that I can use SQLServerBulkCopy rather than plain JDBC to upload data into tables (it is about 7x faster to write a bcp file than using JDBC), but this is proving to be more complicated than what I originally anticipated.
After testing with bcp, you can upload geographies by specifying an off-row format (varchar(max)) and storing the well-known text; SQL Server will see this and assume you want a geography based on the WKT it sees.
In my case converting to nvarchar resolved the issue.
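Out of the curiosity the answer allows for: per the serialization spec linked above, the SRID is actually the first 4 bytes little-endian (E6100000, i.e. all of E610 plus the "0000" from the mystery run), followed by a 1-byte version and a 1-byte properties flag, then the point's doubles. A hypothetical Python sketch parsing the single-point value from the question under that layout:

```python
import struct

# The example geography value from the question
blob = bytes.fromhex("E6100000010CCEAACFD556484340B2F336363BCA21C0")

srid = int.from_bytes(blob[0:4], "little")  # E6100000 -> 4326 (WGS84)
version = blob[4]                           # 0x01: serialization version
flags = blob[5]                             # 0x0C: properties (valid + single point)
lat, lng = struct.unpack("<2d", blob[6:22]) # two little-endian doubles

print(srid, version, hex(flags), lat, lng)
```

This is for curiosity only; for real work, stick to STAsBinary()/STGeomFromWKB() and the other published methods.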

Access linked tables truncating my Decimal values from the SQL server

Since migrating the Access data to a SQL server I am having multiple problems with decimal values. In my SQL tables on the SQL Server 2012 instance I am using the decimal data type for multiple fields. A while ago I first tried to set the decimal values to 18,2 but Access acted weird on this, truncating all the values (55,55 became 50 and so on).
So after multiple changes it seemed that Access accepted the 30,2 decimal setting in the SQL server (now the values were linked correctly in the linked Access tables).
A few days ago I stumbled back onto this problem because a user had trouble editing a number in the Access form. So I checked the linked table data type, and there it seemed that Access converts the decimal 30,2 value to a Short Text data type, which is obviously wrong. So I did a bit of research and found out that Access cannot handle a 30,2 decimal, thus it is converted to text by the ODBC driver. (See my previous post: Access 2013 form field value gets cut off on changing the number before the point)
So to fix this latter error I tried, once again (forgetting that I already messed around with it) to change the decimal value to 17,2 / 18,2 and some other decimal values but on all these changes I am getting back to the truncating problem...
I found some posts about it but nothing concrete or answers on how to solve it.
Some additional information:
Using a SQL 2012 server
Using Access 2013
Got a SQL Server Native Client 10 and 11 installed.
Looking in the register key I found out that I am using ODBC driver version 02.50
The SQL native client 11 has/uses DriverODBC ver 03.80 and the native client 10 uses DriverODBC ver 10.00 (not sure this is relevant though).
UPDATE WITH IMAGES
In an Access form I have multiple lines that have a linked table (SQL table) as record source. These lines get populated with the data in the SQL server.
Below you can see a line with a specific example, the eenh. prijs is loaded from the linked (SQL) table.
Now when I change the 5 in front of the point (so making it 2555,00 instead of 5555,00) the value gets cut off:
So I did research on it and understand that my SQL decimal 30,2 isn't accepted by Access. So I looked in my access linked table to see what kind of data type the field is:
So the specific column (CorStukPrijs) is in the SQL server a decimal 30,2 but here a short text (sorry for the dutch words).
The other numerics (which are OK) are just normal integers by the way.
In my linked table on access - datasheet view the values look like this:
I also added a decimal value of how it looks in my linked table:
In my SQL server the (same) data looks like this:
Though, because of the changing number problem before the point (back in the form - first images) I changed the decimal type of 30,2 in the server to 18,2.
This is the result in the linked table on that same 5555 value:
It gives #Errors and the error message:
Scaling of decimal values has resulted in truncated values
(translated, so it probably won't be exactly like that in English)
The previous 0,71 value results with the decimal 18,2 in:
Hope its a bit clearer now!
P.S. I just changed one decimal field to 18,2 now.
Recently I found a solution for this problem! It all had to do with language settings after all (and the decimal 30,2, which is not accepted as a decimal in Access 2013).
I changed the Native client from 10 to 11 and in my connection string I added one vital value: regional=no. This fixed the problem!
So now my connection string is:
szSQLConnectionString = "DRIVER=SQL Server Native Client 11.0;SERVER=" & szSQLServer & ";DATABASE=" & szSQLDatabase & ";UID=" & szSQLUsername & ";PWD=" & szSQLPassword & ";regional=no;Application Name=OPS-FE;MARS_Connection=yes;"
A few things:
No real good reason to try a decimal value of 30 digits?
Access only supports 28 digits for a packed decimal column. So going to 30 will force Access to see that value as a string.
If you keep the total digits below 28, then you should be ok.
You also left out what driver you are using. (legacy, or native 10 or native 11). However, all 3 should have no trouble with decimal.
As a few noted here, after ANY change to the sql table, you have to refresh the linked table else such changes will not show up.
There is NO need to have re-link code run every time on startup. And it's not clear how your re-link code works. If the re-link code makes a copy of the tabledef object and then re-instates the same tabledef, then changes to the back end may well not show up.
I would suggest during testing, you DO NOT use your re-link routines, but simply right click on the given linked table and choose the linked table manager. Then click on the one table, and ok to refresh.
Also, in Access during this testing, dump (remove) any formatting you have in the table settings for testing (the format setting).
I suggest you start over, and take the original tables and re-up-size them again.
Access should and can handle the decimal types with ease, but it's not clear what your original settings were. If the values never require more than 4 significant digits beyond the decimal, then I would consider using currency, but decimal should also work.

What datatype should I bind as query parameter to use with NUMBER(15) column in Oracle ODBC?

I have just been bitten by issue described in SO question Binding int64 (SQL_BIGINT) as query parameter causes error during execution in Oracle 10g ODBC.
I'm porting a C/C++ application using ODBC 2 from SQL Server to Oracle. For numeric fields exceeding NUMBER(9) it uses the __int64 datatype, which is bound to queries as SQL_C_SBIGINT. Apparently such binding is not supported by Oracle ODBC. I must now do an application-wide conversion to another method. Since I don't have much time---it's an unexpected issue---I would rather use a proven solution, not trial and error.
What datatype should be used to bind as e.g. NUMBER(15) in Oracle? Is there documented recommended solution? What are you using? Any suggestions?
I'm especially interested in solutions that do not require any additional conversions. I can easily provide and consume numbers in form of __int64 or char* (normal non-exponential form without thousands separator or decimal point). Any other format requires additional conversion on my part.
What I have tried so far:
SQL_C_CHAR
Looks like it's going to work for me. I was worried about variability of number format. But in my use case it doesn't seem to matter. Apparently only fraction point character changes with system language settings.
And I don't see why I should use explicit cast (e.g. TO_NUMERIC) in SQL INSERT or UPDATE command. Everything works fine when I bind parameter with SQL_C_CHAR as C type and SQL_NUMERIC (with proper precision and scale) as SQL type. I couldn't reproduce any data corruption effect.
SQL_NUMERIC_STRUCT
I've noticed SQL_NUMERIC_STRUCT added with ODBC 3.0 and decided to give it a try. I am disappointed.
In my situation it is enough, as the application doesn't really use fractional numbers. But as a general solution... Simply, I don't get it. I mean, I finally understood how it is supposed to be used. What I don't get is: why anyone would introduce new struct of this kind and then make it work this way.
SQL_NUMERIC_STRUCT has all the fields needed to represent any NUMERIC (or NUMBER, or DECIMAL) value with its precision and scale. Only they are not used.
When reading, ODBC sets precision of the number (based on precision of the column; except that Oracle returns bigger precision, e.g. 20 for NUMBER(15)). But if your column has fractional part (scale > 0) it is by default truncated. To read number with proper scale you need to set precision and scale yourself with SQLSetDescField call before fetching data.
When writing, Oracle thankfully respects scale contained in SQL_NUMERIC_STRUCT. But ODBC spec doesn't mandate it and MS SQL Server ignores this value. So, back to SQLSetDescField again.
See HOWTO: Retrieving Numeric Data with SQL_NUMERIC_STRUCT and INF: How to Use SQL_C_NUMERIC Data Type with Numeric Data for more information.
Why ODBC doesn't fully use its own SQL_NUMERIC_STRUCT? I don't know. It looks like it works but I think it's just too much work.
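For reference, the way the struct's fields combine into a value can be illustrated outside of C. A Python sketch (not ODBC code; the function name is mine) mirroring the documented layout, where val is a 16-byte little-endian unsigned integer, sign is 1 for positive and 0 for negative, and scale shifts the decimal point:

```python
from decimal import Decimal

def numeric_struct_to_decimal(precision, scale, sign, val):
    """Combine SQL_NUMERIC_STRUCT fields into a Decimal value."""
    magnitude = int.from_bytes(val, "little")      # val is little-endian
    value = Decimal(magnitude) / (Decimal(10) ** scale)
    return value if sign == 1 else -value          # sign: 1 = positive, 0 = negative

# e.g. a NUMBER holding 110.31 arrives as scale=2 with 11031 packed in val
print(numeric_struct_to_decimal(5, 2, 1, (11031).to_bytes(16, "little")))
```

The precision field is carried along but, as noted above, is only really meaningful when you set it (together with scale) via SQLSetDescField before fetching.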
I guess I'll use SQL_C_CHAR.
My personal preference is to make the bind variables character strings (VARCHAR2), and let Oracle do the conversion from character to it's own internal storage format. It's easy enough (in C) to get data values represented as null terminated strings, in an acceptable format.
So, instead of writing SQL like this:
SET MY_NUMBER_COL = :b1
, MY_DATE_COL = :b2
I write the SQL like this:
SET MY_NUMBER_COL = TO_NUMBER( :b1 )
, MY_DATE_COL = TO_DATE( :b2 , 'YYYY-MM-DD HH24:MI:SS')
and supply character strings as the bind variables.
There are a couple of advantages to this approach.
One is that it works around the issues and bugs one encounters with binding other data types.
Another advantage is that bind values are easier to decipher on an Oracle event 10046 trace.
Also, an EXPLAIN PLAN (I believe) expects all bind variables to be VARCHAR2, so that means the statement being explained is slightly different than the actual statement being executed (due to the implicit data conversions when the datatypes of the bind arguments in the actual statement are not VARCHAR2.)
And (less important) when I'm testing the statement in TOAD, it's easier just to be able to type strings into the input boxes, and not have to muck with changing the datatype in a dropdown list box.
I also let the built-in TO_NUMBER and TO_DATE functions validate the data. (In earlier versions of Oracle at least, I encountered issues with binding a DATE value directly; it bypassed (at least some of) the validity checking, and allowed invalid date values to be stored in the database.)
This is just a personal preference, based on past experience. I use this same approach with Perl DBD.
I wonder what Tom Kyte (asktom.oracle.com) has to say about this topic?
