Convert DT_BYTES to Integer - sql-server

I need to read a Micro Focus COBOL data file (on PC) containing COMP fields. FYI, a COMP field stores an integer in big-endian binary format.
If I transfer the raw binary into SQL Server, I can convert it to a bigint using
CONVERT(bigint,compField,1).
That way, CONVERT(bigint,0x0000002B17,1) will become 11031.
I also need to deal with negative values. In T-SQL it looks like this:
CONVERT(bigint,0xFFFFFFD4E9,1) - CONVERT(bigint,0xFFFFFFFFFF,1)-0x0000000001
will give -11031.
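(To make the trick explicit: the bytes are just a big-endian two's-complement integer. A minimal Python sketch of the same decode, for reference:)

raw = bytes.fromhex("FFFFFFD4E9")   # the 5-byte COMP value from above
value = int.from_bytes(raw, byteorder="big", signed=True)
print(value)                        # -11031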
Is there a way to do this directly in the data flow? I'm sure the info is out there somewhere, but I'm too dumb to find it.
I'm working with SSIS 2019 btw.
Thank you!
Simon.

Related

Pandas read_sql changing large number IDs when reading

I transferred an Oracle database to SQL Server and all seems to have gone well. The various ID columns are large numbers, so I had to use Decimal as they were too large for BigInt.
I am now trying to read the data using pandas.read_sql over a pyodbc connection with ODBC Driver 17 for SQL Server: df = pandas.read_sql("SELECT * FROM table1", con)
The numbers are coming out as float64. When I try to print them or use them in SQL statements they come out in scientific notation, and when I try to use '{:.0f}'.format(df.loc[i,'Id']) it turns several numbers into the same number, such as 90300111000003078520832. It is like precision is lost when it goes to scientific notation.
I also tried pd.options.display.float_format = '{:.0f}'.format before the read_sql, but this did not help.
Clearly I must be doing something wrong as the Ids in the database are correct.
Any help is appreciated. Thanks
pandas' read_sql method has an option named coerce_float which defaults to True and it …
Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
However, in your case it is not useful, so simply specify coerce_float=False.
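A minimal sketch (assuming con is an open pyodbc connection, as in the question):

import pandas as pd

# decimal.Decimal values now survive intact instead of being coerced to lossy float64
df = pd.read_sql("SELECT * FROM table1", con, coerce_float=False)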
I've had this problem too, especially working with long ids: read_sql works fine for the primary key, but not for other columns (like the retweeted_status_id from Twitter API calls). Setting coerce_float to False does nothing for me, so instead I cast retweeted_status_id to a character format in my SQL query.
Using psql, I do:
df = pandas.read_sql("SELECT *, Id::text FROM table1", con)
But in SQL Server it'd be something like
df = pandas.read_sql("SELECT *, CONVERT(varchar(38), Id) FROM table1", con)
or
df = pandas.read_sql("SELECT *, CAST(Id AS varchar) FROM table1", con)
Obviously there's a cost here if you're casting many rows, and a more efficient option might be to pull from SQL Server without using pandas (as a nested list or JSON or something else), which will also preserve your long integer values.
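If you do cast the IDs to strings this way, you can map them back to Python's arbitrary-precision integers on the pandas side (a small sketch; object dtype has no 64-bit limit, so no digits are lost):

# assumes Id came back as a string column from the query above
df["Id"] = df["Id"].map(int)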

Encoding (?) issues fetching binary data (image type column) from SQL Server via pyodbc/Python3 [duplicate]

I'm executing this query
SELECT CMDB_ID FROM DB1.[dbo].[CDMID]
When I run this in SSMS 18, the result grid shows the values in hex (screenshot omitted).
I'm aware these are hex values, although I'm not an expert on the topic.
I need to execute this exact query in Python so I can process that information through a script. The script needs the hex values as input, without any manipulation (as seen in the SSMS output).
So, using the pyodbc library with a regular connection:
SQLserver_Connection = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                                      "Server=INSTANCE;"
                                      "Database=DB1;"
                                      "UID=USER;"
                                      "PWD=PASS;")
I get this:
0 b'#\x12\x90\xb2\xbb\x92\xbbe\xa3\xf9:\xe2\x97#...
1 b'#"\xaf\x13\x18\xc9}\xc6\xb0\xd4\x87\xbf\x9e\...
2 b'#G\xc5rLh5\x1c\xb8h\xe0\xf0\xe4t\x08\xbb'
3 b'#\x9f\xe65\xf8tR\xda\x85S\xdcu\xd3\xf6*\xa2'
4 b'#\xa4\xcb^T\x06\xb2\xd0\x91S\x9e\xc0\xa7\xe543'
... ...
122 b'O\xa6\xe1\xd8\tA\xe9E\xa0\xf7\x96\x7f!"\xa3\...
123 b'O\xa9j,\x02\x89pF\xb9\xb4:G]y\xc4\xb6'
124 b'O\xab\xb6gy\xa2\x17\x1b\xadd\xc3\r\xa6\xee50'
125 b'O\xd7ogpWj\xee\xb0\xd8!y\xec\x08\xc7\xfa'
126 b"O\xf0u\x14\xcd\x8cT\x06\x9bm\xea\xddY\x08'\xef"
I have three questions:
How can this data be interpreted, and why am I getting it this way?
Is there a way to convert this data back to the original hex value? And if not...
What can I do to receive the original hex value?
I've been looking for a solution but haven't found anything yet. As you can see I'm not an expert in this kind of topic, so if you are not able to provide a solution I would also really appreciate documents with some background knowledge I need to pick up, so that I can work out a solution by myself.
I think your issue is simply due to the fact that SSMS and Python produce different hexadecimal representations of binary data. Your column is apparently a binary or varbinary column, and when you query it in SSMS you see its fairly standard hex representation of the binary values, e.g., 0x01476F726400.
When you retrieve the value using pyodbc you get a <class 'bytes'> object which is represented as b'hex_representation' with one twist: Instead of simply displaying b'\x01\x47\x6F\x72\x64\x00', Python will render any byte that corresponds to a printable ASCII character as that character, so we get b'\x01Gord\x00' instead.
That minor annoyance (IMO) aside, the good news is that you already have the correct bytes in a <class 'bytes'> object, ready to pass along to any Python function that expects to receive binary data.
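And if you want the SSMS-style string for display or logging, a one-liner sketch using the example value from above:

row_value = b'\x01Gord\x00'              # what pyodbc hands you
print('0x' + row_value.hex().upper())    # 0x01476F726400, as SSMS shows it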

How is the format of the geography data type built in SQL Server?

I'm not able to understand the format of the geography data type in SQL Server...
For example I have the following data:
0xE6100000010CCEAACFD556484340B2F336363BCA21C0
what I know:
0x is the prefix for hexadecimal
the last 16 hex digits are the longitude: B2F336363BCA21C0 (a double-precision float)
the 16 hex digits before those are the latitude: CEAACFD556484340 (a double-precision float)
the first 4 hex digits are the SRID: E610 (hexadecimal for WGS84)
what I don't understand:
hex digits 5 to 12: 0000010C
what is this?
From what I read this seems linked to WKB (Well-Known Binary) or EWKB (Extended Well-Known Binary); anyway, I was not able to find a definition for EWKB...
And for WKB this is supposed to be the geometry type (a 4-byte integer), but the value doesn't match the geometry type codes (this example is for a single point coordinate).
Can you help me understand this format?
The spatial types (geometry and geography) in SQL Server are implemented as CLR data types. As with any such data type, you get a binary representation when you query the value directly. Unfortunately, it's not (as far as I know) WKB but rather whatever format Microsoft decided was best for their implementation. We (the users) should work with the interface of methods that MS has published (for instance the geography method reference). Which is to say that you should only try to decipher the MS binary representation if you're curious (and not for actually working with it).
That said, if you need/want to work with WKB, you can! For example, you can use the STGeomFromWKB() static method to create a geography instance from WKB that you provide and STAsBinary() can be called on a geography instance to return WKB to you.
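(If you are curious anyway, here is a hedged Python sketch that picks the sample value apart along the lines the question describes. The layout — little-endian SRID, a version byte, a properties byte, then latitude/longitude doubles — is assumed from the serialization spec linked below; on that reading, the mysterious "0000010C" is just the rest of the little-endian SRID (0000) plus a version byte 01 plus a properties/flags byte 0C:)

import struct

raw = bytes.fromhex("E6100000010CCEAACFD556484340B2F336363BCA21C0")
srid = struct.unpack_from("<i", raw, 0)[0]     # 4326 (WGS84)
version, props = raw[4], raw[5]                # 0x01, 0x0C (valid + single point)
lat, lon = struct.unpack_from("<2d", raw, 6)   # ~38.57, ~-8.89
print(srid, version, hex(props), lat, lon)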
The Format spec can be found here:
https://msdn.microsoft.com/en-us/library/ee320529(v=sql.105).aspx
As that page shows, it used to change very frequently, but it has slowed down significantly over the past two years.
I am currently needing to dig into the spec to serialize from JVM code into a bcp file so that I can use SQLServerBulkCopy rather than plain JDBC to upload data into tables (it is about 7x faster to write a bcp file than to use JDBC), but this is proving to be more complicated than I originally anticipated.
After testing with bcp: you can upload geographies by specifying an off-row format (varchar(max)) and storing the well-known text; SQL Server will see this and assume you want a geography based on the WKT it sees.
In my case converting to nvarchar resolved the issue.

How to read Arabic characters from varchar datatype?

I have an old system that uses the varchar datatype in its database to store Arabic names. Now the names appear in the database like this:
"ãíÓÇÁ ÇáãÈíÖíä"
Now I am building a new system using VB.NET; how can I read these names so that they appear in Arabic characters?
I should also point out that even though the old system stores the data as shown above, it displays the characters correctly.
How can I display them properly in the new system and in SQL Server Management Studio?
Have you tried nvarchar? You may find some useful information at the link below:
When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server?
I faced the same problem, and I solved it in two steps:
1. Change the datatype of the column in the DB to nvarchar.
2. Use encoding to convert the data back to Arabic.
I used the following function
private string GetDataWithArabic(string srcData)
{
    // Recover the raw bytes: the text was mis-decoded as Latin-1 (ISO-8859-1)
    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    // Re-decode those bytes using the default encoding (on .NET Framework this is
    // the system ANSI code page, e.g. Windows-1256 on an Arabic-locale machine)
    Encoding unicode = Encoding.Default;
    byte[] unicodeBytes = iso.GetBytes(srcData);
    return unicode.GetString(unicodeBytes);
}
but make sure you run this method only once on the DB data, because it will corrupt the data if applied twice
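(For what it's worth, the same one-time repair can be sketched in Python, assuming the usual case where Windows-1256 Arabic bytes were mis-decoded as Latin-1:)

garbled = "ãíÓÇÁ ÇáãÈíÖíä"
fixed = garbled.encode("iso-8859-1").decode("windows-1256")
print(fixed)   # ميساء المبيضين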
I think your answer is here: "storing and retrieving non english characters" http://aalamrangi.wordpress.com/2012/05/13/storing-and-retrieving-non-english-unicode-characters-hindi-czech-arabic-etc-in-sql-server/

What datatype should I bind as a query parameter to use with a NUMBER(15) column in Oracle ODBC?

I have just been bitten by the issue described in the SO question Binding int64 (SQL_BIGINT) as query parameter causes error during execution in Oracle 10g ODBC.
I'm porting a C/C++ application using ODBC 2 from SQL Server to Oracle. For numeric fields exceeding NUMBER(9) it uses the __int64 datatype, which is bound to queries as SQL_C_SBIGINT. Apparently such binding is not supported by Oracle ODBC. I must now do an application-wide conversion to another method. Since I don't have much time (it's an unexpected issue), I would rather use a proven solution, not trial and error.
What datatype should be used to bind e.g. a NUMBER(15) in Oracle? Is there a documented, recommended solution? What are you using? Any suggestions?
I'm especially interested in solutions that do not require any additional conversions. I can easily provide and consume numbers in the form of __int64 or char* (normal non-exponential form without thousands separator or decimal point). Any other format requires additional conversion on my part.
What I have tried so far:
SQL_C_CHAR
Looks like it's going to work for me. I was worried about the variability of the number format, but in my use case it doesn't seem to matter; apparently only the fraction-point character changes with system language settings.
And I don't see why I should use an explicit cast (e.g. TO_NUMBER) in the SQL INSERT or UPDATE command. Everything works fine when I bind the parameter with SQL_C_CHAR as the C type and SQL_NUMERIC (with proper precision and scale) as the SQL type. I couldn't reproduce any data-corruption effect.
SQL_NUMERIC_STRUCT
I noticed that SQL_NUMERIC_STRUCT was added with ODBC 3.0 and decided to give it a try. I am disappointed.
In my situation it is enough, as the application doesn't really use fractional numbers. But as a general solution... simply, I don't get it. I mean, I finally understood how it is supposed to be used. What I don't get is why anyone would introduce a new struct of this kind and then make it work this way.
SQL_NUMERIC_STRUCT has all the fields needed to represent any NUMERIC (or NUMBER, or DECIMAL) value with its precision and scale. Only they are not used.
When reading, ODBC sets the precision of the number (based on the precision of the column; except that Oracle returns a bigger precision, e.g. 20 for NUMBER(15)). But if your column has a fractional part (scale > 0), it is truncated by default. To read a number with the proper scale you need to set the precision and scale yourself with an SQLSetDescField call before fetching the data.
When writing, Oracle thankfully respects the scale contained in the SQL_NUMERIC_STRUCT. But the ODBC spec doesn't mandate it, and MS SQL Server ignores this value. So, back to SQLSetDescField again.
See HOWTO: Retrieving Numeric Data with SQL_NUMERIC_STRUCT and INF: How to Use SQL_C_NUMERIC Data Type with Numeric Data for more information.
Why doesn't ODBC fully use its own SQL_NUMERIC_STRUCT? I don't know. It looks like it works, but I think it's just too much work.
I guess I'll use SQL_C_CHAR.
My personal preference is to make the bind variables character strings (VARCHAR2) and let Oracle do the conversion from character to its own internal storage format. It's easy enough (in C) to get data values represented as null-terminated strings in an acceptable format.
So, instead of writing SQL like this:
SET MY_NUMBER_COL = :b1
, MY_DATE_COL = :b2
I write the SQL like this:
SET MY_NUMBER_COL = TO_NUMBER( :b1 )
, MY_DATE_COL = TO_DATE( :b2 , 'YYYY-MM-DD HH24:MI:SS')
and supply character strings as the bind variables.
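(The same idea from a higher-level binding, sketched with Python/pyodbc; the DSN, table, and column names here are hypothetical:)

import pyodbc

conn = pyodbc.connect("DSN=ORA;UID=USER;PWD=PASS")   # hypothetical Oracle ODBC DSN
cur = conn.cursor()
cur.execute(
    "UPDATE my_table"
    "   SET my_number_col = TO_NUMBER(?),"
    "       my_date_col   = TO_DATE(?, 'YYYY-MM-DD HH24:MI:SS')"
    " WHERE my_id = ?",
    ("123456789012345", "2024-01-31 12:00:00", "1"))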
There are a couple of advantages to this approach.
One is that it works around the issues and bugs one encounters when binding other data types.
Another advantage is that bind values are easier to decipher on an Oracle event 10046 trace.
Also, an EXPLAIN PLAN (I believe) expects all bind variables to be VARCHAR2, which means the statement being explained is slightly different from the actual statement being executed (due to the implicit data conversions when the datatypes of the bind arguments in the actual statement are not VARCHAR2).
And (less importantly) when I'm testing the statement in TOAD, it's easier just to be able to type strings into the input boxes, and not have to muck with changing the datatype in a dropdown list box.
I also let the built-in TO_NUMBER and TO_DATE functions validate the data. (In earlier versions of Oracle at least, I encountered issues with binding a DATE value directly; it bypassed (at least some of) the validity checking and allowed invalid date values to be stored in the database.)
This is just a personal preference, based on past experience. I use this same approach with Perl DBD.
I wonder what Tom Kyte (asktom.oracle.com) has to say about this topic.
