What datatype should I bind as query parameter to use with NUMBER(15) column in Oracle ODBC?

I have just been bitten by the issue described in the SO question Binding int64 (SQL_BIGINT) as query parameter causes error during execution in Oracle 10g ODBC.
I'm porting a C/C++ application using ODBC 2 from SQL Server to Oracle. For numeric fields exceeding NUMBER(9) it uses the __int64 datatype, which is bound to queries as SQL_C_SBIGINT. Apparently such a binding is not supported by Oracle ODBC. I must now do an application-wide conversion to another method. Since I don't have much time (it's an unexpected issue), I would rather use a proven solution than trial and error.
What datatype should be used to bind to, e.g., a NUMBER(15) column in Oracle? Is there a documented, recommended solution? What are you using? Any suggestions?
I'm especially interested in solutions that do not require additional conversions. I can easily provide and consume numbers in the form of __int64 or char* (plain non-exponential form without thousands separators or a decimal point). Any other format requires additional conversion on my part.
What I have tried so far:
SQL_C_CHAR
Looks like it's going to work for me. I was worried about the variability of number formats, but in my use case it doesn't seem to matter: apparently only the decimal separator changes with system language settings.
And I don't see why I should use an explicit cast (e.g. TO_NUMBER) in the SQL INSERT or UPDATE command. Everything works fine when I bind the parameter with SQL_C_CHAR as the C type and SQL_NUMERIC (with the proper precision and scale) as the SQL type. I couldn't reproduce any data corruption effect.
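For illustration, a minimal sketch of such a binding (hstmt is a hypothetical statement handle prepared for a one-parameter INSERT; error handling omitted):
char buf[32];
SQLLEN len = SQL_NTS;
__int64 value = 123456789012345;   /* example value */

sprintf(buf, "%I64d", value);      /* MSVC format for __int64: plain decimal text */

SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
                 SQL_C_CHAR,       /* C type: character string */
                 SQL_NUMERIC,      /* SQL type of the column */
                 15,               /* column size: precision of NUMBER(15) */
                 0,                /* decimal digits: scale */
                 buf, sizeof(buf), &len);
/* buf must remain valid until SQLExecute() completes */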
SQL_NUMERIC_STRUCT
I've noticed that SQL_NUMERIC_STRUCT was added with ODBC 3.0 and decided to give it a try. I am disappointed.
In my situation it is sufficient, as the application doesn't really use fractional numbers. But as a general solution... I simply don't get it. I mean, I eventually understood how it is supposed to be used. What I don't get is why anyone would introduce a new struct of this kind and then make it work this way.
SQL_NUMERIC_STRUCT has all the fields needed to represent any NUMERIC (or NUMBER, or DECIMAL) value with its precision and scale. Except they are not used.
When reading, ODBC sets the precision of the number (based on the precision of the column; except that Oracle returns a bigger precision, e.g. 20 for NUMBER(15)). But if your column has a fractional part (scale > 0), it is truncated by default. To read a number with the proper scale you need to set the precision and scale yourself with a SQLSetDescField call before fetching the data.
When writing, Oracle thankfully respects the scale contained in SQL_NUMERIC_STRUCT. But the ODBC spec doesn't mandate it, and MS SQL Server ignores this value. So, it's back to SQLSetDescField again.
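The workaround (described in the articles linked below) looks roughly like this sketch for a hypothetical NUMBER(15,2) column, error handling omitted:
SQL_NUMERIC_STRUCT num;
SQLLEN ind;
SQLHDESC hdesc = NULL;

/* Bind the column as SQL_C_NUMERIC first... */
SQLBindCol(hstmt, 1, SQL_C_NUMERIC, &num, sizeof(num), &ind);

/* ...then override precision and scale on the application row descriptor. */
SQLGetStmtAttr(hstmt, SQL_ATTR_APP_ROW_DESC, &hdesc, 0, NULL);
SQLSetDescField(hdesc, 1, SQL_DESC_TYPE, (SQLPOINTER) SQL_C_NUMERIC, 0);
SQLSetDescField(hdesc, 1, SQL_DESC_PRECISION, (SQLPOINTER) 15, 0);
SQLSetDescField(hdesc, 1, SQL_DESC_SCALE, (SQLPOINTER) 2, 0);
/* Setting descriptor fields unbinds the buffer, so set the data pointer last. */
SQLSetDescField(hdesc, 1, SQL_DESC_DATA_PTR, &num, 0);

SQLFetch(hstmt);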
See HOWTO: Retrieving Numeric Data with SQL_NUMERIC_STRUCT and INF: How to Use SQL_C_NUMERIC Data Type with Numeric Data for more information.
Why doesn't ODBC fully use its own SQL_NUMERIC_STRUCT? I don't know. It looks like it works, but it's just too much work.
I guess I'll use SQL_C_CHAR.

My personal preference is to make the bind variables character strings (VARCHAR2) and let Oracle do the conversion from character to its own internal storage format. It's easy enough (in C) to get data values represented as null-terminated strings, in an acceptable format.
So, instead of writing SQL like this:
SET MY_NUMBER_COL = :b1
, MY_DATE_COL = :b2
I write the SQL like this:
SET MY_NUMBER_COL = TO_NUMBER( :b1 )
, MY_DATE_COL = TO_DATE( :b2 , 'YYYY-MM-DD HH24:MI:SS')
and supply character strings as the bind variables.
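In ODBC terms (the :b1/:b2 style above is OCI/Perl DBD flavored), the same idea with ? markers might look like this sketch (table, values, and the hstmt handle are made up):
char num_buf[]  = "123456789012345";
char date_buf[] = "2011-06-01 13:45:00";
SQLLEN n1 = SQL_NTS, n2 = SQL_NTS;

SQLPrepare(hstmt, (SQLCHAR *)
    "UPDATE my_table"
    "   SET my_number_col = TO_NUMBER(?)"
    "     , my_date_col   = TO_DATE(?, 'YYYY-MM-DD HH24:MI:SS')"
    " WHERE id = 1", SQL_NTS);

/* Both parameters travel as plain strings; Oracle converts them server-side. */
SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
                 sizeof(num_buf), 0, num_buf, sizeof(num_buf), &n1);
SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
                 sizeof(date_buf), 0, date_buf, sizeof(date_buf), &n2);
SQLExecute(hstmt);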
There are a couple of advantages to this approach.
One is that it works around the issues and bugs one encounters when binding other data types.
Another advantage is that bind values are easier to decipher on an Oracle event 10046 trace.
Also, an EXPLAIN PLAN (I believe) expects all bind variables to be VARCHAR2, which means the statement being explained is slightly different from the actual statement being executed (due to the implicit data conversions when the datatypes of the bind arguments in the actual statement are not VARCHAR2).
And (less important) when I'm testing the statement in TOAD, it's easier to just type strings into the input boxes, and not have to muck with changing the datatype in a dropdown list box.
I also let the built-in TO_NUMBER and TO_DATE functions validate the data. (In earlier versions of Oracle at least, I encountered issues with binding a DATE value directly: it bypassed at least some of the validity checking and allowed invalid date values to be stored in the database.)
This is just a personal preference, based on past experience. I use this same approach with Perl DBD.
I wonder what Tom Kyte (asktom.oracle.com) has to say about this topic?

Related

ORA-22835: Buffer too small and ORA-25137: Data value out of range

We are using a software that has limited Oracle capabilities. I need to filter through a CLOB field by making sure it has a specific value. Normally, outside of this software I would do something like:
DBMS_LOB.SUBSTR(t.new_value) = 'Y'
However, this isn't supported, so I'm attempting to use CAST instead. I've made many attempts, but so far this is what I've found:
The software has a built-in query checker/validator and these are the ones it shows as invalid:
DBMS_LOB.SUBSTR(t.new_value)
CAST(t.new_value AS VARCHAR2(10))
CAST(t.new_value AS NVARCHAR2(10))
However, the validator does accept these:
CAST(t.new_value AS VARCHAR(10))
CAST(t.new_value AS NVARCHAR(10))
CAST(t.new_value AS CHAR(10))
Unfortunately, even though the validator lets these through, when running the query to fetch data I get ORA-22835: Buffer too small when using VARCHAR or NVARCHAR, and ORA-25137: Data value out of range when using CHAR.
Are there other ways I could try to check that my CLOB field has a specific value when filtering the data? If not, how do I fix my current issues?
The error you're getting indicates that Oracle is trying to apply the CAST(t.new_value AS VARCHAR(10)) to a row where new_value has more than 10 characters. That makes sense given your description that new_value is a generic audit field holding values from a large number of different tables with a variety of data lengths. Given that, you'd need to structure the query in a way that forces the optimizer to reduce the set of rows the cast is applied to down to just those where new_value has a single character, before applying the cast.
Not knowing what sort of scope the software you're using provides for structuring your code, I'm not sure what options you have there. Be aware that depending on how robust you need this, the optimizer has quite a bit of flexibility to choose to apply predicates and functions on the projection in an arbitrary order. So even if you find an approach that works once, it may stop working in the future when statistics change or the database is upgraded and Oracle decides to choose a different plan.
Using this as sample data
create table tab1(col clob);
insert into tab1(col) values (rpad('x',3000,'y'));
You need to use dbms_lob.substr(col, 1) to get the first character (from the default offset = 1):
select dbms_lob.substr(col,1) from tab1;
DBMS_LOB.SUBSTR(COL,1)
----------------------
x
Note that the default amount (= length) of the substring is 32767, so using only DBMS_LOB.SUBSTR(COL) will return more than you expect.
CAST for a CLOB does not cut the string to the cast length, but (as you observed) raises the exception ORA-25137: Data value out of range if the original string is longer than the cast length.
As documented for the CAST function:
CAST does not directly support any of the LOB data types. When you use CAST to convert a CLOB value into a character data type or a BLOB value into the RAW data type, the database implicitly converts the LOB value to character or raw data and then explicitly casts the resulting value into the target data type. If the resulting value is larger than the target type, then the database returns an error.

Pandas read_sql changing large number IDs when reading

I transferred an Oracle database to SQL Server and all seems to have gone well. The various ID columns are large numbers, so I had to use decimal as they were too large for bigint.
I am now trying to read the data using pandas.read_sql over a pyodbc connection with ODBC Driver 17 for SQL Server: df = pandas.read_sql("SELECT * FROM table1", con)
The numbers come out as float64, and when I try to print them or use them in SQL statements they come out in scientific notation. When I try '{:.0f}'.format(df.loc[i,'Id']), it turns several numbers into the same number, such as 90300111000003078520832. It is as if precision is lost when the value goes to scientific notation.
I also tried pd.options.display.float_format = '{:.0f}'.format before the read_sql, but this did not help.
Clearly I must be doing something wrong, as the IDs in the database are correct.
Any help is appreciated. Thanks.
pandas' read_sql method has an option named coerce_float which defaults to True and it …
Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
However, in your case it is not useful, so simply specify coerce_float=False.
I've had this problem too, especially when working with long IDs: read_sql works fine for the primary key, but not for other columns (like the retweeted_status_id from Twitter API calls). Setting coerce_float to False did nothing for me, so instead I cast retweeted_status_id to a character format in my SQL query.
Using Postgres, I do:
df = pandas.read_sql("SELECT *, Id::text FROM table1", con)
But in SQL Server it'd be something like
df = pandas.read_sql("SELECT *, CONVERT(text, Id) FROM table1", con)
or
df = pandas.read_sql("SELECT *, CAST(Id AS varchar) FROM table1", con)
Obviously there's a cost here if you're asking to cast many rows, and a more efficient option might be to pull from SQL server without using pandas (as a nested list or JSON or something else) which will also preserve your long integer formats.

How is the format of the geography data type built in SQL Server?

I'm not able to understand the format of the geography data type in SQL Server...
For example I have the following data:
0xE6100000010CCEAACFD556484340B2F336363BCA21C0
what I know:
0x is the prefix for hexadecimal
the last 16 hex digits are the longitude: B2F336363BCA21C0 (an IEEE double)
the 16 hex digits before those are the latitude: CEAACFD556484340 (an IEEE double)
the first 4 hex digits are the SRID: E610 (hexadecimal for WGS84, i.e. 4326)
what I don't understand:
hex digits 5 to 12: 0000010C
what is this?
From what I've read this seems linked to WKB (Well-Known Binary) or EWKB (Extended Well-Known Binary); anyway, I was not able to find a definition for EWKB...
And in WKB this is supposed to be the geometry type (a 4-byte integer), but the value doesn't match the geometry type codes (this example is a single point coordinate).
Can you help me understand this format?
The spatial types (geometry and geography) in SQL Server are implemented as CLR data types. As with any such data type, you get a binary representation when you query the value directly. Unfortunately, it's not (as far as I know) WKB, but rather whatever format Microsoft decided was best for their implementation. We (the users) are meant to work with the published interface of methods (for instance the geography method reference). Which is to say that you should only try to decipher the MS binary representation out of curiosity (and not for actually working with it).
That said, if you need/want to work with WKB, you can! For example, you can use the STGeomFromWKB() static method to create a geography instance from WKB that you provide and STAsBinary() can be called on a geography instance to return WKB to you.
The Format spec can be found here:
https://msdn.microsoft.com/en-us/library/ee320529(v=sql.105).aspx
As that page shows, it used to change very frequently, but the pace has slowed significantly over the past two years.
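Reading that spec against the sample value in the question, here is a hedged C sketch of the single-point layout; the field meanings are my reading of the spec, so verify before relying on it:
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Decodes 0xE6100000010CCEAACFD556484340B2F336363BCA21C0, assuming:
     bytes 0-3   SRID as little-endian int32 (E6100000 -> 4326, i.e. WGS84)
     byte  4     format version (01)
     byte  5     property flags (0C = IsValid | IsSinglePoint)
     bytes 6-13  latitude as a little-endian IEEE 754 double
     bytes 14-21 longitude as a little-endian IEEE 754 double
   So the puzzling "0000010C" is the tail of the SRID plus the version
   and flag bytes, not a WKB geometry type code. */
void decode_geography_point(const unsigned char *blob)
{
    int32_t srid;
    double lat, lon;

    memcpy(&srid, blob, 4);        /* assumes a little-endian host */
    memcpy(&lat, blob + 6, 8);
    memcpy(&lon, blob + 14, 8);

    printf("SRID=%d version=%02X flags=%02X lat=%f lon=%f\n",
           srid, blob[4], blob[5], lat, lon);
}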
I currently need to dig into the spec to serialize from JVM code into a bcp file, so that I can use SQLServerBulkCopy rather than plain JDBC to upload data into tables (it is about 7x faster to write a bcp file than to use JDBC), but this is proving more complicated than I originally anticipated.
After testing with bcp: you can upload geographies by specifying an off-row format (varchar(max)) and storing the well-known text; SQL Server will see this and assume you wanted a geography based on the WKT it sees.
In my case converting to nvarchar resolved the issue.

How to read numeric/decimal/money columns from SQL Server?

I'm trying to read numeric/decimal/money columns from SQL Server via ODBC in the following manner:
SQL_NUMERIC_STRUCT decimal;
SQLGetData(hSqlStmt, iCol, SQL_C_NUMERIC, &decimal, sizeof(decimal), &indicator);
All these types are returned as a SQL_NUMERIC_STRUCT structure, and I specify the SQL_C_NUMERIC type to the SQLGetData() API.
In the database the column is defined as, for example, decimal(18, 4), or money. But the problem is that the returned data has decimal.precision always set to 38 (the maximum possible value) and decimal.scale always set to zero. So if the actual stored number is 12.65, the returned value in the SQL_NUMERIC_STRUCT structure is equal to 12: the fractional part is simply discarded.
What could I be doing wrong?
OK, this article explains the problem. The solution is so cumbersome, that I decided to avoid using SQL_NUMERIC_STRUCT altogether, influenced by this post. Now I specify SQL_C_WCHAR (SQL_C_CHAR would do as well) and read the numeric/decimal/money columns as text strings directly. Looks like the driver does the conversion.
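For reference, the workaround boils down to something like this sketch (buffer size arbitrary):
SQLWCHAR buf[64];
SQLLEN indicator;

/* Let the driver convert the numeric/decimal/money column to text;
   the string keeps the full scale, e.g. "12.6500" for decimal(18,4). */
SQLGetData(hSqlStmt, iCol, SQL_C_WCHAR, buf, sizeof(buf), &indicator);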

Is this sufficient to prevent query injection while using SQL Server?

I have recently taken on a project in which I need to integrate with PHP/SQL Server. I am looking for the quickest and easiest function to prevent SQL injection on SQL Server as I prefer MySQL and do not anticipate many more SQL Server related projects.
Is this function sufficient?
$someVal = mssql_escape($_POST['someVal']);
$query = "INSERT INTO tblName (field) VALUES ($someVal)";
mssql_query($query);

function mssql_escape($str) {
    return str_replace("'", "''", $str);
}
If not, what additional steps should I take?
EDIT:
I am running on a Linux server; sqlsrv_query() only works if your hosting environment is Windows.
The best option: do not use SQL statements that get concatenated together - use parametrized queries.
E.g. do not create something like
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(" + value1 + ", " + value2 + ")"
or something like that and then try to "sanitize" it by replacing single quotes - you'll never catch everything; someone will always find a way around your safeguarding.
Instead, use:
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(#value1, #value2)";
and then set the parameter values before executing this INSERT statement. This is really the only reliable way to avoid SQL injection - use it!
UPDATE: how to use parametrized queries from PHP - I found something here - does that help at all?
$tsql = "INSERT INTO DateTimeTable (myDate, myTime,
myDateTimeOffset, myDatetime2)
VALUES (?, ?, ?, ?)";
$params = array(
date("Y-m-d"), // Current date in Y-m-d format.
"15:30:41.987", // Time as a string.
date("c"), // Current date in ISO 8601 format.
date("Y-m-d H:i:s.u") // Current date and time.
);
$stmt = sqlsrv_query($conn, $tsql, $params);
So it seems you can't use "named" parameters like @value1, @value2; instead you just use a question mark ? for each parameter, and you basically create an array of parameter values which you then pass into the query.
This article Accessing SQL Server Databases with PHP might also help - it has a similar sample of how to insert data using the parametrized queries.
UPDATE: after you've revealed that you're on Linux, this approach doesn't work anymore. Instead, you need to use an alternate library in PHP to call a database - something like PDO.
PDO should work both on any *nix type operating system, and against all sorts of databases, including SQL Server, and it supports parametrized queries, too:
$db = new PDO('your-connection-string-here');
$stmt = $db->prepare("SELECT priv FROM testUsers WHERE username=:username AND password=:password");
$stmt->bindParam(':username', $user);
$stmt->bindParam(':password', $pass);
$stmt->execute();
No, it's not sufficient. To my knowledge, string replacement can never really be sufficient in general (on any platform).
To prevent SQL injection, all queries need to be parameterized - either as parameterized queries or as stored procedures with parameters.
In these cases, the database calling library (e.g. ADO.NET with SqlCommand) sends the parameters separately from the query, and the server applies them, which eliminates the possibility of the actual SQL being altered in any way. This has numerous benefits besides injection prevention, including avoiding code-page issues and date-conversion issues; for that matter, any conversion to string can be problematic if the server does not expect it done the way the client does it.
I partially disagree with the other posters. If you run all your parameters through a function that doubles the quotes, this should prevent any possible injection attack. Actually, in practice the more frequent problem is not deliberate sabotage but queries that break because a value legitimately includes a single quote, like a customer named "O'Hara" or a comment field of "Don't call Sally before 9:00". Anyway, I do escapes like this all the time and have never had a problem.
One caveat: on some database engines there could be other dangerous characters besides the single quote. The only example I know of is Postgres, where the backslash is magic. In this case your escape function must also double backslashes. Check the documentation.
I have nothing against using prepared statements, and for simple cases, where the only thing that changes is the value of the parameter, they are an excellent solution. But I routinely find that I have to build queries in pieces based on conditions in the program, like: if parameter X is not null, then not only do I need to add it to the WHERE clause, but I also need an additional join to get to the value I really need to test. Prepared statements can't handle this. You could, of course, build the SQL in pieces, turn it into a prepared statement, and then supply the parameters. But this is just a pain for no clear gain.
These days I mostly code in Java, which allows functions to be overloaded, that is, to have multiple implementations depending on the type of the passed-in parameter. So I routinely write a set of functions, normally named simply "q" for "quote", that return the given value suitably quoted. For strings, it doubles any quote marks, then wraps the whole thing in quote marks. For integers it just returns the string representation of the integer. For dates it converts to the JDBC (Java SQL) standard date format, which the driver is then supposed to convert to whatever is needed for the specific database being used. Etc. (On my current project I even included array as a passed-in type, which I convert to a format suitable for use in an IN clause.) Then every time I want to include a field in a SQL statement, I just write "q(x)". As this slaps quotes on when necessary, I don't need extra string manipulation to add quotes, so it's probably just as easy as not escaping at all.
For example, vulnerable way:
String myquery="select name from customer where customercode='"+custcode+"'";
Safe way:
String myquery="select name from customer where customercode="+q(custcode);
The right way is not particularly more to type than the wrong way, so it's easy to get in a good habit.
String replacement to escape quotes IS sufficient to prevent SQL injection attack vectors.
This only applies to SQL Server when QUOTED_IDENTIFIER is ON, and when you don't do something stoopid to your escaped string, such as truncating it or translating your Unicode string to an 8-bit string after escaping. In particular, make sure QUOTED_IDENTIFIER is set to ON; usually that's the default, but it may depend on the library you are using in PHP to access MSSQL.
Parameterization is a best practice, but there is nothing inherently insecure about escaping quotes to prevent SQL injection, with due care.
The real issue with escaping strings is not the efficacy of the replacement; it is the potential for forgetting to do the replacement every time.
That said, your code escapes the value, but does not wrap the value in quotes. You need something like this instead:
function mssql_escape($str) {
    return "N'" . str_replace("'", "''", $str) . "'";
}
The N above allows you to pass higher Unicode characters. If that's not a concern (i.e., your text fields are varchar rather than nvarchar), you can remove the N.
Now, if you do this, there are some caveats:
You need to make DAMNED SURE you call mssql_escape for every string value. And therein lies the rub.
Dates and GUID values also need escaping in the same manner.
You should validate numeric values, or at least escape them as well using the same function (MSSQL will cast the string to the appropriate numeric type).
Again, like others have said, parameterized queries are safer--not because escaping quotes doesn't work (it does except as noted above), but because it's easier to visually make sure you didn't forget to escape something.
