How to escape special characters when retrieving data from database? - sql-server

I am going to generate XML file based on the data returned from SQL Server, but there are some special characters like  and  (there may be other characters like these), which will fail the XML.
Is there any way to escape them?
Thanks!

The control characters U+001C (file separator) and U+001F (unit separator) are not legal to include in an XML 1.0 document, whether verbatim or encoded using a &#...; numeric character reference.
They are allowed in XML 1.1 documents only when included as a character reference. However, XML 1.1 is not nearly as widely accepted as 1.0, and you can't have U+0000 (null) even as a character reference, so it's still not possible to put arbitrary binary data in an XML file — not that it was ever a good idea.
If you want to include data bytes in an XML file you should generally be using an ad hoc encoding of your own that is accepted by all consumers of your particular type of document. It is common to use base64 for the purpose of putting binary data into XML. For formats that do not accommodate any such special encoding scheme, you simply cannot insert these control characters.
What is the purpose of the control characters?

The exact same way you're escaping any other user-supplied input prior to insertion into a database; probably one of (from worst to best):
Escaping control characters prior to construction of an SQL statement
Use of parameterised queries
Use of a DAO or ORM which abstracts this problem away from you

Use parametrized queries and you won't have to worry about escaping. Can't really give you more help as to how to use them unless you mention which language you're using.

Well, I just use the pattern matching stuff to replace those special characters manually. Match for '&#.+?;'

Related

SAP FM EPS2_GET_DIRECTORY_LISTING file mask

The FM EPS2_GET_DIRECTORY_LISTING has a parameter file_mask which I guess that it should act as a pattern. I need to read from the AS the files containing a word but the file_mask is working faultly. For example if I pass "*ZIP" it returns a file named '.TXT'. Is there a proper way to use that parameter?
The parameters are described in SAP note 1860206 which I will not quote here because I'm not sure about the copyright status. However, wildcards generally do not work as expected in this case - your best bet is to read without the parameter and filter the table afterwards.
I had similar problem but due to poor(eg. * wildcard can be used only at the and of the file mask string :/ ) implementation of standard FILE_MASK-based filtering feature in EPS2_GET_DIRECTORY_LISTING I ended up with the solution where I read entire directory content and then process it with regular expressions to find matching files/directories.

PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a bytea file which contains certain special characters (regional language characters - UTF8 encoded). But I am not able to store the data as input by the user.
For example :
what I get in request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in value tag. With this issue I am not able to retrieve the proper text input by the user.
Please suggest what do I need to do?
PS: My DB is UTF8 encoded.
The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that - change the settings of the DB driver or chose different different encoding/escaping for bytea.
Or just use proper field types for the XML data - like varchar or XML.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal encoding, so postgres stores your data correctly. PostgreSQL escapes all non printable characters, for more details see postgresql documentation, especially section 8.4.2

DocViewProperty XML Serialization

DocViewProperty#format can be used to serialize a Java string to the JCR docview XML format. However, the output of that method still appears to respect carriage-return/line-feed rather than using a hex escaped control char. This means that even after I use DocViewProperty#format, I won't necessarily get the same String as what may show up in a .content.xml file in a CQ5 package -- where that whitespace is escaped. What in the serialization mechanism performs that work and how can I use that rather than rolling up a close approximation of it on my own?

Is this sufficient to prevent query injection while using SQL Server?

I have recently taken on a project in which I need to integrate with PHP/SQL Server. I am looking for the quickest and easiest function to prevent SQL injection on SQL Server as I prefer MySQL and do not anticipate many more SQL Server related projects.
Is this function sufficient?
$someVal = mssql_escape($_POST['someVal']);
$query = "INSERT INTO tblName SET field = $someVal";
mssql_execute($query);
function mssql_escape($str) {
return str_replace("'", "''", $str);
}
If not, what additional steps should I take?
EDIT:
I am running on a Linux server - sqlsrv_query() only works if your hosting environment is windows
The best option: do not use SQL statements that get concatenated together - use parametrized queries.
E.g. do not create something like
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(" + value1 + ", " + value2 + ")"
or something like that and then try to "sanitize" it by replacing single quotes or something - you'll never catch everything, someone will always find a way around your "safe guarding".
Instead, use:
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(#value1, #value2)";
and then set the parameter values before executing this INSERT statement. This is really the only reliable way to avoid SQL injection - use it!
UPDATE: how to use parametrized queries from PHP - I found something here - does that help at all?
$tsql = "INSERT INTO DateTimeTable (myDate, myTime,
myDateTimeOffset, myDatetime2)
VALUES (?, ?, ?, ?)";
$params = array(
date("Y-m-d"), // Current date in Y-m-d format.
"15:30:41.987", // Time as a string.
date("c"), // Current date in ISO 8601 format.
date("Y-m-d H:i:s.u") // Current date and time.
);
$stmt = sqlsrv_query($conn, $tsql, $params);
So it seems you can't use "named" parameters like #value1, #value2, but instead you just use question marks ? for each parameter, and you basically just create a parameter array which you then pass into the query.
This article Accessing SQL Server Databases with PHP might also help - it has a similar sample of how to insert data using the parametrized queries.
UPDATE: after you've revealed that you're on Linux, this approach doesn't work anymore. Instead, you need to use an alternate library in PHP to call a database - something like PDO.
PDO should work both on any *nix type operating system, and against all sorts of databases, including SQL Server, and it supports parametrized queries, too:
$db = new PDO('your-connection-string-here');
$stmt = $db->prepare("SELECT priv FROM testUsers WHERE username=:username AND password=:password");
$stmt->bindParam(':username', $user);
$stmt->bindParam(':password', $pass);
$stmt->execute();
No, it's not sufficient. To my knowledge, string replacement can never really be sufficient in general (on any platform).
To prevent SQL injection, all queries need to be parameterized - either as parameterized queries or as stored procedures with parameters.
In these cases, the database calling library (i.e. ADO.NET and SQL Command) sends the parameters separately from the query and the server applies them, which eliminates the ability for the actual SQL to be altered in any way. This has numerous benefits besides injection, which include code page issues and date conversion issues - for that matter any conversions to string can be problematic if the server does not expect them done the way the client does them.
I partially disagree with other posters. If you run all your parameters through a function that double the quotes, this should prevent any possible injection attack. Actually in practice the more frequent problem is not deliberate sabotague but queries that break because a value legitimately includes a single quote, like a customer named "O'Hara" or a comment field of "Don't call Sally before 9:00". Anyway, I do escapes like this all the time and have never had a problem.
One caveat: On some database engines, there could be other dangerous characters besides a single quote. The only example I know is Postgres, where the backslash is magic. In this case your escape function must also double backslashes. Check the documentation.
I have nothing against using prepared statements, and for simple cases, where the only thing that changes is the value of the parameter, they are an excellent solution. But I routinely find that I have to build queries in pieces based on conditions in the program, like if parameter X is not null then not only do I need to add it to the where clause but I also need an additional join to get to the value I really need to test. Prepared statements can't handle this. You could, of course, build the SQL in pieces, turn it into a prepared statement, and then supply the parameters. But this is just a pain for no clear gain.
These days I mostly code in Java that allows functions to be overloaded, that is, have multiple implementations depending on the type of the passed in parameter. So I routine write a set of functions that I normally name simply "q" for "quote", that return the given type, suitably quoted. For strings, it doubles any quote marks, then slaps quote marks around the whole thing. For integers it just returns the string representation of the integer. For dates it converts to the JDBC (Java SQL) standard date format, which the driver is then supposed to convert to whatever is needed for the specific database being used. Etc. (On my current project I even included array as a passed in type, which I convert to a format suitable for use in an IN clause.) Then every time I want to include a field in a SQL statement, I just write "q(x)". As this is slapping quotes on when necessary, I don't need the extra string manipulation to put on quotes, so it's probably just as easy as not doing the escape.
For example, vulnerable way:
String myquery="select name from customer where customercode='"+custcode+"'";
Safe way:
String myquery="select name from customer where customercode="+q(custcode);
The right way is not particularly more to type than the wrong way, so it's easy to get in a good habit.
String replacement to escape quotes IS sufficient to prevent SQL injection attack vectors.
This only applies to SQL Server when QUOTED_IDENTIFIER is ON, and when you don't do something stoopid to your escaped string, such as truncating it or translating your Unicode string to an 8-bit string after escaping. In particular, you need to make sure QUOTED_IDENTIFIER is set to ON. Usually that's the default, but it may depend on the library you are using in PHP to access MSSQL.
Parameterization is a best practice, but there is nothing inherently insecure about escaping quotes to prevent SQL injection, with due care.
The rel issue with escaping strings is not the efficacy of the replacement, it is the potential for forgetting to do the replacement every time.
That said, your code escapes the value, but does not wrap the value in quotes. You need something like this instead:
function mssql_escape($str) {
return "N'" + str_replace("'", "''", $str) + "'";
}
The N above allows you to pass higher Unicode characters. If that's not a concern (i.e., your text fields are varchar rather than nvarchar), you can remove the N.
Now, if you do this, there are some caveats:
You need to make DAMNED SURE you call mssql_escape for every string value. And therein lies the rub.
Dates and GUID values also need escaping in the same manner.
You should validate numeric values, or at least escape them as well using the same function (MSSQL will cast the string to the appropriate numeric type).
Again, like others have said, parameterized queries are safer--not because escaping quotes doesn't work (it does except as noted above), but because it's easier to visually make sure you didn't forget to escape something.

Any python/django function to check whether a string only contains characters included in my database collation?

As expected, I get an error when entering some characters not included in my database collation:
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
Is there any function I could use to make sure a string only contains characters existing in my database collation?
thanks
You can use a regular expression to only allow certain characters. The following allows only letters, numbers and _(underscore), but you can change to include whatever you want:
import re
exp = '^[A-Za-z0-9_]+$'
re.match(exp, my_string)
If an object is returned a match is found, if no return value, invalid string.
I'd look at Python's unicode.translate() and codec.encode() functions. Both of these would allow more elegant handling of non-legal input characters, and IIRC, translate() has been shown to be faster than a regexp for similar use-cases (should be easy to google the findings).
From Python's docs:
"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
http://docs.python.org/library/stdtypes.html
http://docs.python.org/library/codecs.html

Resources