JCR SQL2 query with a property containing '.' char - jackrabbit

I'm currently migrating away form Xpath queries to JCR SQL2 ones.
The main problem I'm experiencing is that many of my properties contains the '.' char that is reserved and I couldn't find a proper way to escape it.
JCR offers a simple way to bind values, but I couldn't find any option for doing the same with properties.

The '.' character is not used in JCR names as a metacharacter, and therefore has no need to be escaped. JCR-SQL2 queries should allow such characters to be unescaped, though you can always try to surround the (qualified or expanded) property names with square brackets.
SELECT [acme:property.1], [acme:property.2] FROM [acme:myType] ...

Related

Escape special characters for Oracle and SQL Server in the same query

I have following query:
SELECT *
FROM PRODUCTS
WHERE REDUCTION LIKE '50%'
I'm required to use the LIKE clause. This query needs to run on both Oracle and SQL Server.
Now there is an issue because I want to match all products with a reduction of 50%. But the data might contain a reduction of 50.50%. Because '%' is a special character it matches both of them.
I want to escape all special characters, like the % in my query so that I only get the products with 50% reduction.
Is there an uniform solution to escape special characters on a dynamical way for both Oracle and SQL server?
Using a backslash is not a solution, because we don't know in practice what the input will be.
The ESCAPE clause works in Oracle and SQL Server.
As for your input, you need to replace the all occurrences of % with \% (preferably before passing the value to RDBMs). You can do this inside a query as well since, fortunately, Oracle REPLACE and SQL Server REPLACE functions have similar signature:
CREATE TABLE tests(test VARCHAR(100));
INSERT INTO tests VALUES('%WINDIR%\SYSTEM32');
SELECT *
FROM tests
WHERE test LIKE REPLACE(REPLACE('%WINDIR%\SYSTEM32', '\', '\\'), '%', '\%') ESCAPE '\'
The ESCAPE clause identifies the backslash (\) as the escape character
SELECT *
FROM PRODUCTS
WHERE REDUCTION LIKE '50\%'
You'll need something like the first answer above, but you don't need to use a \ as the escape. You can choose whatever you want using the ESCAPE clause.
But if:
users are allowed to enter wildcards;
and you need to use LIKE;
and you don't want them treated like wildcards;
then you have to escape them somehow.
Perhaps you can reserve some char you know the user will not need and make that the escape char.
As far as I can tell in Oracle you only need to escape the percent (%) and the underbar (_).
In SQL Server you also have to consider brackets.
A good thing is that overescaping does not look like it will cause problems, so even though you don't need to espace brackets in Oracle, doing so is ok.

How to escape special characters when retrieving data from database?

I am going to generate XML file based on the data returned from SQL Server, but there are some special characters like  and  (there may be other characters like these), which will fail the XML.
Is there any way to escape them?
Thanks!
The control characters U+001C (file separator) and U+001F (unit separator) are not legal to include in an XML 1.0 document, whether verbatim or encoded using a &#...; numeric character reference.
They are allowed in XML 1.1 documents only when included as a character reference. However, XML 1.1 is not nearly as widely accepted as 1.0, and you can't have U+0000 (null) even as a character reference, so it's still not possible to put arbitrary binary data in an XML file — not that it was ever a good idea.
If you want to include data bytes in an XML file you should generally be using an ad hoc encoding of your own that is accepted by all consumers of your particular type of document. It is common to use base64 for the purpose of putting binary data into XML. For formats that do not accommodate any such special encoding scheme, you simply cannot insert these control characters.
What is the purpose of the control characters?
The exact same way you're escaping any other user-supplied input prior to insertion into a database; probably one of (from worst to best):
Escaping control characters prior to construction of an SQL statement
Use of parameterised queries
Use of a DAO or ORM which abstracts this problem away from you
Use parametrized queries and you won't have to worry about escaping. Can't really give you more help as to how to use them unless you mention which language you're using.
Well, I just use the pattern matching stuff to replace those special characters manually. Match for '&#.+?;'

Is this sufficient to prevent query injection while using SQL Server?

I have recently taken on a project in which I need to integrate with PHP/SQL Server. I am looking for the quickest and easiest function to prevent SQL injection on SQL Server as I prefer MySQL and do not anticipate many more SQL Server related projects.
Is this function sufficient?
$someVal = mssql_escape($_POST['someVal']);
$query = "INSERT INTO tblName SET field = $someVal";
mssql_execute($query);
function mssql_escape($str) {
return str_replace("'", "''", $str);
}
If not, what additional steps should I take?
EDIT:
I am running on a Linux server - sqlsrv_query() only works if your hosting environment is windows
The best option: do not use SQL statements that get concatenated together - use parametrized queries.
E.g. do not create something like
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(" + value1 + ", " + value2 + ")"
or something like that and then try to "sanitize" it by replacing single quotes or something - you'll never catch everything, someone will always find a way around your "safe guarding".
Instead, use:
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(#value1, #value2)";
and then set the parameter values before executing this INSERT statement. This is really the only reliable way to avoid SQL injection - use it!
UPDATE: how to use parametrized queries from PHP - I found something here - does that help at all?
$tsql = "INSERT INTO DateTimeTable (myDate, myTime,
myDateTimeOffset, myDatetime2)
VALUES (?, ?, ?, ?)";
$params = array(
date("Y-m-d"), // Current date in Y-m-d format.
"15:30:41.987", // Time as a string.
date("c"), // Current date in ISO 8601 format.
date("Y-m-d H:i:s.u") // Current date and time.
);
$stmt = sqlsrv_query($conn, $tsql, $params);
So it seems you can't use "named" parameters like #value1, #value2, but instead you just use question marks ? for each parameter, and you basically just create a parameter array which you then pass into the query.
This article Accessing SQL Server Databases with PHP might also help - it has a similar sample of how to insert data using the parametrized queries.
UPDATE: after you've revealed that you're on Linux, this approach doesn't work anymore. Instead, you need to use an alternate library in PHP to call a database - something like PDO.
PDO should work both on any *nix type operating system, and against all sorts of databases, including SQL Server, and it supports parametrized queries, too:
$db = new PDO('your-connection-string-here');
$stmt = $db->prepare("SELECT priv FROM testUsers WHERE username=:username AND password=:password");
$stmt->bindParam(':username', $user);
$stmt->bindParam(':password', $pass);
$stmt->execute();
No, it's not sufficient. To my knowledge, string replacement can never really be sufficient in general (on any platform).
To prevent SQL injection, all queries need to be parameterized - either as parameterized queries or as stored procedures with parameters.
In these cases, the database calling library (i.e. ADO.NET and SQL Command) sends the parameters separately from the query and the server applies them, which eliminates the ability for the actual SQL to be altered in any way. This has numerous benefits besides injection, which include code page issues and date conversion issues - for that matter any conversions to string can be problematic if the server does not expect them done the way the client does them.
I partially disagree with other posters. If you run all your parameters through a function that double the quotes, this should prevent any possible injection attack. Actually in practice the more frequent problem is not deliberate sabotague but queries that break because a value legitimately includes a single quote, like a customer named "O'Hara" or a comment field of "Don't call Sally before 9:00". Anyway, I do escapes like this all the time and have never had a problem.
One caveat: On some database engines, there could be other dangerous characters besides a single quote. The only example I know is Postgres, where the backslash is magic. In this case your escape function must also double backslashes. Check the documentation.
I have nothing against using prepared statements, and for simple cases, where the only thing that changes is the value of the parameter, they are an excellent solution. But I routinely find that I have to build queries in pieces based on conditions in the program, like if parameter X is not null then not only do I need to add it to the where clause but I also need an additional join to get to the value I really need to test. Prepared statements can't handle this. You could, of course, build the SQL in pieces, turn it into a prepared statement, and then supply the parameters. But this is just a pain for no clear gain.
These days I mostly code in Java that allows functions to be overloaded, that is, have multiple implementations depending on the type of the passed in parameter. So I routine write a set of functions that I normally name simply "q" for "quote", that return the given type, suitably quoted. For strings, it doubles any quote marks, then slaps quote marks around the whole thing. For integers it just returns the string representation of the integer. For dates it converts to the JDBC (Java SQL) standard date format, which the driver is then supposed to convert to whatever is needed for the specific database being used. Etc. (On my current project I even included array as a passed in type, which I convert to a format suitable for use in an IN clause.) Then every time I want to include a field in a SQL statement, I just write "q(x)". As this is slapping quotes on when necessary, I don't need the extra string manipulation to put on quotes, so it's probably just as easy as not doing the escape.
For example, vulnerable way:
String myquery="select name from customer where customercode='"+custcode+"'";
Safe way:
String myquery="select name from customer where customercode="+q(custcode);
The right way is not particularly more to type than the wrong way, so it's easy to get in a good habit.
String replacement to escape quotes IS sufficient to prevent SQL injection attack vectors.
This only applies to SQL Server when QUOTED_IDENTIFIER is ON, and when you don't do something stoopid to your escaped string, such as truncating it or translating your Unicode string to an 8-bit string after escaping. In particular, you need to make sure QUOTED_IDENTIFIER is set to ON. Usually that's the default, but it may depend on the library you are using in PHP to access MSSQL.
Parameterization is a best practice, but there is nothing inherently insecure about escaping quotes to prevent SQL injection, with due care.
The rel issue with escaping strings is not the efficacy of the replacement, it is the potential for forgetting to do the replacement every time.
That said, your code escapes the value, but does not wrap the value in quotes. You need something like this instead:
function mssql_escape($str) {
return "N'" + str_replace("'", "''", $str) + "'";
}
The N above allows you to pass higher Unicode characters. If that's not a concern (i.e., your text fields are varchar rather than nvarchar), you can remove the N.
Now, if you do this, there are some caveats:
You need to make DAMNED SURE you call mssql_escape for every string value. And therein lies the rub.
Dates and GUID values also need escaping in the same manner.
You should validate numeric values, or at least escape them as well using the same function (MSSQL will cast the string to the appropriate numeric type).
Again, like others have said, parameterized queries are safer--not because escaping quotes doesn't work (it does except as noted above), but because it's easier to visually make sure you didn't forget to escape something.

When naming columns in a SQL Server table, are there any names I should avoid using?

I remember when I was working with PHP several years back I could blow up my application by naming a MySQL column 'desc' or any other term that was used as an operator.
So, in general are there names I should avoid giving my table columns?
As long as you surround every column name with '[' and ']', it really doesn't matter what you use. Even a space works (try it: [ ]).
Edit: If you can't use '[' and ']' in every case, check the documentation for characters that are not allowable as well as keywords that are intrinsic to the system; those would be out of bounds. Off the top of my head, the characters allowed (for SqlServer) for an identifier are: a-z, A-Z, 0-9, _, $, #.
in general don't start with a number, don't use spaces, don't use reserved words and don't use non alphanumeric characters
however if you really want to you can still do it but you need to surround it with brackets
this will fail
create table 1abc (id int)
this will not fail
create table [1abc] (id int)
but now you need to use [] all the time, I would avoid names as the ones I mentioned above
Check the list of reserved keywords as indicated in other answers.
Also avoid using the "quoting" using quotes or square brackets for the sake of having a space or other special character in the object name. The reason is that when quoted the object name becomes case sensitive in some database engines (not sure about MSSQL though)
Some teams use the prefix for database objects (tables, views, columns) like T_PERSON, V_PERSON, C_NAME etc. I personally do not like this convention, but it does help avoiding keyword issues.
You should avoid any reserved SQL keywords (ex. SELECT) and from a best practices should avoid spaces.
Yes, and no.
Yes, because it's annoying and confusing to have names that match keywords, and that you have to escape in funny ways (when you're not consistently escaping)
and No, because it's possible to have any sequence of characters as an identifier, if you escape it properly :)
Use [square brackets] or "double quotes" to escape multi-word identifiers or keywords, or even names that have backslashes or any other slightly odd character, if you must.
Strictly speaking, there's nothing you can't name your columns. However, it will make your life easier if you avoid names with spaces, SQL reserved words, and reserved words in the language you're programming in.
You can use pretty much anything as long as you surround it with square brackets:
SELECT [value], [select], [insert] FROM SomeTable
I however like to avoid doing this, partly because typing square brackets everywhere is anoying and partyly because I dont generally find that column names like 'value' particularly descriptive! :-)
Just stay away from SQL keywords and anything which contains something other than letters and you shouldn't need to use those pesky square brackets.
You can surround a word in square brackets [] and basically use anything you'd like.
I prefer not to use the brackets, and in order to do so you just have to avoid reserved words.
MS SQL Server 2008 has these reserved words
Beware of using square brackets on updates, I had a problem using the following query:
UPDATE logs SET locked=1 WHERE [id] IN (SELECT [id] FROM ids)
This caused all records to be updated, however, this appears to work fine:
UPDATE logs SET locked=1 WHERE id IN (SELECT [id] FROM ids)
Note that this problem appears specific to updates, as the following returns only the rows expected (not all rows):
SELECT * FROM logs WHERE [id] IN (SELECT [id] FROM ids)
This was using MSDE 2000 SP3 and connecting to the database using MS SQL (2000) Query Analyzer V 8.00.194
Very odd, possibly related to this Knowledgebase bug http://support.microsoft.com/kb/140215
In the end I just removed all the unnecessary square brackets.

How do you get leading wildcard full-text searches to work in SQL Server?

Note: I am using SQL's Full-text search capabilities, CONTAINS clauses and all - the * is the wildcard in full-text, % is for LIKE clauses only.
I've read in several places now that "leading wildcard" searches (e.g. using "*overflow" to match "stackoverflow") is not supported in MS SQL. I'm considering using a CLR function to add regex matching, but I'm curious to see what other solutions people might have.
More Info: You can add the asterisk only at the end of the word or phrase. - along with my empirical experience: When matching "myvalue", "my*" works, but "(asterisk)value" returns no match, when doing a query as simple as:
SELECT * FROM TABLENAME WHERE CONTAINS(TextColumn, '"*searchterm"');
Thus, my need for a workaround. I'm only using search in my site on an actual search page - so it needs to work basically the same way that Google works (in the eyes on a Joe Sixpack-type user). Not nearly as complicated, but this sort of match really shouldn't fail.
Workaround only for leading wildcard:
store the text reversed in a different field (or in materialised view)
create a full text index on this column
find the reversed text with an *
SELECT *
FROM TABLENAME
WHERE CONTAINS(TextColumnREV, '"mrethcraes*"');
Of course there are many drawbacks, just for quick workaround...
Not to mention CONTAINSTABLE...
The problem with leading Wildcards: They cannot be indexed, hence you're doing a full table scan.
It is possible to use the wildcard "*" at the end of the word or phrase (prefix search).
For example, this query will find all "datab", "database", "databases" ...
SELECT * FROM SomeTable WHERE CONTAINS(ColumnName, '"datab*"')
But, unforutnately, it is not possible to search with leading wildcard.
For example, this query will not find "database"
SELECT * FROM SomeTable WHERE CONTAINS(ColumnName, '"*abase"')
To perhaps add clarity to this thread, from my testing on 2008 R2, Franjo is correct above. When dealing with full text searching, at least when using the CONTAINS phrase, you cannot use a leading , only a trailing functionally. * is the wildcard, not % in full text.
Some have suggested that * is ignored. That does not seem to be the case, my results seem to show that the trailing * functionality does work. I think leading * are ignored by the engine.
My added problem however is that the same query, with a trailing *, that uses full text with wildcards worked relatively fast on 2005(20 seconds), and slowed to 12 minutes after migrating the db to 2008 R2. It seems at least one other user had similar results and he started a forum post which I added to... FREETEXT works fast still, but something "seems" to have changed with the way 2008 processes trailing * in CONTAINS. They give all sorts of warnings in the Upgrade Advisor that they "improved" FULL TEXT so your code may break, but unfortunately they do not give you any specific warnings about certain deprecated code etc. ...just a disclaimer that they changed it, use at your own risk.
http://social.msdn.microsoft.com/Forums/ar-SA/sqlsearch/thread/7e45b7e4-2061-4c89-af68-febd668f346c
Maybe, this is the closest MS hit related to these issues... http://msdn.microsoft.com/en-us/library/ms143709.aspx
One thing worth keeping in mind is that leading wildcard queries come at a significant performance premium, compared to other wildcard usages.
Note: this was the answer I submitted for the original version #1 of the question before the CONTAINS keyword was introduced in revision #2. It's still factually accurate.
The wildcard character in SQL Server is the % sign and it works just fine, leading, trailing or otherwise.
That said, if you're going to be doing any kind of serious full text searching then I'd consider utilising the Full Text Index capabilities. Using % and _ wild cards will cause your database to take a serious performance hit.
Just FYI, Google does not do any substring searches or truncation, right or left. They have a wildcard character * to find unknown words in a phrase, but not a word.
Google, along with most full-text search engines, sets up an inverted index based on the alphabetical order of words, with links to their source documents. Binary search is wicked fast, even for huge indexes. But it's really really hard to do a left-truncation in this case, because it loses the advantage of the index.
As a parameter in a stored procedure you can use it as:
ALTER procedure [dbo].[uspLkp_DrugProductSelectAllByName]
(
#PROPRIETARY_NAME varchar(10)
)
as
set nocount on
declare #PROPRIETARY_NAME2 varchar(10) = '"' + #PROPRIETARY_NAME + '*"'
select ldp.*, lkp.DRUG_PKG_ID
from Lkp_DrugProduct ldp
left outer join Lkp_DrugPackage lkp on ldp.DRUG_PROD_ID = lkp.DRUG_PROD_ID
where contains(ldp.PROPRIETARY_NAME, #PROPRIETARY_NAME2)
When it comes to full-text searching, for my money nothing beats Lucene. There is a .Net port available that is compatible with indexes created with the Java version.
There's a little work involved in that you have to create/maintain the indexes, but the search speed is fantastic and you can create all sorts of interesting queries. Even indexing speed is pretty good - we just completely rebuild our indexes once a day and don't worry about updating them.
As an example, this search functionality is powered by Lucene.Net.
Perhaps the following link will provide the final answer to this use of wildcards: Performing FTS Wildcard Searches.
Note the passage that states: "However, if you specify “Chain” or “Chain”, you will not get the expected result. The asterisk will be considered as a normal punctuation mark not a wildcard character. "
If you have access to the list of words of the full text search engine, you could do a 'like' search on this list and match the database with the words found, e.g. a table 'words' with following words:
pie
applepie
spies
cherrypie
dog
cat
To match all words containing 'pie' in this database on a fts table 'full_text' with field 'text':
to-match <- SELECT word FROM words WHERE word LIKE '%pie%'
matcher = ""
a = ""
foreach(m, to-match) {
matcher += a
matcher += m
a = " OR "
}
SELECT text FROM full_text WHERE text MATCH matcher
% Matches any number of characters
_ Matches a single character
I've never used Full-Text indexing but you can accomplish rather complex and fast search queries with simply using the build in T-SQL string functions.
From SQL Server Books Online:
To write full-text queries in
Microsoft SQL Server 2005, you must
learn how to use the CONTAINS and
FREETEXT Transact-SQL predicates, and
the CONTAINSTABLE and FREETEXTTABLE
rowset-valued functions.
That means all of the queries written above with the % and _ are not valid full text queries.
Here is a sample of what a query looks like when calling the CONTAINSTABLE function.
SELECT RANK , * FROM TableName ,
CONTAINSTABLE (TableName, *, '
"*WildCard" ') searchTable WHERE
[KEY] = TableName.pk ORDER BY
searchTable.RANK DESC
In order for the CONTAINSTABLE function to know that I'm using a wildcard search, I have to wrap it in double quotes. I can use the wildcard character * at the beginning or ending. There are a lot of other things you can do when you're building the search string for the CONTAINSTABLE function. You can search for a word near another word, search for inflectional words (drive = drives, drove, driving, and driven), and search for synonym of another word (metal can have synonyms such as aluminum and steel).
I just created a table, put a full text index on the table and did a couple of test searches and didn't have a problem, so wildcard searching works as intended.
[Update]
I see that you've updated your question and know that you need to use one of the functions.
You can still search with the wildcard at the beginning, but if the word is not a full word following the wildcard, you have to add another wildcard at the end.
Example: "*ildcar" will look for a single word as long as it ends with "ildcar".
Example: "*ildcar*" will look for a single word with "ildcar" in the middle, which means it will match "wildcard". [Just noticed that Markdown removed the wildcard characters from the beginning and ending of my quoted string here.]
[Update #2]
Dave Ward - Using a wildcard with one of the functions shouldn't be a huge perf hit. If I created a search string with just "*", it will not return all rows, in my test case, it returned 0 records.

Resources