How do SQL parameters work internally? - sql-server

A coworker and I were browsing SO when we came across a question about SQL Injection, and it got us wondering: how do parametrized queries work internally? Does the API you are using (assuming it supports parametrized queries) perform concatenation, combining the query with the parameters? Or do the parameters make it to the SQL engine separately from the query, and no concatenation is performed at all?
Google hasn't been very helpful, but maybe we haven't searched for the right thing.

The parameters make it to the SQL engine separately from the query. An execution plan is calculated (or reused) for the parametrized query, and then the query is executed by the SQL engine with the parameters.

Parameters make it to the SQL server intact, individually "packaged" with metadata indicating their type, whether they are Input or Output, etc. As Alex Reitbort points out, this is because parametrized statements are a server-level concept, not merely a convenient way of invoking commands from various connection layers.

I doubt that SQL SERVER builds a complete query string from the given parametrized query where the parameter list is concatenated in.
It most likely parses the given parametrized command string, splitting it into an internal data structure based on reserved words and symbols (SELECT, FROM, ",", "+", etc). Within that data structure, there are properties/places for values like table names, literals, etc. It is here that it copies each passed-in parameter (from the list) verbatim into the proper section of that structure.
So your @UserName value of: 'x';delete from users --
never needs to be escaped; it is just used as the literal value it really is.

Parameters are passed along with the query (not within the query), and are automatically escaped by the API as they are sent in accordance with the underlying database communications protocol.
For example, you might have
Query: <<<<select * from users where username = :username>>>>
Param: <<<<:username text<<<<' or '1' = '1>>>>>>>>
That's not the exact encoding any database protocol actually uses, but you get the idea.
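To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption made purely so the example is self-contained; the table and rows are made up). It runs the same hostile value once as a bound parameter and once concatenated into the statement text.

```python
import sqlite3

# In-memory SQLite database with two made-up users (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

hostile = "' or '1' = '1"

# Bound parameter: the value is handed to the engine beside the statement
# and is only ever compared as a string, so it matches no username.
bound_rows = conn.execute(
    "SELECT * FROM users WHERE username = :username",
    {"username": hostile},
).fetchall()

# Concatenation: the same text becomes part of the SQL and changes its meaning.
concat_rows = conn.execute(
    "SELECT * FROM users WHERE username = '" + hostile + "'"
).fetchall()
```

With the bound parameter no row matches, while the concatenated statement's WHERE clause becomes always true and every row comes back.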

Related

Parameters in BULK COPY statements in npgsql

I've read the information about BULK COPY page on Npgsql's webpage here. Yet looking at the BULK COPY BeginBinaryExport() and BeginBinaryImport() methods, they both take strings. How would one construct a SQL injection-safe version of a query for BeginBinaryImport() that took query parameters, e.g. didn't return all the rows of a table but only the those that passed a certain filter, such as being on a certain date?
Unfortunately this isn't currently supported. I've opened issue https://github.com/npgsql/npgsql/issues/3841 to track this.
In the meantime you'll have to interpolate parameters as strings into the query, and protect against SQL injection yourself.

How can I stop MiniProfiler showing "duplicate" SQL query warnings when parameters are different?

I'm using MiniProfiler to check what NPoco is doing with SQL Server but I've noticed it reports duplicate queries even when the SQL parameters have different values.
Eg: if I fetch a string from a database by ID, I might call:
SELECT * FROM PageContent WHERE ID=@ID
...twice on the same page, with two different IDs, but MiniProfiler reports this as a duplicate query even though the results will obviously be different each time.
Is there any way I can get MiniProfiler to consider the SQL parameter values so it doesn't think these queries are duplicated? I'm not sure if this problem is part of MiniProfiler or if it's a problem in how NPoco reports its actions to MiniProfiler, so I'll tag both.
I think that this is by design, and is in fact one of the reasons for the existence of the duplicate query detection.
If you are running that query twice on one page where the only difference is the param value, then you could also run it one time and include both param values in that query.
SELECT * FROM PageContent WHERE ID in (@ID1, @ID2)
So you are doing with two queries what you could do with one (you would have to of course filter on server side, but that is faster than two queries).
The duplicate query label is not for saying that you are running the absolute identical query more than once (though it would apply there as well). Rather it is highlighting an opportunity for optimizing your query approach and consolidating different queries into one (think about what an N+1 situation would look like).
If the default functionality doesn't meet your needs, you can always change it! The functionality that calculates duplicateTimings is located in UI/includes.js. You can provide your own version of this file that defines duplicates in a different way (perhaps by looking at parameter values in addition to the command text when detecting duplicates) by turning on CustomUITemplates inside MiniProfiler, and putting your own version of includes.js in there.
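The two definitions of "duplicate" discussed above can be compared with a small sketch. This is not MiniProfiler's actual code (that logic lives in its JavaScript UI); the `timings` list and the `count_duplicates` helper are hypothetical and only illustrate keying on command text alone versus command text plus parameter values.

```python
from collections import Counter

# Hypothetical record of executed commands: (command text, parameter values).
timings = [
    ("SELECT * FROM PageContent WHERE ID=@ID", (("@ID", 1),)),
    ("SELECT * FROM PageContent WHERE ID=@ID", (("@ID", 2),)),
    ("SELECT * FROM PageContent WHERE ID=@ID", (("@ID", 2),)),
]

def count_duplicates(timings, include_params):
    """Count commands beyond the first occurrence of each key."""
    keys = [(sql, params) if include_params else sql for sql, params in timings]
    return sum(n - 1 for n in Counter(keys).values())

# Keyed on text only: every repeat of the statement counts as a duplicate.
by_text = count_duplicates(timings, include_params=False)

# Keyed on text and parameter values: only exact repeats count.
by_text_and_params = count_duplicates(timings, include_params=True)
```

Keying on text alone flags the N+1 pattern the answer describes; adding the parameter values flags only commands that were literally repeated.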

Using an IN clause with LINQ-to-SQL's ExecuteQuery

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it—but that will prevent the query from ever ending up in the query cache.
What's the right way to do this?
NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.
Table-Valued Parameters
On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.
The bad way: Dynamic SQL
One way to pass this list in was to use dynamic SQL.
IEnumerable<long> assetIDs = GetAssetIDs();
var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + assetIDs.Join(",") + ")";
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);
This is a very bad thing to do:
Dynamic SQL gives attackers a weakness by making SQL injection attacks easier.
Since we are usually just concatenating numbers together, this is highly unlikely, but
if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT '
and our site is dead.
Stored procedures can't have dynamic SQL, so the query had to be stored in code instead of in the DB schema.
Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.
However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.
The good way: Table-Valued Parameters
SQL Server 2008 adds a new ability: users can define a table-valued database type.
Most other types are scalar (they only return one value), but table-valued types can hold multiple values, as long as the values are tabular.
We've defined three types: varchar_array, int_array, and bigint_array.
CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)
Both stored procedures and programmatically defined SQL queries can use these table-valued types.
IEnumerable<long> assetIDs = GetAssetIDs();
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"),
"SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)",
new Parameter("@AssetIDs", assetIDs));
Advantages
Can be used in both stored procedures and programmatic SQL without much effort
Not vulnerable to SQL injection
Cacheable, stable queries
Does not lock the schema table
Not limited to 8k of data
Less work done by both DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
"typical use" statistics can be derived by the query analyzer, which can lead to even better performance.
Disadvantages
Only works on SQL Server 2008 and above.
Rumors that TVP are prebuffered in their entirety before execution of the query, which means phenomenally large TVPs may be rejected by the server.
Further investigation of this rumor is ongoing.
Further reading
This article is a great resource to learn more about TVP.
If you can't use table-valued parameters, this option is a little faster than the XML option while still allowing you to stay away from dynamic SQL: pass the joined list of values as a string parameter, and parse the delimited string back to values in your query. Please see this article for instructions on how to do the parsing efficiently.
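As a rough sketch of the delimited-string idea, the following uses Python's sqlite3 module and a recursive common table expression to split the string inside the query. This is an assumption-laden stand-in: it is SQLite syntax, not T-SQL, it is not the method from the linked article, and the `Asset` rows and the `names_for` helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Asset (AssetID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Asset VALUES (?, ?)", [(1, "cat"), (2, "dog"), (3, "owl")]
)

# The whole list arrives as ONE string parameter (:csv). The recursive CTE
# peels off one comma-terminated item per step until nothing is left.
SPLIT_QUERY = """
WITH RECURSIVE split(item, rest) AS (
    SELECT '', :csv || ','
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT Name FROM Asset
WHERE AssetID IN (SELECT CAST(item AS INTEGER) FROM split WHERE item <> '')
ORDER BY AssetID
"""

def names_for(ids):
    """Join the ids into one delimited string and bind it as a parameter."""
    csv = ",".join(str(int(i)) for i in ids)
    return [row[0] for row in conn.execute(SPLIT_QUERY, {"csv": csv})]

selected = names_for([1, 3])
```

The statement text stays constant no matter how many values are passed, which is the property that keeps this approach away from dynamic SQL.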
I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.
This works for SQL Server 2005 (and later):
create procedure IGetAListOfValues
@Ids xml -- This will receive a list of values
as
begin
-- You can load them into a temp table or use it as a subquery:
create table #Ids (Id int);
INSERT INTO #Ids
SELECT DISTINCT params.p.value('.','int')
FROM @Ids.nodes('/params/p') as params(p);
...
end
You have to invoke this procedure with a parameter like this:
exec IGetAListOfValues
@Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter
The nodes function uses an XPath expression. In this case, it's /params/p, and that's why the XML uses <params> as root, and <p> as element.
The value function casts the text inside each p element to int, but you can use it with other data types easily. In this sample there is a DISTINCT to avoid repeated values, but, of course, you can remove it depending on what you want to achieve.
I have an auxiliary (extension) method that converts an IEnumerable<T> into a string that looks like the one shown in the execute example. It's easy to create one, and have it do the work for you whenever you need it. (You have to test the data type of T and convert to an adequate string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern to receive the parameters (you can pass in as many lists as needed).
One advantage is that you don't need to make anything special in your database for it to work.
Of course, you don't need to create a temp table as is done in my example; you can use the query directly as a subquery inside an IN predicate:
WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int')
FROM @Ids.nodes('/params/p') as params(p) )
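The auxiliary method mentioned above is not shown in the answer. A minimal sketch of what such a helper might look like, written in Python rather than C#, follows; the name `to_params_xml` is hypothetical, and the only behaviour assumed is producing the <params>/<p> shape from the exec example with the text XML-escaped.

```python
from xml.sax.saxutils import escape

def to_params_xml(values):
    """Render values as <params><p>...</p></params> with XML-escaped text."""
    items = "".join("<p>" + escape(str(v)) + "</p>" for v in values)
    return "<params>" + items + "</params>"

# The resulting string would be passed as the xml parameter of the procedure.
ids_xml = to_params_xml([1, 2])
```

Escaping matters once the list holds strings instead of integers, since characters such as < or & would otherwise produce malformed XML.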
I am not 100% sure that I understand the problem correctly, but LinqToSql's ExecuteQuery has an overload for parameters, and the query is supposed to use a format similar to string.Format.
Using this overload is safe against SQL injection, and behind the scenes LinqToSql translates it to use sp_executesql with parameters.
Here is an example:
string sql = "SELECT * FROM city WHERE city LIKE {0}";
db.ExecuteQuery(sql, "Lon%"); //Note that we don't need the single quotes
This way one can use the benefit of parameterized queries, even while using dynamic sql.
However when it comes to using IN with a dynamic number of parameters, there are two options:
Construct the string dynamically, and then pass the values as an array, as in:
string sql = "SELECT * FROM city WHERE zip IN (";
List<string> placeholders = new List<string>();
for(int i = 0; i < zips.Length;i++)
{
placeholders.Add("{"+i.ToString()+"}");
}
sql += string.Join(",",placeholders.ToArray());
sql += ")";
db.ExecuteQuery(sql, zips.ToArray());
We can use a more compact approach by using the Linq extension methods, as in
string sql = "SELECT * FROM city WHERE zip IN ("+
string.Join(",", zips.Select((z, i) => "{" + i + "}"))
+")";
db.ExecuteQuery(sql, zips.ToArray());
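The same pattern (generate one placeholder per value, then pass the values separately) can be tried end to end with Python's sqlite3 module. This is only an analogue of the LINQ to SQL code above: SQLite uses ? placeholders instead of {0}, {1}, and the `city` rows and the `cities_in` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("London", "E1"), ("Leeds", "LS1"), ("York", "YO1")],
)

def cities_in(zips):
    """Build 'IN (?,?,...)' with one placeholder per value, then bind the values."""
    if not zips:
        return []  # "IN ()" is not valid SQL, so short-circuit on an empty list
    placeholders = ",".join("?" for _ in zips)
    sql = "SELECT name FROM city WHERE zip IN (" + placeholders + ") ORDER BY name"
    return [row[0] for row in conn.execute(sql, list(zips))]

found = cities_in(["E1", "YO1"])
```

Only the placeholders are concatenated, never the values, so a hostile zip is compared as plain text. Note that each distinct list length produces a different statement text.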

Define a String constant in SQL Server?

Is it possible in SQL Server to define a String constant? I am rewriting some queries to use stored procedures and each has the same long string as part of an IN statement [a], [b], [c] etc.
It isn't expected to change, but could at some point in future. It is also a very long string (a few hundred characters) so if there is a way to define a global constant for this that would be much easier to work with.
If this is possible I would also be interested to know if it works in this scenario. I had tried to pass this String as a parameter, so I could control it from a single point within my application but the Stored Procedure didn't like it.
You can create a table with a single column and row and disallow writes on it.
Use that as your global string constant (or additional constants, if you wish).
You are asking for one thing (a string constant in MS SQL), but appear to maybe need something else. The reason I say this is because you have given a few hints at your ultimate objective, which appears to be using the same IN clause in multiple stored procedures.
The biggest clue is in the last sentence:
I had tried to pass this String as a parameter, so I could control it from a single point within my application but the Stored Procedure didn't like it.
Without details of your SQL scripts, I am going to attempt to use some psychic debugging techniques to see if I can get you to what I believe is your actual goal, and not necessarily your stated goal.
Given your Stored Procedure "didn't like that" when you tried to pass in a string as a parameter, I am guessing the composition of the string was simply a delimited list of values, something like "10293, 105968, 501940" or "Juice, Milk, Donuts" (pay no attention to the actual list values - the important part is the delimited list itself). And your SQL may have looked something like this (again, ignore the specific names and focus on the general concept):
SELECT Column1, Column2, Column3
FROM UnknownTable
WHERE Column1 IN (@parameterString);
If this approximately describes the path you tried to take, then you will need to reconsider your approach. Using a regular T-SQL statement, you will not be able to pass a string of parameter values to an IN clause - it just doesn't know what to do with them.
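The behaviour is easy to reproduce. The sketch below uses Python's sqlite3 module instead of SQL Server (an assumption for the sake of a runnable example, with made-up rows), but the principle is the same: a single bound parameter is a single value, however many commas it contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UnknownTable (Column1 TEXT)")
conn.executemany(
    "INSERT INTO UnknownTable VALUES (?)", [("Juice",), ("Milk",), ("Tea",)]
)

parameter_string = "Juice, Milk, Donuts"
query = "SELECT Column1 FROM UnknownTable WHERE Column1 IN (?)"

# The whole string is compared as ONE value, so neither Juice nor Milk matches.
as_one_value = conn.execute(query, (parameter_string,)).fetchall()

# Only a row holding that exact string would match.
conn.execute("INSERT INTO UnknownTable VALUES (?)", (parameter_string,))
literal_match = conn.execute(query, (parameter_string,)).fetchall()
```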
There are alternatives, however:
Dynamic SQL - you can build up the whole SQL statement, parameters and all, then execute that in the SQL database. This probably is not what you are trying to achieve, since you are moving script to stored procedures. But it is listed here for completeness.
Table of values - you can create a single-column table that holds the specific values you are interested in. Then your Stored Procedure can simply use the column from this table for the IN clause. This way, there is no Dynamic SQL required. Since you indicate that the values are not likely to change, you may just need to populate the table once, and use it wherever appropriate.
String Parsing to derive the list of values - you can pass the list of values as a string, then implement code to parse the list into a table structure on the fly. An alternative form of this technique is to pass an XML structure containing the values, and use MS SQL Server's XML functionality to derive the table.
Define a table-valued function that returns the values to use - I have not tried this one, so I may be missing something, but you should be able to define the values in a table-valued function (possibly using a bunch of UNION statements or something), and call that function in the IN clause. Again, this is an untested suggestion and would need to be worked through to determine its feasibility.
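Of the alternatives above, the table of values is the simplest to sketch. The example below uses Python's sqlite3 module as a stand-in for SQL Server; the `AllowedValues` and `Products` tables and the `allowed_products` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-column table that plays the role of the shared "constant" list.
conn.execute("CREATE TABLE AllowedValues (Value TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO AllowedValues VALUES (?)", [("Juice",), ("Milk",), ("Donuts",)]
)

conn.execute("CREATE TABLE Products (Name TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?)", [("Juice",), ("Tea",), ("Coffee",)]
)

def allowed_products():
    """Every query that needs the list reads it from the one shared table."""
    sql = (
        "SELECT Name FROM Products "
        "WHERE Name IN (SELECT Value FROM AllowedValues) ORDER BY Name"
    )
    return [row[0] for row in conn.execute(sql)]

before = allowed_products()

# Changing the list is a data change in one place, not an edit to each query.
conn.execute("INSERT INTO AllowedValues VALUES (?)", ("Tea",))
after = allowed_products()
```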
I hope that helps (assuming I have guessed your underlying quandary).
For future reference, it would be extremely helpful if you could include SQL script showing
your table structure and stored procedure logic so we can see what you have actually attempted. This will considerably improve the effectiveness of the answers you receive. Thanks.
P.S. The link for String Parsing actually includes a large variety of techniques for passing arrays (i.e. lists) of information to Stored Procedures - it is a very good resource for this kind of thing.
In addition to string-constants tables as Oded suggests, I have used scalar functions to encapsulate some constants. That would be better for fewer constants, of course, but their use is simple.
Perhaps a combination - string constants table with a function that takes a key and returns the string. You could even use that for localization by having the function take a 'region' and combine that with a key to return a different string!

Prevent SQL injection on free response text fields in classic ASP

I've got some free-response text fields and I'm not sure how to scrub them to prevent SQL injection. Any ideas?
Create a parameterized query instead of concatenating the user's input into the query.
Here is how to do this in classic asp:
http://blog.binarybooyah.com/blog/post/Classic-ASP-data-access-using-parameterized-SQL.aspx
It's also important to note that the only way you can be 100% safe from sql injection is to parameterize any sql statement that uses user input, even once it's in the database. Example: Say you take user input via a parameterized query or stored procedure. You will be safe on the insert, however you need to make sure that anything down the road that uses that input also uses a parameter. Directly concatenating user input is a bad idea anywhere, including inside the db.
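The "down the road" case can be sketched as follows, again with Python's sqlite3 module standing in for the real database and with a made-up `comments` table. A value is inserted safely through a parameter, read back, and then reused in a second query, once by concatenation and once as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT)")

# Step 1: hostile input is stored safely because the insert is parameterized.
stored = "x' OR '1'='1"
conn.executemany(
    "INSERT INTO comments (author) VALUES (?)",
    [("alice",), ("bob",), (stored,)],
)

# Step 2: later code reads the value back out of the database...
author = conn.execute("SELECT author FROM comments WHERE id = 3").fetchone()[0]

# ...and concatenating it reopens the hole, even though it came from the db.
unsafe = conn.execute(
    "SELECT id FROM comments WHERE author = '" + author + "' ORDER BY id"
).fetchall()

# Binding the stored value keeps it a plain string.
safe = conn.execute(
    "SELECT id FROM comments WHERE author = ? ORDER BY id", (author,)
).fetchall()
```

The concatenated query returns every row, while the bound query returns only the row that actually has that author.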
Call a stored procedure.
EDIT: Just to clarify. Building dynamic sql in a sp can of course be just as dangerous as doing it in the app, but binding user inputs into a query will protect you against sql injection, as described here (oracle-specific discussion, but the principle applies elsewhere):
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:23863706595353
It is not dynamic sql that is the issue (all sql is dynamic in Oracle actually -- even static sql in pro*c/plsql!). It is "the construction" of this sql that is the problem. If a user gives you inputs - they should be BOUND into the query -- not concatenated. The second you concatenate user input into your SQL -- it is as if you gave them the ability to pass you code and you execute that code. Plain and simple.

Resources