Using an IN clause with LINQ-to-SQL's ExecuteQuery - sql-server

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it—but that will prevent the query from ever ending up in the query cache.
What's the right way to do this?
NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.

Table-Valued Parameters
On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.
The bad way: Dynamic SQL
One way to pass this list in was to use dynamic SQL.
IEnumerable<long> assetIDs = GetAssetIDs();
var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + assetIDs.Join(",") + ")";
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);
This is a very bad thing to do:
Dynamic SQL gives attackers a weakness by making SQL injection attacks easier. Since we are usually just concatenating numbers together, this is highly unlikely, but if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT ' and our site is dead.
Stored procedures can't have dynamic SQL, so the query had to be stored in code instead of in the DB schema.
Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.
However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.
The good way: Table-Valued Parameters
SQL Server 2008 adds a new ability: users can define a table-valued database type.
Most other types are scalar (they hold only one value), but table-valued types can hold multiple values, as long as the values are tabular.
We've defined three types: varchar_array, int_array, and bigint_array.
CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)
Both stored procedures and programmatically defined SQL queries can use these table-valued types.
IEnumerable<long> assetIDs = GetAssetIDs();
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"),
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)",
    new Parameter("@AssetIDs", assetIDs));
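The Parameter class above appears to be the article's own wrapper. With plain ADO.NET, a table-valued parameter is usually supplied as a DataTable whose columns match the table type. The following is a minimal sketch of that conversion, assuming the bigint_array type defined above; the TvpHelper class and its method name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;

public static class TvpHelper
{
    // Hypothetical helper: builds a DataTable with a single bigint column,
    // matching "CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)".
    public static DataTable ToBigintArray(IEnumerable<long> ids)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(long));

        // The table type declares Id as a primary key, so duplicate values
        // would be rejected by the server. Drop them on the client first.
        var seen = new HashSet<long>();
        foreach (long id in ids)
        {
            if (seen.Add(id))
            {
                table.Rows.Add(id);
            }
        }
        return table;
    }
}
```

The resulting table would then be attached to the command as a SqlParameter with SqlDbType set to SqlDbType.Structured and TypeName set to "bigint_array".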
Advantages
Can be used in both stored procedures and programmatic SQL without much effort
Not vulnerable to SQL injection
Cacheable, stable queries
Does not lock the schema table
Not limited to 8k of data
Less work done by both DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
"typical use" statistics can be derived by the query analyzer, which can lead to even better performance.
Disadvantages
Only works on SQL Server 2008 and above.
There are rumors that TVPs are prebuffered in their entirety before the query executes, which would mean that phenomenally large TVPs may be rejected by the server.
Further investigation of this rumor is ongoing.
Further reading
This article is a great resource to learn more about TVP.

If you can't use table-valued parameters, this option is a little faster than the XML option while still allowing you to stay away from dynamic SQL: pass the joined list of values as a string parameter, and parse the delimited string back into values in your query. Please see this article for instructions on how to do the parsing efficiently.

I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.

This works for SQL Server 2005 (and later):
create procedure IGetAListOfValues
    @Ids xml -- This will receive a list of values
as
begin
    -- You can load them into a temp table or use the query as a subquery:
    create table #Ids (Id int);
    INSERT INTO #Ids
    SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p);
    ...
end
You have to invoke this procedure with a parameter like this:
exec IGetAListOfValues
    @Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter
The nodes function uses an XPath expression. In this case, it's /params/p, and that's why the XML uses <params> as the root and <p> as the element.
The value function casts the text inside each p element to int, but you can easily use it with other data types. In this sample there is a DISTINCT to avoid repeated values, but, of course, you can remove it depending on what you want to achieve.
I have an auxiliary (extension) method that converts an IEnumerable<T> into a string that looks like the one shown in the execute example. It's easy to create one, and have it do the work for you whenever you need it. (You have to test the data type of T and convert to an adequate string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern to receive the parameters (you can pass in as many lists as needed).
One advantage is that you don't need to do anything special in your database for it to work.
Of course, you don't need to create a temp table as in my example; you can use the query directly as a subquery inside an IN predicate:
WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p) )
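The extension method mentioned in this answer is not shown, so the following is only a sketch of what such a helper might look like. The class name XmlListExtensions and the method name ToXmlParams are assumptions. It relies on XElement from System.Xml.Linq to escape text content, rather than concatenating strings by hand.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlListExtensions
{
    // Hypothetical helper: turns a sequence of values into
    // <params><p>value1</p><p>value2</p>...</params>,
    // the shape expected by nodes('/params/p') in the procedure above.
    public static string ToXmlParams<T>(this IEnumerable<T> values)
    {
        var root = new XElement(
            "params",
            values.Select(v => new XElement("p", v)));

        // DisableFormatting keeps the output on one line, without indentation.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

A call such as new[] { 1, 2 }.ToXmlParams() produces a string that can be passed as the xml parameter of the procedure. For types other than integers and strings, check that the text XElement writes is something the value function can cast on the SQL Server side.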

I am not 100% sure that I understand the problem correctly, but LinqToSql's ExecuteQuery has an overload for parameters, and the query is supposed to use a format similar to string.Format.
Using this overload is safe against SQL injection, and behind the scenes LinqToSql translates it to use sp_executesql with parameters.
Here is an example:
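string sql = "SELECT * FROM city WHERE city LIKE {0}";
// City is a hypothetical entity class for the result rows; ExecuteQuery needs a result type.
db.ExecuteQuery<City>(sql, "Lon%"); // Note that we don't need the single quotes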
This way one can use the benefit of parameterized queries, even while using dynamic sql.
However when it comes to using IN with a dynamic number of parameters, there are two options:
Construct the string dynamically, and then pass the values as an array, as in:
string sql = "SELECT * FROM city WHERE zip IN (";
List<string> placeholders = new List<string>();
for (int i = 0; i < zips.Length; i++)
{
    placeholders.Add("{" + i.ToString() + "}");
}
sql += string.Join(",", placeholders.ToArray());
sql += ")";
// City is a hypothetical entity class for the result rows.
db.ExecuteQuery<City>(sql, zips.ToArray());
We can use a more compact approach by using the Linq extension methods, as in
string sql = "SELECT * FROM city WHERE zip IN (" +
    string.Join(",", zips.Select((z, i) => "{" + i + "}").ToArray()) +
    ")";
// City is a hypothetical entity class for the result rows.
db.ExecuteQuery<City>(sql, zips.ToArray());
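Both variants above build the same placeholder list, so it can be factored into a small helper. This is a sketch only; the InClauseBuilder class and its startIndex parameter are assumptions, added so that the IN list can follow other parameters in the same query.

```csharp
using System;
using System.Linq;

public static class InClauseBuilder
{
    // Hypothetical helper: returns "{0},{1},...,{count-1}" for use inside IN (...).
    // startIndex lets the list begin after other placeholders in the query.
    public static string Placeholders(int count, int startIndex = 0)
    {
        if (count <= 0)
        {
            // "IN ()" is not valid SQL, so an empty list must be handled by the caller.
            throw new ArgumentOutOfRangeException(nameof(count), "At least one value is required.");
        }

        return string.Join(",",
            Enumerable.Range(startIndex, count).Select(i => "{" + i + "}").ToArray());
    }
}
```

The query text would then be built as "SELECT * FROM city WHERE zip IN (" + InClauseBuilder.Placeholders(zips.Length) + ")". Two caveats apply: each distinct list length yields a different query text, so plans are cached per length rather than once, and SQL Server limits a single request to 2100 parameters.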

Related

SSIS comma delimited string in where clause

I am trying to see if there is an easy answer for this. I have done something similar using multiple pick dropdown parameters in SSRS but this appears to be different.
My scenario is this, so maybe there is an even better answer.
I have a production server that I do not want to make any changes to including temp tables or functions. The production server has a table of clients with about 1600 records. I have set up an SSIS package that will allow transfer of data from production to dev based on a clientid. So my sources would have a query similar to Select Field From Table Where ClientId = ?
This works fine. Now I want to load more than one client, based on data in the clients table. It may be Select ClientId From Clients where Field = A, which returns multiple ClientIds.
I am able to populate a comma delimited list from an execute sql task to a SSIS variable, so it may be 1,4,8.
If I change my source query to use ClientId in (?) I get a conversion error.
I have looked at many posts that advocate a temp table or a function which I want to avoid. Select IN using varchar string with comma delimited values
I have contemplated building the entire SQL statement into a variable, but this doesn't seem like the right path, as I have many tables to query and transfer where using ClientId = ? works well without having to build each individual SQL statement into a variable.
Is there an easy fix I am missing? I will turn my research now to try to find out how I did this in SSRS but I thought that I should try a post here to see if someone has accomplished this before.
I appreciate any info on this, thank you.
EDIT: Key note is that the column on clients is on the dev server, so I cannot just use a select in the where clause as the column does not exist on the production server.
EDIT: I did not mention that I am specifically looking at OLEDB sources mapping a parameter to ? in the sql statement.
EDIT: Narrowing down on this, but having trouble relating SSRS and SSIS functionality. In SSRS it's called a multi-value parameter. In the following link the key line is
WHERE Production.ProductInventory.ProductID IN (@ProductID)
https://msdn.microsoft.com/en-us/library/dn385719(v=sql.110).aspx
This one looks good as well
https://sqlblogcasts.com/blogs/simons/archive/2007/11/22/RS-HowTo---Pass-a-multivalue-parameter-to-a-query-using-IN.aspx
I will keep researching and thank you for the help so far.
I think this sums it up best
This functionality is limited to strictly using embedded SQL.
What SSRS does is transform your SQL column IN (@value) to column IN
(@selectedvalue1,@selectedvalue2) etc.
You need to forget anything you have about the other ways of passing
lists to SQL i.e. building strings etc. and make sure you declare the
data types are correct for the value of your parameter.
You do not need to use the Join(parameters!,",") trick UNLESS
you are passing the list to a stored procedure.
In which case you then need to use some function to turn the delimited
list into a rowset as you have done.
I hope that helps
The core question is if I can get the same functionality in SSIS as in SSRS. It reminds me of macro substitution..
If you don't want to create a function, you can use the following in your T-SQL statement.
Declare @ClientIds nvarchar(50) = '123,456'; --<-- Comma delimited list of Client Ids
Select Field
From Table
Where ClientId IN (
    SELECT CAST(RTRIM(LTRIM(Split.a.value('.', 'VARCHAR(100)'))) AS INT) ClientIDs
    FROM (
        SELECT Cast ('<X>'
            + Replace(@ClientIds, ',', '</X><X>')
            + '</X>' AS XML) AS Data
    ) AS t CROSS APPLY Data.nodes ('/X') AS Split(a)
)

SQL Server : parameters for column names instead of values

This might seem like a silly question, but I'm surprised that I didn't find a clear answer to this already:
Is it possible to use SQL Server parameters for writing a query with dynamic column names (and table names), or does the input just need to be sanitized very carefully?
The situation is that tables and their column names (and amount of columns) are generated dynamically and there is no way to know beforehand to manually write a query. Since the tables & columns aren't known I can't use an ORM, so I'm resorting to manual queries. Usually I'd use parameters to fill in values to prevent SQL injection, however I'm pretty sure that this cannot be done the same way when specifying the table name and/or column names. I want to create generic queries for insert, update, upsert, and select, but I obviously don't want to open myself up to potential injection. Is there a best practices on how to accomplish this safely?
Just as an FYI - I did see this answer, but since there's no way for me to know the column / table names beforehand a case statement probably won't work for this situation.
Environment: SQL Server 2014 via ADO.NET (.NET 4.5 / C#)
There is no mechanism for passing table or column references to procedures. You just pass them as strings and then use dynamic SQL to build your queries. You do have to take precautions to ensure that your string parameters are valid.
One way to do this would be to validate that all table and column reference strings have valid names in sys.tables and sys.columns before building your T-SQL queries. Then you can be sure that they can be used safely.
You can also use literal parameters with dynamic sql when using the sp_executesql procedure. You can't use it to validate your table and column names, but it validates and prevents SQL injection with your other parameters.
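The validation step described in this answer can be sketched on the client side as follows. This assumes the set of known names has already been loaded from sys.tables or sys.columns; the SqlIdentifier class and its Quote method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SqlIdentifier
{
    // Hypothetical helper: accepts a table or column name only if it appears
    // in knownNames (assumed to be loaded from sys.tables / sys.columns),
    // then wraps it in brackets for use in dynamic SQL.
    public static string Quote(string name, ISet<string> knownNames)
    {
        if (name == null || !knownNames.Contains(name))
        {
            throw new ArgumentException("Unknown identifier: " + name, nameof(name));
        }

        // Same escaping rule as T-SQL's QUOTENAME: double any closing bracket.
        return "[" + name.Replace("]", "]]") + "]";
    }
}
```

Only the identifiers go through this path. Values should still be passed as ordinary parameters, as the last paragraph of the answer describes.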

How do SQL parameters work internally?

A coworker and I were browsing SO when we came across a question about SQL Injection, and it got us wondering: how do parametrized queries work internally? Does the API you are using (assuming it supports parametrized queries) perform concatenation, combining the query with the parameters? Or do the parameters make it to the SQL engine separately from the query, and no concatenation is performed at all?
Google hasn't been very helpful, but maybe we haven't searched for the right thing.
The parameters make it to the SQL engine separately from the query. An execution plan is calculated or reused for the parametrized query, and then the query is executed by the SQL engine with the parameters.
Parameters make it to the SQL server intact, and individually "packaged" with metadata indicating their type, whether Input or Output etc. As Alex Reitbort points out, this is because parametrized statements are a server-level concept, not merely a convenient way of invoking commands from various connection layers.
I doubt that SQL SERVER builds a complete query string from the given parametrized query where the parameter list is concatenated in.
It most likely parses the given parametrized command string, splitting it into an internal data structure based on reserved words and symbols (SELECT, FROM, ",", "+", etc). Within that data structure, there are properties/places for values like table names, literals, etc. It is here that it copies (verbatim) each passed-in parameter (from the list) into the proper section of that structure.
So your @UserName value of: 'x';delete from users --
never needs to be escaped; it is just used as the literal value it really is.
Parameters are passed along with the query (not within the query), and are automatically escaped by the API as they are sent in accordance with the underlying database communications protocol.
For example, you might have
Query: <<<<select * from users where username = :username>>>>
Param: <<<<:username text<<<<' or '1' = '1>>>>>>>>
That's not the exact encoding any database protocol actually uses, but you get the idea.

Search Query - SQL Server 2005 - Ideas - Knowledge Sharing

Currently I am designing a database schema where one table will contain details about all students of a university.
I am thinking about how to create the search query for administrators, who will search for students. (Some properties are Age, Location, Name, Surname etc.; approx. 20 properties, 1 table.)
My idea is to create the sql query dynamically from the code side. Is it the best way or is there any other better ways?
Shall I use a stored procedure?
Are there any other ways?
feel free to share
I am going to assume you have a front end that collects user input, executes a query and returns a result. I would say you HAVE to create the query dynamically from the code side. At the very least you will need to pass in variables that the user selected to query by. I would probably create a method that takes in the key/value search data and use that to execute the query. Because it will only be one table there would probably be no need for a view or stored procedure. I think a simple select statement including your search criteria will work fine.
I would suggest using LINQ to SQL. It will allow you to write such queries in C# code without any SQL procedures. LINQ to SQL takes care of security and prevents SQL injection.
p.s.
Do not ever compose SQL from concatenated strings like SQL = "select * from table where " + "param1=" + param1 ... :)
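To illustrate building the search query from code without concatenating user input, here is a minimal sketch. The table name Student, the column whitelist, and the StudentSearch class are all assumptions for illustration. Column names come only from the whitelist and values are returned separately, to be bound as parameters @p0, @p1, and so on.

```csharp
using System;
using System.Collections.Generic;

public static class StudentSearch
{
    // Hypothetical whitelist of searchable columns.
    private static readonly HashSet<string> AllowedColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Age", "Location", "Name", "Surname" };

    // Builds the SQL text and returns the values to bind as @p0, @p1, ...
    public static string BuildQuery(
        IEnumerable<KeyValuePair<string, object>> criteria,
        out object[] parameters)
    {
        var clauses = new List<string>();
        var values = new List<object>();

        foreach (var pair in criteria)
        {
            if (!AllowedColumns.Contains(pair.Key))
            {
                throw new ArgumentException("Unknown search field: " + pair.Key);
            }

            // Only whitelisted column names reach the SQL text; the value never does.
            clauses.Add("[" + pair.Key + "] = @p" + values.Count);
            values.Add(pair.Value);
        }

        parameters = values.ToArray();

        string sql = "SELECT * FROM Student";
        if (clauses.Count > 0)
        {
            sql += " WHERE " + string.Join(" AND ", clauses.ToArray());
        }
        return sql;
    }
}
```

The query text varies only with the set of fields searched, not with the values entered, so it stays both injection-safe and plan-cache friendly.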

Is Dynamic SQL more vulnerable to SQL Injection/hacking?

Is Dynamic SQL more vulnerable to SQL Injection/hacking?
If yes, how to prevent?
If you use parameters instead of string concatenation to specify your filter criteria, then it should not be vulnerable to SQL injection.
For instance:
do this:
string sqlQuery = "SELECT * FROM Persons WHERE Persons.Name LIKE @name";
SqlCommand cmd = new SqlCommand ( sqlQuery );
...
cmd.Parameters.Add ("@name", SqlDbType.VarChar).Value = aName + "%";
instead of this:
string sqlQuery = "SELECT * FROM Persons WHERE Persons.Name LIKE \'" + aName + "%\'";
The first example is not vulnerable to SQL injection, but the second example is very much vulnerable.
The same applies to dynamic SQL that you use in stored procedures, for instance.
There, you can create a dynamic SQL statement that uses parameters as well. You should then execute the dynamic statement using sp_executesql, which enables you to specify parameters.
Quick answer is yes: if you're building SQL on the fly within your app, you have to be aware of every little trick that the rogues will try. When you're using stored procedures, most of that will have been taken care of by your vendor.
A good way of reducing the chance of SQL injection is to use parameter queries as above. If that's not appropriate, make sure that any user-generated field is stripped of non-alpha characters. Take out quotes, semicolons etc. Also make sure your connection only has enough access to do what it needs: if you're only querying data, then create a user/security group that only allows select, not update and especially not delete. It can also be good practise to write the SQL to a log. That way you know what people are doing, and you can tune and spot injection attempts.
Inside TSQL, you should use sp_ExecuteSql to execute any dynamic commands you need (for example, to provide flexible searching/sorting).
Note that unless you jump through some hoops with certificates, you still need direct SELECT (etc) permission to the table (unlike a SPROC, which can provide access implicitly), but it should be injection safe. For example:
DECLARE @command nvarchar(4000), @name varchar(50)
SELECT @command = 'SELECT * FROM [CUSTOMER] WHERE [Name] = @Name',
    @name = 'Fred'
EXEC sp_ExecuteSql @command, N'@Name varchar(50)', @name
There is obviously no need to use dynamic SQL in the above - this is for illustration only! The main times this is useful is when (inside a SPROC) you have multiple optional search conditions, or a flexible ORDER BY clause.
In non-TSQL clients, you can do the same with parameters to the command.
Note also that sp_ExecuteSql makes use of the procedure cache, so it can be more efficient than raw EXEC (@command).
It depends on how dynamic your query is.
If you mean storing a dynamic value then that isn't a problem as long as you use parameters as Frederik suggests.
If you mean building queries according to dynamic criteria then you may be in trouble :-)
Say for example that you have a string dictionary with the fields to update as the keys and the new values as the items. Then you can build an update query dynamically using the dictionary. Now if a hacker manages to change one of your field names, he may manage to insert a custom query and thereby hack your system.
To avoid this you may be able to do some clever verification of the field names, maybe checking them against the table's columns. But the safer option would be to use a fixed query that updates all values, giving it the original value for all columns that didn't change. This way you can use parameters for the values, which is safe, and you are safe against SQL injection in the field names.
Take a look here for an interesting discussion around this topic.
