SQL Server: parameters for column names instead of values

This might seem like a silly question, but I'm surprised that I didn't find a clear answer to this already:
Is it possible to use SQL Server parameters for writing a query with dynamic column names (and table names), or does the input just need to be sanitized very carefully?
The situation is that tables and their column names (and the number of columns) are generated dynamically, so there is no way to know them beforehand and write the queries manually. Since the tables & columns aren't known, I can't use an ORM, so I'm resorting to manual queries. Usually I'd use parameters to fill in values to prevent SQL injection; however, I'm pretty sure this cannot be done the same way when specifying the table name and/or column names. I want to create generic queries for insert, update, upsert, and select, but I obviously don't want to open myself up to potential injection. Is there a best practice for how to accomplish this safely?
Just as an FYI - I did see this answer, but since there's no way for me to know the column / table names beforehand a case statement probably won't work for this situation.
Environment: SQL Server 2014 via ADO.NET (.NET 4.5 / C#)

There is no mechanism for passing table or column references to procedures. You just pass them as strings and then use dynamic SQL to build your queries. You do have to take precautions to ensure that your string parameters are valid.
One way to do this would be to validate that all table and column reference strings have valid names in sys.tables and sys.columns before building your T-SQL queries. Then you can be sure that they can be used safely.
You can also use real parameters in dynamic SQL by executing it through the sp_executesql procedure. That does nothing to validate your table and column names, but it does parameterize the remaining values and prevents SQL injection through them.
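For illustration, here is a hedged ADO.NET sketch combining both points: the caller-supplied name is checked against sys.columns and bracket-quoted before it is concatenated, while values still travel as ordinary parameters. The table dbo.MyTable and the method names are invented for the sketch.

using System;
using System.Data.SqlClient;

static class DynamicColumnHelper
{
    // Check the caller-supplied names against the catalog views first.
    public static bool ColumnExists(SqlConnection conn, string table, string column)
    {
        const string check =
            @"SELECT COUNT(*) FROM sys.columns
              WHERE object_id = OBJECT_ID(@table) AND name = @column";
        using (var cmd = new SqlCommand(check, conn))
        {
            cmd.Parameters.AddWithValue("@table", table);
            cmd.Parameters.AddWithValue("@column", column);
            return (int)cmd.ExecuteScalar() > 0;
        }
    }

    // Only the validated, bracket-quoted name is concatenated into the SQL;
    // the value still goes in as a real parameter.
    public static void UpdateColumn(SqlConnection conn, string column, object value, int id)
    {
        if (!ColumnExists(conn, "dbo.MyTable", column))
            throw new ArgumentException("Unknown column: " + column);

        string sql = "UPDATE dbo.MyTable SET [" + column.Replace("]", "]]") + "] = @value WHERE Id = @id";
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@value", value);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.ExecuteNonQuery();
        }
    }
}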

Related

REST Backend with specified columns, SQL questions

I'm working with a new REST backend talking to a SQL Server. Our REST API allows the caller to pass in the columns/fields they want returned (?fields=id,name,phone).
The idea seems very normal. The issue I'm bumping up against is resistance to dynamically generating the SQL statement. Any arguments passed in would be passed to the database using a parameterized query, so I'm not concerned about SQL injection.
The basic idea would be to "inject" the column-names passed in, into a SQL that looks like:
SELECT TOP 1000 <column-names>
FROM myTable
ORDER BY <column-name-to-sort-by>
We sanitize all column names and verify their existence in the table, to prevent SQL injection issues. Most of our programmers are used to having all SQL in static files, and loading them from disk and passing them on to the database. The idea of code creating SQL makes them very nervous.
I guess I'm curious if others actually do this? If so, how do you do this? If not, how do you manage "dynamic columns and dynamic sort-by" requests passed in?
I think a lot of people do it, especially when it comes to reporting features. There are actually two things one should do to stay on the safe side:
Parameterize all WHERE clause values
Use user input values to pick correct column/table names, don't use the user values in the sql statement at all
To elaborate on item #2, I would have a dictionary where the Key is a possible user input and the Value is a corresponding column/table name. You can store this dictionary wherever you want: config file, database, hard code, etc. So when you process user input, you just check the dictionary for the Key, and if it exists you use the Value to add a column name to your query. This way you only use the user input to pick required column names and never place the actual values in your SQL statement. Besides, you might not want to expose all columns; with a predefined dictionary you can easily control the list of columns available to a user.
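As a minimal sketch of this dictionary approach in C# (the field keys, column names, table name, and the userInput variable are made-up examples, not from the answer):

using System;
using System.Collections.Generic;
using System.Linq;

// Map user-supplied field keys to real column names.
var allowedColumns = new Dictionary<string, string>
{
    { "id",    "CustomerID"   },
    { "name",  "CustomerName" },
    { "phone", "PhoneNumber"  }
};

// userInput is assumed to be the raw ?fields= value from the request.
// Only keys found in the dictionary contribute a column to the query;
// the raw user input never reaches the SQL string.
var selected = userInput.Split(',')
    .Select(key => key.Trim())
    .Where(allowedColumns.ContainsKey)
    .Select(key => allowedColumns[key])
    .ToList();

if (selected.Count == 0)
    throw new ArgumentException("No valid fields requested.");

string sql = "SELECT " + string.Join(", ", selected) + " FROM Customers";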
Hope it helps!
I've done similar to what Maksym suggests. In my case, keys were pulled directly from the database system tables (after scrubbing the user request a bit for syntactic hacks and permissions).
The following query takes care of some minor injection issues through the natural way SQL handles the LIKE condition. This doesn't go as far as handling permissions on each field (as some fields are forbidden based on the log-in) but it provides a very basic way to retrieve these fields dynamically.
CREATE PROC get_allowed_column_names
    @input VARCHAR(MAX)
AS BEGIN
    SELECT
        columns.name AS allowed_column_name
    FROM
        syscolumns AS columns,
        sysobjects AS tables
    WHERE
        columns.id = tables.id AND
        tables.name = 'Categories' AND
        @input LIKE '%' + columns.name + '%'
END
GO
-- The following only returns "Picture"
EXEC get_allowed_column_names 'Category_,Cat%,Picture'
GO
-- The following returns both "CategoryID" and "Picture"
EXEC get_allowed_column_names 'CategoryID, Picture'
GO
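As a hedged sketch of how a client might consume this procedure from ADO.NET (the conn and userSuppliedFieldList variables and the connection handling are illustrative assumptions):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Ask the proc which of the requested names are real columns, then build
// the SELECT from those vetted names only.
var allowed = new List<string>();
using (var cmd = new SqlCommand("get_allowed_column_names", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@input", userSuppliedFieldList);
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            allowed.Add(reader.GetString(0));
    }
}

if (allowed.Count > 0)
{
    // Values elsewhere in the query should still go through parameters.
    string sql = "SELECT " + string.Join(", ", allowed) + " FROM Categories";
}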

Constraint of using Dynamic SQL In a Stored Procedure

I read an article about building dynamic SQL in a stored procedure, and it was really good. That article said:
A Dynamic SQL is needed when we need to retrieve a set of records based on different search parameters
So I think we can use it in every project where we need to retrieve records based on different searches, and that there is no constraint on using dynamic SQL. Is that true?
You should leave dynamic queries as a last resort, for cases where you are not able to use parameters.
This way you will protect yourself from SQL injection.
You can always make some parameters optional and use them in the WHERE clause:
...
WHERE (@Param1 IS NULL OR Field1 = @Param1) AND ... etc
...
This way, if you set @Param1 to NULL, it means "do not filter on it".
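From the client side, a hedged ADO.NET sketch of this optional-parameter pattern (the table, columns, and the conn/param1/param2 variables are invented for illustration):

using System;
using System.Data.SqlClient;

// Pass DBNull for filters the caller did not supply, so the
// (@Param1 IS NULL OR Field1 = @Param1) predicates skip them.
string sql = @"SELECT * FROM MyTable
               WHERE (@Param1 IS NULL OR Field1 = @Param1)
                 AND (@Param2 IS NULL OR Field2 = @Param2)";
using (var cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.AddWithValue("@Param1", (object)param1 ?? DBNull.Value);
    cmd.Parameters.AddWithValue("@Param2", (object)param2 ?? DBNull.Value);
    using (var reader = cmd.ExecuteReader())
    {
        // ... read the matching rows
    }
}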
Another option might be full-text search; you can find more details about it here:
http://technet.microsoft.com/en-us/library/ms142571.aspx

ADO - Can I edit results of a complex query with multiple join statements?

I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is kept in the destination (table structure), but I would like to support writing a SQL statement that returns data from the destination using a complex query with multiple joins, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would like to do is then modify the live data in this ADO query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know that BDE, at least (I've never used it), was very strict: you had to return all fields (*) and such. ADO, I know, is more flexible, but I don't know quite how flexible it is in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table which is joined by the same primary key field; one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing fields which have no influence on the joins is usually no problem.
Appending is trickier; you can limit the Append to one of the tables with:
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
  inherited;
  // Restrict inserts/updates to the "small" table only
  TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
The better way would be to set the values via a stored procedure, e.g. in BeforePost, followed by Requery and Abort.
If your view were persistent, you would be able to use INSTEAD OF triggers.
Jerry,
I encountered the same problem on Firebird, and from experience I can tell you that it can be done (with a little added complexity) by using CachedUpdates. A very good resource is this one: http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, so this is no longer an issue for me, though it is still an interesting subject to know about.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However, the concept of cached updates was not appealing to me, simply because of the fact pointed out in the link in RBA's answer: data can be changed in the database in the meantime. So I will instead integrate my own method of customizable data pushes.

Using an IN clause with LINQ-to-SQL's ExecuteQuery

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kludge around it, but that will prevent the query from ever ending up in the query cache.
What's the right way to do this?
NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.
Table-Valued Parameters
On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.
The bad way: Dynamic SQL
One way to pass this list in was to use dynamic SQL.
IEnumerable<long> assetIDs = GetAssetIDs();
var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + string.Join(",", assetIDs) + ")";
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);
This is a very bad thing to do:
Dynamic SQL gives attackers a weakness by making SQL injection attacks easier. Since we are usually just concatenating numbers together, this is highly unlikely, but if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT ' and our site is dead.
Stored procedures can't have dynamic SQL, so the query had to be stored in code instead of in the DB schema.
Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries. However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.
The good way: Table-Valued Parameters
SQL Server 2008 adds a new ability: users can define a table-valued database type.
Most other types are scalar (they only return one value), but table-valued types can hold multiple values, as long as the values are tabular.
We've defined three types: varchar_array, int_array, and bigint_array.
CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)
Both stored procedures and programmatically defined SQL queries can use these table-valued types.
IEnumerable<long> assetIDs = GetAssetIDs();
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"),
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)",
    new Parameter("@AssetIDs", assetIDs));
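For comparison, a hedged sketch of passing the same kind of TVP with plain ADO.NET rather than the in-house ExecEnumerableSql wrapper; this uses the standard SqlDbType.Structured parameter API, the type name dbo.bigint_array matches the CREATE TYPE above, and conn/assetIDs are assumed to exist:

using System;
using System.Data;
using System.Data.SqlClient;

// Fill a DataTable whose shape matches the bigint_array type, then pass it
// as a structured parameter.
var table = new DataTable();
table.Columns.Add("Id", typeof(long));
foreach (long id in assetIDs)
    table.Rows.Add(id);

using (var cmd = new SqlCommand(
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)", conn))
{
    var p = cmd.Parameters.AddWithValue("@AssetIDs", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.bigint_array";
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader.GetString(0));
    }
}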
Advantages
Can be used in both stored procedures and programmatic SQL without much effort
Not vulnerable to SQL injection
Cacheable, stable queries
Does not lock the schema table
Not limited to 8k of data
Less work done by both the DB server and the client apps, since there is no concatenation or decoding of CSV strings.
"typical use" statistics can be derived by the query analyzer, which can lead to even better performance.
Disadvantages
Only works on SQL Server 2008 and above.
Rumors that TVPs are prebuffered in their entirety before execution of the query, which means phenomenally large TVPs may be rejected by the server.
Further investigation of this rumor is ongoing.
Further reading
This article is a great resource to learn more about TVP.
If you can't use table-valued parameters, this option is a little faster than the XML option while still allowing you to stay away from dynamic SQL: pass the joined list of values as a string parameter, and parse the delimited string back into values in your query. Please see this article for instructions on how to do the parsing efficiently.
I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.
This works for SQL Server 2005 (and later):
create procedure IGetAListOfValues
    @Ids xml -- This will receive a list of values
as
begin
    -- You can load them into a temp table or use it as a subquery:
    create table #Ids (Id int);
    INSERT INTO #Ids
    SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p);
    ...
end
You have to invoke this procedure with a parameter like this:
exec IGetAListOfValues
    @Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter
The nodes function uses an XPath expression. In this case it's /params/p, and that's why the XML uses <params> as the root and <p> as the element.
The value function casts the text inside each p element to int, but you can use it with other data types easily. In this sample there is a DISTINCT to avoid repeated values but, of course, you can remove it depending on what you want to achieve.
I have an auxiliary (extension) method that converts an IEnumerable<T> into a string that looks like the one shown in the execute example. It's easy to create one and have it do the work for you whenever you need it. (You have to test the data type of T and convert it to an adequate string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern to receive the parameters (you can pass in as many lists as needed). A sketch of such a helper appears after the subquery example below.
One advantage is that you don't need to do anything special in your database for it to work.
Of course, you don't need to create a temp table as is done in my example; you can use the query directly as a subquery inside an IN predicate:
WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int')
                    FROM @Ids.nodes('/params/p') as params(p))
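A hedged sketch of the kind of helper described above (the name ToXmlParam and the escaping choice are my assumptions, not from the answer):

using System.Collections.Generic;
using System.Security;
using System.Text;

public static class SqlXmlExtensions
{
    // Builds '<params><p>1</p><p>2</p></params>' from any sequence.
    // Values are XML-escaped so markup in the data cannot break the document.
    public static string ToXmlParam<T>(this IEnumerable<T> values)
    {
        var sb = new StringBuilder("<params>");
        foreach (T v in values)
            sb.Append("<p>").Append(SecurityElement.Escape(v.ToString())).Append("</p>");
        return sb.Append("</params>").ToString();
    }
}

A call site would then pass myIds.ToXmlParam() as the value of the @Ids parameter.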
I am not 100% sure that I understand the problem correctly, but LinqToSql's ExecuteQuery has an overload for parameters, and the query is supposed to use a format similar to string.Format.
Using this overload is safe against SQL injection, and behind the scenes LinqToSql translates it to use sp_executesql with parameters.
Here is an example:
string sql = "SELECT * FROM city WHERE city LIKE {0}";
db.ExecuteQuery(sql, "Lon%"); //Note that we don't need the single quotes
This way one can get the benefit of parameterized queries, even while using dynamic SQL.
However, when it comes to using IN with a dynamic number of parameters, there are two options:
Construct the string dynamically, and then pass the values as an array, as in:
string sql = "SELECT * FROM city WHERE zip IN (";
List<string> placeholders = new List<string>();
for (int i = 0; i < zips.Length; i++)
{
    placeholders.Add("{" + i.ToString() + "}");
}
sql += string.Join(",", placeholders.ToArray());
sql += ")";
db.ExecuteQuery(sql, zips.ToArray());
We can use a more compact approach by using the LINQ extension methods, as in:
string sql = "SELECT * FROM city WHERE zip IN (" +
    string.Join(",", zips.Select((z, i) => "{" + i + "}")) +
    ")";
db.ExecuteQuery(sql, zips.ToArray());

dynamic insertion with xml or stored procedure

I have a group of 20 records that I need to batch-insert in one connection, so there are two solutions (XML or a stored procedure). This operation is executed frequently, so I need fast performance and the least overhead.
1) I think XML performs more slowly, but we can freely specify how many records we insert per batch by producing the appropriate XML. Since I don't know the values of each field in a record, there may be characters that malform the XML, such as " or field tags appearing in the values. How should I prevent this?
2) Using a stored procedure is faster, but I need to define all the input parameters, which is a tedious task, and if I need to increase or decrease the number of records inserted in a batch, I need to change the SP.
So which solution is better in my environment with respect to my constraints?
XML is likely the better choice; however, there are other options:
If you're using SQL Server 2008 you can use table-valued parameters instead.
Starting with .NET 2.0 you have had the option to use SqlBulkCopy.
If you're using Oracle you can pass a user-defined type, but I'm not sure which versions of ODP and Oracle that works with.
Note these are all .NET samples. I don't know whether they will work for you; it would probably help if you included the database, version, and client technology that you're using.
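As a hedged sketch of the SqlBulkCopy route in C# (the destination table, columns, and connectionString variable are invented for illustration):

using System.Data;
using System.Data.SqlClient;

// Build the batch in memory, then push all rows in a single bulk operation.
var batch = new DataTable();
batch.Columns.Add("Name", typeof(string));
batch.Columns.Add("Price", typeof(decimal));
for (int i = 0; i < 20; i++)
    batch.Rows.Add("Item " + i, 9.99m);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.MyRecords";
    bulk.WriteToServer(batch);
}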
