REST Backend with specified columns, SQL questions - sql-server

I'm working with a new REST backend talking to a SQL Server. Our REST API lets the caller pass in the columns/fields they want returned (?fields=id,name,phone).
The idea seems perfectly normal. The issue I'm running into is resistance to dynamically generating the SQL statement. Any values passed in would be sent to the database through a parameterized query, so I'm not concerned about SQL injection there.
The basic idea would be to "inject" the requested column names into SQL that looks like this:
SELECT TOP (1000) <column-names>
FROM myTable
ORDER BY <column-name-to-sort-by>
We sanitize all column names and verify that they exist in the table to prevent SQL injection. Most of our programmers are used to keeping all SQL in static files, loading them from disk, and passing them on to the database. The idea of code generating SQL makes them very nervous.
I guess I'm curious whether others actually do this. If so, how do you do it? If not, how do you handle requests for dynamic columns and dynamic sort order?

I think a lot of people do it, especially for reporting features. There are two things you should do to stay on the safe side:
Parameterize all WHERE clause values
Use user input values only to pick the correct column/table names; don't put the user values in the SQL statement at all
To elaborate on item #2, I would have a dictionary where the Key is a possible user input and the Value is the corresponding column/table name. You can store this dictionary wherever you want: config file, database, hard-coded, etc. When you process user input, you check whether the Key exists in the dictionary, and if it does, you use the Value to add a column name to your query. This way user input only selects the required column names; the actual user values never appear in your SQL statement. Besides, you might not want to expose all columns. With a predefined dictionary you can easily control the list of columns available to a user.
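A minimal sketch of the dictionary approach in Python, assuming a hypothetical dbo.Customers table and made-up field names. The raw ?fields= and sort values are used only as lookup keys; the SQL text contains nothing but the dictionary's values. It uses T-SQL's TOP, since SQL Server has no LIMIT.

```python
# Hypothetical whitelist: public API field name -> actual column name.
ALLOWED_FIELDS = {
    "id": "CustomerID",
    "name": "FullName",
    "phone": "PhoneNumber",
}


def build_select(fields_param: str, sort_by: str, top: int = 1000) -> str:
    """Build a T-SQL SELECT from ?fields=... and a sort key using only whitelisted names."""
    requested = [f.strip() for f in fields_param.split(",") if f.strip()]
    unknown = [f for f in requested if f not in ALLOWED_FIELDS]
    if not requested or unknown:
        raise ValueError(f"unknown or missing fields: {unknown}")
    if sort_by not in ALLOWED_FIELDS:
        raise ValueError(f"unknown sort field: {sort_by!r}")
    # User strings are only dictionary keys; the SQL text contains dictionary values.
    columns = ", ".join(f"[{ALLOWED_FIELDS[f]}]" for f in requested)
    return (
        f"SELECT TOP ({int(top)}) {columns} "
        f"FROM dbo.Customers ORDER BY [{ALLOWED_FIELDS[sort_by]}]"
    )


print(build_select("id,name,phone", "name"))
# SELECT TOP (1000) [CustomerID], [FullName], [PhoneNumber] FROM dbo.Customers ORDER BY [FullName]
```

Any request containing a name outside the dictionary is rejected outright rather than "cleaned", which keeps the rule easy to audit.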
Hope it helps!

I've done something similar to what Maksym suggests. In my case, the keys were pulled directly from the database system tables (after scrubbing the user request a bit for syntactic hacks and checking permissions).
The following query takes care of some minor injection issues through the way SQL naturally evaluates the LIKE condition. It doesn't go as far as checking permissions on each field (some fields are forbidden depending on the login), but it provides a very basic way to retrieve these fields dynamically.
CREATE PROC get_allowed_column_names
@input VARCHAR(MAX)
AS BEGIN
SELECT
columns.name AS allowed_column_name
FROM
syscolumns AS columns,
sysobjects AS tables
WHERE
columns.id = tables.id AND
tables.name = 'Categories' AND
@input LIKE '%' + columns.name + '%'
END
GO
-- The following only returns "Picture"
EXEC get_allowed_column_names 'Category_,Cat%,Picture'
GO
-- The following returns both "CategoryID" and "Picture"
EXEC get_allowed_column_names 'CategoryID, Picture'
GO
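Because the LIKE test matches substrings, a column named, say, Name would also be returned for a request containing CategoryName. The sketch below is an exact-match variant: it splits the request on commas and keeps only names that exactly match a catalog entry. It is written against SQLite through Python's sqlite3, with pragma_table_info standing in for syscolumns/sys.columns, so it is an analogue rather than SQL Server code. The comparison is also case-sensitive, whereas SQL Server name matching follows the database collation.

```python
import sqlite3
from typing import List


def get_allowed_column_names(conn: sqlite3.Connection, table: str, requested: str) -> List[str]:
    """Return the comma-separated names in `requested` that exactly match a column of `table`."""
    # pragma_table_info(?) lists the table's columns; the table name is bound, not concatenated.
    catalog = {name for (name,) in conn.execute("SELECT name FROM pragma_table_info(?)", (table,))}
    wanted = [part.strip() for part in requested.split(",")]
    return [name for name in wanted if name in catalog]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT, Description TEXT, Picture BLOB)"
)
print(get_allowed_column_names(conn, "Categories", "Category_,Cat%,Picture"))  # ['Picture']
print(get_allowed_column_names(conn, "Categories", "CategoryID, Picture"))  # ['CategoryID', 'Picture']
```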

Related

SSIS comma delimited string in where clause

I am trying to see if there is an easy answer for this. I have done something similar using multiple pick dropdown parameters in SSRS but this appears to be different.
Here is my scenario; maybe there is an even better answer.
I have a production server that I do not want to make any changes to, including temp tables or functions. The production server has a table of clients with about 1600 records. I have set up an SSIS package that transfers data from production to dev based on a ClientId. So my sources have a query similar to Select Field From Table Where ClientId = ?
This works fine. Now I want to load more than one client, based on data in the clients table. The query may be Select ClientId From Clients Where Field = A, which returns multiple ClientIds.
I am able to populate a comma-delimited list from an Execute SQL Task into an SSIS variable, so it may be 1,4,8.
If I change my source query to use ClientId in (?) I get a conversion error.
I have looked at many posts that advocate a temp table or a function, which I want to avoid, e.g. "Select IN using varchar string with comma delimited values".
I have considered building the entire SQL statement into a variable, but that doesn't seem like the right path: I have many tables to query and transfer, and using ClientId = ? works well without having to build each individual SQL statement into a variable.
Is there an easy fix I am missing? I will now turn my research to finding out how I did this in SSRS, but I thought I should post here to see if someone has accomplished this before.
I appreciate any info on this, thank you.
EDIT: The key point is that the column on Clients exists only on the dev server, so I cannot just use a SELECT in the WHERE clause, as the column does not exist on the production server.
EDIT: I did not mention that I am specifically looking at OLE DB sources mapping a parameter to ? in the SQL statement.
EDIT: Narrowing down on this, but having trouble relating SSRS and SSIS functionality. In SSRS it's called a multi-value parameter. In the following link, the key line is
WHERE Production.ProductInventory.ProductID IN (@ProductID)
https://msdn.microsoft.com/en-us/library/dn385719(v=sql.110).aspx
This one looks good as well
https://sqlblogcasts.com/blogs/simons/archive/2007/11/22/RS-HowTo---Pass-a-multivalue-parameter-to-a-query-using-IN.aspx
I will keep researching and thank you for the help so far.
I think this sums it up best:
This functionality is limited strictly to embedded SQL.
What SSRS does is transform your SQL column IN (@value) into column IN (@selectedvalue1, @selectedvalue2), etc.
You need to forget anything you know about the other ways of passing lists to SQL (i.e. building strings etc.) and make sure the declared data types are correct for the values of your parameter.
You do not need to use the Join(parameters!,",") trick UNLESS you are passing the list to a stored procedure, in which case you need some function to turn the delimited list into a rowset, as you have done.
I hope that helps
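The rewrite described in the quote can be sketched in Python, with ? placeholders and an in-memory SQLite table standing in for the real source. The helper name and sample data are hypothetical, and this only illustrates the idea; it is not how SSRS is implemented internally. Each selected value becomes its own parameter, so nothing is split or concatenated into the SQL text.

```python
import re
import sqlite3
from typing import List, Sequence, Tuple


def expand_in_parameter(sql: str, name: str, values: Sequence) -> Tuple[str, List]:
    """Rewrite `IN (@name)` into `IN (?, ?, ...)`, one placeholder per selected value.

    Assumes @name appears exactly once in `sql`.
    """
    if not values:
        raise ValueError("at least one value must be selected")
    placeholders = ", ".join("?" for _ in values)
    # \b stops @ClientIds from also matching a longer name such as @ClientIdsOld.
    expanded = re.sub(r"@" + re.escape(name) + r"\b", placeholders, sql)
    return expanded, list(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (ClientId INTEGER, Field TEXT)")
conn.executemany("INSERT INTO Clients VALUES (?, ?)", [(1, "a"), (4, "b"), (8, "c"), (9, "d")])

sql, params = expand_in_parameter(
    "SELECT Field FROM Clients WHERE ClientId IN (@ClientIds) ORDER BY ClientId",
    "ClientIds",
    [1, 4, 8],
)
print(sql)  # SELECT Field FROM Clients WHERE ClientId IN (?, ?, ?) ORDER BY ClientId
print(conn.execute(sql, params).fetchall())  # [('a',), ('b',), ('c',)]
```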
The core question is whether I can get the same functionality in SSIS as in SSRS. It reminds me of macro substitution.
If you don't want to create a function, you can use the following in your T-SQL statement.
Declare @ClientIds nvarchar(50) = '123,456'; -- Comma-delimited list of client IDs
Select Field
From Table
Where ClientId IN (
SELECT CAST(RTRIM(LTRIM(Split.a.value('.', 'VARCHAR(100)'))) AS INT) ClientIDs
FROM (
SELECT Cast ('<X>'
+ Replace(@ClientIds, ',', '</X><X>')
+ '</X>' AS XML) AS Data
) AS t CROSS APPLY Data.nodes ('/X') AS Split(a)
)
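The same pattern (one delimited string parameter, split inside the query, no temp table or function) can be tried outside SQL Server. SQLite has no XML nodes() method, so this analogue, run through Python's sqlite3 with a made-up Clients table, uses a recursive CTE to split, trim, and cast the list. It is a sketch of the technique, not a drop-in replacement for the T-SQL above.

```python
import sqlite3

# One "?" receives the whole comma-delimited list, as an SSIS variable would.
SPLIT_QUERY = """
WITH RECURSIVE split(id, rest) AS (
    SELECT NULL, ? || ','
    UNION ALL
    SELECT CAST(TRIM(substr(rest, 1, instr(rest, ',') - 1)) AS INTEGER),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT Field
FROM Clients
WHERE ClientId IN (SELECT id FROM split WHERE id IS NOT NULL)
ORDER BY ClientId
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (ClientId INTEGER, Field TEXT)")
conn.executemany("INSERT INTO Clients VALUES (?, ?)", [(1, "a"), (4, "b"), (8, "c"), (9, "d")])

rows = conn.execute(SPLIT_QUERY, ("1, 4,8",)).fetchall()
print(rows)  # [('a',), ('b',), ('c',)]
```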

SQL Server : parameters for column names instead of values

This might seem like a silly question, but I'm surprised that I didn't find a clear answer to this already:
Is it possible to use SQL Server parameters for writing a query with dynamic column names (and table names), or does the input just need to be sanitized very carefully?
The situation is that the tables, their column names, and the number of columns are generated dynamically, so there is no way to know them beforehand and write the queries by hand. Since the tables and columns aren't known, I can't use an ORM, so I'm resorting to manual queries. Usually I'd use parameters to fill in values to prevent SQL injection, but I'm pretty sure this cannot be done the same way for table and/or column names. I want to create generic queries for insert, update, upsert, and select, but I obviously don't want to open myself up to potential injection. Is there a best practice for doing this safely?
Just as an FYI - I did see this answer, but since there's no way for me to know the column / table names beforehand a case statement probably won't work for this situation.
Environment: SQL Server 2014 via ADO.NET (.NET 4.5 / C#)
There is no mechanism for passing table or column references to procedures. You just pass them as strings and then use dynamic SQL to build your queries. You do have to take precautions to ensure that your string parameters are valid.
One way to do this would be to validate that all table and column reference strings have valid names in sys.tables and sys.columns before building your T-SQL queries. Then you can be sure that they can be used safely.
You can also pass ordinary value parameters into dynamic SQL by executing it with the sp_executesql procedure. That doesn't help with validating table and column names, but it does prevent SQL injection through your other (value) parameters.
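A sketch that combines the two precautions from this answer: check names against the catalog, then keep values as parameters. It also brackets each name the way T-SQL's QUOTENAME does by default. The catalog dict is a hypothetical snapshot you would load from sys.tables and sys.columns. The @p0, @p1 naming is arbitrary, and schema-qualified names such as dbo.Orders would need each part quoted separately, which this sketch does not handle.

```python
from typing import Any, Dict, Set, Tuple


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier like T-SQL QUOTENAME(identifier) with the default delimiter."""
    if len(identifier) > 128:
        # QUOTENAME returns NULL for input longer than 128 characters.
        raise ValueError("identifier longer than 128 characters")
    return "[" + identifier.replace("]", "]]") + "]"


def build_insert(catalog: Dict[str, Set[str]], table: str, row: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized INSERT after checking names against a catalog snapshot."""
    if table not in catalog:
        raise ValueError(f"unknown table: {table!r}")
    unknown = set(row) - catalog[table]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    columns = list(row)
    # Names are validated AND quoted; values only ever travel as parameters.
    params = {f"@p{i}": row[col] for i, col in enumerate(columns)}
    sql = (
        f"INSERT INTO {quote_name(table)} ({', '.join(quote_name(c) for c in columns)}) "
        f"VALUES ({', '.join(params)})"
    )
    return sql, params


catalog = {"Orders": {"Id", "Total"}}  # hypothetical snapshot of sys.tables / sys.columns
sql, params = build_insert(catalog, "Orders", {"Id": 1, "Total": 9.5})
print(sql)  # INSERT INTO [Orders] ([Id], [Total]) VALUES (@p0, @p1)
```

In ADO.NET, each entry in params would then be added to the SqlCommand as a parameter (for example with Parameters.AddWithValue) before executing the statement.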

Is one way of stating tables in a query more 'optimal' than another?

Edit: I'm aware that SELECT * is bad practice, but it's used here just to focus the example SQL on the table statement rather than the rest of the query. Mentally exchange it for some column names if you prefer.
Given a database server MyServer (which we are presently connected to in SSMS), with several databases MyDb1, MyDb2, MyDb3 etc. and default schema dbo, are any of the following equivalent queries (they will all return exactly the same result set) more "optimal" than the others?
SELECT * FROM MyServer.MyDb1.dbo.MyTable
I was told that this method (explicitly providing the full database name including server name) treats MyServer as a linked server and causes the query to run slower. Is this true?
SELECT * FROM MyDb1.dbo.MyTable
The server name isn't required as we're already connected to it, but would this run 'faster' than the above?
USE MyDb1
GO
SELECT * FROM dbo.MyTable
State the database we're using initially. I can't imagine that this is any better than the previous one for a single query, but would it be more optimal for subsequent queries on the same database (i.e., if we had more SELECT statements in the same format below this)?
USE MyDb1
GO
SELECT * FROM MyTable
As above, but omitting the default schema. I don't think this makes any difference. Does it?
SQL Server will always look for the objects you specify within the current "context" if you do not specify a fully qualified name.
Is one faster than the other? Technically yes, in the same way that a file named "This is a really long name for a file but as long as it is under 254 it is ok.txt" takes up more space in the file table than "x.txt". Will you ever notice it? No!
As for the "USE" keyword, it just sets the database context for you, so you don't have to fully qualify object names. "USE" is a T-SQL statement, but it is not allowed inside a stored procedure, function, or trigger; a client application (like a VB/C# app) can instead set the database in its connection string. Don't confuse it with "GO", which is not T-SQL at all: it is a batch separator interpreted by SSMS and sqlcmd.

ADO - Can I edit results of a complex query with multiple join statements?

I'm working on a data conversion utility that pushes data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is stored in the destination (table structure), but I would like to support writing a SQL statement, possibly a complex query with multiple joins, that returns data from the destination, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would then like to do is modify the live data in this ADO query. However, since there are multiple joins, I'm not sure whether that's possible. I know that BDE (which I've never used) was very strict: you had to return all fields (*) and so on. I know ADO is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery this way when the results include fields from different tables? And even if so, suppose I want to append a new record (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table joined on the same primary key field: one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing these Fields which have no influence on the joins is usually no problem.
Appending is trickier. You can limit the Append to one of the tables with:
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
inherited;
TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
A better approach is to write the values yourself through a stored procedure, e.g. in BeforePost, then Requery and Abort.
If your query were a persistent view, you would be able to use INSTEAD OF triggers.
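A sketch of the INSTEAD OF idea, using SQLite through Python's sqlite3 because SQLite also supports INSTEAD OF triggers on views. The tables mirror the question's table_small/table_detail, and the view and trigger names are hypothetical. In SQL Server, the trigger would read from the inserted pseudo-table instead of NEW and must handle multi-row inserts, because SQL Server triggers fire once per statement. Whether a TADOQuery Append against such a view then behaves as hoped is not something this sketch shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_small (record_uid INTEGER PRIMARY KEY, SomeField TEXT);
    CREATE TABLE table_detail (
        record_uid INTEGER PRIMARY KEY REFERENCES table_small (record_uid),
        SomeOtherField TEXT
    );

    -- The joined query from the question, made persistent as a view.
    CREATE VIEW small_with_detail AS
    SELECT ts.record_uid, ts.SomeField, td.SomeOtherField
    FROM table_small ts
    JOIN table_detail td ON td.record_uid = ts.record_uid;

    -- Route an INSERT on the view to both base tables.
    CREATE TRIGGER small_with_detail_insert
    INSTEAD OF INSERT ON small_with_detail
    BEGIN
        INSERT INTO table_small (record_uid, SomeField) VALUES (NEW.record_uid, NEW.SomeField);
        INSERT INTO table_detail (record_uid, SomeOtherField) VALUES (NEW.record_uid, NEW.SomeOtherField);
    END;
    """
)

conn.execute(
    "INSERT INTO small_with_detail (record_uid, SomeField, SomeOtherField) VALUES (?, ?, ?)",
    (1, "brief info", "more info"),
)
print(conn.execute("SELECT * FROM table_small").fetchall())  # [(1, 'brief info')]
print(conn.execute("SELECT * FROM table_detail").fetchall())  # [(1, 'more info')]
```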
Jerry,
I encountered the same problem on Firebird, and from experience I can tell you that it can be done (up to a modest level of complexity) by using CachedUpdates. A very good resource is this one: http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article answers all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" to various fields from the original set of data. Each of these links references the original field name and a SQL statement that is executed when that field is imported. A single field can have multiple field links, so it can execute multiple statements, placing the value in various tables, etc. The end goal was an app with which I can easily and repeatedly export a common dataset from an original source to any outside source with a different data structure, without having to recompile the app.
The concept of cached updates did not appeal to me, because, as the link in RBA's answer points out, the data can change in the database in the meantime. So I will instead implement my own method of customizable data pushes.
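The "Field Links" design described above could look roughly like this in Python with sqlite3. The FieldLink class, push_record function, destination tables, and the :uid/:value parameter convention are all hypothetical illustrations, not the author's implementation. Each link pairs a source field with a destination statement, one field may carry several links, and all statements for one record run in a single transaction.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FieldLink:
    """Hypothetical "field link": a source field plus a statement to run on import."""
    field: str  # name of the field in the common source dataset
    sql: str    # destination statement; :uid and :value are bound at run time


def push_record(conn: sqlite3.Connection, record: Dict[str, Any], links: List[FieldLink]) -> None:
    """Run every link for one source record inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for link in links:
            conn.execute(link.sql, {"uid": record["record_uid"], "value": record[link.field]})


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dest_customer (uid INTEGER, name TEXT);
    CREATE TABLE dest_search (uid INTEGER, term TEXT);
    CREATE TABLE dest_notes (uid INTEGER, body TEXT);
    """
)
links = [
    FieldLink("name", "INSERT INTO dest_customer (uid, name) VALUES (:uid, :value)"),
    # A second link on the same field writes the value somewhere else as well.
    FieldLink("name", "INSERT INTO dest_search (uid, term) VALUES (:uid, lower(:value))"),
    FieldLink("notes", "INSERT INTO dest_notes (uid, body) VALUES (:uid, :value)"),
]
push_record(conn, {"record_uid": 7, "name": "Acme Ltd", "notes": "key account"}, links)
print(conn.execute("SELECT * FROM dest_search").fetchall())  # [(7, 'acme ltd')]
```

Because the statements live in data rather than code, a new destination layout only needs a new set of links, which matches the "no recompile" goal.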

How can I call a stored procedure without table names in HQL?

I am trying to fetch the current timestamp through a stored procedure in HQL. This means my code looks something like the following:
var currentTimestamp =
session.CreateQuery("SELECT CURRENT_TIMESTAMP()")
.UniqueResult<DateTime>();
This doesn't work. Specifically, it throws a System.NullReferenceException deep inside of the NHibernate HqlParser.cs file. I played around with this a bit, and got the following to work instead:
var currentTimestamp =
session.CreateQuery("SELECT CURRENT_TIMESTAMP() FROM Contact")
.SetMaxResults(1)
.UniqueResult<DateTime>();
Now I have the data I want, but not the HQL query I want. I want the query to represent the question I'm asking, like my original format.
An obvious question here is "Why are you using HQL?" I know I could easily do this with session.CreateSQLQuery(...), hitting our MySQL 5.1 database directly. This is simply an example of my core problem: I'm using custom parameterless HQL functions throughout my code base, and I want integration tests that run these functions in as much isolation as possible.
My hack also has some serious assumptions baked in. It will not return a result, for example, if there are no records in the Contact table, or if the Contact table ceases to exist.
The method to retrieve CURRENT_TIMESTAMP() (or any other database function) outside of the context of a database table varies from database to database - some allow it, some do not. Oracle, for example, does not allow a select without a table, so they provide a system table with a single row which is called DUAL.
I suspect that NHibernate is implementing in HQL primarily features which are common across all database implementations, and thus does not implement a table-less select.
I can suggest three approaches:
Create a view that hides the table-less select such as 'create view dtm as select current_timestamp() as datetime'
Follow the Oracle approach and create a utility table with a single row in it that you can use as the table in a select
Create a simple stored procedure which only executes 'select current_timestamp()'
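The first two approaches can be tried quickly with SQLite through Python's sqlite3. The names dtm, now_ts, and dual are illustrative. SQLite spells the function CURRENT_TIMESTAMP without parentheses, and on MySQL you would pick a different table name, since DUAL is a reserved word there. To query either object from HQL, it would presumably also have to be mapped as an NHibernate entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Approach 1: a view that hides the table-less SELECT.
    CREATE VIEW dtm AS SELECT CURRENT_TIMESTAMP AS now_ts;

    -- Approach 2: a DUAL-style utility table constrained to exactly one row.
    CREATE TABLE dual (id INTEGER PRIMARY KEY CHECK (id = 1));
    INSERT INTO dual (id) VALUES (1);
    """
)

view_rows = conn.execute("SELECT now_ts FROM dtm").fetchall()
dual_rows = conn.execute("SELECT CURRENT_TIMESTAMP FROM dual").fetchall()
print(view_rows, dual_rows)  # each is a single row holding the current timestamp

rejected = False
try:
    conn.execute("INSERT INTO dual (id) VALUES (2)")  # violates CHECK (id = 1)
except sqlite3.IntegrityError:
    rejected = True
print("second row rejected:", rejected)
```

The CHECK constraint guards against the hazard mentioned in the question: the utility table can never silently gain extra rows, and it cannot be emptied into a different shape without someone deleting the row on purpose.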
