Is it possible to capture the query text of the entire RPC SELECT statement from within the view which it calls? I am developing an enterprise database in SQL Server 2014 which must support a commercial application that submits a complex SELECT statement in which the outer WHERE clause is critical. This statement also includes subqueries against the same object (in my case, a view) which do NOT include that condition. This creates a huge problem because the calling application is joining those subqueries to the filtered result on another field and thus producing ID duplication that throws errors elsewhere.
The calling application assumes it is querying a table (not a view) and it can only be configured to use a simple WHERE clause. My task is to implement a more sophisticated security model behind this rather naive application. I can't rewrite the offending query, but I had hoped to retrieve the missing information from the cached query plan. Here's a super-simplified pseudo-view of my proposed solution:
CREATE VIEW dbo.important_data AS
WITH a AS (
    -- the function inspects the calling RPC's text and returns the filter value, or NULL
    SELECT dbo.special_function() AS condition
),
b AS (
    SELECT c.criteria
    FROM a
    CROSS JOIN lookup_table AS c
    WHERE a.condition IS NULL OR c.criteria = a.condition
)
SELECT d.field_1, d.field_2, d.filter_field
FROM b
INNER JOIN underlying_table AS d
    ON d.filter_field = b.criteria;
My "special function" reads the query plan for the RPC, extracts the WHERE condition and preemptively filters the view so it always returns only what it should. Unfortunately, query plans don't seem to be cached until after they are executed. If the RPC is made several times, my solution works perfectly -- but only on the second and subsequent calls.
I am currently using dm_exec_cached_plans, dm_exec_query_stats and dm_exec_sql_text to obtain the full text of the RPC by searching on spid, plan creation time and specific words. It would be so much better if I could somehow get the currently executing plan. I believe dm_exec_requests does that but it only returns the current statement -- which is just the definition of my function.
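For reference, a rough sketch of that kind of cached-plan lookup (the filter words and time window are illustrative, and it only succeeds once the plan is actually in the cache, i.e. from the second call onward):

-- Find the most recent cached RPC text that mentions the view (illustrative filter).
SELECT TOP (1) st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%important_data%'
  AND qs.creation_time >= DATEADD(SECOND, -30, GETDATE())
ORDER BY qs.creation_time DESC;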
Extended Events look promising but unfamiliar and there is a lot to digest. I haven't found any guidance, either, on whether they are appropriate for this particular challenge, whether a notification can be obtained in real time or how to ensure that the Event Session is always running. I am pursuing this investigation now and would appreciate any suggestions or advice. Is there anything else I can do?
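For what it's worth, here is a minimal sketch of an always-on Extended Events session that captures RPC text as the call starts (the session name and filter text are assumptions, and whether rpc_starting fires early enough for the function inside the view to see it is something that would need testing):

CREATE EVENT SESSION capture_rpc_text ON SERVER
ADD EVENT sqlserver.rpc_starting
(
    ACTION (sqlserver.session_id, sqlserver.sql_text)
    WHERE (sqlserver.like_i_sql_unicode_string(sqlserver.sql_text, N'%important_data%'))
)
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = ON);  -- start the session automatically when SQL Server starts
GO
ALTER EVENT SESSION capture_rpc_text ON SERVER STATE = START;

The ring_buffer contents can then be queried from sys.dm_xe_session_targets (as XML), though parsing that adds its own overhead.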
This turns out to be an untenable idea and I have devised a less elegant workaround by restructuring the view itself. There is a performance penalty, but the downstream error is avoided. My fundamental problem is the way the client application generates its SQL statements, and there is nothing I can do about that -- so, users will just have to accept whatever limitations may result.
The standard flow to query the result of a query, per the docs, looks like this:
describe table your_db.your_schema.your_table_name;
select
"name"
,"type"
,*
from table(result_scan(last_query_id())) ;
Ideally, you should be able to
select
"name"
,"type"
,*
from table(result_scan(describe table your_db.your_schema.your_table_name)) ;
Is something like this possible in reality or must I keep dreaming?
It is not possible, for the following reasons:
The doc clearly states that the RESULT_SCAN function only accepts a query ID as an input argument.
The RESULT_SCAN function scans the result of a previously executed query, so that query must already have occurred as a different transaction in the past.
It's also worth adding to Clark's answer that it's not presently possible because DESCRIBE TABLE is not run on the COMPUTE layer, but only on the SHARED SERVICES layer (metadata, query planner, etc.).
Originally, back in my day, there wasn't even a RESULT_SCAN, and everything was externally parsed. So RESULT_SCAN allows you to run a COMPUTE-layer query against a prior result set, and thus we have a two-step process. Compared to other DBs, where everything is a full DB object and thus everything is accessible via SQL, this is frustrating.
But I feel that when you know how/why it's the way it is, the steps make sense, and like many things, once you have a "working pattern" you just do it. Albeit, at first it is different, and not how one might like it to be.
So I have two queries that I'm working on, one comes from an Oracle DB and the other from a SQL Server DB. I'm trying to use Power BI via Power Query as the crossover between the two. Because of the size of the Oracle DB I'm having a problem running it, so my thought is to use one query as a clause/sub-query of the other to limit the number of results.
Based on the logic of MSFT's M language I'd assume there's a way to do sub-queries of another but I've yet to figure it out. Does anyone have any ideas on how to do this?
What I have learned to do is create a connection to the two underlying data sets, but do not load them. Then you can merge them, but do not use the default Table.NestedJoin().
After the PQ generates this, change it to:
= Table.Join(dbo_SCADocument, {"VisitID"}, VISIT_KEY, {"VisitID"})
Also remove the trailing name. The reason is that it keeps query folding alive; for some reason, Table.NestedJoin() kills query folding. Note that if the two sources have column names in common other than the join keys, it will fail.
It also brings in everything from both sources, but that is easy to alter. You will also need to turn off the formula firewall, as it does not allow you to join potentially sensitive data with non-sensitive data; this means setting your privacy level to ignore all.
I would attempt this using the Merge command. I'm more of a UI guy, so I would click the Merge button. This generates a Power Query (M) statement for Table.Join:
https://msdn.microsoft.com/en-us/library/mt260788.aspx
Setting Join Kind to Inner will restrict the output to matching rows.
I say "attempt" because your proposed query design will likely slow your query, not improve it. I would expect PQ to run both queries against the two servers, download all their data, and then attempt the join in memory.
I have a simple SELECT statement with a couple of columns referenced in the WHERE clause. Normally I do these simple ones in the VB code (set up a Command object, set CommandType to text, set CommandText to the SELECT statement). However, I'm seeing timeout problems. We've optimized just about everything we can with our tables, etc.
I'm wondering if there'd be a big performance hit just because I'm doing the query this way, versus creating a simple stored procedure with a couple params. I'm thinking maybe the inline code forces SQL to do extra work compiling, creating query plan, etc. which wouldn't occur if I used a stored procedure.
An example of the actual SQL being run:
SELECT TOP 1 * FROM MyTable WHERE Field1 = @Field1 ORDER BY ID DESC
A well formed "inline" or "ad-hoc" SQL query - if properly used with parameters - is just as good as a stored procedure.
But this is absolutely crucial: you must use properly parametrized queries! If you don't - if you concatenate together your SQL for each request - then you don't benefit from these points...
Just like with a stored procedure, upon first executing, a query execution plan must be found - and then that execution plan is cached in the plan cache - just like with a stored procedure.
That query plan is reused over and over again, if you call your inline parametrized SQL statement multiple times - and the "inline" SQL query plan is subject to the same cache eviction policies as the execution plan of a stored procedure.
Just from that point of view - if you really use properly parametrized queries - there's no performance benefit for a stored procedure.
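For illustration, this is roughly the shape a properly parameterized ad-hoc request takes by the time it reaches SQL Server (the parameter type and value here are assumptions); it gets one cached plan that is reused for every value:

EXEC sp_executesql
    N'SELECT TOP 1 * FROM MyTable WHERE Field1 = @Field1 ORDER BY ID DESC',
    N'@Field1 int',    -- assumed parameter type
    @Field1 = 42;      -- assumed value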
Stored procedures have other benefits (like being a "security boundary" etc.), but just raw performance isn't one of their major plus points.
It is true that the db has to do the extra work you mention, but that should not result in a big performance hit (unless you are running the query very, very frequently..)
Use SQL Profiler to see what is actually getting sent to the server. Use Activity Monitor to see if there are other queries blocking yours.
Your query couldn't be simpler. Is Field1 indexed? As others have said, there is no performance hit associated with "ad-hoc" queries.
For where to put your queries, this is one of the oldest debates in tech. I would argue that your requests "belong" to your application. They will be versioned with your app, tested with your app and should disappear when your app disappears. Putting them anywhere other than in your app is walking into a world of pain. But for goodness sake, use .sql files, compiled as embedded resources.
Inline query:
A SELECT statement that is part of the FROM clause of another statement is called an inline query.
Cannot take parameters.
Not a database object.
Procedure:
Can take parameters.
Is a database object.
Can be used globally if the same action needs to be performed.
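A minimal sketch of the difference, with illustrative table and column names:

-- Inline query: a SELECT used inside the FROM clause of another statement.
SELECT t.customer_id, t.total
FROM (SELECT customer_id, SUM(amount) AS total
      FROM orders
      GROUP BY customer_id) AS t
WHERE t.total > 1000;

-- Procedure: a named database object that accepts parameters and can be reused.
CREATE PROCEDURE dbo.GetCustomerTotal
    @customer_id int
AS
    SELECT SUM(amount) AS total
    FROM orders
    WHERE customer_id = @customer_id;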
Does a cfquery become a prepared statement as long as there's at least one cfqueryparam? Or are there other conditions?
What happens when the ORDER BY clause or FROM clause is dynamic? Would every unique combination become a prepared statement?
And what happens when we do a cfloop with INSERT, with every value cfqueryparam'ed, and invoke the cfquery with a different number of iterations?
Any potential problems with too many prepared statements?
How does the DB handle prepared statements? Will they be converted into something similar to a stored procedure?
Under what circumstances should we not use prepared statements?
Thank you!
I can answer some parts of your question:
a query will become a preparedStatement as long as there is one <cfqueryparam>. I have in the past added a
where 1 = <cfqueryparam value="1" cfsqltype="cf_sql_integer"> to queries which didn't have any dynamic parameters, in order to get them run as preparedStatements.
Most DBs handle preparedStatements similarly to stored procedures, just held temporarily rather than long-term; however, the details are likely to be DB-specific.
Assuming you are using the drivers supplied with ColdFusion, if you turn on the 'Log Activity' checkbox in the advanced panel of the DataSource setup, you'll get very detailed information about how CF is interacting with the DB and when it is creating a new preparedStatement versus re-using one. I'd recommend trying this out for yourself, as so many factors are involved (DB setup, driver, CF version etc). If you do use the DB logging, restart CF before running your test code, so you can see it creating the prepared statements; otherwise you'll just see it re-using statements by ID, without seeing what those statements are.
In addition, if you are asking about execution plans, then there is more involved than just the number of PreparedStatements generated. It is a huge topic and very database-dependent. I do not have a DBA's grasp on it, but I can answer a few of the questions about MS SQL.
What happens when the ORDER BY clause or FROM clause is dynamic? Would
every unique combination become a prepared statement?
The base SQL is different, so you will end up with separate execution plans for each unique ORDER BY clause.
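For example (table, column and parameter names assumed), these two calls differ only in the ORDER BY, so the plan cache ends up with two separate entries even though the parameter is identical:

EXEC sp_executesql N'SELECT id, name FROM members WHERE status = @status ORDER BY name',
                   N'@status int', @status = 1;
EXEC sp_executesql N'SELECT id, name FROM members WHERE status = @status ORDER BY id',
                   N'@status int', @status = 1;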
And what happens when we do a cfloop with INSERT, with every value
cfqueryparam'ed, and invoke the cfquery with a different number of
iterations?
MS SQL should reuse the same plan for all iterations because only the parameters change.
The sys.dm_exec_cached_plans view is very useful for seeing what plans are cached and how often they are reused.
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text( p.plan_handle) t
ORDER BY p.usecounts DESC
To clear the cache first, use DBCC FLUSHPROCINDB. Obviously do not use it on a production server.
DECLARE @ID int
SET @ID = DB_ID(N'YourTestDatabaseName')
DBCC FLUSHPROCINDB( @ID )
The problem
We need to defend against a 'WAITFOR DELAY' SQL injection attack in our Java application.
Background
[This is long. Skip to 'Solution?' section below if you're in a rush ]
Our application mostly uses prepared statements and callable statements (stored procedures) in accessing the database.
In a few places we dynamically build and execute queries for selection. In this paradigm we use a criteria object to build the query depending on the user-input criteria. For example, if the user specified values for first_name and last_name, the resulting query always looks something like this:
SELECT first_name,last_name FROM MEMBER WHERE first_name ='joe' AND last_name='frazier'
(In this example the user would have specified "joe" and "frazier" as his/her input values. If the user had more or fewer criteria we would have longer or shorter queries. We have found that this approach is easier than using prepared statements and quicker/more performant than stored procedures.)
The attack
A vulnerability audit reported a SQL injection failure. The attacker injected the value frazier' WAITFOR DELAY '00:00:20 for the last_name parameter, resulting in this SQL:
SELECT first_name,last_name FROM MEMBER WHERE first_name ='joe' AND last_name='frazier' WAITFOR DELAY '00:00:20'
The result: the query executes successfully, but takes 20 seconds to execute. An attacker could tie up all your database connections in the db pool and effectively shut down your site.
Some observations about this 'WAITFOR DELAY' attack
I had thought that because we used Statement.executeQuery(String) we would be safe from SQL injection. executeQuery(String) will not execute DML or DDL (deletes or drops). And executeQuery(String) chokes on semi-colons, so the 'Bobby Tables' paradigm will fail (i.e. the user enters 'frazier; DROP TABLE member' for a parameter; see http://xkcd.com/327/).
The 'WAITFOR' attack differs in one important respect: WAITFOR modifies the existing 'SELECT' command, and is not a separate command.
The attack only works on the 'last parameter' in the resulting query, i.e. 'WAITFOR' must occur at the very end of the SQL statement.
Solution, Cheap Hack, or Both?
The most obvious solution entails simply tacking "AND 1=1" onto the where clause.
The resulting sql fails immediately and foils the attacker:
SELECT first_name,last_name FROM MEMBER WHERE first_name ='joe' AND last_name='frazier' WAITFOR DELAY '00:00:20' AND 1=1
The Questions
Is this a viable solution for the WAITFOR attack?
Does it defend against other similar vulnerabilities?
I think the best option would entail using prepared statements. More work, but less vulnerable.
The correct way to handle SQL injection is to use parameterized queries. Everything else is just pissing in the wind. It might work one time, even twice, but eventually you'll get hit by that warm feeling that says "you screwed up, badly!"
Whatever you do, except parameterized queries, is going to be sub-optimal, and it will be up to you to ensure your solution doesn't have other holes that you need to patch.
Parameterized queries, on the other hand, work out of the box and prevent all of these attacks.
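As a sketch of why (the parameter names and lengths are assumptions, and this is shown server-side via sp_executesql; a JDBC PreparedStatement with setString() accomplishes the same thing): once the values travel as parameters, the attack text is just a string that matches no row, not executable SQL:

EXEC sp_executesql
    N'SELECT first_name, last_name FROM MEMBER WHERE first_name = @p0 AND last_name = @p1',
    N'@p0 varchar(100), @p1 varchar(100)',
    @p0 = 'joe',
    @p1 = 'frazier'' WAITFOR DELAY ''00:00:20';  -- harmless literal, no delay occurs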
SQL injection is SQL injection - there's nothing special about a WAITFOR DELAY.
There is absolutely no excuse for not using prepared statements for such a simple query in this day and age.
(Edit: Okay, not "absolutely" - but there's almost never an excuse)
I think you suggested the solution yourself: Parameterized Queries.
How did you find that your dynamically built query is quicker than using a stored procedure? In general it is often the opposite.
To answer all your questions:
Is this a viable solution for the WAITFOR attack?
No. Just add -- to the attack string and it will ignore your fix.
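For example (the injected value is illustrative), the attacker just appends a comment marker, and everything you tack onto the end of the statement never executes:

-- Injected last_name value: frazier' WAITFOR DELAY '00:00:20'--
SELECT first_name,last_name FROM MEMBER WHERE first_name ='joe' AND last_name='frazier' WAITFOR DELAY '00:00:20'--' AND 1=1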
Does it defend against other similar vulnerabilities?
No. See above.
I think the best option would entail using prepared statements. More work, but less vulnerable.
Yes. You don't fix SQL injection yourself. You use what's already existing and you use it right, that is, by parametrizing any dynamic part of your query.
Another, lesser solution is to escape any string that is going to get inserted into your query; however, you will forget one some day, and it only takes one to get attacked.
Everyone else has nailed this (parameterize!) but just to touch on a couple of points here:
Is this a viable solution for the WAITFOR attack?
Does it defend against other similar vulnerabilities?
No, it doesn't. The WAITFOR trick is most likely just being used for 'sniffing' for the vulnerability; once they've found the vulnerable page, there's a lot more they can do without DDL or (the non-SELECT parts of) DML. For example, think about if they passed the following as last_name
' UNION ALL SELECT username, password FROM adminusers WHERE 'A'='A
Even after you add the AND 1 = 1, you're still hosed. In most databases, there's a lot of malicious things you can do with just SELECT access...
How about following xkcd and sanitizing the input? You could check for reserved words in general and for WAITFOR in particular.