A simple problem, but perhaps no simple solution; at least I can't think of one off the top of my head, but then I'm not the best at finding the best solutions.
I have a stored proc which (in a basic form) does a select on a table. Envision this:
SELECT * FROM myTable
Okay, simple enough, except the table name it needs to search on isn't known, so we ended up with something pretty similar to this:
-- Just to give some context to the variables I'll be using
DECLARE @metaInfoID AS INT
SET @metaInfoID = 1
DECLARE @metaInfoTable AS VARCHAR(200)
SELECT @metaInfoTable = MetaInfoTableName FROM MetaInfos WHERE MetaInfoID = @metaInfoID
DECLARE @sql AS VARCHAR(200)
SET @sql = 'SELECT * FROM ' + @metaInfoTable
EXEC (@sql)
So, I recognize this is ultimately bad, and can see immediately where I can perform a SQL injection attack. So the question is: is there a way to achieve the same results without constructing the dynamic SQL? Or am I going to have to be super, super careful in my client code?
You have to use dynamic SQL if you don't know the table name up front. But yes, you should validate the value before attempting to use it in an SQL statement.
e.g.
IF EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @metaInfoTable)
BEGIN
    -- Execute the SELECT * FROM @metaInfoTable dynamic sql
END
This will make sure a table with that name exists. There is an overhead to doing this, obviously, as you're querying INFORMATION_SCHEMA. You could instead validate that @metaInfoTable contains only certain characters:
-- only run dynamic sql if the table name contains only 0-9, a-z, A-Z, underscores or spaces
-- (enclose the table name in square brackets, in case it does contain spaces)
IF @metaInfoTable NOT LIKE '%[^0-9a-zA-Z_ ]%'
BEGIN
    -- Execute the SELECT * FROM @metaInfoTable dynamic sql
END
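Putting the two ideas together, a minimal sketch might validate against INFORMATION_SCHEMA and then wrap the name in QUOTENAME before executing (this assumes the table lives in the default schema; adapt the check if you use several schemas):
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = @metaInfoTable)
BEGIN
    DECLARE @safeSql nvarchar(max)
    -- QUOTENAME wraps the validated name in [..], neutralising stray quotes/brackets
    SET @safeSql = N'SELECT * FROM ' + QUOTENAME(@metaInfoTable)
    EXEC sp_executesql @safeSql
END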
Given the constraints described, I'd suggest two ways, with slight variations in performance and architecture.
Choose At the Client & Re-Architect
I'd suggest that you consider a small re-architecture, as much as possible forcing the caller/client to decide which table to get its data from. It's a code smell to hold table names in another table.
I am making an assumption here that @MetaInfoID is being passed from a web app, data access block, etc. That's where the logic of which table to perform the SELECT on should be housed. I'd say that the client should know which stored procedure (GetCustomers or GetProducts) to call based on that @MetaInfoID. Create new methods in your DAL like GetCustomersMetaInfo(), GetProductsMetaInfo() and GetInvoicesMetaInfo() which call into their appropriate sprocs (with no dynamic SQL needed, and no maintenance of a meta table in the DB).
Perhaps try to re-architect the system a little bit.
In SQL Server
If you absolutely have to do this lookup in the DB, and depending on the number of tables that you have, you could perform a handful of IF statements (as many as needed) like:
IF @MetaInfoID = 1
    SELECT * FROM Customers
IF @MetaInfoID = 2
    SELECT * FROM Products
-- etc
That would probably become a nightmare to maintain, though.
Perhaps you could instead write a stored procedure for each MetaInfo. That way you gain the advantage of pre-compilation, and no SQL injection can occur (imagine if someone sabotaged the MetaInfoTableName column):
IF @MetaInfoID = 1
    EXEC GetAllCustomers
IF @MetaInfoID = 2
    EXEC GetAllProducts
Related
My desired end result is to simply be able to SELECT from a stored procedure. I've searched the Internet, and unfortunately the Internet said this can't be done: you first need to create a temp table to store the data. My problem is that you must define the columns in the temp table before executing the stored procedure, which is just time consuming. I simply want to take the data from the stored procedure and stick it into a temp table.
What is the FASTEST route to achieve this from a coding perspective? To put it simply, it's time consuming to first look up the returned fields from a stored procedure and then write them all out.
Is there some sort of tool that can just build the CREATE TABLE statement based on the stored procedure?
Most of the stored procedures I'm dealing with have 50+ fields, and I don't look forward to defining each of them manually.
Here is a good SO post that got me this far, but it's not what I was hoping for; it still takes too much time. What are experienced SQL Server guys doing? I've only just recently made the jump from Oracle to SQL Server, and from what I can tell, temp tables are a big deal in SQL Server.
You have several options to ease your task, although none are fully automatic. Be aware that they won't work if there's dynamic SQL in the procedure's code. You might be able to format the result from these functions to increase the automation, allowing you to copy and paste easily:
SELECT * FROM sys.dm_exec_describe_first_result_set_for_object(OBJECT_ID('report.MyStoredProcedureWithAnyColumns'), 0);
SELECT * FROM sys.dm_exec_describe_first_result_set(N'EXEC report.MyStoredProcedureWithAnyColumns', null, 0);
EXEC sp_describe_first_result_set @tsql = N'EXEC report.MyStoredProcedureWithAnyColumns';
GO
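For instance, a rough sketch along those lines (using the same procedure name as above) that emits a pasteable column list for a CREATE TABLE statement:
SELECT STUFF((SELECT ', ' + QUOTENAME(name) + ' ' + system_type_name
              FROM sys.dm_exec_describe_first_result_set(
                   N'EXEC report.MyStoredProcedureWithAnyColumns', NULL, 0)
              ORDER BY column_ordinal
              FOR XML PATH('')), 1, 2, '') AS create_table_columns;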
If you don't mind a ##temp table and some dynamic SQL:
NOTE: As Luis Cazares correctly pointed out, the ##temp table runs the risk of collision due to concurrency.
Example
Declare @SQL varchar(max) = 'Exec [dbo].[prc-App-Lottery-Search] ''8117'''
Declare @temp varchar(500) = '##myTempTable'
Set @SQL = '
If Object_ID(''tempdb..'+@temp+''') Is Not NULL Drop Table '+@temp+';
Create Table '+@temp+' ('+stuff((Select concat(',',quotename(Name),' ',system_type_name)
                                 From sys.dm_exec_describe_first_result_set(@SQL,null,null) A
                                 Order By column_ordinal
                                 For XML Path ('')),1,1,'')+')
Insert '+@temp+' '+@SQL+'
'
Exec(@SQL)
Select * from ##myTempTable
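If the global temp table is a concern, one possible alternative (a sketch, assuming the 'Ad Hoc Distributed Queries' server option is enabled and a local trusted connection is acceptable) is OPENROWSET, which lets SELECT ... INTO create a local temp table from the proc's output:
SELECT *
INTO   #myTempTable  -- local temp table, no collision risk
FROM   OPENROWSET('SQLNCLI',
                  'Server=(local);Trusted_Connection=yes;',
                  'EXEC [dbo].[prc-App-Lottery-Search] ''8117''');

SELECT * FROM #myTempTable;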
The application I'm currently working with has different schema names for its tables; for example, Table1 can exist multiple times, say as A.Table1 and B.Table1. All my stored procedures are stored under dbo. I'm writing the stored procedure below using dynamic SQL. I'm currently using SQL Server 2008 R2, and it will soon be migrated to SQL Server 2012.
create procedure dbo.usp_GetDataFromTable1
    @schemaname varchar(100),
    @userid bigint
as
begin
    declare @sql nvarchar(4000)
    set @sql = 'select a.EmailID from ' + @schemaname + '.Table1 a where a.ID = @user_id';
    exec sp_executesql @sql, N'@user_id bigint', @user_id = @userid
end
Now my questions are,
1. Does this type of approach affect the performance of my stored procedure?
2. If performance is affected, then how should I write procedures for this kind of scenario?
The best way around this would be a redesign, if at all possible.
You can even implement this retrospectively by adding a new column to replace the schema (for example, Profile), then merging all the tables from each schema into one table in a single schema (e.g. dbo).
Then your procedure would appear as follows:
create procedure dbo.usp_GetDataFromTable1
    @profile int,
    @userid bigint
as
begin
    select a.EmailID from dbo.Table1 a
    where a.ID = @userid
    and a.Profile = @profile
end
I have used an int for the profile column, but if you use a varchar you could even keep your schema name for the profile value, if that helps to make things clearer.
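As a rough sketch of the merge itself (the column list is assumed from the example above, and the Profile values are arbitrary), a one-off migration could look like:
INSERT INTO dbo.Table1 (ID, EmailID, Profile)
SELECT ID, EmailID, 1 FROM A.Table1
UNION ALL
SELECT ID, EmailID, 2 FROM B.Table1;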
I would look at a provisioning approach, where you dynamically create the tables and stored procedures as part of some up-front process. I'm not 100% sure of your scenario, but perhaps this could be when you add a new user. Then you can call these SPs by convention in the application.
For example, new user creation calls an SP which creates c.Table and a c.GetDetails SP.
Then in the app you can call c.GetDetails based on "c" being a property of the user definition.
This gets you around any security concerns from using dynamic SQL. It's still dynamic, but is built once up front.
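A minimal provisioning sketch (the names here are hypothetical; the dynamic DDL runs once per tenant, so the generated proc itself contains no dynamic SQL):
CREATE PROCEDURE dbo.usp_ProvisionTenant
    @schemaname sysname
AS
BEGIN
    DECLARE @sql nvarchar(max);

    -- Create the tenant schema if it does not already exist
    IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = @schemaname)
    BEGIN
        SET @sql = N'CREATE SCHEMA ' + QUOTENAME(@schemaname);
        EXEC sp_executesql @sql;
    END

    -- Generate the tenant-specific proc; the schema is baked in at creation time
    SET @sql = N'CREATE PROCEDURE ' + QUOTENAME(@schemaname) + N'.GetDetails
        @userid bigint
    AS
        SELECT a.EmailID FROM ' + QUOTENAME(@schemaname) + N'.Table1 a
        WHERE a.ID = @userid;';
    EXEC sp_executesql @sql;
END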
A dynamic schema with the same table structure everywhere is quite unusual, but you can still obtain what you want using something like this:
declare @sql nvarchar(4000)
declare @schemaName VARCHAR(20) = 'schema'
declare @tableName VARCHAR(20) = 'Table'

-- this will fail, as the whole string will be 'quoted' within [..]
-- declare @tableName VARCHAR(20) = 'Category; DELETE FROM TABLE x;'

set @sql = 'select * from ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName)
PRINT @sql

-- @user_id is not used here, but it can be if the query needs it
exec sp_executesql @sql, N'@user_id bigint', @user_id = 0
So, QUOTENAME should keep you on the safe side regarding SQL injection.
1. Performance - dynamic SQL cannot benefit from some performance improvements (I think procedure-associated statistics or something similar), so there is a performance risk.
However, for simple queries that run on a rather small amount of data (tens of millions of rows at most) that is not heavily changed (inserts and deletes), I don't think you will have noticeable problems.
2. Alternative - bukko has suggested a solution. Since all the tables have the same structure, they can be merged. If the merged table becomes huge, good indexing and partitioning should be able to keep query execution times reasonable.
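For example, a covering index along these lines (a sketch; the column names come from the earlier example) would let the merged table serve that lookup efficiently:
CREATE NONCLUSTERED INDEX IX_Table1_Profile_ID
    ON dbo.Table1 (Profile, ID)
    INCLUDE (EmailID);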
There is a workaround for this if you know which schemas you are going to be using. You stated that a schema name is created on signup; we use this approach on login. I have a view which I add or remove unions from on session startup/dispose. Example below.
CREATE VIEW [engine].[vw_Preferences]
AS
SELECT TOP (0) CAST (NULL AS NVARCHAR (255)) AS SessionID,
CAST (NULL AS UNIQUEIDENTIFIER) AS [PreferenceGUID],
CAST (NULL AS NVARCHAR (MAX)) AS [Value]
UNION ALL SELECT 'ZZZ_7756404F411B46138371B45FB3EA6ADB', * FROM ZZZ_7756404F411B46138371B45FB3EA6ADB.Preferences
UNION ALL SELECT 'ZZZ_CE67D221C4634DC39664975494DB53B2', * FROM ZZZ_CE67D221C4634DC39664975494DB53B2.Preferences
UNION ALL SELECT 'ZZZ_5D6FB09228D941AC9ECD6C7AC47F6779', * FROM ZZZ_5D6FB09228D941AC9ECD6C7AC47F6779.Preferences
UNION ALL SELECT 'ZZZ_5F76B619894243EB919B87A1E4408D0C', * FROM ZZZ_5F76B619894243EB919B87A1E4408D0C.Preferences
UNION ALL SELECT 'ZZZ_A7C5ED1CFBC843E9AD72281702FCC2B4', * FROM ZZZ_A7C5ED1CFBC843E9AD72281702FCC2B4.Preferences
The first SELECT TOP (0) row is a fallback, so I always have a default definition and a static table definition. You can select from the view and filter by a session ID with
SELECT PreferenceGUID, Value
FROM engine.vw_Preferences
WHERE SessionID = 'ZZZ_5D6FB09228D941AC9ECD6C7AC47F6779';
The interesting part here, though, is how the execution plan is generated when you have static values inside a view: the unions that cannot produce results are not evaluated, leaving a basic execution plan without any joins or unions.
You can test this, and it is just as efficient as reading directly from the table (to within a margin of error so minor nobody would care). It is even possible to replace the write-back processes by using INSTEAD OF triggers and then building dynamic SQL in the background. The dynamic SQL is less efficient on writes, but it means you can update any table via the view, which is usually only possible with a single-table view.
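As a sketch of that write-back idea (the column names are taken from the view above; the routing logic is an assumption), an INSTEAD OF trigger could look like this:
CREATE TRIGGER [engine].[tr_vw_Preferences_Update]
ON [engine].[vw_Preferences]
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy "inserted" to a temp table so the dynamic SQL (a child scope) can see it
    SELECT SessionID, PreferenceGUID, [Value] INTO #ins FROM inserted;

    DECLARE @schema sysname, @sql nvarchar(max);
    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT DISTINCT SessionID FROM #ins;
    OPEN c;
    FETCH NEXT FROM c INTO @schema;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- One dynamic UPDATE per tenant schema present in the batch
        SET @sql = N'UPDATE t SET t.[Value] = i.[Value]
                     FROM ' + QUOTENAME(@schema) + N'.Preferences t
                     JOIN #ins i ON i.PreferenceGUID = t.PreferenceGUID
                     WHERE i.SessionID = ' + QUOTENAME(@schema, '''') + N';';
        EXEC sp_executesql @sql;
        FETCH NEXT FROM c INTO @schema;
    END
    CLOSE c;
    DEALLOCATE c;
END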
Dynamic SQL usually affects both performance and security, most of the time for the worse. However, since you can't parameterize identifiers, this is probably the only way for you, unless you are willing to duplicate your stored procedure for each schema:
create procedure dbo.usp_GetDataFromTable1
    @schemaname varchar(100),
    @userid bigint
as
begin
    if @schemaname = 'a'
    begin
        select EmailID from a.Table1 where ID = @userid
    end
    else if @schemaname = 'b'
    begin
        select EmailID from b.Table1 where ID = @userid
    end
end
The only reason I can think of for doing this is satisfying multiple tenants. You're close, but the approach you are taking is wrong.
There are 3 solutions for multi-tenancy that I'm aware of: database per tenant; single database, schema per tenant; or single database, single schema (aka tenant by row).
Two of these have already been mentioned by other users here. The one that hasn't really been detailed is schema per tenant, which is what it looks like you fall under. For this approach you need to change the way you see the database: the database at this point is just a container for schemas. Each schema can have its own design, stored procs, triggers, queues, functions, etc. The main goal is data isolation; you don't want tenant A seeing tenant B's stuff. The advantage of the schema-per-tenant approach is that you can be more flexible with tenant-specific database changes. It also allows you to scale more easily than a database-per-tenant approach.
Answer: instead of writing dynamic SQL to take the schema into account using the dbo user, create the same stored proc in each schema (e.g. create procedure schema_name.stored_proc_name). In order to run the stored proc for a given schema, you'll need to impersonate a user that is tied to that schema. It would look something like this:
execute as user = 'tenantA'
exec sp_testing
revert -- revert will take us back to the original user, most likely dbo in your case
Data collation across all tenants is a little harder. The only solution I'm aware of is to run as the dbo user and UNION ALL the results across all schemas separately, which is kind of tedious if you have a ton of schemas.
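One way to make that less tedious (a sketch; the table name and the tenant-schema naming convention here are assumptions) is to generate the UNION ALL from sys.schemas:
DECLARE @sql nvarchar(max);

SELECT @sql = STUFF((SELECT ' UNION ALL SELECT ''' + s.name + ''' AS TenantSchema, * FROM '
                            + QUOTENAME(s.name) + '.MyTable'
                     FROM sys.schemas s
                     WHERE s.name LIKE 'tenant%'  -- assumed naming convention
                     FOR XML PATH('')), 1, 11, '');

EXEC sp_executesql @sql;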
Is there a way to know if the data in a SQL Server 2008 R2 table has changed since the last time you used it? I would like to know of any type of change -- whether a new record has been inserted or an existing one has been modified or deleted.
I am not interested in what the particular change might have been. I am only interested in a boolean value that indicates whether or not the table data has been changed.
Finally, I want a simple solution that does not involve writing a trigger for each CRUD operation and then having that trigger update some other log table.
I have a C# program that is meant to insert a large amount of initial data into some database tables. This is a one-off operation that is supposed to happen only once, or rarely if ever again, in the life of the application. During development and testing, though, we use this program a lot.
Currently, with the roughly 10 tables it inserts data into, each having about 21,000 rows, the program takes about 45 seconds to run. This isn't really a huge problem, as it's a one-off operation that is in any case going to be done internally before shipping the product to the customer.
Still, I would like to minimize this time. So, I want to not insert data into a table if there has been no change in the table data since my program last used it.
My colleague told me that I could use the CHECKSUM_AGG function in T-SQL. My questions are:
1) If I compute CHECKSUM_AGG(CAST(NumericPrimaryKeyIdColumn AS int)), the checksum only changes if a new row has been added or an existing one deleted, right? If someone has only modified the values of other columns of an existing row, that will have no impact on the checksum aggregate of the ID column, right? Or will it?
2) Is there another way I can solve the problem of knowing whether table data has changed since the last time my program used it?
This is very close to what I already had in mind and what @user3003007 mentioned.
One way I am thinking of is to take a CHECKSUM(*) or CHECKSUM(Columns, I, Am, Interested, In) for each such table and then do an aggregate checksum over the per-row checksums, like so:
SELECT CHECKSUM_AGG(CAST(CHECKSUM(*) as int)) FROM TableName;
This is still not a reliable method, as CHECKSUM does not work on some data types: if my column is of type text or ntext, the CHECKSUM will fail.
Fortunately for me, I do not have such data types in the list of columns I am interested in, so this works for me.
Have you investigated Change Data Capture?
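If you haven't, enabling it is fairly lightweight to try (a sketch; CDC requires Enterprise or Developer edition on 2008 R2, and the table name here is hypothetical):
-- Run once per database, then once per table to track
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'MyTableName',
    @role_name     = NULL;  -- NULL means no gating role is required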
You can use a combination of hashing and CHECKSUM_AGG. The below will work as long as the concatenated string values do not overflow the HASHBYTES function. It works by converting all of the columns to strings, concatenating them, hashing the concatenated string, turning the hash into an integer, placing all of those values into a temp table, and then running CHECKSUM_AGG over the temp table. It could easily be adapted to iterate across all real tables.
Edit: combining MD5 and CHECKSUM_AGG looks like it works, at least for somewhat narrow tables:
declare @tablename sysname
set @tablename = 'MyTableName'
declare @sql varchar(max)
set @sql = 'select convert(int,HASHBYTES(''MD5'','''''
declare c cursor for
    select column_name
    from INFORMATION_SCHEMA.COLUMNS
    where table_name = @tablename
open c
declare @cname sysname
fetch next from c into @cname
while @@FETCH_STATUS = 0
begin
    set @sql = @sql + '+ coalesce(convert(varchar,' + @cname + '),'''')'
    fetch next from c into @cname
end
close c
deallocate c
set @sql = @sql + ')) as CheckSumVal
into ##myresults from ' + @tablename
print @sql
exec(@sql)
select CHECKSUM_AGG(CheckSumVal) from ##myresults
drop table ##myresults
How do you know that the change was made by you or that the change is relevant to your needs? If you're not going to do it properly (delete & re-insert or merge) then the whole thing sounds futile to me.
In any case, if you spend only an hour researching, implementing and testing your change, you'd have to run it 80 times (and sit and watch it) before you've broken even on your time. So why bother?
Two simpler options:
1. Add an extra column like last_updated with a default of getdate().
2. Add an extra column of type int, declare an enum in your code (enabling the Flags attribute so you can perform bitwise operations), and apply the checksum to this column. No data type problem.
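For the first option, a minimal sketch (the table name is hypothetical; note the default only covers inserts, so updates would need a trigger or application code to refresh the value):
ALTER TABLE dbo.MyTableName
    ADD last_updated datetime NOT NULL
        CONSTRAINT DF_MyTableName_last_updated DEFAULT (GETDATE());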
An easy way to check this is to use the system DMVs to check the index usage stats. The first index on the table (index_id 0 for a heap, 1 for a clustered index) represents the table itself, and so can be used for checking when the last update occurred:
SELECT DB_NAME(database_id) AS [database_name] ,
       OBJECT_NAME([object_id], [database_id]) AS [table_name] ,
       [user_seeks] ,
       [user_scans] ,
       [user_lookups] ,
       [user_updates] ,
       [last_user_seek] ,
       [last_user_scan] ,
       [last_user_lookup] ,
       [last_user_update]
FROM sys.dm_db_index_usage_stats
WHERE [index_id] IN (0, 1)
From this, you can see the last time that the table was updated as well as how many updates there have been (I have left in the seeks and scans etc just in case you're interested).
It's worth noting that this data does not persist after a restart, but it's pretty simple to load it into a permanent table every now and then to make it permanent.
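For example, a scheduled job could snapshot the DMV into a permanent table (a sketch; the history table name is hypothetical):
IF OBJECT_ID('dbo.IndexUsageHistory') IS NULL
    SELECT GETDATE() AS capture_time, *
    INTO dbo.IndexUsageHistory
    FROM sys.dm_db_index_usage_stats
ELSE
    INSERT INTO dbo.IndexUsageHistory
    SELECT GETDATE(), *
    FROM sys.dm_db_index_usage_stats;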
I am trying to create a generic update procedure. The point of this procedure is that we want to be able to track everything that happens in a table. If a record is updated, we need to know who changed it, what it was originally, what it is after the change, and when the change occurred. We only do this on our most important tables, where accountability is a must.
Right now, we do this through a combination of web server programming and SQL Server commands.
I need to take what we currently have, and make a SQL only version.
So, here are the requirements of what I need:
The original sproc is called UpdateWithHistory. Right now it takes 4 parameters, all varchar (or nvarchar, it doesn't matter): the table name, the primary key field, the primary key value, and a comma-delimited list of fields and values in the format field='value',field1='value1'... etc.
In the background, we have a mapping table that we use to map the string table names to actual tables.
In the stored procedure, I have tried various combinations of OPENROWSET, exec(), select into, xml, and other methods. None seem to work.
So basically, I have to be able to dynamically generate a simple select statement (no joins or other complicated select stuff) from the 4 supplied parameters, then store the results of that query in a table. Since it is dynamic, I don't know the number of fields being queried, or what data types they will be.
I tried SELECT INTO, since that will automatically create a table with the appropriate fields and data types, but it doesn't work in conjunction with the EXEC command. I have also tried
exec sp_executesql @SQL, N'@output xml output', @resultXML output
where @resultXML is of the XML data type and @SQL is the SQL command. @resultXML always ends up as null, no matter what I do. I also tried the XML route because I know that "FOR XML PATH" always returns one column, but I can't use that in an INSERT INTO statement...
That statement output will be used to determine the original values before the update.
I figure once I get past this hurdle the rest will be a piece of cake. Anyone got any ideas?
So here is code for something that I finally got to work, although I don't want to use global tables, so I would gladly accept a different answer...
DECLARE @curRecordString varchar(max) = 'SELECT * into ##TEMP_TABLE FROM SOMEDB.dbo.' + @tbl + ' WHERE ' + @prikey + ' = ''' + @prival + ''' '
exec(@curRecordString)
Basically, as stated before, I need to dynamically build a SQL query, then store the result of running the query so that I can access it later. I would prefer to store it as the XML data type, since I will later be using XQuery to parse and compare nodes. In the code above, I am using a global temp table (not ideal, I know) to store the result of the query so that the rest of my procedure can access the data.
Like I said, I don't like this approach, so hopefully someone else can come up with something better that will allow me to dynamically build a SQL query, run it, and store the results so that I can access them later in the stored procedure.
This is most definitely a hack, but...
DECLARE @s VARCHAR(MAX)
SET @s = 'SELECT (SELECT 1 as splat FOR XML PATH) a'
CREATE TABLE #save (x XML)
INSERT INTO #save ( x )
EXEC (@s)
SELECT * FROM #save s
DROP TABLE #save
I want to write a stored proc which takes a parameter that will be the table name.
E.g:
@tablename << parameter
SELECT * FROM @tablename
How is this possible?
I wrote this:
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GetAllInterviewQuestions]
    @Alias varchar = null
AS
BEGIN
    Exec('Select * FROM Table as ' @Alias)
END
But it says incorrect syntax near @Alias.
Well, firstly you've omitted the '+' from your string. This way of doing things is far from ideal, but you can do:
DECLARE @SQL nvarchar(max)
SELECT @SQL = 'SELECT * FROM ' + QuoteName(@Alias)
Exec(@SQL)
I'd strongly suggest rethinking how you do this, however. Generating dynamic SQL often leads to SQL injection vulnerabilities, as well as making it harder for SQL Server (and other DBs) to work out the best way to process your query. If you have a stored procedure that can return any table, you're really getting virtually no benefit from it being a stored procedure in the first place, as it won't be able to do much in the way of optimization, and you're largely undermining the security benefits too.
You'll have to do it like this:
exec('select * from ' + @tablename + ' where...')
But make sure you fully understand the risks, like SQL injection attacks. In general, you shouldn't ever have to use something like this if the DB is well designed.
Don't you mean
Exec('SELECT * FROM ' + @tableName)
Also, the error you get is because you've forgotten a + before @Alias.
Often, having to parameterize the table name indicates you should re-think your database schema. If you are pulling interview questions from many different tables, it is probably better to create one table with a column distinguishing between the questions in whatever way the different tables would have.
Most implementations of SQL do not allow you to specify structural elements - table names, column names, order by columns, etc. - via parameters; you have to use dynamic SQL to parameterize those aspects of a query.
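To illustrate the distinction, values bind fine as parameters while identifiers do not (a sketch; the table and column names here are hypothetical):
-- This works: the value is a parameter
DECLARE @sql nvarchar(max) =
    N'SELECT * FROM dbo.InterviewQuestions WHERE Category = @cat';
EXEC sp_executesql @sql, N'@cat varchar(50)', @cat = 'SQL';

-- This does NOT work: identifiers cannot be parameterized
-- EXEC sp_executesql N'SELECT * FROM @table', N'@table sysname', @table = 'InterviewQuestions';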
However, looking at the SQL, you have:
Exec('SELECT * FROM Table AS ' @Alias)
Surely this means the code will only ever select from a table called 'Table', and you would need to concatenate @Alias with it -- and in many SQL dialects, concatenation is indicated by '||':
Exec('SELECT * FROM Table AS ' || @Alias)
This still probably doesn't do what you want, but it might not generate a syntax error when the procedure is created (though it would probably generate an error at runtime).