OUTPUT clause for Stored Procedure vs Table-Valued Function - sql-server

I'm studying for the MCTS 70-433 "Database Design" cert, and in the text that I'm studying, one of the self-tests has this question.
You have a stored procedure named Get_NewProducts. You wish to insert the results of this stored procedure into the Production.Product table and output the INSERTED.* values using the OUTPUT clause. What is the best way to do this?
There are four possible answers. The first three choices are all variations of an "INSERT...OUTPUT...EXECUTE Get_NewProducts" statement. The fourth choice, D, simply says "Rewrite the stored procedure as a table-valued function."
D is the correct answer. I don't quite understand why, and there is nothing in the text that explains it. Anyone have any insights?

Well, from MSDN:
"The OUTPUT clause is not supported in DML statements that reference local partitioned views, distributed partitioned views, or remote tables, or INSERT statements that contain an execute_statement."

My knee-jerk reaction to this (I hit it again a few days ago) is:
Stored procedures can be and often are nested. Procedure A calls B, which calls C, and so on.
The code called by an INSERT...EXECUTE... statement cannot itself contain or reference an INSERT...EXECUTE... statement. If you put one in, you cannot then "embed" this procedure in a later INSERT...EXECUTE...
This may seem trivial, and it generally is, at least until you hit it during a refactoring project. Once bitten, twice shy. (And it's bitten me a number of times.)
There are a number of style and appearance reasons as well, but they're kind of superficial. There is probably a serious technical reason too, perhaps having to do with recompiles or query execution plans; if so, hopefully someone else will post it.

Just one reason their "right answer" is not right: TVFs have issues with error checking and reporting.
It's a really odd question/answer because D doesn't even seem to be a possibility given the question.

I don't know a 'correct' answer, but I guess the author's thinking is that 70-433 Database Development is a development- and design-oriented exam, as opposed to, say, one of the 'data access' exams like 70-442. During the design phase you should be able to spot faults in the existing system and propose better solutions. The author considers that a stored procedure that needs to have its output inserted into a table is better off rewritten as a TVF. You'll find both pros and cons as to whether a TVF is better than a proc (INSERT...EXEC nesting as a pro, bad error handling as a con, just to start with).
I took some of these exams myself and I found that the exam preparation material and the exams themselves are not always the absolute ultimate reference on their subject. By and large they are correct and good value, but they have problems here and there, and I found at least some questionable recommendations and even plainly wrong ones. And on the topics that I found to be wrong I actually am the ultimate reference on the subject: they were covering code that I wrote, on features I designed...
My advice is to get a feel for what the 'expected' answer is and be prepared for it during the actual exam. Given your flair points and the answers of yours I've seen, you are already above the exam level, so just go through the hoops, earn your exam badge and move on.

Related

Stored procedure with multiple selects - interaction with client tool?

Suppose I have a stored procedure as follows:
create procedure p_x
as
begin
    select 'a', 'b', 'c';
    select 'c', 'd', 'e';
    select 'e', 'f', 'g';
end
go
This is of course not the real code, but it illustrates enough to be able to ask my questions.
I'm looking for the best performance and the best practices to deal with it.
How will the client tool (e.g. Informatica Data Quality) calling this procedure react?
Will it receive 3 separate results, just the last query result, or all results at once?
Will each separate query be sent to the client directly (and will the procedure halt until it is completed), or is this done after the procedure has finished?
Is it good practice to work this way? I was looking at exchanging an OUTPUT table-type parameter, but from what I've read elsewhere this doesn't seem to be possible (table types can only be used as input).
Is there a performance impact in working this way? If so, what is the most efficient way to do this (e.g. sending just one result back to the client)?
You would be better served by posting your question to the Informatica forums. They should be able to answer your questions precisely and accurately. But I'll give it a go.
How will the tool react? Don't know, but often tools that support using stored procedures as a datasource will assume and will consume a single (and the first) resultset. Any others will be ignored. Go ask in their forums.
Will it receive 3 ...? Roughly the same question and answer as the first.
Will each separate query ...? Your procedure produces three resultsets. How the client consumes them is, again, something you should ask in their forums. The procedure itself will not "halt" waiting for the client to do anything.
Is it good practice...? Not in my opinion. Nor is posting a complete nonsense procedure a useful tool for discussing the pros/cons of this approach. Can it be a useful thing to do? Likely. But it is not often done IME. In addition, you are dealing with a tool with which you are not familiar. The simpler you keep things, the better off you are in the long run, regardless of your tools.
A procedure is a unit of work and should do one "thing". If it produces multiple resultsets, one can argue that it ceases to do a single thing since, logically, each resultset represents a set of different (even if related) things. And typically one would expect to see some relationship among the resultsets. If there are no relationships, then the resultsets are obviously different things which violates the idea of a procedure. You might want to review the topic of coupling and cohesion. But I think I see a bigger issue - which I'll address with the next item.
Is there a performance impact ...? This can't really be answered. Performance is always, ALWAYS specific to a particular situation (query, schema, etc). Based on that last sentence, I think you have not made the adjustment to thinking in terms of sets - something that is critical to writing efficient sql. Rather, I'll guess that you are thinking in terms of a loop which includes a select statement and each iteration will produce a set of (perhaps 1 but who knows) rows. If you think you have the "option" to produce just one resultset of 3 rows vs. 3 resultsets of 1 row, then you are most likely stuck in RBAR land. Regardless, this can't really be answered. It is also a question for the Informatica people.

Does writing the full path in SELECT statements enhance performance SQL?

Is the performance of queries impacted by writing the full path to each object? And what is the best practice when writing such queries? Assume the script is far longer and more complex than the following.
Example #1:
SELECT Databasename.Tablename.NameofColumn
FROM databasename.tablename
Example #2:
SELECT NameofColumn
FROM tablename
OR using aliases - example #3:
SELECT t.NameofColumn
FROM tablename t
There are a number of considerations when you're writing queries that are going to be released into a production environment, and how and when to use fully qualified names is one of those considerations.
A fully qualified table name has four parts: [Server].[Database].[Schema].[Table]. You missed Schema in your examples above, but it's actually the one that makes the most difference. SQL Server will allow you to have objects with the same name in different schemas; so you could have dbo.myTable and staging.myTable in the same database. SQL Server doesn't care, but your query probably does.
Even if there aren't identically named objects, adding the schema still helps the engine find the object you're querying a little bit faster, so there's your performance boost, albeit a small one, and only in the compile/execution plan step.
Besides performance, though, you need to worry about readability for your own sake when you need to revisit your code, and conventionality for when somebody else needs to look at your code. Conventions vary slightly from shop to shop, but here are a couple of generalities that will at least make your code easier to look at, say, on Stack Overflow.
1. Use table aliases.
This gets almost unreadable after about three column names:
SELECT
SchemaName.Tablename.NameofColumn1,
SchemaName.Tablename.NameofColumn2,
SchemaName.Tablename.NameofColumn3
FROM SchemaName.TableName
This is just easier on the brain:
SELECT
tn.NameofColumn1,
tn.NameofColumn2,
tn.NameofColumn3
FROM SchemaName.TableName as tn
2. Put the alias in front of every column reference, everywhere in your query.
There should never be any ambiguity about which table a particular column is coming from, either for you, when you're trying to troubleshoot it at 3:00 AM, or for anyone else, when you're sipping margaritas on the beach and your buddy's on call for you.
3. Make your aliases meaningful.
Again, it's about keeping things straight in your head later on. Aaron Bertrand wrote the definitive post on it almost ten years ago now.
4. Include the database name in the FROM clause if you want, but...
If you have to restore a database using a different name, your procedures won't run. In my shop, we prefer a USE statement at the top of each proc. Fewer places to change a name if need be.
tl;dr
Your example #3 is pretty close. Just add the table schema to the FROM clause.

SQLite database questions, problems with design (indexing/multiple fields)

I use Stack Overflow a lot, but this is my first question here, so if I'm doing anything wrong just let me know. I'm not a programmer (I just do programming for my own needs) so I'm open to tutorial suggestions etc. I won't be offended if you just give me something to read and find the answers myself.
OK, to the point - I'm trying to write a simple application to track my personal expenses and I have a problem with the database design. I'm using VStudio to create the database (SQLite). I attached a diagram with my design and I have some questions.
My SQLite diagram
I don't know exactly how to design the "Transactions" table. Fields like Date, Payment Type etc. seem easy enough, but the idea was to store information about transactions in this table, so I need to store multiple products there. I've read about it and created a table "Transactions_Products" that will help with that. My problem is: where do I put the quantity of each product in the transaction? I can't think of a place to put it. I tried to find similar databases but couldn't find anything.
Second thing. I've read about indexing a lot, but I still can't grasp the idea. I don't know when to use it. Should I use it only on fields that I will be "querying" a lot?
Last one - is it better for such a small application just for myself to store my account balance in a separate table or should I just calculate it every time?
As I said, I don't need answers like: "do this, do that". If you just give me some good tutorials/articles I think I can find answers on my own, but I couldn't find it. Maybe I'm searching for it wrong.
Thank you in advance for any information.
where do I put quantity of products in the transaction?
Transactions is a bad table name as it's vague and has multiple meanings. Consider "payments", "purchase invoices", etc. See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831 for some existing patterns.
Should I use [indexes] only on fields that I will be "querying" a lot?
There's no free lunch. Indexes take space, and can slow down inserts. Start with indexes on your primary keys (which is the default for SQLite), measure what is slow (looking at query plans) and add indexes if they help and if you have room.
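As a rough illustration of "measure, then add an index", here is a minimal sketch using Python's built-in sqlite3 module. The purchases table, its columns and the index name are made up for the example; EXPLAIN QUERY PLAN is SQLite's own statement for showing how a query will be executed, and the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the INTEGER PRIMARY KEY is indexed automatically.
conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, shop TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases (shop, amount) VALUES (?, ?)",
    [(f"shop{i % 50}", i * 0.5) for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT amount FROM purchases WHERE shop = 'shop7'"

plan_before = plan(query)  # no index on shop yet: a scan of the whole table
conn.execute("CREATE INDEX idx_purchases_shop ON purchases (shop)")
plan_after = plan(query)   # the planner can now search via idx_purchases_shop

print(plan_before)
print(plan_after)
```

If the plan does not change, or the query was fast enough already, the index is probably just costing space and insert time.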
is it better for such a small application just for myself to store my account balance in a separate table or should I just calculate it every time?
For an operational/transactional database like you describe, avoid storing calculated values. SQLite can count numbers quickly :)
Premature optimization is premature. Make it work first with full normalization. If you have performance problems, analyze what is really causing the slow-down and go from there.
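For the quantity question, one common pattern (not taken from the linked models, just a sketch) is to put the quantity on the linking table itself, i.e. the asker's Transactions_Products, because a quantity belongs to the pairing of one purchase with one product. All table and column names and the sample rows below are hypothetical; totals are calculated with SUM rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchases (id INTEGER PRIMARY KEY, purchased_on TEXT NOT NULL);
-- One row per product per purchase; quantity and price describe this pairing.
CREATE TABLE purchase_items (
    purchase_id      INTEGER NOT NULL REFERENCES purchases(id),
    product_id       INTEGER NOT NULL REFERENCES products(id),
    quantity         INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    PRIMARY KEY (purchase_id, product_id)
);
INSERT INTO products VALUES (1, 'bread'), (2, 'milk');
INSERT INTO purchases VALUES (1, '2024-01-05'), (2, '2024-01-06');
INSERT INTO purchase_items VALUES (1, 1, 2, 150), (1, 2, 1, 99), (2, 2, 3, 99);
""")

# Calculated on demand instead of being kept in a separate balance table.
total_spent_cents = conn.execute(
    "SELECT SUM(quantity * unit_price_cents) FROM purchase_items"
).fetchone()[0]

print(total_spent_cents)
```

Prices are kept as integer cents here only to avoid floating-point rounding in the example.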

Is there a way to store array like data in a column in SQL Server?

I have a table that holds Test Questions. Each row of the table contains details of a question for a test for a particular user.
The user is presented with 3-5 possible answers and I would like to store details of the answers that have been checked in the row. I don't really want to add new rows for every answer as this would create a huge number of rows.
Is there a way that I can store something like an array of answers in a column in SQL Server? Presently I am storing the data as a JSON string but I remember that Oracle had some way to store array data and I am wondering if SQL Server has the same.
Generally, denormalizing is not a good idea. It is rarely a good idea. However, it is sometimes necessary for performance reasons. So if things are not too slow, don't even consider it.
If you make a secondary answers table in your case with the TestQuestionID (or whatever you call the answers for a single question) to be the clustered index, it won't be much of a performance difference at all compared to a denormalized table.
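A minimal sketch of such a secondary table, using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table and column names are invented for illustration. SQLite has no CLUSTERED keyword; a WITHOUT ROWID table keyed on (test_question_id, answer_no) is used here as the nearest equivalent, since it stores rows in primary-key order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_questions (
    test_question_id INTEGER PRIMARY KEY,
    user_id          INTEGER NOT NULL,
    question_text    TEXT NOT NULL
);
-- One row per checked answer. The key starts with test_question_id,
-- so all answers for one question are stored next to each other.
CREATE TABLE checked_answers (
    test_question_id INTEGER NOT NULL REFERENCES test_questions(test_question_id),
    answer_no        INTEGER NOT NULL,
    PRIMARY KEY (test_question_id, answer_no)
) WITHOUT ROWID;
INSERT INTO test_questions VALUES (1, 42, 'Which of these are prime?');
INSERT INTO checked_answers VALUES (1, 2), (1, 3), (1, 5);
""")

# Fetching the checked answers for one question is a single keyed lookup.
checked = [row[0] for row in conn.execute(
    "SELECT answer_no FROM checked_answers WHERE test_question_id = ? ORDER BY answer_no",
    (1,),
)]

print(checked)
```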
If I were denormalizing the table you describe, I would probably just create 5 columns in the table. You could also use an XML field, but all you are storing is 5 answers, so I would not use XML in this simple case.
Since you are asking this question, you are not really a seasoned professional (we all start as novices) and you should consult the local sql expert before you denormalize.
ADDED CAVEAT:
Since you accepted this answer, you really need to understand for certain that denormalizing is almost always the wrong thing to do. That is what everyone, including me, was trying to tell you. Don't do this without talking to your DBA. If you don't have a local DBA (unfortunately all too common), take the collective advice and don't denormalize. I can think of only one time in my career when I think denormalizing was the correct solution. And I have been bitten by bad designs (forced on me) caused by inappropriate denormalization on many occasions.

Query equivalence evaluation

My question is rooted in T-SQL and the SQL Server environment, but its scope is not confined to this technology. I am working on a database with quite complex business logic, with existing views and stored procedures and new ones to be designed. From comparing different queries, or parts of them, I have a strong feeling that there are sections performing the same job with a different arrangement, but of course to refactor the whole mess I need something more than a feeling; so I am trying to determine a way to demonstrate that two statements are equivalent.
An obvious but weak approach would be to check that the two queries A and B produce the same recordset: if A's result is a subset of B's and B's is a subset of A's, they are the same recordset. But I am not sure that this is a good idea because, of course, a recordset is not a query, and the results could depend on the data and on specific parameter values. My question is: is there a method to prove the equivalence of two different queries? I would say yes, because the optimization performed by the database should rely on this. Could someone give me some pointers to documentation or books that dig into this? If there is no general method to prove equivalence, is there some smart approach based on regression testing, performed according to some effective heuristic, that does the job?
Edited later: would reverse engineering the queries (by hand?) by means of relational algebra be a better method of assessing query equivalence than using other queries and/or the computer? Are there automated tools that help with this "reverse engineering"?
Thanks a lot for helping
You probably can't prove it in general: query equivalence is NP-complete even for simple conjunctive (select-project-join) queries and undecidable for full relational algebra. Check this SO question on query equivalence (that one is about Oracle, but there are a couple of answers / links that should be relevant for you).
You can check the execution plans of the two queries. If they are the same, you have your answer!
You can only check it by the execution plan. Apart from that, I don't think there is any way to prove this.
You'll need to implement some "canonical query plan" generator for this (an "optimal query plan" as generated by the DBMS can be nondeterministic). In most cases, using alphabetical ordering of terms and tables as a tie-breaker will get you there.
I doubt you will be able to formally prove or disprove this, but my take would be to
- identify all use cases
- identify all boundary values
- identify all parameters
and derive a test plan from that. It would require you to
- create test data for each case
- run both queries against that data
- compare the results
If you don't find any differences after testing, you can be reasonably assured that both statements are equivalent.
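The test plan above could be sketched roughly as follows, using SQLite via Python's sqlite3 module in place of SQL Server. The orders schema, the two queries and the datasets are all hypothetical placeholders; in practice each dataset would come from the use cases and boundary values identified earlier. Results are compared as multisets, so row order is ignored but duplicate rows still count.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and queries, for illustration only.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
QUERY_A = "SELECT customer FROM orders WHERE amount > 100 OR amount < 0"
QUERY_B = "SELECT customer FROM orders WHERE NOT (amount BETWEEN 0 AND 100)"

# One dataset per use case or boundary condition.
DATASETS = {
    "empty": [],
    "boundaries": [(1, "ann", 0), (2, "bob", 100), (3, "cy", 101), (4, "di", -1)],
    "duplicates": [(1, "ann", 500), (2, "ann", 500)],
    "nulls": [(1, "ann", None), (2, None, 200)],
}

def run(query, rows):
    # Fresh in-memory database per run so datasets cannot leak into each other.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return Counter(conn.execute(query).fetchall())

def differing_datasets(query_a, query_b, datasets):
    # Names of the datasets on which the two queries return different results.
    return [name for name, rows in datasets.items()
            if run(query_a, rows) != run(query_b, rows)]

mismatches = differing_datasets(QUERY_A, QUERY_B, DATASETS)
print(mismatches)
```

An empty list only means that no difference was found on these datasets; it is evidence, not a proof. A non-empty list, on the other hand, does show that the queries are not equivalent.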
