Combine parts of different tables [duplicate] - sql-server

I'm kind of new to writing sql and I have a question about joins. Here's an example select:
select bb.name from big_box bb, middle_box mb, little_box lb
where lb.color = 'green' and lb.parent_box = mb and mb.parent_box = bb;
So let's say that I'm looking for the names of all the big boxes that have nested somewhere inside them a little box that's green. If I understand correctly, the above syntax is another way of getting the same results that we could get by using the 'join' keyword.
Questions: is the above select statement efficient for the task it's doing? If not, what is a better way to do it? Is the statement syntactic sugar for a join or is it actually doing something else?
If you have links to any good material on the subject I'd gladly read it, but since I don't know exactly what this technique is called I'm having trouble googling it.

You are using implicit join syntax. This is equivalent to using the JOIN keyword but it is a good idea to avoid this syntax completely and instead use explicit joins:
SELECT bb.name
FROM big_box bb
JOIN middle_box mb ON mb.parent_box = bb.id
JOIN little_box lb ON lb.parent_box = mb.id
WHERE lb.color = 'green'
You were also missing the column name in the join condition. I have guessed that the column is called id.
This type of query should be efficient if the tables are indexed correctly. In particular there should be foreign key constraints on the join conditions and an index on little_box.color.
An issue with your query is that if there are multiple green boxes inside a single box you will get duplicate rows returned. These duplicates can be removed by adding DISTINCT after SELECT.
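To make the points above concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The id and parent_box column names, the index name and the sample rows are all assumptions made for illustration. It runs the implicit and the explicit join against the same data and shows the effect of DISTINCT:

```python
import sqlite3

# In-memory database; the schema and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_box    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE middle_box (id INTEGER PRIMARY KEY,
                             parent_box INTEGER REFERENCES big_box(id));
    CREATE TABLE little_box (id INTEGER PRIMARY KEY, color TEXT,
                             parent_box INTEGER REFERENCES middle_box(id));
    CREATE INDEX ix_little_box_color ON little_box(color);

    INSERT INTO big_box VALUES (1, 'crate'), (2, 'trunk');
    INSERT INTO middle_box VALUES (10, 1), (20, 2);
    -- 'crate' contains two green boxes, 'trunk' contains none
    INSERT INTO little_box VALUES (100, 'green', 10), (101, 'green', 10),
                                  (200, 'red', 20);
""")

# Implicit (comma) join, with the join columns spelled out
implicit_sql = """
    SELECT bb.name
    FROM big_box bb, middle_box mb, little_box lb
    WHERE lb.color = 'green'
      AND lb.parent_box = mb.id
      AND mb.parent_box = bb.id
"""

# Explicit JOIN syntax, same meaning
explicit_sql = """
    SELECT bb.name
    FROM big_box bb
    JOIN middle_box mb ON mb.parent_box = bb.id
    JOIN little_box lb ON lb.parent_box = mb.id
    WHERE lb.color = 'green'
"""

# Same query with DISTINCT added after SELECT
distinct_sql = explicit_sql.replace("SELECT bb.name", "SELECT DISTINCT bb.name", 1)

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()
distinct_rows = conn.execute(distinct_sql).fetchall()
```

With this sample data both join styles return 'crate' twice (once per green box), while the DISTINCT version returns it once.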

Related

Relational database structure design advice

This is a textual description of data for which I need to create a database design (using SQLite) for an application.
The application needs to keep a record of operations. Each operation has a Name and a list of parameters. Each parameter has a Name and a Value. However, the values of the parameters will change over the lifetime of the app (in fact, the user will be able to change them using the GUI), and we want to keep a history of the values each parameter has had. Furthermore, each operation can have multiple parameter sets. A parameter set is like an envelope which encompasses a set of parameter values (which all belong to the same operation) and gives this envelope a unique Number and a non-unique Description.
This is what I have so-far:
[Database model image][1]
The database model should allow me to perform these actions on the database data:
Show a list of operations - I know how to do this.
Show a list of parameters for a given operation - I know how to do this.
For a given operation, show all its parameters as columns and show the values of the parameters as rows - each row represents a different parameter value from the history of values. I'm stuck at this one.
For a given operation, show a list of all parameter sets which belong to that operation. I'm stuck at this one too.
For a given operation and for a given parameter set, get the latest values of its parameters. Stuck at this.
I'm not sure if I should re-work my database model or if I should look for proper SQL statements to accomplish the tasks above with the model that I have. Any help is greatly appreciated. Thank you.
EDIT 1
I have re-worked my database model following helpful advice from @Marek Herman. Thanks to that I am able to accomplish tasks 1) 2) 4).
Now I'm trying to accomplish 5) which should not be that difficult with the current database model. I have this SQL statement:
SELECT Parameter.ParameterIdentifier, ParameterValue.ParameterValue,
ParameterValueVersion.VersionNumber, ParameterValueVersion.ChangedOn
FROM ParameterValueVersion INNER JOIN
(((Operation INNER JOIN Parameter ON Operation.OperationPLC_ID = Parameter.OperationPLC_ID)
INNER JOIN ParameterSet ON Operation.OperationPLC_ID = ParameterSet.OperationPLC_ID)
INNER JOIN ParameterValue ON (ParameterSet.ID = ParameterValue.ParameterSetID) AND
(Parameter.ID = ParameterValue.ParameterID)) ON ParameterValueVersion.ID = ParameterValue.ParameterValueVersionID
WHERE (Operation.OperationPLC_ID=[opID] AND
ParameterSet.ParameterSetNumber=[parSetNum]);
where [opID] and [parSetNum] are the input parameters. This SQL statement actually only joins all these tables together on their PK->FK relationship: Operation, Parameter, ParameterSet, ParameterValue, ParameterValueVersion and filters the rows by specified OperationPLC_ID and ParameterSetNumber.
Here is an example of an output of this SQL statement. Each row shows a name of a parameter, its value, a version number of the value and date of change of that value. Some parameters only have one value (only one version -e.g., "OFFSET"). Some parameters have two values. For example "PREFILLING" has a value of "3" which was input on Oct 20, 2016 (and has a version number 1) and it also has a value of "3.5" which was input on Oct 21, 2016 and has a version number of 2. So I'd like to show only the latest versions of the values of the parameters. Any advice how to modify the SQL statement is much appreciated. Thank you.
EDIT 2
I guess I figured out how to perform 5). I had to study a bit how GROUP BY works. This did the trick:
SELECT Parameter.ParameterIdentifier, last(ParameterValue.ParameterValue) AS ParameterValue, last(ParameterValueVersion.ChangedOn) AS ChangedOn, max(ParameterValueVersion.VersionNumber) AS VersionNumber
FROM ParameterValueVersion INNER JOIN
(((Operation INNER JOIN Parameter ON Operation.OperationPLC_ID = Parameter.OperationPLC_ID)
INNER JOIN ParameterSet ON Operation.OperationPLC_ID = ParameterSet.OperationPLC_ID)
INNER JOIN ParameterValue ON (ParameterSet.ID = ParameterValue.ParameterSetID) AND
(Parameter.ID = ParameterValue.ParameterID)) ON ParameterValueVersion.ID = ParameterValue.ParameterValueVersionID
WHERE (((Operation.OperationPLC_ID)=[opID]) AND ((ParameterSet.ParameterSetNumber)=[parSetNum]))
GROUP BY Parameter.ParameterIdentifier
ORDER BY Parameter.ParameterIdentifier
Now I still need to figure out how to perform task no. 3. I'm gonna study the suggested COALESCE function. Thank you.
0) I would connect ParameterSet to Operation and Parameter and not to ParameterValue.
1) okay!
2) okay!
3) I think you can use the COALESCE() function to display the columns and then it should be possible to show all parameters with matching OperationID
4) you can do that if you do point #0
5) same as above I think
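For task 5 (latest value of each parameter), one portable approach that does not rely on last() is to join the rows back to a per-parameter MAX(VersionNumber). The sketch below is only an illustration: it uses Python's sqlite3 module and a hypothetical, flattened param_history table in place of the five joined tables from the question. The PREFILLING rows follow the example in the question; the OFFSET value and date are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened stand-in for the joined tables in the question
    CREATE TABLE param_history (
        parameter  TEXT,
        value      REAL,
        version    INTEGER,
        changed_on TEXT
    );
    INSERT INTO param_history VALUES
        ('PREFILLING', 3.0, 1, '2016-10-20'),
        ('PREFILLING', 3.5, 2, '2016-10-21'),
        ('OFFSET',     0.0, 1, '2016-10-20');
""")

# Join each row to the highest version number of its parameter,
# so only the newest row per parameter survives.
latest_sql = """
    SELECT h.parameter, h.value, h.version, h.changed_on
    FROM param_history h
    JOIN (
        SELECT parameter, MAX(version) AS version
        FROM param_history
        GROUP BY parameter
    ) latest
      ON latest.parameter = h.parameter
     AND latest.version = h.version
    ORDER BY h.parameter
"""
latest_rows = conn.execute(latest_sql).fetchall()
```

Because the value and the date come from the same row as the maximum version, they cannot get out of step with each other, which is a risk when each column is aggregated separately.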

CTE performance on execution plan. Is it displayed two times, or two times processed?

This is the SQL with a common table expression. Note that USERS_PROJECTS_CTE is used twice.
WITH USERS_PROJECTS_CTE (PRO_ID, SHOW_IAS, USERNAME)
AS
(
SELECT up.PRO_ID, up.SHOW_IAS, ISNULL(u.FIRST_NAME, '') + ' ' + ISNULL(u.SECOND_NAME, '')
FROM SFMIS07_PRO.USERS_PROJECTS up
INNER JOIN SFMIS07_ADM.USERS AS u
ON up.USER_ID = u.ID
WHERE up.IS_RESP_PERSON = 1 AND up.valid_to is null
)
SELECT up.PRO_ID,
up1.USERNAME as RESP_USER1,
up2.USERNAME as RESP_USER2,
up.COUNT_
FROM SFMIS07_PRO.PRO_RESP_USERS_KERNEL_MV AS up
LEFT JOIN USERS_PROJECTS_CTE AS up1 ON up.PRO_ID = up1.PRO_ID AND up1.SHOW_IAS=1
LEFT JOIN USERS_PROJECTS_CTE AS up2 ON up.PRO_ID = up2.PRO_ID AND up2.SHOW_IAS=0
The execution plan. Note that the CTE is displayed twice:
Questions:
Am I right that the CTE is not only displayed twice but processed twice?
Is it possible to inform the QO to reuse the CTE?
Is it possible for the QO in principle to detect "the same SQL fragment" and reuse results (I imagine the realization of this - by copying already prepared data)?
How to optimize the query (without using temporal tables :) ?
Am I right that CTE is not only displayed twice but processed twice?
Yes
Is it possible to inform QO to reuse CTE ?
Not directly but there are some hacks to encourage this.
Is it possible for QO in principle to detect "the same SQL fragment" and reuse results (I imagine the realization of this - by copying already prepared data)?
In principle yes. See the Microsoft Research paper "Efficient Exploitation of Similar Subexpressions for Query Processing" for examples.
how to optimize the query (without using temporal tables :) ?
The most reliable way would be to use a temporary (not temporal) table. See Provide a hint to force intermediate materialization of CTEs or derived tables for a more hacky workaround.
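The temporary-table approach can be sketched as follows. This is an illustration only, run against SQLite through Python's sqlite3 module, with simplified hypothetical tables (users, users_projects, projects) standing in for the schema-qualified ones in the question. In SQL Server the materialization step would typically be written as SELECT ... INTO #resp_users instead of CREATE TEMP TABLE ... AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, second_name TEXT);
    CREATE TABLE users_projects (pro_id INTEGER, user_id INTEGER, show_ias INTEGER,
                                 is_resp_person INTEGER, valid_to TEXT);
    CREATE TABLE projects (pro_id INTEGER PRIMARY KEY);

    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', NULL), (3, 'Cy', 'Old');
    INSERT INTO users_projects VALUES
        (1, 1, 1, 1, NULL),          -- current, SHOW_IAS = 1
        (1, 2, 0, 1, NULL),          -- current, SHOW_IAS = 0
        (2, 3, 1, 1, '2020-01-01');  -- no longer valid, filtered out
    INSERT INTO projects VALUES (1), (2);

    -- Evaluate the former CTE body once and store the result
    CREATE TEMP TABLE resp_users AS
    SELECT up.pro_id,
           up.show_ias,
           COALESCE(u.first_name, '') || ' ' || COALESCE(u.second_name, '') AS username
    FROM users_projects up
    JOIN users u ON up.user_id = u.id
    WHERE up.is_resp_person = 1 AND up.valid_to IS NULL;
""")

# Both joins now read the stored rows instead of re-running the subquery
rows = conn.execute("""
    SELECT p.pro_id, r1.username AS resp_user1, r2.username AS resp_user2
    FROM projects p
    LEFT JOIN resp_users r1 ON p.pro_id = r1.pro_id AND r1.show_ias = 1
    LEFT JOIN resp_users r2 ON p.pro_id = r2.pro_id AND r2.show_ias = 0
    ORDER BY p.pro_id
""").fetchall()
```

The trade-off is an extra write of the intermediate rows; whether that pays off depends on how expensive the subquery is compared with how many rows it produces.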

Non standard join

I ran into a problem and was trying to find a general solution to it as a join.
I have 2 tables :
http://pastebin.com/q5yws5Ym (not sure how to enforce the styling)
And I want to generate something like
http://pastebin.com/GscBUrYS
(while there are more parameters, I'm interested in how I would do it for something like this)
While I was able to reach a similar effect with self-joins and equi-joins, it would generate a lot of unneeded rows, which I'm not sure how to delete automatically.
Try something along the lines of:
SELECT user.user_id, j1.user_param, j1.user_value, j2.user_param, j2.user_value
FROM user
JOIN Users_info j1 ON user.user_id = j1.user_id
JOIN users_info j2 on user.user_id = j2.user_id
where j1.user_param != j2.user_param
GROUP BY user.user_id
You may need some more "exclusion" clauses in the WHERE to make sure that every row is only selected once, but the general idea should work (for a given and limited number of different user_param values).
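A variant of the same self-join idea, sketched here with Python's sqlite3 module: pin each alias to one specific user_param in its ON clause, so no extra rows are generated in the first place. Since the pastebin tables are not reproduced here, everything concrete is an assumption: the table names users and users_info, the parameter names 'email' and 'phone', the sample rows, and the goal of one row per user with one column per parameter. It also assumes at most one row per (user_id, user_param).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and parameter names
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE users_info (user_id INTEGER, user_param TEXT, user_value TEXT);

    INSERT INTO users VALUES (1), (2);
    INSERT INTO users_info VALUES
        (1, 'email', 'a@example.com'),
        (1, 'phone', '555-0100'),
        (2, 'email', 'b@example.com');
""")

# One LEFT JOIN per wanted parameter; each alias only ever matches its own param,
# so every user yields exactly one row (given one value per user and param).
pivot_rows = conn.execute("""
    SELECT u.user_id,
           j1.user_value AS email,
           j2.user_value AS phone
    FROM users u
    LEFT JOIN users_info j1 ON j1.user_id = u.user_id AND j1.user_param = 'email'
    LEFT JOIN users_info j2 ON j2.user_id = u.user_id AND j2.user_param = 'phone'
    ORDER BY u.user_id
""").fetchall()
```

LEFT JOIN keeps users that lack one of the parameters; they simply get NULL in that column.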

How do I search a "Property Bag" table in SQL?

I have a basic "property bag" table that stores attributes about my primary table "Card." So when I want to start doing some advanced searching for cards, I can do something like this:
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE dbo.CardProperty.IdPrp = 3 AND dbo.CardProperty.Value = 'Fiend'
INTERSECT
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE (dbo.CardProperty.IdPrp = 10 AND (dbo.CardProperty.Value = 'Wind' OR dbo.CardProperty.Value = 'Fire'))
What I need to do is to extract this idea into some kind of stored procedure, so that ideally I can pass in a list of property/value combinations and get the results of the search.
Initially this is going to be a "strict" search meaning that the results must match all elements in the query, but I'd also like to have a "loose" query so that it would match any of the results in the query.
I can't quite seem to wrap my head around this one. My previous version of this was to do generate some massive SQL query to execute with a lot of AND/OR clauses in it, but I'm hoping to do something a little more elegant this time. How do I go about doing this?
it seems to me that you have an EAV model here.
if you're using sql server 2005 and up i'd suggest you use XML datatype for this:
http://weblogs.sqlteam.com/mladenp/archive/2006/10/14/14032.aspx
makes searching and stuff much easier with built in xml querying capabilities.
if you can't change your model then look at this:
http://weblogs.sqlteam.com/davidm/articles/12117.aspx
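If the model has to stay as it is, another common option (not the XML approach suggested above) is relational division: match any of the property/value pairs, group by card, and count how many distinct properties matched. A minimal sketch, assuming SQLite via Python's sqlite3 module instead of a SQL Server stored procedure; the search_cards helper, the card names and the sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Card (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE CardProperty (IdCrd INTEGER, IdPrp INTEGER, Value TEXT);

    -- Made-up sample data; property 3 and property 10 follow the question
    INSERT INTO Card VALUES (1, 'Storm Imp'), (2, 'Ember Fiend'), (3, 'Gale Knight');
    INSERT INTO CardProperty VALUES
        (1, 3, 'Fiend'),   (1, 10, 'Wind'),
        (2, 3, 'Fiend'),   (2, 10, 'Water'),
        (3, 3, 'Warrior'), (3, 10, 'Wind');
""")

def search_cards(conn, criteria, strict=True):
    """criteria maps a property id to a list of acceptable values."""
    if not criteria:
        return []
    clauses, params = [], []
    for prop_id, values in criteria.items():
        marks = ", ".join("?" for _ in values)
        clauses.append(f"(cp.IdPrp = ? AND cp.Value IN ({marks}))")
        params.append(prop_id)
        params.extend(values)
    # strict: every listed property must match; loose: at least one must match
    needed = len(criteria) if strict else 1
    sql = f"""
        SELECT c.Id, c.Name
        FROM Card c
        JOIN CardProperty cp ON cp.IdCrd = c.Id
        WHERE {' OR '.join(clauses)}
        GROUP BY c.Id, c.Name
        HAVING COUNT(DISTINCT cp.IdPrp) >= ?
        ORDER BY c.Id
    """
    return conn.execute(sql, params + [needed]).fetchall()

criteria = {3: ["Fiend"], 10: ["Wind", "Fire"]}
strict_rows = search_cards(conn, criteria, strict=True)
loose_rows = search_cards(conn, criteria, strict=False)
```

Only the placeholders are generated dynamically; the values themselves are always passed as bound parameters. The strict search corresponds to the INTERSECT query in the question, and the loose search only changes the threshold in the HAVING clause.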

Why Is My Inline Table UDF so much slower when I use variable parameters rather than constant parameters?

I have a table-valued, inline UDF. I want to filter the results of that UDF to get one particular value. When I specify the filter using a constant parameter, everything is great and performance is almost instantaneous. When I specify the filter using a variable parameter, it takes a significantly larger chunk of time, on the order of 500x more logical reads and 20x greater duration.
The execution plan shows that in the variable parameter case the filter is not applied until very late in the process, causing multiple index scans rather than the seeks that are performed in the constant case.
I guess my questions are: Why, since I'm specifying a single filter parameter that is going to be highly selective against an indexed field, does my performance go into the weeds when that parameter is in a variable? Is there anything I can do about this?
Does it have something to do with the analytic function in the query?
Here are my queries:
CREATE FUNCTION fn_test()
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT DISTINCT GCN_SEQNO, Drug_package_version_ID
FROM
(
SELECT COALESCE(ndctbla.GCN_SEQNO, ndctblb.GCN_SEQNO) AS GCN_SEQNO,
dpv.Drug_package_version_ID, ROW_NUMBER() OVER (PARTITION BY dpv.Drug_package_version_id ORDER BY
ndctbla.GCN_SEQNO DESC) AS Predicate
FROM dbo.Drug_Package_Version dpv
LEFT JOIN dbo.NDC ndctbla ON ndctbla.NDC = dpv.Sp_package_code
LEFT JOIN dbo.NDC ndctblb ON ndctblb.SPC_NDC = dpv.Sp_package_code
) iq
WHERE Predicate = 1
GO
GRANT SELECT ON fn_test TO public
GO
-- very fast
SELECT GCN_SEQNO
FROM dbo.fn_test()
WHERE Drug_package_version_id = 10000
GO
-- comparatively slow
DECLARE @dpvid int
SET @dpvid = 10000
SELECT GCN_SEQNO
FROM dbo.fn_test()
WHERE Drug_package_version_id = @dpvid
Once you create a new projection through a UDF, it can't be expected that your indexes will still apply on the columns that are indexed on the original table and included in the projection. When you filter on the projection (and not in the UDF against the original table with the indexes) the indexes no longer apply.
What you want to do is parameterize the function to take in the parameter.
If you find that you have too many fields that you want to set parameters on, then you might want to take a look at indexed views, as you can create your projection and index it as well and then run queries against that.
Simply, the constant is easy to evaluate in the plan. The local variable is not. Especially with the ranking function and filter Predicate = 1
Paraphrasing casparOne, you need to push the filter as far inwards as possible so that you filter on dpv.Drug_package_version_id inside the iq derived table.
If you do that, then you also have no need for the PARTITION BY because you have only a single dpv.Drug_package_version_id. Then you can do a cleaner ...TOP 1 ... ORDER BY ndctbla.GCN_SEQNO DESC.
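A rough sketch of that shape, with the filter applied directly to the base table and a single row taken by ordering. It runs against SQLite through Python's sqlite3 module, so LIMIT 1 stands in for TOP 1 and a Python function (gcn_for, a made-up name) stands in for a parameterized inline UDF. The table contents are invented and the column set is reduced to what the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the tables in the question
    CREATE TABLE drug_package_version (
        drug_package_version_id INTEGER PRIMARY KEY,
        sp_package_code TEXT
    );
    CREATE TABLE ndc (ndc TEXT, spc_ndc TEXT, gcn_seqno INTEGER);

    INSERT INTO drug_package_version VALUES (10000, 'A1'), (10001, 'B2');
    INSERT INTO ndc VALUES ('A1', NULL, 7), ('A1', NULL, 9), ('ZZ', 'B2', 4);
""")

# The id filter sits next to the base table, so it can use the primary key,
# and ORDER BY ... LIMIT 1 replaces ROW_NUMBER() ... WHERE Predicate = 1.
lookup_sql = """
    SELECT COALESCE(a.gcn_seqno, b.gcn_seqno) AS gcn_seqno
    FROM drug_package_version dpv
    LEFT JOIN ndc a ON a.ndc = dpv.sp_package_code
    LEFT JOIN ndc b ON b.spc_ndc = dpv.sp_package_code
    WHERE dpv.drug_package_version_id = ?
    ORDER BY a.gcn_seqno DESC
    LIMIT 1
"""

def gcn_for(conn, dpv_id):
    """Return the GCN_SEQNO for one package version id, or None if the id is unknown."""
    row = conn.execute(lookup_sql, (dpv_id,)).fetchone()
    return row[0] if row else None
```

For example, gcn_for(conn, 10000) picks the higher of the two matching ndc rows, while gcn_for(conn, 10001) falls back to the spc_ndc match through COALESCE.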
The responses I got were good, and I learned from them, but I think I've found an answer that satisfies me.
I do think it's the use of the PARTITION BY clause that is causing the problem here. I reformulated the UDF using a variant of the self-join idiom:
SELECT t1.A, t1.B, t1.C
FROM T t1
INNER JOIN
(
SELECT A, MAX(C) AS C
FROM T
GROUP BY A
) t2 ON t1.A = t2.A AND t1.C = t2.C
Ironically, this is more performant than using the SQL 2008-specific query, and also the optimizer doesn't have a problem with joining this version of the query using variables rather than constants. At this point, I'm concluding that the optimizer just doesn't handle the more recent SQL extensions as well as the older stuff. As a bonus, I can make use of the UDF now, in my pre-upgraded SQL 2000 platforms.
Thanks for your help, everyone!
