One T-SQL query output to multiple record sets - sql-server

Don't ask why, but I need two result sets from one SQL query.
Like this...
Select Abc, Dgf from A
and the result should be two result sets:
abc
1
1
1
dgf
2
2
2
More details?
OK, let's try.
Right now I have a stored procedure like this:
SELECT a.* FROM ActivityView AS a WITH (NOLOCK)
WHERE a.WorkplaceGuid = @WorkplaceGuid

SELECT b.* FROM ActivityView AS a WITH (NOLOCK)
LEFT JOIN PersonView AS b WITH (NOLOCK) ON a.PersonGuid = b.PersonGuid
WHERE a.WorkplaceGuid = @WorkplaceGuid
It works, but the execution time is about 22 seconds. I do it this way because in my program I have classes that automatically populate themselves from result sets: class Activity and class Person. That's why I can't merge everything into one result set; the program can't parse it.

You can write a stored procedure that has two SELECTs.
SELECT Abc FROM A AS AbcTable;
SELECT Dgf FROM A AS DfgTable;
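For instance, a minimal sketch of such a procedure (the procedure name is made up here); the client reads the first result set, then advances to the second one (e.g. via NextResult in ADO.NET):
CREATE PROCEDURE GetAbcAndDgf
AS
BEGIN
    SET NOCOUNT ON;
    -- First result set
    SELECT Abc FROM A;
    -- Second result set; the client moves to it after consuming the first
    SELECT Dgf FROM A;
END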
Depending on your specific requirements, I would consider just submitting two separate queries. I don't see any advantage to combining them.

SQL Server supports the legacy COMPUTE BY clause, which acts almost like GROUP BY but returns multiple result sets (the result sets constituting each group, followed by the result sets with the aggregates):
WITH q AS
(
    SELECT 1 AS id
    UNION ALL
    SELECT 2 AS id
)
SELECT *
FROM q
ORDER BY id
COMPUTE COUNT(id) BY id
This, however, is deprecated and is slated for removal in a future release.
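For comparison, a rough modern substitute is GROUP BY ROLLUP, though it appends the aggregate rows to the same result set instead of returning separate ones; this is a sketch, not an exact equivalent (the GROUPING() flag marks the total row):
WITH q AS
(
    SELECT 1 AS id
    UNION ALL
    SELECT 2 AS id
)
SELECT id, COUNT(id) AS cnt, GROUPING(id) AS is_total_row
FROM q
GROUP BY ROLLUP(id);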

Those don't seem to be excessively complicated queries. (Although SELECT * should in general not be used in production, and never when you are doing a join, as it needlessly wastes resources sending the value of the joined field twice.) Therefore, if it is taking 22 seconds, either you are returning a huge amount of data or you don't have proper indexing.
Have you looked at the execution plans to see what is causing the slowness?

Related

Hierarchical SQL select-query

I'm using MS SQL Server 2008, and I have a table 'Users'. This table has a key field ID of type bigint, and a field Parents of type varchar which encodes the whole chain of the user's parent IDs.
For example:
User table:
ID | Parents
1 | null
2 | ..
3 | ..
4 | 3,2,1
Here user 1 has no parents and user 4 has a chain of parents 3->2->1. I created a function which parses the user's Parents field and returns a result table with user IDs of type bigint.
Now I need a query which will select and join the IDs of some requested users and their parents (the order of users and their parents is not important). I'm not an SQL expert, so all I could come up with is the following:
WITH CTE AS
(
    SELECT ID, Parents
    FROM [Users]
    WHERE [Users].Name = 'John'
    UNION ALL
    SELECT [Users].ID, [Users].Parents
    FROM [Users], CTE
    WHERE [Users].ID IN (SELECT * FROM GetUserParents(CTE.ID, CTE.Parents))
)
SELECT * FROM CTE
And basically it works. But the performance of this query is very poor. I believe the WHERE .. IN .. expression here is the bottleneck. As I understand it, instead of just joining the first subquery of the CTE (the IDs of the found users) with the results of GetUserParents (the IDs of the users' parents), it has to enumerate all users in the Users table and check whether each of them is part of the function's result. (Judging by the execution plan, SQL Server performs a distinct sort of the function's result to speed up the WHERE .. IN .. check, which is logical in itself but not required for my goal, and this distinct sort takes 70% of the query's execution time.) So I wonder how this query could be improved, or perhaps somebody could suggest another approach to this problem altogether?
Thanks for any help!
The recursive query in the question looks redundant since you already form the list of IDs needed in GetUserParents. Maybe change this into SELECT from Users and GetUserParents() with WHERE/JOIN.
select Users.*
from Users join
    (select ParentId
     from (select * from Users where Users.Name = 'John') as U
     cross apply [GetDocumentParents](U.ID, U.Family, U.Parents)
    ) as gup
on Users.ID = gup.ParentId
Since GetDocumentParents expects scalars and select... where produces a table, we need to apply the function to each row of the table (even if we "know" there's only one). That's what apply does.
I used indents to emphasize the conceptual parts of the query. (select...) as gup is the entity Users is join'd with; (select...) as U cross apply fn() is the argument to FROM.
The key to understanding this query is knowing how CROSS APPLY works:
- it's part of a FROM clause (quite unexpectedly; so the syntax is documented under FROM (Transact-SQL))
- it transforms the table expression to the left of it, and the result becomes the argument of the FROM (I emphasized this with the indent)
The transformation is: for each row, it
- runs the table expression to the right of it (in this case, a call to a table-valued function), using that row
- adds to the result set the columns from the row, followed by the columns from the call (in our case, the table returned from the function has a single column named ParentId)
So, if the call returns multiple rows, the added records will be the same row from the table appended with each row from the function.
This is a CROSS APPLY, so rows will only be added if the function returns anything. If this were the other flavor, OUTER APPLY, a single row would be added anyway, followed by a NULL in the function's column if it returned nothing. A small illustration follows below.
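A small self-contained illustration of the difference (the table variables and sample values are invented for this example):
DECLARE @Parents TABLE (Id int);
INSERT INTO @Parents VALUES (1), (2);

DECLARE @Children TABLE (ParentId int, Name varchar(10));
INSERT INTO @Children VALUES (1, 'a'), (1, 'b');

-- CROSS APPLY: parent 2 is dropped because the right side returns no rows for it
SELECT p.Id, c.Name
FROM @Parents AS p
CROSS APPLY (SELECT Name FROM @Children WHERE ParentId = p.Id) AS c;

-- OUTER APPLY: parent 2 is kept, with NULL in the Name column
SELECT p.Id, c.Name
FROM @Parents AS p
OUTER APPLY (SELECT Name FROM @Children WHERE ParentId = p.Id) AS c;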
This "parsing" thing violates even the 1NF. Make Parents field contain only the immediate parent (preferably, a foreign key), then an entire subtree can be retrieved with a recursive query.

Transact SQL and Where Statement and CASE Statement

I have the following requirement, which will involve multiple joins. Currently this search looks at the client's residence county. It needs to be changed to look at tbl_client_insurance.region_id for the active, effective insurance for the consumer. If the client has multiple insurances meeting these criteria, use the county for ins_id = 2 (Medicaid). I believe I have most of the query correct, but I am getting hung up on ti.ins_id: I believe I will need a CASE, basically returning only the 2, or, if 2 does not exist, returning whatever insurance the client does have.
SELECT
ti.client_id, ti.exp_dt, ins_id, *
FROM
tbl_Client AS tc
INNER JOIN
tbl_client_insurance AS ti ON ti.client_ID = tc.client_ID
WHERE
tc.client_id = 26
AND tc.active = 1
AND (ti.exp_dt >= GETDATE() OR ti.exp_dt IS NULL)
CASE
--- Need some help here.
You may be able to add such a condition to your join logic as
INNER JOIN tbl_client_insurance AS ti ON ti.client_ID = tc.client_ID AND ti.ins_id=2
I don't know what your desired result set is. Based on your sample you want all matching client insurances, but it seems you may want just one for that client. If you just want one, this is just an ordering issue.
You should use a UNION ALL. The first part is the query for insurance id 2; the second part is the query for all other insurances. Order by a new column you provide: for the first part the new column will be 'A', for the second part 'B'. Then take only the top 1 row.
That way, if id 2 exists it will come first; otherwise you will get one of the other insurances; and if there are no matching insurances, you will get an empty result set.
The question then becomes how you tell which of the other insurances to select if there is no Medicaid. Again, this is just ordering.
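A sketch of that approach, reusing the table from the question (the SortKey column and the exact filters are illustrative assumptions, not a prescription):
SELECT TOP (1) *
FROM (
    SELECT ti.*, 'A' AS SortKey   -- Medicaid, if present
    FROM tbl_client_insurance AS ti
    WHERE ti.client_id = 26
      AND ti.ins_id = 2
      AND (ti.exp_dt >= GETDATE() OR ti.exp_dt IS NULL)
    UNION ALL
    SELECT ti.*, 'B' AS SortKey   -- any other active insurance
    FROM tbl_client_insurance AS ti
    WHERE ti.client_id = 26
      AND ti.ins_id <> 2
      AND (ti.exp_dt >= GETDATE() OR ti.exp_dt IS NULL)
) AS ranked
ORDER BY SortKey;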

Is the 'BETWEEN' function very expensive in SQL Server?

I'm trying to join two relatively simple tables together, but my query is experiencing serious hangups. I'm not sure why, but I think it might have something to do with the 'between' function. My first table looks something like this (with a lot of other columns, but this would be the only column I'm pulling):
RowNumber
1
2
3
4
5
6
7
8
My second table "groups" my rows into "blocks", and has the following schema:
BlockID  RowNumberStart  RowNumberStop
1        1               3
2        4               7
3        8               8
The desired result is to link each RowNumber with its BlockID, as below, with the same number of rows as the first table:
RowNumber  BlockID
1          1
2          1
3          1
4          2
5          2
6          2
7          2
8          3
In order to get that, I used the following query, writing the results into a temp table:
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber between B.RowNumberStart and B.RowNumberStop
TABLE_1 and TABLE_2 are actually very large tables. TABLE_1 is about 122M rows, and TABLE_2 is about 65M rows. In TABLE_1, RowNumber is defined as a bigint, and in TABLE_2, BlockID, RowNumberStart, and RowNumberStop are all defined as int. Not sure that makes a difference, but I just wanted to include that information, too.
The query has now been hung up for eight hours. Similar queries on this type and volume of data are not taking anywhere near this long. So I'm wondering if it could be the 'between' statement that's hanging up this query.
Definitely would welcome any suggestions on how to make this more efficient.
BETWEEN is simply shorthand for:
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber >= B.RowNumberStart AND A.RowNumber <= B.RowNumberStop
If the execution plan goes from B to A (though the LEFT JOIN really indicates it has to go from A to B), then I'm assuming TABLE_1 is indexed on RowNumber (and that index should be covering for this query). If it only has a clustered index on RowNumber and the table is very wide, I recommend a non-clustered index on RowNumber alone, since you'll fit a lot more rows per page that way.
Otherwise, you want an index on TABLE_2 on RowNumberStart DESC (or RowNumberStop ASC), because for a given A you'd need a descending seek on RowNumberStart to find the matching range. Possible indexes are sketched below.
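For example (index names and the INCLUDE column choice are suggestions, not a prescription):
-- Narrow non-clustered index so TABLE_1 access touches fewer pages
CREATE NONCLUSTERED INDEX IX_Table1_RowNumber ON TABLE_1 (RowNumber);

-- Range index on TABLE_2; INCLUDE makes it covering for this query
CREATE NONCLUSTERED INDEX IX_Table2_Range
    ON TABLE_2 (RowNumberStart DESC, RowNumberStop)
    INCLUDE (BlockID);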
I think you might want to change your join to an INNER JOIN, given the way your join criteria are set up. (Is a TABLE_1 row ever going to belong to no block?)
If you look at your execution plan, you should get more clues as to why the performance is bad, but the Stop criterion is probably not used in the seek into TABLE_1.
Unfortunately SQLMenace's answer about the SELECT INTO has been deleted. My comment regarding that was meant to be: @Martin, SELECT INTO performance isn't as bad as it once was, but I still recommend CREATE TABLE for most production use, because SELECT INTO will infer types and nullability. This is fine if you verify it is doing what you think it is doing, but creating a super-long varchar or a decimal column with very strange precision can result not only in odd tables but in performance issues (especially with some of those big varchars when you forget a LEFT or whatever). I think it just helps to make it clear what you are expecting the table to look like. Often I will SELECT INTO using WHERE 0 = 1, check out the schema, and then script it with my tweaks (like adding an IDENTITY or adding a column with a timestamp default).
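For illustration, that trick applied to the query from this question: it creates the table with the inferred schema but zero rows, so the schema can be inspected and scripted before any data is loaded.
SELECT A.RowNumber, B.BlockID
INTO TEMP_TABLE
FROM TABLE_1 A
LEFT JOIN TABLE_2 B
    ON A.RowNumber BETWEEN B.RowNumberStart AND B.RowNumberStop
WHERE 0 = 1;   -- no rows copied, only the column definitions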
You have one main problem: you are trying to handle too much data at once. Are you really sure you want to handle the result of ALL 122M rows from table 1 at once? Do you really need that?

Which is faster: JOIN with GROUP BY or a Subquery?

Let's say we have two tables, 'Car' and 'Part', with a joining table 'Car_Part'. Say I want to see all cars that have part 123 in them. I could do this:
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
WHERE Car_Part.Part_Id = @part_to_look_for
GROUP BY Car.Col1, Car.Col2, Car.Col3
Or I could do this
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
WHERE Car.Car_Id IN (SELECT Car_Id FROM Car_Part WHERE Part_Id = @part_to_look_for)
Now, everything in me wants to use the first method because I've been brought up by good parents who instilled in me a puritanical hatred of sub-queries and a love of set theory, but it has been suggested to me that doing that big GROUP BY is worse than a sub-query.
I should point out that we're on SQL Server 2008. I should also say that in reality I want to select based on the Part Id, the Part Type, and possibly other things too. So the query I want to do actually looks like this:
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
INNER JOIN Car_Part ON Car_Part.Car_Id = Car.Car_Id
INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
WHERE (@part_Id IS NULL OR Car_Part.Part_Id = @part_Id)
AND (@part_type IS NULL OR Part.Part_Type = @part_type)
GROUP BY Car.Col1, Car.Col2, Car.Col3
Or...
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
WHERE (@part_Id IS NULL OR Car.Car_Id IN (
        SELECT Car_Id
        FROM Car_Part
        WHERE Part_Id = @part_Id))
  AND (@part_type IS NULL OR Car.Car_Id IN (
        SELECT Car_Id
        FROM Car_Part
        INNER JOIN Part ON Part.Part_Id = Car_Part.Part_Id
        WHERE Part.Part_Type = @part_type))
The best thing you can do is test them yourself, on realistic data volumes. That would benefit not only this query, but all future queries where you are not sure which way is best.
Important things to do include:
- test on production level data volumes
- test fairly & consistently (clear cache: http://www.adathedev.co.uk/2010/02/would-you-like-sql-cache-with-that.html)
- check the execution plan
You could either monitor using SQL Profiler and check the duration/reads/writes/CPU there, or use SET STATISTICS IO ON; SET STATISTICS TIME ON; to output stats in SSMS. Then compare the stats for each query.
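A typical measurement harness looks something like this (clearing the caches is for dev/test boxes only; it is disruptive in production):
DBCC FREEPROCCACHE;        -- discard cached query plans
DBCC DROPCLEANBUFFERS;     -- discard cached data pages (cold-cache test)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- ... run each candidate query here and compare the reported reads and CPU/elapsed times ...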
If you can't do this type of testing, you'll potentially be exposing yourself to performance problems down the line that you'll then have to tune/rectify. There are tools out there you can use that will generate data for you.
I have similar data, so I checked the execution plan for both styles of query. To my surprise, the column-in-subquery (CIS) version produced an execution plan with 25% less I/O cost than the inner-join (IJ) version. In the CIS execution plan I get two index scans of the intermediate table (Car_Part), versus an index scan of the intermediate table plus a relatively more expensive hash join in the IJ plan. My indexes are healthy but non-clustered, so it stands to reason that the index scans might be made a bit faster by clustering them. I doubt this would affect the cost of the hash join, which is the more expensive step in the IJ query.
Like the others have pointed out, it depends on your data. If you're working with many gigabytes across these three tables, then tune away. If your rows number in the hundreds or thousands, then you might be splitting hairs over a very small performance gain. I would say that the IJ query is much more readable, so as long as it's good enough, do any future developer who touches your code a favour and give them something easier to read. The row counts in my tables are 188877, 283912, and 13054, and both queries returned in less time than it took to sip coffee.
Small postscript: since you're not aggregating any numerical values, it looks like you mean SELECT DISTINCT. Unless you're actually going to do something with the groups, it's easier to see your intention with SELECT DISTINCT than with GROUP BY at the end. The I/O cost is the same, but one indicates your intention better, IMHO.
With SQL Server 2008 I would expect IN to be quicker, as it is equivalent to this:
SELECT Car.Col1, Car.Col2, Car.Col3
FROM Car
WHERE EXISTS (SELECT *
              FROM Car_Part
              WHERE Car_Part.Car_Id = Car.Car_Id
                AND Car_Part.Part_Id = @part_to_look_for)
i.e. it only has to check for the existence of the row, not join it and then remove duplicates. This is discussed here.

SQL HAVING SUM GROUP BY

Using SQL Server 2005. I am building an inventory/purchasing program, and I'm at the point where I need the user to "check out" equipment. When he selects a product, I need to query which stock locations have the available quantity and tell the user which locations to walk to in order to retrieve the product.
Here is a query for a particular [StockLocation_Products].ProductID, with a particular assigned [ProductUsages].ProductUsageID.
SELECT
    PROD.ProductID,
    PROD.ProductName,
    SL.Room,
    SL.StockSpace,
    SLPPU.ResvQty,
    PUSG.ProductUsage
FROM [StockLocations] SL
INNER JOIN [StockLocation_Products] SLP
    ON SL.StockLocationID = SLP.StockLocationID
INNER JOIN [StockLocation_Product_ProductUsages] SLPPU
    ON SLP.StockLocationID = SLPPU.StockLocationID
    AND SLP.ProductID = SLPPU.ProductID
INNER JOIN [ProductUsages] PUSG
    ON SLPPU.ProductUsageID = PUSG.ProductUsageID
INNER JOIN [Products] PROD
    ON SLPPU.ProductID = PROD.ProductID
WHERE SLP.ProductID = 4 AND PUSG.ProductUsageID = 1
This query returns:
ProductID  ProductName                 Room  StockSpace  ResvQty  ProductUsage
---------  --------------------------  ----  ----------  -------  ------------
4          Addonics Pocket DVD+/-R/RW  B700  5-D         12       MC Pool
4          Addonics Pocket DVD+/-R/RW  B700  6-B         10       MC Pool
4          Addonics Pocket DVD+/-R/RW  B700  6-C         21       MC Pool
4          Addonics Pocket DVD+/-R/RW  B700  6-D         20       MC Pool
I thought maybe I could use an additional HAVING clause to make this query return which combination of StockSpaces you'd need to visit to satisfy a request for some quantity, e.g. the user needs to pull 30 of product ID 4.
But I don't really understand how to use GROUP BY with HAVING SUM() to achieve what I want.
I tried various things in my GROUP BY / HAVING clause, but I just don't get any results:
GROUP BY PROD.ProductID,PROD.ProductName,SL.Room,SL.StockSpace,SLPPU.ResvQty,PUSG.ProductUsage
HAVING SUM(ResvQty) >= 30;
I want results that show (at least one) combination of StockSpaces which sums up to 30, so I can tell the user "you can get 21 units from space 6-C and 9 units from 6-B". There may be multiple combinations of rows whose SUM() >= 30, but I only need to find one combination that does. Help!
You can have an inner select, such as:
SELECT count_of_foo, count(bar), baz
FROM (SELECT count(foo) AS count_of_foo, bar, baz, other1, other2
      FROM complex_query
      WHERE foo = bar
      GROUP BY bar, baz, other1, other2
      HAVING count(foo) > 1) AS inner_query
GROUP BY count_of_foo, baz
This lets you apply a further GROUP BY on top of the result of the HAVING clause.
What you are trying to do is a running sum, which you can get with various techniques in SQL. I think the most efficient approach, especially if you are trying to do this all in one query, is to use a CTE (here's one example).
Another technique that doesn't rely on CTE requires the data to be populated into another table (could be a temp table, though) and basically you do a join-and-sort operation as you go.
Once you get the data to include a running sum, you can simply select the rows for which the running sum, excluding the current row, is still less than the total quantity you are trying to locate.
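For instance, a 2005-compatible sketch using a correlated subquery for the running sum, assuming the question's per-space quantities have been loaded into a hypothetical temp table #Stock(StockSpace, ResvQty) for one product:
WITH Ranked AS
(
    SELECT s1.StockSpace, s1.ResvQty,
           -- running total over spaces ordered by descending quantity
           RunningQty = (SELECT SUM(s2.ResvQty)
                         FROM #Stock AS s2
                         WHERE s2.ResvQty > s1.ResvQty
                            OR (s2.ResvQty = s1.ResvQty
                                AND s2.StockSpace <= s1.StockSpace))
    FROM #Stock AS s1
)
SELECT StockSpace, ResvQty, RunningQty
FROM Ranked
WHERE RunningQty - ResvQty < 30   -- keep taking spaces until 30 units are covered
ORDER BY RunningQty;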
And here is a nice summary of several different techniques.
