SQL HAVING SUM GROUP BY - sql-server

Using SQL Server 2005. I am building an inventory/purchasing program and I’m at the point where I need the user to “check out” equipment. When he selects a product, I need to query which stock locations have the available Qty, and tell the user which location to walk to/ retrieve product.
Here is a query for a particular [StockLocation_Products].ProductID, with a particular assigned [ProductUsages].ProductUsageID.
SELECT
PROD.ProductID,
PROD.ProductName,
SL.Room,
SL.StockSpace,
SLPPU.ResvQty,
PRDUSG.ProductUsage
FROM [StockLocations] SL
INNER JOIN [StockLocation_Products] SLP ON SL.StockLocationID = SLP.StockLocationID
INNER JOIN [StockLocation_Product_ProductUsages] SLPPU ON SLP.StockLocationID = SLPPU.StockLocationID AND SLP.ProductID = SLPPU.ProductID
INNER JOIN [ProductUsages] PUSG ON SLPPU.ProductUsageID = PRDUSG.ProductUsageID
INNER JOIN [Products] PROD ON SLPPU.ProductID = PROD.ProductID
WHERE SLP.ProductID = 4 AND PRDUSG.ProductUsageID = 1
This query returns:
ProductID ProductName Room StockSpace ResvQty ProductUsage
------------------------------------------------------------------------------------------------------------------------
4 Addonics Pocket DVD+/-R/RW B700 5-D 12 MC Pool
4 Addonics Pocket DVD+/-R/RW B700 6-B 10 MC Pool
4 Addonics Pocket DVD+/-R/RW B700 6-C 21 MC Pool
4 Addonics Pocket DVD+/-R/RW B700 6-D 20 MC Pool
I thought maybe I could use an additional HAVING clause to make this query return which combination of StockSpace(s) you’d need to visit to satisfy a request for some Qty. E.g. User needs to pull 30 of Product (ID =4).
But I don’t really understand how to use GROUP BY with HAVING SUM(), to achieve what I want.
I tried various things in my group by / having clause, but I just don’t get any results.
GROUP BY PROD.ProductID,PROD.ProductName,SL.Room,SL.StockSpace,SLPPU.ResvQty,PUSG.ProductUsage
HAVING SUM(ResvQty) >= 30;
I want results that show (at least one) combination of StockSpaces which sums up to 30, so I can tell the user “you can get 21 units from space ‘6-C’, and 9 units from ‘6-B’. There may be multiple combinations of rows that could sum() >= 30, but I need at least how to find one combination that does! Help!

You can have an inner select, such as:
SELECT count_of_foo, count(bar), baz
FROM (SELECT count(foo) as count_of_foo, bar, baz, other1, other2 FROM complex_query WHERE foo = bar HAVING count(foo) > 1) inner_query
GROUP BY count_of_foo, baz.
This will give you the ability to add more group by after the HAVING clause.

What you are trying to do is a running sum, which you can get with various techniques in SQL. I think the most efficient query, especially if you are trying to do this all in the same query, is to use a CTE (here's one example).
Another technique that doesn't rely on CTE requires the data to be populated into another table (could be a temp table, though) and basically you do a join-and-sort operation as you go.
Once you get the data to include a running sum, then you can simply select the values from which the running sum is less than or equal to the total number that you are trying to locate.
And here is a nice summary of several of the different techniques.

Related

SQL SUM() function with parameters returned by query for each row

First of all, sorry for that weird title. Here is the thing:
I work for a online shop, which sells products on amazon. Since we sell sets of different items, it happens that we send the same item within multiple sets to amazon fba. To give out the total sum of one item in all sets, I wrote the following query:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
WHERE [kArtikel] = (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [cHAN] = 12345)) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
The cHAN=12345 part is just used to pick one specific item for which we want to know the total number of items. This query itself works fine, so this is not the problem.
However, I also know that all products that are part of sets have [tArtikel].[kStueckliste]=0, which -in theory- makes identifying them pretty easy. Which got me to the idea, that I could use this query to instantly generate a list of all these products with their respective total, like:
kArtikel | total
=================
01234 | 23
56789 | 42
So basically I needed something like
foreach (
select [kArtikel]
from [tArtikel]
where [tArtikel].[kStueckliste]=0
) do (
< the query I made >
)
Thus I tried the following statement:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
INNER JOIN (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [tArtikel].[tStueckliste] = 0) [baz]
ON [tStueckliste].[kArtikel] = [baz].[kArtikel]) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
This did not -as I hoped- return a list of sums, but instead gave me the total sum of all sums I wanted to create.
Since I am pretty new to SQL (about two weeks in maybe), I have neither any idea what to do, nor where my mistake is, NOR what phrasing I should use to google my way around -thus that wierd Title of this post. So if anyone could help me with that and/or point me into the right direction I'd be really happy :)
I write MySQL rather than SQL but I believe it's very similar other than a few functions and syntaxes. Here's what I think should work for you:
select am.cArtNr, sum(am.nQuantity) as total
from amazon_fba am
join tArtikel ar on ar.cArtNr=am.cArtNr
join tStueckliste st on st.kStueckliste=ar.kStueckliste
where ar.kStueckliste=0
group by am.cArtNr;
Adding the group by will do the split out by articles, but reducing the number of brackets (in this instance derived tables) will speed up the query provided you're using indexes. Again, this is how I would do it in MySQL, and the only other query language I have experience in is BigQuery which won't help here.

How can I convert a view containing a START WITH...CONNECT BY sub-query to SQL Server?

I am trying to convert a view from an Oracle RDBMS to SQL Server. The view looks like:
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id)
Having read about recursive CTE's and also about their use in sub-queries, my best guess at translating the above into SQL Server syntax is:
create view user_part_v
as
with region_structure(region_id, parent_region_id) as (
select region_id
, parent_region_id
from region_relation
where parent_region_id = users.region_id
union all
select r.region_id
, r.parent_region_id
from region_relation r
join region_structure rs on rs.parent_region_id = r.region_id
)
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id from region_structure)
Obviously this gives me an error about the reference to users.region_id in the CTE definition.
How can I achieve the same result in SQL Server as I get from the Oracle view?
Background
I am working on the conversion of a system from running on an Oracle 11g RDMS to SQL Server 2008. This system is a relatively large Java EE based system, using JPA (Hibernate) to query from the database.
Many of the queries use the above mentioned view to restrict the results returned to those appropriate for the current user. If I cannot convert the view directly then the conversion will be much harder as I will need to change all of the places where we query the database to achieve the same result.
The tables referenced by this view have a structure similar to:
USERS
ID
REGION_ID
REGION
ID
NAME
REGION_RELATIONSHIP
PARENT_REGION_ID
REGION_ID
PART
ID
PARTNO
DESCRIPTION
PART_REGION
PART_ID
REGION_ID
So, we have regions, arranged into a hierarchy. A user may be assigned to a region. A part may be assigned to many regions. A user may only see the parts assigned to their region. The regions reference various geographic regions:
World
Europe
Germany
France
...
North America
Canada
USA
New York
...
If a part, #123, is assigned to the region USA, and the user is assigned to the region New York, then the user should be able to see that part.
UPDATE: I was able to work around the error by creating a separate view that contained the necessary data, and then have my main view join to this view. This has the system working, but I have not yet done thorough correctness or performance testing yet. I am still open to suggestions for better solutions.
I reformatted your original query to make it easier for me to read.
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
Let's examine what's going on in this query.
select part_region.part_id, users.id as users_id
from part_region, users
This is an old-style join where the tables are cartesian joined and then the results are reduced by the subsequent where clause(s).
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
The sub-query that's using the connect by statement is using the region_id from the users table in outer query to define the starting point for the recursion.
Then the in clause checks to see if the region_id for the part_region is found in the results of the recursive query.
This recursion follows the parent-child linkages given in the region_relation table.
So the combination of doing an in clause with a sub-query that references the parent and the old-style join means that you have to consider what the query is meant to accomplish and approach it from that direction (rather than just a tweaked re-arrangement of the old query) to be able to translate it into a single recursive CTE.
This query also will return multiple rows if the part is assigned to multiple regions along the same branch of the region heirarchy. e.g. if the part is assigned to both North America and USA a user assigned to New York will get two rows returned for their users_id with the same part_id number.
Given the Oracle view and the background you gave of what the view is supposed to do, I think what you're looking for is something more like this:
create view user_part_v
as
with user_regions(users_id, region_id, parent_region_id) as (
select u.users_id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
union all
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
Note that I've added the users_id to the output of the recursive CTE, and then just done a simple inner join of the part_region table and the CTE results.
Let me break down the query for you.
select u.users_id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
This is the starting set for our recursion. We're taking the region_relation table and joining it against the users table, to get the starting point for the recursion for every user. That starting point being the region the user is assigned to along with the parent_region_id for that region. A left join is done here and the region_id is pulled from the user table in case the user is assigned to a top-most region (which means there won't be an entry in the region_relation table for that region).
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
This is the recursive part of the CTE. We take the existing results for each user, then add rows for each user for the parent regions of the existing set. This recursion happens until we run out of parents. (i.e. we hit rows that have no entries for their region_id in the region_relationship table.)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
This is the part where we grab our final result set. Assuming (as I do from your description) that each region has only one parent (which would mean that there's only one row in region_relationship for each region_id), a simple join will return all the users that should be able to view the part based on the part's region_id. This is because there is exactly one row returned per user for the user's assigned region, and one row per user for each parent region up to the heirarchy root.
NOTE:
Both the original query and this one do have a limitation that I want to make sure you are aware of. If the part is assigned to a region that is lower in the heirarchy than the user (i.e. a region that is a descendent of the user's region like the part being assigned to New York and the user to USA instead of the other way around), the user won't see that part. The part has to be assigned to either the user's assigned region, or one higher in the region heirarchy.
Another thing is that this query still exhibits the case I mentioned above about the original query, where if a part is assigned to multiple regions along the same branch of the heirarchy that multiple rows will be returned for the same combination of users_id and part_id. I did this because I wasn't sure if you wanted that behavior changed or not.
If this is actually an issue and you want to eliminate the duplicates, then you can replace the query below the CTE with this one:
select p.part_id, u.users_id
from part p
cross join users u
where exists (
select 1
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
where pr.part_id = p.part_id
and ur.users_id = u.users_id
);
This does a cartesian join between the part table and the users table and then only returns rows where the combination of the two has at least one row in the results of the subquery, which are the results that we are trying to de-duplicate.

MS Access : Average and Total Calculation in Single Query

INTRODUCTION TO DATABASE TABLE BEING USED -
I am working on a “Stock Market Prices” based Database Table. My table has got the data for the following FIELDS –
ID
SYMBOL
OPEN
HIGH
LOW
CLOSE
VOLUME
VOLUME CHANGE
VOLUME CHANGE %
OPEN_INT
SECTOR
TIMESTAMP
New data gets added to the table daily “Monday to Friday”, based on the stock market price changes for that day. The current requirement is based on the VOLUME field, which shows the volume traded for a particular stock on daily basis.
REQUIREMENT –
To get the Average and Total Volume for last 10,15 and 30 Days respectively.
METHOD USED CURRENTLY -
I created these 9 SEPARATE QUERIES in order to get my desired results –
First I have created these 3 queries to take out the most recent last 10,15 and 30 dates from the current table:
qryLast10DaysStored
qryLast15DaysStored
qryLast30DaysStored
Then I have created these 3 queries for getting the respective AVERAGES:
qrySymbolAvgVolume10Days
qrySymbolAvgVolume15Days
qrySymbolAvgVolume30Days
And then I have created these 3 queries for getting the respective TOTALS:
qrySymbolTotalVolume10Days
qrySymbolTotalVolume15Days
qrySymbolTotalVolume30Days
PROBLEM BEING FACED WITH CURRENT METHOD -
Now, my problem is that I have ended up having these so many different queries, whereas I wanted to get the output into One Single Query, as shown in the Snapshot of the Excel Sheet:
http://i49.tinypic.com/256tgcp.png
SOLUTION NEEDED -
Is there some way by which I can get these required fields into ONE SINGLE QUERY, so that I do not have to look into multiple places for the required fields? Can someone please tell me how to get all these separate queries into one -
A) Either by taking out or moving the results from these separate individual queries to one.
B) Or by making a new query which calculates all these fields within itself, so that these separate individual queries are no longer needed. This would be a better solution I think.
One Clarification about Dates –
Some friend might think why I used the method of using Top 10,15 and 30 for getting the last 10,15 and 30 Date Values. Why not I just used the PC Date for getting these values? Or used something like -
("VOLUME","tbl-B", "TimeStamp BETWEEN Date() - 10 AND Date()")
The answer is that I require my query to "Read" the date from the "TIMESTAMP" Field, and then perform its calculations accordingly for LAST / MOST RECENT "10 days, 15 days, 30 days” FOR WHICH THE DATA IS AVAILABLE IN THE TABLE, WITHOUT BOTHERING WHAT THE CURRENT DATE IS. It should not depend upon the current date in any way.
If there is any better method or more efficient way to create these queries, then please enlighten.
You have separate queries to compute 10DayTotalVolume and 10DayAvgVolume. I suspect you can compute both in one query, qry10DayVolumes.
SELECT
b.SYMBOL,
Sum(b.VOLUME) AS 10DayTotalVolume,
Avg(b.VOLUME) AS 10DayAvgVolume
FROM
[tbl-B] AS b INNER JOIN
qryLast10DaysStored AS q
ON b.TIMESTAMP = q.TIMESTAMP
GROUP BY b.SYMBOL;
However, that makes me wonder whether 10DayAvgVolume can ever be anything other than 10DayTotalVolume / 10
Similar considerations apply to the 15 and 30 day values.
Ultimately, I think you want something based on a starting point like this:
SELECT
q10.SYMBOL,
q10.[10DayTotalVolume],
q10.[10DayAvgVolume],
q15.[15DayTotalVolume],
q15.[15DayAvgVolume],
q30.[30DayTotalVolume],
q30.[30DayAvgVolume]
FROM
(qry10DayVolumes AS q10
INNER JOIN qry15DayVolumes AS q15
ON q10.SYMBOL = q15.SYMBOL)
INNER JOIN qry30DayVolumes AS q30
ON q10.SYMBOL = q30.SYMBOL;
That assumes you have created qry15DayVolumes and qry30DayVolumes following the approach I suggested for qry10DayVolumes.
If you want to cut down the number of queries, you could use subqueries for each of the qry??DayVolumes saved queries, but try it this way first to make sure the logic is correct.
In that second query above, there can be a problem due to field names which start with digits. Enclose those names in square brackets or re-alias them in qry10DayVolumes, qry15DayVolumes, and qry30DayVolumes using alias names which begin with letters instead of digits.
I tested the query as written above with the "2nd Upload.mdb" you uploaded, and it ran without error from Access 2007. Here is the first row of the result set from that query:
SYMBOL 10DayTotalVolume 10DayAvgVolume 15DayTotalVolume 15DayAvgVolume 30DayTotalVolume 30DayAvgVolume
ACC-1 42909 4290.9 54892 3659.46666666667 89669 2988.96666666667
Access doesn't support most advanced SQL syntax and clauses, so this is a bit of a hack, but it works, and is fast on your small sample. You're basically running 3 queries but the Union clauses allow you to combine into one:
select
Symbol,
sum([10DayTotalVol]) as 10DayTotalV,
sum([10DayAvgVol]) as 10DayAvgV,
sum([15DayTotalVol]) as 15DayTotalV,
sum([15DayAvgVol]) as 15DayAvgV,
sum([30DayTotalVol]) as 30DayTotalV,
sum([30DayAvgVol]) as 30DayAvgV
from (
select
Symbol,
sum(volume) as 10DayTotalVol, avg(volume) as 10DayAvgVol,
0 as 15DayTotalVol, 0 as 15DayAvgVol,
0 as 30DayTotalVol, 0 as 30DayAvgVol
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 10 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
sum(volume), avg(volume),
0, 0
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 15 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
UNION
select
Symbol,
0, 0,
0, 0,
sum(volume), avg(volume)
from
[tbl-b]
where
timestamp >= (select min(ts) from (select distinct top 30 timestamp as ts from [tbl-b] order by timestamp desc ))
group by
Symbol
) s
group by
Symbol

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So lets say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return those records between 'rank' 2 and 4, so my results set is Bob, Sid and Billy? However, Rank is not part of the table, and this should be generated when the query is run. Why don't I use an autogenerated number, because if a record is deleted, this will be inconsistent, and what if I wanted the results in reverse!
This obviously very simple, and the reason I ask is because I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset, so if I only return 1 page worth of data from the database this is obviously going to be quicker then return a complete set of 3000 records and then having to subselect from that set!
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after comment, that rank is not existing in structure:
Select top 100 * from table;
And if you want to choose subsequent results, you can choose the ID of the last record from the first query, say it was ID 101, and use a WHERE clause to get the next 100;
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performace will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASCENDING
If I understand you correctly, there is ionly first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query. Unless you do something like
SELECT * FROM Table
WHERE FirstName = 'Bob'
OR FirstName = 'Sid'
OR FirstName = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto incrementing integer (see Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you are running an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how you delete 1 row without deleting the other. If you had another auto incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment, If you add an autoincrement field, and want to read 3 rows, and you know the ID of the first row you want to read, then you can use "TOP" to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can wuery Bob, Sid and Billy with the following QUERY.
SELECT TOP 3 FirstName, Age
From Table
WHERE ID >= 2
ORDER BY ID

One T-SQL query output to multiple record sets

Don't ask for what, but i need two tables from one SQL query.
Like this...
Select Abc, Dgf from A
and result are two tables
abc
1
1
1
dgf
2
2
2
More details?
Ok lets try.
Now i have sp like this:
SELECT a.* from ActivityView as a with (nolock)
where a.WorkplaceGuid = #WorkplaceGuid
SELECT b.* from ActivityView as a with (nolock)
left join PersonView as b with (nolock) on a.PersonGuid=b.PersonGuid where a.WorkplaceGuid = #WorkplaceGuid
It's cool. But execution time about 22 seconds. I do this because in my programm i have classes that automaticly get data from records set. Class Activity and class Person. That why i can't make it in one recordset. Program didn't parse it.
You can write a stored procedure that has two SELECTs.
SELECT Abc FROM A AS AbcTable;
SELECT Dgf FROM A AS DfgTable;
Depending on your specific requirements, I would consider just submitting two separate queries. I don't see any advantage to combining them.
SQL Server supports legacy COMPUTE BY clause which acts almost like GROUP BY but returns multiple resultsets (the resultsets constituting each group followed by the resultsets with the aggregates):
WITH q AS
(
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
)
SELECT *
FROM q
ORDER BY
id
COMPUTE COUNT(id)
BY id
This, however, is obsolete and is to be removed in the future releases.
Those don't seem to be excessively complicated queries (although select * should in general not be used in production and never when you are doing a join as it needlessly wastes resources sending the value of the joined field twice). Therefore if it is taking 22 seconds, then either you are returning a huge amount of data or you don't have proper indexing.
Have you looked at the execution plans to see what is causing the slowness?

Resources