Optimisation of a MSSQL Query - group by multiple columns - sql-server

Hey guys i could need some advice, i've the following 2 tables
Table Model:
+----------------+---------------+-------------+------------------------+-------------+--------+-----------------+------------------+------------------+------------------------------+------------------------------+----------------------+----------------------+------------------+
| DLTCountryCode | SupplierID | ModelNumber | ModelDescription | Brand | Fedas | MeasurementUnit | MinModelNetPrice | MaxModelNetPrice | MinModelSuggestedRetailPrice | MaxModelSuggestedRetailPrice | MinModelInsteadPrice | MaxModelInsteadPrice | PictureAvailable |
+----------------+---------------+-------------+------------------------+-------------+--------+-----------------+------------------+------------------+------------------------------+------------------------------+----------------------+----------------------+------------------+
| AT | 9120048150008 | 2012266 | xxx | Brand | 115946 | STK | 6.05 | 6.05 | 10.95 | 10.95 | 0 | 0 | 1 |
+----------------+---------------+-------------+------------------------+-------------+--------+-----------------+------------------+------------------+------------------------------+------------------------------+----------------------+----------------------+------------------+
Table ModelColorSizeInventory:
+----------------+---------------+-------------+-----------+------+---------------+----------+-------------------------+
| DLTCountryCode | SupplierID | ModelNumber | ColorCode | Size | ItemNumber | Quantity | InventoryDateTime |
+----------------+---------------+-------------+-----------+------+---------------+----------+-------------------------+
| AT | 9120048150008 | 2012266 | 801 | L | 9008601584968 | 0 | 2017-09-29 11:16:02.347 |
| AT | 9120048150008 | 2012266 | 801 | M | 9008601584951 | 0 | 2017-09-29 11:16:02.347 |
| AT | 9120048150008 | 2012266 | 801 | S | 9008601584944 | 2 | 2017-09-29 11:16:02.347 |
| AT | 9120048150008 | 2012266 | 801 | XL | 9008601584975 | 4 | 2017-09-29 11:16:02.347 |
| AT | 9120048150008 | 2012266 | 801 | XXL | 9008601584982 | 6 | 2017-09-29 11:16:02.347 |
+----------------+---------------+-------------+-----------+------+---------------+----------+-------------------------+
And the following Query:
SELECT dccdm.*, SUM(dccdmcsi.[Quantity]) AS QuantityModel
FROM "Model" AS "dccdm"
LEFT JOIN ModelColorSizeInventory AS dccdmcsi ON dccdm.[ModelNumber] = dccdmcsi.[ModelNumber]
WHERE (
dccdm.ModelNumber IN('2012266')
)
AND dccdmcsi.[Quantity] >0
AND dccdm.[DLTCountryCode]='AT'
GROUP BY dccdm.[DLTCountryCode],dccdm.[SupplierID],dccdm.[ModelNumber],dccdm.[ModelDescription],dccdm.[Brand],dccdm.[Fedas],dccdm.[MeasurementUnit],dccdm.[MinModelNetPrice],dccdm.[MaxModelNetPrice],dccdm.[MinModelSuggestedRetailPrice],dccdm.[MaxModelSuggestedRetailPrice],dccdm.[MinModelInsteadPrice],dccdm.[MaxModelInsteadPrice],dccdm.[PictureAvailable]
This Query works as expected, i'm joining ModelColorSizeInventory to find out the sum of all variants with quantities
One thing that bothers me is the group by part, because if i skip the group by statement i'm getting the following error:
Msg 8120, Column 'DLTCountryCode' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Since i'm not that familiar with MSSQL i ask the following question:
How can i write down this query without using this complex GROUP BY clause
The reason behind this question is that writing down multiple columns in a group by statement feels wrong, in queries like these ... ;)

You could use "subview" for grouped amounts, or simply use subquery in join, like:
Select dccdm.*, Isnull(dccdmcsi.SumQuantity,0)
FROM dbo.[Model] dccdm LEFT JOIN
(Select ModelNumber, SUM([Quantity]) as SumQuantity from dbo.ModelColorSizeInventory GROUP BY ModelNumber) dccdmcsi
ON dccdm.ModelNumber=dccdmcsi.ModelNumber

Related

How can I improve the response time of this query in Oracle

this query takes 24 seconds and returns 1891 results:
SELECT p.STATE, p.REFNUM, p.CODE, p.TYPE, i.STATE, pj.NAME, pj.DOCUMENT
TABLE p
inner join TABLE2 i on i.REFNUM = p.REFNUM
inner join TABLE3 pj on pj.NUMBER = i.NUMBER and p.OFIC_ID = pj.OFIC_ID and p.PUB_ID = pj.PUB_ID
inner join OFICE o on t.OFIC_ID = p.OFIC_ID and o.PUB_ID = p.PUB_ID
inner join GROUP glad on glad.GROUP_CODE=p.GROUP_CODE
WHERE glad.GROUP_TYPE ='3' AND i.STATE = '1'
AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14')
AND pj.NAME LIKE 'BANK%'
ORDER BY o.NAME,p.ID;
I have these indexes:
CREATE INDEX IND_TABLE1_REFNUM_ZONE ON TABLE1 (PUB_ID, OFIC_ID, REFNUM, ZONE_ID)
CREATE INDEX IND_TABLE1_REFNUMPUB ON TABLE1 (REFNUM, PUB_ID, OFIC_ID, GROUP_CODE );
CREATE INDEX IND_TABLE1_GROUP ON TABLE1 (PUB_ID, GROUP_CODE, REFNUM, OFIC_ID)
CREATE INDEX IND_TABLE2_REF ON TABLE2 (REFNUM, NUMBER, STATE);
CREATE INDEX IND_TABLE2_QUERY ON TABLE2 (NUMBER, TYPE, STATE, REFNUM, NUM, CODE);
CREATE INDEX IND_TABLE2_REFNUM ON TABLE2 (REFNUM)
CREATE INDEX IND_TABLE2_NUMBER ON TABLE2 (NUMBER)
CREATE INDEX IND_TABLE3_NUM ON TABLE3 (NUMBER, PUB_ID, OFIC_ID, NAME );
CREATE INDEX IND_TABLE3_NAME ON TABLE3 ( NAME );
CREATE INDEX IND_GROUP_COD ON GROUP (GROUP_CODE, GROUP_TYPE)
I made the following queries to see how many records are in each table:
SELECT count(*) FROM TABLE1 --> 18298458 results
SELECT count(*) FROM TABLE2 --> 60627924 results
SELECT count(*) FROM TABLE3 --> 18425913 results
SELECT count(*) FROM OFICE --> 65 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14') --> 1314077 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05') --> 53754 results
SELECT count(*) FROM TABLE3 WHERE NAME LIKE 'BANK%' --> 1922081 results
this is the plan generated by oracle:
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 291K| 38M| 384K| | |
| 1 | SORT ORDER BY | | 291K| 38M| 384K| | |
| 2 | HASH JOIN | | 291K| 38M| 375K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 291K| 34M| 375K| | |
| 5 | INDEX SKIP SCAN | IND_GROUP_COD | 47 | 329 | 1 | | |
| 6 | HASH JOIN | | 452K| 50M| 375K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 452K| 50M| 375K| | |
| 8 | NESTED LOOPS | | 452K| 50M| 375K| | |
| 9 | NESTED LOOPS | | | | | | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | HASH JOIN | | 2100K| 166M| 252K| | |
| 12 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 13 | STATISTICS COLLECTOR | | | | | | |
| 14 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 15 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 16 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 17 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
| 19 | INDEX RANGE SCAN | IND_TABLE1_REFNUM_ZONE| | | | | |
| 20 | TABLE ACCESS BY GLOBAL INDEX ROWID| TABLE1 | 1 | 35 | 70380 | ROWID | ROWID |
| 21 | PARTITION RANGE ALL | | 19M| 650M| 70380 | 1 | 19 |
| 22 | PARTITION HASH JOIN-FILTER | | 19M| 650M| 70380 |:BF0000|:BF0000|
| 23 | TABLE ACCESS FULL | TABLE1 | 19M| 650M| 70380 | 1 | 608 |
-----------------------------------------------------------------------------------------------------------------
I think it takes time because this one is using TABLE ACCESS FULL for TABLE1 and TABLE3
if I perform the query filtering only PUB_ID='05' instead of all the numbers in the above query, the query returns 181 results and takes 8 seconds and in that case oracle generates this plan:
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 797 | 108K| 312K| | |
| 1 | SORT ORDER BY | | 797 | 108K| 312K| | |
| 2 | NESTED LOOPS | | 797 | 108K| 312K| | |
| 3 | HASH JOIN | | 1238 | 160K| 312K| | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| OFICE | 3 | 45 | 2 | | |
| 5 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 6 | HASH JOIN | | 2091 | 240K| 312K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 66316 | 5375K| 241K| | |
| 8 | NESTED LOOPS | | 66316 | 5375K| 241K| | |
| 9 | PARTITION RANGE ALL | | 53085 | 2903K| 82490 | 1 | 19 |
| 10 | PARTITION HASH ALL | | 53085 | 2903K| 82490 | 1 | 32 |
| 11 | TABLE ACCESS FULL | TABLE3 | 53085 | 2903K| 82490 | 1 | 608 |
| 12 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 3 | | |
| 13 | PARTITION RANGE ALL | | 762K| 25M| 68657 | 1 | 19 |
| 14 | PARTITION HASH JOIN-FILTER | | 762K| 25M| 68657 |:BF0000|:BF0000|
| 15 | TABLE ACCESS FULL | TABLE1 | 762K| 25M| 68657 | 1 | 608 |
| 16 | INDEX RANGE SCAN | IND_GROUP_COD | 1 | 7 | 0 | | |
--------------------------------------------------------------------------------------------------------------
SYS_C0034405 is the primary key of OFFICE which contains these fields: (PUB_ID, REG_ID)
if in addition to filtering only PUB_ID='05' I remove the "order by", the query takes only 3.5 seconds but I definitely have to return the ordered data and I would prefer to be able to filter several PUB_IDs
I thought the query could be improved if I removed the "inner join" from GROUP and changed the filter "glad.GROUP_TYPE ='3'" to "p.GROUP_CODE in ('01','07','10','21 ')" (these are all type 3 codes), because now it should use the IND_TABLE1_GROUP index but instead of improving, it gets worse, it takes 13 seconds even filtering only PUB_ID='05'; This is the plan that oracle generates:
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 47 | 6251 | 172K| | |
| 1 | SORT ORDER BY | | 47 | 6251 | 172K| | |
| 2 | HASH JOIN | | 47 | 6251 | 172K| | |
| 3 | PARTITION RANGE ALL | | 47786 | 2613K| 82490 | 1 | 19 |
| 4 | PARTITION HASH ALL | | 47786 | 2613K| 82490 | 1 | 32 |
| 5 | TABLE ACCESS FULL | TABLE3 | 47786 | 2613K| 82490 | 1 | 608 |
| 6 | HASH JOIN | | 41945 | 3154K| 89633 | | |
| 7 | NESTED LOOPS | | 41945 | 3154K| 89633 | | |
| 8 | NESTED LOOPS | | 75740 | 3154K| 89633 | | |
| 9 | STATISTICS COLLECTOR | | | | | | |
| 10 | NESTED LOOPS | | 18935 | 924K| 15985 | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED | OFICE | 3 | 45 | 2 | | |
| 12 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 13 | INLIST ITERATOR | | | | | | |
| 14 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| TABLE1 | 6312 | 215K| 7039 | ROWID | ROWID |
| 15 | INDEX RANGE SCAN | IND_TABLE1_GROUP | 6828 | | 284 | | |
| 16 | INDEX RANGE SCAN | IND_TABLE2_REFNUM | 4 | | 2 | | |
| 17 | TABLE ACCESS BY INDEX ROWID | TABLE2 | 2 | 54 | 4 | | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 2 | 54 | 2 | | |
------------------------------------------------------------------------------------------------------------------------------
And if I put all the PUB_IDs, Oracle generates this plan (it doesn't even use the IND_TABLE1_GROUP index anymore):
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35661 | 4631K| 345K| | |
| 1 | SORT ORDER BY | | 35661 | 4631K| 345K| | |
| 2 | HASH JOIN | | 35661 | 4631K| 344K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 35661 | 4109K| 344K| | |
| 5 | PARTITION RANGE ALL | | 1360K| 45M| 79580 | 1 | 19 |
| 6 | PARTITION HASH ALL | | 1360K| 45M| 79580 | 1 | 32 |
| 7 | TABLE ACCESS FULL | TABLE1 | 1360K| 45M| 79580 | 1 | 608 |
| 8 | HASH JOIN | | 2100K| 166M| 252K| | |
| 9 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 12 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 13 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 14 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 15 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
-------------------------------------------------------------------------------------------------

Results of join listed in rows vs additional columns?

I have 2 tables, with the same exact fields and fields names. i am trying to inner join them but im having some difficulty determining how i can get my results in my desired format.
I know i can do select a.customer, a.id, a.date, a.line, a.product, b.customer, b.id, b.date, b.line, b.product but instead of having my A data and B data on the same row, id like for them to be on seperate rows.
I have 2 tables, with the same exact fields and fields names, i am trying to inner join them so that unique line becomes a row.
Table A:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 123 | 1/1/22 | 10 | 88975 |
| 853652 | 456 | 1/10/22 | 5 | 55876 |
| 845689 | 789 | 1/25/22 | 1 | 45587 |
TABLE B:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 489 | 1/1/22 | 1 | 87574 |
| 853652 | 853 | 1/10/22 | 12 | 45678 |
| 587435 | 157 | 2/12/22 | 3 | 25896 |
DESIRED RESULTS:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 123 | 1/1/22 | 10 | 88975 |
| 445678 | 489 | 1/1/22 | 1 | 87574 |
| 853652 | 456 | 1/10/22 | 5 | 55876 |
| 853652 | 853 | 1/10/22 | 12 | 45678 |
my query:
select a.customer, a.id, a.date, a.line, a.product
from data1 a
inner join data2 b
on a.date = b.date
and a.customer = b.customer

Cumulative Count of NULL restarting at NOT NULL

I would like to add a column indicating the number invites a person received before they accepted by incrementally counting the number of null columns before a non-null while partitioning over the PERSON_ID and ordering by the INVITED_DATE.
My table has the following format:
| UNIQUE_ID | PERSON_ID | INVITED_DATE | ACCEPTED_DATE |
| 12345 | 567 | 12-01-18 | NULL |
| 12346 | 567 | 12-02-18 | NULL |
| 12347 | 567 | 12-03-18 | NULL |
| 12348 | 567 | 12-04-18 | 12-04-18 |
| 12349 | 567 | 12-05-18 | NULL |
| 12350 | 568 | 12-01-18 | NULL |
| 12351 | 568 | 12-02-18 | 12-02-18 |
The output should ideally look like the following:
| UNIQUE_ID | PERSON_ID | INVITED_DATE | ACCEPTED_DATE | INVITES_BEFORE_ACCEPT |
| 12345 | 567 | 12-01-18 | NULL | 1 |
| 12346 | 567 | 12-02-18 | NULL | 2 |
| 12347 | 567 | 12-03-18 | NULL | 3 |
| 12348 | 567 | 12-04-18 | 12-04-18 | 0 |
| 12349 | 567 | 12-05-18 | NULL | 1 |
| 12350 | 568 | 12-01-18 | NULL | 1 |
| 12351 | 568 | 12-02-18 | 12-02-18 | 0 |
So far I've tried a number iterations of ROW NUMBER with OVER and PARTITION but I've found it will need to be an OUTER APPLY. The following OUTER APPLY counts over the data but doesn't restart the count with a successful accept.
SELECT t.* , invites.INVITES_BEFORE_ACCEPT
FROM table t
OUTER APPLY (
SELECT COUNT(*) INVITES_BEFORE_ACCEPT
FROM table t2
WHERE t.PERSON_ID = t2.PERSON_ID and t.INVITED_DATE < t2.ACCEPTED_DATE
) invites
One way would be
WITH t
AS (SELECT *,
COUNT(ACCEPTED_DATE)
OVER (
PARTITION BY PERSON_ID
ORDER BY INVITED_DATE) AS Grp
FROM [table])
SELECT *,
SUM(CASE
WHEN ACCEPTED_DATE IS NULL
THEN 1
ELSE 0
END)
OVER (
PARTITION BY PERSON_ID, Grp
ORDER BY INVITED_DATE) AS INVITES_BEFORE_ACCEPT
FROM t
Demo

Find the newest entry of a crosstable per record?

I have three tables:
My products with their IDs and their features.
is a table with treatments of my products with a treatment-ID, a method, and a date. The treatments are done in batches of many products so there is a crosstable
with the products IDs and the treatment IDs and a bool value for the success of the treatment.
Each product can undergo many different treatments so there is a many-to-many relation. I now want to add to the product table (1.) for every product a value that shows the method of its most recent successful treatment if there is any.
I made a query that groups the crosstable's entries by product-ID but I don't know how to show the method and date of it's last treatment.
table 1:
| productID | size | weight | height | ... |
|-----------|:----:|-------:|--------|-----|
| 1 | 13 | 16 | 9 | ... |
| 2 | 12 | 17 | 12 | ... |
| 3 | 11 | 15 | 15 | ... |
| ... | ... | ... | ... | ... |
table 2:
| treatmentID | method | date |
|-------------|:--------:|-----------:|
| 1 | dye blue | 01.02.2016 |
| 2 | dye red | 01.02.2017 |
| 3 | dye blue | 01.02.2018 |
| ... | ... | ... |
table 3:
| productID | treatmentID | success |
|-----------|:-----------:|--------:|
| 1 | 1 | yes |
| 1 | 2 | yes |
| 1 | 3 | no |
| ... | ... | ... |
I need table 1 to be like:
table 1:
| productID | size | weight | height | latest succesful method |
|-----------|:----:|-------:|--------|-------------------------|
| 1 | 13 | 16 | 9 | dye red |
| 2 | 12 | 17 | 12 | ... |
| 3 | 11 | 15 | 15 | ... |
| ... | ... | ... | ... | ... |
My query:
SELECT table3.productID, table2.method
FROM table2 INNER JOIN table3 ON table2.treatmentID = table3.treatmentID
GROUP BY table3.productID, table2.method
HAVING (((table3.productID)=Max([table2].[date])))
ORDER BY table3.productID DESC;
but this does NOT show only one (the most recent) entry but all of them.
Simplest solution here would be to write either a subquery within your sql, or create a new query to act as a subquery(it will look like a table) to help indicate(or elminate) the records you want to see.
Using similar but potentially slightly different source data as you only gave one example.
Table1
| ProductID | Size | Weight | Height |
|-----------|------|--------|--------|
| 1 | 13 | 16 | 9 |
| 2 | 12 | 17 | 12 |
| 3 | 11 | 15 | 15 |
Table2
| TreatmentID | Method | Date |
|-------------|------------|----------|
| 1 | dye blue | 1/2/2016 |
| 2 | dye red | 1/2/2017 |
| 3 | dye blue | 1/2/2018 |
| 4 | dye yellow | 1/4/2017 |
| 5 | dye brown | 1/5/2018 |
Table3
| ProductID | TreatmentID | Success |
|-----------|-------------|---------|
| 1 | 1 | yes |
| 1 | 2 | yes |
| 1 | 3 | no |
| 2 | 4 | no |
| 2 | 5 | yes |
First order of business is to get the max(dates) and productIds of successful treatments.
We'll do this by aggregating the date along with the productIDs and "success".
SELECT Table3.productid, Max(Table2.Date) AS MaxOfdate, Table3.success
FROM Table2 INNER JOIN Table3 ON Table2.treatmentid = Table3.treatmentid
GROUP BY Table3.productid, Table3.success;
This should give us something along the lines of:
| ProductID | MaxofDate | Success |
|-----------|-----------|---------|
| 1 | 1/2/2018 | No |
| 1 | 1/2/2017 | Yes |
| 2 | 1/4/2017 | No |
| 2 | 1/8/2017 | Yes |
We'll save this query as a "regular" query. I named mine "max", you should probably use something more descriptive. You'll see "max" in this next query.
Next we'll join tables1-3 together but in addition we will also use this "max" subquery to link tables 1 and 2 by the productID and MaxOfDate to TreatmentDate where success = "yes" to find the details of the most recent SUCCESSFUL treatment.
SELECT table1.productid, table1.size, table1.weight, table1.height, Table2.method
FROM ((table1 INNER JOIN [max] ON table1.productid = max.productid)
INNER JOIN Table2 ON max.MaxOfdate = Table2.date) INNER JOIN Table3 ON
(Table2.treatmentid = Table3.treatmentid) AND (table1.productid = Table3.productid)
WHERE (((max.success)="yes"));
The design will look something like this:
Design
(ps. you can add queries to your design query editor by clicking on the "Queries" tab when you are adding tables to your query design. They act just like tables, just be careful as very detailed queries tend to bog down Access)
Running this query should give us our final results.
| ProductID | Size | Weight | Height | Method |
|-----------|------|--------|--------|-----------|
| 1 | 13 | 16 | 9 | dye red |
| 2 | 12 | 17 | 12 | dye brown |

Join 2 tables by matching children

I'm trying to have 2 tables (In this case it's actually 1 table in a self join) joined by their matching children.
Let me preface the purpose of this which might give a better understanding what I need:
I'm trying to look up a new order that I just got, to see if we ever had the same order, in order to find out in which box type this would be packaged.
So i'd need the matching order to contain the same item and the same qty for the item.
Look at the tables below and note that order 1300981 has the same items as order 1303097, how do I write this join?
Remember: I don't want the results to include any matches that do not match %100.
SQL Fiddle
OrderMain:
| OrderID | BoxId |
|---------|--------|
| 1300981 | 34 |
| 1303096 | (null) |
| 1303097 | (null) |
| 1303098 | (null) |
| 1303099 | (null) |
| 1303100 | (null) |
| 1303101 | (null) |
| 1303102 | (null) |
| 1303103 | (null) |
| 1303104 | B1 |
| 1303105 | (null) |
| 1303106 | (null) |
| 1303107 | 48 |
| 1303108 | (null) |
| 1303109 | (null) |
| 1303110 | (null) |
| 1303111 | (null) |
| 1303112 | (null) |
| 1303113 | (null) |
| 1303114 | (null) |
| 1303115 | (null) |
| 1303116 | (null) |
| 1303117 | (null) |
Order Detail:
| id | OrderID | Item | Qty |
|----|---------|--------|-----|
| 1 | 1300981 | 172263 | 3 |
| 2 | 1300981 | 171345 | 3 |
| 3 | 1300981 | 138757 | 3 |
| 4 | 1303117 | 231711 | 1 |
| 5 | 1303116 | 227835 | 1 |
| 6 | 1303115 | 244798 | 1 |
| 7 | 1303114 | 121755 | 1 |
| 8 | 1303113 | 145275 | 2 |
| 9 | 1303112 | 219554 | 1 |
| 10 | 1303111 | 179385 | 1 |
| 11 | 1303110 | 6229 | 1 |
| 12 | 1303109 | 217330 | 1 |
| 13 | 1303108 | 243596 | 1 |
| 14 | 1303107 | 246758 | 1 |
| 15 | 1303106 | 193931 | 1 |
| 16 | 1303105 | 244659 | 1 |
| 17 | 1303104 | 192548 | 1 |
| 18 | 1303103 | 228410 | 1 |
| 19 | 1303102 | 147474 | 1 |
| 20 | 1303101 | 239191 | 1 |
| 21 | 1303100 | 243594 | 1 |
| 22 | 1303099 | 232301 | 1 |
| 23 | 1303098 | 201212 | 1 |
| 24 | 1303097 | 172263 | 3 |
| 25 | 1303097 | 171345 | 3 |
| 26 | 1303097 | 138757 | 3 |
| 27 | 1303096 | 172263 | 3 |
| 28 | 1303096 | 171345 | 1 |
| 29 | 1303096 | 138757 | 3 |
| 30 | 1303095 | 172263 | 3 |
Expected Results
| OrderID | BoxId |
|---------|--------|
| 1303097 | 34 |
May be a weird way to do this, but if you convert the order details to xml and compare it to other orders, you can look for matches.
WITH BoxOrders AS
(
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NOT NULL
)
SELECT mo.OrderId, bo.BoxId
FROM BoxOrders bo
JOIN (
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NULL
) mo ON bo.Details = mo.Details
SQL Fiddle
Here's a different approach using SQL and a few analytics.
This joins order detail to itself based on item and qty and order number < other order number and ensures the count of items in each order matches. Thus if items match, count matches and qty matches then the order has the same items.
This returns both orders but easily enough to adjust. Using the CTE so the count materializes. Pretty sure you can't use a having with an analytic like this.
The one major assumption I'm making is that order numbers are sequential and when you say see if an older order exists, I should only need to look at earlier order numbers when evaluating if a prior order had the same items and quantities.
I'm also assuming a 100% match means: Exact same items. Same Quantity of items. and SAME Item Count so count of items for order 1 is 3 and order 2 is 3 and items and quantities match that is 100% but if order 2 had 4 items and order 1 only had 3, no match.
with cte as (
SELECT distinct OD1.OrderID PriorOrder, od2.orderID newOrder, OM.BoxId,
count(OD1.Item) over (partition by OD1.OrderID) OD1Cnt,
count(OD2.Item) over (partition by OD2.OrderID) OD2cnt
FROM OrderDetails OD1
INNER JOIN orderDetails OD2
on OD1.item=OD2.item
and od1.qty = od2.qty
and OD1.OrderID < OD2.OrderID
LEFT JOIN ORderMain OM
on OM.OrderID = OD1.orderID)
Select PriorOrder, NewOrder, boxID from cte where od1cnt = od2cnt

Resources