How can I improve the response time of this query in Oracle - database

This query takes 24 seconds and returns 1891 rows:
SELECT p.STATE, p.REFNUM, p.CODE, p.TYPE, i.STATE, pj.NAME, pj.DOCUMENT
FROM TABLE1 p
inner join TABLE2 i on i.REFNUM = p.REFNUM
inner join TABLE3 pj on pj.NUMBER = i.NUMBER and p.OFIC_ID = pj.OFIC_ID and p.PUB_ID = pj.PUB_ID
inner join OFICE o on o.OFIC_ID = p.OFIC_ID and o.PUB_ID = p.PUB_ID
inner join GROUP glad on glad.GROUP_CODE=p.GROUP_CODE
WHERE glad.GROUP_TYPE ='3' AND i.STATE = '1'
AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14')
AND pj.NAME LIKE 'BANK%'
ORDER BY o.NAME,p.ID;
I have these indexes:
CREATE INDEX IND_TABLE1_REFNUM_ZONE ON TABLE1 (PUB_ID, OFIC_ID, REFNUM, ZONE_ID)
CREATE INDEX IND_TABLE1_REFNUMPUB ON TABLE1 (REFNUM, PUB_ID, OFIC_ID, GROUP_CODE );
CREATE INDEX IND_TABLE1_GROUP ON TABLE1 (PUB_ID, GROUP_CODE, REFNUM, OFIC_ID)
CREATE INDEX IND_TABLE2_REF ON TABLE2 (REFNUM, NUMBER, STATE);
CREATE INDEX IND_TABLE2_QUERY ON TABLE2 (NUMBER, TYPE, STATE, REFNUM, NUM, CODE);
CREATE INDEX IND_TABLE2_REFNUM ON TABLE2 (REFNUM)
CREATE INDEX IND_TABLE2_NUMBER ON TABLE2 (NUMBER)
CREATE INDEX IND_TABLE3_NUM ON TABLE3 (NUMBER, PUB_ID, OFIC_ID, NAME );
CREATE INDEX IND_TABLE3_NAME ON TABLE3 ( NAME );
CREATE INDEX IND_GROUP_COD ON GROUP (GROUP_CODE, GROUP_TYPE)
I made the following queries to see how many records are in each table:
SELECT count(*) FROM TABLE1 --> 18298458 results
SELECT count(*) FROM TABLE2 --> 60627924 results
SELECT count(*) FROM TABLE3 --> 18425913 results
SELECT count(*) FROM OFICE --> 65 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14') --> 1314077 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05') --> 53754 results
SELECT count(*) FROM TABLE3 WHERE NAME LIKE 'BANK%' --> 1922081 results
This is the plan generated by Oracle:
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 291K| 38M| 384K| | |
| 1 | SORT ORDER BY | | 291K| 38M| 384K| | |
| 2 | HASH JOIN | | 291K| 38M| 375K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 291K| 34M| 375K| | |
| 5 | INDEX SKIP SCAN | IND_GROUP_COD | 47 | 329 | 1 | | |
| 6 | HASH JOIN | | 452K| 50M| 375K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 452K| 50M| 375K| | |
| 8 | NESTED LOOPS | | 452K| 50M| 375K| | |
| 9 | NESTED LOOPS | | | | | | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | HASH JOIN | | 2100K| 166M| 252K| | |
| 12 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 13 | STATISTICS COLLECTOR | | | | | | |
| 14 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 15 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 16 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 17 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
| 19 | INDEX RANGE SCAN | IND_TABLE1_REFNUM_ZONE| | | | | |
| 20 | TABLE ACCESS BY GLOBAL INDEX ROWID| TABLE1 | 1 | 35 | 70380 | ROWID | ROWID |
| 21 | PARTITION RANGE ALL | | 19M| 650M| 70380 | 1 | 19 |
| 22 | PARTITION HASH JOIN-FILTER | | 19M| 650M| 70380 |:BF0000|:BF0000|
| 23 | TABLE ACCESS FULL | TABLE1 | 19M| 650M| 70380 | 1 | 608 |
-----------------------------------------------------------------------------------------------------------------
I think it is slow because this plan uses TABLE ACCESS FULL for both TABLE1 and TABLE3.
If I run the query filtering only PUB_ID='05' instead of the full list of PUB_IDs above, it returns 181 rows and takes 8 seconds. In that case Oracle generates this plan:
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 797 | 108K| 312K| | |
| 1 | SORT ORDER BY | | 797 | 108K| 312K| | |
| 2 | NESTED LOOPS | | 797 | 108K| 312K| | |
| 3 | HASH JOIN | | 1238 | 160K| 312K| | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| OFICE | 3 | 45 | 2 | | |
| 5 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 6 | HASH JOIN | | 2091 | 240K| 312K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 66316 | 5375K| 241K| | |
| 8 | NESTED LOOPS | | 66316 | 5375K| 241K| | |
| 9 | PARTITION RANGE ALL | | 53085 | 2903K| 82490 | 1 | 19 |
| 10 | PARTITION HASH ALL | | 53085 | 2903K| 82490 | 1 | 32 |
| 11 | TABLE ACCESS FULL | TABLE3 | 53085 | 2903K| 82490 | 1 | 608 |
| 12 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 3 | | |
| 13 | PARTITION RANGE ALL | | 762K| 25M| 68657 | 1 | 19 |
| 14 | PARTITION HASH JOIN-FILTER | | 762K| 25M| 68657 |:BF0000|:BF0000|
| 15 | TABLE ACCESS FULL | TABLE1 | 762K| 25M| 68657 | 1 | 608 |
| 16 | INDEX RANGE SCAN | IND_GROUP_COD | 1 | 7 | 0 | | |
--------------------------------------------------------------------------------------------------------------
SYS_C0034405 is the primary key of OFICE, which contains these fields: (PUB_ID, REG_ID)
If, in addition to filtering only PUB_ID='05', I remove the "order by", the query takes only 3.5 seconds. However, I definitely have to return the data ordered, and I would prefer to be able to filter on several PUB_IDs.
I thought the query could be improved if I removed the "inner join" to GROUP and changed the filter "glad.GROUP_TYPE ='3'" to "p.GROUP_CODE in ('01','07','10','21 ')" (these are all the type 3 codes), because then it should use the IND_TABLE1_GROUP index. Instead of improving, it gets worse: it takes 13 seconds even when filtering only PUB_ID='05'. This is the plan that Oracle generates:
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 47 | 6251 | 172K| | |
| 1 | SORT ORDER BY | | 47 | 6251 | 172K| | |
| 2 | HASH JOIN | | 47 | 6251 | 172K| | |
| 3 | PARTITION RANGE ALL | | 47786 | 2613K| 82490 | 1 | 19 |
| 4 | PARTITION HASH ALL | | 47786 | 2613K| 82490 | 1 | 32 |
| 5 | TABLE ACCESS FULL | TABLE3 | 47786 | 2613K| 82490 | 1 | 608 |
| 6 | HASH JOIN | | 41945 | 3154K| 89633 | | |
| 7 | NESTED LOOPS | | 41945 | 3154K| 89633 | | |
| 8 | NESTED LOOPS | | 75740 | 3154K| 89633 | | |
| 9 | STATISTICS COLLECTOR | | | | | | |
| 10 | NESTED LOOPS | | 18935 | 924K| 15985 | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED | OFICE | 3 | 45 | 2 | | |
| 12 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 13 | INLIST ITERATOR | | | | | | |
| 14 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| TABLE1 | 6312 | 215K| 7039 | ROWID | ROWID |
| 15 | INDEX RANGE SCAN | IND_TABLE1_GROUP | 6828 | | 284 | | |
| 16 | INDEX RANGE SCAN | IND_TABLE2_REFNUM | 4 | | 2 | | |
| 17 | TABLE ACCESS BY INDEX ROWID | TABLE2 | 2 | 54 | 4 | | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 2 | 54 | 2 | | |
------------------------------------------------------------------------------------------------------------------------------
And if I filter on all the PUB_IDs, Oracle generates this plan (it doesn't even use the IND_TABLE1_GROUP index anymore):
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35661 | 4631K| 345K| | |
| 1 | SORT ORDER BY | | 35661 | 4631K| 345K| | |
| 2 | HASH JOIN | | 35661 | 4631K| 344K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 35661 | 4109K| 344K| | |
| 5 | PARTITION RANGE ALL | | 1360K| 45M| 79580 | 1 | 19 |
| 6 | PARTITION HASH ALL | | 1360K| 45M| 79580 | 1 | 32 |
| 7 | TABLE ACCESS FULL | TABLE1 | 1360K| 45M| 79580 | 1 | 608 |
| 8 | HASH JOIN | | 2100K| 166M| 252K| | |
| 9 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 12 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 13 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 14 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 15 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
-------------------------------------------------------------------------------------------------

Related

Pivoting a table with multiple columns in SQL

My goal is to produce, for each zip9, the closest store by travel time and the closest store by distance.
The source data has 2 rows per zip9, where each row gives the travel time and the distance to one store.
The result is that each zip code has 2 stores to choose from, and the requirement is to return one row with both options, similar to:
+-----------+---------------+---------------------+-------------------+-------------------------+
| zip | Shortest_time | Shortest_time_store | Shortest_distance | Shortest_distance_store |
+-----------+---------------+---------------------+-------------------+-------------------------+
| 70011134 | 38.7035 | 75 | 21.3124 | 115 |
| 70011186 | 38.4841 | 75 | 21.4144 | 115 |
| 70011207 | 39.1567 | 75 | 21.1826 | 115 |
| 100013232 | 22.976 | 145 | 9.5031 | 115 |
| 112075140 | 21.888 | 145 | 7.3705 | 115 |
+-----------+---------------+---------------------+-------------------+-------------------------+
Original dataset
+---------------+--------------------------+-----------------------+------------------+
| CORRECTED_ZIP | SourceOrganizationNumber | Travel Time (Minutes) | Distance (Miles) |
+---------------+--------------------------+-----------------------+------------------+
| 70011134 | 75 | 38.7035 | 26.8628 |
| 70011134 | 115 | 39.3969 | 21.3124 |
| 70011186 | 75 | 38.4841 | 26.7609 |
| 70011186 | 115 | 39.6389 | 21.4144 |
| 70011207 | 75 | 39.1567 | 31.2771 |
| 70011207 | 115 | 39.188 | 21.1826 |
| 100013232 | 115 | 28.6561 | 9.50311 |
| 100013232 | 145 | 22.976 | 10.0307 |
| 112075140 | 115 | 36.1803 | 7.37053 |
| 112075140 | 145 | 21.888 | 9.50123 |
+---------------+--------------------------+-----------------------+------------------+
Dataset after I've modified it with this query:
SELECT TOP 1000 [corrected_zip]
, TRY_CONVERT( DECIMAL(18, 4), ROUND([Travel Time (Minutes)], 4)) AS [Unit of Measurement]
, [SourceOrganizationNumber]
, 'Time' AS [Type]
FROM [db].[dbo].[my_table_A] [tt]
WHERE [tt].[CORRECTED_ZIP] IN('070011134', '070011186', '070011207', '112075140', '100013232')
AND [Travel Time (Minutes)] IN
(
SELECT MIN([Travel Time (Minutes)])
FROM [db].[dbo].[my_table_A]
WHERE [CORRECTED_ZIP] = [tt].[CORRECTED_ZIP]
GROUP BY [CORRECTED_ZIP]
)
UNION ALL
SELECT TOP 1000 [corrected_zip]
, TRY_CONVERT( DECIMAL(18, 4), ROUND([Distance (Miles)], 4))
, [SourceOrganizationNumber]
, 'Distance'
FROM [db].[dbo].[my_table_A] [tt]
WHERE [tt].[CORRECTED_ZIP] IN('070011134', '070011186', '070011207', '112075140', '100013232')
AND [Distance (Miles)] IN
(
SELECT MIN([Distance (Miles)])
FROM [db].[dbo].[my_table_A]
WHERE [CORRECTED_ZIP] = [tt].[CORRECTED_ZIP]
GROUP BY [CORRECTED_ZIP]
)
ORDER BY [CORRECTED_ZIP];
+---------------+---------------------+--------------------------+----------+
| corrected_zip | Unit of Measurement | SourceOrganizationNumber | Type |
+---------------+---------------------+--------------------------+----------+
| 70011134 | 38.7035 | 75 | Time |
| 70011134 | 21.3124 | 115 | Distance |
| 70011186 | 21.4144 | 115 | Distance |
| 70011186 | 38.4841 | 75 | Time |
| 70011207 | 39.1567 | 75 | Time |
| 70011207 | 21.1826 | 115 | Distance |
| 100013232 | 9.5031 | 115 | Distance |
| 100013232 | 22.976 | 145 | Time |
| 112075140 | 21.888 | 145 | Time |
| 112075140 | 7.3705 | 115 | Distance |
+---------------+---------------------+--------------------------+----------+
Data after I attempted to pivot it
+---------------+--------------------------+----------+---------+
| corrected_zip | SourceOrganizationNumber | Distance | Time |
+---------------+--------------------------+----------+---------+
| 070011134 | 115 | 21.3124 | NULL |
| 070011134 | 75 | NULL | 38.7035 |
| 070011186 | 115 | 21.4144 | NULL |
| 070011186 | 75 | NULL | 38.4841 |
| 070011207 | 115 | 21.1826 | NULL |
| 070011207 | 75 | NULL | 39.1567 |
| 100013232 | 115 | 9.5031 | NULL |
| 100013232 | 145 | NULL | 22.9760 |
| 112075140 | 115 | 7.3705 | NULL |
| 112075140 | 145 | NULL | 21.8880 |
+---------------+--------------------------+----------+---------+
It seems like my issue is picking the correct store ID as opposed to grouping by store ID?
You can use row_number() twice in a subquery (once to rank by time, once to rank by distance), and then do conditional aggregation in the outer query:
select
corrected_zip,
min(travel_time) shortest_time,
min(case when rnt = 1 then source_organization_number end) shortest_time_store,
min(distance) shortest_distance,
min(case when rnd = 1 then source_organization_number end) shortest_distance_store
from (
select
t.*,
row_number() over(partition by corrected_zip order by travel_time) rnt,
row_number() over(partition by corrected_zip order by distance) rnd
from mytable t
) t
group by corrected_zip
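To see that query run end to end, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. The table name mytable and the snake_case column names come from the answer's query, not from the original table, and SQLite (3.25 or later, for window functions) stands in for SQL Server here, so treat it as an illustration rather than the exact production statement.

```python
import sqlite3

# Sample rows from the question: (zip, store, travel time, distance).
# Column names follow the answer's query and are assumptions.
ROWS = [
    ("070011134", 75, 38.7035, 26.8628),
    ("070011134", 115, 39.3969, 21.3124),
    ("070011186", 75, 38.4841, 26.7609),
    ("070011186", 115, 39.6389, 21.4144),
    ("070011207", 75, 39.1567, 31.2771),
    ("070011207", 115, 39.188, 21.1826),
    ("100013232", 115, 28.6561, 9.50311),
    ("100013232", 145, 22.976, 10.0307),
    ("112075140", 115, 36.1803, 7.37053),
    ("112075140", 145, 21.888, 9.50123),
]

QUERY = """
select
    corrected_zip,
    min(travel_time) as shortest_time,
    min(case when rnt = 1 then source_organization_number end) as shortest_time_store,
    min(distance) as shortest_distance,
    min(case when rnd = 1 then source_organization_number end) as shortest_distance_store
from (
    select
        t.*,
        row_number() over (partition by corrected_zip order by travel_time) as rnt,
        row_number() over (partition by corrected_zip order by distance) as rnd
    from mytable t
) t
group by corrected_zip
order by corrected_zip
"""


def closest_stores(rows):
    """Return one row per zip with the fastest and the nearest store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "corrected_zip text, source_organization_number integer, "
        "travel_time real, distance real)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in closest_stores(ROWS):
        print(row)
```

Each zip collapses to a single row because the two case expressions pick the store only from the row ranked first by time or by distance, while min() ignores the NULLs produced for the other rows.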

SQL Server: How to unpivot from pivoted table back to a self referencing table

I've looked at examples from: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15 but I couldn't seem to find samples of what I'm trying to do.
I'm wondering if there's a way to unpivot from this:
+----+------------+--------+--------+--------+
| Id | Level0 | Level1 | Level2 | Level3 |
+----+------------+--------+--------+--------+
| 0 | TMI | | | |
+----+------------+--------+--------+--------+
| 1 | TMI | A | | |
+----+------------+--------+--------+--------+
| 2 | TMI | A | B | |
+----+------------+--------+--------+--------+
| 3 | TMI | A | B | C |
+----+------------+--------+--------+--------+
| 4 | TMI | A | B | D |
+----+------------+--------+--------+--------+
Back to self referencing table like this:
+----+-----------+----------+--------+
| Id | LevelName | ParentId | Level |
+----+-----------+----------+--------+
| 0 | TMI | | Level0 |
+----+-----------+----------+--------+
| 1 | A | 0 | Level1 |
+----+-----------+----------+--------+
| 2 | B | 1 | Level2 |
+----+-----------+----------+--------+
| 3 | C | 2 | Level3 |
+----+-----------+----------+--------+
| 4 | D | 2 | Level3 |
+----+-----------+----------+--------+

Find the newest entry of a crosstable per record?

I have three tables:
1. My products with their IDs and their features.
2. A table with the treatments of my products, with a treatment ID, a method, and a date. The treatments are done in batches of many products, so there is also
3. a crosstable with the product IDs, the treatment IDs, and a bool value for the success of the treatment.
Each product can undergo many different treatments, so there is a many-to-many relation. I now want to add to the product table (1.), for every product, a value that shows the method of its most recent successful treatment, if there is any.
I made a query that groups the crosstable's entries by product ID, but I don't know how to show the method and date of its last treatment.
table 1:
| productID | size | weight | height | ... |
|-----------|:----:|-------:|--------|-----|
| 1 | 13 | 16 | 9 | ... |
| 2 | 12 | 17 | 12 | ... |
| 3 | 11 | 15 | 15 | ... |
| ... | ... | ... | ... | ... |
table 2:
| treatmentID | method | date |
|-------------|:--------:|-----------:|
| 1 | dye blue | 01.02.2016 |
| 2 | dye red | 01.02.2017 |
| 3 | dye blue | 01.02.2018 |
| ... | ... | ... |
table 3:
| productID | treatmentID | success |
|-----------|:-----------:|--------:|
| 1 | 1 | yes |
| 1 | 2 | yes |
| 1 | 3 | no |
| ... | ... | ... |
I need table 1 to be like:
table 1:
| productID | size | weight | height | latest successful method |
|-----------|:----:|-------:|--------|--------------------------|
| 1         | 13   | 16     | 9      | dye red                  |
| 2         | 12   | 17     | 12     | ...                      |
| 3         | 11   | 15     | 15     | ...                      |
| ...       | ...  | ...    | ...    | ...                      |
My query:
SELECT table3.productID, table2.method
FROM table2 INNER JOIN table3 ON table2.treatmentID = table3.treatmentID
GROUP BY table3.productID, table2.method
HAVING (((table3.productID)=Max([table2].[date])))
ORDER BY table3.productID DESC;
But this does NOT show only one entry (the most recent one); it shows all of them.
The simplest solution here would be to write either a subquery within your SQL, or a separate saved query that acts as a subquery (it will look like a table), to help indicate (or eliminate) the records you want to see.
I'm using similar but slightly different source data, since you only gave one example.
Table1
| ProductID | Size | Weight | Height |
|-----------|------|--------|--------|
| 1 | 13 | 16 | 9 |
| 2 | 12 | 17 | 12 |
| 3 | 11 | 15 | 15 |
Table2
| TreatmentID | Method | Date |
|-------------|------------|----------|
| 1 | dye blue | 1/2/2016 |
| 2 | dye red | 1/2/2017 |
| 3 | dye blue | 1/2/2018 |
| 4 | dye yellow | 1/4/2017 |
| 5 | dye brown | 1/5/2018 |
Table3
| ProductID | TreatmentID | Success |
|-----------|-------------|---------|
| 1 | 1 | yes |
| 1 | 2 | yes |
| 1 | 3 | no |
| 2 | 4 | no |
| 2 | 5 | yes |
First order of business is to get the max(dates) and productIds of successful treatments.
We'll do this by aggregating the date along with the productIDs and "success".
SELECT Table3.productid, Max(Table2.Date) AS MaxOfdate, Table3.success
FROM Table2 INNER JOIN Table3 ON Table2.treatmentid = Table3.treatmentid
GROUP BY Table3.productid, Table3.success;
This should give us something along the lines of:
| ProductID | MaxofDate | Success |
|-----------|-----------|---------|
| 1 | 1/2/2018 | No |
| 1 | 1/2/2017 | Yes |
| 2 | 1/4/2017 | No |
| 2 | 1/5/2018 | Yes |
We'll save this query as a "regular" query. I named mine "max"; you should probably use something more descriptive. You'll see "max" in the next query.
Next we'll join tables 1-3 together, and in addition use the "max" query to link tables 1 and 2: productID matches on productID, and MaxOfDate matches on the treatment date. Filtering on success = "yes" then gives the details of the most recent SUCCESSFUL treatment.
SELECT table1.productid, table1.size, table1.weight, table1.height, Table2.method
FROM ((table1 INNER JOIN [max] ON table1.productid = max.productid)
INNER JOIN Table2 ON max.MaxOfdate = Table2.date) INNER JOIN Table3 ON
(Table2.treatmentid = Table3.treatmentid) AND (table1.productid = Table3.productid)
WHERE (((max.success)="yes"));
(The original answer showed a screenshot of the query design here.)
(P.S. You can add queries in the query design editor by clicking on the "Queries" tab when you are adding tables to your query design. They act just like tables; just be careful, as very detailed queries tend to bog down Access.)
Running this query should give us our final results.
| ProductID | Size | Weight | Height | Method |
|-----------|------|--------|--------|-----------|
| 1 | 13 | 16 | 9 | dye red |
| 2 | 12 | 17 | 12 | dye brown |
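As a cross-check of that result, here is a minimal sketch of the same idea written as a single statement with a correlated subquery instead of a saved "max" query. It runs against an in-memory SQLite database from Python rather than Access, and it stores the dates as ISO strings; both are assumptions made only so the example is runnable, and the exact calendar dates are my reading of the answer's sample data.

```python
import sqlite3

# Sample data from the answer; dates rewritten as ISO strings (assumption)
# so that max() on text compares them chronologically.
SCHEMA = """
create table Table1 (productid integer, size integer, weight integer, height integer);
create table Table2 (treatmentid integer, method text, date text);
create table Table3 (productid integer, treatmentid integer, success text);
insert into Table1 values (1, 13, 16, 9), (2, 12, 17, 12), (3, 11, 15, 15);
insert into Table2 values
    (1, 'dye blue',   '2016-02-01'),
    (2, 'dye red',    '2017-02-01'),
    (3, 'dye blue',   '2018-02-01'),
    (4, 'dye yellow', '2017-04-01'),
    (5, 'dye brown',  '2018-05-01');
insert into Table3 values
    (1, 1, 'yes'), (1, 2, 'yes'), (1, 3, 'no'), (2, 4, 'no'), (2, 5, 'yes');
"""

# For each product, keep the successful treatment whose date equals the
# latest successful treatment date of that same product.
QUERY = """
select p.productid, p.size, p.weight, p.height, t.method
from Table1 p
join Table3 x on x.productid = p.productid and x.success = 'yes'
join Table2 t on t.treatmentid = x.treatmentid
where t.date = (
    select max(t2.date)
    from Table2 t2
    join Table3 x2 on x2.treatmentid = t2.treatmentid
    where x2.productid = p.productid and x2.success = 'yes'
)
order by p.productid
"""


def latest_successful_methods():
    """Return (productid, size, weight, height, method) per treated product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_successful_methods():
        print(row)
```

Product 3 has no successful treatment, so the inner joins drop it, just as in the answer's result. A left join would be needed to keep such products with an empty method.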

Need to update "orderby" column

I have a table test
+----+------+---------+--------------+
| ID | Name | orderby | processgroup |
+----+------+---------+--------------+
| 1  | ABC  | 10      | 1            |
| 10 | DEF  | 12      | 1            |
| 15 | LMN  | 1       | 1            |
| 44 | JKL  | 4       | 1            |
| 42 | XYZ  | 3       | 2            |
+----+------+---------+--------------+
I want to update the orderby column so that it forms a sequence. I am expecting output like:
+----+------+---------+--------------+
| ID | Name | orderby | processgroup |
+----+------+---------+--------------+
| 1  | ABC  | 1       | 1            |
| 10 | DEF  | 2       | 1            |
| 15 | LMN  | 3       | 1            |
| 44 | JKL  | 4       | 1            |
| 42 | XYZ  | 5       | 2            |
+----+------+---------+--------------+
The logic behind this: rows with processgroup 1 should get orderby 1, 2, 3, 4, and the row with processgroup 2 should then get orderby 5.
This might help you
;WITH CTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY processgroup, ID ) AS SNO, ID FROM TABLE1
)
UPDATE TABLE1 SET TABLE1.orderby= CTE.SNO FROM CTE WHERE TABLE1.ID = CTE.ID
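The same renumbering can be tried out with a minimal sketch in Python against an in-memory SQLite database. This is an assumption-laden illustration, not the SQL Server statement itself: the table is called test as in the question, and the update uses a correlated subquery over the numbered rows instead of UPDATE ... FROM, so that it also runs on older SQLite versions (window functions still need 3.25 or later).

```python
import sqlite3

# Rows from the question: (ID, Name, orderby, processgroup).
ROWS = [
    (1, "ABC", 10, 1),
    (10, "DEF", 12, 1),
    (15, "LMN", 1, 1),
    (44, "JKL", 4, 1),
    (42, "XYZ", 3, 2),
]

# Number all rows by processgroup, then ID, and write that number back.
UPDATE = """
update test
set orderby = (
    select r.sno
    from (
        select id, row_number() over (order by processgroup, id) as sno
        from test
    ) r
    where r.id = test.id
)
"""


def renumber(rows):
    """Apply the renumbering and return the rows sorted by the new orderby."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table test ("
        "id integer primary key, name text, orderby integer, processgroup integer)"
    )
    conn.executemany("insert into test values (?, ?, ?, ?)", rows)
    conn.execute(UPDATE)
    result = conn.execute(
        "select id, name, orderby, processgroup from test order by orderby"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in renumber(ROWS):
        print(row)
```

Because the numbering orders by processgroup and ID only, the values being overwritten in orderby never influence the sequence.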

Join 2 tables by matching children

I'm trying to join 2 tables (in this case it's actually 1 table in a self join) by their matching children.
Let me explain the purpose first, which might give a better understanding of what I need:
I'm trying to look up a new order that I just got, to see if we ever had the same order before, in order to find out which box type it would be packaged in.
So I'd need the matching order to contain the same items and the same qty for each item.
Look at the tables below and note that order 1300981 has the same items as order 1303097. How do I write this join?
Remember: I don't want the results to include any orders that do not match 100%.
OrderMain:
| OrderID | BoxId |
|---------|--------|
| 1300981 | 34 |
| 1303096 | (null) |
| 1303097 | (null) |
| 1303098 | (null) |
| 1303099 | (null) |
| 1303100 | (null) |
| 1303101 | (null) |
| 1303102 | (null) |
| 1303103 | (null) |
| 1303104 | B1 |
| 1303105 | (null) |
| 1303106 | (null) |
| 1303107 | 48 |
| 1303108 | (null) |
| 1303109 | (null) |
| 1303110 | (null) |
| 1303111 | (null) |
| 1303112 | (null) |
| 1303113 | (null) |
| 1303114 | (null) |
| 1303115 | (null) |
| 1303116 | (null) |
| 1303117 | (null) |
Order Detail:
| id | OrderID | Item | Qty |
|----|---------|--------|-----|
| 1 | 1300981 | 172263 | 3 |
| 2 | 1300981 | 171345 | 3 |
| 3 | 1300981 | 138757 | 3 |
| 4 | 1303117 | 231711 | 1 |
| 5 | 1303116 | 227835 | 1 |
| 6 | 1303115 | 244798 | 1 |
| 7 | 1303114 | 121755 | 1 |
| 8 | 1303113 | 145275 | 2 |
| 9 | 1303112 | 219554 | 1 |
| 10 | 1303111 | 179385 | 1 |
| 11 | 1303110 | 6229 | 1 |
| 12 | 1303109 | 217330 | 1 |
| 13 | 1303108 | 243596 | 1 |
| 14 | 1303107 | 246758 | 1 |
| 15 | 1303106 | 193931 | 1 |
| 16 | 1303105 | 244659 | 1 |
| 17 | 1303104 | 192548 | 1 |
| 18 | 1303103 | 228410 | 1 |
| 19 | 1303102 | 147474 | 1 |
| 20 | 1303101 | 239191 | 1 |
| 21 | 1303100 | 243594 | 1 |
| 22 | 1303099 | 232301 | 1 |
| 23 | 1303098 | 201212 | 1 |
| 24 | 1303097 | 172263 | 3 |
| 25 | 1303097 | 171345 | 3 |
| 26 | 1303097 | 138757 | 3 |
| 27 | 1303096 | 172263 | 3 |
| 28 | 1303096 | 171345 | 1 |
| 29 | 1303096 | 138757 | 3 |
| 30 | 1303095 | 172263 | 3 |
Expected Results
| OrderID | BoxId |
|---------|--------|
| 1303097 | 34 |
This may be a weird way to do it, but if you convert the order details to XML and compare them to those of other orders, you can look for matches.
WITH BoxOrders AS
(
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NOT NULL
)
SELECT mo.OrderId, bo.BoxId
FROM BoxOrders bo
JOIN (
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NULL
) mo ON bo.Details = mo.Details
Here's a different approach using plain SQL and a couple of analytic (window) functions.
It joins order detail to itself on item and qty, with order number < other order number, and checks that the count of items in each order matches. If the items match, the quantities match, and the counts match, then the two orders have the same items.
This returns both orders, but that is easy enough to adjust. I'm using a CTE so the counts are materialized before filtering; I'm pretty sure you can't filter on an analytic like this with HAVING.
The one major assumption I'm making is that order numbers are sequential. When you say "see if an older order exists", I take that to mean I only need to look at earlier order numbers when checking whether a prior order had the same items and quantities.
I'm also assuming a 100% match means: exactly the same items, the same quantity of each item, and the same item count. So if order 1 and order 2 each have 3 items and the items and quantities match, that is a 100% match; if order 2 had 4 items and order 1 only had 3, there is no match.
with cte as (
SELECT distinct OD1.OrderID PriorOrder, od2.orderID newOrder, OM.BoxId,
count(OD1.Item) over (partition by OD1.OrderID) OD1Cnt,
count(OD2.Item) over (partition by OD2.OrderID) OD2cnt
FROM OrderDetails OD1
INNER JOIN orderDetails OD2
on OD1.item=OD2.item
and od1.qty = od2.qty
and OD1.OrderID < OD2.OrderID
LEFT JOIN ORderMain OM
on OM.OrderID = OD1.orderID)
Select PriorOrder, NewOrder, boxID from cte where od1cnt = od2cnt
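For comparison, here is a minimal sketch of a third variant of the "100% match" check. It is not either answer's query: it counts each order's detail lines on their own, then requires that the number of matching (item, qty) pairs equals that count for both orders. It runs in an in-memory SQLite database from Python with a subset of the question's rows, and it assumes each item appears at most once per order; the aliases nw, prev, cn and cp are made up for this example.

```python
import sqlite3

# Subset of the question's sample data (assumption: enough to show the idea).
SCHEMA = """
create table OrderMain (OrderID integer, BoxId text);
create table OrderDetails (OrderID integer, Item integer, Qty integer);
insert into OrderMain values
    (1300981, '34'), (1303096, null), (1303097, null),
    (1303104, 'B1'), (1303107, '48'), (1303117, null);
insert into OrderDetails values
    (1300981, 172263, 3), (1300981, 171345, 3), (1300981, 138757, 3),
    (1303097, 172263, 3), (1303097, 171345, 3), (1303097, 138757, 3),
    (1303096, 172263, 3), (1303096, 171345, 1), (1303096, 138757, 3),
    (1303104, 192548, 1), (1303107, 246758, 1), (1303117, 231711, 1);
"""

# An unboxed order matches a boxed one when both have the same number of
# detail lines and every line pairs up on (Item, Qty).
QUERY = """
with cnt as (
    select OrderID, count(*) as n from OrderDetails group by OrderID
)
select nw.OrderID, prev.BoxId
from OrderMain nw
join cnt cn on cn.OrderID = nw.OrderID
join OrderMain prev on prev.BoxId is not null
join cnt cp on cp.OrderID = prev.OrderID and cp.n = cn.n
join OrderDetails dn on dn.OrderID = nw.OrderID
join OrderDetails dp on dp.OrderID = prev.OrderID
                    and dp.Item = dn.Item
                    and dp.Qty = dn.Qty
where nw.BoxId is null
group by nw.OrderID, prev.OrderID, prev.BoxId, cn.n
having count(*) = cn.n
order by nw.OrderID
"""


def matching_boxes():
    """Return (new OrderID, BoxId of an identical earlier boxed order)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(matching_boxes())
```

Order 1303096 is rejected because only two of its three lines pair up with order 1300981 (item 171345 has qty 1 instead of 3), which is the partial match the question wants excluded.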
