SQL Query based on occurrence of records - sql-server

After a long time, I am getting a chance to post a SQL Server question here.
I have a table variable as shown below, in SQL Server 2005. This table is populated by a stored procedure written by some other team.
This is an order processing system. Each order can be fulfilled by multiple processes in various departments, based on the OPRouteCode.
Taking OrderNo = 2 as an example, it has two OPRouteCodes, but both use the same processes in the same departments. They are considered equivalent OPRouteCodes.
On the other hand, for OrderNo = 1, the processes and departments vary between the OPRouteCodes; hence they are not equivalent.
What is the best way to select only the orders that have non-equivalent OPRouteCodes?
Note: If an order has only one OPRouteCode, it is considered equivalent. Non-equivalence arises only when there is more than one OPRouteCode.
What is the best SQL Server query to get this result? I couldn't write anything that works after hours of effort.
DECLARE @OrderProcess TABLE (OrderNo Int,
OPRouteCode VARCHAR(5),
Department VARCHAR(10),
Process VARCHAR(20) )
--Order = 1 OPRouteCode = '0023'
INSERT INTO @OrderProcess
SELECT 1,'0023' ,'103','Receive'
UNION ALL
SELECT 1,'0023' ,'104','Produce'
UNION ALL
SELECT 1,'0023' ,'104','Pack'
UNION ALL
SELECT 1,'0023' ,'105','Ship'
--Order = 1 OPRouteCode = '0077'
INSERT INTO @OrderProcess
SELECT 1,'0077' ,'103','Receive'
UNION ALL
SELECT 1,'0077' ,'104','Produce'
UNION ALL
SELECT 1,'0077' ,'105','Ship'
--Order = 2 OPRouteCode = '0044'
INSERT INTO @OrderProcess
SELECT 2,'0044' ,'105','Receive'
UNION ALL
SELECT 2,'0044' ,'106','Ship'
--Order = 2 OPRouteCode = '0055'
INSERT INTO @OrderProcess
SELECT 2,'0055' ,'105','Receive'
UNION ALL
SELECT 2,'0055' ,'106','Ship'
[Image: Table Variable]
[Image: Expected Output]

Alright I got it this time. Sorry for the wrong answer before.
Select OrderNo,OPRouteCode
From (
select OrderNo,OPRouteCode, RANK() OVER(PARTITION BY OrderNo,Department ORDER BY Process ) AS Rnk
from @OrderProcess
) a
Where Rnk = 2
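A more general alternative, offered as a sketch rather than as the method above: treat each OPRouteCode as a set of (Department, Process) pairs and flag an order when some route has fewer distinct pairs than the order as a whole. Every route's set is contained in the order's combined set, so the counts match for all routes only when all routes are identical; an order with a single route is never flagged. The SQL uses only derived tables, DISTINCT and GROUP BY, so it should carry over to SQL Server 2005 once the table name is swapped for the table variable. SQLite (through Python's sqlite3 module) and the table name OrderProcess are assumptions made only so the example can be run.

```python
import sqlite3

# SQLite stands in for SQL Server; OrderProcess is a hypothetical permanent
# table holding the same rows as the table variable in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderProcess "
    "(OrderNo INT, OPRouteCode TEXT, Department TEXT, Process TEXT)"
)
conn.executemany(
    "INSERT INTO OrderProcess VALUES (?, ?, ?, ?)",
    [
        (1, "0023", "103", "Receive"),
        (1, "0023", "104", "Produce"),
        (1, "0023", "104", "Pack"),
        (1, "0023", "105", "Ship"),
        (1, "0077", "103", "Receive"),
        (1, "0077", "104", "Produce"),
        (1, "0077", "105", "Ship"),
        (2, "0044", "105", "Receive"),
        (2, "0044", "106", "Ship"),
        (2, "0055", "105", "Receive"),
        (2, "0055", "106", "Ship"),
    ],
)

NON_EQUIVALENT_SQL = """
SELECT DISTINCT r.OrderNo
FROM (SELECT OrderNo, OPRouteCode, COUNT(*) AS PairCount   -- pairs per route
      FROM (SELECT DISTINCT OrderNo, OPRouteCode, Department, Process
            FROM OrderProcess) AS p
      GROUP BY OrderNo, OPRouteCode) AS r
JOIN (SELECT OrderNo, COUNT(*) AS PairCount                -- pairs per order
      FROM (SELECT DISTINCT OrderNo, Department, Process
            FROM OrderProcess) AS u
      GROUP BY OrderNo) AS o
  ON o.OrderNo = r.OrderNo
WHERE r.PairCount <> o.PairCount
ORDER BY r.OrderNo
"""

non_equivalent = [row[0] for row in conn.execute(NON_EQUIVALENT_SQL)]
print(non_equivalent)
```

Joining the result back to the source table on OrderNo would return the OPRouteCodes as well, if those are needed in the output.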

Related

SQL stored procedure for picking a random sample based on multiple criteria

I am new to SQL. I looked all over the internet for a solution that matches my problem but couldn't find one. I have a table named 'tblItemReviewItems' in SQL Server 2012.
tblItemReviewItems
Information:
1. ItemReviewId column is the PK.
2. The Deleted column contains only the values "Yes" and "No".
3. The Audited column contains only the values "Yes" and "No".
I want to create a stored procedure that does the following:
Pick a random sample of 10% of all ItemReviewId values for each distinct 'UserId' and 'ReviewDate' in a given date range. The 10% sample should consist of 5% of the total population taken from Deleted (No) and 5% of the total population taken from Deleted (Yes). Rows with Audited = "Yes" are excluded from the sample.
For example: a user has 118 records. Of those, 17 have Deleted = "No" and 101 have Deleted = "Yes". We need to pick a random sample of 12 records: 6 with Deleted = "No" and 6 with Deleted = "Yes".
Then update the Audited column to "Check" for the picked sample.
How can I achieve this?
This is the stored procedure I used to pick a sample of 5% of the rows with Deleted = "No" and 5% of the rows with Deleted = "Yes", each relative to its own group. Now the requirement is different.
ALTER PROC [dbo].[spItemReviewQcPickSample]
(
@StartDate Datetime
,@EndDate Datetime
)
AS
BEGIN
WITH CTE
AS (SELECT ItemReviewId
,100.0
*row_number() OVER(PARTITION BY UserId
,ReviewDate
,Deleted
ORDER BY newid()
)
/count(*) OVER(PARTITION BY UserId
,ReviewDate
,Deleted
)
AS pct
FROM tblItemReviewItems
WHERE ReviewDate BETWEEN @StartDate AND @EndDate
AND Deleted IN ('Yes','No')
AND Audited='No'
)
SELECT a.*
FROM tblItemReviewItems AS a
INNER JOIN cte AS b
ON b.ItemReviewId=a.ItemReviewId
AND b.pct<=6
;
WITH CTE
AS (SELECT ItemReviewId
,100.00
*row_number() OVER(PARTITION BY UserId
,ReviewDate
,Deleted
ORDER BY newid()
)
/COUNT(*) OVER(PARTITION BY UserId
,ReviewDate
,Deleted
)
AS pct
FROM tblItemReviewItems
WHERE ReviewDate BETWEEN @StartDate AND @EndDate
AND Deleted IN ('Yes','No')
AND Audited='No'
)
UPDATE a
SET Audited='Check'
FROM tblItemReviewItems AS a
INNER JOIN cte AS b
ON b.ItemReviewId=a.ItemReviewId
AND b.pct<=6
;
END
Any help would be highly appreciated. Thanks in advance.
This may help you get started. My idea is that you create the temp tables you need and load the specific data into them (deleted, not deleted, etc.). You then run something along the lines of:
IF OBJECT_ID('tempdb..#tmpTest') IS NOT NULL DROP TABLE #tmpTest
GO
CREATE TABLE #tmpTest
(
ID INT ,
Random_Order INT
)
INSERT INTO #tmpTest
(
ID
)
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 12 UNION ALL
SELECT 13 UNION ALL
SELECT 14 UNION ALL
SELECT 15 UNION ALL
SELECT 16;
DECLARE @intMinID INT ,
@intMaxID INT;
SELECT @intMinID = MIN(ID)
FROM #tmpTest;
SELECT @intMaxID = MAX(ID)
FROM #tmpTest;
-- Walk the IDs one at a time so that RAND() is evaluated once per row
WHILE @intMinID <= @intMaxID
BEGIN
UPDATE #tmpTest
SET Random_Order = 10 + CONVERT(INT, (30-10+1)*RAND())
WHERE ID = @intMinID;
SELECT @intMinID = @intMinID + 1;
END
SELECT TOP 5 *
FROM #tmpTest
ORDER BY Random_Order;
This assigns a random number to a column, which you then use together with a TOP 5 clause to get a random selection of 5 rows.
I appreciate that a loop may not be efficient, but you may be able to assign the random numbers without one and apply the same principle. Hope that gives you some ideas.
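For the sizing rule in the question (5% of a user's total rows from each Deleted group, so 6 + 6 for 118 rows), a loop-free sketch is to number the rows randomly inside each group and keep those whose number is at most 5% of the whole partition, rounded up. The example below is an illustration under assumptions: SQLite (via Python's sqlite3 module, SQLite 3.25 or later for window functions) stands in for SQL Server, random() plays the role that NEWID() has in T-SQL, and the 118 sample rows are invented to match the numbers quoted in the question.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblItemReviewItems (ItemReviewId INTEGER PRIMARY KEY, "
    "UserId INT, ReviewDate TEXT, Deleted TEXT, Audited TEXT)"
)
# Invented data: one user, one date, 17 rows with Deleted = 'No', 101 with 'Yes'.
conn.executemany(
    "INSERT INTO tblItemReviewItems VALUES (?, ?, ?, ?, ?)",
    [(i, 1, "2015-01-01", "No" if i <= 17 else "Yes", "No") for i in range(1, 119)],
)

SAMPLE_SQL = """
WITH ranked AS (
    SELECT ItemReviewId, Deleted,
           -- random position of the row inside its Deleted group
           ROW_NUMBER() OVER (PARTITION BY UserId, ReviewDate, Deleted
                              ORDER BY random()) AS rn,
           -- size of the whole user/date population, both groups together
           COUNT(*) OVER (PARTITION BY UserId, ReviewDate) AS total
    FROM tblItemReviewItems
    WHERE Audited = 'No' AND Deleted IN ('Yes', 'No')
)
SELECT ItemReviewId, Deleted
FROM ranked
WHERE rn * 100 <= total * 5 + 99   -- integer form of rn <= CEILING(total * 0.05)
"""

sample = conn.execute(SAMPLE_SQL).fetchall()
counts = Counter(deleted for _, deleted in sample)
print(dict(counts))
```

A group that holds fewer rows than the 5% target simply contributes all of its rows, which may or may not be the desired behaviour.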

Can the below query be rebuilt using set-based rather than iterative logic

I am preparing an SSRS report and I have only SELECT access on the table.
I have built a query to get the data, but since I am using table variables, I am not getting the desired output in the report.
Here is my table structure:
This table stores the data for each job run, with the columns below:
jobname
instance_id -- ever-increasing unique number for each run
starttime
endtime
status
I want to show all of the above columns in my report, and I wrote the query below.
Since I want to show the latest run regardless of success or failure, I get max(instance_id) per job into one table and then loop over that base table, passing each value in turn.
I fetch the max instance_id along with a row number for looping purposes.
create table ##table
(
fqjn varchar(1000),
maxi int,
rownum int
)
create table ##tb
(jobname varchar(max),
[status] varchar(100),
duration int
)
insert into ##table
select jobname,max(instance_id) as maxi,row_number() over (order by jobname desc) as rownum from [dbo].[vw_job_hist]
where grp_data_id in (select grp_Data_id from [dbo].[vw_job_data] where grp_name='pcp')
group by jobname
--now loop through table passing instance_id as parameter
declare @rownum int
select top 1 @rownum=rownum from ##table order by rownum
while @rownum is not null
begin
insert into ##tb
select jh.jobname,"job status"=
case jh.completion_status
when 0 then 'Success'
else 'Failed'
end,
"duration"=datediff(minute,jh.started_time,jh.end_time)
from [dbo].[vw_job_] jh
where jh.instance_id in (select maxi from ##table where rownum=@rownum)
-- advance to the next row number; becomes NULL after the last row, ending the loop
set @rownum=(select top 1 rownum from ##table where rownum>@rownum
order by rownum)
end
select * from ##tb
drop table ##tb
drop table ##table
I am not getting the desired output in SSRS. I know that if I created the above query as a stored procedure I would get the desired results, but this is a third-party database and we won't get that access.
Ask:
Can the above query logic be built as a single query without a loop/cursor?
I tried using a recursive CTE, but made no further progress.
Any help/pointers would be much appreciated.
Try this:
;WITH OrderedJobInfo AS (
SELECT jobname, completion_status, started_time, end_time, instance_id
, ROW_NUMBER() OVER (PARTITION BY jobname ORDER BY instance_id DESC) AS rownum
FROM [dbo].[vw_job_hist]
WHERE grp_data_id IN (SELECT grp_Data_id FROM [dbo].[vw_job_data] WHERE grp_name='pcp')
)
SELECT o.jobname, o.instance_id
, [job status] = CASE o.completion_status WHEN 0 THEN 'Success' ELSE 'Failed' END
, [duration] = DATEDIFF(MINUTE, o.started_time, o.end_time)
FROM OrderedJobInfo o
WHERE o.rownum = 1;

SQL Server logic for fetching data from the table

"I need to display all items linked to the parent category id = 1. As per the table, it should fetch: Big Machine, Computer, CPU Cabinet, Hard Disk and Magnetic Disk. But the logic I have written does not fetch all the records. Please help."
create table ItemSpares
(
ItemName varchar(20),
ItemID int,
ParentCategoryID int
)
insert into ItemSpares (ItemName,ItemID,ParentCategoryID)
select 'Big Machine', 1 , NULL UNION ALL
select 'Computer', 2, 1 UNION ALL
select 'CPU Cabinet', 3, 2 UNION ALL
select 'Hard Disk', 4, 3 UNION ALL
select 'Magnetic Disk',5,4 UNION ALL
select 'Another Big Machine',6, NULL
You need to use a hierarchical (recursive) SQL query. It took a while to figure out, but try this:
with BigComputerList (ItemName, ItemID, ParentCategoryID, Level)
AS
(
-- Anchor member definition
SELECT e.ItemName, e.ItemID, e.ParentCategoryID,
0 AS Level
FROM ItemSpares AS e
WHERE ItemID = 1
UNION ALL
-- Recursive member definition
SELECT e.ItemName, e.ItemID, e.ParentCategoryID,
Level + 1
FROM ItemSpares AS e
INNER JOIN BigComputerList AS d
ON e.ParentCategoryId = d.ItemID
)
Select * From BigComputerList
I would highly recommend reading this article if you want to understand what the query is doing.
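To see what the recursion returns on the sample rows, here is a small runnable sketch of the same anchor-plus-recursive-member pattern. It runs against SQLite through Python's sqlite3 module purely for convenience; that choice is an assumption, and SQLite spells the CTE as WITH RECURSIVE, whereas SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ItemSpares (ItemName TEXT, ItemID INT, ParentCategoryID INT)"
)
conn.executemany(
    "INSERT INTO ItemSpares VALUES (?, ?, ?)",
    [
        ("Big Machine", 1, None),
        ("Computer", 2, 1),
        ("CPU Cabinet", 3, 2),
        ("Hard Disk", 4, 3),
        ("Magnetic Disk", 5, 4),
        ("Another Big Machine", 6, None),
    ],
)

DESCENDANTS_SQL = """
WITH RECURSIVE BigComputerList (ItemName, ItemID, ParentCategoryID, Level) AS (
    -- Anchor member: the root item itself
    SELECT ItemName, ItemID, ParentCategoryID, 0
    FROM ItemSpares
    WHERE ItemID = 1
    UNION ALL
    -- Recursive member: children of whatever was found in the previous step
    SELECT e.ItemName, e.ItemID, e.ParentCategoryID, d.Level + 1
    FROM ItemSpares AS e
    JOIN BigComputerList AS d ON e.ParentCategoryID = d.ItemID
)
SELECT ItemName, Level
FROM BigComputerList
ORDER BY Level
"""

items = conn.execute(DESCENDANTS_SQL).fetchall()
for name, level in items:
    print(level, name)
```

'Another Big Machine' is left out because its ParentCategoryID is NULL and it is never reached from ItemID = 1.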

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also with ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the returned columns.
I have this in place, and it works fine; it is just very slow.
A very simplified version of the tables is:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date for entering these records; the higher the number, the more recent the entry. It is common/expected that each employee has several records, but some employees may not have the most recent sequence (e.g. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column}) END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each ssn has a value of 1. Then just select that row as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber CHAR(7) -- matches table1.buildingNumber
)
returns table as
return (
select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.ssn, t1.buildingNumber, t1.trainingHours, t2.fName, t2.lName,
row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
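As a runnable illustration of the "latest row per ssn" idea together with the suggested index, the sketch below joins table2 to table1 on ssn and keeps only the row whose sequence equals the per-ssn maximum. It is a minimal sketch under assumptions: SQLite (via Python's sqlite3 module) stands in for SQL Server, the sample rows and the index name ix_table2_ssn_sequence are invented, and a `?` placeholder takes the place of the building-number parameter. No timing claim is made; the execution plan on the real data is what decides whether this helps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ssn TEXT, buildingNumber TEXT, trainingHours REAL);
CREATE TABLE table2 (ssn TEXT, fName TEXT, lName TEXT, sequence INT);
-- Hypothetical name for the index on (ssn, sequence)
CREATE INDEX ix_table2_ssn_sequence ON table2 (ssn, sequence);

INSERT INTO table1 VALUES ('111111111', 'B000001', 4.5);
INSERT INTO table1 VALUES ('222222222', 'B000001', 2.0);
INSERT INTO table1 VALUES ('333333333', 'B000002', 1.0);

INSERT INTO table2 VALUES ('111111111', 'Ann', 'Old', 6);
INSERT INTO table2 VALUES ('111111111', 'Ann', 'New', 8);
INSERT INTO table2 VALUES ('222222222', 'Bob', 'Only', 7);  -- no sequence 8 row
INSERT INTO table2 VALUES ('333333333', 'Cy', 'Other', 8);
""")

LATEST_NAME_SQL = """
SELECT t1.ssn, t1.buildingNumber, t2.fName, t2.lName
FROM table1 AS t1
JOIN table2 AS t2 ON t2.ssn = t1.ssn
WHERE t1.buildingNumber = ?
  AND t2.sequence = (SELECT MAX(t2a.sequence)
                     FROM table2 AS t2a
                     WHERE t2a.ssn = t1.ssn)
ORDER BY t1.ssn
"""

latest = conn.execute(LATEST_NAME_SQL, ("B000001",)).fetchall()
for row in latest:
    print(row)
```

The employee whose newest record has sequence 7 is still returned, because the maximum is computed per ssn rather than assumed to be 8.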
Try using temp tables instead of the table variables. I'm not sure what kind of system you are working on, but I have had pretty good luck with them. Temp tables actually write to the drive, so you won't be holding and processing so much in memory. Depending on other system usage, this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery into a temp table before everything else runs, and join to it.
Just make sure to drop the table at the end. It will be dropped when the SP finishes and disconnects, but it is a good idea to drop it explicitly to be on the safe side.

SQL Server Comparing Subsequent Rows for Duplicates

I am trying to write a SQL Server query but have had no luck, and I was wondering if anyone has any ideas on how to achieve it.
What I'm trying to do:
I have a table with several columns; the ones I am dealing with are TaskID, StatusCode and Timestamp. This table holds tasks for one of our systems that run throughout the day. When something runs, the task gets a timestamp and a status code that depends on the status of that task.
Sometimes the task table is updated with a new timestamp although the StatusCode has not changed since the last update of the task, so two or more consecutive rows of a given task can have the same StatusCode. By consecutive rows I mean consecutive with regard to timestamp.
For example, task 88 could have twenty rows at StatusCode 2, after which the status code changes to something else.
What I am trying to do, with no luck so far, is to retrieve from this table a list of all the tasks with their status codes and timestamps, but where there is more than one consecutive row for a task with the same status code, I want to take only the first row (the one with the lowest timestamp) and ignore the rest of the rows until the status code for that task changes.
To make it simpler, you can assume that I am filtering on a TaskID, so I am looking at a single task.
Does anyone have any ideas as to how I can do this, or perhaps something I could read to help me?
Thanks
Irfan.
Here are a couple of ways of getting what you want:
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp < T1.status_timestamp
LEFT OUTER JOIN My_Table T3 ON
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp AND
T3.status_timestamp > T2.status_timestamp
WHERE
T3.task_id IS NULL AND
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
or
SELECT
T1.task_id,
T1.status_code,
T1.status_timestamp
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.task_id = T1.task_id AND
T2.status_timestamp = (
SELECT
MAX(status_timestamp)
FROM
My_Table T3
WHERE
T3.task_id = T1.task_id AND
T3.status_timestamp < T1.status_timestamp)
WHERE
(T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
T1.status_timestamp
Both methods rely on there being no exact matches of the status_timestamp values (two rows can't have the same exact status_timestamp for a given task_id.)
Something like
select TaskID,StatusCode,Min(TimeStamp)
from table
group by TaskID,StatusCode
order by 1,2
Note that if a status code can recur for the same task, you will need an additional field to tell the runs apart, but hopefully this can point you in the right direction...
Something like the following should get you in the right direction....
CREATE TABLE #T
(
TaskId INT
,StatusCode INT
,StatusTimeStamp DATETIME
)
INSERT INTO #T
SELECT 1, 1, '2009-12-01 14:20'
UNION SELECT 1, 2, '2009-12-01 16:20'
UNION SELECT 1, 2, '2009-12-02 09:15'
UNION SELECT 1, 2, '2009-12-02 12:15'
UNION SELECT 1, 3, '2009-12-02 18:15'
;WITH CTE AS
(
SELECT TaskId
,StatusCode
,StatusTimeStamp
,ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode ORDER BY StatusTimeStamp ASC) AS RNUM -- earliest row per status gets 1
FROM #T
)
SELECT TaskId
,StatusCode
,StatusTimeStamp
FROM CTE
WHERE RNUM = 1
DROP TABLE #T
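When a status code can come back later for the same task, grouping or partitioning by StatusCode merges the separate runs. One way to keep them apart, sketched here under assumptions, is to compare each row with the row immediately before it and keep it only when the status differs. The example uses LAG, which SQL Server offers from the 2012 release onward, and runs on SQLite 3.25 or later through Python's sqlite3 module so that it can be executed as is. The table name TaskStatus and the last two sample rows (status 2 recurring) are invented for illustration, and StatusCode is assumed to be non-NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TaskStatus (TaskId INT, StatusCode INT, StatusTimeStamp TEXT)"
)
conn.executemany(
    "INSERT INTO TaskStatus VALUES (?, ?, ?)",
    [
        (1, 1, "2009-12-01 14:20"),
        (1, 2, "2009-12-01 16:20"),
        (1, 2, "2009-12-02 09:15"),
        (1, 2, "2009-12-02 12:15"),
        (1, 3, "2009-12-02 18:15"),
        (1, 2, "2009-12-03 08:00"),  # status 2 comes back: a new run
        (1, 2, "2009-12-03 09:00"),
    ],
)

CHANGES_SQL = """
WITH WithPrevious AS (
    SELECT TaskId, StatusCode, StatusTimeStamp,
           -- status of the previous row of the same task, NULL for the first row
           LAG(StatusCode) OVER (PARTITION BY TaskId
                                 ORDER BY StatusTimeStamp) AS PrevStatusCode
    FROM TaskStatus
)
SELECT TaskId, StatusCode, StatusTimeStamp
FROM WithPrevious
WHERE PrevStatusCode IS NULL OR PrevStatusCode <> StatusCode
ORDER BY TaskId, StatusTimeStamp
"""

changes = conn.execute(CHANGES_SQL).fetchall()
for row in changes:
    print(row)
```

On versions without LAG, the self-join variants earlier in this thread express the same "previous row" comparison.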

Resources