This is an example of table data that I am working on (the table contained a lot of columns, I am showing here only the relevant ones):
Id
job_number
status
parent_id
1
42FWD-42
0
0
2
42FWD-42
1
1
3
42FWD-42
5
1
Id is auto generated. parent_id links the job using the id.
When a new job is created via the app, a new row is created (with status "0"). The auto-generated Id is then used for subsequent rows of same job, and set as parent id.
Another record with status "1" (which is code for started) is also created just after parent record.
Explanation of the problem: due to a bug in the app, there are duplicate set of rows for the same job.
Example of problem
Id
job_number
status
parent_id
1
42FWD-42
0
0
2
42FWD-42
0
0
3
42FWD-42
1
1
4
42FWD-42
1
2
5
42FWD-42
5
1
As you can see from this example, due to the bug, there are 2 rows with "0" status for the same job, and 2 rows with "1" status.
This creates a lot of problems in operation in app where the job is updated using the job number.
The status number should not repeat for a specific job.
What I want to do is to find all duplicates like those in example. For example, I want a query where I can find all duplicates which have same job number, but different parent_id and NO "5" status.
Example result using the example table above, I need the query to return:
Id
job_number
status
parent_id
2
42FWD-42
0
0
4
42FWD-42
1
2
Explanation of this result:
Row with Id=1 is considered the correct record because it has an associated record with status "5"
Row with Id=2 is considered duplicate and its associated records are also considered duplicate
Another possible case: there are duplicate rows, but none have status=5. These rows can be discarded, ie need not be shown in results.
A brief explanation of how the query works will be appreciated.
EDIT:
I forgo to add an important information:
job_number is case sensitive.
ie: 42FWD-42 and 42fwd-42 are different and valid job number. They should not be considered duplicates, and are 2 separate jobs.
The reason for this is the actual job number is not small text as in my example. It is a long string like a guid.
First I must mention you should block identical rows by means of a unique constraint. I suggest that once you have eliminated all duplicates you put up a such a constraint to keep this from happening again.
Now for your question, you can do this by grouping on the duplicate columns, and have only those that count more than one.
Here is an example
declare #t table (id int, job_number varchar(10), status int, parent_id int)
insert into #t
values (1, '42FWD-42', 0, 0), (2, '42FWD-42', 0, 0), (3, '42FWD-42', 1, 1), (4, '42FWD-42', 1, 2), (5, '42FWD-42', 5, 1)
select max(t.id) as id, t.job_number, t.status
from #t t
group by t.job_number, t.status
having count(*) > 1
the result is
id job_number status
2 42FWD-42 0
4 42FWD-42 1
and to get also the parent_id you can add a self join
select max(t.id) as id,
t.job_number,
t.status,
(select t2.parent_id from #t t2 where t2.id = max(t.id)) as parent_id
from #t t
group by t.job_number, t.status
having count(*) > 1
this returns
id job_number status parent_id
2 42FWD-42 0 0
4 42FWD-42 1 2
EDIT
To solve the addional problem in the edit of your question, about the case sensitive, you can fix that by using a COLLATE in your field retrieval and your comparision
this should do it
declare #t table (id int, job_number varchar(10), status int, parent_id int)
insert into #t
values (1, '42FWD-42', 0, 0),
(2, '42FWD-42', 0, 0),
(3, '42FWD-42', 1, 1),
(4, '42fwd-42', 1, 2), -- LOWERCASE !!!
(5, '42FWD-42', 5, 1)
select max(t.id) as id,
t.job_number COLLATE Latin1_General_CS_AS,
t.status,
(select t2.parent_id from #t t2 where t2.id = max(t.id)) as parent_id
from #t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
and now the result will be
id job_number status parent_id
2 42FWD-42 0 0
Yet another edit
Now, suppose you need to use the result of these duplicate id's in another query, you could do something like this
select t.*
from #t t
where t.id in ( select max(t.id) as id
from #t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)
What I am doing here is getting only the duplicate id's in a form that can be used to feed a where clause in another query.
This way you can use the result set in any way you wish.
Also note that for this we don't need the self join to retrieve the parent_id anymore.
One possible use of this could be to delete duplicate rows, you can write
delete from yourtable
where id in ( select max(t.id) as id
from #t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)
you can try to use ROW_NUMBER window function to get duplicate row data and its id by job_number, then using cte recursive to find all error records by this id
Query 1:
;WITH CTE AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY job_number ORDER BY Id) rn
FROM T
WHERE status = 0
), CTE1 AS (
SELECT id,job_number,status,parent_id
FROM CTE
WHERE rn > 1
UNION ALL
SELECT t.id,t.job_number,t.status,t.parent_id
FROM CTE1 c INNER JOIN T t
ON c.id = t.parent_id
)
SELECT *
FROM CTE1
Results:
| id | job_number | status | parent_id |
|----|------------|--------|-----------|
| 2 | 42FWD-42 | 0 | 0 |
| 4 | 42FWD-42 | 1 | 2 |
I am unable to find the logic for below question in SQL Server.
I have a table like below.
id ParentID
---------------
1 NULL
2 NULL
3 1
4 2
5 3
6 5
I need a query which will return hierarchy of a row, like this:
Hierarchy id ParentID
----------------------------
1 1 NULL
1 2 NULL
2 3 1
2 4 2
3 5 3
4 6 5
I will explain the hierarchy:
For any row if ParentId is null then Hierarchy will be 1
Any row if ParentId is not null and ParentId's ParentId is null then 2
Any row if ParentId is not null, ParentId's ParentId is not null and next ParentId's ParentId is null then 3
And it goes on
How can write the query for this logic.
You can achieve this with a recursive query:
(in the below example your initial table data is stored in #a):
;With DATA AS (
SELECT 1 as hierarchy
,Id
,parentid
from #a
where parentid is null
UNION ALL
SELECT Data.Hierarchy + 1
,a.id
,a.parentid
FROM #a a
INNER JOIN DATA
ON Data.id = a.parentid
)
SELECT *
FROM DATA
ORDER BY hierarchy, Id
Create two table parent and child
Create table parent (ID int, ParentId int,Text varchar(20))
Create table Child (child1Id int, ParentId int, point int)
ID ParentId Text
1 NULL Sony
2 1 phone
3 2 sale
4 2 Rate
child1Id ParentIdId point
100 3 10
200 4 20
I tried something like this
Select sum(b.point),a.ParentId,a.Text from parent A join Child B on a.ID =b.ParentId group by a.ParentId,a.Text
I need output like this.
ID ParentId Text Point
1 NULL Sony null
2 1 phone 30
You can do this with a subquery
SELECT p.ID
, p.parentId
, p.[Text] AS 'Text'
, cld.pointSum AS 'Point'
FROM dbo.Parent AS p
LEFT JOIN (
SELECT c.ParentId
, SUM(c.point) AS 'pointSum'
FROM dbo.Child AS c
GROUP BY c.ParentId
) cld ON (p.ID = cld.ParentId)
I am trying to update a lead table to assign a random person from a lookup table. Here is the generic schema:
TableA (header),
ID int,
name varchar (30)
TableB (detail),
ID int,
fkTableA int, (foreign key to TableA.ID)
recordOwner varchar(30) null
other detail colums..
TableC (owners),
ID int,
fkTableA int (foreign key to TableA.ID)
name varchar(30)
TableA has 10 entries, one for each type of sales lead pool. TableB has thousands of entries for each row in TableA. I want to assign the correct recordOwners from TableC to and even number of rows each (or as close as I can). TableC will have anywhere from one entry for each row in tableA or up to 10.
Can this be done in one statement? It doesn't have to be. I can't seem to figure out the best approach for speed. Any thoughts or samples are appreciated.
Updated:
TableA has a 1 to many relation ship with TableC. For every record of TableA, TableC will have at least one row, which represents an owner that will need to be assigned to a row in TableB.
TableA
int name
1 LeadSourceOne
2 LeadSourceTwo
TableC
int(id) int(fkTableA) varchar(name)
1 1 Tom
2 1 Bob
3 2 Timmy
4 2 John
5 2 Steve
6 2 Bill
TableB initial data
int(id) int(fkTableA) varchar(recordOwner) (other detail columns)
1 1 NULL ....
2 1 NULL ....
3 1 NULL ....
4 2 NULL ....
5 2 NULL ....
6 2 NULL ....
7 2 NULL ....
8 2 NULL ....
9 2 NULL ....
TableB end result
int(id) int(fkTableA) varchar(recordOwner) (other detail columns)
1 1 TOM ....
2 1 BOB ....
3 1 TOM ....
4 2 TIMMY ....
5 2 JOHN ....
6 2 STEVE ....
7 2 BILL ....
8 2 TIMMY ....
9 2 BILL ....
Basically I need to randomly assign a record from tableC to tableB based on the relationship to tableA.
UPDATE TabB SET name = (SELECT TOP 1 coalesce(tabC.name,'') FROM TabC INNER JOIN TabA ON TabC.idA = TabA.id WHERE tabA.Id = TabB.idA )
Should work but its not tested.
Try this:
UPDATE TableB
SET recordOwner = (SELECT TOP(1) [name]
FROM TableC
ORDER BY NEWID())
WHERE recordOwner IS NULL
I ended up looping thru and updating x percent of the detail records based on how many owners I had. The end result is something like this:
create table #tb_owners(userId varchar(30), processed bit)
insert into #tb_owners(
userId,
processed)
select userId = name,
processed = 0
from tableC
where fkTableA = 1
select #percentUpdate = cast(100 / count(*) as numeric(8,2))
from #tb_owners
while exists(select 1 from #tb_owners o where o.processed = 0)
begin
select top 1
#userFullName = o.name
from #tb_owners o
where o.processed = 0
order by newId()
update tableB
set recordOwner = #userFullName
from tableB ptbpd
inner join (
select top (#percentUpdate) percent
id
from tableB
where recordOwner is null
order by newId()
) nd on (ptbpd.id = nd.id)
update #tb_owners
set processed = 1
where userId = #oUserId
end
--there may be some left over, set to last person used
update tableB
set recordOwner = #userFullName
from tableB
where ptbpd.recordOwner is null
I have table data like this
id id1 name
1 1 test1
1 1 test1
1 2 test2
2 1 test1
2 2 test2
3 1 test1
3 2 test2
3 2 test2
now from table i want the data as below
like
for id = 1 order by id1 asc the first name = test1
so i want the first two row
id id1 name
1 1 test1
1 1 test1
not third row
For id=2 order by id1 asc the first name = test1
so i want first row as test1 has assign only ones for id=2
id id1 name
2 1 test1
And for id=3 same as id=2
Please suggest me how can get the perticlur value for ID , because the scenerio is differnt for ID=1
Use RANK() or DENSE_RANK() to get the first ranked rows, including duplicates, for each id.
select * from (
select *, dense_rank() over (partition by [id] order by [id2]) as ranking
from [table]
) as t
where ranking = 1
Sounds to me like you just want select * from [tablename] where id1 = 1, but I might be wrong. I find the question a bit, well, vague...
Perhaps this:
select * from [table] where id = 1 order by id1
I think the point is that you can use one column in the where clause, and a different column in the order by clause. that's no problem.
But I'm not sure you could actually have the data table you describe, because the first two rows are identical (how can SQL tell them apart? Or more technically, there'd be a primary key violation)?
select * from [table] where id = 1 and name like "test1" order by id1