SQL - View which duplicates values for missing entries

I need to create a view that fills in missing values by duplicating existing rows. Here is an example.
Given this table:
NR|Description|FK
0 |Text1 |0
0 |Text2 |1
0 |Text4 |2
1 |Text3 |0
the view should return:
NR|Description|FK
0 |Text1 |0
0 |Text2 |1
0 |Text4 |2
1 |Text3 |0
1 |Text3 |1
1 |Text3 |2
For every NR value, the original table always has at least one row with FK = 0. In short: if an NR has a row with FK = 0 but no row with FK = 1, the view should create one by copying the FK = 0 row.
Edit:
FK is not limited to 0 and 1; there can be more distinct FK values.

This should do it
declare @T table (NR int, Description varchar(10), FK int);
insert into @T values
(0, 'Text1', 0)
, (0, 'Text2', 1)
, (1, 'Text3', 0);
-- all original rows
select t1.NR, t1.Description, t1.FK
from @T t1
union
-- plus a copy of the FK = 0 row for every NR that has no FK = 1 row
select t1.NR, t1.Description, 1
from @T t1
left join @T t2
on t2.NR = t1.NR
and t2.FK = 1
where t1.FK = 0
and t2.NR is null;
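The edit to the question says FK can take more values than 0 and 1. A minimal sketch of one way to generalise, assuming (NR, FK) is unique and every NR has exactly one FK = 0 row: build every NR/FK combination with a cross join, keep the real row where it exists, and fall back to the FK = 0 row otherwise. The table variable @T and the aliases n, f, t, z are only illustrative.

```sql
-- Sample data from the question, held in an illustrative table variable
declare @T table (NR int, Description varchar(10), FK int);
insert into @T values
  (0, 'Text1', 0),
  (0, 'Text2', 1),
  (0, 'Text4', 2),
  (1, 'Text3', 0);

select n.NR,
       -- use the real row when it exists, otherwise copy the FK = 0 row
       case when t.NR is null then z.Description else t.Description end as Description,
       f.FK
from (select distinct NR from @T) n          -- every NR
cross join (select distinct FK from @T) f    -- every FK seen anywhere
left join @T t on t.NR = n.NR and t.FK = f.FK
join @T z on z.NR = n.NR and z.FK = 0        -- fallback row (assumed to exist once per NR)
order by n.NR, f.FK;
```

With the sample rows this returns the three original rows for NR 0 and three rows for NR 1, all carrying Text3. Inside a view the select would read from the real table instead of @T, and the order by would have to move to the query that reads the view.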

You could do something like this:
SELECT [NR], [Description], newtbl2.[FK] FROM (
SELECT [NR], [Description], newtbl.[FK] FROM [dbo].[myTable] oldtbl
LEFT OUTER JOIN (select [NR], [Description], 1 as [FK]) newtbl ON newtbl.[FK] = oldtbl.[FK]
) joined
LEFT OUTER JOIN (select [NR], [Description], 0 as [FK]) newtbl2 ON newtbl2.[FK] = joined.[FK]
What's happening here is: first, I'm left joining a duplicate table with 1 as the FK to the original table. That way, if there is no row with 1 as the FK, it will create one; if there is such a row, it will just join on that row, giving you the same NR and Description.
Next, I'm joining an additional table with 0 as the FK. This is basically the same as the first step, but with 0 instead of 1.
The aliasing may need some work, but in principle this approach should work.

Related

Identify duplicates based on multiple columns and parent row

This is an example of table data that I am working on (the table contained a lot of columns, I am showing here only the relevant ones):
Id | job_number | status | parent_id
1 | 42FWD-42 | 0 | 0
2 | 42FWD-42 | 1 | 1
3 | 42FWD-42 | 5 | 1
Id is auto generated. parent_id links the job using the id.
When a new job is created via the app, a new row is created (with status "0"). The auto-generated Id is then used for subsequent rows of same job, and set as parent id.
Another record with status "1" (which is code for started) is also created just after parent record.
Explanation of the problem: due to a bug in the app, there are duplicate set of rows for the same job.
Example of problem
Id | job_number | status | parent_id
1 | 42FWD-42 | 0 | 0
2 | 42FWD-42 | 0 | 0
3 | 42FWD-42 | 1 | 1
4 | 42FWD-42 | 1 | 2
5 | 42FWD-42 | 5 | 1
As you can see from this example, due to the bug, there are 2 rows with "0" status for the same job, and 2 rows with "1" status.
This creates a lot of problems in operation in app where the job is updated using the job number.
The status number should not repeat for a specific job.
What I want to do is to find all duplicates like those in example. For example, I want a query where I can find all duplicates which have same job number, but different parent_id and NO "5" status.
Example result using the example table above, I need the query to return:
Id | job_number | status | parent_id
2 | 42FWD-42 | 0 | 0
4 | 42FWD-42 | 1 | 2
Explanation of this result:
Row with Id=1 is considered the correct record because it has an associated record with status "5"
Row with Id=2 is considered duplicate and its associated records are also considered duplicate
Another possible case: there are duplicate rows, but none have status=5. These rows can be discarded, ie need not be shown in results.
A brief explanation of how the query works will be appreciated.
EDIT:
I forgot to add an important piece of information:
job_number is case sensitive.
i.e. 42FWD-42 and 42fwd-42 are different, valid job numbers. They should not be considered duplicates; they are 2 separate jobs.
The reason for this is that the actual job number is not a short text as in my example. It is a long string, like a GUID.

First, I must mention that you should block identical rows by means of a unique constraint. I suggest that once you have eliminated all duplicates, you add such a constraint to keep this from happening again.
Now for your question: you can do this by grouping on the duplicate columns and keeping only the groups that contain more than one row.
Here is an example
declare @t table (id int, job_number varchar(10), status int, parent_id int)
insert into @t
values (1, '42FWD-42', 0, 0), (2, '42FWD-42', 0, 0), (3, '42FWD-42', 1, 1), (4, '42FWD-42', 1, 2), (5, '42FWD-42', 5, 1)
select max(t.id) as id, t.job_number, t.status
from @t t
group by t.job_number, t.status
having count(*) > 1
the result is
id job_number status
2 42FWD-42 0
4 42FWD-42 1
and to get the parent_id as well, you can add a correlated subquery
select max(t.id) as id,
t.job_number,
t.status,
(select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number, t.status
having count(*) > 1
this returns
id job_number status parent_id
2 42FWD-42 0 0
4 42FWD-42 1 2
EDIT
To solve the additional problem from the edit of your question, about case sensitivity, you can use COLLATE both in the selected column and in the comparison (here, the GROUP BY)
this should do it
declare @t table (id int, job_number varchar(10), status int, parent_id int)
insert into @t
values (1, '42FWD-42', 0, 0),
(2, '42FWD-42', 0, 0),
(3, '42FWD-42', 1, 1),
(4, '42fwd-42', 1, 2), -- LOWERCASE !!!
(5, '42FWD-42', 5, 1)
select max(t.id) as id,
t.job_number COLLATE Latin1_General_CS_AS as job_number,
t.status,
(select t2.parent_id from @t t2 where t2.id = max(t.id)) as parent_id
from @t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
and now the result will be
id job_number status parent_id
2 42FWD-42 0 0
Yet another edit
Now, suppose you need to use these duplicate ids in another query; you could do something like this
select t.*
from @t t
where t.id in ( select max(t.id) as id
from @t t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)
What I am doing here is getting only the duplicate ids in a form that can be used to feed a where clause in another query.
This way you can use the result set in any way you wish.
Also note that for this we don't need the subquery that retrieves the parent_id anymore.
One possible use of this could be to delete duplicate rows; you can write
delete from yourtable
where id in ( select max(t.id) as id
from yourtable t
group by t.job_number COLLATE Latin1_General_CS_AS, t.status
having count(*) > 1
)

You can try the ROW_NUMBER window function to find the duplicate status-0 rows per job_number and their ids, then use a recursive CTE to collect all the erroneous records that hang off those ids
Query 1:
;WITH CTE AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY job_number ORDER BY Id) rn
FROM T
WHERE status = 0
), CTE1 AS (
SELECT id,job_number,status,parent_id
FROM CTE
WHERE rn > 1
UNION ALL
SELECT t.id,t.job_number,t.status,t.parent_id
FROM CTE1 c INNER JOIN T t
ON c.id = t.parent_id
)
SELECT *
FROM CTE1
Results:
| id | job_number | status | parent_id |
|----|------------|--------|-----------|
| 2 | 42FWD-42 | 0 | 0 |
| 4 | 42FWD-42 | 1 | 2 |
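Neither answer above uses the "has a status 5 row" rule from the question directly. A rough sketch of one way to express it, assuming a parent row always has parent_id = 0 and every child row points straight at its parent (one level only, as in the sample data). The table variable @t and the names chains, completed and root_id are illustrative, not from the question.

```sql
-- Sample data from the question, held in an illustrative table variable
declare @t table (id int, job_number varchar(50), status int, parent_id int);
insert into @t values
  (1, '42FWD-42', 0, 0),
  (2, '42FWD-42', 0, 0),
  (3, '42FWD-42', 1, 1),
  (4, '42FWD-42', 1, 2),
  (5, '42FWD-42', 5, 1);

with chains as (
    -- tag every row with the id of the parent row it belongs to
    select id,
           job_number collate Latin1_General_CS_AS as job_number,  -- case-sensitive compare
           status,
           parent_id,
           case when parent_id = 0 then id else parent_id end as root_id
    from @t
),
completed as (
    -- chains that contain a status 5 row are treated as the correct ones
    select distinct job_number, root_id
    from chains
    where status = 5
)
select c.id, c.job_number, c.status, c.parent_id
from chains c
where exists (select 1 from completed d              -- the job has a correct chain
              where d.job_number = c.job_number)
  and not exists (select 1 from completed d          -- but this row is not part of it
                  where d.job_number = c.job_number
                    and d.root_id = c.root_id)
order by c.id;
```

On the sample data this returns the rows with Id 2 and Id 4. Jobs where no chain has status 5 drop out through the first exists, which matches the "these rows can be discarded" case in the question.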

SQL Server : except with results from both datasets

I have the following tables:
Stores:
StoreID | Name
1 | Store1
2 | Store2
3 | Store3
Junction table (unnamed):
EmID | StoreID
1 | 1
2 | 1
3 | 1
1 | 2
3 | 2
Employee:
EmID | Employee | Important
1 | Cashier | 1
2 | Manager | 1
3 | Guard | 0
I need a query to return StoreID and EmID where Employee is important (Important = 1) and the store and employee are not connected. Basically, the result should be:
StoreID | EmId
--------+-------
2 | 2
3 | 1
3 | 2
I have tried joins, outer joins / apply-es, except, cte, temporary tables, but still haven't found the answer.
Can someone help me with the code, or at least point me in the right direction?
Any idea will be very much appreciated.
Thanks.

You use a cross join to get the set of all possible employee/store combinations, and a left join to then remove the combinations that exist in the join table [1]:
declare @Stores table (StoreID int, Name char(6))
insert into @Stores (StoreID,Name) values
(1,'Store1'),
(2,'Store2'),
(3,'Store3')
declare @Employees table (EmID int, Employee varchar(8), Important bit)
insert into @Employees (EmID,Employee,Important) values
(1,'Cashier',1),
(2,'Manager',1),
(3,'Guard' ,0)
declare @Staffing table (EmID int, StoreID int)
insert into @Staffing (EmID,StoreID) values
(1,1),
(2,1),
(3,1),
(1,2),
(3,2)
select *
from @Stores s
cross join @Employees e
left join @Staffing st
on s.StoreID = st.StoreID and
e.EmID = st.EmID
where e.Important = 1 and
st.EmID is null
Results:
StoreID Name EmID Employee Important EmID StoreID
----------- ------ ----------- -------- --------- ----------- -----------
3 Store3 1 Cashier 1 NULL NULL
2 Store2 2 Manager 1 NULL NULL
3 Store3 2 Manager 1 NULL NULL
[1] The one I've named Staffing and you didn't name in the question. Note also (for future questions) that my presentation of the sample data takes up approximately as much space as yours in the question, provides the data types, and is a runnable script.

Use a cross join followed by a left join, then filter on imp = 1 and on storeID being null.
create table #Stores
(storeID int, Name varchar(100))
create table #ES
(empid int,storeID int)
create table #E
(eid int,employee varchar(100), imp int)
insert into #stores values(
1,'Store1'),
(2,'Store2'),
(3,'Store3')
insert into #ES values(
1,1),(2,1),(3,1),(1,2),(3,2)
insert into #E values
(1,'Cashier',1),
(2,'Manager', 1),
(3,'Guard',0)
select * from #Stores
select * from #ES
select * from #E
select #stores.storeid,#E.eid from #Stores
cross join #E
LEFT join #ES
on #ES.storeid = #Stores.storeid
and #E.eid = #ES.empid
where #E.imp = 1
and #ES.storeID is null

Try this query.
I assumed the "Employee" table is dbo.Employees, the "Stores" table is dbo.Stores and the intermediate table is dbo.EmpStore.
SELECT S.StoreID, E.EmID
FROM dbo.Stores S
CROSS JOIN dbo.Employees E
LEFT JOIN dbo.EmpStore ES ON ES.EmID = E.EmID AND ES.StoreID = S.StoreID
WHERE E.Important=1 AND ES.EmID IS NULL
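Since the title asks about EXCEPT: a minimal sketch of the same result written with EXCEPT instead of a left join. The table variables @Stores, @Employees and @Staffing follow the first answer (Staffing is that answer's name for the unnamed junction table, not the question's).

```sql
declare @Stores table (StoreID int, Name char(6));
insert into @Stores (StoreID, Name) values
  (1, 'Store1'), (2, 'Store2'), (3, 'Store3');

declare @Employees table (EmID int, Employee varchar(8), Important bit);
insert into @Employees (EmID, Employee, Important) values
  (1, 'Cashier', 1), (2, 'Manager', 1), (3, 'Guard', 0);

declare @Staffing table (EmID int, StoreID int);
insert into @Staffing (EmID, StoreID) values
  (1, 1), (2, 1), (3, 1), (1, 2), (3, 2);

-- every store paired with every important employee ...
select s.StoreID, e.EmID
from @Stores s
cross join @Employees e
where e.Important = 1
except
-- ... minus the pairs that already exist in the junction table
select st.StoreID, st.EmID
from @Staffing st
order by StoreID, EmID;
```

On the sample data this gives (2, 2), (3, 1) and (3, 2). EXCEPT compares whole rows, so both selects have to list the columns in the same order; it also removes duplicate rows, which makes no difference here.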

Update Table1 adding values from Table2

Table1
PK_Table1 | Name | DoYouGoToSchool | DoYouhaveACar | DoYouWorkFullTime | DoYouWorkPartTime | Score
1 | joe | Yes | Yes | No | Yes |
2 | amy | No | Yes | Yes | No |
Table2
Pk_Table2 | Question | Answer (bit column) | Value
1 | DoYouGoToSchool | True | 3
2 | DoYouhaveACar | True | 2
3 | DoYouWorkFullTime | True | 4
4 | DoYouWorkPartTime | True | 2
Based on the information in Table2, I need to UPDATE the Score column in Table1 by summing the Value from Table2 for every question the person answered Yes.
For example, I expect the Score column in Table1 to be 7 for record 1 (3 + 2 + 2)
and 6 for record 2 (2 + 4).
Here is a script to play with
IF OBJECT_ID('tempdb..#Table2') IS NOT NULL DROP TABLE #Table2
GO
IF OBJECT_ID('tempdb..#Table1') IS NOT NULL DROP TABLE #Table1
GO
create table #Table1
(
PK_Table1 int,
Name Varchar(50),
DoYouGoToSchool Varchar(8),
DoYouhaveACar Varchar(8),
DoYouWorkFullTime Varchar(8),
DoYouWorkPartTime Varchar(8),
Score INT NULL,
)
create table #Table2
(
PK_Table2 int,
Questions Varchar(50),
Answer BIT NOT NULL DEFAULT(0),
VALUE INT NULL
)
-- the key columns are not identity columns, so the keys are inserted explicitly
INSERT INTO #Table1 (PK_Table1,Name,DoYouGoToSchool,DoYouhaveACar,DoYouWorkFullTime,DoYouWorkPartTime)
VALUES (1,'joe','Yes','Yes','No','Yes'), (2,'amy','No','Yes','Yes','No')
INSERT INTO #Table2(PK_Table2,Questions,Answer,VALUE)
VALUES (1,'DoYouGoToSchool','True',3 ),(2,'DoYouhaveACar','True',2 ),(3,'DoYouWorkFullTime','True',4 ),(4,'DoYouWorkPartTime','True',2 )
This is what is missing from the answer below, which tells you to add a new FK column to Table2:
--Inserting data into the table with the new FK column
insert into #Table2 (FK_Table1, Questions, Answer)
-- 'Yes'/'No' cannot be converted to bit implicitly, so map them explicitly
select t.PK_Table1, t1.cols, case when t1.colsval = 'Yes' then 1 else 0 end
from #Table1 t
cross apply (values
(PK_Table1, 'DoYouGoToSchool', DoYouGoToSchool),
(PK_Table1, 'DoYouhaveACar', DoYouhaveACar),
(PK_Table1, 'DoYouWorkFullTime', DoYouWorkFullTime),
(PK_Table1, 'DoYouWorkPartTime', DoYouWorkPartTime)
) t1 (PK_Table1, cols, colsval);

First create a relation between these two tables and add Primary key of Table1 in Table2 as a foreign key so your Table2 becomes:
Table2 Columns:
FK_Table1 |Pk_Table2 |Question | Answer(Bit Column) |Value
1 1 DoYouGoToSchool True 3
1 2 DoYouhaveACar True 2
1 3 DoYouWorkFullTime True 4
1 4 DoYouWorkPartTime True 2
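You can add the column with this query:
-- the constraint name FK_Table2_Table1 is illustrative; PK_Table1 must be a primary key or unique
ALTER TABLE Table2
ADD FK_Table1 INTEGER
CONSTRAINT FK_Table2_Table1 FOREIGN KEY REFERENCES Table1(PK_Table1)
In the table above, FK_Table1 = 1 means that the row belongs to the person whose PK_Table1 = 1.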
Then you can extract his score from this query:
SELECT Sum(Value) FROM Table2 WHERE FK_Table1 = 1;
And then update query:
UPDATE Table1
SET score = (enter here the returned score from above query)
WHERE PK_Table1 = 1;
Or you can do in a single query like this:
UPDATE Table1
SET score = (SELECT Sum(Value) FROM Table2 WHERE FK_Table1 = 1)
WHERE PK_Table1 = 1;

You will need to add another table. This table will be your relational table. It can be called Table1_Table2, with three columns. The first column will be the primary key of the table itself. The next column will be the primary key of Table1 and the third column will be the primary key of Table2.
When a row of Table2 applies to a row of Table1, insert a record into Table1_Table2 that links the two through their primary keys. A query on the relational table Table1_Table2 then lets you sum over the relationships.
|Table1_Table2 |
| PK | PK_Table1 | PK_Table2 |
| 1 | 1 | 1 |
| 2 | 1 | 3 |
| 3 | 2 | 1 |
| 4 | 2 | 4 |
As we can see, we can now perform an update on Table1
UPDATE A
SET A.Score = (SELECT SUM(B.Value)
FROM Table2 B
INNER JOIN Table1_Table2 C ON C.PK_Table2 = B.PK_Table2
WHERE C.PK_Table1 = A.PK_Table1)
FROM Table1 A;
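If changing the schema is not an option, the score can also be computed straight from the wide Table1. This is only a sketch under a few assumptions: the question columns are unpivoted with a VALUES list, matched to Table2 by name, and a question counts only when the reply is 'Yes' and Table2.Answer is 1. Table variables stand in for the question's temp tables, and the names a, s, Question, Reply and Total are illustrative.

```sql
declare @Table1 table (
    PK_Table1 int, Name varchar(50),
    DoYouGoToSchool varchar(8), DoYouhaveACar varchar(8),
    DoYouWorkFullTime varchar(8), DoYouWorkPartTime varchar(8),
    Score int null);
declare @Table2 table (
    PK_Table2 int, Questions varchar(50),
    Answer bit not null default(0), VALUE int null);

insert into @Table1 (PK_Table1, Name, DoYouGoToSchool, DoYouhaveACar, DoYouWorkFullTime, DoYouWorkPartTime)
values (1, 'joe', 'Yes', 'Yes', 'No', 'Yes'),
       (2, 'amy', 'No',  'Yes', 'Yes', 'No');
insert into @Table2 (PK_Table2, Questions, Answer, VALUE)
values (1, 'DoYouGoToSchool', 1, 3), (2, 'DoYouhaveACar', 1, 2),
       (3, 'DoYouWorkFullTime', 1, 4), (4, 'DoYouWorkPartTime', 1, 2);

update t1
set Score = s.Total
from @Table1 t1
cross apply (
    -- turn the four answer columns of the current row into (question, reply) pairs
    select isnull(sum(t2.VALUE), 0) as Total
    from (values ('DoYouGoToSchool',   t1.DoYouGoToSchool),
                 ('DoYouhaveACar',     t1.DoYouhaveACar),
                 ('DoYouWorkFullTime', t1.DoYouWorkFullTime),
                 ('DoYouWorkPartTime', t1.DoYouWorkPartTime)) a (Question, Reply)
    join @Table2 t2 on t2.Questions = a.Question
    where a.Reply = 'Yes'      -- assumes 'Yes' is the only scoring reply
      and t2.Answer = 1
) s;

select PK_Table1, Name, Score from @Table1 order by PK_Table1;
```

With the sample rows joe ends up with 7 (3 + 2 + 2) and amy with 6 (2 + 4). The VALUES list has to be kept in sync with the question columns by hand, which is the main argument for the normalised designs in the answers above.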

Lookup primary ID from multiple tables having another (unique) field

I'm trying to add values in a junction table of a many to many relationship.
Tables look like these (all IDs are integers):
Table A
+------+----------+
| id_A | ext_id_A |
+------+----------+
| 1 | 100 |
| 2 | 101 |
| 3 | 102 |
+------+----------+
Table B is conceptually similar
+------+----------+
| id_B | ext_id_B |
+------+----------+
| 1 | 200 |
| 2 | 201 |
| 3 | 202 |
+------+----------+
The primary keys of the tables are id_A and id_B, and the columns of my junction table are foreign keys to them, but I have to insert values knowing only the external ids (ext_id_A, ext_id_B).
The external ids are unique columns (and therefore 1:1 with the table's own id), so with an ext_id I can look up the exact row and get the id I need to insert into the junction table.
This is an example of what I've done so far, but doesn't look like an optimized sql statement:
-- Example table I receive with test values
declare @temp as table (
ext_id_a int not null,
ext_id_b int not null
);
insert into @temp values (100, 200), (101, 200), (101, 201);
--Insertion - code from my sp
declare @final as table (
id_a int not null,
id_b int not null
);
insert into @final
select a.id_a, b.id_b
from @temp as t
inner join table_a a on a.ext_id_a = t.ext_id_a
inner join table_b b on b.ext_id_b = t.ext_id_b;
merge into junction_table as jt
using @final as f
on f.id_a = jt.id_a and f.id_b = jt.id_b
when not matched by target then
insert (id_a, id_b) values (f.id_a, f.id_b);
I was thinking about a MERGE statement since my stored procedure receives the data in a table-valued parameter and I also have to check for already existing references.
Is there anything I can do to improve the insertion of these values?

No need to use the @final table variable:
; with cte as (
select tA.id_A, tB.id_B
from @temp t
join table_A tA on t.ext_id_a = tA.ext_id_A
join table_B tB on t.ext_id_B = tB.ext_id_B
)
merge into junction_table
using cte
on cte.id_A = junction_table.id_A and cte.id_B = junction_table.id_B
when not matched by target then
insert (id_A, id_B) values (cte.id_A, cte.id_B);
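As only the "not matched" branch is used, a plain INSERT with NOT EXISTS is another option. This is a sketch rather than a drop-in replacement: the table variables @table_a, @table_b and @junction stand in for the real tables, and DISTINCT is my addition in case the incoming rows contain the same pair twice.

```sql
declare @table_a table (id_a int primary key, ext_id_a int unique);
declare @table_b table (id_b int primary key, ext_id_b int unique);
declare @junction table (id_a int, id_b int, primary key (id_a, id_b));

insert into @table_a values (1, 100), (2, 101), (3, 102);
insert into @table_b values (1, 200), (2, 201), (3, 202);
insert into @junction values (1, 1);   -- a reference that already exists

declare @temp table (ext_id_a int not null, ext_id_b int not null);
insert into @temp values (100, 200), (101, 200), (101, 201);

insert into @junction (id_a, id_b)
select distinct a.id_a, b.id_b
from @temp t
join @table_a a on a.ext_id_a = t.ext_id_a   -- look up internal ids
join @table_b b on b.ext_id_b = t.ext_id_b
where not exists (select 1                   -- skip pairs already present
                  from @junction j
                  where j.id_a = a.id_a and j.id_b = b.id_b);

select id_a, id_b from @junction order by id_a, id_b;
```

After the insert the junction holds (1, 1), (2, 1) and (2, 2); the pair that was already there is not inserted again. Like MERGE, this does not by itself protect against two sessions inserting the same pair at the same time, so a primary key or unique constraint on (id_a, id_b) is still worth having.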

SQL Update row column with random lookup value

I am trying to update a lead table to assign a random person from a lookup table. Here is the generic schema:
TableA (header),
ID int,
name varchar (30)
TableB (detail),
ID int,
fkTableA int, (foreign key to TableA.ID)
recordOwner varchar(30) null
other detail colums..
TableC (owners),
ID int,
fkTableA int (foreign key to TableA.ID)
name varchar(30)
TableA has 10 entries, one for each type of sales lead pool. TableB has thousands of entries for each row in TableA. I want to assign the matching recordOwners from TableC to an even number of rows each (or as close as I can). TableC will have anywhere from one to 10 entries for each row in TableA.
Can this be done in one statement? It doesn't have to be. I can't seem to figure out the best approach for speed. Any thoughts or samples are appreciated.
Updated:
TableA has a one-to-many relationship with TableC. For every record in TableA, TableC will have at least one row, which represents an owner that needs to be assigned to rows in TableB.
TableA
int name
1 LeadSourceOne
2 LeadSourceTwo
TableC
int(id) int(fkTableA) varchar(name)
1 1 Tom
2 1 Bob
3 2 Timmy
4 2 John
5 2 Steve
6 2 Bill
TableB initial data
int(id) int(fkTableA) varchar(recordOwner) (other detail columns)
1 1 NULL ....
2 1 NULL ....
3 1 NULL ....
4 2 NULL ....
5 2 NULL ....
6 2 NULL ....
7 2 NULL ....
8 2 NULL ....
9 2 NULL ....
TableB end result
int(id) int(fkTableA) varchar(recordOwner) (other detail columns)
1 1 TOM ....
2 1 BOB ....
3 1 TOM ....
4 2 TIMMY ....
5 2 JOHN ....
6 2 STEVE ....
7 2 BILL ....
8 2 TIMMY ....
9 2 BILL ....
Basically I need to randomly assign a record from tableC to tableB based on the relationship to tableA.

UPDATE TableB
SET recordOwner = (SELECT TOP 1 coalesce(TableC.name,'')
FROM TableC
INNER JOIN TableA ON TableC.fkTableA = TableA.ID
WHERE TableA.ID = TableB.fkTableA)
This should work, but it's not tested.

Try this:
UPDATE TableB
SET recordOwner = (SELECT TOP(1) [name]
FROM TableC
ORDER BY NEWID())
WHERE recordOwner IS NULL

I ended up looping through the owners and updating x percent of the detail records based on how many owners I had. The end result is something like this:
declare @percentUpdate numeric(8,2), @userFullName varchar(30)
create table #tb_owners(userId varchar(30), processed bit)
insert into #tb_owners(
userId,
processed)
select userId = name,
processed = 0
from tableC
where fkTableA = 1
-- 100.0 forces decimal division; 100 / count(*) would truncate to an integer
select @percentUpdate = cast(100.0 / count(*) as numeric(8,2))
from #tb_owners
while exists(select 1 from #tb_owners o where o.processed = 0)
begin
-- pick a random owner that has not been processed yet
select top 1
@userFullName = o.userId
from #tb_owners o
where o.processed = 0
order by newId()
update ptbpd
set recordOwner = @userFullName
from tableB ptbpd
inner join (
select top (@percentUpdate) percent
id
from tableB
where recordOwner is null
order by newId()
) nd on (ptbpd.id = nd.id)
update #tb_owners
set processed = 1
where userId = @userFullName
end
--there may be some left over, set to last person used
update tableB
set recordOwner = @userFullName
where recordOwner is null
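For comparison, a set-based sketch that avoids the loop: shuffle the owners of each pool once, number the unassigned leads per pool, and deal the owners out round-robin with a modulo. The table variables and the names @owners, leads, ownerNo, ownerCount and leadNo are illustrative. The shuffled owners are stored in @owners first so the random numbering is computed exactly once.

```sql
declare @TableC table (ID int, fkTableA int, name varchar(30));
declare @TableB table (ID int, fkTableA int, recordOwner varchar(30) null);

insert into @TableC values
  (1, 1, 'Tom'), (2, 1, 'Bob'),
  (3, 2, 'Timmy'), (4, 2, 'John'), (5, 2, 'Steve'), (6, 2, 'Bill');
insert into @TableB (ID, fkTableA) values
  (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 2), (8, 2), (9, 2);

-- owners of each pool in random order, numbered 1..ownerCount
declare @owners table (fkTableA int, name varchar(30), ownerNo int, ownerCount int);
insert into @owners (fkTableA, name, ownerNo, ownerCount)
select fkTableA, name,
       row_number() over (partition by fkTableA order by newid()),
       count(*) over (partition by fkTableA)
from @TableC;

with leads as (
    -- unassigned leads of each pool, numbered 1..n
    select ID, fkTableA,
           row_number() over (partition by fkTableA order by ID) as leadNo
    from @TableB
    where recordOwner is null
)
update b
set recordOwner = o.name
from @TableB b
join leads l on l.ID = b.ID
join @owners o on o.fkTableA = l.fkTableA
              and o.ownerNo = (l.leadNo - 1) % o.ownerCount + 1;  -- round-robin

select ID, fkTableA, recordOwner from @TableB order by ID;
```

Within a pool the number of leads per owner differs by at most one, so pool 1 is split 2/1 and pool 2 is split 2/2/1/1 on the sample data. Which owner gets the extra leads changes from run to run because of newid().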
