Update postgres table to squash duplicate values in second table


I have a postgresql schema with two tables:
tableA:
| id | username |
| 1 | user1 |
| 2 | user1 |
| 3 | user1 |
| 4 | user2 |
| 5 | user2 |
| 6 | user3 |
tableB:
| fk_id | resource |
| 2 | item1 |
| 1 | item3 |
| 1 | item2 |
| 4 | item5 |
| 5 | item8 |
| 3 | item9 |
The foreign key fk_id in tableB references id in tableA.
How can I update all of the foreign key IDs in tableB so that each points to the lowest id for its username in tableA?

update table_b b
set fk_id = d.id
from table_a a
join (
select distinct on (username) username, id
from table_a
order by 1, 2
) d using(username)
where a.id = b.fk_id;
The query used inside the update gives actual_id, username, desired_id:
select a.id actual_id, username, d.id desired_id
from table_a a
join (
select distinct on (username) username, id
from table_a
order by 1, 2
) d using(username)
actual_id | username | desired_id
-----------+----------+------------
1 | user1 | 1
2 | user1 | 1
3 | user1 | 1
4 | user2 | 4
5 | user2 | 4
6 | user3 | 6
(6 rows)

We define your tables:
CREATE TABLE tableA (id, username) AS
SELECT * FROM
(
VALUES
(1, 'user1'),
(2, 'user1'),
(3, 'user1'),
(4, 'user2'),
(5, 'user2'),
(6, 'user3')
) AS x ;
CREATE TABLE tableB (fk_id, resource) AS
SELECT * FROM
(
VALUES
(2, 'item1'),
(1, 'item3'),
(1, 'item2'),
(4, 'item5'),
(5, 'item8'),
(3, 'item9')
) AS x ;
With that info, you can create a (virtual) conversion table, and use it to update your data:
-- Using tableA, make a new table with the
-- minimum id for every username
WITH username_to_min_id AS
(
SELECT
min(id) AS min_id, username
FROM
tableA
GROUP BY
username
)
-- Convert the previous table to a id -> min_id
-- conversion table
, id_to_min_id AS
(
SELECT
id, min_id
FROM
tableA
JOIN username_to_min_id USING(username)
)
-- Use this conversion table to update tableB
UPDATE
tableB
SET
fk_id = min_id
FROM
id_to_min_id
WHERE
-- JOIN condition with table to update
id_to_min_id.id = tableB.fk_id
-- Take out the ones that won't change
AND (fk_id <> min_id)
RETURNING
* ;
The result you would get is:
+-------+----------+----+--------+
| fk_id | resource | id | min_id |
+-------+----------+----+--------+
| 1 | item1 | 2 | 1 |
| 1 | item9 | 3 | 1 |
| 4 | item8 | 5 | 4 |
+-------+----------+----+--------+
This shows that three rows were updated: they had fk_id values of 2, 3 and 5, and now have 1, 1 and 4. (The id column is the "old" fk_id value.)
You can check it at http://rextester.com/EQPH47434
You can "squeeze everything" (replace every virtual table name with its definition and apply a couple of SELECT simplifications) to get this equivalent query, which is probably less clear:
UPDATE
tableB
SET
fk_id = min_id
FROM
tableA
JOIN
(
SELECT
min(id) AS min_id, username
FROM
tableA
GROUP BY
username
) AS username_to_min_id
USING (username)
WHERE
tableA.id = tableB.fk_id
AND (fk_id <> min_id)
RETURNING
* ;
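To check the remapping logic without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is only a stand-in under stated assumptions: the data follows the question (id 6 belongs to user3), and a correlated subquery replaces UPDATE ... FROM so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tableB (fk_id INTEGER REFERENCES tableA(id), resource TEXT);
    INSERT INTO tableA VALUES (1,'user1'),(2,'user1'),(3,'user1'),
                              (4,'user2'),(5,'user2'),(6,'user3');
    INSERT INTO tableB VALUES (2,'item1'),(1,'item3'),(1,'item2'),
                              (4,'item5'),(5,'item8'),(3,'item9');
""")

# Point every fk_id at the lowest id that shares its username.
conn.execute("""
    UPDATE tableB
    SET fk_id = (
        SELECT MIN(a2.id)
        FROM tableA AS a1
        JOIN tableA AS a2 ON a2.username = a1.username
        WHERE a1.id = tableB.fk_id
    )
""")

remapped = conn.execute(
    "SELECT fk_id, resource FROM tableB ORDER BY resource"
).fetchall()
print(remapped)
```

After the update, only ids 1 and 4 remain referenced from tableB, so the duplicate rows 2, 3 and 5 in tableA could then be deleted safely.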

Separate comma values into individual values

I need to separate columns in SQL Server
Table: columnsseparates
CREATE TABLE [dbo].[columnsseparates](
[id] [varchar](50) NULL,
[name] [varchar](500) NULL
)
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'1,2,3,4', N'abc,xyz,mn')
GO
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'4,5,6', N'xy,yz')
GO
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'7,100', N'yy')
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'101', N'oo,yy')
GO
based on above data I want output like below:
id | Name
1 |abc
2 |xyz
3 |mn
4 |null
4 |xy
5 |yz
6 |null
7 |yy
100 |null
101 |oo
null |yy
How to achieve this task in SQL Server?

Storing non-atomic values in a column is a sign that the schema should be normalised.
Naive approach using PARSENAME (handles up to 4 comma-separated values):
SELECT DISTINCT s.id, s.name
FROM [dbo].[columnsseparates]
CROSS APPLY(SELECT REVERSE(REPLACE(id,',','.')) id,REVERSE(REPLACE(name, ',','.')) name) sub
CROSS APPLY(VALUES (REVERSE(PARSENAME(sub.id,1)), REVERSE(PARSENAME(sub.name,1))),
(REVERSE(PARSENAME(sub.id,2)), REVERSE(PARSENAME(sub.name,2))),
(REVERSE(PARSENAME(sub.id,3)), REVERSE(PARSENAME(sub.name,3))),
(REVERSE(PARSENAME(sub.id,4)), REVERSE(PARSENAME(sub.name,4)))
) AS s(id, name)
ORDER BY s.id;
Output:
+------+------+
| id | name |
+------+------+
| | |
| | yy |
| 1 | abc |
| 100 | |
| 101 | oo |
| 2 | xyz |
| 3 | mn |
| 4 | |
| 4 | xy |
| 5 | yz |
| 6 | |
| 7 | yy |
+------+------+

If you have more than 4 values, then you'll need to use a string splitter that can return the ordinal position of each item. I use DelimitedSplit8K_LEAD here:
WITH Ids AS(
SELECT cs.id,
cs.name,
DS.ItemNumber,
DS.Item
FROM dbo.columnsseparates cs
CROSS APPLY dbo.DelimitedSplit8K_LEAD (cs.id,',') DS),
Names AS (
SELECT cs.id,
cs.name,
DS.ItemNumber,
DS.Item
FROM dbo.columnsseparates cs
CROSS APPLY dbo.DelimitedSplit8K_LEAD (cs.[name],',') DS)
SELECT I.Item AS ID,
N.Item AS [Name]
FROM Ids I
FULL OUTER JOIN Names N ON I.id = N.id
AND I.ItemNumber = N.ItemNumber
ORDER BY CASE WHEN I.Item IS NULL THEN 1 ELSE 0 END,
TRY_CONVERT(int,I.Item);
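The idea behind the query above (split both lists, then pair items by their ordinal position and pad the shorter side with NULL) can be sketched outside the database. This is only an illustration in Python, not the SQL Server solution itself; the helper name split_pairs is made up for the example.

```python
from itertools import zip_longest

# Sample rows from the question: (id list, name list)
rows = [
    ("1,2,3,4", "abc,xyz,mn"),
    ("4,5,6", "xy,yz"),
    ("7,100", "yy"),
    ("101", "oo,yy"),
]

def split_pairs(id_list, name_list):
    """Pair the n-th id with the n-th name; the shorter list is padded with None."""
    ids = id_list.split(",")
    names = name_list.split(",")
    return list(zip_longest(ids, names))  # fillvalue defaults to None

# Flatten the per-row pairs into one result list, keeping row order.
pairs = [pair
         for id_list, name_list in rows
         for pair in split_pairs(id_list, name_list)]

for pair in pairs:
    print(pair)
```

Pairing by position is what the FULL OUTER JOIN on ItemNumber does: an item with no counterpart at the same position ends up next to a NULL.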

How to Select all Entrys from one Table, and SUM a subset of another table

I have a large database of time entries made by employees. For each entry they record an activity, when it took place, how long they spent on it, and the customer.
I'm now trying to return a table of all employees with their summed times, counting only time booked on a subset of customers. I can get either a table with the correct times in which employees without matching entries are omitted, or a table with all employees but with times summed over all customers.
The tables I have are:
EMPLOYEE for the employees
ACTIVITY for all activities
CUSTOMER for the customers
Some example data:
| EMPLOYEE | | ACTIVITY |
+------------+---------+ +------------+------------+------------+
| I_EMPLOYEE | S_NAME1 | | I_EMPLOYEE | I_CUSTOMER | N_DURETIME |
+------------+---------+ +------------+------------+------------+
| 1 | A | | 1 | 1 | 5 |
| 2 | B | | 2 | 3 | 10 |
| 3 | C | | 1 | 3 | 15 |
+------------+---------+ | 3 | 2 | 10 |
| 1 | 2 | 10 |
+------------+------------+------------+
What I'd expect to get when I want all times except those for customer 2:
+----------+----------+
| EMPLOYEE | DURETIME |
+----------+----------+
| 1 | 20 |
| 2 | 10 |
| 3 | - |
+----------+----------+
Instead, I get one of these two results:
+----------+----------+ +----------+----------+
| EMPLOYEE | DURETIME | | EMPLOYEE | DURETIME |
+----------+----------+ +----------+----------+
| 1 | 20 | | 1 | 30 |
| 2 | 10 | | 2 | 10 |
+----------+----------+ | 3 | 10 |
+----------+----------+
To get the correct times I use the following:
SELECT emp.S_NAME1 AS Mitarbeiter, SUM(act.N_DURETIME)/60 as Zeit
FROM EMPLOYEE AS emp
LEFT JOIN ACTIVITY AS act on act.I_EMPLOYEE = emp.I_EMPLOYEE
LEFT JOIN CUSTOMER AS cust on cust.I_CUSTOMER = act.I_CUSTOMER
WHERE cust.CUSTNO NOT '2'
To get the full list of employees I used:
SELECT emp.S_NAME1 AS Mitarbeiter, SUM(act.N_DURETIME)/60 as Zeit
FROM EMPLOYEE AS emp
LEFT JOIN ACTIVITY AS act on act.I_EMPLOYEE = emp.I_EMPLOYEE
LEFT JOIN CUSTOMER AS cust on cust.I_CUSTOMER = act.I_CUSTOMER AND cust.CUSTNO NOT '2'
So, depending on whether I put my "Customer Filter" in the JOIN or the WHERE statement, I get half of the correct table. How can I combine those to get the correct output?

Create Table #emp
(
i_emp Int,
s_name1 Char(1)
)
Insert Into #emp Values
(1,'A'),
(2,'B'),
(3,'C')
Create Table #Activity
(
i_emp Int,
i_cust Int,
n_duretime Int
)
Insert Into #Activity Values
(1,1,5),
(2,3,10),
(1,3,15),
(3,2,10),
(1,2,10)
Query
Select
e.i_emp,
Sum(Case When a.i_cust = 2 Then Null Else a.n_duretime End) As durationTot
From
#emp e Left Join
#Activity a On e.i_emp = a.i_emp
Group By
e.i_emp
Result:
i_emp durationTot
1 20
2 10
3 NULL
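An alternative to the CASE expression is to put the customer filter into the ON clause of the join to the activity table itself, so that unmatched employees still survive the LEFT JOIN. The sketch below checks this with Python's built-in sqlite3 module; it is a stand-in for the real database, using the column names and sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (I_EMPLOYEE INTEGER, S_NAME1 TEXT);
    CREATE TABLE ACTIVITY (I_EMPLOYEE INTEGER, I_CUSTOMER INTEGER,
                           N_DURETIME INTEGER);
    INSERT INTO EMPLOYEE VALUES (1,'A'),(2,'B'),(3,'C');
    INSERT INTO ACTIVITY VALUES (1,1,5),(2,3,10),(1,3,15),(3,2,10),(1,2,10);
""")

# The filter sits in the ON clause of the ACTIVITY join, so employees whose
# only activities are for customer 2 are kept with a NULL total.
totals = conn.execute("""
    SELECT e.I_EMPLOYEE, SUM(a.N_DURETIME) AS DURETIME
    FROM EMPLOYEE AS e
    LEFT JOIN ACTIVITY AS a
           ON a.I_EMPLOYEE = e.I_EMPLOYEE
          AND a.I_CUSTOMER <> 2
    GROUP BY e.I_EMPLOYEE
    ORDER BY e.I_EMPLOYEE
""").fetchall()
print(totals)
```

Filtering in WHERE instead would discard employee 3 after the join, which is the first of the two unwanted results in the question.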

You can try the following query:
create table Employee(I_EMPLOYEE int, S_NAME1 char(1))
insert into Employee values (1, 'A'), (2, 'B'), (3, 'C')
create table ACTIVITY (I_EMPLOYEE int, I_CUSTOMER int, N_DURETIME int)
insert into ACTIVITY values (1, 1, 5), (2, 3, 10), (1, 3, 15), (3, 2, 10), (1, 2, 10)
-- Start from Employee and LEFT JOIN so employees without any activity are kept;
-- time booked on customer 2 counts as 0.
select I_EMPLOYEE, EMPLOYEE, sum(isnull(DURETIME, 0)) as DURETIME from (
select Employee.I_EMPLOYEE, Employee.S_NAME1 as EMPLOYEE, case I_CUSTOMER when 2 then 0 else N_DURETIME end as DURETIME from Employee
left join ACTIVITY on ACTIVITY.I_EMPLOYEE = Employee.I_EMPLOYEE
) a group by I_EMPLOYEE, EMPLOYEE
Below is the output
I_EMPLOYEE EMPLOYEE DURETIME
--------------------------------
1 A 20
2 B 10
3 C 0

How could I update multiple columns in Oracle with same id?

I am new in Oracle SQL and I am trying to make an update of a table with the next context:
I have a table A:
+---------+---------+---------+----------+
| ColumnA | name | ColumnC | Column H |
+---------+---------+---------+----------+
| 1 | Harry | null | null |
| 2 | Harry | null | null |
| 3 | Harry | null | null |
+---------+---------+---------+----------+
And a table B:
+---------+---------+---------+
| name | ColumnE | ColumnF |
+---------+---------+---------+
| Harry | a | d |
| Ron | b | e |
| Hermione| c | f |
+---------+---------+---------+
And I want to update the table A so that the result will be the next:
+---------+---------+---------+----------+
| ColumnA | name | ColumnC | Column H |
+---------+---------+---------+----------+
| 1 | Harry | a | d |
| 2 | Harry | a | d |
| 3 | Harry | a | d |
+---------+---------+---------+----------+
How could I do it?

merge into tableA a
using tableB b
on (a.name=b.name)
when matched then update set
columnC = b.columnE,
columnH = b.columnF
create table tableA (columnC varchar2(20), columnH varchar2(20), name varchar2(20), columnA number);
create table tableB (columnE varchar2(20), columnF varchar2(20), name varchar2(20));
insert into tableA values (null, null,'Harry',1);
insert into tableA values (null, null,'Harry',2);
insert into tableA values (null, null,'Harry',3);
insert into tableB values ('a', 'd','Harry');
insert into tableB values ('b', 'e','Ron');
insert into tableB values ('c', 'f','Hermione');
select * from tableA;
merge into tableA a
using tableB b
on (a.name=b.name)
when matched then update set
columnC = b.columnE,
columnH = b.columnF;
select * from tableA;
I got no error

UPDATE tableA t1
SET (ColumnC, ColumnH) = (SELECT t2.ColumnE, t2.ColumnF
FROM tableB t2
WHERE t1.name = t2.name)
WHERE EXISTS (
SELECT 1
FROM tableB t2
WHERE t1.name = t2.name)
This should work. You can refer to this answer for more info:
Oracle SQL: Update a table with data from another table
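The WHERE EXISTS guard in the statement above matters as soon as table A contains names that are missing from table B. The sketch below illustrates this with Python's built-in sqlite3 module rather than Oracle, so the syntax differs slightly (two scalar subqueries instead of a tuple assignment). The 'Neville' row and the helper run_update are made up for the example.

```python
import sqlite3

SETUP = """
    CREATE TABLE tableA (ColumnA INTEGER, name TEXT, ColumnC TEXT, ColumnH TEXT);
    CREATE TABLE tableB (name TEXT, ColumnE TEXT, ColumnF TEXT);
    INSERT INTO tableA VALUES (1,'Harry',NULL,NULL),(2,'Harry',NULL,NULL),
                              (3,'Harry',NULL,NULL),
                              (4,'Neville','x','y');  -- hypothetical unmatched row
    INSERT INTO tableB VALUES ('Harry','a','d'),('Ron','b','e'),
                              ('Hermione','c','f');
"""

UPDATE = """
    UPDATE tableA
    SET ColumnC = (SELECT b.ColumnE FROM tableB AS b WHERE b.name = tableA.name),
        ColumnH = (SELECT b.ColumnF FROM tableB AS b WHERE b.name = tableA.name)
    {guard}
"""

GUARD = "WHERE EXISTS (SELECT 1 FROM tableB AS b WHERE b.name = tableA.name)"

def run_update(with_guard):
    """Run the update on a fresh in-memory database and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(UPDATE.format(guard=GUARD if with_guard else ""))
    return conn.execute(
        "SELECT ColumnA, ColumnC, ColumnH FROM tableA ORDER BY ColumnA"
    ).fetchall()

guarded = run_update(True)     # unmatched row keeps its values
unguarded = run_update(False)  # unmatched row is overwritten with NULLs
print(guarded)
print(unguarded)
```

Without the guard, the subquery returns no row for 'Neville', so both columns are set to NULL.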

I think you can use the query below to update table A.
With the sample data it sets all rows to 'a' and 'd':
update tableA a
set (columnC, columnH) = (select b.columnE, b.columnF
from tableB b
where b.name = a.name);
Alternatively you can also use:
UPDATE (SELECT T2.COLUMNE COLE,
T2.COLUMNF COLF,
T1.COLUMNC COLC,
T1.COLUMNH COLH
FROM tableB T2,
tableA T1
WHERE T1.NAME = T2.NAME)
SET COLC = COLE,
COLH = COLF ;
The output is:
+---------+---------+---------+----------+
| ColumnA | name | ColumnC | Column H |
+---------+---------+---------+----------+
| 1 | Harry | a | d |
| 2 | Harry | a | d |
| 3 | Harry | a | d |
+---------+---------+---------+----------+

Split table in two tables plus a link table

I have a table with three columns that contain duplicate values, but no duplicate rows. Now I want to split this table into two tables with unique values plus a link table. I think the problem gets clearer when I show you example tables:
Original:
| ID | Column_1 | Column_2 | Column_3 |
|----|----------|----------|----------|
| 1 | A | 123 | A1 |
| 2 | A | 123 | A2 |
| 3 | B | 234 | A2 |
| 4 | C | 456 | A1 |
Table_1
| ID | Column_1 | Column_2 |
|----|----------|----------|
| 1 | A | 123 |
| 2 | B | 234 |
| 3 | C | 456 |
Table_2
| ID | Column_3 |
|----|----------|
| 1 | A1 |
| 2 | A2 |
Link-Table
| ID | fk1 | fk2 |
|----|-----|-----|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
| 4 | 3 | 1 |
Table_1 I created like this:
INSERT INTO Table_1(Column_1, Column_2)
SELECT DISTINCT Column_1, Column_2 FROM Original
WHERE Original.Column_1 NOT IN (SELECT Column_1 FROM Table_1)
Table_2 I created in the same way.
The question now is, how to create the Link-Table?
The original table grows continuously, so only new entries should be added.
Do I have to use a Cursor, or is there a better way?
SOLUTION:
MERGE Link_Table AS LT
USING (SELECT DISTINCT T1.ID AS T1ID, T2.ID AS T2ID FROM Original AS O
INNER JOIN Table_1 AS T1 ON T1.Column_1 = O.Column_1
INNER JOIN Table_2 AS T2 ON T2.Column_3 = O.Column_3) AS U
ON LT.fk1 = U.T1ID AND LT.fk2 = U.T2ID
WHEN NOT MATCHED THEN
INSERT (fk1, fk2)
VALUES (U.T1ID, U.T2ID);

You can JOIN all 3 tables to get proper data for link table:
--INSERT INTO [Link-Table]
SELECT t1.ID,
t2.ID
FROM Original o
INNER JOIN Table_1 t1
ON t1.Column_1 = o.Column_1
INNER JOIN Table_2 t2
ON t2.Column_3 = o.Column_3
If your original table will grow, then you need to use MERGE to update/insert new data.
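For the growing-table case, the incremental step can also be written as an INSERT ... SELECT with a NOT EXISTS check on the (fk1, fk2) pair, which is what a MERGE with only a WHEN NOT MATCHED branch amounts to. The sketch below checks the idea with Python's built-in sqlite3 module; it is a stand-in, not SQL Server syntax. The table name Link_Table follows the question's solution, and matching on Column_2 as well is an extra assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Original (ID INTEGER, Column_1 TEXT, Column_2 INTEGER,
                           Column_3 TEXT);
    CREATE TABLE Table_1 (ID INTEGER PRIMARY KEY, Column_1 TEXT,
                          Column_2 INTEGER);
    CREATE TABLE Table_2 (ID INTEGER PRIMARY KEY, Column_3 TEXT);
    CREATE TABLE Link_Table (ID INTEGER PRIMARY KEY, fk1 INTEGER, fk2 INTEGER);
    INSERT INTO Original VALUES (1,'A',123,'A1'),(2,'A',123,'A2'),
                                (3,'B',234,'A2'),(4,'C',456,'A1');
    INSERT INTO Table_1 VALUES (1,'A',123),(2,'B',234),(3,'C',456);
    INSERT INTO Table_2 VALUES (1,'A1'),(2,'A2');
""")

# Insert only those (fk1, fk2) pairs that are not in the link table yet.
INSERT_NEW_LINKS = """
    INSERT INTO Link_Table (fk1, fk2)
    SELECT DISTINCT t1.ID, t2.ID
    FROM Original AS o
    JOIN Table_1 AS t1 ON t1.Column_1 = o.Column_1
                      AND t1.Column_2 = o.Column_2
    JOIN Table_2 AS t2 ON t2.Column_3 = o.Column_3
    WHERE NOT EXISTS (
        SELECT 1 FROM Link_Table AS lt
        WHERE lt.fk1 = t1.ID AND lt.fk2 = t2.ID
    )
"""

conn.execute(INSERT_NEW_LINKS)
conn.execute(INSERT_NEW_LINKS)  # second run finds nothing new to insert

links = conn.execute(
    "SELECT fk1, fk2 FROM Link_Table ORDER BY fk1, fk2"
).fetchall()
print(links)
```

Because the check compares both foreign keys, the statement can be rerun after every load of new rows into Original without creating duplicate links.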

You have to inner join Original, Table_1 and Table_2 to get the desired result.
Try it like this; it's similar to gofr1's post.
-- Table variables are declared with @ (a # prefix is for temp tables created with CREATE TABLE)
DECLARE @Original TABLE (
ID INT
,Column_1 VARCHAR(10)
,Column_2 INT
,Column_3 VARCHAR(10)
)
DECLARE @Table_1 TABLE (
ID INT
,Column_1 VARCHAR(10)
,Column_2 INT
)
DECLARE @Table_2 TABLE (
ID INT
,Column_3 VARCHAR(10)
)
INSERT INTO @Original VALUES
(1,'A',123,'A1')
,(2,'A',123,'A2')
,(3,'B',234,'A2')
,(4,'C',456,'A1')
INSERT INTO @Table_1 VALUES
(1,'A',123)
,(2,'B',234)
,(3,'C',456)
INSERT INTO @Table_2 VALUES
(1,'A1')
,(2,'A2')
-- One link row per original row: its ID plus the matching IDs from both tables
SELECT O.ID AS ID
,T1.ID AS fk1
,T2.ID AS fk2
FROM @Original O
INNER JOIN @Table_1 T1 ON T1.Column_1 = O.Column_1
INNER JOIN @Table_2 T2 ON T2.Column_3 = O.Column_3

SQL statement - join based on date

I need to write a statement joining two tables based on dates.
Table 1 contains time recording entries.
+----+-----------+--------+---------------+
| ID | Date | UserID | DESC |
+----+-----------+--------+---------------+
| 1 | 1.10.2010 | 5 | did some work |
| 2 | 1.10.2011 | 5 | did more work |
| 3 | 1.10.2012 | 4 | me too |
| 4 | 1.11.2012 | 4 | me too |
+----+-----------+--------+---------------+
Table 2 contains the position of each user in the company. The ValidFrom date is the date at which the user has been or will be promoted.
+----+-----------+--------+------------+
| ID | ValidFrom | UserID | Pos |
+----+-----------+--------+------------+
| 1 | 1.10.2009 | 5 | PM |
| 2 | 1.5.2010 | 5 | Senior PM |
| 3 | 1.10.2010 | 4 | Consultant |
+----+-----------+--------+------------+
I need a query which outputs table one with one added column which is the position of the user at the time the entry has been made. (the Date column)
All date fields are of type date.
I hope someone can help. I tried a lot but don't get it working.

Try this using a subselect in the where clause:
MS SQL Server 2008 Schema Setup:
CREATE TABLE TimeRecord
(
ID INT,
[Date] Date,
UserID INT,
Description VARCHAR(50)
)
INSERT INTO TimeRecord
VALUES (1,'2010-01-10',5,'did some work'),
(2, '2011-01-10',5,'did more work'),
(3, '2012-01-10', 4, 'me too'),
(4, '2012-11-01',4,'me too')
CREATE TABLE UserPosition
(
ID Int,
ValidFrom Date,
UserId INT,
Pos VARCHAR(50)
)
INSERT INTO UserPosition
VALUES (1, '2009-01-10', 5, 'PM'),
(2, '2010-05-01', 5, 'Senior PM'),
(3, '2010-01-10', 4, 'Consultant ')
Query 1:
SELECT TR.ID,
TR.[Date],
TR.UserId,
TR.Description,
UP.Pos
FROM TimeRecord TR
INNER JOIN UserPosition UP
ON UP.UserId = TR.UserId
WHERE UP.ValidFrom = (SELECT MAX(ValidFrom)
FROM UserPosition UP2
WHERE UP2.UserId = UP.UserID AND
UP2.ValidFrom <= TR.[Date])
Results:
| ID | Date | UserId | Description | Pos |
|----|------------|--------|---------------|-------------|
| 1 | 2010-01-10 | 5 | did some work | PM |
| 2 | 2011-01-10 | 5 | did more work | Senior PM |
| 3 | 2012-01-10 | 4 | me too | Consultant |
| 4 | 2012-11-01 | 4 | me too | Consultant |

You can do it using OUTER APPLY:
SELECT ID, [Date], UserID, [DESC], x.Pos
FROM table1 AS t1
OUTER APPLY (
SELECT TOP 1 Pos
FROM table2 AS t2
WHERE t2.UserID = t1.UserID AND t2.ValidFrom <= t1.[Date]
ORDER BY t2.ValidFrom DESC) AS x(Pos)
For every row of table1, the OUTER APPLY operation fetches all table2 rows of the same user whose ValidFrom date is earlier than or equal to [Date]. These rows are sorted by ValidFrom in descending order, and only the most recent one is returned.
Note: If no match is found by the OUTER APPLY sub-query then a NULL value is returned, meaning that no valid position exists in table2 for the corresponding record in table1.
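The same "latest ValidFrom on or before the entry date" lookup can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite has no OUTER APPLY, so a correlated scalar subquery with ORDER BY ... LIMIT 1 stands in for it; the dates are the question's, read as d.m.yyyy and stored as ISO strings; entries 5 and 6 are made-up rows that exercise an earlier position and the no-match case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, Date TEXT, UserID INTEGER, Descr TEXT);
    CREATE TABLE table2 (ID INTEGER, ValidFrom TEXT, UserID INTEGER, Pos TEXT);
    INSERT INTO table1 VALUES
        (1,'2010-10-01',5,'did some work'),
        (2,'2011-10-01',5,'did more work'),
        (3,'2012-10-01',4,'me too'),
        (4,'2012-11-01',4,'me too'),
        (5,'2010-01-15',5,'hypothetical: before the promotion'),
        (6,'2009-01-01',5,'hypothetical: before any position');
    INSERT INTO table2 VALUES
        (1,'2009-10-01',5,'PM'),
        (2,'2010-05-01',5,'Senior PM'),
        (3,'2010-10-01',4,'Consultant');
""")

# For each entry, take the user's most recent position that was already
# valid on the entry date; the subquery yields NULL when none exists.
positions = conn.execute("""
    SELECT t1.ID,
           (SELECT t2.Pos
            FROM table2 AS t2
            WHERE t2.UserID = t1.UserID
              AND t2.ValidFrom <= t1.Date
            ORDER BY t2.ValidFrom DESC
            LIMIT 1) AS Pos
    FROM table1 AS t1
    ORDER BY t1.ID
""").fetchall()
print(positions)
```

ISO-formatted date strings compare correctly as text, which is why the plain <= works here; with a real date column the comparison is the same.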

This works by using a rank function and a subquery. I tested it with some sample data.
select sub.ID, sub.[Date], sub.UserID, sub.Description, sub.Position
from (
-- rank each entry's candidate positions, newest ValidFrom first
select rank() over (partition by t1.ID order by t2.ValidFrom desc) as rnk,
t1.ID, t1.[Date], t1.UserID, t1.Descr as Description, t2.Pos as Position
from temployee t1 inner join jobs t2 -- replace the join tables with your own table names
on t1.UserID = t2.UserID
and t2.ValidFrom <= t1.[Date] -- only positions already valid on the entry date
) as sub
where sub.rnk = 1

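A left outer join that matches each entry to the latest ValidFrom on or before its date should work:
select t1.*, t2.Pos from Table1 t1 left outer join Table2 t2 on
t1.UserID = t2.UserID and t2.ValidFrom = (select max(t3.ValidFrom) from Table2 t3
where t3.UserID = t1.UserID and t3.ValidFrom <= t1.[Date])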
