I have a table that looks like this:
ID  RefernceID  Field1  Field2  Field3
--  ----------  ------  ------  ------
1   A01         Cat     NULL    Dog
2   A01         Cat     Fish    NULL
3   A02         Banana  Apple   NULL
4   A02         Banana  NULL    Mango
I'm trying to get this:
ID  RefernceID  Field1  Field2  Field3
--  ----------  ------  ------  ------
1   A01         Cat     Fish    Dog
3   A02         Banana  Apple   Mango
So basically the rows are grouped by ReferenceID and Field1, and then I want them merged so that each NULL is replaced by the non-NULL value from the other row in the group.
Any help would be appreciated.
EDIT: Sorry, I forgot to add that there are other columns as well (I just didn't mention them), and I still need one of the ID values.
You want aggregation:
select min(id) as id, referenceid, field1, max(field2) as field2, max(field3) as field3
from mytable t -- placeholder name; "table" itself is a reserved word
group by referenceid, field1;
You can use a simple aggregation (max):
select RefernceID,
min(ID) as ID,
max(field1) as field1,
max(field2) as field2,
max(field3) as field3
from tab
group by RefernceID
A neat trick is to use max or min, both of which ignore NULLs. So if a group has only one non-NULL value, max will return it. Since you just need one of the IDs, you could arbitrarily use min, which returns the result shown in the question:
SELECT MIN(id), referenceid, field1, MAX(field2), MAX(field3)
FROM mytable
GROUP BY referenceid, field1
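To check the behaviour without touching a real database, here is a minimal sketch that runs the same query against an in-memory SQLite table from Python. The table name mytable and the lower-case column names are taken from the query above; using SQLite is my assumption, and other engines should treat NULLs in MAX/MIN the same way, but verify on yours.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER, referenceid TEXT, field1 TEXT, field2 TEXT, field3 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A01", "Cat", None, "Dog"),
        (2, "A01", "Cat", "Fish", None),
        (3, "A02", "Banana", "Apple", None),
        (4, "A02", "Banana", None, "Mango"),
    ],
)

# MAX skips NULLs, so each group collapses to its single non-NULL value;
# MIN(id) keeps one representative ID per group.
merged = conn.execute(
    """
    SELECT MIN(id), referenceid, field1, MAX(field2), MAX(field3)
    FROM mytable
    GROUP BY referenceid, field1
    ORDER BY MIN(id)
    """
).fetchall()

for row in merged:
    print(row)
```

Note that if a group ever holds two different non-NULL values in the same column, MAX silently picks the larger one, so this only "merges" cleanly when there is at most one value per column per group.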
With these tables:
Table1
Date | Barcode | Sold | AmountSold
Table2
Barcode | Description | RetailPrice
00001 Item1 1.00
00002 Item2 2.00
00003 Item3 3.00
00004 Item4 4.00
00005 Item5 5.00
Is there a way to use an INSERT to Table1, like this:
INSERT INTO dbo.Table1
VALUES ('07/11/2017', '00003', 5, (? * 5))
With the ? being the RetailPrice (which is 3.00) of 00003 from Table2, then multiplied with Sold (which is 5)?
I have stumbled upon INSERT INTO SELECT, but as far as I can tell it requires every inserted column to get its value from the SELECT, which is not what I need.
Note: the first three values will come from an external source, so the fourth value is the only one that needs to come from another table.
I could of course run another query first to get the RetailPrice before inserting, but I would like to avoid that to reduce loading time.
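I believe you are after something like this. A SELECT can mix literals with columns, so only RetailPrice has to come from Table2:
INSERT INTO dbo.Table1 (Date, Barcode, Sold, AmountSold)
SELECT '07/11/2017', '00003', 5, 5 * RetailPrice
FROM dbo.Table2
WHERE Barcode = '00003' -- without this filter, one row is inserted per row of Table2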
INSERT INTO dbo.table1
VALUES ('07/11/2017', '00003', 5, ((SELECT RetailPrice
FROM dbo.table2
WHERE dbo.table2.Barcode = '00003') * 5))
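Since the first three values come from an external source, a parameterised version may be closer to what the application needs. The following is only a sketch under my own assumptions: it uses Python's sqlite3 module instead of SQL Server, and record_sale is a hypothetical helper name, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table2 (Barcode TEXT PRIMARY KEY, Description TEXT, RetailPrice REAL);
    CREATE TABLE Table1 (Date TEXT, Barcode TEXT, Sold INTEGER, AmountSold REAL);
    INSERT INTO Table2 VALUES
        ('00001', 'Item1', 1.00), ('00002', 'Item2', 2.00), ('00003', 'Item3', 3.00),
        ('00004', 'Item4', 4.00), ('00005', 'Item5', 5.00);
    """
)


def record_sale(conn, sale_date, barcode, sold):
    """Insert one sale; AmountSold is looked up and computed in the same statement.

    Returns the number of rows inserted (0 if the barcode is unknown).
    """
    cur = conn.execute(
        """
        INSERT INTO Table1 (Date, Barcode, Sold, AmountSold)
        SELECT ?, Barcode, ?, ? * RetailPrice
        FROM Table2
        WHERE Barcode = ?
        """,
        (sale_date, sold, sold, barcode),
    )
    return cur.rowcount


inserted = record_sale(conn, "07/11/2017", "00003", 5)
rows = conn.execute("SELECT * FROM Table1").fetchall()
print(inserted, rows)
```

One behavioural difference worth knowing: with INSERT ... SELECT an unknown barcode inserts nothing, whereas the scalar-subquery form in the VALUES clause inserts a row with a NULL AmountSold.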
We need to mask some Personally Identifiable Information in our Oracle 10g database. The process I'm using is based on another masking script that we are using for Sybase (which works fine), but since the information in the Oracle and Sybase databases is quite different, I've hit a bit of a roadblock.
The process is to select all data out of the PERSON table, into a PERSON_TRANSFER table. We then use a random number to select a random name from the PERSON_TRANSFER table, and then update the PERSON table with that random name. This works fine in Sybase because there is only one row per person in the PERSON table.
The issue I've encountered is that in the Oracle DB, there are multiple rows per PERSON, and the name may or may not be different for each row, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Purple |
|1 |Purple |
|1 |Pink | <--
|2 |Gray |
|2 |Blue | <--
|3 |Black |
|3 |Black |
The PERSON_TRANSFER is a copy of this table. The table is in the millions of rows, so I'm just giving a very basic example here :)
The logic I'm currently using would just update all rows to be the same for that PERSON_ID, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Brown |
|1 |Brown |
|1 |Brown | <--
|2 |White |
|2 |White | <--
|3 |Red |
|3 |Red |
But this is incorrect as the name that is different for that PERSON_ID needs to be masked differently, e.g.
|PERSON|
:-----------------:
|PERSON_ID|SURNAME|
|1 |Brown |
|1 |Brown |
|1 |Yellow | <--
|2 |White |
|2 |Green | <--
|3 |Red |
|3 |Red |
How do I get the script to update the distinct names separately, rather than just update them all based on the PERSON_ID? My script currently looks like this
DECLARE
v_SURNAME VARCHAR2(30);
BEGIN
select pt.SURNAME
into v_SURNAME
from PERSON_TRANSFER pt
where pt.PERSON_ID = (SELECT PERSON_ID FROM
( SELECT PERSON_ID FROM PERSON_TRANSFER
ORDER BY dbms_random.value )
WHERE rownum = 1);
END;
Which causes an error because too many rows are returned for that random PERSON_ID.
1) Is there a more efficient way to update the PERSON table so that names are randomly assigned?
2) How do I ensure that the PERSON table is masked correctly, in that the various surnames are kept distinct (or the same, if they are all the same) for any single PERSON_ID?
I'm hoping this is enough information. I've simplified it a fair bit (the table has a lot more columns, such as First Name, DOB, TFN, etc.) in the hope that it makes the explanation easier.
Any input/advice/help would be greatly appreciated :)
Thanks.
One of the complications is that the same surname may appear under different person_id's in the PERSON table. You may be better off using a separate, auxiliary table holding surnames that are distinct (for example you can populate it by selecting distinct surnames from PERSONS).
Setup:
create table persons (person_id, surname) as (
select 1, 'Purple' from dual union all
select 1, 'Purple' from dual union all
select 1, 'Pink' from dual union all
select 2, 'Gray' from dual union all
select 2, 'Blue' from dual union all
select 3, 'Black' from dual union all
select 3, 'Black' from dual
);
create table mask_names (person_id, surname) as (
select 1, 'Apple' from dual union all
select 2, 'Banana' from dual union all
select 3, 'Grape' from dual union all
select 4, 'Orange' from dual union all
select 5, 'Pear' from dual union all
select 6, 'Plum' from dual
);
commit;
CTAS to create PERSON_TRANSFER:
create table person_transfer (person_id, surname) as (
select ranked.person_id, rand.surname
from ( select person_id, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join
( select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
);
commit;
Outcome:
SQL> select * from person_transfer order by person_id, surname;
PERSON_ID SURNAME
---------- -------
1 Pear
1 Pear
1 Plum
2 Banana
2 Grape
3 Apple
3 Apple
Added at OP's request: The scope has been extended - the requirement now is to update surname in the original table (PERSONS). This can be best done with the merge statement and the join (sub)query I demonstrated earlier. This works best when the PERSONS table has a PK, and indeed the OP said the real-life table PERSONS has such a PK, made up of the person_id column and an additional column, date_from. In the script below, I drop persons and recreate it to include this additional column. Then I show the query and the result.
Note - a mask_names table is still needed. A tempting alternative would be to just shuffle the surnames already present in persons so there would be no need for a "helper" table. Alas that won't work. For example, in a trivial example persons has only one row. To obfuscate surnames, one MUST come up with surnames not in the original table. More interestingly, assume every person_id has exactly two rows, with distinct surnames, but those surnames in every case are 'John' and 'Mary'. It doesn't help to just shuffle those two names. One does need a "helper" table like mask_names.
New setup:
drop table persons;
create table persons (person_id, date_from, surname) as (
select 1, date '2016-01-04', 'Purple' from dual union all
select 1, date '2016-01-20', 'Purple' from dual union all
select 1, date '2016-03-20', 'Pink' from dual union all
select 2, date '2016-01-24', 'Gray' from dual union all
select 2, date '2016-03-21', 'Blue' from dual union all
select 3, date '2016-04-02', 'Black' from dual union all
select 3, date '2016-02-13', 'Black' from dual
);
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Purple
1 2016-01-20 Purple
1 2016-03-20 Pink
2 2016-01-24 Gray
2 2016-03-21 Blue
3 2016-04-02 Black
3 2016-02-13 Black
7 rows selected.
New query and result:
merge into persons p
using (
select ranked.person_id, ranked.date_from, rand.surname
from (
select person_id, date_from, surname,
dense_rank() over (order by surname) as rk
from persons
) ranked
inner join (
select surname, row_number() over (order by dbms_random.value()) as rnd
from mask_names
) rand
on ranked.rk = rand.rnd
) t
on (p.person_id = t.person_id and p.date_from = t.date_from)
when matched then update
set p.surname = t.surname;
commit;
select * from persons;
PERSON_ID DATE_FROM SURNAME
---------- ---------- -------
1 2016-01-04 Apple
1 2016-01-20 Apple
1 2016-03-20 Orange
2 2016-01-24 Plum
2 2016-03-21 Grape
3 2016-04-02 Banana
3 2016-02-13 Banana
7 rows selected.
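For readers who find the dense_rank / row_number join hard to follow, the same mapping idea can be sketched outside the database. This is an illustration in plain Python, not the Oracle implementation: build_mask_map is a hypothetical name, and the fixed seed is only there to make the run repeatable.

```python
import random


def build_mask_map(surnames, mask_names, rng=random):
    """Map each distinct surname to a distinct, randomly chosen mask name.

    Sorting the distinct surnames plays the role of dense_rank();
    shuffling the mask names plays the role of row_number() over a random order.
    """
    distinct = sorted(set(surnames))
    if len(mask_names) < len(distinct):
        # In the SQL version the inner join would silently find no match here.
        raise ValueError("need at least as many mask names as distinct surnames")
    shuffled = list(mask_names)
    rng.shuffle(shuffled)
    return dict(zip(distinct, shuffled))


persons = [
    (1, "Purple"), (1, "Purple"), (1, "Pink"),
    (2, "Gray"), (2, "Blue"),
    (3, "Black"), (3, "Black"),
]
mask_names = ["Apple", "Banana", "Grape", "Orange", "Pear", "Plum"]

mask_map = build_mask_map([s for _, s in persons], mask_names, random.Random(0))
masked = [(person_id, mask_map[surname]) for person_id, surname in persons]

for row in masked:
    print(row)
```

Because the mapping is keyed on the surname alone, rows that shared a surname still share one after masking and rows that differed still differ, which is the property asked for. It also means the same surname under two different person_ids receives the same mask, as in the SQL above.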
I'm trying to write a SQL query that will return a list of aggregated values; however, I want to group the query by one of the aggregated values (a count):
select t.Field1, count(distinct t.Field2), SUM(t.Value1)
from MyTable t
group by t.Field1, count(t.Field2)
I've tried putting the count into a subquery, and putting the whole query into a subquery and grouping there. Is there a way to do this that doesn't involve creating a temporary table? (I don't have anything against temporary tables per se.)
The desired outcome would look like this:
Field1 Count Sum
----------------------------------------------------
CAT1 3 19.5
CAT1 2 100
CAT2 2 62
The data that I'm working with looks like this:
Field1 Field2 Field3 Value1
-----------------------------------------------------
CAT1 1 1 5
CAT1 2 1 2.5
CAT1 3 1 12
CAT1 4 2 50
CAT1 5 2 50
CAT2 6 3 50
CAT2 7 3 12
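So, I want a grouping by the number of distinct Field2 values per Field3.
If I understand you correctly, then the following should work. The inner query computes the count per Field1/Field3 pair, and the outer query can then group by that count because it is an ordinary column by that point: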
select Field1 , Count , Sum(Value1)
from
(
select t.Field1, count(*) as Count, SUM(t.Value1) as Value1
from MyTable t
group by t.Field1, t.Field3
)
as t2
group by Field1, Count
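Here is a small sketch that runs this derived-table approach on the sample data, using Python's sqlite3 module as a stand-in for the real database (my assumption). I renamed the alias to cnt to stay clear of the COUNT keyword and used COUNT(DISTINCT Field2) as the question describes; on this data it gives the same numbers as count(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Field1 TEXT, Field2 INTEGER, Field3 INTEGER, Value1 REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("CAT1", 1, 1, 5), ("CAT1", 2, 1, 2.5), ("CAT1", 3, 1, 12),
        ("CAT1", 4, 2, 50), ("CAT1", 5, 2, 50),
        ("CAT2", 6, 3, 50), ("CAT2", 7, 3, 12),
    ],
)

# Inner query: one row per (Field1, Field3) with its count and sum.
# Outer query: the count is now a plain column, so it can appear in GROUP BY.
result = conn.execute(
    """
    SELECT Field1, cnt, SUM(total) AS total
    FROM (
        SELECT Field1, COUNT(DISTINCT Field2) AS cnt, SUM(Value1) AS total
        FROM MyTable
        GROUP BY Field1, Field3
    ) AS per_group
    GROUP BY Field1, cnt
    ORDER BY Field1, cnt DESC
    """
).fetchall()

for row in result:
    print(row)
```

If two Field3 groups under the same Field1 happen to have the same count, the outer GROUP BY adds their sums together into one row.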
How can I get an average value and one other value from the same column into two different columns in a new table?
I have this:
Person_ID col2 col3_values
1 101010A 20000
1 101010B 30000
2 101010A 25000
2 101010B 30000
3 101010A 22000
3 101010B 24000
And I want a table that averages col3_values across the Person_IDs (1, 2, 3) for each col2 value, and then compares that average with the value that belongs to Person_ID 1, like this:
col2     AVG(value personID_1-3)  Value PersonID_1
101010A  22333                    20000
101010B  28000                    30000
I have tried a lot of code but nothing has worked. Can someone please help me with this? If this works, I would be grateful if I could also get a fourth column that shows the difference between the average column and the third column, which holds the values for Person_ID 1.
There are several ways to do this; one is to use the outer apply construct:
select
col2,
AVG(t.col3_values) as "AVG(value personID_1-3)",
a.col3_values as "Value PersonID_1",
AVG(t.col3_values) - a.col3_values as "Difference"
from your_table t
outer apply (
select col3_values from your_table where Person_ID = 1 and t.col2 = col2
) a
group by col2, a.col3_values
Or you could use a correlated subquery:
select
col2,
AVG(t.col3_values) as "AVG(value personID_1-3)",
(
select col3_values from your_table where Person_ID = 1 and t.col2 = col2
) as "Value PersonID_1"
from your_table t
group by col2
Sample output:
Query 1:
col2 AVG(value personID_1-3) Value PersonID_1 Difference
-------------------- ----------------------- ---------------- -----------
101010A 22333 20000 2333
101010B 28000 30000 -2000
Query 2:
col2 AVG(value personID_1-3) Value PersonID_1
-------------------- ----------------------- ----------------
101010A 22333 20000
101010B 28000 30000
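The correlated-subquery version can also return the difference column if it is wrapped in a derived table, so the subquery does not have to be repeated. Below is a sketch of that idea, run through Python's sqlite3 module rather than SQL Server (an assumption on my part); the aliases avg_value, person1_value and difference are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (Person_ID INTEGER, col2 TEXT, col3_values INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "101010A", 20000), (1, "101010B", 30000),
        (2, "101010A", 25000), (2, "101010B", 30000),
        (3, "101010A", 22000), (3, "101010B", 24000),
    ],
)

# Inner query: average per col2 plus Person_ID 1's value via a correlated subquery.
# Outer query: subtracts the two columns to get the difference.
rows = conn.execute(
    """
    SELECT col2, avg_value, person1_value, avg_value - person1_value AS difference
    FROM (
        SELECT t.col2,
               AVG(t.col3_values) AS avg_value,
               (SELECT p.col3_values
                FROM your_table p
                WHERE p.Person_ID = 1 AND p.col2 = t.col2) AS person1_value
        FROM your_table t
        GROUP BY t.col2
    ) AS summary
    ORDER BY col2
    """
).fetchall()

for row in rows:
    print(row)
```

SQLite's AVG returns a floating-point value, so the first average comes out with decimals rather than the truncated 22333 that an integer column produces in the sample output above. The subquery also assumes at most one row per col2 for Person_ID 1.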
I'm working with SQL Server 2005 and looking to export some data from a table I have. However, before doing that I need to update a status column based on a field called "VisitNumber", which can contain the same value in multiple rows. I have a table set up in the following manner. There are more columns to it, but I am only showing what's relevant to my issue.
ID Name MyReport VisitNumber DateTimeStamp Status
-- --------- -------- ----------- ----------------------- ------
1 Test John Test123 123 2014-01-01 05.00.00.000
2 Test John Test456 123 2014-01-01 07.00.00.000
3 Test Sue Test123 555 2014-01-02 08.00.00.000
4 Test Ann Test123 888 2014-01-02 09.00.00.000
5 Test Ann Test456 888 2014-01-02 10.00.00.000
6 Test Ann Test789 888 2014-01-02 11.00.00.000
Field Notes
ID column is a unique ID in incremental numbers
MyReport is a text value and can actually be thousands of characters. Shortened for simplicity. In my scenario the text would be completely different
Rest of fields are varchar
My Goal
I need to set a status of "F" in two cases:
* If a VisitNumber appears only once, set the status column to "F"
* If a VisitNumber appears more than once, set "F" only on the row with the earliest timestamp. For the other rows, set a status of "A"
So going back to my table, here is the expectation
ID Name MyReport VisitNumber DateTimeStamp Status
-- --------- -------- ----------- ----------------------- ------
1 Test John Test123 123 2014-01-01 05.00.00.000 F
2 Test John Test456 123 2014-01-01 07.00.00.000 A
3 Test Sue Test123 555 2014-01-02 08.00.00.000 F
4 Test Ann Test123 888 2014-01-02 09.00.00.000 F
5 Test Ann Test456 888 2014-01-02 10.00.00.000 A
6 Test Ann Test789 888 2014-01-02 11.00.00.000 A
I was thinking I could handle this by splitting out each type of duplicate (groups of 2, 3, 4, 5 and so on), updating every other row (or every third, fourth, fifth row), then deleting those rows from the original table and combining everything again to export the data in SSIS. But I suspect there is a much more efficient way of handling it.
Any thoughts? I can accomplish this by updating the table directly in SQL for this status column and then export normally through SSIS. Or if there is some way I can manipulate the column for the exact conditions I need, I can do it all in SSIS. I am just not sure how to proceed with this.
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY VisitNumber ORDER BY DateTimeStamp) rn from MyTable
)
UPDATE cte
SET [status] = (CASE WHEN rn = 1 THEN 'F' ELSE 'A' END)
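To try the "earliest row gets F, the rest get A" rule without a SQL Server instance, here is a sketch using Python's sqlite3 module. SQLite cannot update through a CTE, so I swapped in a correlated NOT EXISTS test in place of ROW_NUMBER(); the table name visits and the trimmed column list are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits ("
    "ID INTEGER PRIMARY KEY, Name TEXT, VisitNumber TEXT, DateTimeStamp TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO visits (ID, Name, VisitNumber, DateTimeStamp) VALUES (?, ?, ?, ?)",
    [
        (1, "Test John", "123", "2014-01-01 05:00:00"),
        (2, "Test John", "123", "2014-01-01 07:00:00"),
        (3, "Test Sue", "555", "2014-01-02 08:00:00"),
        (4, "Test Ann", "888", "2014-01-02 09:00:00"),
        (5, "Test Ann", "888", "2014-01-02 10:00:00"),
        (6, "Test Ann", "888", "2014-01-02 11:00:00"),
    ],
)

# A row is 'F' when no other row with the same VisitNumber has an earlier timestamp;
# every other row becomes 'A'. ISO-formatted text timestamps compare correctly as strings.
conn.execute(
    """
    UPDATE visits
    SET Status = CASE
        WHEN NOT EXISTS (
            SELECT 1
            FROM visits e
            WHERE e.VisitNumber = visits.VisitNumber
              AND e.DateTimeStamp < visits.DateTimeStamp
        ) THEN 'F'
        ELSE 'A'
    END
    """
)

statuses = conn.execute("SELECT ID, Status FROM visits ORDER BY ID").fetchall()
print(statuses)
```

One difference from the ROW_NUMBER() version: if two rows for the same VisitNumber share the earliest timestamp, this sketch marks both as "F", while ROW_NUMBER() picks exactly one of them.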
I put together a test script to check the results. For your purposes, use the update statements and replace the temp table with your table name.
create table #temp1 (id int, [name] varchar(50), myreport varchar(50), visitnumber varchar(50), dts datetime, [status] varchar(1))
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (1,'Test John','Test123','123','2014-01-01 05:00')
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (2,'Test John','Test456','123','2014-01-01 07:00')
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (3,'Test Sue','Test123','555','2014-01-01 08:00')
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (4,'Test Ann','Test123','888','2014-01-01 09:00')
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (5,'Test Ann','Test456','888','2014-01-01 10:00')
insert into #temp1 (id,[name],myreport,visitnumber, dts) values (6,'Test Ann','Test789','888','2014-01-01 11:00')
select * from #temp1;
update #temp1 set status = 'F'
where id in (
select id from #temp1 t1
join (select min(dts) as mindts, visitnumber
from #temp1
group by visitNumber) t2
on t1.visitnumber = t2.visitnumber
and t1.dts = t2.mindts)
update #temp1 set status = 'A'
where id not in (
select id from #temp1 t1
join (select min(dts) as mindts, visitnumber
from #temp1
group by visitNumber) t2
on t1.visitnumber = t2.visitnumber
and t1.dts = t2.mindts)
select * from #temp1;
drop table #temp1
Hope this helps