How to split column into separate columns based on same ID? - sql-server

I have a dataset that looks like this and is called ALL_SOURCES_MERGE
OVID                                  ID_VALUE  NAME  SOURCE    LOCATION  MLP_ID  TYPE
------------------------------------  --------  ----  --------  --------  ------  ----
686B93E0-A047-4BF5-B6B1-A775C20390EF  ABC       PETE  SOURCE_1  USA       1X      ZZ
7374F6A8-352D-44CF-8FA2-EBCB205EE18B  DEF       ADAM  SOURCE_2  ZAF       2X      SS
686B93E0-A047-4BF5-B6B1-A775C20390EF  GHI       PETE  SOURCE_3  ZAF       3X      QQ
27C063A0-E8DD-4808-8CC5-12108C380DD4  JKL       JAKE  SOURCE_1  RUS       4X      EE
7374F6A8-352D-44CF-8FA2-EBCB205EE18B  MNO       LEAH  SOURCE_1  JAP       5X      RR
686B93E0-A047-4BF5-B6B1-A775C20390EF  ABC       MEAH  SOURCE_4  CHN       6X      GG
For each ID (OVID), I want to see where the [ID_VALUE] from one source matches or differs from the [ID_VALUE] from the other sources,
and likewise where the [NAME] matches or differs between sources.
Essentially, I want to see all the data per ID on a single row, with each column split out per source.
Something along the lines of:
OVID                                  SOURCE_1_ID  SOURCE_2_ID  SOURCE_3_ID  SOURCE_4_ID  SOURCE_1_NAME  SOURCE_2_NAME  SOURCE_3_NAME  SOURCE_4_NAME  SOURCE_1_LOCATION  SOURCE_2_LOCATION  SOURCE_3_LOCATION  SOURCE_4_LOCATION  SOURCE_1_MLP_ID  SOURCE_2_MLP_ID  SOURCE_3_MLP_ID  SOURCE_4_MLP_ID  SOURCE_1_TYPE  SOURCE_2_TYPE  SOURCE_3_TYPE  SOURCE_4_TYPE
------------------------------------  -----------  -----------  -----------  -----------  -------------  -------------  -------------  -------------  -----------------  -----------------  -----------------  -----------------  ---------------  ---------------  ---------------  ---------------  -------------  -------------  -------------  -------------
686B93E0-A047-4BF5-B6B1-A775C20390EF  ABC          NULL         GHI          ABC          PETE           NULL           PETE           MEAH           USA                NULL               ZAF                CHN                1X               NULL             3X               6X               ZZ             NULL           QQ             GG
7374F6A8-352D-44CF-8FA2-EBCB205EE18B  MNO          DEF          NULL         NULL         LEAH           ADAM           NULL           NULL           JAP                ZAF                NULL               NULL               5X               2X               NULL             NULL             RR             SS             NULL           NULL
27C063A0-E8DD-4808-8CC5-12108C380DD4  JKL          NULL         NULL         NULL         JAKE           NULL           NULL           NULL           RUS                NULL               NULL               NULL               4X               NULL             NULL             NULL             EE             NULL           NULL           NULL
In summary, what I want to check is where one ID has different names AND/OR different ID_VALUES.
I think I will be able to compare if I split the columns ID_VALUES and NAME into separate columns.
I have tried pivoting the ID_VALUE and NAME columns, but the result is a table that has the structure of the desired outcome with every entry NULL in the SOURCE_#_ID and SOURCE_#_NAME columns.
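One common alternative to PIVOT for this shape is conditional aggregation: group by OVID and take MAX(CASE WHEN SOURCE = '...' THEN column END) for each output column. The sketch below runs that pattern through Python's sqlite3 module purely so that it is self-contained; SQLite is a stand-in here, and although CASE, MAX and GROUP BY are standard SQL, the query has not been run on SQL Server. It only builds the ID and NAME columns and assumes at most one row per OVID and SOURCE. The names pivot_sql, mismatch_sql, pivoted and mismatched are made up for the illustration.

```python
import sqlite3

# Sample rows copied from the question's ALL_SOURCES_MERGE table.
rows = [
    ("686B93E0-A047-4BF5-B6B1-A775C20390EF", "ABC", "PETE", "SOURCE_1", "USA", "1X", "ZZ"),
    ("7374F6A8-352D-44CF-8FA2-EBCB205EE18B", "DEF", "ADAM", "SOURCE_2", "ZAF", "2X", "SS"),
    ("686B93E0-A047-4BF5-B6B1-A775C20390EF", "GHI", "PETE", "SOURCE_3", "ZAF", "3X", "QQ"),
    ("27C063A0-E8DD-4808-8CC5-12108C380DD4", "JKL", "JAKE", "SOURCE_1", "RUS", "4X", "EE"),
    ("7374F6A8-352D-44CF-8FA2-EBCB205EE18B", "MNO", "LEAH", "SOURCE_1", "JAP", "5X", "RR"),
    ("686B93E0-A047-4BF5-B6B1-A775C20390EF", "ABC", "MEAH", "SOURCE_4", "CHN", "6X", "GG"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ALL_SOURCES_MERGE ("
    "OVID TEXT, ID_VALUE TEXT, NAME TEXT, SOURCE TEXT, "
    "LOCATION TEXT, MLP_ID TEXT, TYPE TEXT)"
)
con.executemany("INSERT INTO ALL_SOURCES_MERGE VALUES (?,?,?,?,?,?,?)", rows)

# One MAX(CASE ...) per output column; sources with no row stay NULL.
pivot_sql = """
SELECT OVID,
       MAX(CASE WHEN SOURCE = 'SOURCE_1' THEN ID_VALUE END) AS SOURCE_1_ID,
       MAX(CASE WHEN SOURCE = 'SOURCE_2' THEN ID_VALUE END) AS SOURCE_2_ID,
       MAX(CASE WHEN SOURCE = 'SOURCE_3' THEN ID_VALUE END) AS SOURCE_3_ID,
       MAX(CASE WHEN SOURCE = 'SOURCE_4' THEN ID_VALUE END) AS SOURCE_4_ID,
       MAX(CASE WHEN SOURCE = 'SOURCE_1' THEN NAME END)     AS SOURCE_1_NAME,
       MAX(CASE WHEN SOURCE = 'SOURCE_2' THEN NAME END)     AS SOURCE_2_NAME,
       MAX(CASE WHEN SOURCE = 'SOURCE_3' THEN NAME END)     AS SOURCE_3_NAME,
       MAX(CASE WHEN SOURCE = 'SOURCE_4' THEN NAME END)     AS SOURCE_4_NAME
FROM ALL_SOURCES_MERGE
GROUP BY OVID
ORDER BY OVID
"""
pivoted = {r[0]: r[1:] for r in con.execute(pivot_sql)}

# If only the mismatching IDs are needed, the pivot can be skipped entirely.
mismatch_sql = """
SELECT OVID
FROM ALL_SOURCES_MERGE
GROUP BY OVID
HAVING COUNT(DISTINCT ID_VALUE) > 1 OR COUNT(DISTINCT NAME) > 1
ORDER BY OVID
"""
mismatched = [r[0] for r in con.execute(mismatch_sql)]

for ovid, values in pivoted.items():
    print(ovid, values)
print("IDs with differing ID_VALUE or NAME:", mismatched)
```

The remaining LOCATION, MLP_ID and TYPE columns would follow the same pattern, one MAX(CASE ...) expression per source.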

Related

Fill columns which have priority over other columns in SQL Server

I have a table like this:
id  mail_1    mail_2     mail_3
--  --------  ---------  ----------------
1   john      john_v2    john_v3
2   clarisse  NULL       clarisse_company
3   NULL      julie      NULL
4   mark      markus_91  NULL
5   alfred    NULL       NULL
And I would like to achieve that:
id  mail_1    mail_2            mail_3
--  --------  ----------------  -------
1   john      john_v2           john_v3
2   clarisse  clarisse_company  NULL
3   julie     NULL              NULL
4   mark      markus_91         NULL
5   alfred    NULL              NULL
As you can see, if mail_2 or mail_3 is not null and mail_1 is null, mail_1 should be filled. If an id has two mails, these two mails must be in mail_1 and mail_2, not in mail_2 and mail_3 nor in mail_1 and mail_3. If an id has just one mail, this mail must be in mail_1.
So the logic here is that mail_1 has priority over the other two, and mail_2 has priority over mail_3.
How could I achieve that in SQL Server (version 15)?
This should do it. Experiment by changing the values in the table variable below.
-- table variables are declared with @, not #
declare @temp table(mail_1 varchar(20), mail_2 varchar(20), mail_3 varchar(20))
insert into @temp values(null, 'middlename', 'lastname')
select coalesce(mail_1, mail_2, mail_3) as mail_1,
       -- mail_2 takes mail_3 when exactly one of mail_1 / mail_2 is missing
       case when mail_1 is null and mail_2 is not null then mail_3
            when mail_1 is not null and mail_2 is null then mail_3
            else mail_2
       end as mail_2,
       -- mail_3 is kept only when nothing had to shift left
       case when (mail_1 is null or mail_2 is null) then null else mail_3 end as mail_3
from @temp
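To check the CASE logic above against the five rows from the question, the same expressions can be run over an in-memory table. The sketch below uses Python's sqlite3 module only as a convenient stand-in for SQL Server; the table name mails and the variable shifted are made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mails (id INTEGER, mail_1 TEXT, mail_2 TEXT, mail_3 TEXT)")
con.executemany(
    "INSERT INTO mails VALUES (?,?,?,?)",
    [
        (1, "john", "john_v2", "john_v3"),
        (2, "clarisse", None, "clarisse_company"),
        (3, None, "julie", None),
        (4, "mark", "markus_91", None),
        (5, "alfred", None, None),
    ],
)

# Same COALESCE / CASE expressions as in the answer, applied per row.
shifted = con.execute(
    """
    SELECT id,
           COALESCE(mail_1, mail_2, mail_3) AS mail_1,
           CASE WHEN mail_1 IS NULL AND mail_2 IS NOT NULL THEN mail_3
                WHEN mail_1 IS NOT NULL AND mail_2 IS NULL THEN mail_3
                ELSE mail_2
           END AS mail_2,
           CASE WHEN mail_1 IS NULL OR mail_2 IS NULL THEN NULL ELSE mail_3 END AS mail_3
    FROM mails
    ORDER BY id
    """
).fetchall()

for row in shifted:
    print(row)
```

On this sample the rows come out in the layout the question asks for, with every mail shifted as far left as possible.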

How to aggregate several columns into a JSON file in HIVE and avoid nulls

user_id  reservation_id  nights  price
--------------------------------------
AAA      10001           1       100
AAA      10002           1       120
BBB      20003           7       350
ccc      10005           NULL    150
DDD      10007           3       NULL
CCC      10006           5       NULL
to
user_id reservation_details
AAA [{"nights":"1", "price":"100"}, {"nights":"1","price":"120"}]
BBB [{"nights":"7", "price":"350"}]
CCC [{"price":"150"}, {"nights":"3"}]
DDD [{"nights":"5"}]
Here my query is
select user_id
,concat("{",concat_ws(',',collect_list(concat(string(reservation_id),":{'nights':",string(nights),",'price':",string(price),"}"))),"}") as reservation_details
from mytable
group by user_id
I want to leave out the keys whose values are null and replace the single quotes with double quotes, so that the result is valid JSON.
Use the built-in map and array data types along with a case expression to handle nulls.
select user_id, collect_list(map_nights_price)
from (select user_id,
             case when nights is null then map('price', price)
                  when price is null then map('nights', nights)
                  else map('nights', nights, 'price', price)
             end as map_nights_price
      from mytable
      where not (price is null and nights is null) --ignore row where price and nights are null
     ) t
group by user_id
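For comparison, the "drop null keys, then collect per user" logic can be sketched outside Hive. The following is plain Python, not Hive code; it uses the sample rows from the question, compares user_id case-sensitively (so ccc and CCC stay separate), and the names details and reservation_json are made up for the illustration.

```python
import json
from collections import defaultdict

# (user_id, reservation_id, nights, price); None marks a missing value.
rows = [
    ("AAA", 10001, 1, 100),
    ("AAA", 10002, 1, 120),
    ("BBB", 20003, 7, 350),
    ("ccc", 10005, None, 150),
    ("DDD", 10007, 3, None),
    ("CCC", 10006, 5, None),
]

details = defaultdict(list)
for user_id, _reservation_id, nights, price in rows:
    # Keep only the keys that actually have a value, as strings.
    entry = {
        key: str(value)
        for key, value in (("nights", nights), ("price", price))
        if value is not None
    }
    if entry:  # skip rows where both nights and price are missing
        details[user_id].append(entry)

# json.dumps emits double-quoted keys and values, i.e. valid JSON.
reservation_json = {user: json.dumps(entries) for user, entries in details.items()}

for user, text in reservation_json.items():
    print(user, text)
```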

Insert date ranges for each Row with from-to dates, Add a fixed value to rows as per date change & Payment received adjusting to fixed value

I am using VB.NET and SQL Server 2012.
I have a SQL Database named DB_COLLECTOR,
as well as a table named Fee_Payment,
with 6 columns named:
'S_No' Int Primary key identity(1,1),
'Date_Start' datetime Null,
'Date_End' datetime Null,
'Prefixed_Fee' decimal(10) Null,
'Paid_Amount' decimal(10) Null,
'Balance' decimal(10) Null
I also have 2 Forms:
The 1st Form takes Person Name, Deal_Start_Date, Deal_End_Date, Monthly_Fee and saves it into a Customer_Master table in the database.
And another form, which takes Person Name and Payment_Amount and saves them to the 'Fee_Payment' table.
The problem:
If I enter a Deal_Start_Date of 01/04/2018 and a Deal_End_Date of 31/03/2019, for a customer named "John" who has a monthly fee of $50.00 on the 1st Form, there are three things that should happen:
It should automatically add 12 rows to the Fee_Payment table, one per month, with the first row having Date_Start 01-Apr-2018 and Date_End 30-Apr-2018 and so on through March 2019, in addition to saving the relevant data to Customer_Master. Like this:
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 Null Null Null
2 01-May-2018 31-May-2018 Null Null Null
3 01-Jun-2018 30-Jun-2018 Null Null Null
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
It should also automatically replace Null with $50.00 in the Prefixed_Fee column of the Apr-2018 row (the top one) once the current date falls within the range 01-Apr-2018 to 30-Apr-2018, e.g. if the current date is 02-Apr-2018. Like this:
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 50.00 Null 50.00
2 01-May-2018 31-May-2018 Null Null Null
3 01-Jun-2018 30-Jun-2018 Null Null Null
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
So, if the customer, John, had paid $125.00, it should allocate $50.00 (from $125.00) to the first row, by filling the amount mentioned in Prefixed_Fee and then from the Balance, put $50.00 into second row and, finally, the Balance amount of $25.00 to the third row.
S_No Date_Start Date_End Prefixed_Fee Paid_Amount Balance
---- ---------- -------- ------------ ----------- -------
1 01-Apr-2018 30-Apr-2018 50.00 50.00 0.00
2 01-May-2018 31-May-2018 50.00 50.00 0.00
3 01-Jun-2018 30-Jun-2018 50.00 25.00 25.00
.. .... .... .. .. ..
12 01-Mar-2019 31-Mar-2019 Null Null Null
How do I do this?
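The row generation and the payment allocation described above can be sketched independently of VB.NET and SQL Server. The Python below is only an illustration of the arithmetic, under the assumptions that every deal starts on the first day of a month and that a payment is applied to the oldest months first; the function names monthly_periods and allocate are hypothetical, and writing the rows to Fee_Payment is left out.

```python
import calendar
from datetime import date
from decimal import Decimal


def monthly_periods(start, end):
    """Return one (Date_Start, Date_End) pair per calendar month from start to end."""
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        periods.append((date(year, month, 1), date(year, month, last_day)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return periods


def allocate(payment, monthly_fee):
    """Spread a payment over consecutive months as (Prefixed_Fee, Paid_Amount, Balance)."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    rows = []
    remaining = payment
    while remaining > 0:
        paid = min(remaining, monthly_fee)
        rows.append((monthly_fee, paid, monthly_fee - paid))
        remaining -= paid
    return rows


periods = monthly_periods(date(2018, 4, 1), date(2019, 3, 31))
allocation = allocate(Decimal("125.00"), Decimal("50.00"))

for (period_start, period_end), (fee, paid, balance) in zip(periods, allocation):
    print(period_start, period_end, fee, paid, balance)
```

For John's example this yields 12 periods, and the $125.00 payment covers April and May in full and leaves a balance of $25.00 on June, matching the last table in the question.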

deleting duplicates based on value of another column

I have a table with 3 columns and the first column is 'name'. Some names are entered twice, some 3 times and some more than that. I would like to keep only one value for each name and delete the extra rows based on the values of Column 2 and 3. If column 2 and 3 are null, I would like to delete that row.
There are no primary keys or id column.
There are about 2.75 million rows in the table.
Would like to delete using one query(preferably) in SQL 14. Can someone help please?
Name column2 column3
Suzy english null
Suzy null null
Suzy null 5
John null null
John 7 7
George null benson
George null null
George benson null
George 5 benson
Would like to have it as:
Name column2 column3
Suzy english null
Suzy null 5
John 7 7
George benson null
George 5 benson
Many thanks in advance.
Use partitions over name with the appropriate order by:
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY CASE
                            WHEN column2 = 'null' AND column3 = 'null' THEN 3
                            WHEN column3 = 'null' THEN 2
                            WHEN column2 = 'null' THEN 1
                            ELSE 0
                        END
           ) AS num
    FROM mytable
)
DELETE FROM cte WHERE num > 1
This deletes duplicates, keeping one row per name, in order of preference, rows with:
both column2 and column3 not null (an arbitrary one is kept if there are several of these)
column2 not null
column3 not null
both column2 and column3 null
Note that this query assumes (based on comments to the question) that your "null" values are actually the text string "null" and not an SQL null.
If they are actual nulls, replace = 'null' with IS NULL.
Delete from yourtable
where column2 is null and column3 is null
The above query is based on this statement from the question:
I would like to keep only one value for each name and delete the extra rows based on the values of Column 2 and 3. If column 2 and 3 are null, I would like to delete that row

SQL: Update Nulls with Previous Timeframe Values

I currently have a result table in SQL that shows during which time period (college semester) a person's address changed. This doesn't happen every time period, so some rows show null, as expected. I need these to update ("fill down") for each subsequent time period until a new address change is entered.
I included these two IDs because they represent the two possible cases of what I am seeing:
-ID 1234 should fill the preceding Terms with the Sequence 1 county (shown here)
-ID 5678 should fill the preceding Terms with the Sequence 1 county as well (CLAY in this case) based on a previously joined table
Currently, I am showing something along the lines of:
ID TERM COUNTY SEQUENCE
------------------------------------------
1234 201308 null null
1234 201401 null null
1234 201408 ORANGE 1
1234 201501 null null
1234 201505 null null
1234 201508 OSCEOLA 3
1234 201601 null null
5678 201301 null null
5678 201305 null null
5678 201308 ST JOHNS 3
5678 201401 null null
5678 201405 null null
5678 201408 null null
5678 201501 null null
5678 201505 DUVAL 4
And I need the output to look like:
ID TERM COUNTY SEQUENCE
---------------------------------------------
1234 201308 ORANGE null
1234 201401 ORANGE null
1234 201408 ORANGE 1
1234 201501 ORANGE null
1234 201505 ORANGE null
1234 201508 OSCEOLA 3
1234 201601 OSCEOLA null
5678 201301 CLAY null
5678 201305 CLAY null
5678 201308 ST JOHNS 3
5678 201401 ST JOHNS null
5678 201405 ST JOHNS null
5678 201408 ST JOHNS null
5678 201501 ST JOHNS null
5678 201505 DUVAL 4
This is my first time coming across an update clause need like this, so any insight you may be able to provide will be greatly appreciated!
*I am not sure how much of the previous code will be relevant, but here is essentially the temp table code that feeds into the final output ("PIDM" is the ID):
DROP TABLE #ADDRESS_PT_1--, #ADDRESS_PT_2
GO
SELECT SPRADDR_PIDM 'PIDM', Y.TERM, SPRADDR_SEQNO 'SEQNO', SPRADDR_STAT_CODE 'STATE', SPRADDR_CNTY_CODE 'CNTY',
BANNR_TERM = CASE
WHEN Y2.TERM IS NULL THEN Y.TERM
ELSE Y2.TERM
END
INTO #ADDRESS_PT_1
FROM SPRADDR
LEFT OUTER JOIN RCACYR Y2
ON (SPRADDR_ACTIVITY_DATE BETWEEN Y2.BEGIN_DATE AND Y2.END_DATE
AND SUBSTRING(Y2.TERM,5,2) IN ('50','80','10')),
RCACYR Y
WHERE SPRADDR_ACTIVITY_DATE BETWEEN Y.BEGIN_DATE AND Y.END_DATE
AND SUBSTRING(Y.TERM,5,2) IN ('05','08','01')
AND SPRADDR_ATYP_CODE = 'MA'
ORDER BY SPRADDR_PIDM, SPRADDR_SEQNO
GO
/* Get the individuals addresses for each term */
SELECT *
--INTO #ADDRESS_PT_2
FROM #ADDRESS_PT_1 X
LEFT JOIN RCCNTY C
ON C.COUNTY = X.CNTY
WHERE X.SEQNO = (SELECT MAX(A.SEQNO)
FROM #ADDRESS_PT_1 A
WHERE X.PIDM = A.PIDM
AND X.TERM = A.TERM)
--AND X.PIDM = 5678
ORDER BY X.PIDM, X.SEQNO
GO
The output from this is:
PIDM TERM SEQNO STATE CNTY BANNR_TERM COUNTY COUNTY_TITLE COUNTY_REGION COUNTY_REGION_TITLE
5678 201108 1 FL CLAY 201108 CLAY CLAY 2 Northeast Florida
5678 201308 3 FL ST J 201308 ST J ST. JOHNS 2 Northeast Florida
5678 201505 5 FL DUVA 201550 DUVA DUVAL 2 Duval County
I put the sample you provided in a CTE. The first OUTER APPLY (p) fetches the previous row with a non-NULL COUNTY, and the second OUTER APPLY (p1) fetches the row with [SEQUENCE] = 1 for each ID. In the second OUTER APPLY, instead of FROM cte, use a table (e.g. FROM SPRADDR) that has the rows with [SEQUENCE] = 1, since those might not be in the CTE.
;WITH cte AS (
SELECT *
FROM (VALUES
(1234, 201308, null, null),
(1234, 201401, null, null),
(1234, 201408, 'ORANGE', 1),
(1234, 201501, null, null),
(1234, 201505, null, null),
(1234, 201508, 'OSCEOLA', 3),
(1234, 201601, null, null),
(5678, 201301, null, null),
(5678, 201305, null, null),
(5678, 201308, 'ST JOHNS', 3),
(5678, 201401, null, null),
(5678, 201405, null, null),
(5678, 201408, null, null),
(5678, 201501, null, null),
(5678, 201505, 'DUVAL', 4)
) as t(ID, TERM, COUNTY, [SEQUENCE])
)
SELECT c.ID,
c.TERM,
COALESCE(c.COUNTY,p.COUNTY,p1.COUNTY) as COUNTY,
c.[SEQUENCE]
FROM cte c
OUTER APPLY (
SELECT TOP 1 COUNTY
FROM cte
WHERE ID = c.ID
AND TERM < c.TERM
AND COUNTY IS NOT NULL
ORDER BY TERM DESC) as p
OUTER APPLY (
SELECT TOP 1 COUNTY
FROM cte
WHERE ID = c.ID
AND [SEQUENCE] = 1
ORDER BY TERM DESC) as p1
Will give you:
ID TERM COUNTY SEQUENCE
1234 201308 ORANGE NULL
1234 201401 ORANGE NULL
1234 201408 ORANGE 1
1234 201501 ORANGE NULL
1234 201505 ORANGE NULL
1234 201508 OSCEOLA 3
1234 201601 OSCEOLA NULL
5678 201301 NULL NULL
5678 201305 NULL NULL
5678 201308 ST JOHNS 3
5678 201401 ST JOHNS NULL
5678 201405 ST JOHNS NULL
5678 201408 ST JOHNS NULL
5678 201501 ST JOHNS NULL
5678 201505 DUVAL 4
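The NULLs for ID 5678 remain in this result because the sample CTE has no row with [SEQUENCE] = 1 for that ID. The fill rule itself (carry the last known county forward, and for terms before the first known county fall back to an earlier address) can be sketched in plain Python. This is only an illustration of the logic, not T-SQL; the function name fill_counties and the fallback mapping are hypothetical, with CLAY taken from the question's description of ID 5678.

```python
def fill_counties(rows, fallback):
    """rows: (id, term, county) tuples; fallback: id -> county known from an earlier term."""
    filled = []
    last_seen = {}    # most recent non-null county per id (forward fill)
    first_known = {}  # first non-null county per id (used for leading gaps)
    for pid, term, county in sorted(rows, key=lambda r: (r[0], r[1])):
        if county is not None:
            last_seen[pid] = county
            first_known.setdefault(pid, county)
        filled.append([pid, term, last_seen.get(pid)])
    # Terms before the first known county: prefer the fallback, else the first county.
    for row in filled:
        if row[2] is None:
            row[2] = fallback.get(row[0], first_known.get(row[0]))
    return [tuple(row) for row in filled]


sample = [
    (1234, 201308, None), (1234, 201401, None), (1234, 201408, "ORANGE"),
    (1234, 201501, None), (1234, 201505, None), (1234, 201508, "OSCEOLA"),
    (1234, 201601, None),
    (5678, 201301, None), (5678, 201305, None), (5678, 201308, "ST JOHNS"),
    (5678, 201401, None), (5678, 201405, None), (5678, 201408, None),
    (5678, 201501, None), (5678, 201505, "DUVAL"),
]

result = fill_counties(sample, fallback={5678: "CLAY"})
for row in result:
    print(row)
```

With the fallback supplied, the first two terms of ID 5678 come out as CLAY and the first two terms of ID 1234 as ORANGE, which is the output the question asks for.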
See an example:
SELECT * into tbl_filltest FROM (
VALUES (1,Null),(2,Null),(3,5),(4,Null),(5,Null),(6,Null),(7,4),(8,Null),(9,Null),(10,1)
) as t(c1,c2)
GO
SELECT * FROM tbl_filltest
GO
;WITH GoodValues as (SELECT * FROM tbl_filltest WHERE c2 is not null),
NullValues as (SELECT * FROM tbl_filltest WHERE c2 is null)
UPDATE n SET c2 = g1.c2 FROM GoodValues as g1
OUTER APPLY (SELECT MAX(c1) as Min_c1 FROM GoodValues as i WHERE g1.c1 > i.c1) as g2
INNER JOIN NullValues as n
ON n.c1 > IsNull(g2.Min_c1,0) and n.c1 < g1.c1
GO
SELECT * FROM tbl_filltest
GO
