SQL: Update Nulls with Previous Timeframe Values - sql-server

I currently have a result table in SQL that shows during which time period (college semester) a person's address changed. This doesn't happen every time period, so some rows show null, as expected. I need these to update ("fill down") for each subsequent time period until a new address change is entered.
I included these two IDs because they represent the two possible cases of what I am seeing:
-ID 1234 should fill the preceding Terms with the Sequence 1 county (shown here)
-ID 5678 should fill the preceding Terms with the Sequence 1 county as well (CLAY in this case) based on a previously joined table
Currently, I am showing something along the lines of:
ID TERM COUNTY SEQUENCE
------------------------------------------
1234 201308 null null
1234 201401 null null
1234 201408 ORANGE 1
1234 201501 null null
1234 201505 null null
1234 201508 OSCEOLA 3
1234 201601 null null
5678 201301 null null
5678 201305 null null
5678 201308 ST JOHNS 3
5678 201401 null null
5678 201405 null null
5678 201408 null null
5678 201501 null null
5678 201505 DUVAL 4
And I need the output to look like:
ID TERM COUNTY SEQUENCE
---------------------------------------------
1234 201308 ORANGE null
1234 201401 ORANGE null
1234 201408 ORANGE 1
1234 201501 ORANGE null
1234 201505 ORANGE null
1234 201508 OSCEOLA 3
1234 201601 OSCEOLA null
5678 201301 CLAY null
5678 201305 CLAY null
5678 201308 ST JOHNS 3
5678 201401 ST JOHNS null
5678 201405 ST JOHNS null
5678 201408 ST JOHNS null
5678 201501 ST JOHNS null
5678 201505 DUVAL 4
This is my first time coming across an update clause need like this, so any insight you may be able to provide will be greatly appreciated!
*I am not sure how much of the previous code will be relevant, but here is essentially the temp table code that feeds into the final output ("PIDM" is the ID):
DROP TABLE #ADDRESS_PT_1--, #ADDRESS_PT_2
GO
SELECT SPRADDR_PIDM 'PIDM', Y.TERM, SPRADDR_SEQNO 'SEQNO', SPRADDR_STAT_CODE 'STATE', SPRADDR_CNTY_CODE 'CNTY',
BANNR_TERM = CASE
WHEN Y2.TERM IS NULL THEN Y.TERM
ELSE Y2.TERM
END
INTO #ADDRESS_PT_1
FROM SPRADDR
LEFT OUTER JOIN RCACYR Y2
ON (SPRADDR_ACTIVITY_DATE BETWEEN Y2.BEGIN_DATE AND Y2.END_DATE
AND SUBSTRING(Y2.TERM,5,2) IN ('50','80','10')),
RCACYR Y
WHERE SPRADDR_ACTIVITY_DATE BETWEEN Y.BEGIN_DATE AND Y.END_DATE
AND SUBSTRING(Y.TERM,5,2) IN ('05','08','01')
AND SPRADDR_ATYP_CODE = 'MA'
ORDER BY SPRADDR_PIDM, SPRADDR_SEQNO
GO
/* Get the individuals addresses for each term */
SELECT *
--INTO #ADDRESS_PT_2
FROM #ADDRESS_PT_1 X
LEFT JOIN RCCNTY C
ON C.COUNTY = X.CNTY
WHERE X.SEQNO = (SELECT MAX(A.SEQNO)
FROM #ADDRESS_PT_1 A
WHERE X.PIDM = A.PIDM
AND X.TERM = A.TERM)
--AND X.PIDM = 5678
ORDER BY X.PIDM, X.SEQNO
GO
The output from this is:
PIDM TERM SEQNO STATE CNTY BANNR_TERM COUNTY COUNTY_TITLE COUNTY_REGION COUNTY_REGION_TITLE
5678 201108 1 FL CLAY 201108 CLAY CLAY 2 Northeast Florida
5678 201308 3 FL ST J 201308 ST J ST. JOHNS 2 Northeast Florida
5678 201505 5 FL DUVA 201550 DUVA DUVAL 2 Duval County

I put the sample you provided into a CTE. The first OUTER APPLY (p) finds the previous row for the same ID with a non-NULL COUNTY, and the second OUTER APPLY (p1) gets the row with [SEQUENCE] = 1 for that ID. In the last OUTER APPLY, instead of FROM cte, use the table that holds the rows with [SEQUENCE] = 1 (FROM SPRADDR), since those rows might not be in the CTE.
;WITH cte AS (
SELECT *
FROM (VALUES
(1234, 201308, null, null),
(1234, 201401, null, null),
(1234, 201408, 'ORANGE', 1),
(1234, 201501, null, null),
(1234, 201505, null, null),
(1234, 201508, 'OSCEOLA', 3),
(1234, 201601, null, null),
(5678, 201301, null, null),
(5678, 201305, null, null),
(5678, 201308, 'ST JOHNS', 3),
(5678, 201401, null, null),
(5678, 201405, null, null),
(5678, 201408, null, null),
(5678, 201501, null, null),
(5678, 201505, 'DUVAL', 4)
) as t(ID, TERM, COUNTY, [SEQUENCE])
)
SELECT c.ID,
c.TERM,
COALESCE(c.COUNTY,p.COUNTY,p1.COUNTY) as COUNTY,
c.[SEQUENCE]
FROM cte c
OUTER APPLY (
SELECT TOP 1 COUNTY
FROM cte
WHERE ID = c.ID
AND TERM < c.TERM
AND COUNTY IS NOT NULL
ORDER BY TERM DESC) as p
OUTER APPLY (
SELECT TOP 1 COUNTY
FROM cte
WHERE ID = c.ID
AND [SEQUENCE] = 1
ORDER BY TERM DESC) as p1
Will give you:
ID TERM COUNTY SEQUENCE
1234 201308 ORANGE NULL
1234 201401 ORANGE NULL
1234 201408 ORANGE 1
1234 201501 ORANGE NULL
1234 201505 ORANGE NULL
1234 201508 OSCEOLA 3
1234 201601 OSCEOLA NULL
5678 201301 NULL NULL
5678 201305 NULL NULL
5678 201308 ST JOHNS 3
5678 201401 ST JOHNS NULL
5678 201405 ST JOHNS NULL
5678 201408 ST JOHNS NULL
5678 201501 ST JOHNS NULL
5678 201505 DUVAL 4
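ID 5678 still shows NULL for its first two terms because no row with [SEQUENCE] = 1 exists in the CTE. As a rough sketch of the suggestion to read that row from the source table instead, the query below uses a cut-down copy of the sample plus base_addr, a hypothetical stand-in for SPRADDR (its name, columns and rows are assumptions made for illustration, not the real table layout):

```sql
-- terms: a cut-down copy of the sample rows for ID 5678.
-- base_addr: HYPOTHETICAL stand-in for the source address table (SPRADDR in the question);
--            it still holds the Sequence 1 row that the term-level result lacks.
;WITH terms AS (
    SELECT *
    FROM (VALUES
        (5678, 201301, NULL, NULL),
        (5678, 201305, NULL, NULL),
        (5678, 201308, 'ST JOHNS', 3),
        (5678, 201401, NULL, NULL)
    ) AS t(ID, TERM, COUNTY, [SEQUENCE])
),
base_addr AS (
    SELECT *
    FROM (VALUES
        (5678, 1, 'CLAY'),
        (5678, 3, 'ST JOHNS')
    ) AS a(ID, SEQNO, COUNTY)
)
SELECT c.ID,
       c.TERM,
       -- own value first, then the previous non-NULL term, then the Sequence 1 fallback
       COALESCE(c.COUNTY, p.COUNTY, p1.COUNTY) AS COUNTY,
       c.[SEQUENCE]
FROM terms c
OUTER APPLY (
    SELECT TOP 1 COUNTY
    FROM terms
    WHERE ID = c.ID
      AND TERM < c.TERM
      AND COUNTY IS NOT NULL
    ORDER BY TERM DESC) AS p
OUTER APPLY (
    SELECT TOP 1 COUNTY
    FROM base_addr
    WHERE ID = c.ID
      AND SEQNO = 1) AS p1
ORDER BY c.ID, c.TERM;
```

With these assumed rows, terms 201301 and 201305 pick up CLAY from the fallback, while 201401 keeps inheriting ST JOHNS from the previous term.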

See an example:
SELECT * into tbl_filltest FROM (
VALUES (1,Null),(2,Null),(3,5),(4,Null),(5,Null),(6,Null),(7,4),(8,Null),(9,Null),(10,1)
) as t(c1,c2)
GO
SELECT * FROM tbl_filltest
GO
;WITH GoodValues as (SELECT * FROM tbl_filltest WHERE c2 is not null),
NullValues as (SELECT * FROM tbl_filltest WHERE c2 is null)
UPDATE n SET c2 = g1.c2 FROM GoodValues as g1
OUTER APPLY (SELECT MAX(c1) as Min_c1 FROM GoodValues as i WHERE g1.c1 > i.c1) as g2
INNER JOIN NullValues as n
ON n.c1 > IsNull(g2.Min_c1,0) and n.c1 < g1.c1
GO
SELECT * FROM tbl_filltest
GO
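Note that the UPDATE in this example copies the next non-NULL value backward into each gap (rows 1 and 2 receive 5, rows 4 to 6 receive 4, rows 8 and 9 receive 1). If the goal is the fill-down described in the question, where the previous non-NULL value is carried forward, a correlated subquery is one possible sketch. It assumes a freshly loaded tbl_filltest(c1, c2) with c1 defining the row order:

```sql
-- Forward fill: each NULL c2 takes the c2 of the closest earlier row that has a value.
-- Rows with no earlier non-NULL value (here c1 = 1 and c1 = 2) stay NULL.
UPDATE n
SET c2 = (SELECT TOP 1 g.c2
          FROM tbl_filltest AS g
          WHERE g.c1 < n.c1
            AND g.c2 IS NOT NULL
          ORDER BY g.c1 DESC)
FROM tbl_filltest AS n
WHERE n.c2 IS NULL;
```

Run against the original ten rows, this should set rows 4 to 6 to 5 and rows 8 and 9 to 4.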

Related

Need help in rewriting the query using CASE statement

SELECT distinct
ID,
LOWER(IFF(REGEXP_COUNT(pos_id, '^[0-9]+$')= 1, NULL, pos_id)) as pos
FROM table1
WHERE date='2022-02-02'
AND pos_id is not null
AND id='12345';
When I run the above query, I'm getting results like
ID     POS
------------------------
12345  894f4bb2597f
But when I run the query below where I have used CASE, I'm getting NULL values as well as NOT NULL values.
SELECT distinct
ID,
CASE WHEN REGEXP_COUNT(pos_id, '^[0-9]+$')= 1 AND pos_id IS NOT NULL THEN NULL ELSE pos_id
END as pos
FROM table1
WHERE date='2022-02-02'
AND pos_id is not null
AND id='12345';
ID     POS
------------------------
12345  894f4bb2597f
12345  NULL
For one ID, I'm getting NULL as well as non-NULL values.
I need to remove pos_id is not null in the WHERE clause and add that in CASE statement.
How to rewrite this query using CASE statement by removing the condition - pos_id is not null from the WHERE clause?
I tried the below query using CASE statement but not getting the correct results:
SELECT distinct
ID,
CASE when REGEXP_COUNT(pos_id,'^[0-9]+$')=1 and
pos_id is not null
THEN null else pos_id end
FROM table1
WHERE date='2022-02-02';
When I use CASE, I'm getting the count as 1,455,345 ROWS
but when I use - LOWER(IFF(REGEXP_COUNT(pos_id, '^[0-9]+$')= 1, NULL, pos_id
I'm getting COUNT as 2768 rows
So the key point of your question is the final sentence:
When I use CASE, I'm getting the count as 1,455,345 ROWS but when I use - LOWER(IFF(REGEXP_COUNT(pos_id, '^[0-9]+$')= 1, NULL, pos_id I'm getting COUNT as 2768 rows
So I took your SQL and ran both versions side by side over some sample input to understand why the results differ.
What you are really asking is why you get more distinct values when you don't apply LOWER than when you do.
The point is that a case-sensitive distinct count will always be the same as or greater than a case-insensitive one.
SELECT
column2 as pos_id,
LOWER(IFF(REGEXP_COUNT(pos_id, '^[0-9]+$')= 1, NULL, pos_id)) as pos_i,
CASE
WHEN REGEXP_COUNT(pos_id, '^[0-9]+$') = 1 AND pos_id IS NOT NULL THEN NULL
ELSE pos_id
END as pos_c
,lower(pos_c) as lower_pos_c
,count(distinct pos_i) over() as IFF_rows_count
,count(distinct pos_c) over() as CASE_rows_count
,count(distinct lower_pos_c) over() as LOWER_CASE_rows_count
FROM VALUES
(12345, '894f4bb2597f'),
(12346, '1234'),
(12346, null),
(123, 'aaa'),
(123, 'Aaa'),
(123, 'aAa'),
(123, 'aaA');
POS_ID        POS_I         POS_C         LOWER_POS_C   IFF_ROWS_COUNT  CASE_ROWS_COUNT  LOWER_CASE_ROWS_COUNT
-----------------------------------------------------------------------------------------------------------
894f4bb2597f  894f4bb2597f  894f4bb2597f  894f4bb2597f  2               5                2
1234          null          null          null          2               5                2
null          null          null          null          2               5                2
aaa           aaa           aaa           aaa           2               5                2
Aaa           aaa           Aaa           aaa           2               5                2
aAa           aaa           aAa           aaa           2               5                2
aaA           aaa           aaA           aaa           2               5                2

How to Group Rows together in SQL Server?

I need to group the same ID into one row and keep the row containing the most data.
If an ID group have no data, I still want 1 row returned. I have roughly 30 data columns.
Example:
ID  City   Country   Data A  Data B  Data30
--------------------------------------------
1   City1  Country1  DataA1  DataB1  DataN1
1   City2  Country2  Null    Null    Null
1   City3  Country3  Null    Null    Null
2   City1  Country1  DataA1  DataB1  DataN1
2   City2  Country2  Null    Null    Null
2   City3  Country3  Null    Null    Null
3   City1  Country1  Null    Null    Null
3   City2  Country2  Null    Null    Null
3   City3  Country3  Null    Null    Null
Result:
ID  City   Country   Data A  Data B  Data30
--------------------------------------------
1   City1  Country1  DataA1  DataB1  DataN1
2   City1  Country1  DataA1  DataB1  DataN1
3   City1  Country1  Null    Null    Null
Any suggestion would greatly be appreciated!
This is easy to do if you sum the number of non-null columns and then apply a row_number
with cte as (
select *, Row_Number() over(partition by id order by tot desc) rn
from (
select * , Iif(data1 is null,0,1) + Iif(data2 is null,0,1) + Iif(data30 is null,0,1) tot
from t
)x
)
select id, city, country, data1, data2, data30
from cte
where rn=1
See Working demo
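With roughly 30 data columns, the chain of Iif terms gets long. One possible alternative, sketched here under the assumption that the table is still called t and the data columns are data1 ... data30 (only three are listed), is to unpivot the columns with VALUES inside a CROSS APPLY and let COUNT skip the NULLs:

```sql
-- Sketch only: table t and columns data1..data30 follow the answer above;
-- list every data column inside VALUES.
-- The listed columns must share a type or be implicitly convertible to a common one.
WITH cte AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY n.tot DESC) AS rn
    FROM t
    CROSS APPLY (
        -- COUNT(v.val) ignores NULLs, so tot is the number of populated columns in this row
        SELECT COUNT(v.val) AS tot
        FROM (VALUES (t.data1), (t.data2), (t.data30)) AS v(val)
    ) AS n
)
SELECT id, city, country, data1, data2, data30
FROM cte
WHERE rn = 1;
```

As in the original answer, an ID whose rows are all empty still returns one row, because every row gets tot = 0 and ROW_NUMBER picks one of them arbitrarily.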

How to aggregate several columns into a JSON file in HIVE and avoid nulls

user_id  reservation_id  nights  price
--------------------------------------
AAA      10001           1       100
AAA      10002           1       120
BBB      20003           7       350
ccc      10005           NULL    150
DDD      10007           3       NULL
CCC      10006           5       NULL
to
user_id reservation_details
AAA [{"nights":"1", "price":"100"}, {"nights":"1","price":"120"}]
BBB [{"nights":"7", "price":"350"}]
CCC [{"price":"150"}, {"nights":"3"}]
DDD [{"nights":"5"}]
Here my query is
select user_id
,concat("
{",concat_ws(',',collect_list(concat(string(reservation_id),":
{'nights':",string(nights),",'price':",string(price),"}"))),"}") as
reservation_details
from mytable
group by user_id
I want to leave out the keys whose values are null and convert the single quotes into double quotes, so that the result is valid JSON.
Using in-built datatypes map and array along with a case expression to handle nulls.
select user_id,collect_list(map_nights_price)
from (select user_id,
case when nights is null then map('price',price)
when price is null then map('nights',nights)
else map('nights',nights,'price',price) end as map_nights_price
from mytable
where not (price is null and nights is null) --ignore row where price and nights are null
) t
group by user_id

Combining multiple rows in SQL with distinct identifier

I'm pulling some data from SQL Server from this table.
ID_Number Date_01 Date_02 Date_03 Date_04 Date_05
---------------------------------------------------------------------
1001 6/1/2015 6/5/2015 Null Null 6/6/2015
1001 Null Null 6/5/2015 Null 6/7/2015
1002 6/20/2015 Null Null 6/21/2015 Null
1002 6/21/2015 6/22/2015 6/23/2015 6/19/2015 6/20/2015
1003 6/25/2015 Null Null 6/26/2015 6/29/2015
I'm not sure what CTE query will I use to return only one row per ID and get the max date per column for each ID.
Here's the sample result:
ID_Number Date_01 Date_02 Date_03 Date_04 Date_05
----------------------------------------------------------------------
1001 6/1/2015 6/5/2015 6/5/2015 Null 6/7/2015
1002 6/21/2015 6/22/2015 6/23/2015 6/21/2015 6/20/2015
1003 6/25/2015 Null Null 6/26/2015 6/29/2015
You don't need a CTE to do this. If I'm not wrong, a simple GROUP BY with the MAX aggregate should work for you:
select
ID_Number,
Date_01=max(Date_01),
Date_02=max(Date_02),
Date_03=max(Date_03),
Date_04=max(Date_04),
Date_05=max(Date_05)
from yourtable
group by ID_Number
max date per column for each ID
Grouping by ID_Number :
SELECT ID_Number AS Expr1, MAX(Date_01) AS Date_01, MAX(Date_02) AS Date_02, MAX(Date_03) AS Date_03, MAX(Date_04) AS Date_04, MAX(Date_05) AS Date_05
FROM ta1
GROUP BY ID_Number

replace NULL values with latest non-NULL value in resultset series (SQL Server 2008 R2)

for SQL Server 2008 R2
I have a resultset that looks like this (note [price] is numeric, NULL below represents a
NULL value, the result set is ordered by product_id and timestamp)
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 NULL
5678 2008-01-01 12:02 NULL
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 NULL
I want to transform that to a result set that (essentially) copies a non-null value from the latest preceding row, to produce a resultset that looks like this:
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 12.34
5678 2008-01-01 12:02 12.34
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 23.45
I don't find any aggregate/windowing function that will allow me to do this (again this ONLY needed for SQL Server 2008 R2.)
I was hoping to find an analytic aggregate function that do this for me, something like...
LAST_VALUE(price) OVER (PARTITION BY product_id ORDER BY timestamp)
But I don't seem to find any way to do a "cumulative latest non-null value" in the window (to bound the window to the preceding rows, rather than the entire partition)
Aside from creating a table-valued user defined function, is there any builtin that would accomplish this?
UPDATE:
Apparently, this functionality is available in the 'Denali' CTP, but not in SQL Server 2008 R2.
LAST_VALUE http://msdn.microsoft.com/en-us/library/hh231517%28v=SQL.110%29.aspx
I just expected it to be available in SQL Server 2008. It's available in Oracle (since 10gR2 at least), and I can do something similar in MySQL 5.1, using a local variable.
http://download.oracle.com/docs/cd/E14072_01/server.112/e10592/functions083.htm
You can try the following:
** Updated **
-- Test Data
DECLARE @YourTable TABLE(Product INT, Timestamp DATETIME, Price NUMERIC(16,4))
INSERT INTO @YourTable
SELECT 5678, '20080101 12:00:00', 12.34
UNION ALL
SELECT 5678, '20080101 12:01:00', NULL
UNION ALL
SELECT 5678, '20080101 12:02:00', NULL
UNION ALL
SELECT 5678, '20080101 12:03:00', 23.45
UNION ALL
SELECT 5678, '20080101 12:04:00', NULL
;WITH CTE AS
(
SELECT *
FROM @YourTable
)
-- Query
SELECT A.Product, A.Timestamp, ISNULL(A.Price,B.Price) Price
FROM CTE A
OUTER APPLY ( SELECT TOP 1 *
FROM CTE
WHERE Product = A.Product AND Timestamp < A.Timestamp
AND Price IS NOT NULL
ORDER BY Product, Timestamp DESC) B
--Results
Product Timestamp Price
5678 2008-01-01 12:00:00.000 12.3400
5678 2008-01-01 12:01:00.000 12.3400
5678 2008-01-01 12:02:00.000 12.3400
5678 2008-01-01 12:03:00.000 23.4500
5678 2008-01-01 12:04:00.000 23.4500
I have a table containing the following data. I want to replace every null in the salary column with the most recent preceding non-null value.
Table:
id name salary
1 A 4000
2 B
3 C
4 C
5 D 2000
6 E
7 E
8 F 1000
9 G 2000
10 G 3000
11 G 5000
12 G
here is the query that works for me.
select a.*,first_value(a.salary)over(partition by a.value order by a.id) as abc from
(
select *,sum(case when salary is null then 0 else 1 end)over(order by id) as value from test)a
output:
id name salary Value abc
1 A 4000 1 4000
2 B 1 4000
3 C 1 4000
4 C 1 4000
5 D 2000 2 2000
6 E 2 2000
7 E 2 2000
8 F 1000 3 1000
9 G 2000 4 2000
10 G 3000 5 3000
11 G 5000 6 5000
12 G 6 5000
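The same grouping idea can be applied to the product/price data from the question. The sketch below is a variation rather than the query above: it swaps the SUM(CASE ...) counter for COUNT(Price) and FIRST_VALUE for MAX, and it assumes a table dbo.YourTable(ProductID, TimeStamp, Price), which is an illustrative name. Note that ORDER BY inside the OVER clause of an aggregate, like FIRST_VALUE itself, arrived with SQL Server 2012, so neither this sketch nor the query above runs on 2008 R2.

```sql
-- Assumes dbo.YourTable(ProductID, TimeStamp, Price); the names are illustrative.
-- Requires SQL Server 2012 or later (ordered window aggregate).
SELECT ProductID,
       TimeStamp,
       -- each (ProductID, grp) group holds at most one non-NULL price, so MAX returns it
       MAX(Price) OVER (PARTITION BY ProductID, grp) AS Price
FROM (
    SELECT ProductID, TimeStamp, Price,
           -- COUNT(Price) skips NULLs, so grp only increases on rows that carry a price
           COUNT(Price) OVER (PARTITION BY ProductID ORDER BY TimeStamp) AS grp
    FROM dbo.YourTable
) AS t
ORDER BY ProductID, TimeStamp;
```

Rows that precede the first priced row of a product fall into grp = 0 and keep a NULL price, since there is nothing earlier to carry forward.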
Try this:
;WITH SortedData AS
(
SELECT
ProductID, TimeStamp, Price,
ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY TimeStamp DESC) AS 'RowNum'
FROM dbo.YourTable
)
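UPDATE SortedData
SET Price = (SELECT TOP 1 sd2.Price
FROM SortedData sd2
WHERE sd2.ProductID = SortedData.ProductID -- stay within the same product
AND sd2.RowNum > SortedData.RowNum
AND sd2.Price IS NOT NULL
ORDER BY sd2.RowNum) -- closest earlier timestamp first
WHERE
SortedData.Price IS NULL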
Basically, the CTE creates a list sorted by timestamp (descending) - the newest first. Whenever a NULL is found, the next row that contains a NOT NULL price will be found and that value is used to update the row with the NULL price.
