How to aggregate several columns into a JSON file in HIVE and avoid nulls


user_id  reservation_id  nights  price
--------------------------------------
AAA      10001           1       100
AAA      10002           1       120
BBB      20003           7       350
CCC      10005                   150
DDD      10007           3
CCC      10006           5
to
user_id reservation_details
AAA [{"nights":"1", "price":"100"}, {"nights":"1","price":"120"}]
BBB [{"nights":"7", "price":"350"}]
CCC [{"price":"150"}, {"nights":"3"}]
DDD [{"nights":"5"}]
Here is my query:
select user_id,
       concat("{",
              concat_ws(',', collect_list(concat(string(reservation_id), ":{'nights':", string(nights), ",'price':", string(price), "}"))),
              "}") as reservation_details
from mytable
group by user_id
I want to leave out the keys whose values are null and use double quotes instead of single quotes, so that the result is valid JSON.

Use the built-in map and array datatypes along with a case expression to handle nulls.
select user_id,collect_list(map_nights_price)
from (select user_id,
case when nights is null then map('price',price)
when price is null then map('nights',nights)
else map('nights',nights,'price',price) end as map_nights_price
from mytable
where not (price is null and nights is null) --ignore row where price and nights are null
) t
group by user_id
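The same null-dropping logic can be illustrated outside Hive. The sketch below is plain Python, not Hive code: it drops the keys whose value is NULL, skips rows where both values are NULL, collects one list per user, and serializes it with json.dumps. The sample rows and the function name aggregate_reservations are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical sample rows: (user_id, reservation_id, nights, price).
# None stands in for a SQL NULL.
rows = [
    ("AAA", 10001, 1, 100),
    ("AAA", 10002, 1, 120),
    ("BBB", 20003, 7, 350),
    ("CCC", 10005, None, 150),
    ("CCC", 10006, 5, None),
    ("EEE", 10008, None, None),
]


def aggregate_reservations(rows):
    """Group rows by user and return one JSON array string per user."""
    grouped = defaultdict(list)
    for user_id, _reservation_id, nights, price in rows:
        details = {"nights": nights, "price": price}
        # Keep only the keys that have a value (the CASE expression in the query).
        details = {key: str(value) for key, value in details.items() if value is not None}
        if details:  # skip rows where both values are NULL (the WHERE clause)
            grouped[user_id].append(details)
    return {user_id: json.dumps(items) for user_id, items in grouped.items()}


result = aggregate_reservations(rows)
for user_id, reservation_details in result.items():
    print(user_id, reservation_details)
```

json.dumps emits double-quoted keys and values, which is what the question asks for in the final output.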

Related

SQL Server select join detect if common column between two tables are different

I am trying to write a function to check between two tables which have a common column with the same name and ID values.
Table 1: CompanyRecords
CompanyRecordsID  CompanyId  CompanyName  CompanyProcessID
-----------------------------------------------------------
1                 222        Sears        123
2                 333        JCPenny      456
Table 2: JointCompanies
JointCompaniesID  CompanyId  CompanyName  CompanyProcessID
-----------------------------------------------------------
3                 222        KMart        123
4                 444        Walmart      001
Both tables use the same foreign key CompanyProcessID; in the rows above, the shared value is 123.
How do I write a select statement that, given a CompanyProcessID, tells me whether the CompanyId differs between the two tables for that CompanyProcessID?
I assume it is a join between the two tables with a WHERE clause on CompanyProcessID.
Thanks for any help.

Is this what you want?
select max(case when cr.CompanyId = jc.CompanyId then 0 else 1 end) as company_id_not_same
from CompanyRecords cr join
     JointCompanies jc
     on cr.CompanyProcessID = jc.CompanyProcessID
where cr.CompanyProcessID = ?
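To see how a join-and-compare query of this kind behaves, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (standing in for SQL Server, which is an assumption) and compares CompanyId for a given CompanyProcessID. The rows with process id 789 and the function name company_id_changed are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyRecords (
    CompanyRecordsID INTEGER, CompanyId INTEGER,
    CompanyName TEXT, CompanyProcessID TEXT);
CREATE TABLE JointCompanies (
    JointCompaniesID INTEGER, CompanyId INTEGER,
    CompanyName TEXT, CompanyProcessID TEXT);
INSERT INTO CompanyRecords VALUES (1, 222, 'Sears', '123');
INSERT INTO CompanyRecords VALUES (2, 333, 'JCPenny', '456');
INSERT INTO JointCompanies VALUES (3, 222, 'KMart', '123');
INSERT INTO JointCompanies VALUES (4, 444, 'Walmart', '001');
-- Hypothetical extra pair (not in the question) where the CompanyId differs.
INSERT INTO CompanyRecords VALUES (5, 555, 'Acme', '789');
INSERT INTO JointCompanies VALUES (6, 666, 'Acme', '789');
""")

QUERY = """
SELECT MAX(CASE WHEN cr.CompanyId = jc.CompanyId THEN 0 ELSE 1 END)
FROM CompanyRecords cr
JOIN JointCompanies jc ON cr.CompanyProcessID = jc.CompanyProcessID
WHERE cr.CompanyProcessID = ?
"""


def company_id_changed(process_id):
    """Return 1 if the CompanyId differs, 0 if it matches, None if no row joins."""
    return conn.execute(QUERY, (process_id,)).fetchone()[0]


print(company_id_changed("123"))  # 0: both tables have CompanyId 222
print(company_id_changed("789"))  # 1: the hypothetical pair differs
print(company_id_changed("456"))  # None: no matching row in JointCompanies
```

Because MAX over an empty join yields NULL, a caller has to distinguish "no common row" from "unchanged".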

Get last row by datetime in SQL Server

I have the SQL table below:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 2017-04-20 15:12:12.132
6 DFG3 2017-04-20 13:34:42.132 NULL
Id is an auto-increment field.
Code is a string, and DateTime1 and DateTime2 are of type datetime. DateTime1 cannot be null, but DateTime2 can be.
I would like to obtain the last row by DateTime1 (max DateTime1, the most recent one) that matches a given code, but only if its DateTime2 is NULL.
For example, given the table above, for code DFG3 I would like to obtain the row with Id=6, which has the max DateTime1, "2017-04-20 13:34:42.132".
But now imagine the following case:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 NULL
6 DFG3 2017-05-02 16:34:34.032 2017-05-02 21:00:00.032
Again, given the table above, I would like to obtain the same thing: the last row by DateTime1 (max DateTime1, the most recent one) that matches a given code and has DateTime2 set to NULL.
In this case, no rows must be returned for code DFG3, because the row with Id=6 is the most recent one by DateTime1 for that code, but its DateTime2 is not NULL.
How can I do this?

Can you try this query and let me know if it works for your case?
SELECT *
FROM [TableName]
WHERE [Code] = 'DFG3'
  AND [DateTime2] IS NULL
  AND [DateTime1] = (SELECT MAX([DateTime1]) FROM [TableName] WHERE [Code] = 'DFG3')
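A quick way to check the subquery approach against both sample tables is to run it in an in-memory SQLite database. This is only a sketch: SQLite stands in for SQL Server, the datetimes are stored as ISO-formatted text (which sorts chronologically), and the table name t and the function name latest_open_row_ids are made up for the example.

```python
import sqlite3


def latest_open_row_ids(rows, code):
    """Return the Ids of rows for `code` that have the max DateTime1 and a NULL DateTime2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Id INTEGER, Code TEXT, DateTime1 TEXT NOT NULL, DateTime2 TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT Id FROM t
        WHERE Code = :code
          AND DateTime2 IS NULL
          AND DateTime1 = (SELECT MAX(DateTime1) FROM t WHERE Code = :code)
    """
    return [row[0] for row in conn.execute(query, {"code": code})]


first_case = [
    (1, "3AA2", "2017-02-01 14:23:00.000", "2017-02-01 20:00:00.000"),
    (2, "E323", "2017-02-12 17:34:34.032", "2017-02-12 18:34:34.032"),
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (4, "LKF0", "2017-04-24 11:14:00.000", "2017-04-24 13:40:00.000"),
    (5, "DFG3", "2017-04-20 13:34:42.132", "2017-04-20 15:12:12.132"),
    (6, "DFG3", "2017-04-20 13:34:42.132", None),
]
second_case = [
    (1, "3AA2", "2017-02-01 14:23:00.000", "2017-02-01 20:00:00.000"),
    (2, "E323", "2017-02-12 17:34:34.032", "2017-02-12 18:34:34.032"),
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (4, "LKF0", "2017-04-24 11:14:00.000", "2017-04-24 13:40:00.000"),
    (5, "DFG3", "2017-04-20 13:34:42.132", None),
    (6, "DFG3", "2017-05-02 16:34:34.032", "2017-05-02 21:00:00.032"),
]

print(latest_open_row_ids(first_case, "DFG3"))   # [6]
print(latest_open_row_ids(second_case, "DFG3"))  # []
```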

This gets you the latest row for each code in your table; you then keep it only if its DateTime2 is null.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Code
ORDER BY DateTime1 Desc) as rn
FROM yourTable
) as T
WHERE rn = 1 -- The row with latest date for each code will have 1
and dateTime2 IS NULL
and code = 'DFG3' -- OPTIONAL
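One detail worth checking with the ROW_NUMBER approach: in the first sample table, Ids 5 and 6 share the same DateTime1, so which of them gets rn = 1 is not determined by DateTime1 alone. The sketch below adds Id DESC as a tie-breaker, which is an assumption about the desired behavior, and runs the query in an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for SQL Server. Only the DFG3 rows of the two sample tables are loaded; the function name is hypothetical.

```python
import sqlite3

QUERY = """
SELECT Id FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Code
               ORDER BY DateTime1 DESC, Id DESC  -- Id DESC is an assumed tie-breaker
           ) AS rn
    FROM t
) AS ranked
WHERE rn = 1
  AND DateTime2 IS NULL
  AND Code = ?
"""


def latest_open_ids_by_rank(rows, code):
    """Return the Id of the latest row for `code` if its DateTime2 is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Id INTEGER, Code TEXT, DateTime1 TEXT NOT NULL, DateTime2 TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(QUERY, (code,))]


# DFG3 rows of the first sample table: Ids 5 and 6 tie on DateTime1.
tie_case = [
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (5, "DFG3", "2017-04-20 13:34:42.132", "2017-04-20 15:12:12.132"),
    (6, "DFG3", "2017-04-20 13:34:42.132", None),
]
# DFG3 rows of the second sample table: the latest row is already closed.
closed_case = [
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (5, "DFG3", "2017-04-20 13:34:42.132", None),
    (6, "DFG3", "2017-05-02 16:34:34.032", "2017-05-02 21:00:00.032"),
]

print(latest_open_ids_by_rank(tie_case, "DFG3"))     # [6]
print(latest_open_ids_by_rank(closed_case, "DFG3"))  # []
```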

Filter some records based on date range

I have the table structures below.
Trans Table:
Trans_Id(PK)  User_Id(FK)  Approved_Date
________________________________________________
1             101          05-06-2016
2             101          12-06-2016
3             101          20-06-2016
4             102          06-06-2016
5             103          10-06-2016
6             103          25-06-2016
Table2:
Id(PK)  User_Id(FK)  Start_Date  End_Date    Is_Revoked
_________________________________________________________________________
1       101          01-06-2016  15-06-2016  1
2       101          10-06-2016  15-06-2016  0
3       103          05-06-2016  20-06-2016  0
I want to select the transactions whose Approved_Date is not between the Start_Date and End_Date of any of the user's rows in Table2.
A Table2 row with Is_Revoked = 1 should not be considered.
Expected Result:
Trans_Id
________
1
3
4
6

Try this:
SELECT A.Trans_Id
FROM TransTable A
WHERE NOT EXISTS (SELECT 1
                  FROM Table2 B
                  WHERE B.User_Id = A.User_Id
                    AND B.Is_Revoked = 0
                    AND A.Approved_Date BETWEEN B.Start_Date AND B.End_Date)

This should solve it:
SELECT A.Trans_Id FROM TRANS AS A LEFT JOIN TABLE2 AS B ON A.[USER_ID]=B.[USER_ID]
AND B.IS_REVOKED=0 AND A.APPROVED_DATE BETWEEN B.START_DATE AND B.END_DATE
WHERE B.ID IS NULL

There are basically two conditions you want to meet:
If Is_Revoked is equal to 1, ignore that Table2 row.
If Approved_Date is between Start_Date and End_Date, leave the transaction out of the result.
So this can easily be done using a join.
SELECT DISTINCT A.Trans_ID
FROM TransTable A, Table2 B
WHERE B.Is_Revoked != 1 AND A.Approved_Date NOT BETWEEN B.Start_Date AND B.End_Date
Judging from the expected result you mentioned, you don't really care whether the User_Id matches or not.
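One way to check a candidate query against the expected result (1, 3, 4, 6) is to load the sample data into an in-memory SQLite database. The sketch below does that with a NOT EXISTS filter, so that a user with no Table2 row at all (user 102) is still returned. SQLite stands in for the real database, and storing the dates as ISO text (so that BETWEEN compares them chronologically) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trans (Trans_Id INTEGER PRIMARY KEY, User_Id INTEGER, Approved_Date TEXT);
CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, User_Id INTEGER,
                     Start_Date TEXT, End_Date TEXT, Is_Revoked INTEGER);
-- Dates from the question (dd-mm-yyyy) rewritten as yyyy-mm-dd.
INSERT INTO Trans VALUES (1, 101, '2016-06-05');
INSERT INTO Trans VALUES (2, 101, '2016-06-12');
INSERT INTO Trans VALUES (3, 101, '2016-06-20');
INSERT INTO Trans VALUES (4, 102, '2016-06-06');
INSERT INTO Trans VALUES (5, 103, '2016-06-10');
INSERT INTO Trans VALUES (6, 103, '2016-06-25');
INSERT INTO Table2 VALUES (1, 101, '2016-06-01', '2016-06-15', 1);
INSERT INTO Table2 VALUES (2, 101, '2016-06-10', '2016-06-15', 0);
INSERT INTO Table2 VALUES (3, 103, '2016-06-05', '2016-06-20', 0);
""")

QUERY = """
SELECT t.Trans_Id
FROM Trans t
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 b
    WHERE b.User_Id = t.User_Id
      AND b.Is_Revoked = 0
      AND t.Approved_Date BETWEEN b.Start_Date AND b.End_Date
)
ORDER BY t.Trans_Id
"""

trans_ids = [row[0] for row in conn.execute(QUERY)]
print(trans_ids)  # [1, 3, 4, 6]
```

Transaction 2 is excluded even though it also falls inside the revoked range, because the non-revoked range 10-06-2016 to 15-06-2016 covers it.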

Sybase ASE Transpose Indefinite Number of Rows to Columns

I have tried doing pivot tables but only for fixed number of rows.
I have the following records:
ID CODE
=== ====
1 AAA
1 BBB
1 CCC
2 DDD
3 EEE
3 FFF
4 GGG
4 HHH
4 III
4 JJJ
And my expected result is:
ID   CODE1  CODE2  CODE3  CODE4
===  =====  =====  =====  =====
1    AAA    BBB    CCC
2    DDD
3    EEE    FFF
4    GGG    HHH    III    JJJ
Take note that the number of rows returned per id is not fixed. I want to avoid cursors as much as possible.

To do it without a loop, you need to add an artificial row number, for example through an identity column. If you do not want to change your schema, copy the whole table into a temp table first.
(I didn't check for syntax errors, but you'll get the idea.)
alter table yourtab add seq int identity not null
select id, min_seq=min(seq) into #t from yourtab group by id
select id, code1=max(code1), code2=max(code2),
code3=max(code3), (etc) from ( select id = yourtab.id,
code1=case (yourtab.seq-#t.min_seq) when 0 then code else null end,
code2=case (yourtab.seq-#t.min_seq) when 1 then code else null end,
code3=case (yourtab.seq-#t.min_seq) when 2 then code else null end,
[...etc...]
from yourtab, #t where yourtab.id = #t.id ) as newtab
group by id
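The sequence-number idea can be tried out end to end in an in-memory SQLite database. This is a sketch rather than Sybase code: an AUTOINCREMENT column plays the role of the identity column, a subquery replaces the #t temp table, and MAX_CODES is an assumed upper bound on the number of codes per id, since the output columns still have to be listed in the query.

```python
import sqlite3

MAX_CODES = 4  # assumed upper bound on codes per id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtab (seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, code TEXT)"
)
data = [
    (1, "AAA"), (1, "BBB"), (1, "CCC"),
    (2, "DDD"),
    (3, "EEE"), (3, "FFF"),
    (4, "GGG"), (4, "HHH"), (4, "III"), (4, "JJJ"),
]
conn.executemany("INSERT INTO yourtab (id, code) VALUES (?, ?)", data)

# One conditional aggregate per output column: position 0 -> code1, 1 -> code2, ...
columns = ",\n       ".join(
    f"MAX(CASE y.seq - t.min_seq WHEN {i} THEN y.code END) AS code{i + 1}"
    for i in range(MAX_CODES)
)
query = f"""
SELECT y.id,
       {columns}
FROM yourtab y
JOIN (SELECT id, MIN(seq) AS min_seq FROM yourtab GROUP BY id) t ON t.id = y.id
GROUP BY y.id
ORDER BY y.id
"""

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The position of a code within its id is seq minus the smallest seq of that id, which only works if the rows of one id received consecutive sequence numbers, as they do here because they were inserted together.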

replace NULL values with latest non-NULL value in resultset series (SQL Server 2008 R2)

For SQL Server 2008 R2:
I have a resultset that looks like this (note [price] is numeric, NULL below represents a NULL value, the result set is ordered by product_id and timestamp)
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 NULL
5678 2008-01-01 12:02 NULL
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 NULL
I want to transform that to a result set that (essentially) copies a non-null value from the latest preceding row, to produce a resultset that looks like this:
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 12.34
5678 2008-01-01 12:02 12.34
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 23.45
I can't find any aggregate/windowing function that will allow me to do this (again, this is ONLY needed for SQL Server 2008 R2).
I was hoping to find an analytic aggregate function that does this for me, something like...
LAST_VALUE(price) OVER (PARTITION BY product_id ORDER BY timestamp)
But I can't seem to find any way to do a "cumulative latest non-null value" in the window (to bound the window to the preceding rows, rather than the entire partition).
Aside from creating a table-valued user-defined function, is there any builtin that would accomplish this?
UPDATE:
Apparently, this functionality is available in the 'Denali' CTP, but not in SQL Server 2008 R2.
LAST_VALUE http://msdn.microsoft.com/en-us/library/hh231517%28v=SQL.110%29.aspx
I just expected it to be available in SQL Server 2008. It's available in Oracle (since 10gR2 at least), and I can do something similar in MySQL 5.1, using a local variable.
http://download.oracle.com/docs/cd/E14072_01/server.112/e10592/functions083.htm

You can try the following:
Updated:
-- Test Data
DECLARE @YourTable TABLE(Product INT, Timestamp DATETIME, Price NUMERIC(16,4))
INSERT INTO @YourTable
SELECT 5678, '20080101 12:00:00', 12.34
UNION ALL
SELECT 5678, '20080101 12:01:00', NULL
UNION ALL
SELECT 5678, '20080101 12:02:00', NULL
UNION ALL
SELECT 5678, '20080101 12:03:00', 23.45
UNION ALL
SELECT 5678, '20080101 12:04:00', NULL
;WITH CTE AS
(
    SELECT *
    FROM @YourTable
)
-- Query
SELECT A.Product, A.Timestamp, ISNULL(A.Price,B.Price) Price
FROM CTE A
OUTER APPLY ( SELECT TOP 1 *
              FROM CTE
              WHERE Product = A.Product AND Timestamp < A.Timestamp
              AND Price IS NOT NULL
              ORDER BY Product, Timestamp DESC) B
--Results
Product Timestamp Price
5678 2008-01-01 12:00:00.000 12.3400
5678 2008-01-01 12:01:00.000 12.3400
5678 2008-01-01 12:02:00.000 12.3400
5678 2008-01-01 12:03:00.000 23.4500
5678 2008-01-01 12:04:00.000 23.4500
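The "latest earlier non-NULL row" lookup can also be written as a correlated subquery, which is easy to try in an in-memory SQLite database. This is a sketch under assumptions: SQLite has no OUTER APPLY or TOP, so COALESCE with ORDER BY ... DESC LIMIT 1 is used instead, and the table name prices and column name ts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product INTEGER, ts TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (5678, "2008-01-01 12:00", 12.34),
        (5678, "2008-01-01 12:01", None),
        (5678, "2008-01-01 12:02", None),
        (5678, "2008-01-01 12:03", 23.45),
        (5678, "2008-01-01 12:04", None),
    ],
)

QUERY = """
SELECT a.product, a.ts,
       COALESCE(a.price,
                (SELECT b.price
                 FROM prices b
                 WHERE b.product = a.product
                   AND b.ts < a.ts
                   AND b.price IS NOT NULL
                 ORDER BY b.ts DESC
                 LIMIT 1)) AS price
FROM prices a
ORDER BY a.product, a.ts
"""

filled = conn.execute(QUERY).fetchall()
for row in filled:
    print(row)
```

A row with no earlier non-NULL price keeps NULL, which matches the behavior of ISNULL over an OUTER APPLY that finds nothing.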

I have a table containing the following data. I want to replace every NULL in the salary column with the most recent preceding non-NULL value.
Table:
id  name  salary
1   A     4000
2   B
3   C
4   C
5   D     2000
6   E
7   E
8   F     1000
9   G     2000
10  G     3000
11  G     5000
12  G
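Here is the query that works for me. It relies on SUM() OVER (ORDER BY ...) and FIRST_VALUE, which SQL Server supports from version 2012 onward, so it does not run on 2008 R2.
select a.*, first_value(a.salary) over (partition by a.value order by a.id) as abc
from (
    select *, sum(case when salary is null then 0 else 1 end) over (order by id) as value
    from test
) a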
Output:
id  name  salary  value  abc
1   A     4000    1      4000
2   B             1      4000
3   C             1      4000
4   C             1      4000
5   D     2000    2      2000
6   E             2      2000
7   E             2      2000
8   F     1000    3      1000
9   G     2000    4      2000
10  G     3000    5      3000
11  G     5000    6      5000
12  G             6      5000
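The trick here is that a running count of non-NULL salaries gives every NULL row the same group number as the last non-NULL row before it, so the first value of each group is the fill value. A minimal sketch of that idea, run in an in-memory SQLite database (version 3.25 or later is assumed, for window function support); the column aliases grp and filled are renamed from the query above for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "A", 4000), (2, "B", None), (3, "C", None), (4, "C", None),
        (5, "D", 2000), (6, "E", None), (7, "E", None), (8, "F", 1000),
        (9, "G", 2000), (10, "G", 3000), (11, "G", 5000), (12, "G", None),
    ],
)

QUERY = """
SELECT a.id, a.grp,
       FIRST_VALUE(a.salary) OVER (PARTITION BY a.grp ORDER BY a.id) AS filled
FROM (
    -- Running count of non-NULL salaries: it only increases on rows that have a value.
    SELECT *,
           SUM(CASE WHEN salary IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY id) AS grp
    FROM test
) a
ORDER BY a.id
"""

rows = conn.execute(QUERY).fetchall()
groups = [row[1] for row in rows]
filled = [row[2] for row in rows]
print(groups)
print(filled)
```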

Try this:
;WITH SortedData AS
(
    SELECT
        ProductID, TimeStamp, Price,
        ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY TimeStamp DESC) AS RowNum
    FROM dbo.YourTable
)
UPDATE SortedData
SET Price = (SELECT TOP 1 sd2.Price
             FROM SortedData sd2
             WHERE sd2.ProductID = SortedData.ProductID
               AND sd2.RowNum > SortedData.RowNum
               AND sd2.Price IS NOT NULL
             ORDER BY sd2.RowNum)
WHERE
    SortedData.Price IS NULL
Basically, the CTE creates a list sorted by timestamp (descending), newest first, within each product. Whenever a NULL price is found, the nearest older row of the same product that has a non-NULL price is looked up, and its value is used to update the row with the NULL price.
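For comparison, an in-place update along the same lines can be tried in an in-memory SQLite database. This is only a sketch: SQLite cannot update through a CTE, so the lookup is written as a correlated subquery directly on the table, and the table name prices and column name ts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product INTEGER, ts TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (5678, "2008-01-01 12:00", 12.34),
        (5678, "2008-01-01 12:01", None),
        (5678, "2008-01-01 12:02", None),
        (5678, "2008-01-01 12:03", 23.45),
        (5678, "2008-01-01 12:04", None),
    ],
)

# For each NULL price, copy the price of the nearest earlier row
# of the same product that has a non-NULL price.
conn.execute("""
UPDATE prices
SET price = (SELECT p2.price
             FROM prices p2
             WHERE p2.product = prices.product
               AND p2.ts < prices.ts
               AND p2.price IS NOT NULL
             ORDER BY p2.ts DESC
             LIMIT 1)
WHERE price IS NULL
""")

updated = [row[0] for row in conn.execute("SELECT price FROM prices ORDER BY ts")]
print(updated)  # [12.34, 12.34, 12.34, 23.45, 23.45]
```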
