How to aggregate several columns into a JSON file in HIVE and avoid nulls - arrays

user_id   reservation_id   nights   price
--------------------------------------
AAA       10001            1        100
AAA       10002            1        120
BBB       20003            7        350
CCC       10005            NULL     150
DDD       10007            3        NULL
CCC       10006            5        NULL
to
user_id reservation_details
AAA [{"nights":"1", "price":"100"}, {"nights":"1","price":"120"}]
BBB [{"nights":"7", "price":"350"}]
CCC [{"price":"150"}, {"nights":"3"}]
DDD [{"nights":"5"}]
Here is my query:
select user_id,
       concat("{",
              concat_ws(',', collect_list(concat(string(reservation_id), ":{'nights':", string(nights), ",'price':", string(price), "}"))),
              "}") as reservation_details
from mytable
group by user_id
I want to drop the fields whose values are null and turn the single quotes into double quotes, so that the result is valid JSON.

Use the built-in map and array datatypes, along with a case expression to handle the nulls.
select user_id, collect_list(map_nights_price)
from (select user_id,
             case when nights is null then map('price', price)
                  when price is null then map('nights', nights)
                  else map('nights', nights, 'price', price) end as map_nights_price
      from mytable
      where not (price is null and nights is null) -- ignore rows where both price and nights are null
     ) t
group by user_id
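To see what the grouped result looks like once it is serialized as JSON, the same null-skipping logic can be sketched outside Hive. The Python snippet below is only an illustration under assumptions: the rows are retyped from the question's sample table (with `ccc` normalized to `CCC`), `None` stands in for SQL NULL, and `aggregate_reservations` is a hypothetical helper name, not a Hive function.

```python
import json
from collections import defaultdict

# Sample rows as (user_id, reservation_id, nights, price); None stands in for NULL.
rows = [
    ("AAA", 10001, 1, 100),
    ("AAA", 10002, 1, 120),
    ("BBB", 20003, 7, 350),
    ("CCC", 10005, None, 150),
    ("DDD", 10007, 3, None),
    ("CCC", 10006, 5, None),
]


def aggregate_reservations(rows):
    """Group by user_id, keeping only the non-null fields of each reservation."""
    grouped = defaultdict(list)
    for user_id, _reservation_id, nights, price in rows:
        details = {}
        if nights is not None:
            details["nights"] = str(nights)
        if price is not None:
            details["price"] = str(price)
        if details:  # skip rows where both fields are NULL
            grouped[user_id].append(details)
    # json.dumps emits double-quoted keys and values, i.e. valid JSON.
    return {user: json.dumps(items) for user, items in grouped.items()}


result = aggregate_reservations(rows)
for user_id, reservation_details in sorted(result.items()):
    print(user_id, reservation_details)
```

Each value is built as a small dictionary per row, which mirrors the `map(...)` per row plus `collect_list` per user in the Hive answer.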

Related

SQL Server select join detect if common column between two tables are different

I am trying to write a function to check between two tables which have a common column with the same name and ID values.
Table 1: CompanyRecords
CompanyRecordsID CompanyId CompanyName CompanyProcessID
-----------------------------------------------------------
1 222 Sears 123
2 333 JCPenny 456
Table 2: JointCompanies
JointCompaniesID CompanyId CompanyName CompanyProcessID
-----------------------------------------------------------
3 222 KMart 123
4 444 Walmart 001
They both use the same foreign key CompanyProcessID with value 123.
How do I write a select statement that, when passed a CompanyProcessID, tells me whether the CompanyId has changed for that CompanyProcessID?
I assume it is a join between the two tables with a WHERE clause on CompanyProcessID.
Thanks for any help.
Is this what you want?
select max(case when cr.CompanyName = jc.CompanyName then 0 else 1 end) as name_not_same
from CompanyRecords cr join
     JointCompanies jc
     on cr.CompanyProcessID = jc.CompanyProcessID
where cr.CompanyProcessID = ?

Get last row by datetime in SQL Server

I have below SQL table:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 2017-04-20 15:12:12.132
6 DFG3 2017-04-20 13:34:42.132 NULL
Id is an auto-increment field.
Code is a string, and DateTime1 and DateTime2 are of type datetime. DateTime1 cannot be NULL, but DateTime2 can be.
I would like to obtain the last row by DateTime1 (MAX DateTime1, the most recent one) that matches a specific code, provided its DateTime2 is NULL.
For example, given the table above, for code DFG3 I would like to obtain the row with Id=6, which has the max DateTime1 for that code, "2017-04-20 13:34:42.132".
But now imagine the following case:
Id | Code | DateTime1 | DateTime2
1 3AA2 2017-02-01 14:23:00.000 2017-02-01 20:00:00.000
2 E323 2017-02-12 17:34:34.032 2017-02-12 18:34:34.032
3 DFG3 2017-03-08 09:20:10.032 2017-03-08 12:30:10.032
4 LKF0 2017-04-24 11:14:00.000 2017-04-24 13:40:00.000
5 DFG3 2017-04-20 13:34:42.132 NULL
6 DFG3 2017-05-02 16:34:34.032 2017-05-02 21:00:00.032
Again, given the table above, I would like to obtain the same thing: the last row by DateTime1 (MAX DateTime1, the most recent one) that matches a specific code, provided its DateTime2 is NULL.
In this last case, no rows must be returned for code DFG3, because the row with Id=6 is the most recent one by DateTime1 for that code but its DateTime2 is not NULL.
How can I do this?
Can you try this query and let me know if it works for your case?
Select *
From [TableName]
where [Code] = 'DFG3'
  and [datetime2] is null
  and [datetime1] = (select max([datetime1]) from [TableName] where [Code] = 'DFG3')
This gives you the latest row for each code in your table; you then keep it only if its DateTime2 is null:
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Code
                                 ORDER BY DateTime1 DESC, Id DESC) as rn -- Id breaks ties on DateTime1
    FROM yourTable
) as T
WHERE rn = 1 -- The row with the latest date for each code will have 1
  and dateTime2 IS NULL
  and code = 'DFG3' -- OPTIONAL
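Both scenarios from the question can be checked with a small script. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server and runs the MAX-subquery form of the query; the table name `t` and the helper `latest_open_row` are hypothetical names chosen for the example, and the timestamps are stored as ISO-formatted text so that string comparison matches chronological order.

```python
import sqlite3

# The two sample tables from the question, as (Id, Code, DateTime1, DateTime2).
first_table = [
    (1, "3AA2", "2017-02-01 14:23:00.000", "2017-02-01 20:00:00.000"),
    (2, "E323", "2017-02-12 17:34:34.032", "2017-02-12 18:34:34.032"),
    (3, "DFG3", "2017-03-08 09:20:10.032", "2017-03-08 12:30:10.032"),
    (4, "LKF0", "2017-04-24 11:14:00.000", "2017-04-24 13:40:00.000"),
    (5, "DFG3", "2017-04-20 13:34:42.132", "2017-04-20 15:12:12.132"),
    (6, "DFG3", "2017-04-20 13:34:42.132", None),
]
second_table = first_table[:4] + [
    (5, "DFG3", "2017-04-20 13:34:42.132", None),
    (6, "DFG3", "2017-05-02 16:34:34.032", "2017-05-02 21:00:00.032"),
]


def latest_open_row(rows, code):
    """Return the newest row for `code` (by DateTime1) only if its DateTime2 is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Id INTEGER PRIMARY KEY, Code TEXT, "
        "DateTime1 TEXT NOT NULL, DateTime2 TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    found = conn.execute(
        """
        SELECT Id, Code, DateTime1, DateTime2
        FROM t
        WHERE Code = ?
          AND DateTime2 IS NULL
          AND DateTime1 = (SELECT MAX(DateTime1) FROM t WHERE Code = ?)
        """,
        (code, code),
    ).fetchall()
    conn.close()
    return found


print(latest_open_row(first_table, "DFG3"))   # first scenario
print(latest_open_row(second_table, "DFG3"))  # second scenario
```

In the first table, rows 5 and 6 share the same DateTime1; the subquery form handles the tie because it filters on DateTime2 after finding the maximum, rather than picking a single row first.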

Filter some records based on date range

I have the table structures below.
Trans table:
Trans_Id(PK)   User_Id(FK)   Approved_Date
________________________________________________
1              101           05-06-2016
2              101           12-06-2016
3              101           20-06-2016
4              102           06-06-2016
5              103           10-06-2016
6              103           25-06-2016
Table2:
Id(PK)   User_Id(FK)   Start_Date   End_Date     Is_Revoked
_________________________________________________________________________
1        101           01-06-2016   15-06-2016   1
2        101           10-06-2016   15-06-2016   0
3        103           05-06-2016   20-06-2016   0
I want to select the transactions whose Approved_Date is not between the Start_Date and End_Date of the user's rows in Table2.
Rows with Is_Revoked = 1 should not be considered.
Expected Result:
Trans_Id
________
1
3
4
6
Try this
SELECT A.Trans_ID
FROM TransTable A JOIN Table2 B
ON (A.User_Id = B.User_Id)
WHERE B.Is_Revoked = 1 OR A.Approved_Date NOT BETWEEN B.Start_Date AND B.End_Date
This should solve it:
SELECT Trans_Id FROM TRANS AS A JOIN TABLE2 AS B ON A.[USER_ID]=B.[USER_ID]
WHERE B.IS_REVOKED=0 AND A.APPROVED_DATE NOT BETWEEN B.START_DATE AND B.END_DATE
There are basically two conditions you want to meet:
If Is_Revoked is equal to 1, ignore the row.
If Approved_Date is between Start_Date and End_Date, ignore the row as well.
This can easily be done using a join.
SELECT A.Trans_ID
FROM TransTable A, Table2 B
WHERE B.Is_Revoked != 1 AND A.Approved_Date NOT BETWEEN B.Start_Date AND
B.End_Date
As far as I understand from the expected result you mentioned, you don't really care whether the User_Id matches or not.
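One way to check a candidate query against the expected result (1, 3, 4, 6) is to load the sample rows into a scratch database. The sketch below uses Python's sqlite3 module as a stand-in for the real engine and a NOT EXISTS formulation, which is a different technique from the joins above: a transaction is kept when no non-revoked period of the same user covers its Approved_Date. Assumptions: the sample dates are read as day-month-year and stored as ISO text so that BETWEEN compares correctly, and the table names `Trans` and `Table2` follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Trans (Trans_Id INTEGER PRIMARY KEY, User_Id INTEGER, Approved_Date TEXT);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, User_Id INTEGER,
                         Start_Date TEXT, End_Date TEXT, Is_Revoked INTEGER);
    INSERT INTO Trans VALUES
        (1, 101, '2016-06-05'), (2, 101, '2016-06-12'), (3, 101, '2016-06-20'),
        (4, 102, '2016-06-06'), (5, 103, '2016-06-10'), (6, 103, '2016-06-25');
    INSERT INTO Table2 VALUES
        (1, 101, '2016-06-01', '2016-06-15', 1),
        (2, 101, '2016-06-10', '2016-06-15', 0),
        (3, 103, '2016-06-05', '2016-06-20', 0);
    """
)

# Keep a transaction unless some active (non-revoked) period of the same user covers it.
query = """
SELECT t.Trans_Id
FROM Trans t
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 p
    WHERE p.User_Id = t.User_Id
      AND p.Is_Revoked = 0
      AND t.Approved_Date BETWEEN p.Start_Date AND p.End_Date
)
ORDER BY t.Trans_Id
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

Because NOT EXISTS does not multiply rows, a user with several periods yields each transaction at most once, and a user with no rows in Table2 (user 102 here) is still returned.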

Sybase ASE Transpose Indefinite Number of Rows to Columns

I have tried doing pivot tables but only for fixed number of rows.
I have the following records:
ID CODE
=== ====
1 AAA
1 BBB
1 CCC
2 DDD
3 EEE
3 FFF
4 GGG
4 HHH
4 III
4 JJJ
And my expected result is:
ID CODE1 CODE2 CODE3 CODE4
=== ===== ===== ===== =====
1 AAA BBB CCC
2 DDD
3 EEE FFF
4 GGG HHH III JJJ
Take note that the number of rows returned per id is not fixed. I want to avoid cursor as much as possible.
To do it without a loop, you need to add an artificial row number, for example through an identity column. If you do not want to change your schema, copy the whole table into a temp table first.
(I didn't check for syntax errors, but you'll get the idea.)
alter table yourtab add seq int identity not null
select id, min_seq=min(seq) into #t from yourtab group by id
-- the offset of each row within its id is seq minus the smallest seq of that id
select id, code1=max(code1), code2=max(code2),
       code3=max(code3), (etc)
from ( select id = yourtab.id,
              code1=case (yourtab.seq-#t.min_seq) when 0 then code else null end,
              code2=case (yourtab.seq-#t.min_seq) when 1 then code else null end,
              code3=case (yourtab.seq-#t.min_seq) when 2 then code else null end,
              [...etc...]
       from yourtab, #t where yourtab.id = #t.id ) as newtab
group by id
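The offset idea can be tried on the sample data with a throwaway database. The following sketch uses Python's sqlite3 module rather than Sybase ASE, so the syntax differs from the answer: an AUTOINCREMENT column plays the role of the identity column, and a grouped subquery replaces the `#t` temp table. It assumes the rows of each id are inserted contiguously, and it still has to spell out a fixed number of code columns (four here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq plays the role of the identity column from the answer.
conn.execute(
    "CREATE TABLE yourtab (seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, code TEXT)"
)
sample = [
    (1, "AAA"), (1, "BBB"), (1, "CCC"),
    (2, "DDD"),
    (3, "EEE"), (3, "FFF"),
    (4, "GGG"), (4, "HHH"), (4, "III"), (4, "JJJ"),
]
conn.executemany("INSERT INTO yourtab (id, code) VALUES (?, ?)", sample)

# Offset within each id = seq - min(seq) of that id; one CASE per output column.
pivot_sql = """
SELECT y.id,
       MAX(CASE WHEN y.seq - m.min_seq = 0 THEN y.code END) AS code1,
       MAX(CASE WHEN y.seq - m.min_seq = 1 THEN y.code END) AS code2,
       MAX(CASE WHEN y.seq - m.min_seq = 2 THEN y.code END) AS code3,
       MAX(CASE WHEN y.seq - m.min_seq = 3 THEN y.code END) AS code4
FROM yourtab y
JOIN (SELECT id, MIN(seq) AS min_seq FROM yourtab GROUP BY id) m ON m.id = y.id
GROUP BY y.id
ORDER BY y.id
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

MAX ignores NULLs, so each output cell picks up the single code found at that offset, or stays NULL when the id has fewer rows.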

replace NULL values with latest non-NULL value in resultset series (SQL Server 2008 R2)

For SQL Server 2008 R2.
I have a resultset that looks like this (note that [price] is numeric, NULL below represents a NULL value, and the result set is ordered by product_id and timestamp):
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 NULL
5678 2008-01-01 12:02 NULL
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 NULL
I want to transform that to a result set that (essentially) copies a non-null value from the latest preceding row, to produce a resultset that looks like this:
product timestamp price
------- ---------------- -----
5678 2008-01-01 12:00 12.34
5678 2008-01-01 12:01 12.34
5678 2008-01-01 12:02 12.34
5678 2008-01-01 12:03 23.45
5678 2008-01-01 12:04 23.45
I can't find any aggregate/windowing function that will allow me to do this (again, this is ONLY needed for SQL Server 2008 R2).
I was hoping to find an analytic aggregate function that does this for me, something like...
LAST_VALUE(price) OVER (PARTITION BY product_id ORDER BY timestamp)
But I don't seem to find any way to do a "cumulative latest non-null value" in the window (to bound the window to the preceding rows, rather than the entire partition)
Aside from creating a table-valued user defined function, is there any builtin that would accomplish this?
UPDATE:
Apparently, this functionality is available in the 'Denali' CTP, but not in SQL Server 2008 R2.
LAST_VALUE http://msdn.microsoft.com/en-us/library/hh231517%28v=SQL.110%29.aspx
I just expected it to be available in SQL Server 2008. It's available in Oracle (since 10gR2 at least), and I can do something similar in MySQL 5.1, using a local variable.
http://download.oracle.com/docs/cd/E14072_01/server.112/e10592/functions083.htm
You can try the following:
Updated:
-- Test Data
DECLARE @YourTable TABLE(Product INT, Timestamp DATETIME, Price NUMERIC(16,4))
INSERT INTO @YourTable
SELECT 5678, '20080101 12:00:00', 12.34
UNION ALL
SELECT 5678, '20080101 12:01:00', NULL
UNION ALL
SELECT 5678, '20080101 12:02:00', NULL
UNION ALL
SELECT 5678, '20080101 12:03:00', 23.45
UNION ALL
SELECT 5678, '20080101 12:04:00', NULL
;WITH CTE AS
(
    SELECT *
    FROM @YourTable
)
-- Query: for each row, look up the latest earlier row with a non-NULL price
SELECT A.Product, A.Timestamp, ISNULL(A.Price, B.Price) Price
FROM CTE A
OUTER APPLY ( SELECT TOP 1 *
              FROM CTE
              WHERE Product = A.Product AND Timestamp < A.Timestamp
              AND Price IS NOT NULL
              ORDER BY Product, Timestamp DESC) B
--Results
Product Timestamp Price
5678 2008-01-01 12:00:00.000 12.3400
5678 2008-01-01 12:01:00.000 12.3400
5678 2008-01-01 12:02:00.000 12.3400
5678 2008-01-01 12:03:00.000 23.4500
5678 2008-01-01 12:04:00.000 23.4500
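The lookup of the latest earlier non-NULL price can also be expressed as a correlated scalar subquery, which is easy to try in a scratch database. The sketch below uses Python's sqlite3 module instead of SQL Server, so `COALESCE` and `LIMIT 1` stand in for `ISNULL` and `TOP 1`; the table name `prices` and the column name `ts` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product INTEGER, ts TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (5678, "2008-01-01 12:00", 12.34),
        (5678, "2008-01-01 12:01", None),
        (5678, "2008-01-01 12:02", None),
        (5678, "2008-01-01 12:03", 23.45),
        (5678, "2008-01-01 12:04", None),
    ],
)

# For a NULL price, fall back to the most recent earlier non-NULL price
# of the same product.
fill_sql = """
SELECT a.product, a.ts,
       COALESCE(a.price,
                (SELECT b.price
                 FROM prices b
                 WHERE b.product = a.product
                   AND b.ts < a.ts
                   AND b.price IS NOT NULL
                 ORDER BY b.ts DESC
                 LIMIT 1)) AS price
FROM prices a
ORDER BY a.product, a.ts
"""
filled = conn.execute(fill_sql).fetchall()
for row in filled:
    print(row)
```

A leading NULL (one with no earlier non-NULL price for the product) stays NULL under this approach, the same as with the OUTER APPLY query.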
I have a table containing the following data. I want to replace every null in the salary column with the previous non-null value.
Table:
id   name   salary
1    A      4000
2    B
3    C
4    C
5    D      2000
6    E
7    E
8    F      1000
9    G      2000
10   G      3000
11   G      5000
12   G
Here is the query that works for me (FIRST_VALUE and a running SUM ... OVER (ORDER BY ...) require SQL Server 2012 or later):
select a.*,
       first_value(a.salary) over (partition by a.value order by a.id) as abc
from (
    select *,
           sum(case when salary is null then 0 else 1 end) over (order by id) as value
    from test
) a
Output:
id   name   salary   value   abc
1    A      4000     1       4000
2    B               1       4000
3    C               1       4000
4    C               1       4000
5    D      2000     2       2000
6    E               2       2000
7    E               2       2000
8    F      1000     3       1000
9    G      2000     4       2000
10   G      3000     5       3000
11   G      5000     6       5000
12   G               6       5000
Try this:
;WITH SortedData AS
(
    SELECT
        ProductID, TimeStamp, Price,
        ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY TimeStamp DESC) AS RowNum
    FROM dbo.YourTable
)
UPDATE SortedData
SET Price = (SELECT TOP 1 sd2.Price
             FROM SortedData sd2
             WHERE sd2.ProductID = SortedData.ProductID -- stay within the same product
               AND sd2.RowNum > SortedData.RowNum
               AND sd2.Price IS NOT NULL
             ORDER BY sd2.RowNum) -- nearest older row first
WHERE
    SortedData.Price IS NULL
Basically, the CTE numbers each product's rows by timestamp in descending order, newest first. For every row with a NULL price, the nearest older row of the same product that has a non-NULL price is looked up, and its value is used to update the NULL price.
