Max value of collect_list(column) in Hive - database

I am using the command below in Hive and getting the correct result.
select acct_id,collect_list(expr_dt) from experiences
> group by acct_id;
Output:
900 ["2015-03-31"]
707 ["2015-03-31","2014-12-10"]
903 ["2015-03-31"]
-435 ["2015-03-31"]
718 ["2015-03-31","2014-06-03"]
I want to get the max date for each account.
When I try to execute the query below, I get an error.
select acct_id,max(collect_list(expr_dt)) from experiences
> group by acct_id;
The error is:
SemanticException [Error 10128]: Line 1:19 Not yet supported place for UDAF 'collect_list'
I want to do the whole operation in a single query.

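You can use max directly, without collect_list, if your goal is only to find the maximum expr_dt for each acct_id group. The error occurs because Hive does not allow one aggregate function (UDAF) to be nested inside another, as in max(collect_list(...)).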
input:
hive> select * from experiences;
OK
900 2015-03-31
707 2015-03-31
707 2014-12-10
903 2015-03-31
-435 2015-03-31
718 2015-03-31
718 2014-06-03
query:
hive> select acct_id,max(expr_dt) from experiences group by acct_id;
output:
Total MapReduce CPU Time Spent: 4 seconds 30 msec
OK
-435 2015-03-31
707 2015-03-31
718 2015-03-31
900 2015-03-31
903 2015-03-31
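The same GROUP BY query can be tried outside Hive as a quick check. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example is runnable), and it assumes expr_dt is stored as a yyyy-MM-dd string, so that the string maximum is also the latest date.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiences (acct_id INTEGER, expr_dt TEXT)")
conn.executemany(
    "INSERT INTO experiences VALUES (?, ?)",
    [
        (900, "2015-03-31"),
        (707, "2015-03-31"),
        (707, "2014-12-10"),
        (903, "2015-03-31"),
        (-435, "2015-03-31"),
        (718, "2015-03-31"),
        (718, "2014-06-03"),
    ],
)

# One aggregate per group: no collect_list is needed to find the latest date.
rows = conn.execute(
    "SELECT acct_id, MAX(expr_dt) FROM experiences "
    "GROUP BY acct_id ORDER BY acct_id"
).fetchall()

for acct_id, max_dt in rows:
    print(acct_id, max_dt)
```

Because the dates are zero-padded ISO strings, comparing them as text gives the same order as comparing them as dates.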

Related

Google Data Studio: Compare daily sales to 7-day average

I have a data source with daily sales per product.
I want to create a field that calculates the average daily sales over the previous 7 days, for each product and day (e.g. on day 10 for product A, it gives the average sales for product A on days 3-9; on day 15 for product B, the average sales of B on days 8-14).
Is this possible?
Example data (I have the first 3 columns and need to generate the fourth):
Date Product Sales 7-Day Average
1/11 A 983 201
2/11 A 650 983
3/11 A 328 817
4/11 A 728 654
5/11 A 246 672
6/11 A 613 587
7/11 A 575 591
8/11 A 601 589
9/11 A 462 534
10/11 A 979 508
11/11 A 148 601
12/11 A 238 518
13/11 A 53 517
14/11 A 500 437
15/11 A 684 426
16/11 A 261 438
17/11 A 69 409
18/11 A 159 279
19/11 A 964 281
20/11 A 429 384
21/11 A 731 438
1/11 B 790 471
2/11 B 265 486
3/11 B 94 487
4/11 B 66 490
5/11 B 124 477
6/11 B 555 357
7/11 B 190 375
8/11 B 232 298
9/11 B 747 218
10/11 B 557 287
11/11 B 432 353
12/11 B 526 405
13/11 B 690 463
14/11 B 350 482
15/11 B 512 505
16/11 B 273 545
17/11 B 679 477
18/11 B 164 495
19/11 B 799 456
20/11 B 749 495
21/11 B 391 504
I haven't really tried anything; I couldn't figure out how to get started with this.
This may not be a perfect solution, but it gives your expected result in a crude way.
First, cross-join the data source with itself, as shown in the screenshot.
Then use this calculated field, summed for each Table 1 date and product, to get the average over the previous 7 days:
(CASE WHEN Date (Table 2) BETWEEN DATETIME_SUB(Date (Table 1), INTERVAL 7 DAY) AND DATETIME_SUB(Date (Table 1), INTERVAL 1 DAY) THEN Sales (Table 2) ELSE 0 END)/7
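To check the numbers outside Data Studio, the windowing logic can be sketched in plain Python. This is only an illustration under assumptions: the function name trailing_average is hypothetical, the year 2021 is an arbitrary placeholder because the question gives only day/month, and the sketch divides by the number of prior days actually present in the window (which matches the sample rows from day 2 onward) rather than always by 7 as the formula above does.

```python
from datetime import date, timedelta


def trailing_average(rows, window=7):
    """Average sales over the `window` days before each row, per product.

    rows: list of (day, product, sales) tuples.
    Returns {(day, product): average, or None when no prior day exists}.
    """
    sales = {(d, p): s for d, p, s in rows}
    result = {}
    for d, p, _ in rows:
        # Collect sales for the same product on each of the previous days.
        prior = [
            sales[(d - timedelta(days=k), p)]
            for k in range(1, window + 1)
            if (d - timedelta(days=k), p) in sales
        ]
        result[(d, p)] = sum(prior) / len(prior) if prior else None
    return result


# First eight rows of product A from the question (year is a placeholder).
rows = [
    (date(2021, 11, i + 1), "A", s)
    for i, s in enumerate([983, 650, 328, 728, 246, 613, 575, 601])
]
averages = trailing_average(rows)
print(round(averages[(date(2021, 11, 8), "A")]))
```

The current day is excluded from its own window, which is what the BETWEEN ... INTERVAL 7 DAY AND ... INTERVAL 1 DAY condition expresses.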

SQL Get minimum hour from multiple datetimes registers

I need to display the INFO column for the row with the earliest time on each date in the REGISTRATION column, one row per LOG.
Log CAT INFO REGISTRATION
10 1 551203 2018-06-04 08:47:54.000
10 1 551549 2018-06-05 08:59:02.000
579 1 551675 2018-06-05 10:13:36.000
579 1 553681 2018-06-05 11:31:44.000
579 1 551707 2018-06-05 12:57:33.000
579 1 551364 2018-06-04 10:16:04.000
579 1 551378 2018-06-04 10:39:01.000
579 1 551379 2018-06-04 10:40:22.000
579 1 551406 2018-06-04 15:47:52.000
580 1 550922 2018-06-04 11:21:01.000
580 1 551001 2018-06-04 12:43:22.000
580 1 553321 2018-06-04 15:37:52.000
I want exactly this, where each INFO is the one with the earliest time on each date, for each LOG:
INFO
551203 -->(2018-06-04 08:47:54.000)
551675 -->(2018-06-05 10:13:36.000)
551364 -->(2018-06-04 10:16:04.000)
550922 -->(2018-06-04 11:21:01.000)
thanks!!
Assuming that info values increase with the registration time within each log and date, I believe this is what you're looking for:
select min(info) as info, min(registration) as registration
from log
group by log, cast(registration as date);
Or just use row_number() to avoid making that assumption:
with data as (
    select *,
        row_number() over
            (partition by log, cast(registration as date) order by registration) as rn
    from log
)
select * from data where rn = 1;
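The "first row per log and date" logic behind the row_number() query can be sketched in plain Python. This is an illustrative sketch, not the answerer's code: the function name first_per_log_and_day is hypothetical and only part of the sample data is used. It returns one row for every log and date, so log 10 on 2018-06-05 appears as well.

```python
from datetime import datetime


def first_per_log_and_day(rows):
    """Keep the row with the earliest registration for each (log, date)."""
    earliest = {}
    for row in rows:
        log, _cat, _info, registration = row
        key = (log, registration.date())
        if key not in earliest or registration < earliest[key][3]:
            earliest[key] = row
    # Sort by log, then by timestamp, for a stable display order.
    return sorted(earliest.values(), key=lambda r: (r[0], r[3]))


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


rows = [
    (10, 1, 551203, parse("2018-06-04 08:47:54.000")),
    (10, 1, 551549, parse("2018-06-05 08:59:02.000")),
    (579, 1, 551675, parse("2018-06-05 10:13:36.000")),
    (579, 1, 553681, parse("2018-06-05 11:31:44.000")),
    (579, 1, 551364, parse("2018-06-04 10:16:04.000")),
    (579, 1, 551378, parse("2018-06-04 10:39:01.000")),
    (580, 1, 550922, parse("2018-06-04 11:21:01.000")),
    (580, 1, 551001, parse("2018-06-04 12:43:22.000")),
]
infos = [row[2] for row in first_per_log_and_day(rows)]
print(infos)
```

Unlike the min(info) version, this does not rely on info values increasing with time.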

Sql command: How to get rows from specified value?

Please advise on the SQL command to use.
I have a table with 3 columns: Date, Quantity, Price.
The table has about a thousand rows.
I know the exact number of rows (for example, only 5) that I want to pick from this table (see below).
I want to collect data starting from "06.02.2013" (if this date is not in the table, take the nearest date after it, which would be 11.02.2013),
and collect 5 rows from that date onward (see the result below).
table_Prices:
Date Qty Price
-----------------------
01.02.2013 24 1025
06.02.2013 26 1150
11.02.2013 47 2014
16.02.2013 5 1025
21.02.2013 7 1023
26.02.2013 8 1025
03.03.2013 95 1203
08.03.2013 63 1203
13.03.2013 25 2012
18.03.2013 48 1032
23.03.2013 105 1253
28.03.2013 48 1452
Desired result:
06.02.2013 26 1150
11.02.2013 47 2014
16.02.2013 5 1025
21.02.2013 7 1023
26.02.2013 8 1025
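-- Use >= so that the start date itself is included, as in the desired result.
-- The 'yyyymmdd' literal avoids day/month ambiguity.
select top 5 *
from table_Prices
where Date >= cast('20130206' as datetime)
order by Date asc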
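The filter, sort and row limit can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no TOP, so LIMIT is used instead; dates are stored as ISO yyyy-mm-dd strings so that text order equals date order; the helper name first_rows_from is hypothetical. It uses >= so that the start date itself is returned, as in the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table_Prices ("Date" TEXT, Qty INTEGER, Price INTEGER)')
conn.executemany(
    "INSERT INTO table_Prices VALUES (?, ?, ?)",
    [
        ("2013-02-01", 24, 1025),
        ("2013-02-06", 26, 1150),
        ("2013-02-11", 47, 2014),
        ("2013-02-16", 5, 1025),
        ("2013-02-21", 7, 1023),
        ("2013-02-26", 8, 1025),
        ("2013-03-03", 95, 1203),
    ],
)


def first_rows_from(start_date, count=5):
    """Return the first `count` rows dated on or after `start_date`."""
    query = (
        'SELECT "Date", Qty, Price FROM table_Prices '
        'WHERE "Date" >= ? ORDER BY "Date" LIMIT ?'
    )
    return conn.execute(query, (start_date, count)).fetchall()


rows = first_rows_from("2013-02-06")
for row in rows:
    print(row)
```

If the start date is missing from the table, the same query begins at the next available date; for example, first_rows_from("2013-02-07") starts at 2013-02-11.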

Predicted Values From Forecast functions

My question is very simple.
library(fpp)
ts <- ausbeer # seasonal with period 4
f.seasonal <-snaive(ts, h = 20)
I would like to see what the beer production is in the third quarter of 2010. I can do
f.seasonal$mean
It returns the table:
     Qtr1 Qtr2 Qtr3 Qtr4
2008                 473
2009  420  390  410  473
2010  420  390  410  473
2011  420  390  410  473
2012  420  390  410  473
2013  420  390  410
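Obviously, I can see the answer in the table. Is there a snippet of code to retrieve a predicted value more easily from forecast objects?
Since f.seasonal$mean is a time series, you can use window() with the same start and end to extract a single quarter:
fc <- window(f.seasonal$mean, start=c(2010,3), end=c(2010,3))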

Aligning Data in SQL

I am using Sybase SQL.
I have two tables.
Table A:
Column1_A:
100
501
504
810
810
950
955
955
Table B:
Column1_B:
100
250
503
810
807
949
950
955
955
I want to achieve the following:
Column1_A Column1_B
100 NULL
501 250
504 503
810 503
810 503
950 949
955 950
955 950
So, basically, I want to align Column1_B from Table B with Column1_A from Table A so that each row shows the largest Column1_B value that is less than Column1_A. It should give NULL if there is no such element in Table B.
The values in Column1_A and Column1_B are for illustration only. The real values are like 1000, 1500, 2504, and the values in Column1_B are not necessarily Column1_A - 1.
Edit:
I modified the data so that the logic can be generalized. I am using Sybase SQL.
Sorry, but it's not clear to me what you want to obtain. The final result that you presented could be obtained by:
SELECT Column1_A, Column1_B FROM A
LEFT JOIN B ON Column1_A = Column1_B -1
Edit.
You might try a correlated subquery then:
SELECT Column1_A, (SELECT MAX(Column1_B) FROM B WHERE Column1_B < A.Column1_A) AS Column1_B FROM A
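The rule "largest Column1_B strictly less than Column1_A, else NULL" can be sketched in Python with a binary search over the sorted B values. This is an illustration only; the function name largest_below is hypothetical. With the sample data as posted, the rows for 810 come out as 807 rather than the 503 shown in the question, because 807 is present in Table B and is less than 810.

```python
from bisect import bisect_left


def largest_below(a_values, b_values):
    """Pair each A value with the largest B value strictly less than it."""
    b_sorted = sorted(b_values)
    aligned = []
    for a in a_values:
        # bisect_left returns how many B values are strictly less than a.
        i = bisect_left(b_sorted, a)
        aligned.append((a, b_sorted[i - 1] if i > 0 else None))
    return aligned


table_a = [100, 501, 504, 810, 810, 950, 955, 955]
table_b = [100, 250, 503, 810, 807, 949, 950, 955, 955]

aligned = largest_below(table_a, table_b)
for a, b in aligned:
    print(a, b)
```

None plays the role of NULL here: no value in Table B is below 100, so the first row has no match.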
