indexOf alike function in HIVEQL - arrays

I have the following query:
SELECT
col1,
case when array_contains(col1, "c") then "c exists" end as col2
FROM
(
SELECT
*
FROM
(
SELECT
array("a","b","c") AS col1
) q1
) q2;
I want to check if the element "c" appears just before the element "b" in the array. In JavaScript I could use indexOf(), so if there were something similar in HiveQL I would do something like case when col1.indexOf("b") = col1.indexOf("c") - 1.
I have read the documentation, and it seems that the functions dealing with arrays are minimal.
I wouldn't like to split the array and check with LAG or LEAD.
I have tried with field("c", concat_ws(',',col1)) but this seems not to work neither.

You can concatenate array and use like or rlike. Example:
SELECT concat_ws(',',col1) rlike 'c,b' as c_before_b_flag
FROM
(
SELECT
array("a","b","c") AS col1
) q1
Result:
false
rlike 'b,c' gives true

Related

Replacing elements in an array column in snowflake

I have sample data as follows;
team_id
mode
123
[1,2]
Here mode is an array.The goal is to replace the values in column mode by literal values, such as 1 stands for Ocean, and 2 stands for Air
Expected Output
team_id
mode
123
[Ocean,Air]
Present Approach
As an attempt, I tried to first flatten the data into multiple rows;
team_id
mode
123
1
123
2
Then we can define a new column assigning literal values to mode column using a case statement, followed by aggregating the values into an array to get desired output.
Can I get some help here to do the replacement directly in the array? Thanks in advance.
Using FLATTEN and ARRAY_AGG:
CREATE OR REPLACE TABLE tab(team_id INT, mode ARRAY) AS SELECT 123, [1,2];
SELECT TEAM_ID,
ARRAY_AGG(CASE f.value::TEXT
WHEN 1 THEN 'Ocean'
WHEN 2 THEN 'Air'
ELSE 'Unknown'
END) WITHIN GROUP(ORDER BY f.index) AS new_mode
FROM tab
,LATERAL FLATTEN(tab.mode) AS f
GROUP BY TEAM_ID;
Output:
TEAM_ID
NEW_MODE
123
[ "Ocean", "Air" ]
For an alternative solution with easy array manipulation. you could create a JS UDF:
create or replace function replace_vals_in_array(A variant)
returns variant
language javascript
as $$
dict = {1:'a', 2:'b', 3:'c', 4:'d'};
return A.map(x => dict[x]);
$$;
Then to update your table:
update arrs
set arr = replace_vals_in_array(arr);
Example setup:
create or replace temp table arrs as (
select 1 id, [1,2,3] arr
union all select 2, [2,4]
);
select *, replace_vals_in_array(arr)
from arrs;

Determine how similar two arrays are in PostgreSQL

I'm aware you can compare two arrays in PostgreSQL to see if the elements in one are contained in the elements of another like so,
SELECT ARRAY[1,2] <# ARRAY[1,2,3] --> true
Is there any way to get # of matches or say "if matches 2 of 3" ??
SELECT ARRAY[1,2] ?? ARRAY[1,2,3] --> 2/3 or 66.6666%
I'm open to interesting solutions.. I want to take an array and ultimately say it must match 2 of 3 elements from another array in an inline query.. or >= 66% or something of that nature.
Ideally like this..
SELECT * FROM SOMETABLE WHERE ARRAY[1,2] ?? ARRAY[1,2,3] >= 66.66666666666667
Thanks in advance.
From here Array functions
with match as
(select count(a) as match_ct from unnest(ARRAY[1,2] ) as a
join
(select * from unnest(ARRAY[2,1,3]) as b) t on a=t.b)
select
match_ct/total_ct::numeric from match,
(select count(*) as total_ct from unnest(ARRAY[1,2], ARRAY[2,1,3]) as t(a, b)) as total ;
?column?
------------------------
0.66666666666666666667
You could create a function for that:
CREATE FUNCTION array_similarity(anyarray, anyarray)
RETURNS double precision
LANGUAGE sql
IMMUTABLE STRICT AS
$$SELECT 100.0 * count(*) / cardinality($2)
FROM unnest($1) AS a1(e1)
WHERE ARRAY[e1] <# $2$$;

How to cast a string to array of struct in HiveQL

I have a hive table with the column "periode", the type of the column is string.
The column have values like the following:
[{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
[{periode:20161212-20161217,nb:0}]
I want to cast this column in array<struct<periode:string, nb:int>>.
The final goal is to have one raw by periode.
For this I want to use lateral view with explode on the column periode.
That's why I want to convert it to array<struct<string, int>>
Thanks for help.
Sidi
You don't need to "cast" anything, you just need to explode the array and then unpack the struct. I added an index to your data to make it more clear where things are ending up.
Data:
idx arr_of_structs
0 [{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
1 [{periode:20161212-20161217,nb:0}]
Query:
SELECT idx -- index
, my_struct.periode AS periode -- unpacks periode
, my_struct.nb AS nb -- unpacks nb
FROM database.table
LATERAL VIEW EXPLODE(arr_of_structs) exptbl AS my_struct
Output:
idx periode nb
0 20160118-20160205 1
0 20161130-20161130 1
0 20161130-20161221 1
1 20161212-20161217 0
It's a bit unclear from your question what the desired result is, but as soon as you update it I'll modify the query accordingly.
EDIT:
The above solution is incorrect, I didn't catch that your input is a STRING.
Query:
SELECT REGEXP_EXTRACT(tmp_arr[0], "([0-9]{8}-[0-9]{8})") AS periode
, REGEXP_EXTRACT(tmp_arr[1], ":([0-9]*)") AS nb
FROM (
SELECT idx
, pos
, COLLECT_SET(tmp_col) AS tmp_arr
FROM (
SELECT idx
, tmp_col
, CASE WHEN PMOD(pos, 2) = 0 THEN pos+1 ELSE pos END AS pos
FROM (
SELECT *
, ROW_NUMBER() OVER () AS idx
FROM database.table ) x
LATERAL VIEW POSEXPLODE(SPLIT(periode, ',')) exptbl AS pos, tmp_col ) y
GROUP BY idx, pos) z
Output:
periode nb
20160118-20160205 1
20161130-20161130 1
20161130-20161221 1
20161212-20161217 0
What about use the split function? you should be able to do something like
select nb, period from
(select split(periode, "-") as periods, nb from yourtable) t
LATERAL VIEW explode(periods) sss AS period;
I didnt tried but it should work :)
EDIT: the above should work if you have a column periodes following a pattern date-date-date.. and a column nb, but it looks like that it isn't the case here. The following query should work for you (verbose but work)
select period, nb from (
select
regexp_replace(split(split(tok1,",")[1],":")[1], "[\\]|}]", "") as nb,
split(split(split(tok1,",")[0],":")[1],"-") as periods
from
(select split(YOURSTRINGCOLUMN, "},") as s1 from YOURTABLE)
r1 LATERAL VIEW explode(s1) ss1 AS tok1
) r2 LATERAL VIEW explode(periods) ss2 AS period;
I realize this question is 1YO, but I ran into this same issue and tackled it by using the json_split brickhouse UDF.
SELECT EXPLODE(
json_split(
'[{"periode":"20160118-20160205","nb":1},{"periode":"20161130-20161130","nb":1},{"periode":"20161130-20161221","nb":1}]'
));
col
{"periode":"20160118-20160205","nb":1}
{"periode":"20161130-20161130","nb":1}
{"periode":"20161130-20161221","nb":1}
Sorry for the spaghetti code.
There's also a similar question here using JSON arrays instead of JSON strings. It's not the same case, but for anyone facing this kind of task it might be useful in a bigger context.

Postgres: Need to select keywords as separate array values

Datatype:
id: int4
keywords: text
objectivable_id: int4
Postgres version: PostgreSQL 9.5.3
Business_objectives table:
id keywords objectivable_id
1 keyword1a,keyword1b,keyword1c 6
2 keyword2a 6
3 testing 5
Currently the query I'm using is :
select array(select b.keywords from business_objectives b where b.objectivable_id = 6)
It selects the keywords of matched objectivable_id as:
{"keyword1a,keyword1b,keyword1c","keyword2a"}
Over here I wanted the result to be :
{"keyword1a","keyword1b","keyword1c","keyword2a"}
I tried using "string_agg(text, delimiter)", but it just combines all the keywords into one single pocket of an array.
You can simply (and cheaply!) use:
SELECT string_to_array(string_agg(keywords, ','), ',')
FROM business_objectives
WHERE objectivable_id = 6;
Concatenate your comma separate lists with string_agg(), and then convert the complete text to an array with string_to_array().
So something like this can give you expected result:
SELECT array_agg( j.keys )
FROM business_objectives b,
LATERAL ( SELECT k
FROM unnest ( string_to_array( b.keywords, ',' ) ) u( k )
) j( keys )
WHERE b.objectivable_id = 6;
array_agg
-------------------------------------------
{keyword1a,keyword1b,keyword1c,keyword2a}
(1 row)
With the LATERAL part, we look at the outer query to create a new view. Simply it does split of your keywords as set of rows which you can then feed into array_agg() function.
See more about LATERAL: https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-LATERAL

How to replace a value in array

I have a data like below.
id col1[]
--- ------
1 {1,2,3}
2 {3,4,5}
My question is how to use replace function in arrays.
select array_replace(col1, 1, 100) where id = 1;
but it gives an error like:
function array_replace(integer[], integer, integer) does not exist
can anyone suggest how to use it?
Your statement (augmented with the missing FROM clause):
SELECT array_replace(col1, 1, 100) FROM tbl WHERE id = 1;
As commented by #mu, array_replace() was introduced with pg 9.3. I see 3 options for older versions:
1. intarray
As long as ...
we are dealing with integer arrays.
elements are unique.
and the order of elements is irrelevant.
A simple and fast option would be to install the additional module intarray, which (among other things) provides operators to subtract and add elements (or whole arrays) from/to integer arrays:
SELECT CASE col1 && '{1}'::int[] THEN (col1 - 1) + 100 ELSE col1 END AS col1
FROM tbl WHERE id = 1;
2. Emulate with SQL functions
A (slower) drop-in replacement for array_replace() using polymorphic types, so it works for any base type:
CREATE OR REPLACE FUNCTION f_array_replace(anyarray, anyelement, anyelement)
RETURNS anyarray LANGUAGE SQL IMMUTABLE AS
'SELECT ARRAY (SELECT CASE WHEN x = $2 THEN $3 ELSE x END FROM unnest($1) x)';
Does not replace NULL values. Related:
Replace NULL values in an array in PostgreSQL
If you need to guarantee order of elements:
PostgreSQL unnest() with element number
3. Apply patch to source and recompile
Get the patch "Add array_remove() and array_replace() functions" from the git repo, apply it to the source of your version and recompile. May or may not apply cleanly. The older your version the worse are your chances. I have not tried that, I would rather upgrade to the current version.
You can create your own
based on this source :
CREATE TABLE arr(id int, col1 int[]);
INSERT INTO arr VALUES (1, '{1,2,3}');
INSERT INTO arr VALUES (2, '{3,4,5}');
SELECT array(
SELECT CASE WHEN q = 1 THEN 100 ELSE q END
FROM UNNEST(col1::int[]) q)
FROM arr;
array
-----------
{100,2,3}
{3,4,5}
You can create your own function and put it in your public schema if you still want to call by function though it will be slightly different than the original.
UPDATE tbl SET col1 = array_replace(col1, 1, 100) WHERE id = 1;
Here is the sample query for a test-array:
SELECT test_id,
test_array,
(array (
-- Replace existing value 'int' of an array with given value 'Text'
SELECT CASE WHEN a = '0' THEN 'MyEntry'
WHEN a = '1' THEN 'Apple'
WHEN a = '2' THEN 'Banana'
WHEN a = '3' THEN 'ChErRiEs'
WHEN a = '4' THEN 'Dragon Fruit'
WHEN a = '5' THEN 'Eat a Fruit in a Day'
ELSE 'NONE' END
FROM UNNEST(test_array::TEXT[]) a) ::TEXT
-- UNNEST : Lists out values of my_test_array
) test_result
FROM (
--my_test_array
SELECT 1 test_id, '{0,1,2,3,4,5,6,7,8,9}'::TEXT[][] test_array
) test;

Resources