Extract the last two elements of an array in Hive

I have an array column in a Hive table, and I want to extract the last two elements of each array, something like this:
["a", "b", "c"] -> ["b", "c"]
I tried a code like this:
SELECT
*,
array[size] AS term_n,
array[size - 1] AS term_n_1
FROM
(SELECT *, size(array) AS size FROM MyTable);
But it didn't work. Does anyone have an idea, please?

array is a reserved word and should be qualified with backticks.
An inner sub-query must be aliased.
Array indexes start at 0: if the array size is 5, then the last index is 4.
Demo
with MyTable as (select array('A','B','C','D','E') as `array`)
SELECT *
,`array`[size - 1] AS term_n
,`array`[size - 2] AS term_n_1
FROM (SELECT *
,size(`array`) AS size
FROM MyTable
) t
;
+-----------------------+--------+--------+----------+
| t.array               | t.size | term_n | term_n_1 |
+-----------------------+--------+--------+----------+
| ["A","B","C","D","E"] | 5      | E      | D        |
+-----------------------+--------+--------+----------+

I don't know the error that you are getting, but note that array indexes are zero-based, so it should be something like
select
  yourarray[size(yourarray) - 1],
  yourarray[size(yourarray) - 2]
from mytable

This is a solution to extract the last element of an array within the same query (note that it is not very efficient; you can apply the same principle to extract the last n elements of the array). The logic is to calculate the size of the last element plus its separator character, and then take a substring from position 0 to the total length minus that calculated amount of characters.
Table of example:
col1 | col2
-----+----------------
row1 | aaa-bbb-ccc-ddd
You want to get (removing the last element, in this case "-ddd"):
row1 | aaa-bbb-ccc
The query you may need:
select col1,
       substr(col2, 0, length(col2) - (length(reverse(split(reverse(col2), '-')[0])) + 1)) as shorted_col2_1element
from example_table
If you want to drop more elements, you have to keep adding the lengths of the following positions to the subtracted amount.
Example to extract the last 2 elements:
select col1,
       substr(col2, 0, length(col2) - (length(reverse(split(reverse(col2), '-')[0])) + 1
                                     + length(reverse(split(reverse(col2), '-')[1])) + 1)) as shorted_col2_2element
from example_table
After executing this second query you will get:
row1 | aaa-bbb
*As said previously, this is not an optimal solution at all, but it may help you.
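As a sanity check on the length arithmetic, here is a small Python sketch (the helper name is mine, not from the answer) that mirrors the reverse/split/substr logic for dropping the last n delimited elements:

```python
def drop_last_elements(col2, n, sep='-'):
    # Mirror of reverse(split(reverse(col2), sep)[i]): splitting the reversed
    # string gives the elements counted from the end.
    rev_parts = col2[::-1].split(sep)
    # Each dropped element costs its own length plus one separator character.
    chars_to_drop = sum(len(p) + 1 for p in rev_parts[:n])
    # substr(col2, 0, length(col2) - chars_to_drop)
    return col2[:len(col2) - chars_to_drop]

print(drop_last_elements("aaa-bbb-ccc-ddd", 1))  # aaa-bbb-ccc
print(drop_last_elements("aaa-bbb-ccc-ddd", 2))  # aaa-bbb
```

The same element lengths the SQL computes character by character fall out of the reversed split here, which is why the two subtracted terms in the second query are each "length of element + 1".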

Related

Google Sheets SumIfs with left formula

I want to use the sumifs formula, but the sum interval range has text in it.
Example:
|Criteria|Sum Interval|
|--------|------------|
|   A    | 1 - Good   |
|   A    | 2 - Regular|
|   C    | 3 - Bad    |
So, I want to check the criteria field and, when met, sum the first character of the Sum Interval. I tried something like this:
= sumifs( arrayformula(left(suminterval, 1)) , criteria, 'A')
In this case, the formula should return 3 (1 + 2)
arrayformula(left(suminterval, 1)) = interval with only first character
This works when used alone, but when I use it as an argument, I receive a message saying that the argument must be a range.
PS: The whole solution has to be in a single formula.
try:
=INDEX(QUERY({A2:A, REGEXEXTRACT(B2:B, "\d+")*1}, "select sum(Col2) where Col1 = 'A'"), 2)
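The spreadsheet formula above boils down to: extract the leading digits from each Sum Interval cell, keep the rows whose Criteria matches, and sum. A quick Python sketch of the same logic, using the sample data from the question:

```python
import re

rows = [("A", "1 - Good"), ("A", "2 - Regular"), ("C", "3 - Bad")]

# REGEXEXTRACT(B2:B, "\d+")*1 pulls the leading number out of each cell;
# the QUERY then sums it where Col1 = 'A'.
total = sum(int(re.match(r"\d+", interval).group())
            for criteria, interval in rows
            if criteria == "A")
print(total)  # 3
```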

Alternative solutions to an array search in PostgreSQL

I am not sure if my database design is good for this tricky case, and I am also asking for help with what the query for this could look like.
I plan a query with the following table:
 search_array          | value | id
-----------------------+-------+----
 {XYa,YZb,WQb}         | b     | 1
 {XYa,YZb,WQb,RSc,QZa} | a     | 2
 {XYc,YZa}             | c     | 3
 {XYb}                 | a     | 4
 {RSa}                 | c     | 5
There are 5 main elements in the search_array: XY, YZ, WQ, RS, QZ, and 3 values: a, b, c, that are concatenated to each element.
Each row also has one value: a, b or c.
My aim is to find all rows that fit a specific row in this sense: first, it should be checked whether they share any main elements in their search_arrays (marked yellow in the example).
As an example:
Row id 4 and row id 5 wouldn't match because XY != RS.
Rows id 1, 2 and 3 would match twice because they all have XY and YZ.
Rows id 1 and 2 would even match three times because they also have WQ in common.
Second: if there is a main element match, it should be cross-checked whether the lowercase letters after the main elements fit the value of the other row.
As an example: the only match for row id 1 in the table would be row id 4, because they both search for XY and the lowercase letters after the elements match each other's values.
Another match would be rows id 2 and 5, with RS: search c to value c and search a to value a (marked green and orange).
My idea was to cut the search_array elements into two parts in the query with the RIGHT and LEFT string functions, but I don't know how to combine the subqueries for this search.
Or would a completely different solution be faster? For example, splitting the search_array into another table with the columns 'foreign key' to the main table, 'main element' and 'searched_value'. I am not sure this is the best solution, because the query would constantly have to go back to the main table to find two rows out of 3 million and compare their searched_values to the values.
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
SELECT t.id, t.value,
substr(u.val, 1, 2) AS arr_main,
substr(u.val, 3, 1) AS arr_val
FROM mytable AS t
CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
a.value AS first_value,
b.id AS second_id,
b.value AS second_value,
a.arr_main AS main_element
FROM unravel AS a
JOIN unravel AS b
ON a.arr_main = b.arr_main
AND a.arr_val = b.value
AND b.arr_val = a.value;
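To see what the join produces on the sample data, here is a small Python mirror of the same normalize-and-self-join logic (note that, like the SQL, it also reports a row matching itself when one of its own array entries fits its own value; add an id filter such as a.id <> b.id if you don't want that):

```python
rows = [
    (1, "b", ["XYa", "YZb", "WQb"]),
    (2, "a", ["XYa", "YZb", "WQb", "RSc", "QZa"]),
    (3, "c", ["XYc", "YZa"]),
    (4, "a", ["XYb"]),
    (5, "c", ["RSa"]),
]

# "unravel": one (id, value, main_element, element_value) tuple per array entry,
# like substr(val, 1, 2) / substr(val, 3, 1) in the query.
unravel = [(rid, val, e[:2], e[2:]) for rid, val, arr in rows for e in arr]

# Self-join on the same main element, cross-checking each side's
# element value against the other side's row value.
matches = [(a_id, a_main, b_id)
           for a_id, a_val, a_main, a_ev in unravel
           for b_id, b_val, b_main, b_ev in unravel
           if a_main == b_main and a_ev == b_val and b_ev == a_val]

print(matches)
```

On this data the cross-row matches are rows 1 and 4 on XY, and rows 2 and 5 on RS, exactly as described in the question.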

Insert array of JSONB objects from one table as multiple rows in second table

We are trying to migrate data from an array column containing JSONB to a proper Postgres table.
{{"a":1,"b": 2, "c":"bar"},{"a": 2, "b": 3, "c":"baz"}}
 a | b | c
---+---+-------
 1 | 2 | "bar"
 2 | 3 | "baz"
As part of the process, we have made several attempts using functions like unnest and array_to_json. In the unnest case, we get several JSONB rows, but cannot figure out how to insert them into the second table. In the array_to_json case, we are able to cast the array to a JSON string, but the json_to_recordset does not seem to accept the JSON string from a common table expression.
What would be a good strategy to 'mirror' the array of JSONB items as a proper table, so that we can run the query inside of a stored procedure, triggered on insert?
Use unnest() in a lateral join:
with my_data(json_column) as (
values (
array['{"a":1,"b":2,"c":"bar"}','{"a":2,"b":3,"c":"baz"}']::jsonb[])
)
select
value->>'a' as a,
value->>'b' as b,
value->>'c' as c
from my_data
cross join unnest(json_column) as value
a | b | c
---+---+-----
1 | 2 | bar
2 | 3 | baz
(2 rows)
You may need some casts or converts, e.g.:
select
(value->>'a')::int as a,
(value->>'b')::int as b,
(value->>'c')::text as c
from my_data
cross join unnest(json_column) as value
Lateral joining means that the function unnest() will be executed for each row from the main table. The function returns elements of the array as value.
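The same projection can be mirrored in Python for comparison: parse each JSONB element of the array and pull out the three fields as typed columns, the way the casted query does:

```python
import json

# One array of jsonb values, as in the my_data CTE
json_column = ['{"a":1,"b":2,"c":"bar"}', '{"a":2,"b":3,"c":"baz"}']

# unnest(json_column) yields one value per array element;
# value->>'x' extracts field x (casts applied where needed).
rows = [(d["a"], d["b"], d["c"]) for d in map(json.loads, json_column)]
print(rows)  # [(1, 2, 'bar'), (2, 3, 'baz')]
```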

PostgreSQL: retrieving multiple array elements

Let's say we have a query like:
SELECT regexp_split_to_array('foo,bar', ',');
Results:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| {foo,bar} |
+-----------------------+
(1 row)
To access a single element of an array we can use code like:
SELECT (regexp_split_to_array('foo,bar', ','))[1];
Which will return:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| foo |
+-----------------------+
(1 row)
Or use slices like:
SELECT (regexp_split_to_array('foo,bar', ','))[2:];
Result:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| {bar} |
+-----------------------+
(1 row)
However, when I try to access 2 elements at once, like:
SELECT (regexp_split_to_array('foo,bar', ','))[1,2];
or
SELECT (regexp_split_to_array('foo,bar', ','))[1][2];
or any other syntax, I receive an error:
ERROR: syntax error at or near ","
Is it possible to retrieve two different and not adjacent elements of an array in PostgreSQL?
Extracting multiple elements from an array in a SELECT can mean either having them returned as multiple columns, or having all those elements as part of a single array.
This returns one column, an array of the two elements.
knayak=# select ARRAY[arr[1],arr[2]] FROM regexp_split_to_array('foo,bar', ',') as arr;
array
-----------
{foo,bar}
(1 row)
...and this simply gives you the two elements as columns.
knayak=# select arr[1],arr[2] FROM regexp_split_to_array('foo,bar', ',') as arr;
arr | arr
-----+-----
foo | bar
(1 row)
The colon ':' in the array subscript does allow you to access multiple elements as a from-through range.
select (array[1,2,3,4,5])[2:4]
returns
{2,3,4}
This would work in your example above, but not if 1 and 2 weren't next to each other. If that's the case, the suggestion from @KaushikNayak is the only way I could think of.
Using your example:
SELECT (regexp_split_to_array('foo,bar', ','))[1:2]

postgres + json object to array

I would like to know if it is possible to 'convert' a JSON object to a JSON array in order to iterate over a mixed set of data.
I have two rows that look like
{Data:{BASE:{B1:0,color:green}}}
{Data:{BASE:[{B1:1,color:red},{B1:0,color:blue}]}}
I would like to extract the B1 value from all these rows, but I am a bit blocked :)
My first try was json_extract_array, but it fails on the 1st row (not an array).
My second try was json_array_length with a CASE, but that also fails on the 1st row (not an array).
Can I handle this situation in any way?
Basically I need to extract all the rows where B1 > 0 in one of the json array (or object) and maybe return the node that contains B1 > 0.
Your main problem is that you mix data types under the json -> 'Data' -> 'BASE' path, which cannot be handled easily. I could come up with a solution, but you should fix your schema, e.g. to only contain arrays at that path.
with v(j) as (
values (json '{"Data":{"BASE":{"B1":0,"color":"green"}}}'),
('{"Data":{"BASE":[{"B1":1,"color":"red"},{"B1":0,"color":"blue"}]}}')
)
select j, node
from v,
lateral (select j #> '{Data,BASE}') b(b),
lateral (select substring(trim(leading E'\x20\x09\x0A\x0D' from b::text) from 1 for 1)) l(l),
lateral (select case
when l = '{' and (b #>> '{B1}')::numeric > 0 then b
when l = '[' then (select e from json_array_elements(b) e where (e #>> '{B1}')::numeric > 0 limit 1)
else null
end) node(node)
where node is not null
To return the rows where at least one object has B1 > 0
select *
from t
where true in (
select (j ->> 'B1')::int > 0
from json_array_elements (json_column -> 'Data' -> 'BASE') s (j)
)
With a little help from the other answers here, I did this:
with v(j) as (
values (json '{"Data":{"BASE":{"B1":0,"color":"green"}}}'),
('{"Data":{"BASE":[{"B1":1,"color":"red"},{"B1":0,"color":"blue"}]}}')
)
select j->'Data'->'BASE'->>'B1' as "B1"
from v
where j->'Data'->'BASE'->>'B1' is not null
union all
select json_array_elements(j->'Data'->'BASE')->>'B1'
from v
where j->'Data'->'BASE'->>'B1' is null
I divided it into two queries: one that fetches the single value when there is only one, and one that unwraps the array when there are multiple, using the fact that PostgreSQL returns null if you ask for the text of something that is an array. Then I just unioned the results of the two queries, which gave:
------
| B1 |
------
| 0  |
| 1  |
| 0  |
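The object-or-array branching used in the answers above can be mirrored in Python: wrap a lone object in a one-element list so both row shapes iterate the same way, which is essentially what the union of the two queries achieves:

```python
import json

rows = [
    '{"Data":{"BASE":{"B1":0,"color":"green"}}}',
    '{"Data":{"BASE":[{"B1":1,"color":"red"},{"B1":0,"color":"blue"}]}}',
]

b1_values = []
for j in map(json.loads, rows):
    base = j["Data"]["BASE"]
    # Normalize: a single object becomes a one-element list,
    # so the array row and the object row are handled uniformly.
    items = base if isinstance(base, list) else [base]
    b1_values.extend(item["B1"] for item in items)

print(b1_values)  # [0, 1, 0]
```

Filtering for B1 > 0 afterwards then gives the nodes the original question asked for.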
