How to cast a string to an array of structs in HiveQL

I have a Hive table with a column "periode" of type string.
The column has values like the following:
[{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
[{periode:20161212-20161217,nb:0}]
I want to cast this column to array<struct<periode:string, nb:int>>.
The final goal is to have one row per periode.
For this I want to use LATERAL VIEW with explode on the column periode.
That's why I want to convert it to array<struct<periode:string, nb:int>>.
Thanks for help.
Sidi

You don't need to "cast" anything, you just need to explode the array and then unpack the struct. I added an index to your data to make it more clear where things are ending up.
Data:
idx arr_of_structs
0 [{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
1 [{periode:20161212-20161217,nb:0}]
Query:
SELECT idx -- index
, my_struct.periode AS periode -- unpacks periode
, my_struct.nb AS nb -- unpacks nb
FROM database.table
LATERAL VIEW EXPLODE(arr_of_structs) exptbl AS my_struct
Output:
idx periode nb
0 20160118-20160205 1
0 20161130-20161130 1
0 20161130-20161221 1
1 20161212-20161217 0
It's a bit unclear from your question what the desired result is, but as soon as you update it I'll modify the query accordingly.
EDIT:
The above solution is incorrect: it only works if the column is already an array of structs, and I didn't catch that your input is a STRING.
Query:
SELECT REGEXP_EXTRACT(tmp_arr[0], "([0-9]{8}-[0-9]{8})") AS periode
, REGEXP_EXTRACT(tmp_arr[1], ":([0-9]*)") AS nb
FROM (
SELECT idx
, pos
, COLLECT_SET(tmp_col) AS tmp_arr
FROM (
SELECT idx
, tmp_col
, CASE WHEN PMOD(pos, 2) = 0 THEN pos+1 ELSE pos END AS pos
FROM (
SELECT *
, ROW_NUMBER() OVER () AS idx
FROM database.table ) x
LATERAL VIEW POSEXPLODE(SPLIT(periode, ',')) exptbl AS pos, tmp_col ) y
GROUP BY idx, pos) z
Output:
periode nb
20160118-20160205 1
20161130-20161130 1
20161130-20161221 1
20161212-20161217 0
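The regex-based extraction above can be sanity-checked outside Hive. What follows is a minimal Python sketch, not HiveQL: the name parse_periodes is hypothetical, and it assumes every element has exactly the {periode:YYYYMMDD-YYYYMMDD,nb:N} shape shown in the question.

```python
import re

# One match per {periode:...,nb:...} element; assumes the format from the question.
ELEMENT = re.compile(r"\{periode:(\d{8}-\d{8}),nb:(\d+)\}")

def parse_periodes(raw):
    """Return a list of (periode, nb) tuples, one per struct-like element."""
    return [(periode, int(nb)) for periode, nb in ELEMENT.findall(raw)]

rows = parse_periodes(
    "[{periode:20160118-20160205,nb:1},{periode:20161212-20161217,nb:0}]"
)
for periode, nb in rows:
    print(periode, nb)
```

Each tuple corresponds to one output row of the query, so a mismatch between the two usually points to a value that does not follow the assumed format.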

What about using the split function? You should be able to do something like:
select nb, period from
(select split(periode, "-") as periods, nb from yourtable) t
LATERAL VIEW explode(periods) sss AS period;
I haven't tried it, but it should work :)
EDIT: the above should work if you have a column periode following the pattern date-date-date... and a separate column nb, but that doesn't seem to be the case here. The following query should work for you (it is verbose, but it works):
select period, nb from (
select
regexp_replace(split(split(tok1,",")[1],":")[1], "[\\]|}]", "") as nb,
split(split(split(tok1,",")[0],":")[1],"-") as periods
from
(select split(YOURSTRINGCOLUMN, "},") as s1 from YOURTABLE)
r1 LATERAL VIEW explode(s1) ss1 AS tok1
) r2 LATERAL VIEW explode(periods) ss2 AS period;

I realize this question is a year old, but I ran into the same issue and tackled it using the json_split UDF from Brickhouse.
SELECT EXPLODE(
json_split(
'[{"periode":"20160118-20160205","nb":1},{"periode":"20161130-20161130","nb":1},{"periode":"20161130-20161221","nb":1}]'
));
col
{"periode":"20160118-20160205","nb":1}
{"periode":"20161130-20161130","nb":1}
{"periode":"20161130-20161221","nb":1}
Sorry for the spaghetti code.
There's also a similar question here using JSON arrays instead of JSON strings. It's not the same case, but for anyone facing this kind of task it might be useful in a bigger context.

Related

Postgres: Need to select keywords as separate array values

Datatype:
id: int4
keywords: text
objectivable_id: int4
Postgres version: PostgreSQL 9.5.3
Business_objectives table:
id keywords objectivable_id
1 keyword1a,keyword1b,keyword1c 6
2 keyword2a 6
3 testing 5
Currently the query I'm using is :
select array(select b.keywords from business_objectives b where b.objectivable_id = 6)
It selects the keywords of matched objectivable_id as:
{"keyword1a,keyword1b,keyword1c","keyword2a"}
Over here I wanted the result to be :
{"keyword1a","keyword1b","keyword1c","keyword2a"}
I tried using "string_agg(text, delimiter)", but it just combines all the keywords into one single element of the array.
You can simply (and cheaply!) use:
SELECT string_to_array(string_agg(keywords, ','), ',')
FROM business_objectives
WHERE objectivable_id = 6;
Concatenate your comma-separated lists with string_agg(), and then convert the complete text to an array with string_to_array().
Something like this can also give you the expected result:
SELECT array_agg( j.keys )
FROM business_objectives b,
LATERAL ( SELECT k
FROM unnest ( string_to_array( b.keywords, ',' ) ) u( k )
) j( keys )
WHERE b.objectivable_id = 6;
array_agg
-------------------------------------------
{keyword1a,keyword1b,keyword1c,keyword2a}
(1 row)
With the LATERAL part, the subquery can refer to columns of the outer query. Here it simply splits your keywords into a set of rows, which you can then feed into the array_agg() function.
See more about LATERAL: https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-LATERAL
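Both answers come down to the same two steps: join the per-row lists into one comma-separated string, then split that string on commas. A minimal Python sketch of that logic (the function name flatten_keywords is hypothetical, and the input list stands in for the rows matching objectivable_id = 6):

```python
def flatten_keywords(rows):
    """Mimic string_to_array(string_agg(keywords, ','), ',') on a list of strings."""
    joined = ",".join(rows)   # string_agg(keywords, ',')
    return joined.split(",")  # string_to_array(..., ',')

matching_rows = ["keyword1a,keyword1b,keyword1c", "keyword2a"]
print(flatten_keywords(matching_rows))
```

One difference to keep in mind: with no matching rows this sketch returns a list holding one empty string, whereas the SQL version yields NULL.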

Slicing a word into rows - TERADATA

I want to slice a word eg: SMILE into :
S
M
I
L
E
I did it like this
SEL SUBSTR(EMP_NAME,1,1) FROM etlt5.employe where EMP_ID='28008'
UNION ALL
SEL SUBSTR(EMP_NAME,2,1) FROM etlt5.employe where EMP_ID='28008'
UNION ALL
SEL SUBSTR(EMP_NAME,3,1) FROM etlt5.employe where EMP_ID='28008'
I also tried a recursive query but got no final result. Is there a better way of doing this? This looks too hardcoded.
You could use STRTOK_SPLIT_TO_TABLE to do this. STRTOK_SPLIT_TO_TABLE splits a field by a delimiter and then takes each token (the text between delimiters) and puts it in its own record of a new derived table.
In your case you don't have a delimiter between the characters of "SMILE" so we can use some REGEXP_REPLACE magic to stick a comma between each letter, and then split that to a table:
WITH test (id, word) AS (SELECT 1, 'SMILE')
SELECT D.*
FROM TABLE (strtok_split_to_table(test.id, REGEXP_REPLACE(test.word, '([a-zA-Z])', ',\1'), ',')
RETURNS
( id integer
, rownum integer
, new_col varchar(100) character set unicode)
) as d
I've used this STRTOK_SPLIT_TO_TABLE(REGEXP_REPLACE()) before to split apart document numbers in order to determine a check digit, so it definitely has its uses.
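The delimiter trick is easy to check in isolation. Here is a minimal Python sketch of the same idea (split_letters is a hypothetical name): insert a comma before every letter, split on commas, and number the tokens. It assumes that empty tokens are skipped, as with the leading comma here.

```python
import re

def split_letters(word):
    """Return (rownum, letter) pairs, one per letter of the word."""
    # Same substitution as REGEXP_REPLACE(word, '([a-zA-Z])', ',\1').
    delimited = re.sub(r"([a-zA-Z])", r",\1", word)
    # Drop the empty token produced by the leading comma, then number from 1.
    tokens = [tok for tok in delimited.split(",") if tok]
    return list(enumerate(tokens, start=1))

for rownum, letter in split_letters("SMILE"):
    print(rownum, letter)
```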
May I ask why you want to do that?
You need a table with a sequence from 1 to the max length of EMP_NAME:
select SUBSTR(EMP_NAME,n,1)
FROM etlt5.employe CROSS JOIN number_table
where EMP_ID='28008'
and n <= CHARACTER_LENGTH(EMP_NAME) -- skip positions past the end of the name

How to replace a value in array

I have a data like below.
id col1[]
--- ------
1 {1,2,3}
2 {3,4,5}
My question is how to use replace function in arrays.
select array_replace(col1, 1, 100) where id = 1;
but it gives an error like:
function array_replace(integer[], integer, integer) does not exist
can anyone suggest how to use it?
Your statement (augmented with the missing FROM clause):
SELECT array_replace(col1, 1, 100) FROM tbl WHERE id = 1;
As commented by @mu, array_replace() was introduced with PostgreSQL 9.3. I see 3 options for older versions:
1. intarray
As long as ...
we are dealing with integer arrays.
elements are unique.
and the order of elements is irrelevant.
A simple and fast option would be to install the additional module intarray, which (among other things) provides operators to subtract and add elements (or whole arrays) from/to integer arrays:
SELECT CASE WHEN col1 && '{1}'::int[] THEN (col1 - 1) + 100 ELSE col1 END AS col1
FROM tbl WHERE id = 1;
2. Emulate with SQL functions
A (slower) drop-in replacement for array_replace() using polymorphic types, so it works for any base type:
CREATE OR REPLACE FUNCTION f_array_replace(anyarray, anyelement, anyelement)
RETURNS anyarray LANGUAGE SQL IMMUTABLE AS
'SELECT ARRAY (SELECT CASE WHEN x = $2 THEN $3 ELSE x END FROM unnest($1) x)';
Does not replace NULL values. Related:
Replace NULL values in an array in PostgreSQL
If you need to guarantee order of elements:
PostgreSQL unnest() with element number
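The element-wise CASE inside f_array_replace() can be mirrored in a few lines of Python, which makes the NULL behaviour easier to see. This is only a sketch of the logic, not a PostgreSQL function, and None stands in for NULL:

```python
def array_replace(values, old, new):
    """Replace every element equal to `old` with `new`, keeping element order."""
    # As in SQL, a NULL (None) never compares equal to anything, so None
    # elements are kept, and passing old=None replaces nothing.
    return [new if v is not None and old is not None and v == old else v
            for v in values]

print(array_replace([1, 2, 3], 1, 100))
print(array_replace([None, 1], None, 0))
```

Unlike the SQL version, a Python list comprehension always preserves element order, so the ordering caveat above does not apply to the sketch.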
3. Apply patch to source and recompile
Get the patch "Add array_remove() and array_replace() functions" from the git repo, apply it to the source of your version and recompile. May or may not apply cleanly. The older your version the worse are your chances. I have not tried that, I would rather upgrade to the current version.
You can create your own based on this source:
CREATE TABLE arr(id int, col1 int[]);
INSERT INTO arr VALUES (1, '{1,2,3}');
INSERT INTO arr VALUES (2, '{3,4,5}');
SELECT array(
SELECT CASE WHEN q = 1 THEN 100 ELSE q END
FROM UNNEST(col1::int[]) q)
FROM arr;
array
-----------
{100,2,3}
{3,4,5}
You can create your own function and put it in your public schema if you still want to call by function though it will be slightly different than the original.
UPDATE tbl SET col1 = array_replace(col1, 1, 100) WHERE id = 1;
Here is the sample query for a test-array:
SELECT test_id,
test_array,
(array (
-- Replace existing value 'int' of an array with given value 'Text'
SELECT CASE WHEN a = '0' THEN 'MyEntry'
WHEN a = '1' THEN 'Apple'
WHEN a = '2' THEN 'Banana'
WHEN a = '3' THEN 'ChErRiEs'
WHEN a = '4' THEN 'Dragon Fruit'
WHEN a = '5' THEN 'Eat a Fruit in a Day'
ELSE 'NONE' END
FROM UNNEST(test_array::TEXT[]) a) ::TEXT
-- UNNEST : Lists out values of my_test_array
) test_result
FROM (
--my_test_array
SELECT 1 test_id, '{0,1,2,3,4,5,6,7,8,9}'::TEXT[][] test_array
) test;

SubString of value with several 0s

The data in the table looks like this
ID Value
1 5006049
2 5006050
How do I select a substring so that I get
R6049
R6050
Keeping in mind that the values are sequential starting from
5000001 = R1
to
5999999 = R999999
Just subtract:
SELECT 'R' + CAST(VALUE - 5000000 as VARCHAR(6))
FROM table
I think it's as easy as this:
Select 'R'+Substring(convert(VARCHAR(7), Value), 4,7)
Which will give R0001 for 5000001 (do you want the zeros?)
If you don't want the zeros / only looking to remove the top digit:
Select 'R'+ convert(VARCHAR(6),Value - 5000000)
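The two approaches agree on the sample rows but not on every value in the stated range. A small Python sketch (both function names are hypothetical) makes the difference visible, assuming Value is an integer between 5000001 and 5999999:

```python
def by_subtraction(value):
    # 'R' + CAST(Value - 5000000 AS VARCHAR(6))
    return "R" + str(value - 5000000)

def by_substring(value):
    # 'R' + SUBSTRING(text, 4, 7): start at the 4th character, take up to 7.
    text = str(value)
    return "R" + text[3:3 + 7]

for value in (5006049, 5000001, 5123456):
    print(value, by_subtraction(value), by_substring(value))
```

The substring version keeps leading zeros and drops the second and third digits, so it only matches the subtraction when those two digits are zero and the fourth digit is not.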

SQL Server: sort a column numerically if possible, otherwise alpha

I am working with a table that comes from an external source and cannot be "cleaned". There is a column which is an nvarchar(20) and contains an integer about 95% of the time, but occasionally contains an alphabetic value. I want to use something like
select * from sch.tbl order by cast(shouldBeANumber as integer)
but this throws an error on the odd "3A" or "D" or "SUPERCEDED" value.
Is there a way to say "sort it like a number if you can, otherwise just sort by string"? I know there is some sloppiness in that statement, but that is basically what I want.
Let's say, for example, the values were
7,1,5A,SUPERCEDED,2,5,SECTION
I would be happy if these were sorted in any of the following ways (because I really only need to work with the numeric ones)
1,2,5,7,5A,SECTION,SUPERCEDED
1,2,5,5A,7,SECTION,SUPERCEDED
SECTION,SUPERCEDED,1,2,5,5A,7
5A,SECTION,SUPERCEDED,1,2,5,7
"I really only need to work with the numeric ones"
This will give you only the numeric ones, sorted properly:
SELECT
*
FROM YourTable
WHERE ISNUMERIC(YourColumn)=1
ORDER BY CAST(YourColumn AS integer)
select
*
from
sch.tbl
order by
case isnumeric(shouldBeANumber)
when 1 then cast(shouldBeANumber as integer)
else 0
end
Provided that your numbers are not more than 100 characters long:
WITH chars AS
(
SELECT 1 AS c
UNION ALL
SELECT c + 1
FROM chars
WHERE c <= 99
),
rows AS
(
SELECT '1,2,5,7,5A,SECTION,SUPERCEDED' AS mynum
UNION ALL
SELECT '1,2,5,5A,7,SECTION,SUPERCEDED'
UNION ALL
SELECT 'SECTION,SUPERCEDED,1,2,5,5A,7'
UNION ALL
SELECT '5A,SECTION,SUPERCEDED,1,2,5,7'
)
SELECT rows.*
FROM rows
ORDER BY
(
SELECT SUBSTRING(mynum, c, 1) AS [text()]
FROM chars
WHERE SUBSTRING(mynum, c, 1) BETWEEN '0' AND '9'
FOR XML PATH('')
) DESC
SELECT
(CASE ISNUMERIC(shouldBeANumber)
WHEN 1 THEN
RIGHT(CONCAT('00000000',shouldBeANumber), 8)
ELSE
shouldBeANumber END) AS stringSortSafeAlpha
FROM sch.tbl
ORDER BY
stringSortSafeAlpha
This will add leading zeros to all shouldBeANumber values that truly are numbers and leave all remaining values alone. This way, when you sort, you can use an alpha sort but still get the correct values (with an alpha sort, "100" would be less than "50", but if you change "50" to "050", it works fine). Note, for this example, I added 8 leading zeros, but you only need enough leading zeros to cover the largest possible integer in your column.
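The padding idea can be tried out quickly in Python. This is a sketch of the same sort key, not T-SQL: str.isdigit() stands in for ISNUMERIC (which accepts more inputs, such as signs and decimal points), and sort_key is a hypothetical name.

```python
def sort_key(value, width=8):
    """Zero-pad digit-only values to `width`; leave everything else as-is."""
    if value.isdigit():
        # Same effect as RIGHT(CONCAT('00000000', value), 8).
        return ("0" * width + value)[-width:]
    return value

values = ["7", "1", "5A", "SUPERCEDED", "2", "5", "SECTION", "100", "50"]
print(sorted(values, key=sort_key))
```

Python compares strings by code point, so the position of the non-numeric values relative to the numeric ones may differ from what a given SQL Server collation produces; the ordering among the numeric values is the part that carries over.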

Resources