Extract numbers from a string and convert them into an array in Hive

I have a string column in my Hive table and I would like the output to look like this:
Column A: ddjj3332 jjn32212 334334
Column B: (3332, 32212, 334334)
I have tried using a regex replace on the string to remove the letters, but I am unable to convert the result into an array.
Let me know if this is possible.

You can use split to make an array after you perform the replace:
split(regexp_replace(column_a, '[^0-9 ]', ''), ' ')
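To check the logic outside Hive, the same two steps can be mirrored in Python. This is only a sketch: the helper name extract_numbers is made up for illustration, and Python's re module stands in for Hive's regexp_replace and split.

```python
import re

def extract_numbers(column_a):
    """Hypothetical helper mirroring split(regexp_replace(column_a, '[^0-9 ]', ''), ' ')."""
    # Step 1: drop every character that is not a digit or a space.
    digits_and_spaces = re.sub(r"[^0-9 ]", "", column_a)
    # Step 2: split on single spaces; the elements remain strings.
    return digits_and_spaces.split(" ")

result = extract_numbers("ddjj3332 jjn32212 334334")
print(result)
```

In this sketch the elements are strings, and a token with no digits (or two spaces in a row) leaves an empty string in the list, so the real data may need an extra filtering step.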

Related

Data Factory concatenate two arrays

I have two arrays (job_1 and job_2) which I need to concatenate in one array:
[
In DataFlow, I use the code:
union([job_1],[job_2])
I need to have everything in one array this way:
["1428526", "1425403","1425696","1425126","1424631","1381348"]
NOT THIS WAY
["1428526"]["1425403","1425696","1425126","1424631","1381348"]
Remove the quotes and square brackets from the values of both columns, then split on the comma:
split(replace(replace(replace(job_ids, '"', ''), ']', ''), '[', ''),',')
Add two columns together
new = job_1 + job_2
Convert new column to string
toString(new)
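The strip-split-concatenate idea can be tried out in plain Python before wiring it into the data flow. This is a sketch under an assumption: each column is taken to hold the array as a string such as '["1428526"]', and parse_ids is a hypothetical helper name, not a Data Factory function.

```python
def parse_ids(raw):
    """Hypothetical helper: strip quotes and brackets, then split on commas."""
    cleaned = raw.replace('"', "").replace("]", "").replace("[", "")
    return cleaned.split(",")

# Sample values taken from the question; the string form is an assumption.
job_1 = '["1428526"]'
job_2 = '["1425403","1425696","1425126","1424631","1381348"]'

# Concatenating the two parsed lists gives one flat array.
combined = parse_ids(job_1) + parse_ids(job_2)
print(combined)
```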

How to replace string in "select" statement

I need to add a comma after every six digits, but I don't know the string's length and I can't use loops.
Thanks in advance.
I've tried the DB2 REGEXP_REPLACE function, but it doesn't recognize my column as a string.
For example, I need to replace "123456123456" with "123456, 123456".
Try this:
select rtrim(xmlcast(xmlquery('fn:replace($s, "([0-9]{6})", "$1, ")' passing str as "s") as varchar(4000)), ', ')
from table(values ('123456123456')) t(str);
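The query combines two steps: append ", " after each run of six digits, then trim the separator left dangling at the end. A small Python sketch of the same logic follows; the function name add_commas is hypothetical, and re.sub and rstrip stand in for fn:replace and rtrim.

```python
import re

def add_commas(value):
    """Hypothetical helper: insert ', ' after every six digits, then trim the tail."""
    # Each complete group of six digits is followed by ', '.
    with_separators = re.sub(r"([0-9]{6})", r"\1, ", value)
    # Remove any trailing commas and spaces left after the last group.
    return with_separators.rstrip(", ")

print(add_commas("123456123456"))
```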

How can I convert array to string in hive sql?

I want to convert an array to a string in Hive. Specifically, I want the array produced by collect_set turned into a string without the [[""]] wrapping.
select actor, collect_set(date) as grpdate from actor_table group by actor;
so that [["2016-07-01", "2016-07-02"]] would become 2016-07-01, 2016-07-02
Use the concat_ws(string delimiter, array<string>) function to concatenate the array:
select actor, concat_ws(',',collect_set(date)) as grpdate from actor_table group by actor;
If the date field is not a string, cast it to string first:
concat_ws(',',collect_set(cast(date as string)))
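For intuition, here is a rough Python sketch of what the grouped query computes: collect the distinct dates per actor, then join them with a delimiter. The function name and the sample actor "actor_a" are made up for illustration, and the sketch keeps first-seen order, which should not be assumed of collect_set in Hive.

```python
def concat_ws_collect_set(rows, delimiter=","):
    """Hypothetical helper: per actor, join the distinct dates into one string."""
    grouped = {}
    for actor, date in rows:
        # Dict keys act as an insertion-ordered set of distinct values;
        # str() plays the role of cast(date as string).
        grouped.setdefault(actor, {})[str(date)] = None
    return {actor: delimiter.join(dates) for actor, dates in grouped.items()}

rows = [
    ("actor_a", "2016-07-01"),
    ("actor_a", "2016-07-02"),
    ("actor_a", "2016-07-01"),  # duplicate, collapsed like collect_set
]
grpdate = concat_ws_collect_set(rows)
print(grpdate)
```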
See also this answer for alternatives if you already have an array (of int) and do not want to explode it just to convert the element type to string: How to concatenate the elements of int array to string in Hive
Sometimes, you may need a JSON formatted list, so you can simply use:
SELECT CAST(COLLECT_SET(date) AS STRING) AS dates FROM actor_table
PS: I needed this but found only your question about array to string conversion.

SQL: Fix for CSV import mistake

I have a database with multiple columns populated with various numeric fields. While populating it from a CSV, I must have mucked up the assignment of the delimited fields. The end result is a column that contains its correct information but also the next column's data, separated by a comma.
So instead of Column UPC1 containing "958634", it contains "958634,95877456". The "95877456" is supposed to be in the UPC2 column, instead UPC2 is NULL.
Is there a way for me to split on the comma and send the data to UPC2 while keeping the UPC1 data before the comma intact?
Thanks.
You can do this with string functions. To query the values and verify the logic, try this:
SELECT
LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
FROM myTable;
If the result is what you want, turn it into an update:
UPDATE myTable SET
UPC1 = LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
UPC2 = SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000);
The expression for UPC1 takes the left side of UPC1 up to one character before the comma.
The expression for UPC2 takes the remainder of the UPC1 string starting one character after the comma.
The third argument to SUBSTRING needs some explaining. It's the number of characters you want to include, counting from the starting position (which in this case is one character after the comma's location). If you specify a value that's longer than the rest of the string, SUBSTRING will just return everything up to the end of the string. Using 1000 here is a lot easier than calculating the exact number of characters you need to get to the end.
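The two expressions can be sanity-checked with a small Python sketch. The helper name split_upc is hypothetical, and the branch for values without a comma is an added assumption: the queries above do not handle that case, so rows without a comma may need to be excluded before running the update.

```python
def split_upc(upc1, max_len=1000):
    """Hypothetical helper mirroring the LEFT / SUBSTRING expressions."""
    comma_at = upc1.find(",")
    if comma_at == -1:
        # Assumption: a value with no comma is left as-is, with no UPC2 part.
        return upc1, None
    # Everything before the comma, like LEFT(UPC1, CHARINDEX(',', UPC1) - 1).
    left = upc1[:comma_at]
    # Up to max_len characters after the comma, like SUBSTRING(..., 1000).
    right = upc1[comma_at + 1 : comma_at + 1 + max_len]
    return left, right

print(split_upc("958634,95877456"))
```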

Get element of character varying type by index or convert it to array

I have an SQL function that returns a character varying value. The output is something like this:
'TTFFFFNN'. I need to get these characters by index. How can I convert character varying to an array?
Use string_to_array() with NULL as delimiter (pg 9.1+):
SELECT string_to_array('TTFFFFNN'::text, NULL) AS arr;
Per documentation:
In string_to_array, if the delimiter parameter is NULL, each character
in the input string will become a separate element in the resulting array.
In older versions (pg 9.0-), the call with NULL returned NULL. (Fiddle.)
To get the 2nd position (example):
SELECT (string_to_array('TTFFFFNN'::text, NULL))[2] AS item2;
Alternatives
For single characters I would use substring() directly, as @a_horse commented:
SELECT substring('TTFFFFNN'::text, 2, 1) AS item2;
SQL Fiddle showing both.
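One thing to keep in mind when porting either approach is that both the array subscript and substring() count positions from 1. A short Python sketch of the two lookups, with a hypothetical helper char_at that converts the 1-based position to Python's 0-based indexing:

```python
def char_at(value, position):
    """Hypothetical helper: return the character at a 1-based position."""
    chars = list(value)          # one element per character
    return chars[position - 1]   # shift to Python's 0-based index

arr = list("TTFFFFNN")
item2 = char_at("TTFFFFNN", 2)   # array-style lookup of the 2nd position
item2_sub = "TTFFFFNN"[1:2]      # slice of length 1 starting at position 2
print(arr, item2, item2_sub)
```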
For strings with actual delimiters, I suggest split_part():
Split comma separated column data into additional columns
Only use regexp_split_to_array() if you must. Regular expression processing is more expensive.
In addition to the solutions outlined by Erwin and a horse, you can use regexp_split_to_array() with an empty regexp instead:
select regexp_split_to_array('TTFFFFNN'::text, '');
With an index, that becomes:
select (regexp_split_to_array('TTFFFFNN'::text, ''))[2];
