How to convert array<string> to string in Hive

I want to convert an array<string> to string in Hive. The array data is as follows:
+-------------------------------------+
| NULL                                |
| ["Extension","Terms & Conditions"]  |
| ["Value (generic or item level)"]   |
+-------------------------------------+
I want to convert the array values to a string without the [""] wrapping, so that I get a result like:
+-------------------------------+
| NULL                          |
| Extension,Terms & Conditions  |
| Value (generic or item level) |
+-------------------------------+
The following query provides the result, but it returns NULL as an empty string:
select concat_ws(',', col_name) as col_name from table_stg;
I tried several references, like:
How can I convert array to string in hive sql?
Hive - How to cast array to string?
But I did not get the desired result from them. Is there any way to achieve it?

With reference to Vamsi's comment, I managed to get this working and thought I would answer here as well for the community's reference.
select case when col_name is NULL then NULL
            else concat_ws(',', col_name) end
from table_name;
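For reference, Hive's if() conditional can express the same logic in one line; a minimal sketch, assuming the same column and table names as above:

-- Equivalent to the CASE expression: keep NULL rows as NULL,
-- otherwise join the array elements with commas.
select if(col_name is NULL, NULL, concat_ws(',', col_name)) as col_name
from table_name;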

Related

Is it possible for a query to return the wildcard that matches a given string?

I have the following table in postgres:
Table "public.items"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------------------------------------
id | integer | | not null | nextval('items_id_seq'::regclass)
wildcard | character varying(255) | | not null |
The wildcard column contains wildcards of the form stackoverflow*.
This should match any word that begins with 'stackoverflow'.
How can I locate the records that contain a matching wildcard?
For example, given 'stackoverflow.com', I would like to return all wildcards matching it, using something like a reverse LIKE.
Store your wildcards with % instead of * and use like:
select *
from items
where 'stackoverflow.com' like wildcard
Or, if you must store *, do the same but replace * with % at query time:
select *
from items
where 'stackoverflow.com' like replace(wildcard, '*', '%')
There is no built-in reverse match operator, but you can just swap the position of the arguments to get the reversed match:
select * from items where 'stackoverflow.com' LIKE items.wildcard;
Note that you can't make use of an index this way around, but that is only a performance concern; it won't stop you from running the query and getting an answer.
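To make the reverse match concrete, here is a small end-to-end sketch with hypothetical rows, assuming the wildcards are stored with % as suggested above:

-- Two hypothetical patterns; only the first matches the probe string.
insert into items (wildcard) values ('stackoverflow%'), ('github%');

select *
from items
where 'stackoverflow.com' like wildcard;
-- returns only the row whose wildcard is 'stackoverflow%'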

Ruby PG gem: selecting a field in array format

I have a PostgreSQL field that stores a 4-element array. I want to select the value of that field, but it comes back as a string:
{43.690916,-79.396774,43.700845,-79.37125}
I assumed the gem would know the format of that field and return an array, but apparently I was wrong.
How can I get this into an array without going through string methods? That seems like a hack. I thought that moving from four individual float fields to a single array field with associated methods would make the records easier to access.
There was no migration, and it would be restrictive to assume this is a Rails app, which it is not. Here is the structure:
  Column   |          Type          | Collation | Nullable |                 Default
-----------+------------------------+-----------+----------+------------------------------------------
 loc_id    | integer                |           | not null | nextval('mjtable_loc_id_seq'::regclass)
 locname   | character varying(255) |           |          |
 locbounds | double precision[]     |           |          |
By default the pg gem hands every column back as its text representation, whatever its Postgres type, so an array column arrives as a string and you need to deal with it as such.
conn = PG.connect(dbname: 'pgarray_development') # or whatever db name
data = conn.exec('SELECT * FROM foos').entries
# => [{"id"=>"1", "coords"=>"{1.0,2.0,3.0,4.0}"}]
data.first['coords'].class
# => String
But you can do this
conn.type_map_for_results = PG::BasicTypeMapForResults.new conn
conn.exec("select coords::float[] from foos").values
=> [[1.0, 2.0, 3.0, 4.0]]
There are probably other ways to use type casts; see https://bitbucket.org/ged/ruby-pg/wiki/Home
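If you would rather not install a global type map, the gem's text decoders can also be applied one value at a time. A minimal sketch (the raw value is the string from the question; the map(&:to_f) step is needed because the decoder yields string elements by default):

require 'pg'

# Decode a single Postgres array literal by hand.
decoder = PG::TextDecoder::Array.new
raw = "{43.690916,-79.396774,43.700845,-79.37125}"
decoder.decode(raw)             # => ["43.690916", "-79.396774", "43.700845", "-79.37125"]
decoder.decode(raw).map(&:to_f) # => [43.690916, -79.396774, 43.700845, -79.37125]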

Element-wise sum of array across rows of a dataset - Spark Scala

I am trying to group the below dataset based on the column "id" and sum the arrays in the column "values" element-wise. How do I do it in Spark using Scala?
Input: (dataset of 2 columns, column1 of type String and column2 of type Array[Int])
| id | values        |
------------------------
| A  | [12,61,23,43] |
| A  | [43,11,24,45] |
| B  | [32,12,53,21] |
| C  | [11,12,13,14] |
| C  | [43,52,12,52] |
| B  | [33,21,15,24] |
Expected Output: (dataset or dataframe)
| id | values        |
------------------------
| A  | [55,72,47,88] |
| B  | [65,33,68,45] |
| C  | [54,64,25,66] |
Note:
The result has to be flexible and dynamic. That is, even if there are thousands of columns, or the file is several TBs or PBs in size, the solution should still hold up.
I'm a little unsure what you mean when you say it has to be flexible, but off the top of my head I can think of a couple of ways. The first (and in my opinion the prettiest) uses a udf:
// Creating a small test example
val testDF = spark.sparkContext.parallelize(Seq(("a", Seq(1,2,3)), ("a", Seq(4,5,6)), ("b", Seq(1,3,4)))).toDF("id", "arr")
// Transpose the collected lists and sum each "column" element-wise
val sum_arr = udf((list: Seq[Seq[Int]]) => list.transpose.map(arr => arr.sum))
testDF
.groupBy('id)
.agg(sum_arr(collect_list('arr)) as "summed_values")
If you have billions of identical ids, however, the collect_list will of course be a problem. In that case you could do something like this:
testDF
.flatMap{case Row(id: String, list: Seq[Int]) => list.indices.map(index => (id, index, list(index)))}
.toDF("id", "arr_index", "arr_element")
.groupBy('id, 'arr_index)
.agg(sum("arr_element") as "sum")
.groupBy('id)
.agg(collect_list('sum) as "summed_values")
The single-line solution below worked for me (here n is the fixed length of the arrays, and the column names come from my own dataset rather than the question's):
ds.groupBy("Country").agg(array((0 until n).map(i => sum(col("Values").getItem(i))) :_* ) as "Values")
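Adapted to the question's column names, a self-contained sketch of that one-liner might look like this (assuming a Spark 2.x session with spark.implicits._ in scope, and arrays of a known fixed length n = 4):

import org.apache.spark.sql.functions._
import spark.implicits._

val ds = Seq(
  ("A", Seq(12, 61, 23, 43)),
  ("A", Seq(43, 11, 24, 45)),
  ("B", Seq(32, 12, 53, 21))
).toDF("id", "values")

val n = 4 // known array length
ds.groupBy("id")
  // Build one sum() column per array position, then pack them back into an array.
  .agg(array((0 until n).map(i => sum(col("values").getItem(i))): _*) as "values")
  .show()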

PostgreSQL: retrieving multiple array elements

Let's say we have a query like:
SELECT regexp_split_to_array('foo,bar', ',');
Results:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| {foo,bar}             |
+-----------------------+
(1 row)
To access a single element of an array we can use code like:
SELECT (regexp_split_to_array('foo,bar', ','))[1];
Which will return:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| foo                   |
+-----------------------+
(1 row)
Or use slices like:
SELECT (regexp_split_to_array('foo,bar', ','))[2:];
Result:
+-----------------------+
| regexp_split_to_array |
+-----------------------+
| {bar}                 |
+-----------------------+
(1 row)
However, when I try to access 2 elements at once, like:
SELECT (regexp_split_to_array('foo,bar', ','))[1,2];
or
SELECT (regexp_split_to_array('foo,bar', ','))[1][2];
or any other syntax, I receive an error:
ERROR: syntax error at or near ","
Is it possible to retrieve two different and not adjacent elements of an array in PostgreSQL?
Extracting multiple elements from an array in a SELECT can mean one of two things: having them returned as multiple columns, or having them all be part of a single array.
This returns you one column as an array of the two elements.
knayak=# select ARRAY[arr[1],arr[2]] FROM regexp_split_to_array('foo,bar', ',') as arr;
   array
-----------
 {foo,bar}
(1 row)
...and this simply gives you the two elements as columns.
knayak=# select arr[1],arr[2] FROM regexp_split_to_array('foo,bar', ',') as arr;
 arr | arr
-----+-----
 foo | bar
(1 row)
The colon ':' in the array subscript does let you access multiple elements, but only as a contiguous from/through range.
select (array[1,2,3,4,5])[2:4]
returns
{2,3,4}
This works in your example above, but not if elements 1 and 2 weren't next to each other. In that case, the suggestion from @KaushikNayak is the only way I can think of.
Using your example:
SELECT (regexp_split_to_array('foo,bar', ','))[1:2]
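Putting the two answers together for the non-adjacent case: a slice cannot skip positions, so you build a fresh array from the individual subscripts. A small sketch with a three-element input, picking elements 1 and 3:

SELECT ARRAY[arr[1], arr[3]]
FROM regexp_split_to_array('foo,bar,baz', ',') AS arr;
-- => {foo,baz}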

Sort a 2D array by re-ordering on one column

For example:
Array
ID | Primary | Data2
---------------------
 1 | N       | Something 1
 2 | N       | Something 2
 3 | Y       | Something 3
I'm trying to sort it based on the Primary column, and I want the "Y" rows to show first, bringing all the other columns of each row along with them.
The end result would be:
Sorted Array
ID | Primary | Data2
---------------------
 3 | Y       | Something 3
 1 | N       | Something 1
 2 | N       | Something 2
Is there a pre-made function for that? If not, how do we do this?
It is declared like this:
Dim Array(,) As String
I like using LINQ's OrderBy and ThenBy to order collections of objects. You just pass in a selector function to use to order the collections. For example:
orderedObjs = objs.OrderByDescending(Function(x) x.isPrimary).ThenBy(Function(x) x.id).ToList()
This code orders a collection first by the .isPrimary boolean, then by the id. Finally, it immediately evaluates the query into a List and assigns it to some variable.
There's a similar C# question whose solution applies just as well to VB. In short, you can use an overload of Array.Sort if you first split your 2D array into separate (1D) arrays:
Dim Primary() As String
Dim Data2() As String
' ... populate both arrays from the 2D array ...
Array.Sort(Primary, Data2)
This reorders Data2 according to the Y/N sort of Primary, after which you can recombine them into a 2D array.
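A minimal sketch of that keys/items overload with hypothetical data. Array.Sort sorts ascending, so "N" lands before "Y"; reversing both arrays afterwards puts the "Y" row first (note Array.Sort is not a stable sort, so ties among "N" rows may not keep their original order):

Dim primary() As String = {"N", "N", "Y"}
Dim data2() As String = {"Something 1", "Something 2", "Something 3"}

' Sort primary as the keys and reorder data2 in lockstep.
Array.Sort(primary, data2)
' Ascending puts "N" first; reverse both so the "Y" row comes first.
Array.Reverse(primary)
Array.Reverse(data2)
' primary = {"Y", "N", "N"}; data2 = {"Something 3", "Something 2", "Something 1"}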
