Search Arrays in Hive - arrays

I need to search an array function for specific values in hive.
I have a table that creates a row for each event. I used collect_set(event_id) to roll the table up to the person level. I now need to search the array to put users into separate categories. I tried using a case when statement, find_in_set, and in_file but none of those functions work on arrays.
Any ideas? Can I convert an array into a string? Ideally I would have used a group_concat function in SQL - but hive does not support it

ARRAY_CONTAINS(events, search_param) will return a Boolean if search_param is found in the events array.
It's available in Hive 0.7.1, and possibly earlier versions.

Related

How to transform a list of numbers into seperate elements talend tmap

I have a list of codes for each state as an input stored in a table.
What I want as output is this , using tmap transformations
This is the job I made but it doesn't seem to work correctly like I want. The output should have 1000 rows.
Does anybody know how to solve this?
One way to do this is to convert your list to a string of comma-separated values (using tConvertType for instance) and then use tNormalize to split this string into individual rows.

SQL Server: STRING_SPLIT() result in a computed column

I couldn't find good documentation on this, but I have a table that has a long string as one of it's columns. Here's some example data of what it looks like:
Hello:Goodbye:Apple:Orange
Example:Seagull:Cake:Chocolate
I would like to create a new computed column using the STRING_SPLIT() function to return the third value in the string table.
Result #1: "Apple"
Result #2: "Cake"
What is the proper syntax to achieve this?
At this time your answer is not possible.
The output rows might be in any order. The order is not guaranteed to
match the order of the substrings in the input string.
STRING_SPLIT reference
There is no way to guarantee which item was the third item in the list using string_split and the order may change without warning.
If you're willing to build your own, I'd recommend reading up on the work done by
Brent Ozar and Jeff Moden.
You shouldn't be storing data like that in the first place. This points to a potentially serious database design problem. BUT you could convert this string into JSON by replacing : with ",", surround it with [" and "] and retrieve the third array element , eg :
declare #value nvarchar(200)='Example:Seagull:Cake:Chocolate'
select json_value('["' + replace(#value,':','","' )+ '"]','$[2]')
The string manipulations convert the string value to :
["Example","Seagull","Cake","Chocolate"]
After that, JSON_VALUE parses the JSON string and retrieves the 3rd item in the array using a JSON PATH expression.
Needless to say, this will be slow and can't take advantage of indexing. If those values are meant to be read or written individually, they should be stored in separate columns. They'll probably take less space than one long string.
If you have a lot of optional fields but only a subset contain values at any time, you could use sparse columns. This way you could have thousands of rows, only a few of which would contain data at any time

How to return an array matching a certain criteria in Excel?

I'm trying to parse an XML file in Excel, which is a Japanese dictionary. It contains several translations of each entry into different languages, and some entries have multiple translations per language. I want to write a formula that finds all of the translations by their language code, returns them as an array, and concatenates them using a TEXTJOIN formula. But I don't know how to go about this in Excel.
In Google Sheets, this would be easily solved by a FILTER function, but I can't use Sheets as there's too much data, and I haven't managed to get access to the beta FILTER function yet.
In the below picture, I'm trying to return the values in the <gloss xml:lang*> column, by searching for the values in the lang column. So for example, I want to return all values which have a "dut" next to them, and concatenate those into a single line using TEXTJOIN.
Any idea how I could go about doing this?
I fixed this by downloading the FILTER function. This is part of the Office Insider program, which releases Beta features if you choose to participate. You can access the Insider program by going to File > Account > Office Insider. Then to update your Office version go to File > Account > Office Updates to install the Insider update.
To filter the list by the "lang" column the formula looked like:
=FILTER([range in H column], [range in I column]=T$2)
I haven't specified either range because I used a formula-defined range using the INDIRECT function to avoid filtering through a million rows. The H range is what I want in the results of the filter, the I range is what I want to filter by - the "lang" code. T$2 represents the "lang" code, in this case "dut", and when I copy it across it will filter by each of the 8 lang codes in Row 2.
Then I used TEXTJOIN to combine it the array result into one column using the comma separator:
=TEXTJOIN(", ", TRUE, FILTER(...))

Postgres: is there any row_to_json equivalent that returns values only?

In a project I'm working on, I need to stream potentially large data sets from a Postgres database to the client, for analytics purposes.
The application is built in Rails (irrelevant for this question) and after a bit of research I'm currently able to stream query results by using COPY in Postgres:
COPY (SELECT row_to_json(t) from (#{query}) t) TO STDOUT;
Sources (for who's interested):
https://shift.infinite.red/fast-csv-report-generation-with-postgres-in-rails-d444d9b915ab
https://github.com/brianhempel/stream_json_demo
This works, but it yields every row as a key-value pair, e.g.:
["{\"id\":403457,\"email\":\"email403457#example.com\",\"first_name\":\"Firstname403457\",\"last_name\":\"Lastname403457\",\"source\":\"adwords\",\"created_at\":\"2015-08-05T22:43:07.295796\",\"updated_at\":\"2017-01-19T04:48:29.464051\"}"]
In the spirit of minimising the size (in bytes) of the response and especially since this is getting served through the web, I want to return just an array of values for every row, i.e.:
["[403457, \"email403457#example.com\", \"Firstname403457\", \"Lastname403457\", \"adwords\", \"2015-08-05T22:43:07.295796\", \"2017-01-19T04:48:29.464051\"]"]
Is there a way to achieve this within Postgres, even by nesting functions, starting from the query above?
You could create a simple SQL function that converts a row into the desired format:
CREATE FUNCTION row2json(anyelement) RETURNS json
LANGUAGE sql STABLE AS
'SELECT json_agg(z.value) FROM json_each(row_to_json($1)) z';
Then you use that to transform the output:
SELECT row2json(mytab) FROM mytab;
If performance is more important than JSON output, just cast the result to a string:
SELECT CAST(mytab AS text) FROM mytab;

Database access excel formatting or if statement or grouping

I have a database formatting problem in which I am trying to concatenate column "B" rows based on column "A" rows. Like So:
https://docs.google.com/spreadsheet/ccc?key=0Am8J-Fv99YModE5Va3hLSFdnU0RibmQwNVFNelJCWHc
Sorry I couldn't post a picture. I don't have enough reputation points YET. I'LL Get them eventually though
So I'd like to solve this problem within Excel or Access. Its currently an access database, but I can export it to excel easily. As you can see, I want to find "userid" in column A and where there are multiple column A's such as "shawn" I'd like to combine the multiple instances of shawn and concatenate property num as such.
Even though there are multiple instances of column A still, I could just filter all unique instances of the table later. My concern is how to concatenate column B with a "|" in the middle if column A has multiple instances.
This is just a segment of my data (There is a lot more), so I would be very thankful for your help.
The pseudo code in my head so far is:
If( Column A has more than one instance)
Then Concatenate(Column B with "#"+ "|" +"#")
I'm also wondering if there is a way to do this on access with grouping.
Well Anyways, PLEASE HELP.
In excel we can achieve it easily by custom function in vba module. Hopefully using vba(Macros) is not an issue for you.
Here is the code for the function which can be added in vba. (Press Alt+F11, this will take you to visual editor, right click the project and add a module. Add the below code in module)
Public Function ConcatenatePipe(ByVal lst As Range, ByVal values As Range, ByVal name As Range) As String
ConcatenatePipe = ""
Dim i As Integer
For i = 1 To lst.Count
If name.Value = lst.Item(i).Value Then ConcatenatePipe = ConcatenatePipe & "|" & values.Item(i).Value
Next
ConcatenatePipe = Mid(ConcatenatePipe, 2)
End Function
This function you can use in excel in F Column of your example. Copy the below formulla in F2 and the copy paste the cell to rest of F column. =ConcatenatePipe($A$2:$A$20,$B$2:$B$20,E2)
I believe you can solve this with an SQL GROUP BY function. At least, here's how I'd do it in MySQL or similar:
SELECT userid, GROUP_CONCAT(propertynum SEPARATOR '|') FROM Names GROUP BY userid
as described in this stack overflow post: How to use GROUP BY to concatenate strings in MySQL?
Here's a link on how to use SQL in MS Access: http://www.igetit.net/newsletters/Y03_10/SQLInAccess.aspx
Unfortunately there is not a GROUP_CONCAT function in MSAccess, but this other SO post explains some ways round that: is there a group_concat function in ms-access?

Resources