Hive - Extract arrays from JSON

I have a table which contains an array of values.
create external table apidetails
(
inputdetails string
)
Location 'XXXX'
select inputdetails from apidetails
{ "Name": "api-server1", "ID": "api-1", "tags": ["tag-1","tag-2"] }
I need the results as:
| ID    | tags        |
|-------|-------------|
| api-1 | tag-1,tag-2 |
I tried select json_extract_scalar(inputdetails ,'$tags'), but it returns an error.

Here are a few options:
Option 1: JSON
select json_extract_scalar(inputdetails ,'$.ID') as ID
,json_extract(inputdetails ,'$.tags') as tags
from apidetails
;
ID | tags
-------+-------------------
api-1 | ["tag-1","tag-2"]
Option 2: array(varchar)
select json_extract_scalar(inputdetails ,'$.ID') as ID
,cast(json_extract(inputdetails ,'$.tags') as array(varchar)) as tags
from apidetails
;
ID | tags
-------+----------------
api-1 | [tag-1, tag-2]
Option 3: delimited string
select json_extract_scalar(inputdetails ,'$.ID') as ID
,array_join(cast(json_extract(inputdetails ,'$.tags') as array(varchar)),',') as tags
from apidetails
;
ID | tags
-------+-------------
api-1 | tag-1,tag-2
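If you need one row per tag instead, the same cast can feed a CROSS JOIN UNNEST (a sketch in the same Presto/Athena dialect, using the table and column from the question):
select json_extract_scalar(inputdetails ,'$.ID') as ID
,t.tag
from apidetails
cross join unnest (cast(json_extract(inputdetails ,'$.tags') as array(varchar))) as t (tag)
;
ID | tag
-------+-------
api-1 | tag-1
api-1 | tag-2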

Related

Store a list of values as a string when creating a table in Snowflake

I am trying to create a table with 5 columns. Column #2 (PROGRESS) is a comma-separated list (i.e. 1,2,3,4, etc.), but when trying to create this table as either a string, variant, or varchar, Snowflake refuses to allow this. Any advice on how I can create a comma-separated list column from a CSV? I tried to import the data as a TSV, XML, and JSON file, but had no success.
create or replace TABLE AD_HOC.TEMP.NEW_DATA (
    VISITOR_ID VARCHAR(16777216),
    PROGRESS VARCHAR(16777216),
    DATE DATETIME,
    ROLE VARCHAR(16777216),
    FIRST_VISIT DATETIME
) COMMENT='Interaction data'
;
Goal:
VISITOR_ID | PROGRESS  | DATE      | ROLE  | FIRST_VISIT
111        | [1,2,3]   | 1/1/2022  | OWNER | 1/1/2021
123        | [1]       | 1/2/2022  | ADMIN | 2/2/2021
23321      | [1,2,3,4] | 2/22/2022 | USER  | 3/12/2021
I encoded the column in Python and loaded the data into Snowflake!
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# One-hot encode the list-valued PROGRESS column and join the indicator
# columns back onto the original frame
df = doc_data.join(pd.DataFrame(mlb.fit_transform(doc_data.pop('PROGRESS')),
                                columns=mlb.classes_,
                                index=doc_data.index))
df
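If the goal is simply to keep PROGRESS as a comma-separated VARCHAR and read it back as a list, an alternative sketch splits it at query time with Snowflake's SPLIT function (this assumes values are stored like '1,2,3'):
select VISITOR_ID
      ,split(PROGRESS, ',') as PROGRESS_ARRAY
from AD_HOC.TEMP.NEW_DATA;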

How to transform data when we have comma-separated values in a CSV-format file in Snowflake

I have an Excel CSV-format data set with the following data:
Columns: id, product_name, sales, quantity, Profit
Data: 1, "Novimex Executive Leather Armchair, Black","$3,709.40", 9, -$288.77
When I try to insert these records from the stage into the Snowflake table, the data gets shifted out of the product_name column because of the embedded comma before "Black", and the data in the following columns shifts the same way. After loading, the data looks like this:
+----+-------------------------------------+--------+----------+---------+
| id | product_name | sales | quantity | Profit |
+----+-------------------------------------+--------+----------+---------+
| 1 | "Novimex Executive Leather Armchair | Black" | $3 | 709.40" |
+----+-------------------------------------+--------+----------+---------+
Query used:
copy into orders_staging (id,Product_Name,Sales,Quantity,Profit)
from
(select $1,$2,$3,$4,$5
from @sales_data_stage)
file_format = (type = csv field_delimiter = ',' skip_header = 1 ENCODING = 'iso-8859-1');
Use field enclosure:
FIELD_OPTIONALLY_ENCLOSED_BY='"'
If you have any issues with accounting-styled numbers, remember to put quotes (" ") around them too.
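Applied to the COPY from the question, the corrected load might look like this (a sketch; same table and stage names, with the stage referenced as @sales_data_stage):
copy into orders_staging (id,Product_Name,Sales,Quantity,Profit)
from
(select $1,$2,$3,$4,$5
from @sales_data_stage)
file_format = (type = csv
               field_delimiter = ','
               skip_header = 1
               field_optionally_enclosed_by = '"'
               encoding = 'iso-8859-1');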
https://community.snowflake.com/s/question/0D50Z00008pDcoRSAS/copying-csv-files-delimited-by-commas-where-commas-are-also-enclosed-in-strings
Additional documentation for COPY INTO:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-csv
Additional documentation on CREATE FILE FORMAT:
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html

Loading CSV data to Snowflake table

The column splits into multiple columns when trying to load the following data into a Snowflake table, since it's a CSV file.
Column data:
{"Department":"Mens Wear","Departmentid":"10.1;20.1","customername":"john4","class":"tops wear","subclass":"sweat shirts","product":"North & Face 2 Bangle","style":"Sweat shirt hoodie - Large - Black"}
Is there any other way to load the data into a single column?
The best solution would be to use a different delimiter instead of a comma in your CSV file. If that's not possible, you can ingest the data using a non-existent delimiter to get the whole line as one column, and then parse it. Of course, it won't be as efficient as native loading:
cat test.csv
1,2020-10-12,Gokhan,{"Department":"Mens Wear","Departmentid":"10.1;20.1","customername":"john4","class":"tops wear","subclass":"sweat shirts","product":"North & Face 2 Bangle","style":"Sweat shirt hoodie - Large - Black"}
create file format csvfile type=csv FIELD_DELIMITER='NONEXISTENT';
select $1 from @my_stage (file_format => csvfile );
create table testtable( id number, d1 date, name varchar, v variant );
copy into testtable from (
select
-- split the line at ',{' to separate the plain CSV fields from the JSON part,
-- then split the CSV prefix on commas to get the individual columns
split( split($1,',{')[0], ',' )[0],
split( split($1,',{')[0], ',' )[1],
split( split($1,',{')[0], ',' )[2],
-- restore the '{' consumed by the split and parse the JSON into a variant
parse_json( '{' || split($1,',{')[1] )
from @my_stage (file_format => csvfile )
);
select * from testtable;
+----+------------+--------+-----------------------------------------------------------------+
| ID | D1 | NAME | V |
+----+------------+--------+-----------------------------------------------------------------+
| 1 | 2020-10-12 | Gokhan | { "Department": "Mens Wear", "Departmentid": "10.1;20.1", ... } |
+----+------------+--------+-----------------------------------------------------------------+

Check a value in an array inside a JSON object in PostgreSQL 9.5

I have a JSON object containing an array and other properties.
I need to check the first value of the array for each row of my table.
Here is an example of the JSON:
{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
So I need, for example, to get all rows with objectType[0] = 'Demand' and objectID1 = 46.
These are the table columns:
id | relationName | content
The content column contains the JSON.
Just query them, like:
t=# with table_name(id, rn, content) as (values(1,null,'{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}'::json))
select * From table_name
where content->'objectType'->>0 = 'Demand' and content->>'objectID1' = '46';
id | rn | content
----+----+-------------------------------------------------------------------
1 | | {"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
(1 row)
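Against a real table, the same predicate applies directly (a sketch; the table name relations is hypothetical, and the mixed-case column is assumed to have been created quoted):
select id, "relationName", content
from relations  -- hypothetical table name
where content->'objectType'->>0 = 'Demand'
  and content->>'objectID1' = '46';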

Hive lateral view not working AWS Athena

I'm working on AWS CloudTrail log analysis, and I'm getting stuck extracting JSON from a row.
This is my table definition.
CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
eventName STRING,
awsRegion STRING,
requestParameters STRING,
elements STRING ,
additionalEventData STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://XXXXXX/CloudTrail'
If I run select elements from cl1 limit 1, it returns this result:
{"groupId":"sg-XXXX","ipPermissions":{"items":[{"ipProtocol":"tcp","fromPort":22,"toPort":22,"groups":{},"ipRanges":{"items":[{"cidrIp":"0.0.0.0/0"}]},"prefixListIds":{}}]}}
I need to show this result as virtual columns, like:
| groupId | ipProtocol | fromPort | toPort| ipRanges.items.cidrIp|
|---------|------------|--------- | ------|-----------------------------|
| -1 | 0 | | | |
I'm using AWS Athena; I tried LATERAL VIEW, and get_json_object is not working in AWS.
It's an external table.
select json_extract_scalar(i.item,'$.ipProtocol') as ipProtocol
,json_extract_scalar(i.item,'$.fromPort') as fromPort
,json_extract_scalar(i.item,'$.toPort') as toPort
from cloudtrail_logs
-- cast the ipPermissions items to array(json) and expand one row per item
cross join unnest (cast(json_extract(elements,'$.ipPermissions.items')
as array(json))) as i (item)
;
ipProtocol | fromPort | toPort
------------+----------+--------
"tcp" | 22 | 22
