PostgreSQL: Importing a CSV file with an array column

Is there a way to import a CSV file with an array column into PostgreSQL, given the data below?
genres
['documentation']
['crime', 'drama']
['comedy', 'fantasy']
['comedy']
['horror']
['comedy', 'european']
['thriller', 'crime', 'action']
['drama', 'music', 'romance', 'family']
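One possible approach is to rewrite each value before loading, since PostgreSQL array literals use curly braces rather than Python-style square brackets. The sketch below is only an illustration: it assumes every cell is a valid Python list literal and that the target column is of type text[]. The function name to_pg_array is hypothetical and not part of any library.

```python
import ast


def to_pg_array(cell: str) -> str:
    """Turn a Python-style list string into a PostgreSQL array literal.

    Assumption: `cell` is a valid Python list literal such as "['crime', 'drama']".
    """
    items = ast.literal_eval(cell)  # parse the string safely into a Python list
    quoted = []
    for item in items:
        # Escape backslashes and double quotes, then double-quote each element
        # so that commas or spaces inside an element cannot split it.
        escaped = str(item).replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return "{" + ",".join(quoted) + "}"


sample = ["['documentation']", "['crime', 'drama']", "['comedy', 'european']"]
converted = [to_pg_array(cell) for cell in sample]
for value in converted:
    print(value)
```

If the converted values are written back to a CSV file, a CSV writer such as Python's csv module should be used so that the embedded double quotes are escaped properly for the import.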

Related

Create 'granular' date partitions (year, month, date) in S3 parquet folders from a single date column in AWS Wrangler

I am using AWS Data Wrangler to write a dataframe to Parquet files in an S3 bucket, and I am trying to get a Hive-like folder structure of:
prefix
- year=2022
-- month=08
--- day=01
--- day=02
--- day=03
In the following code example:
import awswrangler as wr
import pandas as pd
wr.s3.to_parquet(
    df=pd.DataFrame({
        'date': ['2022-08-01', '2022-08-02', '2022-08-03'],
        'col2': ['A', 'A', 'B']
    }),
    path='s3://bucket/prefix',
    dataset=True,
    partition_cols=['date'],
    database='default'
)
The resulting s3 folder structure would be:
prefix
- date=2022-08-01
- date=2022-08-02
- date=2022-08-03
The SageMaker Feature Store ingest function (https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html) does something similar: given the event_time_feature_name column (a timestamp), it creates the Hive-style folder structure in S3 automatically.
How can I do this with Data Wrangler by passing in the single date column and having the year, month, and day partitions created automatically, without first deriving three additional columns and declaring them as partitions?
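For reference, the year/month/day values that such a layout needs can be derived from the single date column with a small helper. The sketch below uses only the standard library and assumes the dates are ISO strings (YYYY-MM-DD), as in the example above. The helper names hive_partition_values and hive_partition_path are hypothetical, and this does not show any built-in Data Wrangler feature.

```python
from datetime import date


def hive_partition_values(iso_date: str) -> dict:
    """Split 'YYYY-MM-DD' into zero-padded year/month/day partition values."""
    d = date.fromisoformat(iso_date)
    return {
        "year": f"{d.year:04d}",
        "month": f"{d.month:02d}",
        "day": f"{d.day:02d}",
    }


def hive_partition_path(prefix: str, iso_date: str) -> str:
    """Build the Hive-style folder path for one date under the given prefix."""
    parts = hive_partition_values(iso_date)
    folders = "/".join(f"{key}={value}" for key, value in parts.items())
    return prefix.rstrip("/") + "/" + folders


for day in ["2022-08-01", "2022-08-02", "2022-08-03"]:
    print(hive_partition_path("s3://bucket/prefix", day))
```

The same three values could be added to the dataframe as columns and listed in partition_cols, which is the approach the question hopes to avoid doing by hand.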

How to create a table in Hive with a column of data type MAP(VARCHAR,ARRAY[MAP(VARCHAR,VARCHAR)])

I am trying to create a table with a column of a complex data type:
map<string, array<map<string, string>>>
The data I want to store looks like this:
{'tags': [{'type': 'type1', 'value': 'value1'}, {'type': 'type1', 'value': 'value1'}]}
Please help me with the CREATE TABLE statement in Hive, as well as with inserting values into this table.

Load JSON data into a Snowflake table

My data is as follows:
[ {
"InvestorID": "10014-49",
"InvestorName": "Blackstone",
"LastUpdated": "11/23/2021"
},
{
"InvestorID": "15713-74",
"InvestorName": "Bay Grove Capital",
"LastUpdated": "11/19/2021"
}]
What I have tried so far:
CREATE OR REPLACE TABLE STG_PB_INVESTOR (
    Investor_ID string, Investor_Name string, Last_Updated DATETIME
);
-- table created
create or replace file format investorformat
    type = 'JSON'
    strip_outer_array = true;
-- file format created
create or replace stage investor_stage
    file_format = investorformat;
-- stage created
copy into STG_PB_INVESTOR from @investor_stage;
I am getting an error:
SQL compilation error: JSON file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.
You should be loading your JSON data into a table with a single column that is a VARIANT. Once in Snowflake you can either flatten that data out with a view or a subsequent table load. You could also flatten it on the way in using a SELECT on your COPY statement, but that tends to be a little slower.
Try something like this:
CREATE OR REPLACE TABLE STG_PB_INVESTOR_JSON (
    var variant
);
create or replace file format investorformat
    type = 'JSON'
    strip_outer_array = true;
create or replace stage investor_stage
    file_format = investorformat;
copy into STG_PB_INVESTOR_JSON from @investor_stage;
create or replace table STG_PB_INVESTOR as
SELECT
    var:InvestorID::string as Investor_id,
    var:InvestorName::string as Investor_Name,
    TO_DATE(var:LastUpdated::string, 'MM/DD/YYYY') as last_updated
FROM STG_PB_INVESTOR_JSON;
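To make the two steps concrete, here is a plain-Python illustration of the same idea, not Snowflake code: the outer array is split into one record per element (what strip_outer_array does on load), and each record is then flattened into typed columns (what the SELECT does). The function name flatten_investors is hypothetical; the sample data is copied from the question.

```python
import json
from datetime import datetime

raw = """[ {
"InvestorID": "10014-49",
"InvestorName": "Blackstone",
"LastUpdated": "11/23/2021"
},
{
"InvestorID": "15713-74",
"InvestorName": "Bay Grove Capital",
"LastUpdated": "11/19/2021"
}]"""


def flatten_investors(text: str) -> list:
    """Return one (id, name, date) tuple per element of the outer JSON array."""
    rows = []
    for record in json.loads(text):  # one record per array element
        rows.append((
            record["InvestorID"],
            record["InvestorName"],
            # Same pattern as TO_DATE(..., 'MM/DD/YYYY') in the SQL above.
            datetime.strptime(record["LastUpdated"], "%m/%d/%Y").date(),
        ))
    return rows


rows = flatten_investors(raw)
for row in rows:
    print(row)
```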

How to import CSV file into TDengine database

I have a CSV file and I want to import my data into the TDengine database. Are there any tutorials for data importing?
You can use the following SQL:
INSERT INTO table_name FILE '/tmp/csvfile.csv';
INSERT INTO table_name USING super_table_name TAGS ('Beijing.Chaoyang', 2) FILE '/tmp/csvfile.csv';
INSERT INTO table_name_1 USING super_table_name TAGS ('Beijing.Chaoyang', 2) FILE '/tmp/csvfile_21001.csv'
table_name_2 USING super_table_name (groupId) TAGS (2) FILE '/tmp/csvfile_21002.csv';
You can find more details in the TAOS SQL documentation.

neo4j cypher store array property during csv import

I need to import data from a csv of the form
id;name;targetset
1;"somenode";[1,3,5,8]
2;"someothernode";[3,8]
into the graph, and I need targetset stored as a collection (array) using Cypher. I tried
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mytable.csv" AS row FIELDTERMINATOR ';'
CREATE (:MyNode {id: row.id, name: row.name, targetset: row.targetset});
but it stores targetset as a string, e.g. "[1,3,5,8]". There does not seem to be a function that converts array-encoded strings to actual arrays, the way toInt converts strings to integers. Is there another way to do this?
APOC Procedures will be your best bet here. Use the function apoc.convert.fromJsonList().
An example of use:
WITH "[1,3,5,8]" as arr
RETURN apoc.convert.fromJsonList(arr)
You can try this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mytable.csv" AS row FIELDTERMINATOR ';'
CREATE (:MyNode {id: row.id, name: row.name, targetset: split(substring(row.targetset, 1, length(row.targetset) - 2), ',') });
The code above removes the [ and ] characters from the string [1,3,5,8] using the substring() and length() functions. The remaining string 1,3,5,8 is then split using , as the separator.
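The difference between the two answers is easier to see outside Cypher. The Python sketch below mirrors both, purely as an illustration: the split approach yields a list of strings, while parsing the value as JSON yields integers. The function names parse_with_split and parse_as_json are hypothetical.

```python
import json


def parse_with_split(cell: str) -> list:
    """Mirror split(substring(cell, 1, length(cell) - 2), ',')."""
    # Start at index 1 and take len(cell) - 2 characters, dropping '[' and ']'.
    inner = cell[1:1 + len(cell) - 2]
    return inner.split(",")  # elements stay strings


def parse_as_json(cell: str) -> list:
    """Mirror the idea of parsing the cell as a JSON list."""
    return json.loads(cell)  # elements become integers


print(parse_with_split("[1,3,5,8]"))
print(parse_as_json("[1,3,5,8]"))
```

With the split approach, each element would still need converting if integers are required in the graph.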
