How to load a file on AgensGraph?

I tried to load a comma-separated file into AgensGraph, but the package does not include a load utility.
How can I load a file into AgensGraph?

You can use a foreign data wrapper instead of a utility.
First, create the file_fdw extension.
agens=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
Second, create a server object.
agens=# CREATE SERVER graph_import FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
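For reference, sample.dat here is a plain comma-separated file with no header; given the rows returned by the final MATCH below, it would contain:
1,steve
2,bill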
Next, create a foreign table that points at the file.
agens=# CREATE FOREIGN TABLE fdwSample
agens-# (
agens(# id INT8,
agens(# name VARCHAR(256)
agens(# )
agens-# SERVER graph_import
agens-# OPTIONS
agens-# (
agens(# FORMAT 'csv',
agens(# HEADER 'false',
agens(# DELIMITER ',',
agens(# NULL '',
agens(# FILENAME 'sample.dat'
agens(# );
CREATE FOREIGN TABLE
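Before loading, you can check that the foreign table reads the file correctly; it should return the two rows from sample.dat.
agens=# SELECT * FROM fdwSample;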
Last, load the file using the LOAD clause.
agens=# LOAD FROM fdwSample AS sample
agens-# CREATE (:node {id:sample.id,name:sample.name});
GRAPH WRITE (INSERT VERTEX 2, INSERT EDGE 0)
Finally, you can query the loaded data.
agens=# MATCH (n:node) RETURN n;
n
-------------------------------------
node[3.1]{"id": 1, "name": "steve"}
node[3.2]{"id": 2, "name": "bill"}
(2 rows)
Good luck.

Related

Error parsing JSON exception for xml field in copy command in Snowflake

Hi, I have declared a table like this:
create or replace table app_event (
ID varchar(36) not null primary key,
VERSION number,
ACT_TYPE varchar(255),
EVE_TYPE varchar(255),
CLI_ID varchar(36),
DETAILS variant,
OBJ_TYPE varchar(255),
DATE_TIME timestamp,
AAPP_EVENT_TO_UTC_DT timestamp,
GRO_ID varchar(36),
OBJECT_NAME varchar(255),
OBJ_ID varchar(255),
USER_NAME varchar(255),
USER_ID varchar(255),
EVENT_ID varchar(255),
FINDINGS varchar(255),
SUMMARY variant
);
The DETAILS column will contain an XML document, so that I can run XML functions and extract elements from it.
My sample row looks like this:
dfjkghdfkjghdf8gd7f7997,0,TEST_CASE,CHECK,74356476476DFD,<?xml version="1.0" encoding="UTF-8"?><testPayload><testId>3495864795uiyiu</testId><testCode>COMPLETED</testCode><testState>ONGOING</testState><noOfNewTest>1</noOfNewTest><noOfReviewRequiredTest>0</noOfReviewRequiredTest><noOfExcludedTest>0</noOfExcludedTest><noOfAutoResolvedTest>1</noOfAutoResolvedTest><testerTypes>WATCHLIST</testerTypes></testPayload>,CASE,41:31.3,NULL,948794853948dgjd,(null),dfjkghdfkjghdf8gd7f7997,test user,dfjkghdfkjghdf8gd7f7997,NULL,(null),(null)
When I declare DETAILS as VARCHAR I am able to load the file, but when I declare it as VARIANT I get the error below for that column only:
Error parsing JSON:
dfjkghdfkjghdf8gd7f7997COMPLETED</status
File 'SNOWFLAKE/Sudarshan.csv', line 1, character 89 Row 1, column
"AUDIT_EVENT"["DETAILS":6]
Can you please help with this?
I cannot use VARCHAR, because I also need to query elements of the XML.
This is how I load into the table. I use the default CSV file format, and the file is available in S3:
COPY INTO demo_db.public.app_event
FROM @my_s3_stage/
FILES = ('app_Even.csv')
file_format=(type='CSV');
Based on the answer, this is how I am loading now:
copy into demo_db.public.app_event from (
select
$1,$2,$3,$4,$5,
parse_xml($6),$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,parse_xml($17)
from @~/Audit_Even.csv d
)
file_format = (
type = CSV
)
But when I execute it, it says zero rows processed, and no stage is mentioned here.
If you are using a COPY INTO statement, then you need to put in a subquery to convert the data before loading it into the table. Use parse_xml within your COPY statement's subquery, something like this:
copy into app_event from (
select
$1,
parse_xml($2) -- <---- "$2" is the column number in the CSV that contains the xml
from @~/test.csv.gz d -- <---- This is my own internal user stage. You'll need to change this to your external stage or whatever
)
file_format = (
type = CSV
)
It is hard to provide you with a good SQL statement without a full example of your existing code (your COPY / INSERT statement). In my example above, I'm copying a file from my own user stage (@~/test.csv.gz) with the default CSV file format options. You are likely using an external stage, but it should be easy to adapt this to your own example.
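As a hedged sketch (not part of the original answer): the follow-up statement from the question could be pointed at the external stage that actually holds the file, assuming @my_s3_stage and app_Even.csv from the first COPY are correct; reading from the empty user stage @~/ would explain the "zero rows processed" result.
copy into demo_db.public.app_event from (
select
$1, $2, $3, $4, $5,
parse_xml($6), -- DETAILS column, parsed from XML text
$7, $8, $9, $10, $11, $12, $13, $14, $15, $16,
parse_xml($17) -- SUMMARY column
from @my_s3_stage/
)
files = ('app_Even.csv')
file_format = (type = 'CSV');
Once DETAILS is loaded as a VARIANT, an XML element can then be read with, for example, XMLGET(DETAILS, 'testCode'):"$"::string.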

another case of "Table options do not contain an option key 'connector'"

I have read the link Table options do not contain an option key 'connector'.
It said we should set the format.
But my scenario is datagen -> Hive.
Here's my complete example (it currently fails):
drop table if exists datagen;
CREATE TABLE datagen (
f_sequence INT,
f_random INT,
f_random_str STRING,
ts AS localtimestamp,
WATERMARK FOR ts AS ts
) WITH (
'connector' = 'datagen',
-- optional options --
'rows-per-second'='5',
'fields.f_sequence.kind'='sequence',
'fields.f_sequence.start'='1',
'fields.f_sequence.end'='50', -- this limits the total number of rows generated
'fields.f_random.min'='1',
'fields.f_random.max'='50',
'fields.f_random_str.length'='10'
);
SET table.sql-dialect=hive;
drop table if exists hive_table;
CREATE TABLE hive_table (
f_sequence INT,
f_random INT,
f_random_str STRING
) PARTITIONED BY (dt STRING, hr STRING, mi STRING) STORED AS parquet TBLPROPERTIES (
'partition.time-extractor.timestamp-pattern'='$dt $hr:$mi:00',
'sink.partition-commit.trigger'='partition-time',
'sink.partition-commit.delay'='1 min',
'sink.partition-commit.policy.kind'='metastore,success-file'
);
Flink SQL> insert into hive_table select f_sequence,f_random,f_random_str ,DATE_FORMAT(ts, 'yyyy-MM-dd'), DATE_FORMAT(ts, 'HH') ,DATE_FORMAT(ts, 'mm') from datagen;
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Table options do not contain an option key 'connector' for discovering a connector.
Is the solution from the above link suitable for this case?
I need your help, thanks!
Please run SET table.sql-dialect=default; before calling insert into hive_table .... The INSERT statement reads from the datagen connector, which the Hive dialect does not support.
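Sketched out against the tables from the question, the only change is switching the dialect back before the INSERT:
SET table.sql-dialect=hive;
-- CREATE TABLE hive_table (...) as in the question
SET table.sql-dialect=default;
insert into hive_table
select f_sequence, f_random, f_random_str,
DATE_FORMAT(ts, 'yyyy-MM-dd'), DATE_FORMAT(ts, 'HH'), DATE_FORMAT(ts, 'mm')
from datagen;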

SQL Data Warehouse External Table with String fields

I am unable to find a way to create an external table in Azure SQL Data Warehouse (Synapse SQL Pool) with Polybase where some fields contain embedded commas.
For a csv file with 4 columns as below:
myresourcename,
myresourcelocation,
"""resourceVersion"": ""windows"",""deployedBy"": ""john"",""project_name"": ""test_project""",
"{ ""ResourceType"": ""Network"", ""programName"": ""v1""}"
I tried it with the following CREATE EXTERNAL FILE FORMAT and CREATE EXTERNAL TABLE statements:
CREATE EXTERNAL FILE FORMAT my_format
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR=',',
STRING_DELIMITER='"',
First_Row = 2
)
);
CREATE EXTERNAL TABLE my_external_table
(
resourceName VARCHAR,
resourceLocation VARCHAR,
resourceTags VARCHAR,
resourceDetails VARCHAR
)
WITH (
LOCATION = 'my/location/',
DATA_SOURCE = my_source,
FILE_FORMAT = my_format
)
But querying this table gives the following error:
Failed to execute query. Error: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.
Any help will be appreciated.
Currently this is not supported in PolyBase; you need to modify the input data accordingly to get it working.
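One way to "modify the input data", as a hedged sketch: if the file can be re-exported with a field terminator that never occurs inside the values, e.g. a pipe, the file format would look like this (my_format_pipe is a hypothetical name):
CREATE EXTERNAL FILE FORMAT my_format_pipe
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = '|',
STRING_DELIMITER = '"',
FIRST_ROW = 2
)
);
The CREATE EXTERNAL TABLE statement stays the same apart from FILE_FORMAT = my_format_pipe.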

How to make the 'public' schema default in a Scala Play project that uses PostgreSQL?

I am not sure whether my issue relates to the Scala Play 2.5.x Framework or to PostgreSQL, so I am going to describe my setup.
I am using the Play 2.5.6 with Scala and PostgreSQL 9.5.4-2 from the BigSQL Sandboxes. I use the Play Framework default evolution package to manage the DB versions.
I created a new database in BigSQL Sandbox's PGSQL and PGSQL created a default schema called public. I use this schema for development.
I would like to create a table with the following script (1.sql in DB evolution config):
# Initialize the database
# --- !Ups
CREATE TABLE user (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL,
creation_date TIMESTAMP NOT NULL
);
# --- !Downs
DROP TABLE user;
Besides that, I would like to read the table with code like this:
val resultSet = statement.executeQuery("SELECT id, name, email FROM public.user WHERE id=" + id.toString)
I get an error when I execute any of the mentioned code, and even when I run the CREATE TABLE ... statement in pgAdmin. The issue is with the user table name. If I prefix it with public (i.e. public.user), everything works fine.
My questions are:
Is it normal to prefix the table name with the schema name every time? It seems odd to me.
How can I make the public schema a default option so I do not have to qualify the table name? (e.g. CREATE TABLE user (...); will not throw an error)
I tried the following:
I set the search_path for my user: ALTER USER my_user SET search_path to public;
I set the search_path for my database: ALTER database "my_database" SET search_path TO my_schema;
search_path correctly shows this: "$user",public
I got the following errors:
In Play: p.a.d.e.DefaultEvolutionsApi - ERROR: syntax error at or near "user"
In pgadmin:
ERROR: syntax error at or near "user"
LINE 1: CREATE TABLE user (
********** Error **********
ERROR: syntax error at or near "user"
SQL state: 42601
Character: 14
This has nothing to do with the default schema. user is a reserved word.
You need to use double quotes to be able to create such a table:
CREATE TABLE "user" (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL,
creation_date TIMESTAMP NOT NULL
);
But I strongly recommend not doing that. Find a different name that does not require a quoted identifier.
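For example, the evolution script from the question could use a hypothetical name such as app_user, which needs no quoting and no schema prefix:
# --- !Ups
CREATE TABLE app_user (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL,
creation_date TIMESTAMP NOT NULL
);
# --- !Downs
DROP TABLE app_user;
The Scala query then becomes SELECT id, name, email FROM app_user WHERE id=..., without the public. prefix.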

create db2 index on non-partitioned table in different tablespace

I have created a non-partitioned table in DB2:
create table test (name varchar (22), cell# integer);
The table was created successfully.
Now I want to create an index on the test table in tablespace TEST_IDX.
I executed the following query:
CREATE INDEX test1 ON test (cell#) in TEST_IDX.
It gives me the following error:
[CREATE - 0 row(s), 0.000 secs] [Error Code: -109, SQL State: 42601] DB2 SQL Error: SQLCODE=-109, SQLSTATE=42601, SQLERRMC=IN
The DB2 database version is DB2/LINUXZ64 9.7.3.
I think you will have to specify that for the table, i.e.:
create table test (
...
) in <tblspc> index in TEST_IDX
See ADMIN_MOVE_TABLE ( http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.dm.doc/doc/t0054864.html?resultof=%22%6d%6f%76%65%5f%74%61%62%6c%65%22%20 ) for more information.
The index tablespace should be defined during table creation; if it is not specified, indexes are created in the same tablespace where the table exists.
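A minimal sketch applying this to the question's table (TEST_DATA is a hypothetical name for the data tablespace; substitute whichever tablespace the table data should live in):
CREATE TABLE test (
name VARCHAR(22),
cell# INTEGER
) IN TEST_DATA INDEX IN TEST_IDX;
CREATE INDEX test1 ON test (cell#);
The CREATE INDEX statement then needs no IN clause; the index is placed in TEST_IDX automatically.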
