My snowsql connection locked when trying to connect and run config - snowflake-cloud-data-platform

I am trying to set up a Snowpipe, and I have created my warehouse, database and table and am trying to stage the files with snowsql.
USE WAREHOUSE IoT;
USE DATABASE SNOWPIPE_TEST;
CREATE OR REPLACE STAGE my_stage;
CREATE OR REPLACE FILE_FORMAT r_json;
CREATE OR REPLACE PIPE snowpipe_pipe
AUTO_INGEST = TRUE,
COMMENT = 'add items IoT',
VALIDATION_MODE = RETURN_ALL_ERRORS
AS (COPY INTO snowpipe_test.public.mytable
from @snowpipe_db.public.my_stage
FILE_FORMAT = (type = 'JSON');
CREATE PIPE mypipe AS COPY INTO mytable FROM @my_stage;
I think something is locked but I am not sure.
I tried to save the config file as config1 and made a copy. It hung; then I removed the copy and tried to connect, and there was no error, it just hung.
Am I missing something?

To specify the auto ingest parameter it's AUTO_INGEST rather than AUTO-INGEST, but note that this option is not available for an internal stage. So when you try to run this command using an internal stage it should error with a message pointing this out.
https://docs.snowflake.net/manuals/sql-reference/sql/create-pipe.html#optional-parameters
Also you don't need the bracket between the "AS" and "copy" on line 5.
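For reference, a minimal corrected sketch of the pipe definition, using the names from the question and assuming my_stage is an external stage (auto-ingest is not available for internal stages); VALIDATION_MODE is omitted because, to my knowledge, it is not accepted in a pipe's COPY statement:
CREATE OR REPLACE PIPE snowpipe_pipe
  AUTO_INGEST = TRUE
  COMMENT = 'add items IoT'
AS
  COPY INTO snowpipe_test.public.mytable
  FROM @snowpipe_db.public.my_stage
  FILE_FORMAT = (TYPE = 'JSON');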

Related

Snowflake ON_ERROR=CONTINUE abort the COPY command for file

Snowflake documentation for the COPY INTO command states (for COPY options):
ON_ERROR = CONTINUE | SKIP_FILE | SKIP_FILE_num | SKIP_FILE_num% | ABORT_STATEMENT
Continue loading the file. The COPY statement returns an error message
for a maximum of one error encountered per data file. Note that the
difference between the ROWS_PARSED and ROWS_LOADED column values
represents the number of rows that include detected errors. However,
each of these rows could include multiple errors. To view all errors
in the data files, use the VALIDATION_MODE parameter or query the
VALIDATE function.
But for me it just doesn't seem to be honored: the default value, i.e. SKIP_FILE, appears to be applied, because files are being skipped on any error in the file.
create or replace file format jsonThing type = 'json' DATE_FORMAT='yyyy-mm-dd'
TIMESTAMP_FORMAT='YYYY-MM-DD"T"HH24:MI:SSZ' TRIM_SPACE=TRUE NULL_IF=('\\N', 'NULL','');
create or replace stage snowflake_json_stage
storage_integration = snowflake_json_storage_integration
url = 'azure://snowflakejson.blob.core.windows.net/cdrs'
file_format = jsonThing
COPY_OPTIONS = (ON_ERROR=CONTINUE PURGE=TRUE MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE)
COMMENT='The snowflake json stage';
CREATE or REPLACE PIPE SNOWFLAKE_JSON_PIPE
AUTO_INGEST = TRUE
integration = snowflake_json_notification_integration
as
COPY INTO purge.public.cdrs
from @SNOWFLAKE_JSON_STAGE
ON_ERROR=CONTINUE
MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE;
Does the ON_ERROR=CONTINUE option work with a PIPE?
NOTE: The file is an NDJSON file.

In the tutorial "Tutorial: Bulk Loading from a local file system using copy" what is the difference between my_stage and my_table permissions?

I started to go through the first tutorial for how to load data into Snowflake from a local file.
This is what I have set up so far:
CREATE WAREHOUSE mywh;
CREATE DATABASE Mydb;
Use Database mydb;
CREATE ROLE ANALYST;
grant usage on database mydb to role sysadmin;
grant usage on database mydb to role analyst;
grant usage, create file format, create stage, create table on schema mydb.public to role analyst;
grant operate, usage on warehouse mywh to role analyst;
//tutorial 1 loading data
CREATE FILE FORMAT mycsvformat
TYPE = "CSV"
FIELD_DELIMITER= ','
SKIP_HEADER = 1;
CREATE FILE FORMAT myjsonformat
TYPE="JSON"
STRIP_OUTER_ARRAY = true;
//create stage
CREATE OR REPLACE STAGE my_stage
FILE_FORMAT = mycsvformat;
//Use snowsql for this and make sure that the role, db, and warehouse are selected: put file:///data/data.csv @my_stage;
// put file on stage
PUT file://contacts.csv @my
List @~;
list @%mytable;
Then in my active Snowsql when I run:
Put file:///Users/<user>/Documents/data/data.csv @my_table;
I have confirmed I am in the correct role Accountadmin:
002003 (02000): SQL compilation error:
Stage 'MYDB.PUBLIC.MY_TABLE' does not exist or not authorized.
So then I try to create the table in Snowsql and am successful:
create or replace table my_table(id varchar, link varchar, stuff string);
I still run into this error after I run:
Put file:///Users/<>/Documents/data/data.csv @my_table;
002003 (02000): SQL compilation error:
Stage 'MYDB.PUBLIC.MY_TABLE' does not exist or not authorized.
What is the difference between putting a file to a my_table and a my_stage in this scenario? Thanks for your help!
EDIT:
CREATE OR REPLACE TABLE myjsontable(json variant);
COPY INTO myjsontable
FROM @my_stage/random.json.gz
FILE_FORMAT = (TYPE= 'JSON')
ON_ERROR = 'skip_file';
CREATE OR REPLACE TABLE save_copy_errors AS SELECT * FROM TABLE(VALIDATE(myjsontable, JOB_ID=>'enterid'));
SELECT * FROM SAVE_COPY_ERRORS;
//error for random: Error parsing JSON: invalid character outside of a string: '\\'
//no error for generated
SELECT * FROM Myjsontable;
REMOVE @My_stage pattern = '.*.csv.gz';
REMOVE @My_stage pattern = '.*.json.gz';
//yay you are done!
The put command copies the file from your local drive to the stage. You should do the put to the stage, not the table.
put file:///Users/<>/Documents/data/data.csv @my_stage;
The copy command loads it from the stage.
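For illustration, a minimal sketch of the two steps using the names from the question (the .gz suffix assumes PUT's default auto-compression, and ON_ERROR is optional):
put file:///Users/<user>/Documents/data/data.csv @my_stage;
copy into my_table
  from @my_stage/data.csv.gz
  file_format = (format_name = 'mycsvformat')
  on_error = 'skip_file';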
But in the documentation it is mentioned that one gets created by default for every table:
Each table has a Snowflake stage allocated to it by default for storing files. This stage is a convenient option if your files need to be accessible to multiple users and only need to be copied into a single table.
Table stages have the following characteristics and limitations:
Table stages have the same name as the table; e.g. a table named mytable has a stage referenced as @%mytable
so in this case, without creating a stage, it should load into the default Snowflake stage allocated to the table.
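For comparison, a hedged sketch of the same load using the table stage instead of a named stage (note the @% prefix):
put file:///Users/<user>/Documents/data/data.csv @%my_table;
copy into my_table
  from @%my_table
  file_format = (format_name = 'mycsvformat');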

I am trying to run multiple query statements created when using the python connector with the same query id

I have created a Python function which creates multiple query statements.
Once it creates the SQL statement, it executes it (one at a time).
Is there any way to bulk run all the statements at once (assuming I was able to create all the SQL statements and wanted to execute them once all the statements were generated)? I know there is an execute_stream in the Python Connector, but I think this requires a file to be created first. It also appears to me that it runs a single query statement at a time.
Since this question is missing an example of the file, here is some file content that I have provided as an extra that we can work from.
# connection test file for python multiple queries
import snowflake.connector

conn = snowflake.connector.connect(
    user='xxx',
    password='',
    account='xxx',
    warehouse='xxx',
    database='TEST_xxx',
    session_parameters={
        'QUERY_TAG': 'Rachel_test',
    }
)
try:
    cur = conn.cursor()
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS tiny_warehouse_mg")
    cur.execute("CREATE DATABASE IF NOT EXISTS testdb_mg")
    cur.execute("USE DATABASE testdb_mg")
    cur.execute(
        "CREATE OR REPLACE TABLE "
        "test_table(col1 integer, col2 string)")
    cur.execute(
        "INSERT INTO test_table(col1, col2) VALUES "
        "(123, 'test string1'), "
        "(456, 'test string2')")
    # each execute() call is a separate statement with its own query id
    print(cur.sfqid)  # query id of the most recently executed statement
except Exception as e:
    conn.rollback()
    raise e
conn.close()
The reference for this question points to a method that works from a file; the example in the documentation is as follows:
from codecs import open
with open(sqlfile, 'r', encoding='utf-8') as f:
    for cur in con.execute_stream(f):
        for ret in cur:
            print(ret)
Reference to guide I used
Now when I ran these, they were not perfect, but in practice I was able to execute multiple SQL statements in one connection, just not many at once. Each statement had its own query id. Is it possible to have a .sql file associated with one query id?
Is it possible to have a .sql file associated with one query id?
You can achieve that effect with the QUERY_TAG session parameter. Set the QUERY_TAG to the name of your .SQL file before executing its queries. Access the .SQL file QUERY_IDs later using the QUERY_TAG field in QUERY_HISTORY().
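A minimal sketch of that approach (the script name my_batch.sql is just a hypothetical example):
ALTER SESSION SET QUERY_TAG = 'my_batch.sql';
-- ... run the statements from my_batch.sql in this session ...
SELECT query_id, query_text, start_time
FROM TABLE(information_schema.query_history())
WHERE query_tag = 'my_batch.sql'
ORDER BY start_time;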
I believe that even though you generated the .sql file, each statement will have a unique query id when it executes in Snowflake.
If you want to run one SQL statement independently of the others, you may try the multiprocessing/multithreading concepts in Python.
The Python and Node.js libraries do not allow multiple statement executions.
I'm not sure about Python, but for Node.js there is this library that extends the original one and adds a method called "ExecutionAll" to it:
snowflake-multisql
You just need to wrap the multiple statements with BEGIN and END.
BEGIN
<statement_1>;
<statement_2>;
END;
With these operators, I was able to execute multiple statements in Node.js.

Oracle 11g External Table error

I'm trying to run a simple external table program using Oracle 11g on a Linux VM. The problem is that I can't query any data from .txt files.
Here's my code:
CONN / as sysdba;
CREATE OR REPLACE DIRECTORY DIR1 AS 'home/oracle/TEMP/X/';
GRANT READ, WRITE ON DIRECTORY DIR1 TO user;
CONN user/password;
CREATE TABLE gerada
(
field1 INT,
field2 Varchar2(20)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY DIR1
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ';'
MISSING FIELD VALUES ARE NULL
)
LOCATION ('registros.txt')
)
REJECT LIMIT UNLIMITED;
--Error starts here.
SELECT * FROM gerada;
DROP TABLE gerada;
DROP DIRECTORY DIR1;
Here's the error message:
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
error opening file home/oracle/TEMP/X/GERADA_3375.log
And this is what registros.txt looks like:
1234;hello world;
I've checked my permissions on DIR1 and I do have read/write permissions.
Any ideas?
ORA-29913 and ORA-29400 mean that you're unable to access the directory and/or the file.
Looking carefully at the CREATE DIRECTORY command it looks like the path you're using may be mis-formatted. Try putting a forward slash at the start of the path and removing the one at the end of the path when creating the directory - e.g. CREATE OR REPLACE DIRECTORY DIR1 AS '/home/oracle/TEMP/X';.
Share and enjoy.

Oracle external tables - Specifying dynamic filename

CREATE TABLE LOG_FILES (
LOG_DTM VARCHAR(18),
LOG_TXT VARCHAR(300)
)
ORGANIZATION EXTERNAL(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY LOG_DIR
ACCESS PARAMETERS(
RECORDS DELIMITED BY NEWLINE
FIELDS(
LOG_DTM position(1:18),
LOG_TXT position(19:300)
)
)
LOCATION('logadm')
)
REJECT LIMIT UNLIMITED
/
LOG_DIR is an oracle directory that points to /u/logs/
The problem though is that the contents of /u/logs/ looks like this
logadm_12012012.log
logadm_13012012.log
logadm_14012012.log
logadm_15012012.log
Is there any way I can specify the location of the file dynamically? i.e. every time I run SELECT * FROM LOG_FILES it should use the log file of the day (e.g. logadm_DDMMYYYY.log).
I know I can use ALTER TABLE log_files LOCATION ('logadm_15012012.log') but I would like not to have to issue the ALTER command.
Any other possibilities?
Thanks
It's a shame you're running 10g. On 11g we can associate a pre-processor script - a shell script - with an external table. In your case you could run a script which would figure out the latest file and then issue a copy command. Something like:
cp logadm_15012012.log logadm
Adrian Billington has blogged about this feature here. Frankly his write-up is more helpful than the official docs.
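For illustration, a hedged sketch of what that could look like on 11g; the exec_dir directory and the latest_log.sh script are assumptions, and the script would need to write the contents of the newest logadm_*.log file to standard output:
-- hypothetical directory holding the preprocessor script
CREATE OR REPLACE DIRECTORY exec_dir AS '/u/logs/bin';

CREATE TABLE LOG_FILES (
  LOG_DTM VARCHAR(18),
  LOG_TXT VARCHAR(300)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY LOG_DIR
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    -- the script (not shown) picks the latest logadm_*.log and cats it to stdout
    PREPROCESSOR exec_dir:'latest_log.sh'
    FIELDS (
      LOG_DTM position(1:18),
      LOG_TXT position(19:300)
    )
  )
  LOCATION ('logadm')
)
REJECT LIMIT UNLIMITED;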
But as you're on 10g all you can do is run the ALTER TABLE statement, or use a scheduled job (cron or whatever) to sync a new file with the generic name.
