How do I put a CSV into an external table in Snowflake? - snowflake-cloud-data-platform

I have a staged file and I am trying to query the first line/row of it because it contains the column headers of the file. Is there a way I can create an external table using this file so that I can query the first line?
I am able to query the staged file using
SELECT a.$1
FROM @my_stage (FILE_FORMAT=>'my_file_format', PATTERN=>'my_file_path') a
and then to create the table I tried doing
CREATE EXTERNAL TABLE MY_FILE_TABLE
WITH LOCATION = @my_stage/my_file_path
FILE_FORMAT = (FORMAT_NAME = 'my_file_format');

Reading headers from a CSV file is not supported directly; however, there is a workaround.
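A minimal sketch of that workaround, assuming the file format above uses SKIP_HEADER = 0 (so the header line is exposed as an ordinary data row) and that the METADATA$FILE_ROW_NUMBER column is available for the external table:

-- In an external table the CSV columns arrive inside the VALUE variant as c1, c2, ...
-- Filtering on the per-file row number returns just the header line.
SELECT VALUE:c1::varchar AS header_col1,
       VALUE:c2::varchar AS header_col2
FROM MY_FILE_TABLE
WHERE METADATA$FILE_ROW_NUMBER = 1;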

Related

Snowflake create partition based on file column

Is it possible to create partitions in external tables in Snowflake based on a file column instead of the file path (metadata$filename) when the file folders aren't partitioned?
I have tried creating an external table like the example below:
create or replace external table "db"."schema".EXT_TABLE (
  "year" NUMBER(38,0) AS YEAR(TO_TIMESTAMP((PARSE_JSON(VALUE:"c3"):"time":"$date"::varchar)))
)
partition by ("year")
partition_type = user_specified
location = @stage
file_format = file_format;
but it returns the error "Function GET is not supported in an external table partition column expression."
I have also tried using parse_json(metadata$external_table_partition) as the documentation shows, but this column always comes back empty.
Does anyone have any tips to make it work?
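For context, a minimal sketch of the user-specified partition flow the documentation describes; the key point is that with partition_type = user_specified the partitions (and the values surfaced through metadata$external_table_partition) have to be registered manually with ALTER EXTERNAL TABLE ... ADD PARTITION, which is why the column shows up empty otherwise. The "year" key and the '2023/' path below are illustrative, not from the question:

create or replace external table "db"."schema".EXT_TABLE (
  "year" NUMBER(38,0) AS (parse_json(metadata$external_table_partition):"year"::number)
)
partition by ("year")
partition_type = user_specified
location = @stage
file_format = file_format
auto_refresh = false;  -- partitions are registered manually below

-- metadata$external_table_partition stays empty until a partition is added:
alter external table "db"."schema".EXT_TABLE
  add partition ("year" = '2023') location '2023/';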

How to load Parquet/AVRO into multiple columns in Snowflake with schema auto detection?

When trying to load a Parquet/AVRO file into a Snowflake table I get the error:
PARQUET file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.
But I don't want to load these files into a new one-column table; I need the COPY command to match the columns of the existing table.
What can I do to get schema auto detection?
Good news: that error message is outdated, as Snowflake now supports schema detection and COPY INTO multiple columns.
To reproduce the error:
create or replace table hits3 (
  WatchID BIGINT,
  JavaEnable SMALLINT,
  Title TEXT
);
copy into hits3
from @temp.public.my_ext_stage/files/
file_format = (type = parquet);
-- PARQUET file format can produce one and only one column of type variant or object or array.
-- Use CSV file format if you want to load more than one column.
To fix the error and have Snowflake match the columns of the table and the Parquet/AVRO files, just add the option MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE (or MATCH_BY_COLUMN_NAME=CASE_SENSITIVE):
copy into hits3
from @temp.public.my_ext_stage/files/
file_format = (type = parquet)
match_by_column_name = case_insensitive;
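If the target table does not exist yet, the column definitions can also be detected from the staged files with INFER_SCHEMA and CREATE TABLE ... USING TEMPLATE; a sketch, assuming a named file format my_parquet_format (TYPE = PARQUET) has been created and using the illustrative table name hits3_detected:

-- Inspect the columns Snowflake detects in the staged Parquet files:
select *
from table(infer_schema(
  location => '@temp.public.my_ext_stage/files/',
  file_format => 'my_parquet_format'));

-- Create a table whose columns match the detected schema:
create table hits3_detected
  using template (
    select array_agg(object_construct(*))
    from table(infer_schema(
      location => '@temp.public.my_ext_stage/files/',
      file_format => 'my_parquet_format')));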
Docs:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
https://docs.snowflake.com/en/user-guide/data-load-overview.html?#detection-of-column-definitions-in-staged-semi-structured-data-files

Easy way to load a CSV file from the command line into a new table of an Oracle database without specifying the column details

I often want to quickly load a CSV into an Oracle database. The CSV (Unicode) is on a machine with an Oracle InstantClient version 19.5, the Oracle database is of version 18c.
I am looking for a command-line tool which uploads the rows without me specifying a column structure.
I know I can use sqlldr with a .ctl file, but then I need to define column types, etc. I am interested in a tool which figures out the column attributes itself from the data in the CSV (or uses a generic default for all columns).
The CSVs I have to ingest always contain a header row, which the tool in question could use to determine appropriate columns in the table.
Starting with Oracle 12c, you can use sqlldr in express mode, so you don't need a control file at all.
In Oracle Database 12c onwards, SQL*Loader has a new feature called express mode that makes loading CSV files faster and easier. With express mode, there is no need to write a control file for most CSV files you load. Instead, you can load the CSV file with just a few parameters on the SQL*Loader command line.
An example
Imagine I have a table like this
CREATE TABLE EMP
(EMPNO number(4) not null,
ENAME varchar2(10),
HIREDATE date,
DEPTNO number(2));
Then a csv file that looks like this
7782,Clark,09-Jun-81,10
7839,King,17-Nov-81,12
I can use sqlldr in express mode:
sqlldr userid=xxx table=emp
You can read more about express mode in this white paper
Express Mode in SQLLDR
Forget about using sqlldr in a script file. Your best bet is to use an external table. This is a create table statement with sqlldr-style access parameters that reads a file from a directory and exposes it as a table. Super easy, really convenient.
Here is an example:
create table thisTable (
   "field1"    varchar2(10)
  ,"field2"    varchar2(100)
  ,"field3"    varchar2(100)
  ,"dateField" date
) organization external (
  type oracle_loader
  default directory <createDirectoryWithYourPath>
  access parameters (
    records delimited by newline
    load when ("field1" != blanks)
    skip 9
    fields terminated by ',' optionally enclosed by '"' ltrim
    missing field values are null
    (
      "field1"
     ,"field2"
     ,"field3"
     ,"dateField" date 'mm/dd/yyyy'
    )
  )
  location ('filename.csv')
);

Loading a CSV file into an existing Hive table through Spark

Below is the code that I have written to connect to an RDBMS, create a temp table, execute a SQL query on that temp table, and save the SQL query output to .csv format through the databricks module.
from pyspark import SparkContext
sc = SparkContext("local", "Simple App")
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
df = sqlContext.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://<server>:<port>") \
    .option("databaseName", "xxx") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "xxxx") \
    .option("user", "xxxxx") \
    .option("password", "xxxxx") \
    .load()
df.registerTempTable("test")
df1= sqlContext.sql("select * from test where xxx= 6")
df1.write.format("com.databricks.spark.csv").save("/xxxx/xxx/ami_saidulu")
df1.write.option("path", "/xxxx/xxx/ami_saidulu").saveAsTable("HIVE_DB.HIVE_TBL",format= 'csv',mode= 'Append')
where HIVE_DB is an existing Hive database and HIVE_TBL is an existing Hive table.
After I execute the code, I get the error below:
py4j.protocol.Py4JJavaError: An error occurred while calling o68.saveAsTable.
: java.lang.RuntimeException: Append mode is not supported by com.databricks.spark.csv.DefaultSource15
Does that mean, the databricks module doesn't support "saveAsTable" function?
If yes, then please point out the mistakes in my code.
If no, then what is the solution/work around/industry standards ?
Spark 1.6.1
I can suggest another solution.
You can use the INSERT functionality to write into the table:
sqlContext.sql("INSERT INTO TABLE HIVE_DB.HIVE_TBL select * from test where xxx = 6")
(Use INSERT OVERWRITE TABLE instead if you want to replace the existing contents.)
I hope this solution helps you, and you can write directly into the table; why write to csv and then load the csv into the table?
Even if you want a text-delimited file at the table path, just define the table as a TextFile table with the required delimiter; the files at the table path will then be delimited after the insert.
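A sketch of such a TextFile table definition in HiveQL (the table and column names are illustrative, not from the question):

-- Files written under this table's location are plain comma-delimited text,
-- so an INSERT ... SELECT produces the delimited output directly.
CREATE TABLE HIVE_DB.HIVE_TBL_CSV (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;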
Thanks
Assuming your table is managed:
Just do df.write.mode('append').saveAsTable('HIVE_DB.HIVE_TBL'), no need to go through an intermediate csv file.
What this error means is that the databricks csv module does not support Append mode; there is an open issue about it on GitHub. So the solution is not to use csv with append mode.

Exporting table data in csv file using SQL Server stored procedure

I have these requirements:
Export table data into a .csv file using a stored procedure.
The first row in the .csv file will be a custom header.
Note: the data in this row will not come from the table. It will be a fixed header for all the .csv files being generated.
Similar to something like this:
Latest price data:
product1;150;20150727
product2;180;20150727
product3;180;20150727
Assuming that date is a proper datetime column, the following procedure will at least do the job for this table named prodtbl:
CREATE proc csvexp (@header as nvarchar(256)) AS
BEGIN
  SELECT csv FROM (
    SELECT prod, date,
           prod + ';' + CAST(price AS varchar(8)) + ';'
           + replace(CONVERT(varchar, getdate(), 102), '.', '') csv
    FROM prodtbl
    UNION ALL
    SELECT null, '1/1/1900', @header
  ) csvexp ORDER BY prod, date
END
The command
EXEC csvexp 'my very own csv export'
will then generate the following output:
my very own csv export
product1;150;20150727
product2;180;20150727
product3;180;20150727
The part of actually getting this output into a .csv file still remains to be done ...
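One way to finish that last step, sketched below, is to call the procedure through bcp via xp_cmdshell; this assumes xp_cmdshell is enabled, and mydb, MYSERVER, and the output path are placeholders:

-- Sketch: export the procedure's single csv column to a file with bcp.
-- -c = character mode, -T = trusted connection; swap in -U/-P for SQL logins.
EXEC xp_cmdshell
  'bcp "EXEC mydb.dbo.csvexp ''Latest price data:''" queryout C:\out\prices.csv -c -S MYSERVER -T';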
