insert csv file into snowflake as variant not working - snowflake-cloud-data-platform

I am trying to copy CSV data into a Snowflake table with only one column (VARIANT). When I run the COPY INTO statement, my VARIANT column only contains data from the first column of the file. I'm not sure what the problem is. Please help.
CREATE OR REPLACE TABLE db_name.table_name (
  RAW VARIANT
);

COPY INTO db_name.table_name
FROM (SELECT s.$1::VARIANT FROM @stage_name.csv s);

Assuming your .csv file consists of multiple valid JSON documents, try using a file format of type JSON instead of CSV. See https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html.
Alternatively, use PARSE_JSON in your SELECT.
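For illustration, a minimal sketch of both options, assuming the staged file really contains one JSON document per line; the stage and table names are taken from the question:
-- Option 1: load the file with a JSON file format; $1 is already a VARIANT
COPY INTO db_name.table_name
FROM @stage_name
FILE_FORMAT = (TYPE = 'JSON');
-- Option 2: keep a CSV file format but parse the text yourself;
-- FIELD_DELIMITER = NONE keeps each line in $1 as a single string
COPY INTO db_name.table_name
FROM (SELECT PARSE_JSON(s.$1) FROM @stage_name s)
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = NONE);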

Related

Snowflake Data Load Wizard - File Format - how to handle null string in File Format

I am using the Snowflake Data Load Wizard to upload a CSV file to a Snowflake table. The Snowflake table structure identifies a few columns as NOT NULL (non-nullable). The problem is that the wizard treats empty strings as NULL, and the Data Load Wizard issues the following error:
Unable to copy files into table. NULL result in a non-nullable column
File '@<...../load_data.csv', line 2, character 1
Row 1, column "<TABLE_NAME>"["PRIMARY_CONTACT_ROLE":19]
I'm sharing my File Format parameters from the wizard:
I then updated the DDL of the table by removing the NOT NULL declaration from the PRIMARY_CONTACT_ROLE column, re-created the table, and this time the data load of 20K records succeeded.
How do we configure the file format in the wizard so that Snowflake does not treat empty strings as NULLs?
The option you need to set is EMPTY_FIELD_AS_NULL = FALSE. Unfortunately, modifying this option is not possible in the wizard. You have to create your file format, or alter your existing one, manually in a worksheet as follows:
CREATE FILE FORMAT my_csv_format
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
EMPTY_FIELD_AS_NULL = FALSE;
This causes empty fields to be loaded as empty strings rather than as NULL values.
The relevant documentation can be found at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv.
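If the wizard has already created a file format for you, the same change can be applied to it directly; the format name here is a placeholder:
ALTER FILE FORMAT my_csv_format SET EMPTY_FIELD_AS_NULL = FALSE;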
Let me know if you need a full example of how to upload a CSV file with the SnowSQL CLI.
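As a rough sketch of that CLI flow (the table, column, and file path names are made up for illustration):
-- create the target table
CREATE OR REPLACE TABLE my_table (id NUMBER, primary_contact_role STRING);
-- from SnowSQL: upload the local file to the table's internal stage
PUT file:///tmp/load_data.csv @%my_table;
-- load it using the CSV file format defined above
COPY INTO my_table
FROM @%my_table
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');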
Just to add, there are additional ways you can load your CSV into Snowflake without having to specify a file format.
You can use pre-built third-party modeling tools to upload your raw file, adjust the default column types to your preference, and push your modeled data back into Snowflake.
My team and I are working on such a tool, Datameer; feel free to check it out here:
https://www.datameer.com/upload-csv-to-snowflake/

Copy data files from internal stage table to Logical tables

I am dealing with JSON and CSV files moving from a Unix box / S3 bucket to internal / external stages respectively,
and I don't have any issue copying JSON files from the internal/external stages into static or logical tables, where I store them as JsonFileName and JsonFileContent.
Trying to copy to a static table (parse_json($1) works for JSON):
COPY INTO LogicalTable (FILE_NM, JSON_CONTENT)
FROM (
  SELECT METADATA$FILENAME AS FILE_NM, parse_json($1) AS JSON_CONTENT
  FROM @$TSJsonExtStgName
)
file_format = (type='JSON' strip_outer_array = true);
I am looking for something similar for CSV: copy the CSV file name and the CSV file content from an internal/external stage into a static or logical table. I mainly want this in order to separate the file copy from the file load, since the load may fail due to a column count mismatch, newline characters, or bad data in one of the records.
If either of the options below can be clarified that is fine, please suggest:
1) Trying to copy to a static table (METADATA$?????? is not working for CSV):
select METADATA$FILENAME AS FILE_NM, METADATA$?????? AS CSV_CONTENT
from @INT_REF_CSV_UNIX_STG
2) Trying for dynamic columns (T.* is not working for CSV):
SELECT METADATA$FILENAME, $1, $2, $3, T.*
FROM @INT_REF_CSV_UNIX_STG(FILE_FORMAT => CSV_STG_FILE_FORMAT) T
Regardless of whether the file is CSV or JSON, you need to make sure that your SELECT matches the layout of the target table. I assume that for your JSON, your target table has 2 columns: filename and a VARIANT column for the JSON contents. For CSV, you need to do the same thing, so list $1, $2, etc. for each column that you want from the file, matching your target table.
I have no idea what you are referencing with METADATA$??????, btw.
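For illustration, a sketch of what this looks like for CSV, assuming a three-column file and a matching target table (the column names are made up; the stage and file format names come from the question):
COPY INTO LogicalTable (FILE_NM, COL1, COL2, COL3)
FROM (
  SELECT METADATA$FILENAME, t.$1, t.$2, t.$3
  FROM @INT_REF_CSV_UNIX_STG t
)
FILE_FORMAT = (FORMAT_NAME = 'CSV_STG_FILE_FORMAT');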
---ADDED
Based on your comment below, you have 2 options, which aren't native to a COPY INTO statement:
1) Create a Stored Procedure that looks at a table's DDL, generates a COPY INTO statement with the static columns defined, and then executes the COPY INTO from within the SP.
2) Leverage an External Table. If you define an External Table with METADATA$FILENAME and the rest of the columns, the External Table will return the CSV contents to you as JSON. From there, you can treat it in the same way you are treating your JSON tables.
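A minimal sketch of the external table idea, assuming the stage is an external one (external tables do not work over internal stages) and reusing the stage and file format names from the question; the external table name is hypothetical:
-- every external table exposes a VARIANT column named VALUE; for CSV each row
-- comes back as a JSON object keyed c1, c2, ...
CREATE OR REPLACE EXTERNAL TABLE EXT_REF_CSV
  WITH LOCATION = @INT_REF_CSV_UNIX_STG
  FILE_FORMAT = (FORMAT_NAME = 'CSV_STG_FILE_FORMAT')
  AUTO_REFRESH = FALSE;

SELECT METADATA$FILENAME AS FILE_NM, VALUE AS CSV_CONTENT
FROM EXT_REF_CSV;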

Uploading excel file to sql server [duplicate]

Every time I try to import an Excel file into SQL Server I get a particular error. When I try to edit the mappings, the default data type for all numerical fields is float. None of the fields in my table have decimals in them and they aren't a money data type; they're only 8-digit numbers. Since I don't want my primary key stored as a float when it's an int, how can I fix this? It gives me a truncation error of some sort; I'll post a screen cap if needed. Is this a common problem?
It should be noted that I cannot import Excel 2007 files (I think I've found the remedy to this), but even when I try to import .xls files, every value that contains numerals is automatically imported as a float, and when I try to change it I get an error.
http://imgur.com/4204g
SSIS doesn't implicitly convert data types, so you need to do it explicitly. The Excel connection manager can only handle a few data types and it tries to make a best guess based on the first few rows of the file. This is fully documented in the SSIS documentation.
You have several options:
Change your destination data type to float
Load to a 'staging' table with a float data type using the Import Wizard, then INSERT into the real destination table using CAST or CONVERT to convert the data (see the sketch after this list)
Create an SSIS package and use the Data Conversion transformation to convert the data
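A sketch of the staging-table route mentioned above, with hypothetical table and column names:
-- the Import Wizard loads everything into a staging table as float
CREATE TABLE dbo.StagingCustomers (CustomerId FLOAT, Amount FLOAT);
-- then convert explicitly while copying into the real table
INSERT INTO dbo.Customers (CustomerId, Amount)
SELECT CAST(CustomerId AS INT), CAST(Amount AS DECIMAL(10, 2))
FROM dbo.StagingCustomers;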
You might also want to note the comments in the Import Wizard documentation about data type mappings.
Going off of what Derloopkat said, which still can fail on conversion (no offense Derloopkat) because Excel is terrible at this:
Paste from Excel into Notepad and save it as a normal .txt file.
From within Excel, open that .txt file.
Select Next, as it is tab delimited.
Select "none" for the text qualifier, then Next again.
Select the first row, hold Shift, select the last row, and select the Text radio button. Click Finish.
It will open; check that it is accurate, then save it as an Excel file.
There is a workaround.
Import the Excel sheet with numbers as float (the default).
After importing, go to the table's Design view.
Change the data type of the column from float to int or bigint.
Save the changes.
Change the data type of the column from bigint to any text type (varchar, nvarchar, text, ntext, etc.).
Save the changes.
That's it.
When Excel finds mixed data types in the same column, it guesses the right format for the column (the majority of the values determines the column's type) and discards the other values by inserting NULLs. Excel does this rather badly: for example, if a column is considered text and Excel finds a number, it decides the number is a mistake and inserts a NULL instead; or if some cells containing numbers are formatted as text, you may get NULL values in an integer column of the database.
Solution:
Create a new excel sheet with the name of the columns in the first row
Format the columns as text
Paste the rows without formatting (use CSV format or copy/paste via Notepad to get only text)
Note that formatting the columns on an existing Excel sheet is not enough.
There seems to be a really easy solution when dealing with data type issues.
Basically, add ;IMEX=1 to the Extended Properties at the end of the Excel connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\YOURSERVER\shared\Client Projects\FOLDER\Data\FILE.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
This will resolve data type issues such as columns where values are mixed with text and numbers.
To get to the connection properties, right-click the Excel connection manager below the control flow and hit Properties; they will appear on the right, under Solution Explorer. Hope that helps.
To avoid the float type field in a simple way:
Open your Excel sheet.
Insert a blank row after the header row and type some text in every cell of that row.
Right-click the header of each column that causes a float issue, select Format Cells, choose the Text category, and press OK.
Then export the Excel sheet to your SQL Server.
This simple approach worked for me.
A workaround to consider in a pinch:
Save a copy of the Excel file and change the column's format to 'text'.
Copy the column values and paste them into a text editor, then save the file (call it tmp.txt).
Modify the data in the text file so each value starts and ends with a character that the SQL Server import mechanism will recognize as text. If you have a fancy editor, use its included tools; I use awk in Cygwin on my Windows laptop. For example, I start and end the column value with a single quote, like "$ awk '{print "\x27"$1"\x27"}' ./tmp.txt > ./tmp2.txt".
Copy and paste the data from tmp2.txt over top of the necessary column in the Excel file, and save the Excel file.
Run the SQL Server import for your modified Excel file... be sure to double check that the data type chosen by the importer is not numeric; if it is, repeat the above steps with a different set of characters.
The data in the database will have the quotes once the import is done... you can update the data later on to remove the quotes, or use the "replace" function in your read query, such as "replace([dbo].[MyTable].[MyColumn], '''', '')"
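For example, the post-import cleanup could look like this (the table and column names are hypothetical):
-- remove the wrapping single quotes after the import
UPDATE dbo.MyTable
SET MyColumn = REPLACE(MyColumn, '''', '');
-- or strip them on read instead
SELECT REPLACE(MyColumn, '''', '') AS MyColumn
FROM dbo.MyTable;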

Load csv file data into tables

I created the tables as below:
source:([id:`symbol$()] ric:();source:();Date:`datetime$())
property:([id:`symbol$()] Value:())
Then I have two .csv files containing the data for these two tables.
property.csv looks like this:
id,Value
TEST1,1
TEST2,2
source.csv looks like this:
id,ric,source,Date
1,TRST,QO,2017-07-07 11:42:30.603
2,TRST2,QOT,2018-07-07 11:42:30.603
Now, how do I load the csv file data into both tables in one go?
You can use 0: to load delimited records: https://code.kx.com/wiki/Reference/ZeroColon
The simplest form of the function is (types; delimiter) 0: filehandle.
The types should be given as their uppercase letter representations, one for each column, or a blank space to ignore a column. For example, using "SJ" for property.csv would mean reading in the id column as a symbol and the Value column as a long.
The delimiter specifies how the columns are separated, in your case comma-separated values (CSV). You can pass the delimiter as a string ",", which treats every row as part of the data and returns a nested list of the columns; you can either insert this into a table with a matching schema, or append the headers yourself and flip the dictionary to get a table, like so: flip `id`value!("IS";",") 0: `:test.txt.
If you have column headers as the first row in the csv, you can pass an enlisted delimiter, enlist ",", which will use the column headers and return a table in kdb with these as the column names, which you can then rename if you see fit.
As the files you want to read in have different column types and are to be loaded into different tables, you could create a function to read them in, for example:
{x insert (y;enlist ",") 0:z}'[(`source;`property);("SSSP";"SJ");(`:source.csv;`:property.csv)]
This allows you to specify the name of the table to insert into, the column types, and the file handle of each file.
I would suggest a timestamp instead of the (deprecated) datetime, as it is stored as a long instead of a float, so there will be no issues with comparisons.
You can use key to list the contents of the directory:
files: key `:.; /get the contents of the dir
files:files where files like "*.csv"; /filter the csv files
m:`property.csv`source.csv!("SJ";"JSSZ"); /create the mappings for each csv file
{[f] .[first ` vs f;();:; (m@f;enlist csv) 0: hsym f]}each files
This last line loads each csv file. Please note that here the directory is the current working directory ('pwd'); you might need to add the directory path to each file before using 0:.

How to Dynamically render Table name and File name in pentaho DI

I have a requirement in which one source is a table and the other source is a file. I need to join the two on a column. I can do this for one table/file pair with one transformation, but I need to do it for multiple sets of files and tables, loading into another set of specific target files, using the same transformation.
Breaking down my requirement more specifically :
Source Table Source File Target File
VOICE_INCR_REVENUE_PROFILE_0 VoiceRevenue0 ProfileVoice0
VOICE_INCR_REVENUE_PROFILE_1 VoiceRevenue1 ProfileVoice1
VOICE_INCR_REVENUE_PROFILE_2 VoiceRevenue2 ProfileVoice2
VOICE_INCR_REVENUE_PROFILE_3 VoiceRevenue3 ProfileVoice3
VOICE_INCR_REVENUE_PROFILE_4 VoiceRevenue4 ProfileVoice4
VOICE_INCR_REVENUE_PROFILE_5 VoiceRevenue5 ProfileVoice5
VOICE_INCR_REVENUE_PROFILE_6 VoiceRevenue6 ProfileVoice6
VOICE_INCR_REVENUE_PROFILE_7 VoiceRevenue7 ProfileVoice7
VOICE_INCR_REVENUE_PROFILE_8 VoiceRevenue8 ProfileVoice8
VOICE_INCR_REVENUE_PROFILE_9 VoiceRevenue9 ProfileVoice9
The table and file names always correspond, i.e. VOICE_INCR_REVENUE_PROFILE_0 should always join with VoiceRevenue0 and the result should be stored in ProfileVoice0; there should be no mismatches. I tried setting variables with the table names and file names, but a variable only takes one value at a time.
All table names and file names are constant. Is there any other way to get around this? Any help would be appreciated.
Try using "Copy rows to result" step. It will store all the incoming rows (in your case the table and file names) into a memory. And for every row, it will try to execute your transformation. In this way, you can read multiple filenames at one go.
Try reading this link. It's not the exact answer, but it is similar.
I have created a sample here. Please check if this is what is required.
In the first transformation, I read the table names and file names and loaded them into memory. After that, I used the Get Variables step to read all the file and table names to generate the output. [Note: I have not used a Table Input as the source anywhere; instead I used TablesNames. You can replace that with the Table Input data.]
Hope it helps :)
