I was trying to export a table in H2 DB to CSV using the CSVWRITE function and found that if double quotes are included in a varchar column, they are duplicated in the output.
E.g. 'hello"howareyou' becomes 'hello""howareyou' in the written CSV.
I tried saving this varchar column with escape characters and a few other combinations, but the result is the same.
Below is the table column I created to test this issue and the resulting CSV value I got.
My column        CSV written value
---------------  -----------------
hello"how        hello""how
hello\"how       hello\""how
hello""how       hello""""how
hello\""how      hello\""""how
hello\\"how      hello\\""how
hello\\\\"how    hello\\\\""how
hello["]how      hello[""]how
hello"e;how      hello"e;how
Following is my CSVWrite command:
CALL CSVWRITE(
    '#DELTA_CSV_DIR#/DELTA.csv',
    'SELECT ccc from temptemp',
    null, '|', '');
Am I doing this wrong? Or is there an option or workaround I can use to avoid this situation?
Thanks in advance.
You are currently using the built-in CSVWRITE function with the following options:
fileName = '#DELTA_CSV_DIR#/DELTA.csv'
query = 'SELECT ccc from temptemp'
characterSet = default (UTF-8)
fieldSeparator = '|'
fieldDelimiter = '' (empty string)
As documented, the default escape character is a double quote, so that double quotes are escaped using a double quote (in the same way as you need to escape a backslash within a Java string with a backslash). The escape character is needed to escape the field separator.
You can disable the escape character as follows:
CALL CSVWRITE(
    '#DELTA_CSV_DIR#/DELTA.csv',
    'SELECT ccc from temptemp',
    'fieldSeparator=| fieldDelimiter= escape=');
This is also using the more readable new format for options.
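If the file is later read back with H2's CSVREAD, passing the same options keeps writing and reading consistent. A minimal sketch, assuming the file written above and its single CCC column:
SELECT * FROM CSVREAD(
    '#DELTA_CSV_DIR#/DELTA.csv',
    'CCC',
    'fieldSeparator=| fieldDelimiter= escape=');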
I read through the Snowflake documentation and the web and found only one solution to my problem, by https://stackoverflow.com/users/12756381/greg-pavlik, which can be found here: Snowflake JSON to tabular.
This doesn't work on data with Russian attribute names and attribute values. What modifications can be made for this to fit my case?
Here is an example:
create or replace table target_json_table(
v variant
);
INSERT INTO target_json_table SELECT parse_json('{
"at": {
"cf": "NV"
},
"pd": {
"мо": "мо",
"ä": "ä",
"retailerName": "retailer",
"productName":"product"
}
}');
call create_view_over_json('target_json_table', 'V', 'MY_VIEW');
ERROR: Encountered an error while creating the view. SQL compilation error: syntax error line 7 at position 7 unexpected 'ä:'. syntax error line 8 at position 7 unexpected 'мо'.
There was a bug in the original SQL used as a basis for the creation of the stored procedure. I have corrected that. You can get an update on the Github page. The changed section is here:
sql =
`
SELECT DISTINCT '"' || array_to_string(split(f.path, '.'), '"."') || '"' AS path_nAme, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type, -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
'"' || array_to_string(split(f.path, '.'), '.') || '"' AS alias_name -- This generates column aliases based on the path
FROM
#~TABLE_NAME~#,
LATERAL FLATTEN(#~COL_NAME~#, RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[') -- This prevents traversal down into arrays
limit ${ROW_SAMPLE_SIZE}
`;
Previously this SQL simply replaced non-ASCII characters with underscores. The updated SQL wraps key names in double quotes, so non-ASCII key names are preserved.
Be sure that's what you want it to do. Also, the keys are nested. I decided that the best way to handle that is to create column names in the view with dot notation; for example, one column name is pd.ä. That requires wrapping the column name in double quotes, such as:
select * from MY_VIEW where "pd.ä" = 'ä';
Final note: the name of your stored procedure is create_view_over_json; however, in the GitHub project the name is create_view_over_variant. When you update, be sure to call the right procedure.
I have a CSV file, a sample of it looks like this:
Image of CSV file
Snowpipe is failing to load this CSV file with the following error:
Number of columns in file (5) does not match that of the corresponding table (3), use file format option error_on_column_count_mismatch=false to ignore this error
Can someone advise me on a CSV file format definition that will accommodate the load without failure?
The issue is that the data you are trying to load contains commas (,) inside the data itself. Snowflake thinks that those commas represent new columns which is why it thinks there are 5 columns in your file. It is then trying to load these 5 columns into a table with only 3 columns resulting in an error.
You need to tell Snowflake that anything inside double-quotes (") should be loaded as-is, and not to interpret commas inside quotes as column delimiters.
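For example (with made-up data), the line 101,"Smith, John","Austin, TX" contains only three logical fields, but a naive split on every comma yields five; with quoted fields honored, the commas inside the quotes are treated as data.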
When you create your file format via the web interface there is an option which allows you to tell Snowflake to do this: set the "Field optionally enclosed by" dropdown to "Double Quote".
Alternatively, if you're creating your file format with SQL, there is an option called FIELD_OPTIONALLY_ENCLOSED_BY that you can set to '\042' (the octal escape for a double quote), which does the same thing:
CREATE FILE FORMAT "SIMON_DB"."PUBLIC".sample_file_format
TYPE = 'CSV'
COMPRESSION = 'AUTO'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042' -- <---------------- Set to double-quote
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO';
If possible, share the file format and one sample record to figure out the issue. It seems to be an issue with the number of columns. Can you include the FIELD_OPTIONALLY_ENCLOSED_BY option in your COPY statement and try it once?
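That could look something like the following sketch, with my_table and @my_stage as placeholder names for illustration:
COPY INTO my_table
FROM @my_stage/sample.csv
FILE_FORMAT = (TYPE = 'CSV'
               FIELD_DELIMITER = ','
               FIELD_OPTIONALLY_ENCLOSED_BY = '"');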
When the TAB character is unlikely to occur in the data, I tend to use TAB-delimited files, which (together with a header) also make the source files more human-readable in case they need to be opened to troubleshoot loading failures:
FIELD_DELIMITER = '\t'
Also (although a bit off-topic), note that Snowflake suggests that files be compressed: https://docs.snowflake.com/en/user-guide/data-load-prepare.html#data-file-compression
I mostly use the GZip compression type:
COMPRESSION = GZIP
A (working) example:
CREATE FILE FORMAT Public.CSV_GZIP_TABDELIMITED_WITHHEADER_QUOTES_TRIM
FIELD_DELIMITER = '\t'
SKIP_HEADER = 1
TRIM_SPACE = TRUE
NULL_IF = ('NULL')
COMPRESSION = GZIP
;
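If this named file format is meant to be used by Snowpipe, it can be referenced from the pipe's COPY statement. A minimal sketch, with my_pipe, my_table, and @my_stage as placeholder names:
CREATE PIPE Public.my_pipe AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'Public.CSV_GZIP_TABDELIMITED_WITHHEADER_QUOTES_TRIM');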
I'm having trouble creating a table in Athena that points at files with the following format:
string, string, string, array.
When I wrote the file, I delimited the array items with '|'.
I delimited each line with '\n' and each column with ','.
So, for example, a row in my CSV would look like this:
Garfield, 15, orange, fish|milk|lasagna
In Hive (according to the documentation I read), when creating a table with a delimited row format, you can specify a 'collection items' delimiter that defines the separator between elements in array columns.
I could not find an equivalent for Presto in the documentation.
Is anyone aware if it's possible, and if so, what is the format, or where can I find it?
I tried "guessing" many forms, including 'collection items', but none seem to work.
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
I would really appreciate any insight. Thanks! :)
According to AWS Athena docs on using SerDe, your guess was 100% correct.
In general, Athena uses the LazySimpleSerDe if you do not specify a ROW FORMAT, or if you specify ROW FORMAT DELIMITED
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
Now, when I simply tried your DDL statement, I got
line 1:8: no viable alternative at input 'create external'
However, by deleting LINES TERMINATED BY '\n', I was able to create the table schema in the meta catalog:
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
A sample file with lines as shown in your question was parsed correctly, and I was able to UNNEST the foods column:
SELECT *
FROM "cats"
CROSS JOIN UNNEST(foods) as t(food)
which returned one row per element of the foods array.
Moreover, it was also enough to simply swap the lines LINES TERMINATED BY '\n' and COLLECTION ITEMS TERMINATED BY '|' for the query to work (although I don't have an explanation for that):
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
(Note: this answer is applicable to Presto in general, but not to Athena)
Currently you cannot set the collection delimiter in Presto.
Please create a feature request at https://github.com/prestosql/presto/issues/
Note that we plan to provide generic support for table properties to address cases like this holistically -- https://github.com/prestosql/presto/issues/954. You can track the issue and the associated pull request for updates.
I use the Presto engine to create a Hive table and set the collection delimiter in Presto, for example:
CREATE TABLE IF NOT EXISTS test (
id bigint COMMENT 'ID',
type varchar COMMENT 'TYPE',
content varchar COMMENT 'CONTENT',
create_time timestamp(3) COMMENT 'CREATE TIME',
pt varchar
)
COMMENT 'create time 2021/11/04 11:27:53'
WITH (
format = 'TEXTFILE',
partitioned_by = ARRAY['pt'],
textfile_field_separator = U&'\0001'
)
I have a text file that contains Persian words and is saved using ANSI encoding. When I try to read the Persian words from the text file, I get some characters like '?'. To solve the problem, I changed the file encoding to UTF8 and re-wrote the text file. Here's the method for changing file encoding:
public void Convert2UTF8(string filePath)
{
    // First, read the text file with the "ANSI" (system default) encoding
    StreamReader fileStream = new StreamReader(filePath, Encoding.Default);
    string fileContent = fileStream.ReadToEnd();
    fileStream.Close();

    // Now re-write the file content with UTF-8 encoding
    // (the Replace call is a no-op here, so the original file is overwritten)
    StreamWriter utf8Writer = new StreamWriter(filePath.Replace(".txt", ".txt"), false, Encoding.UTF8);
    utf8Writer.Write(fileContent);
    utf8Writer.Close();
}
Now the first problem is solved; however, there is another issue: every time I search for a Persian word in the SQL Server database table, the result is empty even though the record does exist in the table.
What's the solution for finding the Persian words that exist in the table? The query I currently use is simply the following:
SELECT * FROM [dbo].[WordDirectory]
WHERE Word = N'کلمه'
Word is the field that Persian words are saved in. The type of the field is NVARCHAR. My SQL server version is 2012.
Should I change the collation?
DECLARE @Table TABLE(Field NVARCHAR(4000) COLLATE Frisian_100_CI_AI)
INSERT INTO @Table (Field) VALUES
(N'همهٔ افراد بش'),
(N'میآیند و حیثیت '),
(N'ميشه آهسته تر صحبت کنيد؟'),
(N'روح'),
(N' رفتار')
SELECT * FROM @Table
WHERE Field LIKE N'%آهسته%'
Both queries return the same result.
Result set: ميشه آهسته تر صحبت کنيد؟
You have to make sure that when you insert the values you prefix them with N; that tells SQL Server there can be Unicode characters in the passed string. The same is true when you search for those strings in a SELECT statement.
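A minimal sketch against the WordDirectory table from the question (the inserted value is only illustrative):
-- Without the N prefix the literal is converted to the database's non-Unicode
-- code page, and Persian characters can be lost (often turned into '?').
INSERT INTO [dbo].[WordDirectory] (Word) VALUES (N'کلمه');

SELECT * FROM [dbo].[WordDirectory] WHERE Word = N'کلمه';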
You probably have a problem with the Persian and Arabic versions of 'ي' and 'ك' during the search. Even though these characters look the same, they have different Unicode code points:
select NCHAR(1740), -- Persian ی
       NCHAR(1610), -- Arabic ي
       NCHAR(1705), -- Persian ک
       NCHAR(1603)  -- Arabic ك
more info: http://www.dotnettips.info/post/90
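If both forms can occur in your data, one workaround is to normalize the Arabic characters to their Persian counterparts on both sides of the comparison. A sketch, assuming the WordDirectory table from the question:
SELECT *
FROM [dbo].[WordDirectory]
WHERE REPLACE(REPLACE(Word, NCHAR(1610), NCHAR(1740)), NCHAR(1603), NCHAR(1705))
    = REPLACE(REPLACE(N'کلمه', NCHAR(1610), NCHAR(1740)), NCHAR(1603), NCHAR(1705));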
I need to fetch a table's TOP_PK, IDENT_CURRENT, IDENT_INCR, and IDENT_SEED, for which I am building a dynamic query as below:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{1}]') AS CURRENT_IDENT, IDENT_INCR('[{1}]') AS IDENT_ICREMENT, IDENT_SEED('[{1}]') AS IDENT_SEED", pPrimaryKey, pTableName)
Here pPrimaryKey is the name of the table's primary key column and pTableName is the name of the table.
Now, I am facing a problem when the table name contains the ' character (for example, KIN'1).
When I use the above logic to build the query, it comes out as below:
SELECT (SELECT TOP 1 [ID] FROM [KIL'1]) AS TOP_PK, IDENT_CURRENT('[KIL'1]') AS CURRENT_IDENT, IDENT_INCR('[KIL'1]') AS IDENT_ICREMENT, IDENT_SEED('[KIL'1]') AS IDENT_SEED
Executing the above query, I get the following error:
Incorrect syntax near '1'.
Unclosed quotation mark after the character string ') AS IDENT_SEED'.
So, can anyone please show me the best way to solve this problem?
Escape a single quote by doubling it: KIL'1 becomes KIL''1.
If a string already has adjacent single quotes, two becomes four, or four becomes eight... it can get a little hard to read, but it works :)
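For illustration, inside a T-SQL string literal the doubled quote stands for a single literal quote:
SELECT 'KIL''1';   -- returns: KIL'1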
Using string methods from .NET, your statement could be:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{2}]') AS CURRENT_IDENT, IDENT_INCR('[{2}]') AS IDENT_ICREMENT, IDENT_SEED('[{2}]') AS IDENT_SEED", pPrimaryKey, pTableName, pTableName.Replace("'","''"))
EDIT:
Note that the string replace is now only on a new, third substitution string. (I've taken out the string replace for pPrimaryKey, and for the first occurrence of pTableName.) So now, single quotes are only doubled when they will be inside other single quotes.
You need to replace every single quote with two single quotes: http://beyondrelational.com/modules/2/blogs/70/posts/10827/understanding-single-quotes.aspx
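Applied to the query from the question, only the quotes inside the string literals need doubling (the bracketed identifier can keep its quote as-is), giving something like:
SELECT (SELECT TOP 1 [ID] FROM [KIL'1]) AS TOP_PK,
       IDENT_CURRENT('[KIL''1]') AS CURRENT_IDENT,
       IDENT_INCR('[KIL''1]') AS IDENT_ICREMENT,
       IDENT_SEED('[KIL''1]') AS IDENT_SEED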