Custom delimiter that needs to be processed and loaded into hive - file

I have a log file delimited by |~, and the values are also enclosed in double quotes. I tried loading the file into Hive using the following, but I didn't succeed.
CREATE EXTERNAL TABLE AUDIT_DETAIL
(
EVENT_ID string
, DETAIL_ID smallint
, SERVER_CUID String
, DETAIL_TYPE_ID smallint
, DETAIL_TEXT String
, START_TIMESTAMP DATE DEFAULT SYSDATE
) row format delimited fields terminated by '|~'
location '/user/Audit_Detail';
Is there any way to accomplish this other than a Hive UDF?
Thanks a lot

The delimiter you are using is a multi-character delimiter, which Hive does not support up to version 0.13. That may be the reason for your error.
See: How can I do a double delimiter(||) in Hive?
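One alternative to a UDF is to preprocess the file before loading it. The following is a minimal Python sketch, assuming the |~ sequence never appears inside a quoted value. The function names and the choice of Ctrl-A (\x01, Hive's default field delimiter) as the output delimiter are illustrative and not part of the original question.

```python
def split_record(line, delimiter="|~", quote='"'):
    """Split one log line on a multi-character delimiter and strip enclosing quotes.

    Assumption: the delimiter never occurs inside a quoted value.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    cleaned = []
    for field in fields:
        # Remove one pair of enclosing quotes, if present.
        if len(field) >= 2 and field[0] == quote and field[-1] == quote:
            field = field[1:-1]
        cleaned.append(field)
    return cleaned


def to_single_char_delimited(line, out_delimiter="\x01"):
    """Rewrite a line so that fields are separated by a single character."""
    return out_delimiter.join(split_record(line))


if __name__ == "__main__":
    # Hypothetical sample row; the real log layout may differ.
    sample = '"E100"|~"1"|~"SRV01"|~"3"|~"login ok"|~"2014-05-01"\n'
    print(split_record(sample))
```

A file rewritten this way could then be read by a table declared with a single-character field delimiter.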

Related

Snowflake Copy Into failing when insert Null in timestamp column

I am trying to load file data into Snowflake using COPY INTO. The table has a timestamp column, and in the file that column contains only empty strings ("").
When I run COPY INTO with the file format's timestamp option set to AUTO, the statement fails with: Can't parse '' as timestamp.
Is there any way to handle this?
Use the NULL_IF option:
NULL_IF = ( 'string1' [ , 'string2' ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
NULL_IF = ('\\N', '')
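As a rough illustration of what this option does, here is a minimal Python sketch that mimics the replacement on already-parsed field values. It is a model of the behaviour described in the quoted documentation, not Snowflake code; the function name and sample row are hypothetical.

```python
# Strings that should be treated as SQL NULL, mirroring NULL_IF = ('\\N', '').
# In both SQL and Python the literal below is a backslash followed by N.
NULL_IF = ("\\N", "")


def apply_null_if(row, null_if=NULL_IF):
    """Return a copy of the row with every NULL_IF string replaced by None."""
    return [None if value in null_if else value for value in row]


if __name__ == "__main__":
    # Hypothetical row: the timestamp field arrives as an empty string.
    print(apply_null_if(["42", "", "some text"]))
```

With the empty string mapped to NULL before conversion, there is nothing left for the timestamp parser to reject.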

Retrieve text after last back slash using SSIS Derived Column transformation

Using SQL Server 2014.
I have a field that contains a full file path, e.g.
\\Server\Folder1\Folder2\Folder3\File21.csv
I only want what is after the last backslash i.e.
File21.csv
So in the world of SQL I would use:
Select RIGHT([FileName],charindex('\',reverse([FileName]),1)-1) as FileNameNew from mytable
However, how do I do this in a Derived Column in SSIS? There is no CHARINDEX so you have to use FINDSTRING. This is my expression:
RIGHT( [FileName] , FINDSTRING('\', REVERSE( [FileName] ) ,1) -1)
But it is not working, it keeps saying the single quotation mark was not expected. I've also tried double quotes to no avail.
I think you have your parameters backwards. FINDSTRING() wants the thing you're searching first, then the thing you're searching for. And you will need double quotes and an escaped backslash. This should work:
RIGHT( [FileName] , FINDSTRING(REVERSE( [FileName] ), "\\" ,1) -1)
Although this can be done with the RIGHT() or SUBSTRING() functions, I prefer using the TOKEN() and TOKENCOUNT() functions:
TOKEN([FileName],"\\",TOKENCOUNT([FileName],"\\"))
Example:
TOKEN("\\\\Server\\Folder1\\Folder2\\Folder3\\File21.csv","\\",TOKENCOUNT("\\\\Server\\Folder1\\Folder2\\Folder3\\File21.csv","\\"))
Result:
File21.csv
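To make the logic of the two expressions easier to follow, here is a minimal Python sketch that mirrors each one. The helper names are hypothetical stand-ins for the SSIS functions, and two behaviours are assumptions of the sketch: a path without a backslash is returned unchanged, and empty tokens between consecutive backslashes are dropped.

```python
def find_string(text, search, occurrence=1):
    """1-based position of the n-th occurrence of search, or 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def right(text, count):
    """Last `count` characters of text (empty string when count is 0)."""
    return text[len(text) - count:] if count > 0 else ""


def file_name_via_right(path):
    """Mirror of RIGHT(path, FINDSTRING(REVERSE(path), "\\", 1) - 1)."""
    pos = find_string(path[::-1], "\\")
    if pos == 0:
        return path  # assumption of this sketch: no backslash means no folder
    return right(path, pos - 1)


def file_name_via_token(path):
    """Mirror of TOKEN(path, "\\", TOKENCOUNT(path, "\\"))."""
    tokens = [t for t in path.split("\\") if t]  # assumption: drop empty tokens
    return tokens[-1] if tokens else ""


if __name__ == "__main__":
    sample = "\\\\Server\\Folder1\\Folder2\\Folder3\\File21.csv"
    print(file_name_via_right(sample), file_name_via_token(sample))
```

Both functions return File21.csv for the sample path from the question.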

NULL Value Handling for CSV Files Via External Tables in Snowflake

I am trying to get the NULL_IF parameter of a file format working when applied to an external table.
I have a source CSV file containing NULL values in some columns. NULLS in the source file appear in the format "\N" (all non numeric values in the file are quoted). Here is an example line from the raw csv where the ModifiedOn value is NULL in the source system:
"AirportId" , "IATACode" , "CreatedOn" , "ModifiedOn"
1 , "ACU" , "2015-08-25 16:58:45" , "\N"
I have a file format defined including the parameter NULL_IF = "\\N"
The following select statement successfully interprets the correct rows as holding NULL values.
SELECT $8
FROM @MyS3Bucket
(
file_format => 'CSV_1',
pattern => '.*MyFileType.*.csv.gz'
)
However if I use the same file format with an external table like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable (
MyColumn varchar as (value:c8::varchar)
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
Each row holds \N as a value rather than NULL.
I assume this is caused by external tables providing a single variant output that can then be further split rather than directly presenting individual columns in the csv file.
One solution is to code the NULL handling into the external table definition like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable (
MyColumn varchar as (NULLIF(value:c8::varchar,'\\N'))
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
However, this leaves me at risk of having to rewrite a lot of external table code if the file format changes, whereas the file format could (and should) centralise that NULL definition. It would also mean handling the NULL conversion column by column rather than file by file, which increases code complexity.
Is there a way that I can have the NULL values appear through an external table without handling them explicitly through column definitions?
Ideally this would be applied through a file format object but changes to the format of the raw file are not impossible.
I am able to reproduce the issue, and it looks like a bug. If you have access to Snowflake support, it would be better to submit a support case about this issue so that you can track its progress.

LOAD TABLE statement with NULLable dates

I'm looking to do a batch load into a table called temp_data, where some of the columns are NULLable dates.
Here is what I have so far:
LOAD TABLE some.temp_data
(SomeIntegerColumn ',', SomeDateColumn DATE('YYYYMMDD') NULL('NULL'), FILLER(1), SomeStringColumn ',')
USING CLIENT FILE '{0}' ESCAPES OFF DELIMITED BY ',' ROW DELIMITED BY '#'
and I'm trying to load the following file:
500,NULL,Monthly#
500,NULL,Monthly#
500,NULL,Monthly#
Unfortunately the error I get is:
ERROR [07006] [Sybase][ODBC Driver][Sybase IQ]Cannot convert NULL,Mon
to a date (column SomeDateColumn)
Any ideas why this wouldn't work?
It appears that it's reading the 8 characters following the first delimiter and trying to interpret them as a date.
Try switching to FORMAT BCP. The following example could work on your sample file:
LOAD TABLE some.temp_data (
SomeIntegerColumn
, SomeDateColumn NULL('NULL')
, SomeStringColumn
)
USING CLIENT FILE '{0}'
ESCAPES OFF
DELIMITED BY ','
ROW DELIMITED BY '#'
FORMAT BCP
In addition, FORMAT BCP has the advantage of not requiring trailing delimiters.
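The diagnosis above (eight characters read after the first delimiter) can be checked against the sample row with a minimal Python sketch. This only models the fixed-width reading suggested in the answer; it is not how Sybase IQ is implemented, and the function names are hypothetical.

```python
from datetime import datetime


def fixed_width_field(row, width=8):
    """Return the `width` characters that follow the first comma in the row."""
    start = row.index(",") + 1
    return row[start:start + width]


def parse_yyyymmdd(text):
    """Parse a YYYYMMDD date, returning None when the text is not a valid date."""
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        return None


if __name__ == "__main__":
    raw = fixed_width_field("500,NULL,Monthly#")
    # raw is 'NULL,Mon', the same text that appears in the error message,
    # and it cannot be parsed as a date.
    print(repr(raw), parse_yyyymmdd(raw))
```

The eight-character slice matches the "NULL,Mon" fragment in the error, which is consistent with the date column being read at a fixed width before the NULL('NULL') check can match.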

Why can I store a Ukrainian string in a varchar column?

I was a little surprised that I was able to store a Ukrainian string in a varchar column.
My table is:
create table delete_collation
(
text1 varchar(100) collate SQL_Ukrainian_CP1251_CI_AS
)
and using this query I am able to insert:
insert into delete_collation
values(N'використовується для вирішення квитки')
but when I remove the N prefix, the select statement shows ??????.
Is it okay or am I missing something in understanding unicode and non-unicode with collate?
From MSDN:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
UPDATE:
Please see these similar questions:
What is the meaning of the prefix N in T-SQL statements?
Cyrillic symbols in SQL code are not correctly after insert
sql server 2012 express do not understand Russian letters
To expand on MegaTron's answer:
With the collation SQL_Ukrainian_CP1251_CI_AS, SQL Server can store Ukrainian characters in a varchar column by using code page 1251.
However, when you specify a string without the N prefix, that string is converted to the default non-Unicode code page before it is sent to the database, and that is why you see ??????.
So it is completely fine to use varchar with this collation as you do, but you must always include the N prefix when sending strings to the database, to avoid the intermediate conversion to the default (non-Ukrainian) code page.
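The effect of the intermediate conversion can be imitated outside SQL Server with a minimal Python sketch. It assumes the default code page is Windows-1252, which the question does not state, and it uses Python's codecs rather than SQL Server's own conversion routines.

```python
TEXT = "використовується для вирішення квитки"


def round_trip(text, code_page):
    """Encode text in the given code page and decode it again.

    Characters the code page cannot represent are replaced with '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    # Code page 1251 contains the Ukrainian letters, so nothing is lost.
    print(round_trip(TEXT, "cp1251"))
    # Assumed default code page 1252 has no Cyrillic letters at all.
    print(round_trip(TEXT, "cp1252"))
```

Under that assumption the first call returns the original text, while the second returns only question marks and spaces, which matches the ?????? seen when the N prefix is omitted.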

Resources