I am loading a CSV file that uses '|' as the delimiter. Some lines of the file have 22 fields and others have 25,
but my table has only 18 columns.
I get the error "Field delimiter '|' found while expecting record delimiter '\n'". I have set error_on_column_mismatch=false,
but it does not help. error_on_column_mismatch=false only worked when the table had more columns than the file. Has anybody faced this issue, and how can I solve it?
I've got a simple table in CSV format:
999,"01/01/2001","01/01/2001","7777777","company","channel","01/01/2001"
990,"01/01/2001","01/01/2001","767676","hhh","tender","01/01/2001"
3838,"01/01/2001","01/01/2001","888","jhkh","jhkjh","01/01/2001"
08987,"01/01/2001","01/01/2001","888888","hkjhjkhv","jhgjh","01/01/2001"
8987,"01/01/2001","01/01/2001","9999","jghg","hjghg","01/01/2001"
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
090009,"","","77777","","","01/01/2001"
980989,"01/01/2001","01/01/2001","888","","jhkh","01/01/2001"
0000,"01/01/2001","01/01/2001","99999","jhjh","","01/01/2001"
92929,"01/01/2001","01/01/2001","222","","","01/01/2001"
I'm trying to import that data into SQL Server using BULK INSERT (Transact-SQL):
set dateformat DMY;
BULK INSERT Oracleload
FROM '\\Mac\Home\Desktop\Test\T_DOGOVOR.csv'
WITH
(FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
KEEPNULLS);
I get the following error:
Msg 4864, Level 16, State 1, Line 4
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (date_begin)....
Maybe something is wrong with the date format. What script do I need to write to fix this error?
Please help.
Thanks in advance.
Neither BULK INSERT nor bcp can properly handle CSV files, especially ones that (correctly) use " quotes. Alternatives are SSIS or PowerShell.
I always look at the data in Notepad++ to check for odd or non-printable characters, such as a stray line break. In this case you can open the file in Notepad (if you don't have Notepad++), do a find-and-replace of " with nothing, save the file, and re-run the bulk load.
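The find-and-replace step can also be scripted. The following is a minimal sketch, assuming the file fits in memory; strip_quotes is a hypothetical helper name, not part of SQL Server or any library. Unlike a blind replace, it refuses to unquote a field that contains the delimiter or a line break, because removing the quotes there would shift the columns.

```python
import csv
import io


def strip_quotes(text, delimiter=","):
    """Return CSV text with the surrounding double quotes removed.

    Hypothetical helper: raises ValueError if a quoted field contains the
    delimiter or a line break, since unquoting it would corrupt the row.
    """
    out = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for field in row:
            if delimiter in field or "\n" in field or "\r" in field:
                raise ValueError(f"cannot safely unquote field: {field!r}")
        out.append(delimiter.join(row))
    return "\n".join(out) + "\n" if out else ""
```

The result can be written to a new file and passed to the same BULK INSERT statement.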
This record:
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
The first column has a numeric type of some kind. You can't put the jhkjhj value into that field.
Additionally, some records have empty values ("") in date fields. These are likely to be interpreted as empty strings rather than null dates, and will not convert properly.
But the error refers to "row 1, column 2". That's this value:
"01/01/2001"
Again, the import is interpreting this as a string, rather than a date. I suspect it's trying to import the quotes (") instead of just using them as separators.
You might try bulk loading into a special holding table and then re-importing from there. Alternatively, you can change how the data is exported, or write a program to pre-clean it: strip the quotes from fields that shouldn't have them, and divert records that won't insert to an exception file and report on them.
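A pre-cleaning program along those lines could look like the sketch below. It is only an illustration under stated assumptions: the real table definition is not shown in the question, so the column positions in NUMERIC_COLUMNS and DATE_COLUMNS are guesses based on the sample data, and pre_clean and is_valid are hypothetical names.

```python
import csv
import io
from datetime import datetime

NUMERIC_COLUMNS = (0,)    # assumed: the first column is numeric
DATE_COLUMNS = (1, 2, 6)  # assumed: zero-based positions of the date columns


def is_valid(row):
    """Return True if the row would plausibly convert under the assumed schema."""
    try:
        for i in NUMERIC_COLUMNS:
            int(row[i])
        for i in DATE_COLUMNS:
            if row[i]:  # an empty field is left empty so it can load as NULL
                datetime.strptime(row[i], "%d/%m/%Y")
    except (ValueError, IndexError):
        return False
    return True


def pre_clean(text):
    """Split CSV text into (clean_text, rejected_text).

    Quotes are dropped on output unless a field itself contains a comma,
    a quote, or a line break.
    """
    clean, rejects = io.StringIO(), io.StringIO()
    clean_writer = csv.writer(clean, lineterminator="\n")
    reject_writer = csv.writer(rejects, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        target = clean_writer if is_valid(row) else reject_writer
        target.writerow(row)
    return clean.getvalue(), rejects.getvalue()
```

The clean text goes to the file that BULK INSERT reads, and the rejected text goes to the exception file for manual review.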
In SQL Data Warehouse (editors please don't change this, it is the actual name see: here) I have a JobCandidate_ext external table that looks like this.
CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_ext](
[JobCandidateID] int,
[BusinessEntityID] int,
[Resume] Varchar(8000),
[ModifiedDate] Datetime
)
WITH (
LOCATION='/[HumanResources].[JobCandidate]/data.txt',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile)
GO
The column [Resume] was an XML type in SQL Server but in SQL Data Warehouse XML types should be converted to varchar(8000) as described here.
I am using a flat file data.txt to export the data to a blob and then create an external table from it.
The [Resume] column has carriage returns in it (as expected from an XML file), and so when you run a SELECT * FROM [HumanResources].[JobCandidate_ext] you get an error. In this case:
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 2 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 0, Expected data type: INT, Offending value: some text .... (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.
I know that I cannot configure a row delimiter when creating external tables as described here.
The row delimiter must be UTF-8 and supported by Hadoop’s LineRecordReader. The row delimiter must be either '\r', '\n', or '\r\n'. These are not user-configurable.
And if you try to put quotes on each column field you get this error while selecting rows from the external table: No closing string delimiter.
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 2, Expected data type: VARCHAR(8000) collate SQL_Latin1_General_CP1_CI_AS, Offending value: 'ShaiBassli (Tokenization failed), Error: No closing string delimiter.
Is there a way to get around this issue?
Today, PolyBase does not allow row or field delimiters inside fields, i.e. it does not let you escape these characters. As Greg pointed out, you can vote for this functionality here: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-line-ends-within-qualified-text-f
To work around this limitation, you can either pre-process the data (using sed or tr, for example) to replace unwanted characters before reading it with PolyBase, or switch to one of the other PolyBase-supported file formats (RCFile, ORC, or Parquet) to avoid dealing with row and field delimiters altogether.
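As an alternative to sed or tr, the replacement can be done while the flat file is being written. This is a rough sketch, assuming the export code has each row available as a list of values and that '|' is the field delimiter (the TextFile format definition is not shown in the question, so the delimiter is an assumption); to_polybase_line is a hypothetical name.

```python
import re

_LINE_BREAKS = re.compile(r"[\r\n]+")


def to_polybase_line(fields, delimiter="|"):
    """Join one row into a single line that contains no embedded line breaks.

    Each run of CR/LF characters inside a value becomes one space, and any
    occurrence of the field delimiter inside a value is replaced by a space.
    """
    cleaned = []
    for value in fields:
        text = "" if value is None else str(value)
        text = _LINE_BREAKS.sub(" ", text)
        text = text.replace(delimiter, " ")
        cleaned.append(text)
    return delimiter.join(cleaned)
```

Note that this changes the stored XML text (line breaks become spaces), which is usually harmless for XML but is not reversible.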
I am trying to import a CSV file (though '|'-delimited) using BULK INSERT. The problem is that some columns use double quotes as a text qualifier, but not consistently across all rows of the same column. Inside the double quotes, the column delimiter '|' can appear as part of the text. For example:
Col1|Col2|Col3
1|text1|101
2|"text2|text2a"|102
3|text3|103
4|"text4"|104
In this case I can't use a format file to specify the column delimiter, because col2 does not have a consistent delimiter (it is either | or |").
However, if I import via the import wizard and select '"' as the text qualifier, the wizard successfully sorts everything into columns, recognizing '"' as a text qualifier and not as part of the column delimiter ("| or |"). Is there an equivalent way to do this with BULK INSERT?
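One possible workaround, sketched here rather than offered as a BULK INSERT feature, is to parse the file with a quote-aware reader first and rewrite it with a delimiter that never occurs in the data. The sketch assumes a tab is safe to use as the new delimiter; requalify is a hypothetical name.

```python
import csv
import io


def requalify(text, out_delimiter="\t"):
    """Re-emit '|'-delimited, optionally quoted text with a plain delimiter.

    Raises ValueError if a value already contains the output delimiter,
    because the rewritten row would then be ambiguous.
    """
    lines = []
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    for row in reader:
        if any(out_delimiter in field for field in row):
            raise ValueError(f"output delimiter found in row: {row!r}")
        lines.append(out_delimiter.join(row))
    return "\n".join(lines) + "\n" if lines else ""
```

The rewritten file can then be loaded with FIELDTERMINATOR = '\t', and the '|' inside col2 survives as ordinary text.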
I have a log file delimited by |~, with the values also enclosed in double quotes. I tried loading the file into Hive using the following, but did not succeed.
CREATE EXTERNAL TABLE AUDIT_DETAIL
(
EVENT_ID string
, DETAIL_ID smallint
, SERVER_CUID String
, DETAIL_TYPE_ID smallint
, DETAIL_TEXT String
, START_TIMESTAMP DATE DEFAULT SYSDATE
) row format delimited fields terminated by '|~'
location '/user/Audit_Detail';
Is there any way to accomplish this other than a Hive UDF?
Thanks a lot
The delimiter you are using is a multi-character delimiter, which is not supported in Hive up to version 0.13. That may be the reason for your error.
See: How can I do a double delimiter(||) in Hive?
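If upgrading is not an option, one way around the limitation is to rewrite the log with a single-character delimiter before placing it in the table location. The sketch below assumes that no value contains the |~ sequence itself, and it uses Ctrl-A (\x01), Hive's default field delimiter, as the target; convert_line is a hypothetical name.

```python
def convert_line(line, src_delim="|~", dst_delim="\x01"):
    """Convert one '|~'-delimited, double-quoted log line to a single-char delimiter.

    Assumes the source delimiter never occurs inside a value. Enclosing
    double quotes are removed from each value.
    """
    fields = line.rstrip("\r\n").split(src_delim)
    unquoted = [
        f[1:-1] if len(f) >= 2 and f[0] == '"' and f[-1] == '"' else f
        for f in fields
    ]
    return dst_delim.join(unquoted)
```

With the file converted this way, the table can be declared with plain "row format delimited" and no "fields terminated by" clause, since \x01 is the default.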