I'm trying to import some data from the popular GeoNames site into SQL Server. It's a tab-delimited text file. I didn't think there would be a problem, but whatever I do, I just get an error message that says:
The bulk load failed. The column is too long in the data file for row 1, column 4. Verify that the field terminator and row terminator are specified correctly.
This is the file I'm trying to import:
http://download.geonames.org/export/dump/admin2Codes.txt
...and this is my table:
CREATE TABLE [Admin2Codes](
[code] [VARCHAR](20) NOT NULL,
[name] [NVARCHAR](200) NOT NULL,
[asciiname] [NVARCHAR](200) NOT NULL,
[geonameId] [INT] NOT NULL
)
I can't spot what the problem is. It works if I only have one row in the file, but as soon as there's more than one row, it fails. The line endings in the file appear to be \n and that matches my SQL:
BULK INSERT dbo.Admin2Codes FROM 'D:\admin2codes.txt'
WITH(
DATAFILETYPE = 'widechar',
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
It looks like the data uses a bare line feed (LF) as the row terminator, as is usual on Unix, instead of the carriage return plus line feed (CRLF) used on Windows. BULK INSERT interprets the literal '\n' as CRLF, so you need to supply an actual LF character instead, for example by building the statement dynamically:
ROWTERMINATOR = ''' + CHAR(10) + '''
(You can also specify the terminator in hexadecimal notation as ROWTERMINATOR = '0x0a'.)
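For reference, here is the whole statement built dynamically so the terminator is a genuine LF character; a minimal sketch of the approach, assuming the same file and table as above:
DECLARE @cmd varchar(1000);
SET @cmd = 'BULK INSERT dbo.Admin2Codes FROM ''D:\admin2codes.txt''
WITH (DATAFILETYPE = ''widechar'',
      FIELDTERMINATOR = ''\t'',
      ROWTERMINATOR = ''' + CHAR(10) + ''')';  -- splice a real CHAR(10) into the command text
EXEC (@cmd);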
I am trying to bulk insert a few records into a test table from a CSV file:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
The bulk insert code should get rid of the first row and insert the data into the table. It gets rid of the first row all right, but gets confused in the delimiter section: the first column is wkt, and its value is double-quoted and has commas within it.
So I guess my question is: is there a way to tell BULK INSERT that the double-quoted part is one column, regardless of the commas within it?
the CSV file looks like this ,
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))", 123123.22
You need to use a 'format file' to implement a text qualifier for bulk insert. Essentially, you need to teach the bulk insert that each field can end with a different delimiter.
Create a text file called "level_2.fmt" with the following contents and save it:
11.0
2
1 SQLCHAR 0 8000 "\"," 1 wkt SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 40 "\r\n" 2 area SQL_Latin1_General_CP1_CI_AS
The first line, "11.0" refers to your version of SQL. The second line shows that your table, [level2_import], has two columns. Each line after that will describe a column, and obeys the following format:
[Source Column Number][DataType][Min Size][Max Size][Delimiter pattern][Destination Column Number][Destination Column Name][Case sensitivity of database]
Once you've created that file, you can read in your data with the following bulk insert statement:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FORMATFILE='D:\level_2.fmt'
);
Refer to Microsoft's documentation on non-XML format files for a detailed explanation of the format file.
SQL Server 2017 finally added support for text qualifiers and the CSV format defined in RFC 4180. It should be enough to write :
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', ROWTERMINATOR = '\n', FIRSTROW = 2 )
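SQL Server 2017+ also accepts a FIELDQUOTE option alongside FORMAT = 'CSV'; the default qualifier is the double quote, so you rarely need it here, but as a sketch you could name it explicitly:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', FIELDQUOTE = '"', ROWTERMINATOR = '\n', FIRSTROW = 2 );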
Try renaming the format file from .fmt to .txt and using that instead; that worked for me.
I have this issue working with LDAP data: the DN contains commas, as do other fields that contain DNs. Try changing your field terminator to another, unused character, like a pipe (|) or semicolon (;). Do this in both the data and the file definition.
So the code should be:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
and your CSV:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))"; 123123.22
I am trying to dump the following txt file into a table (using the import wizard did not work either):
http://download.geonames.org/export/dump/admin1CodesASCII.txt
using the following
drop table tempregions
create table TempRegions
(
code varchar(500),
name varchar(500),
asciiName varchar(500),
somenumber varchar(500)
);
BULK INSERT GeoNames
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
)
go
getting the following error (using sql server 2012)
Msg 4864, Level 16, State 1, Line 10
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (geonameid).
thanks for the help
The text contains characters outside standard ASCII and you haven't defined a codepage. The error is there to protect you. Find and define an appropriate codepage as per the syntax at http://msdn.microsoft.com/en-us/library/ms188365.aspx
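As an aside, the statement in the question inserts into GeoNames while the table created is TempRegions. Assuming TempRegions is the intended target, and assuming SQL Server 2016 or later (earlier versions reject the UTF-8 codepage), a minimal sketch might be:
BULK INSERT TempRegions
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH (
    CODEPAGE = '65001',      -- UTF-8; accepted from SQL Server 2016 on
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '0x0a'   -- the GeoNames dumps use Unix LF line endings
);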
For some reason, the SSIS Lookup transformation seems to be checking the cache for a NCHAR(128) value instead of a NVARCHAR(128) value. This results in a whole bunch of appended whitespace on the value being looked up and causes the lookup to fail to find a match.
On the Lookup transformation, I configured it to use No Cache so that it always goes to the database, which let me trace with SQL Profiler and see what it was looking up. This is what it captured (notice the whitespace ending at the single quote on the second-to-last line):
exec sp_executesql N'
select *
from (
SELECT SurrogateKey, NaturalKey, SomeInt
FROM Dim_SomeDimensionTable
) [refTable]
where [refTable].[NaturalKey] = @P1
and [refTable].[SomeInt] = @P2'
,N'@P1 nchar(128)
,@P2 smallint'
,N'VALUE '
,8
Here's the destination table's schema:
CREATE TABLE [dbo].[dim_SomeDimensionTable] (
[SurrogateKey] [int] IDENTITY(1,1) NOT NULL,
[NaturalKey] [nvarchar](128) NOT NULL,
[SomeInt] [smallint] NOT NULL
)
What I am trying to figure out is why SSIS is checking the NaturalKey value as NCHAR(128) and how I can get it to perform the lookup as NVARCHAR(128) without the whitespace.
Things I've tried:
I have LTRIM() and RTRIM() on the SQL Server source query.
Before the Lookup, I have used a Derived Column transformation to add a new column with the original value TRIM()'d (this trimmed column is the one I'm passing to the Lookup transformation).
Before and after the Lookup, I multicasted the rows and sent them to a unicode Flat File Destination and there was no white space in either case.
Before the lookup, I looked at the metadata on the data flow path and it shows the value as having data type DT_WSTR with length 128.
Any ideas would be greatly appreciated!
It doesn't make any difference. You need to look elsewhere for the source of your problem (perhaps the column has a case-sensitive collation, for example).
Trailing white space is only significant to SQL Server in LIKE comparisons, not = comparisons, as documented by Microsoft:
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General Rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.
The only exception to this rule is the LIKE predicate...
You can also easily see this by running the below.
USE tempdb;
CREATE TABLE [dbo].[Dim_SomeDimensionTable] (
[SurrogateKey] [int] IDENTITY(1,1) NOT NULL,
[NaturalKey] [nvarchar](128) NOT NULL,
[SomeInt] [smallint] NOT NULL
)
INSERT INTO [dbo].[Dim_SomeDimensionTable] VALUES ('VALUE',8)
exec sp_executesql N'
select *
from (
SELECT SurrogateKey, NaturalKey, SomeInt
FROM Dim_SomeDimensionTable
) [refTable]
where [refTable].[NaturalKey] = @P1
and [refTable].[SomeInt] = @P2'
,N'@P1 nchar(128)
,@P2 smallint'
,N'VALUE '
,8
This returns the single row, confirming that the trailing spaces don't affect the = comparison.
I'm having problems trying to insert strings containing UTF-8 encoded Chinese characters and punctuation into a SQL Server 2008 table (default installation) from my Delphi 7 application using the ZeosDB native SQL Server library.
I remember that in the past I had problems inserting UTF-8 strings into SQL Server even using PHP and other methods, so I believe this problem is not unique to ZeosDB.
It doesn't happen all the time: some UTF-8 encoded strings get inserted successfully, but some don't. I can't figure out what it is in the string that causes the failure.
Table schema:
CREATE TABLE [dbo].[incominglog](
[number] [varchar](50) NULL,
[keyword] [varchar](1000) NULL,
[message] [varchar](1000) NULL,
[messagepart1] [varchar](1000) NULL,
[datetime] [varchar](50) NULL,
[recipient] [varchar](50) NULL
) ON [PRIMARY]
SQL statement template:
INSERT INTO INCOMINGLOG ([Number], [Keyword], [Message], [MessagePart1], [Datetime], [Recipient])
VALUES('{N}', '{KEYWORD}', '{M}', '{M1}', '{TIMESTAMP}', '{NAME}')
The parameters {KEYWORD}, {M} and {M1} can contain UTF-8 strings.
For example, the following statement will return an error:
Incorrect syntax near 'é¢'. Unclosed quotation mark after the character string '全力克æœå››ç§å±é™©','2013-06-19 17:07:28','')'.
INSERT INTO INCOMINGLOG ([Number], [Keyword], [Message], [MessagePart1], [Datetime], [Recipient])
VALUES('+6590621005', '题', '题 [全力克æœå››ç§å±é™© åšå†³æ‰«é™¤ä½œé£Žä¹‹å¼Š]', '[全力克æœå››ç§å±é™©','2013-06-19 17:07:28', '')
Note: Please ignore the actual characters as the utf8 encoding is lost after copy and paste.
I've also tried using NVARCHAR instead of VARCHAR:
CREATE TABLE [dbo].[incominglog](
[number] [varchar](50) NULL,
[keyword] [nvarchar](max) NULL,
[message] [nvarchar](max) NULL,
[messagepart1] [nvarchar](max) NULL,
[datetime] [varchar](50) NULL,
[recipient] [varchar](50) NULL
) ON [PRIMARY]
I also tried amending the SQL statement to:
INSERT INTO INCOMINGLOG ([Number],[Keyword],[Message],[MessagePart1],[Datetime],[Recipient]) VALUES('{N}',N'{KEYWORD}',N'{M}',N'{M1}','{TIMESTAMP}','{NAME}')
Neither works. I would appreciate any pointers. Thanks.
EDIT: As indicated by marc_s below, the N prefix must go outside the single quotes. It was correct in my actual test; the initial statement had a typo, which I've corrected.
The test with the N prefix also returned an error:
Incorrect syntax near 'åŽŸæ ‡é¢'. Unclosed quotation mark after the
character string '全力克�四��险','2013-06-19
21:22:08','')'.
The SQL statement:
INSERT INTO INCOMINGLOG ([Number],[Keyword],[Message],[MessagePart1],[Datetime],[Recipient]) VALUES('+6590621005',N'åŽŸæ ‡é¢˜',N'åŽŸæ ‡é¢˜ [全力克æœ?å››ç§?å?±é™© å?šå†³æ‰«é™¤ä½œé£Žä¹‹å¼Š]',N'[全力克æœ?å››ç§?å?±é™©','2013-06-19','')
REPLY TO gbn's answer: I've tried using parameterized SQL but am still encountering the "Unclosed quotation mark after the character string" error.
For the new test, I used a simplified SQL statement:
INSERT INTO INCOMINGLOG ([Keyword],[Message]) VALUES(:KEYWORD,:M)
The error returned for the above statement:
Incorrect syntax near 'åŽŸæ ‡é¢'. Unclosed quotation mark after the
character string '')'.
For info, the values of KEYWORD and M are:
KEYWORD:åŽŸæ ‡é¢˜
M:åŽŸæ ‡é¢˜ [
Further tests on 20th June: the parameterized SQL query didn't work, so I tried a different approach and attempted to isolate the character that caused the error. After trial and error, I managed to identify the problematic character.
The following character produces an error: 题
SQL Statement: INSERT INTO INCOMINGLOG ([Keyword]) VALUES('题')
Interestingly, note that the string in the returned error text contains a "?" character which didn't exist in the original statement.
Error: Unclosed quotation mark after the character string '�)'. Incorrect syntax near '�)'.
If I place some Latin characters immediately after the culprit character, there is no error. For example, INSERT INTO INCOMINGLOG ([Keyword]) VALUES('题Ok') works fine. Note: this doesn't work with all characters.
There are ' characters in the UTF-8 data which prematurely terminate the SQL.
This is classic SQL injection.
Basically: use proper parametrisation, not string concatenation.
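On the SQL Server side, a parameterised version of the failing insert would look like the sketch below (sample values; in Delphi/ZeosDB you would bind the :KEYWORD-style parameters rather than splicing values into the string):
DECLARE @sql nvarchar(200) =
    N'INSERT INTO INCOMINGLOG ([Keyword],[Message]) VALUES (@Keyword, @Message);';
EXEC sp_executesql @sql,
    N'@Keyword nvarchar(max), @Message nvarchar(max)',
    @Keyword = N'题',             -- sample value from the question
    @Message = N'题 sample text'; -- hypothetical message body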
Edit, after Question updates...
Without the Delphi code, I don't think we can help you.
All the SQL-side code works. For example, this works in SSMS:
DECLARE #t TABLE ([Keyword] nvarchar(100) COLLATE Chinese_PRC_CI_AS);
INSERT INTO #t ([Keyword]) VALUES('题');
INSERT INTO #t ([Keyword]) VALUES(N'题');
SELECT * FROM #t T;
Something is missing that would help us fix this.
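One clue worth noting: the "?" in the error text is the standard substitution character produced when a Unicode string is forced through a non-Unicode code page, which suggests the statement is being converted to ANSI somewhere between Delphi and the server. You can reproduce the substitution in SSMS:
-- Forcing N'题' through the Latin1 (CP1252) code page yields '?'
SELECT CAST(N'题' COLLATE SQL_Latin1_General_CP1_CI_AS AS varchar(10));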
Also see
How to store UTF-8 bytes from a C# String in a SQL Server 2000 TEXT column
Having trouble with UTF-8 storing in NVarChar in SQL Server 2008
Write utf-8 to a sql server Text field using ADO.Net and maintain the UTF-8 bytes
I am trying to import email communication into a database table using BULK INSERT, but I can't seem to preserve the CR and LF characters. Let's consider the following:
CREATE TABLE myTable (
Email_Id int,
Email_subject varchar(200) NULL,
Email_Body TEXT NULL
)
The bulk insert statement has the following:
codepage = '1250',
fieldterminator = '<3P4>',
rowterminator = '<3ND>',
datafiletype = 'char'
The file contains full emails (including CR and LF characters). I would like to import the data and include the CR and LF characters. I have read that BULK INSERT treats each entry as a single row but does that mean it strips out the CR and LF characters? If so, what can I use to import this CSV file? I don't have access to SSIS and I would prefer to use SQL code to do it.
Example data:
11324<3P4>Read this email because it's urgent<3P4>Haha John,
I lied, the email was just to mess with you!
Your Nemesis,
Steve
P.S. I still hate you!
<3ND>
11355<3P4>THIS IS THE LAST STRAW<3P4>Steve,
I have had it with you stupid jokes, this email is going to the manager.
Good day,
John
<3ND>
It should import with the carriage returns and line feeds intact, even if you don't see them in some tools. We would import XSL this way and it preserved all of the line formatting.
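For reference, the full statement with those options would look something like this sketch (the file path is hypothetical):
BULK INSERT myTable
FROM 'D:\emails.txt'           -- hypothetical path
WITH (
    CODEPAGE = '1250',
    FIELDTERMINATOR = '<3P4>',
    ROWTERMINATOR = '<3ND>',   -- if each <3ND> is followed by a line break in the file, that break may need to be part of the terminator
    DATAFILETYPE = 'char'
);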