I have a csv file that has column values enclosed within double quotes.
I want to import this CSV file from a network path using a SQL statement.
I tried BULK INSERT, but it imports the double quotes along with the values. Is there any other way to import a CSV file into SQL Server 2008 using a SQL statement while ignoring the double-quote text qualifier?
Thanks
-Vivek
You could use a non-XML format file to specify a different delimiter per column. For values enclosed in double quotes and separated by commas, the delimiter would be \",\". You'd have to add an initial unused column to capture the opening quote. For example, to read this file:
"row1col1","row1col2","row1col3"
"row2col1","row2col2","row2col3"
"row3col1","row3col2","row3col3"
You could use this format file:
10.0
4
1 SQLCHAR 0 50 "\"" 0 unused ""
2 SQLCHAR 0 50 "\",\"" 1 col1 ""
3 SQLCHAR 0 50 "\",\"" 2 col2 ""
4 SQLCHAR 0 50 "\"\r\n" 3 col3 ""
(The number on the first line is the format file version and depends on the SQL Server version; 10.0 corresponds to SQL Server 2008. The number on the second line is the number of columns to read, so don't forget to adjust it.)
The bulk insert command accepts a formatfile = 'format_file_path' parameter where you can specify the format file. For example:
BULK INSERT YourTable
FROM 'c:\test\test.csv'
WITH (FORMATFILE = 'c:\test\test.cfmt')
This results in:
select * from YourTable
-->
col1 col2 col3
row1col1 row1col2 row1col3
row2col1 row2col2 row2col3
row3col1 row3col2 row3col3
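For completeness, here is a minimal sketch of a table the example could load into; the column types and lengths are assumptions, not taken from the original answer:
-- Hypothetical target table for the format-file example above
CREATE TABLE YourTable (
    col1 varchar(50),
    col2 varchar(50),
    col3 varchar(50)
)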
This is a known issue when importing files with text delimiters as the bcp/bulk insert utilities don't allow you to specify a text delimiter. See this link for a good discussion.
@Andomar's answer got me 99% of the way there with a very similar problem. However, I found that SQL Server 2014 failed to import the last row because the last field didn't have the newline characters \r\n.
So my format file looked more like:
12.0
4
1 SQLCHAR 0 50 "\"" 0 unused ""
2 SQLCHAR 0 50 "\",\"" 1 col1 ""
3 SQLCHAR 0 50 "\",\"" 2 col2 ""
4 SQLCHAR 0 50 "\"" 3 col3 ""
And so for my file, which had a row with field names, the import SQL became:
BULK INSERT MyTable
FROM 'C:\mypath\datafile.csv'
WITH (
FIRSTROW = 2,
FORMATFILE = 'C:\mypath\formatfile.cfmt',
ROWTERMINATOR = '\r\n'
)
The actual CSV had 40 fields, so it was helpful to read on Microsoft's website that it is not necessary to write the real column names (col1 - col40 works just fine), and also that the fourth parameter in each line (50 in the example) just needs to be at least the maximum field length, not the exact length.
I am exporting to a text file from BTEQ and I am getting whitespace padding to the maximum length of each of my columns in my output text file. For example, I just want the customer_name and post_code columns to look like:
Mr Always Teste,AB10 1AB,
but in my file it is like:
Mr Always Teste ,AB10 1AB ,
I just want the data I need and not all the whitespace at the end as I need to import the data cleanly after exporting.
My export script contains:
.SET TITLEDASHES OFF
.SET SEPARATOR ','
.SET FORMAT OFF (ON MAKES IT ALL WEIRD)
.SET NULL ''
.SET WIDTH 1000
Forgive me, I can't paste any data as it's on another PC and it's all confidential anyway.
Example column definitions are (they are all like this with varying lengths):
Name: customer_name Type: CV Format: X(208) Max Length: 208
Like I say, this and all the other columns pad out to their length with whitespace in the output file. Anything I can do about it?
REPORT format in BTEQ is fixed width; setting the SEPARATOR will not remove the padding spaces. But you can return a single column instead, using the CSV function to build a delimited string:
with cte as
(
  select * from tab
)
select *
from table(CSV(new variant_type(cte.col1 -- list all columns here
                               ,cte.col2
                               ,cte.col3)
              ,','  -- separator
              ,'"') -- string delimiter
           returns (s varchar(10000))) as t;
This is much easier and better performing than CONCAT & COALESCE all columns.
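For comparison, the manual alternative mentioned above would look roughly like this; a sketch only, where the column and table names (customer_name, post_code, tab) are carried over from the question and answer as assumptions:
-- Trim and concatenate each column by hand, handling NULLs with COALESCE
select coalesce(trim(customer_name), '') || ',' ||
       coalesce(trim(post_code), '')     || ','
from tab;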
I want to query data from a tab delimited file using SQL Server and OPENROWSET.
I have the following sample source file:
FirstName LastName EMail
Marny Haney sed.dictum.eleifend#sem.com
Alexa Carpenter Vivamus.non.lorem#consectetuereuismod.com
Wyatt Mosley est#tortoratrisus.org
Cedric Johns lectus.a.sollicitudin#quisurna.ca
Lavinia Fischer nibh#insodales.net
Vera Marshall scelerisque#sapienAeneanmassa.co.uk
Beau Frost vel.quam.dignissim#mauris.net
Halla Fisher amet.metus.Aliquam#ullamcorpervelit.co.uk
Sierra Randall Nulla#magnis.net
Noel Malone semper#porttitor.org
I'm using the following format file:
12.0
3
1 SQLCHAR 0 5 "" 1 FirstName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 5 "" 2 LastName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 27 "0x0A" 3 EMail SQL_Latin1_General_CP1_CI_AS
I'm trying to query the data from the file with the following statement:
SELECT *
FROM
OPENROWSET(
BULK 'C:\data\Source\sample_data.dwo'
,FORMATFILE= 'C:\data\Format\sample_data.FMT'
,FIRSTROW = 2
) AS a
Unfortunately, the query returns an empty result. I don't get an error.
As far as I understand, the default terminator for fields is \t. I also tried to use t and \t explicitly as a terminator, but still no result.
Any suggestions what I can try next?
Link to both files:
https://github.com/LordTakeshiXVII/files/blob/master/sample_data.FMT
https://github.com/LordTakeshiXVII/files/blob/master/sample_data.dwo
You need to adapt your format file:
First, change the max length of the fields to something appropriate (100 in the example); you can also set it to zero for unlimited input length.
Second, set the terminator for the first two fields to \t and for the third field to \r\n:
12.0
3
1 SQLCHAR 0 100 "\t" 1 FirstName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "\t" 2 LastName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\r\n" 3 EMail SQL_Latin1_General_CP1_CI_AS
Here you can find more information on format files: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/create-a-format-file-sql-server?view=sql-server-2017
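Once the terminators are right, the OPENROWSET query from the question returns the rows; here is a hedged sketch of pulling them into a temp table for further use (#Contacts is a hypothetical name):
-- Load the parsed rows into a temp table using the paths from the question
SELECT a.FirstName, a.LastName, a.EMail
INTO #Contacts
FROM OPENROWSET(
    BULK 'C:\data\Source\sample_data.dwo',
    FORMATFILE = 'C:\data\Format\sample_data.FMT',
    FIRSTROW = 2
) AS a;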
I have a table that contains the names of various recording artists. One of them has a dash in their name. If I run the following:
Select artist
, substring(artist,8,1) as substring_artist
, ascii(substring(artist,8,1)) as ascii_table
, ascii('-') as ascii_dash_key /*The dash key next to zero */
, len(artist) as len_artist
From [dbo].[mytable] where artist like 'Sleater%'
The results returned seem to indicate that a dash (ASCII 45) is being stored in the artist column.
However, if I change the where clause to:
From [dbo].[mytable] where artist like 'Sleater' + char(45) + '%'
I get no results returned. If I copy and paste the output from the artist column into a hex editor, I can see that the dash is actually stored as E2 80 90, the UTF-8 byte sequence for the Unicode hyphen character (U+2010).
So, I'd like to find and replace such occurrences with a standard ascii hyphen, but I'm am at a loss as to what criteria to use to find these E2 80 90 hyphens?
Your character is the Unicode hyphen; information on it here:
https://www.charbase.com/2010-unicode-hyphen
You can see that the UTF-16 code point is U+2010 (8208 decimal), so in T-SQL you can build it with
SELECT NCHAR(0x2010)
From there you can use any SQL command with that character, for example in a select like:
Select artist
From [dbo].[mytable] where artist like N'Sleater' + NCHAR(0x2010) + N'%'
or, as you want, in a
REPLACE( artist, NCHAR(0x2010), '-' )
with a "real" dash.
EDIT:
If the collation of your DB gives you trouble with NCHAR(0x2010), you can also try the literal character N'‐' that you can copy/paste from the charbase link I gave you, so:
REPLACE( artist , N'‐' , '-' )
or take it from the string here (made with the special character), so it's all done for you:
update mytable set artist=REPLACE( artist, N'‐' , '-' )
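Before running the update, you may want to list the affected rows first; a small sketch, assuming artist is an nvarchar column:
-- Find rows that still contain the Unicode hyphen U+2010
SELECT artist
FROM [dbo].[mytable]
WHERE artist LIKE N'%' + NCHAR(0x2010) + N'%'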
I don't know your table definition and COLLATION, but I'm almost sure that you are mixing NCHAR and CHAR types and converting Unicode, multi-byte characters to single-byte representations. Take a look at this demo:
WITH Demo AS
(
SELECT N'ABC'+NCHAR(0x2010)+N'DEF' T
)
SELECT
T,
CASE WHEN T LIKE 'ABC'+CHAR(45)+'%' THEN 1 ELSE 0 END [Char],
CASE WHEN T LIKE 'ABC-%' THEN 1 ELSE 0 END [Hyphen],
CASE WHEN T LIKE N'ABC‐%' THEN 1 ELSE 0 END [Unicode-Hyphen], --unicode hyphen is used here
CASE WHEN T LIKE N'ABC'+NCHAR(45)+N'%' THEN 1 ELSE 0 END [NChar],
CASE WHEN CAST(T AS varchar(MAX)) LIKE 'ABC-%' THEN 1 ELSE 0 END [ConvertedToAscii],
ASCII(NCHAR(0x2010)) ConvertedToAscii,
CAST(SUBSTRING(T, 4, 1) AS varbinary) VarbinaryRepresentation
FROM Demo
My results:
T Char Hyphen Unicode-Hyphen NChar ConvertedToAscii ConvertedToAscii VarbinaryRepresentation
------- ----------- ----------- -------------- ----------- ---------------- ---------------- --------------------------------------------------------------
ABC‐DEF 0 0 1 0 1 45 0x1020
The UTF-8 representation E2 80 90 (3 bytes) encodes the same character as code point U+2010 in Unicode.
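If you want to see which code point is actually stored, UNICODE() returns the real code point instead of the down-converted ASCII value; a sketch against the question's table, assuming artist is an nvarchar column:
SELECT artist,
       UNICODE(SUBSTRING(artist, 8, 1)) AS code_point -- 8208 (0x2010) for the Unicode hyphen, 45 for '-'
FROM [dbo].[mytable]
WHERE artist LIKE N'Sleater%'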
[Screenshot of the error I'm getting]
This is to insert into an already created table:
CREATE TABLE SERIES(
SERIES_NAME VARCHAR(225) NOT NULL UNIQUE, --MADE VARCHAR(225) & UNIQUE FOR FK REFERENCE
ONGOING_SERIES BIT, --BOOL FOR T/F IF SERIES IS COMPLETED OR NOT
RUN_START DATE,
RUN_END DATE,
MAIN_CHARACTER VARCHAR(20),
PUBLISHER VARCHAR(12),
S_ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
CONSTRAINT chk_DATES CHECK (RUN_START < RUN_END)
)
and the text file is organized as:
GREEN LANTERN,0,2005-07-01,2011-09-01,HAL JORDAN,DC
SPIDERMAN,0,2005-07-01,2011-09-01,PETER PARKER,MARVEL
I have already tried adding commas to the end of each line in the .txt file.
I have also tried adding ,' ' to the end of each line.
Any suggestions?
Indeed, the KEEPIDENTITY option prevents the bulk insert from taking place. Removing the option, however, won't resolve the problem.
Msg 4864, Level 16, State 1, Line 13
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 7 (S_ID).
The bulk insert expects values for all the columns, including the IDENTITY column. Another way of solving this issue is adding a format file for the text file; see MS Docs - Use a Format File to Bulk Import Data.
You can create a format file for your text file with the following command.
bcp yourdatabase.dbo.series format nul -c -f D:\test.fmt -t, -T
Remove the last row (the one for S_ID), update the number of columns to 6, and replace the comma terminator on the new last column with the row terminator. The result will look as shown below.
13.0
6
1 SQLCHAR 0 255 "," 1 SERIES_NAME SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 1 "," 2 ONGOING_SERIES ""
3 SQLCHAR 0 11 "," 3 RUN_START ""
4 SQLCHAR 0 11 "," 4 RUN_END ""
5 SQLCHAR 0 510 "," 5 MAIN_CHARACTER SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 510 "\r\n" 6 PUBLISHER SQL_Latin1_General_CP1_CI_AS
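With that format file in place, the load itself could look roughly like this; the file paths are assumptions:
-- The format file maps the six fields in the file to the six non-IDENTITY columns
BULK INSERT SERIES
FROM 'D:\test\series.txt'
WITH (FORMATFILE = 'D:\test\test.fmt')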
Remove KEEPIDENTITY from your BULK INSERT, since that specifies that you want to use the values in the source text file as your IDENTITY.
If this still fails, try adding a VIEW on the table that excludes the IDENTITY field, and INSERT into that instead, e.g.:
CREATE VIEW SeriesBulkInsertTarget
AS
SELECT Series_Name,
Ongoing_Series,
Run_Start,
Run_End,
Main_Character,
Publisher
FROM SERIES
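The load can then target the view, so the IDENTITY column is left out of the mapping entirely; a minimal sketch assuming the comma-separated file from the question (the path is an assumption):
-- Bulk insert into the view; SQL Server generates S_ID automatically
BULK INSERT SeriesBulkInsertTarget
FROM 'D:\test\series.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\r\n'
)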
When using nzload for fixed-width files where the first row contains the column headers, skipRows works fine, but only if that first row has the same number of elements as the data rows:
1HelloWorld2011-12-07
1HelloWorld2011-12-07
2Netezza 2010-02-16
When the first row is a single piece of text that I want nzload to skip, nzload throws an error because the row doesn't have the same number of elements:
DummyRow
1HelloWorld2011-12-07
2Netezza 2010-02-16
Script example:
nzload -t "textFixed_tbl" -format fixed -layout "col1 int bytes 1, col2 char(10) bytes 10, col3 date YMD '-' bytes 10" -df /tmp/fixed_width.dat -bf /tmp/testFixedWidth.bad -lf /tmp/testFixedWidth.nzlog -skipRows 1 -maxErrors 1
Data File
DummyRow
1HelloWorld2011-12-07
2Netezza 2010-02-16
Error:
Error: Operation canceled
Error: External Table : count of bad input rows reached maxerrors limit
Record Format: FIXED Record Null-Indicator: 0
Record Length: 0 Record Delimiter:
Record Layout: 3 zones : "col1" INT4 DECIMAL BYTES 1 NullIf &&1 = '', "col2" CHAR(10) INTERNAL BYTES 10, "col3" DATE YMD '-' BYTES 10 NullIf &&3 = ''
Statistics
number of records read: 1
number of bytes read: 22
number of records skipped: 0
number of bad records: 1
number of records loaded: 0
Elapsed Time (sec): 0.0
The skipRows option for nzload / external tables discards the specified number of rows, but it still processes the skipped rows. Consequently the skipped rows must be properly formed, so this behavior won't act as you hoped/intended.
This is noted in the documentation:
You cannot use the SkipRows option for header row processing in a data file, because even the skipped rows are processed first. Therefore, data in the header rows should be valid with respect to the external table definition