BULK INSERT from file which has extra values


How can I tell the format file that a column in the CSV file should be ignored? I tried putting 0s, and I get an invalid column number error:
Format file:
10.0
9
0 SQLCHAR 0 12 "\t" 1 ID ""
2 SQLCHAR 0 10 "\t" 2 Symbol SQL_Latin1_General_CP1_CI_AS
0 SQLCHAR 0 11 "\t" 3 DateDone ""
0 SQLCHAR 0 19 "\t" 4 TimeDone ""
4 SQLCHAR 0 10 "\t" 5 Side SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 12 "\t" 6 Size ""
6 SQLCHAR 0 41 "\t" 7 Price ""
7 SQLCHAR 0 10 "\t" 8 Exchange SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 12 "\r\n" 9 Position ""
Sample row of the CSV data:
AccountName | ExecSymbol | ExecDateTime | ExecSide | ExecSize | ExecPrice | ExecExchange | PositionSize
PRIMU$      | SCO        | 1/2/2013     | B        | 100      | 38.87     | ARCA         | 100

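The easiest way is to create a format (.fmt) file that specifies what you want to import and what you want to ignore. In a non-XML format file, the first column (host file field order) must number the data-file fields sequentially from 1, so the 0s don't belong there; to skip a field, keep its number and set the sixth column (server column order) to 0 instead: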
https://msdn.microsoft.com/en-us/library/ms179250.aspx
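As a rough sketch of the skip rule, the Python snippet below writes a non-XML format file for a hypothetical three-field, tab-delimited data file whose first field is skipped. The `Field` tuple and `build_format_file` helper are illustrative names, not part of any tool, and the table column numbers (2 and 9) are placeholders. The only rules the sketch relies on are that host field numbers run 1..n in the first column and that a server column order of 0 in the sixth column skips the field.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """One field in the data file (illustrative helper type, not a bcp API)."""
    name: str              # server column name; still required when the field is skipped
    max_len: int           # host file data length
    terminator: str        # written verbatim, so pass r"\t" or r"\r\n"
    server_col: int        # table column order; 0 means "skip this data-file field"
    collation: str = ""    # e.g. "SQL_Latin1_General_CP1_CI_AS"; "" means none


def build_format_file(version: str, fields: list[Field]) -> str:
    """Render a non-XML format file whose host field numbers are always 1..n."""
    lines = [version, str(len(fields))]
    for host_order, f in enumerate(fields, start=1):
        collation = f.collation or '""'
        lines.append(
            f'{host_order} SQLCHAR 0 {f.max_len} "{f.terminator}" '
            f"{f.server_col} {f.name} {collation}"
        )
    return "\n".join(lines) + "\n"


# Hypothetical three-field file: skip AccountName, load the other two fields.
example = build_format_file("10.0", [
    Field("AccountName", 12, r"\t", 0),
    Field("Symbol", 10, r"\t", 2, "SQL_Latin1_General_CP1_CI_AS"),
    Field("Position", 12, r"\r\n", 9),
])
print(example)
```

The skipped field keeps its place in the numbering, so every later field still lines up with the data file.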

Related

Unicode/Collation Issue in Openrowset SQL Server

My CSV has text like this:
Côté fenêtres,
carré
I'm trying to open this CSV file using OPENROWSET in SQL Server like this:
select * from openrowset(BULK 'C:\Import_Orders\Files\PO.csv',
FORMATFILE = 'C:\Import_Orders\Format\Cust_441211.fmt.txt') as PO
But the result is like this:
C+¦t+¬ fen+¬tres,
Carr+¬
How can I tackle this issue? Let me know if I need to add anything more to this question.
SQL Server version:
Microsoft SQL Server 2017 (RTM-CU29-GDR) (KB5014553) - 14.0.3445.2 (X64)
This is the format file:
11.0
8
1 SQLCHAR 0 250 "|" 1 PARTNO ""
2 SQLCHAR 0 250 "|" 2 CODE ""
3 SQLCHAR 0 250 "|" 3 PRICEKG ""
4 SQLCHAR 0 250 "|" 4 FOOTKG ""
5 SQLCHAR 0 250 "|" 5 LENGTH ""
6 SQLCHAR 0 250 "|" 6 QTY ""
7 SQLCHAR 0 250 "|" 7 COLOR ""
8 SQLCHAR 0 250 "\r\n" 8 TOTKG ""

(1) You can try adding the option CODEPAGE = '65001', which declares the file as UTF-8 so its characters are converted correctly. Code page 65001 is supported for bulk loading starting with SQL Server 2016, so it applies to the SQL Server 2017 instance in the question.
(2) You may also see SQLNCHAR suggested instead of SQLCHAR in the format file, but that is only correct for a Unicode file in UTF-16 encoding. For any other text file, including UTF-8, specify SQLCHAR for all fields.
SELECT * FROM openrowset(BULK 'C:\Import_Orders\Files\PO.csv',
FORMATFILE = 'C:\Import_Orders\Format\Cust_441211.fmt.txt',
CODEPAGE = '65001') as PO;
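To see why the code page matters, this small Python sketch takes the sample text, encodes it as UTF-8 the way it sits in the CSV, and decodes it with single-byte code pages instead. With code page 437 (the OEM code page on many US-English Windows systems), "Côté" comes out as C├┤t├⌐ and "fenêtres" as fen├¬tres. The C+¦t+¬ in the question looks like that result with the box-drawing characters displayed as look-alikes, but that is an inference, not something the question confirms. The helper name `misread` is made up for the illustration.

```python
def misread(text: str, codepage: str) -> str:
    """Encode text as UTF-8 (as stored in the CSV), then decode it with another code page."""
    return text.encode("utf-8").decode(codepage)


for sample in ["Côté fenêtres,", "carré"]:
    print("cp437 :", misread(sample, "cp437"))    # OEM code page on many US-English systems
    print("cp1252:", misread(sample, "cp1252"))   # Windows ANSI Latin-1 code page
    print("utf-8 :", misread(sample, "utf-8"))    # round-trips unchanged
```

Each accented letter is two bytes in UTF-8, so a single-byte code page turns it into two unrelated characters, which is the pattern visible in the question's output.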

SQL Server bulk insert via file separated with special characters

I have a file to import into a SQL Server table, and I'm importing it by running the bcp command line from C#. When I use a comma (,) as the separator in the format file it works fine, but when I use a special character as the separator it fails with the error below:
XML parsing: line 2, character 0, incorrect document syntax
Note: for some reason Stack Overflow doesn't display my special character, so please copy the format file into a text editor such as Notepad++.
My format file for the special-character separator is as follows:
12.0
20
1 SQLCHAR 0 0 "" 2 Column1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 0 "" 3 Column2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
4 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
5 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
6 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
7 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
8 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
9 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
10 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
11 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
12 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
13 SQLCHAR 0 0 "" 7 Column4 SQL_Latin1_General_CP1_CI_AS
14 SQLCHAR 0 0 "" 6 Column5 SQL_Latin1_General_CP1_CI_AS
15 SQLCHAR 0 0 "" 5 Column6 SQL_Latin1_General_CP1_CI_AS
16 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
17 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
18 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
19 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
20 SQLCHAR 0 0 "\r\n" 0 XyzColumnToBypass ""
I also tried the following query in SQL Server but got the same error:
BULK INSERT mySqlServerTable
FROM '\\machinexxx\Shared\BCP\TextfileToImport_SpecialChar.dat'
WITH(FORMATFILE = '\\machinexxx\Shared\BCP\formatfile.fmt')
I don't understand why it reads my non-XML format file as an XML format file when the format file's encoding is ASCII.
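The thread does not show which separator is used, so the cause can't be pinned down here, but one cheap check is whether the format file on disk really is ASCII. A separator outside the 7-bit ASCII range cannot be stored in an ASCII file, so an editor may have saved it as UTF-8 (possibly with a byte-order mark) or UTF-16. The hypothetical Python helper below reports a BOM, NUL bytes, and any non-ASCII bytes; it only inspects the file and makes no claim about how bcp decides whether a format file is XML.

```python
from pathlib import Path

BOMS = {
    b"\xef\xbb\xbf": "UTF-8 BOM",
    b"\xff\xfe": "UTF-16LE BOM",
    b"\xfe\xff": "UTF-16BE BOM",
}


def describe_format_file(data: bytes) -> list[str]:
    """List a leading BOM, NUL bytes, and any non-ASCII bytes, line by line."""
    findings = []
    for bom, label in BOMS.items():
        if data.startswith(bom):
            findings.append(f"starts with {label}")
    if b"\x00" in data:
        findings.append("contains NUL bytes (possibly UTF-16 without a BOM)")
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        high = sorted({b for b in line if b > 0x7F})
        if high:
            findings.append(f"line {line_no}: non-ASCII bytes {[hex(b) for b in high]}")
    return findings


if __name__ == "__main__":
    import sys
    # Usage: python check_fmt.py formatfile.fmt  (hypothetical script name)
    for path in sys.argv[1:]:
        print(path, describe_format_file(Path(path).read_bytes()) or "ASCII only, no BOM")
```

An empty result means the file is plain 7-bit ASCII; anything else shows what the editor actually saved.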

How to import a CSV file into Sybase ASE with fewer columns than the table has, using a format file?

I'm using bcp to load data into a Sybase ASE table under UNIX.
I have a temp.csv file with 4 columns:
name | id | attr1 | attr2
FIERA|20138||
SECOR|73328||
WELLINGTON|92413||
The template table, which has two extra columns, is defined as follows:
create table template
(name varchar(10),
id int,
attr1 varchar(5) default '',
attr2 varchar(5) default '',
creation_time datetime null,
active_flag char(1) null)
bcp.fmt format file:
10.0
7
1 SYBCHAR 0 10 "|" 1 name
2 SYBINT4 0 4 "|" 2 id
3 SYBCHAR 0 5 "|" 3 attr1
4 SYBCHAR 0 5 "|" 4 attr2
5 SYBDATETIME 0 8 "|" 0 creation_time
6 SYBCHAR 0 1 "|" 0 active_flag
7 SYBCHAR 0 10 "\r\n" 0 end
My goal is to import all values from temp.csv, including blanks, into the template table, leaving the last two fields, creation_time and active_flag, as NULL.
I use this command:
bcp client..template in temp.csv -F2 -f bcp.fmt -U -P -S
However, I always get the following error:
Unexpected EOF encountered in BCP data-file.
bcp copy in partially failed
I double-checked temp.csv: every row terminator is \r\n, as listed in the format file. Why do I still get an unexpected EOF error?
I've struggled with this many times and every attempt failed. Could anybody help me out? Thanks.
===================update on Feb.06=================
Thank you, James. I updated the format file as you indicated:
10.0
6
1 SYBCHAR 0 0 "|" 5 creation_time
2 SYBCHAR 0 0 "|" 6 active_flag
3 SYBCHAR 0 10 "|" 1 name
4 SYBCHAR 0 4 "|" 2 id
5 SYBCHAR 0 5 "|" 3 attr1
6 SYBCHAR 0 5 "\r\n" 4 attr2
Then I got the error "Incorrect host-column number found in bcp format file".
===========================================================================
============SOLUTION IS HERE=============
First solution:
10.0
4
1 SYBCHAR 0 10 "|" 1 name
2 SYBCHAR 0 4 "|" 2 id
3 SYBCHAR 0 5 "|" 3 attr1
4 SYBCHAR 0 5 "\r\n" 4 attr2
Second solution:
10.0
6
1 SYBCHAR 0 10 "|" 1 name
2 SYBCHAR 0 4 "|" 2 id
3 SYBCHAR 0 5 "|" 3 attr1
4 SYBCHAR 0 5 "\r\n" 4 attr2
5 SYBCHAR 0 0 "" 5 active_flag
6 SYBCHAR 0 0 "" 6 creation_time
Both work perfectly.
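Comparing the attempts in this thread, the two working format files share something the failing ones lack: the number of fields that actually consume data (those with a terminator or a non-zero length) equals the number of |-separated values in each row (4). The first attempt has 7 such fields and the version from the Feb. 06 update has 6. The Python sketch below is a heuristic check built on that observation, not a description of how bcp parses files; `consuming_fields` and `data_fields` are hypothetical names.

```python
import re

# host order, host type, prefix length, data length, "terminator", server order, name
FIELD_RE = re.compile(r'^\s*(\d+)\s+(\w+)\s+(\d+)\s+(\d+)\s+"([^"]*)"\s+(\d+)\s+(\S+)')


def consuming_fields(fmt_text: str) -> int:
    """Count format-file fields that read data: a terminator or a non-zero length."""
    count = 0
    for line in fmt_text.splitlines()[2:]:   # skip the version and field-count lines
        m = FIELD_RE.match(line)
        if m is None:                         # ignore lines that aren't field definitions
            continue
        length, terminator = int(m.group(4)), m.group(5)
        if terminator or length > 0:
            count += 1
    return count


def data_fields(row: str, delimiter: str = "|") -> int:
    """Count the delimited values in one data row (row terminator removed)."""
    return len(row.split(delimiter))


second_solution = r'''10.0
6
1 SYBCHAR 0 10 "|" 1 name
2 SYBCHAR 0 4 "|" 2 id
3 SYBCHAR 0 5 "|" 3 attr1
4 SYBCHAR 0 5 "\r\n" 4 attr2
5 SYBCHAR 0 0 "" 5 active_flag
6 SYBCHAR 0 0 "" 6 creation_time'''

print(consuming_fields(second_solution), data_fields("FIERA|20138||"))   # 4 4
```

Fields 5 and 6 of the second solution have neither a terminator nor a length, so they read nothing from the file and their columns are left to receive NULL.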

There are a few problems with your format file.
According to Sybase documentation, you should be using SYBCHAR exclusively:
Host file datatype
The host file datatype refers to the storage format of the field in
the host data file, not the datatype of the database table column.
The DBMS knows the datatype of its columns; it does not know how the input file is encoded.
Remember that the first element of each line describing a column (line 3 onward) indicates the field's position in the data file. Your data file has no fields 5 to 7, and I suspect that is what provokes the error message.
Also, AFAIK 0 is not a valid colid in the target table. If you want to load NULL into a particular column, say that the field starts at the beginning and has no length:
1 SYBCHAR 0 0 "|" 7 active_flag
Finally, there's no need to account for the row terminator in the format file; you specify it on the bcp command line with the -r option. If you're using Windows, IIRC that would become:
bcp client..template in temp.csv -F2 -f bcp.fmt -r \r\n -U -P -S
In Linux of course you'd have to quote or escape the backslashes.
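On Linux an unquoted backslash is consumed by the shell, so \r\n would reach bcp as rn; that is why quoting is needed. One way to sidestep shell quoting entirely is to launch bcp from a script with an argument list, so \r\n reaches bcp as the literal four characters. Whether bcp then interprets them as CR LF is the answer's recollection, not something verified here. The Python sketch below only builds the arguments using the flags from the question's command; `build_bcp_in_args` is a hypothetical helper, the user, password, and server values are placeholders, and `shlex.join` shows the correctly quoted form to type in a POSIX shell.

```python
import shlex


def build_bcp_in_args(table: str, data_file: str, fmt_file: str, first_row: int,
                      row_terminator: str, user: str, password: str,
                      server: str) -> list[str]:
    """Build a bcp 'in' argument list using only flags from the question's command."""
    return [
        "bcp", table, "in", data_file,
        f"-F{first_row}",
        "-f", fmt_file,
        "-r", row_terminator,   # handed to bcp as-is when run without a shell
        "-U", user,
        "-P", password,
        "-S", server,
    ]


args = build_bcp_in_args("client..template", "temp.csv", "bcp.fmt", 2,
                         r"\r\n", "USER", "PASSWORD", "SERVER")  # placeholders
print(shlex.join(args))   # the equivalent, correctly quoted POSIX shell command
# To run it: import subprocess; subprocess.run(args, check=True)
```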
Edit: for clarity, here's what I think your file should look like:
10.0
6
1 SYBCHAR 0 0 "|" 5 creation_time
1 SYBCHAR 0 0 "|" 6 active_flag
1 SYBCHAR 0 10 "|" 1 name
2 SYBCHAR 0 4 "|" 2 id
3 SYBCHAR 0 5 "|" 3 attr1
4 SYBCHAR 0 5 "" 4 attr2
If that doesn't work, you'll have to find someone who, um, knows the answer. I don't have a system handy to test on.
Nothing prevents any part of the data file from being mapped to many columns. In field 1 of your format file, though, you mention data file columns 5 and 6, but your data file has only 4 columns. I think that's what the error message is telling you.
Do you mean all the datatypes I put in the format file should be 'SYBCHAR'?
Yes. The format file can describe text or binary files. Your file is text, so all your data (in the file) are SYBCHAR.

Importing utf-8 encoded data from csv to SQL Server using bulk insert

I am trying to import raw data that I have in .csv format to my table in SQL Server. My table is called [raw].[sub_brand_channel_mapping].
The last column, ImportFileId, is generated by my Python code: I first set every row of that column to the generated default value. Then I use bulk insert to import my UTF-8 CSV data into the table. However, a lot of special characters change during the process. I am using a format file like this:
10.0
8
1 SQLCHAR 0 100 "\t" 1 sub_brand_id ""
2 SQLCHAR 0 1024 "\t" 2 sub_brand_name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\t" 3 channel_country_id ""
4 SQLCHAR 0 1024 "\t" 4 channel_id ""
5 SQLCHAR 0 1024 "\t" 5 channel_name SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 256 "\t" 6 status ""
7 SQLCHAR 0 256 "\t" 7 eff_start_date ""
8 SQLCHAR 0 256 "\r\n" 8 eff_end_date ""
My bulk insert command looks like this:
bcp "{table}" in "{file}" {connect_string} -f {format_file} -F {first_row} -b 10000000 -e {error_file} -q -m {max_errors}
My CSV files use "\t" as the delimiter. I need to import the names exactly, without any change. What should I do?
P.S. I tried converting my UTF-8 CSV to UTF-16LE and then using "-w" in my bcp command, but that produced a lot of errors; in short, it didn't work. Please advise me on how to do this.
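If you retry the UTF-16 route, it is worth ruling out the conversion step itself. The hypothetical Python helper below re-encodes a UTF-8 file (with or without a BOM) as UTF-16LE with a BOM. It reads raw bytes so the \r\n row terminators survive unchanged, and it fails loudly on bytes that are not valid UTF-8 instead of silently replacing them. This covers only the conversion; a UTF-16 file would also need SQLNCHAR rather than SQLCHAR fields in the format file. Alternatively, on SQL Server 2016 and later, keeping the file as UTF-8 and passing -C 65001 to bcp is the documented way to declare a UTF-8 code page, though that is untested here.

```python
from pathlib import Path

UTF16LE_BOM = b"\xff\xfe"


def utf8_to_utf16le(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 text file (BOM optional) as UTF-16LE with a BOM."""
    # Decode raw bytes rather than opening in text mode, which would translate
    # "\r\n" to "\n". "utf-8-sig" drops a leading UTF-8 BOM if there is one and
    # raises UnicodeDecodeError on invalid UTF-8 instead of guessing.
    text = src.read_bytes().decode("utf-8-sig")
    dst.write_bytes(UTF16LE_BOM + text.encode("utf-16-le"))


# Example (hypothetical file names):
# utf8_to_utf16le(Path("mapping.csv"), Path("mapping_utf16.csv"))
```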

How to add column to SQL Server bcp query?

I'm a beginner in SQL Server. When I write this query:
select ANUMBER
from CDRTABLE
it shows me the data. To add a new column to the result, I changed the query to this:
select '028', ANUMBER
from CDRTABLE
This query adds a new column to the query result. I then wrote this bcp command to save the results to a text file:
EXEC xp_cmdshell 'bcp "SELECT rtrim(ltrim(ANUMBER)),rtrim(ltrim(BNUMBER)),rtrim(ltrim(DATE)),rtrim(ltrim(TIME)),rtrim(ltrim(DURATION)) FROM [myTestReport].[dbo].[CDRTABLE]" queryout f:\newOUTPUT.txt -S DESKTOP-A5CFJSH\MSSQLSERVER1 -Umyusername -Pmypassword -f "f:\myFORMAT.fmt" '
and my format file is this:
9.0
5
1 SQLNCHAR 0 5 "," 1 ANUMBER ""
2 SQLNCHAR 0 10 "," 2 BNUMBER ""
3 SQLNCHAR 0 10 "," 3 DATE ""
4 SQLNCHAR 0 10 "," 4 TIME ""
5 SQLNCHAR 0 10 "\r\n" 5 DURATION ""
Everything works, but I want to add a new column to the bcp result, for example '028'. How can I do that? Thanks.

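Because it looks like you're adding a character string to the front of the SELECT, something like this should work, provided the query passed to bcp also returns that string as its first column (for example SELECT ''028'', rtrim(ltrim(ANUMBER)), ... with the quotes doubled because the query sits inside the xp_cmdshell string):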
9.0
6
1 SQLCHAR 0 3 "," 1 NEWCOLUMN SQL_Latin1_General_CP1_CI_AS
2 SQLNCHAR 0 5 "," 2 ANUMBER ""
3 SQLNCHAR 0 10 "," 3 BNUMBER ""
4 SQLNCHAR 0 10 "," 4 DATE ""
5 SQLNCHAR 0 10 "," 5 TIME ""
6 SQLNCHAR 0 10 "\r\n" 6 DURATION ""
See https://msdn.microsoft.com/en-us/library/ms191479.aspx for more details on the format of the format file.
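The change in this answer is mechanical: the new field becomes host field 1 and result column 1, and every existing field moves up by one in both numbering columns. Assuming the bcp query also returns '028' as its first column, a hypothetical Python helper can do the renumbering. `prepend_field` and its parameters are illustrative; a server column order of 0 (a skipped field) is left as 0, and the collation is written without quotes, as in the Microsoft examples for non-XML format files.

```python
import re

# host order, middle (type, prefix length, data length, "terminator"), server order, rest
FIELD_RE = re.compile(r'^(\d+)(\s+\S+\s+\d+\s+\d+\s+"[^"]*"\s+)(\d+)(\s+.*)$')


def prepend_field(fmt_text: str, host_type: str, length: int, terminator: str,
                  name: str, collation: str = '""') -> str:
    """Add a new first field and shift existing host/server numbers up by one."""
    lines = fmt_text.strip().splitlines()
    version, count, fields = lines[0], int(lines[1]), lines[2:]
    shifted = []
    for line in fields:
        m = FIELD_RE.match(line.strip())
        if m is None:
            raise ValueError(f"unrecognized format-file line: {line!r}")
        host, middle, col, rest = m.groups()
        new_col = int(col) + 1 if int(col) else 0   # a skipped field (0) stays 0
        shifted.append(f"{int(host) + 1}{middle}{new_col}{rest}")
    first = f'1 {host_type} 0 {length} "{terminator}" 1 {name} {collation}'
    return "\n".join([version, str(count + 1), first, *shifted]) + "\n"


question_fmt = r'''9.0
5
1 SQLNCHAR 0 5 "," 1 ANUMBER ""
2 SQLNCHAR 0 10 "," 2 BNUMBER ""
3 SQLNCHAR 0 10 "," 3 DATE ""
4 SQLNCHAR 0 10 "," 4 TIME ""
5 SQLNCHAR 0 10 "\r\n" 5 DURATION ""'''

print(prepend_field(question_fmt, "SQLCHAR", 3, ",", "NEWCOLUMN",
                    "SQL_Latin1_General_CP1_CI_AS"))
```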