Unicode/Collation Issue in Openrowset SQL Server

My CSV has text like this:
Côté fenêtres,
carré
I'm trying to read this CSV file with OPENROWSET in SQL Server, like this:
select * from openrowset(BULK 'C:\Import_Orders\Files\PO.csv',
FORMATFILE = 'C:\Import_Orders\Format\Cust_441211.fmt.txt') as PO
But the result is like this:
C+¦t+¬ fen+¬tres,
Carr+¬
How can I tackle this issue? Let me know if I need to add anything more to this question.
SQL Version -
Microsoft SQL Server 2017 (RTM-CU29-GDR) (KB5014553) - 14.0.3445.2 (X64)
This is the format file:-
11.0
8
1 SQLCHAR 0 250 "|" 1 PARTNO ""
2 SQLCHAR 0 250 "|" 2 CODE ""
3 SQLCHAR 0 250 "|" 3 PRICEKG ""
4 SQLCHAR 0 250 "|" 4 FOOTKG ""
5 SQLCHAR 0 250 "|" 5 LENGTH ""
6 SQLCHAR 0 250 "|" 6 QTY ""
7 SQLCHAR 0 250 "|" 7 COLOR ""
8 SQLCHAR 0 250 "\r\n" 8 TOTKG ""

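(1) Add the parameter CODEPAGE = '65001' so that the file is read as UTF-8. Without it, OPENROWSET(BULK ...) falls back to its default code page (OEM), which is what turns each accented character into a pair of unrelated symbols. Code page 65001 is supported from SQL Server 2016 onward, so it is available on your 2017 instance.
(2) Keep SQLCHAR in the format file. SQLCHAR is the right type for every field of a text file unless the file is encoded as UTF-16, in which case you should use SQLNCHAR instead.
With the code page added, the query becomes: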
SELECT * FROM openrowset(BULK 'C:\Import_Orders\Files\PO.csv',
FORMATFILE = 'C:\Import_Orders\Format\Cust_441211.fmt.txt',
CODEPAGE = '65001') as PO;
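
The garbled output in the question is consistent with UTF-8 bytes being decoded with an OEM code page. The Python sketch below reproduces it under two assumptions: that the OEM code page is 437, and that the resulting box-drawing characters are then approximated by their nearest Windows-1252 characters. The BEST_FIT table is hand-written for illustration and covers only the three characters involved; it is not SQL Server's actual conversion table.

```python
# Sketch: reproduce the mojibake from the question.
original = "Côté fenêtres"

# The CSV on disk is assumed to be UTF-8, so each accented letter is two bytes.
raw = original.encode("utf-8")

# Decoding those bytes with OEM code page 437 (assumed) yields two symbols per letter.
misread = raw.decode("cp437")

# Assumed best-fit approximation of the box-drawing symbols into Windows-1252.
BEST_FIT = {"├": "+", "┤": "¦", "⌐": "¬"}
approximated = "".join(BEST_FIT.get(ch, ch) for ch in misread)

# Decoding with the right code page (what CODEPAGE = '65001' asks for) restores the text.
restored = raw.decode("utf-8")

print(misread)
print(approximated)
print(restored)
```

If the second printed line matches what the query returns, the file is almost certainly UTF-8 and the code page setting is the thing to change, not the collation.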

Related

bulk insert (SQL) format file last line

I have the following csv I wish to import into my db
"LE";"SOURCE";"VAR_SCTARGET_NAME"
"B";"A/K";"A/K"
"A";"A/B";"A/B"
"A";"A/B";"A/C"
I arranged the following format file
10.0
3
1 SQLCHAR 0 0 "\";\"" 1 A ""
2 SQLCHAR 0 0 "\";\"" 2 B ""
3 SQLCHAR 0 0 "\"\r\n\"" 3 AA ""
which works just fine, if it weren't for the last line. The output in my db is the following
LE SOURCE VAR_SCTARGET_NAME
B A/K A/K
A A/B A/B
A A/B A/C"
How can I remove the quote on the last row? I'm working on a SQL Server platform, if it can be of any help.
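
No answer is recorded for this question. As a hedged sketch of what is probably happening: the row terminator in the format file is quote, CR, LF, quote, so it relies on the opening quote of the next row, and the last row has no next row. The Python below only models "consume one terminator per field, in order"; split_rows is a hypothetical helper, and the rule that an unmatched terminator swallows the rest of the file is an assumption, not documented bulk-load behaviour. The alternative layout (a leading dummy field that eats the opening quote) is likewise an assumption to try, not a confirmed fix.

```python
def split_rows(data, terminators):
    """Consume one terminator per field, in order, as a format file describes a row."""
    rows, pos = [], 0
    while pos < len(data):
        row = []
        for term in terminators:
            end = data.find(term, pos)
            if end == -1:
                # Terminator never appears again: the field swallows the rest of the file.
                row.append(data[pos:])
                pos = len(data)
                break
            row.append(data[pos:end])
            pos = end + len(term)
        rows.append(row)
    return rows

lines = ['"LE";"SOURCE";"VAR_SCTARGET_NAME"', '"B";"A/K";"A/K"',
         '"A";"A/B";"A/B"', '"A";"A/B";"A/C"']

# Layout from the question: the row terminator includes the NEXT row's opening quote.
original_layout = ['";"', '";"', '"\r\n"']
rows_original = split_rows("\r\n".join(lines), original_layout)

# Assumed alternative: a dummy first field eats the opening quote, and each row
# ends at quote + CR + LF. Every line, including the last, must end with \r\n.
alternative_layout = ['"', '";"', '";"', '"\r\n']
rows_alternative = [row[1:] for row in
                    split_rows("\r\n".join(lines) + "\r\n", alternative_layout)]

print(rows_original[-1])     # last row under the question's layout
print(rows_alternative[-1])  # last row under the alternative layout
```

In format-file terms the alternative would be four fields instead of three, with the first field given server column order 0 so that it is not loaded.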

SQL Server bulk insert via file separated with special characters

I have a file to import into a SQL Server table. I run the bcp command from the command line in C#. With a comma (,) as the separator in the format file it works fine, but with a special character as the separator it fails with the error below.
XML parsing: line 2, character 0, incorrect document syntax
Note: for some reason Stack Overflow does not display my special character, so please copy the format file into a text editor such as Notepad++ to see it.
My format file for the special character is as follows.
12.0
20
1 SQLCHAR 0 0 "" 2 Column1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 0 "" 3 Column2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
4 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
5 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
6 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
7 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
8 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
9 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
10 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
11 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
12 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
13 SQLCHAR 0 0 "" 7 Column4 SQL_Latin1_General_CP1_CI_AS
14 SQLCHAR 0 0 "" 6 Column5 SQL_Latin1_General_CP1_CI_AS
15 SQLCHAR 0 0 "" 5 Column6 SQL_Latin1_General_CP1_CI_AS
16 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
17 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
18 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
19 SQLCHAR 0 0 "" 0 XyzColumnToBypass ""
20 SQLCHAR 0 0 "\r\n" 0 XyzColumnToBypass ""
I also tried via SQL server using below query but got the same error.
BULK INSERT mySqlServerTable
FROM '\\machinexxx\Shared\BCP\TextfileToImport_SpecialChar.dat'
WITH(FORMATFILE = '\\machinexxx\Shared\BCP\formatfile.fmt')
I don't know why my non-XML format file is being parsed as an XML format file, given that the format file is ASCII-encoded.

Importing utf-8 encoded data from csv to SQL Server using bulk insert

I am trying to import raw data that I have in .csv format to my table in SQL Server. My table is called [raw].[sub_brand_channel_mapping].
The last column ImportFileId is generated by my python code. I first set all the rows of that column as the default value generated. Then I use bulk insert to import my csv data in utf-8 format to my table in SQL Server. However, during the process a lot of special characters are changing. I am using a format file like this
10.0
8
1 SQLCHAR 0 100 "\t" 1 sub_brand_id ""
2 SQLCHAR 0 1024 "\t" 2 sub_brand_name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\t" 3 channel_country_id ""
4 SQLCHAR 0 1024 "\t" 4 channel_id ""
5 SQLCHAR 0 1024 "\t" 5 channel_name SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 256 "\t" 6 status ""
7 SQLCHAR 0 256 "\t" 7 eff_start_date ""
8 SQLCHAR 0 256 "\r\n" 8 eff_end_date ""
My bulk insert command looks like this:
bcp "{table}" in "{file}" {connect_string} -f {format_file} -F {first_row} -b 10000000 -e {error_file} -q -m {max_errors}
My csv files use "\t" as the delimiter. I need to import the names exactly, without any change. What should I do?
P.S. I did try converting my UTF-8 encoded csv to UTF-16LE and then using "-w" in my bcp command, but that produced a lot of errors and did not work. Please advise me on how to do this.
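
No answer is recorded for this question. One option worth trying, offered as an assumption and not a confirmed fix, is bcp's -C 65001 switch, which asks bcp to read the data file as UTF-8 (bcp version 13.0, shipped with SQL Server 2016, or later). Since the command above looks like a Python format string, the sketch below builds the same command as an argument list. build_bcp_args, its parameters, the file names and the server name are all hypothetical.

```python
def build_bcp_args(table, data_file, format_file, error_file,
                   connect_args, first_row=2, max_errors=10):
    """Assemble a bcp import command as an argument list (hypothetical helper)."""
    return [
        "bcp", table, "in", data_file,
        *connect_args,              # e.g. ["-S", "myserver", "-T"]
        "-f", format_file,
        "-F", str(first_row),
        "-C", "65001",              # read the data file as UTF-8
        "-b", "10000000",           # batch size, as in the question
        "-e", error_file,
        "-q",
        "-m", str(max_errors),
    ]

args = build_bcp_args(
    table="[raw].[sub_brand_channel_mapping]",
    data_file="mapping.csv",        # hypothetical file names
    format_file="mapping.fmt",
    error_file="mapping.err",
    connect_args=["-S", "myserver", "-T"],
)
print(" ".join(args))
```

The list can be passed to subprocess.run(args, check=True). Two caveats, as far as I understand the bcp documentation: a collation named in the format file takes priority over -C for that column, so sub_brand_name and channel_name may need "" in place of SQL_Latin1_General_CP1_CI_AS; and the target columns probably need to be nvarchar to hold characters that fall outside the column's own code page.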

Bulk insert fmt text qualifier

I've a BULK INSERT task that takes data from a csv and imports into a table. Only problem is that one of the columns can contain a comma, so the import doesn't work as expected.
I've tried to fix this by creating a format (fmt) file, the contents of which I've detailed below:-
9.0
6
1 SQLCHAR 0 50 "," 1 "Identifier" Latin1_General_CI_AS
2 SQLCHAR 0 50 "," 2 "Name" Latin1_General_CI_AS
3 SQLCHAR 0 50 "," 3 "Date of Birth" Latin1_General_CI_AS
4 SQLCHAR 0 50 "," 4 "Admission" Latin1_General_CI_AS
5 SQLCHAR 0 50 "," 5 "Code" Latin1_General_CI_AS
6 SQLCHAR 0 50 "\r\n" 6 "Length" Latin1_General_CI_AS
The column causing me pain is column 2 "Name".
I've tried a couple of things to identify the column as being text qualified and containing a comma but I'm not getting the result I want.
If I change to the following:-
"\"," - I get something like this -- "Richardson, Mat
This isn't correct, so I tried this instead, as suggested on some other forums / sites:-
"\",\""
This doesn't work at all and actually gives me the error
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".Bulk load: An unexpected end of file was encountered in the data file.
I've tried a few other combinations and just can't get this right. Any help or guidance would be massively appreciated.

This doesn't really answer your question about format files, but it may be a workable alternative.
Format files are incomprehensible arcana from the 1980s to me, and bulk insert is uber fussy and unforgiving, so I tend to clean the data with a few lines of PowerShell instead. Here's an example I used recently to convert a CSV to pipe-separated output, remove the quoting and allow for commas in the records:
# Import-Csv parses quoted fields, so commas inside a value survive;
# the Replace call then strips the quotes that ConvertTo-Csv adds back.
Import-Csv -Path $dirtyCsv |
ConvertTo-Csv -NoTypeInformation -Delimiter '|' |
ForEach-Object { $_.Replace('"','') } |
Out-File $cleanCsv
You get the idea...
This then simply imported:
BULK INSERT SomeTable FROM 'clean.csv' WITH ( FIRSTROW = 2, FIELDTERMINATOR = '|', ROWTERMINATOR = '\n' )
Hope this helps.
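
For readers who would rather do the same clean-up in Python, here is a minimal sketch of the idea using the standard csv module. csv_to_pipe is a hypothetical helper, and the sample row is made up apart from the quoted name taken from the question. Unlike the PowerShell version, it refuses input that already contains a pipe, since the output has no quoting to protect it.

```python
import csv
import io

def csv_to_pipe(src, dst):
    """Rewrite quoted, comma-separated text as unquoted, pipe-separated text."""
    for row in csv.reader(src):
        if any("|" in field for field in row):
            # Without quoting, a pipe inside a value would be read as a delimiter.
            raise ValueError(f"value already contains '|': {row!r}")
        dst.write("|".join(row) + "\r\n")

# Made-up sample; only the quoted name comes from the question.
dirty = io.StringIO(
    'Identifier,Name,Date of Birth,Admission,Code,Length\n'
    '1,"Richardson, Mat",1980-01-01,A,X1,3\n'
)
clean = io.StringIO()
csv_to_pipe(dirty, clean)
print(clean.getvalue())
```

With real files, open both with newline="" and an explicit encoding so that Python neither translates the line endings nor guesses the code page.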

This happens because the format file tells bulk insert that the field terminator for the column before Name is a plain comma, and that the terminator for the Name column itself is a double quote followed by a comma. The opening double quote of Name therefore stays in the data. To consume it, change the terminator of the column before Name to a comma followed by a double quote.
I believe the field terminator for the column before Name should be ",\"", where:
, = comma
\" = double quote
and the whole value is enclosed in another set of double quotes.
The Name column keeps the reverse order, double quote then comma ("\","), as its own terminator.
So it should look like this:
9.0
6
1 SQLCHAR 0 50 ",\"" 1 "Identifier" Latin1_General_CI_AS
2 SQLCHAR 0 50 "\"," 2 "Name" Latin1_General_CI_AS
3 SQLCHAR 0 50 "," 3 "Date of Birth" Latin1_General_CI_AS
4 SQLCHAR 0 50 "," 4 "Admission" Latin1_General_CI_AS
5 SQLCHAR 0 50 "," 5 "Code" Latin1_General_CI_AS
6 SQLCHAR 0 50 "\r\n" 6 "Length" Latin1_General_CI_AS

How to add column to SQL Server bcp query?

I'm a beginner in SQL Server. When I write this query:
select ANUMBER
from CDRTABLE
it shows me the data, but I want to add a new column to the result, so I changed the query to this:
select '028', ANUMBER
from CDRTABLE
This query adds a new column to the result. I then wrote this bcp command to save the results to a text file:
EXEC xp_cmdshell 'bcp "SELECT rtrim(ltrim(ANUMBER)),rtrim(ltrim(BNUMBER)),rtrim(ltrim(DATE)),rtrim(ltrim(TIME)),rtrim(ltrim(DURATION)) FROM [myTestReport].[dbo].[CDRTABLE]" queryout f:\newOUTPUT.txt -S DESKTOP-A5CFJSH\MSSQLSERVER1 -Umyusername -Pmypassword -f "f:\myFORMAT.fmt" '
and my format file is this:
9.0
5
1 SQLNCHAR 0 5 "," 1 ANUMBER ""
2 SQLNCHAR 0 10 "," 2 BNUMBER ""
3 SQLNCHAR 0 10 "," 3 DATE ""
4 SQLNCHAR 0 10 "," 4 TIME ""
5 SQLNCHAR 0 10 "\r\n" 5 DURATION ""
Everything is OK, but I want to add a new column to the bcp output, for example '028'. How can I do that? Thanks.

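Because it looks like you're adding a character string to the front of the select, something like this should work (the SELECT inside the bcp command also needs '028' as its first column, so that the query returns the six columns the format file describes):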
9.0
6
1 SQLCHAR 0 3 "," 1 NEWCOLUMN "SQL_Latin1_General_CP1_CI_AS"
2 SQLNCHAR 0 5 "," 2 ANUMBER ""
3 SQLNCHAR 0 10 "," 3 BNUMBER ""
4 SQLNCHAR 0 10 "," 4 DATE ""
5 SQLNCHAR 0 10 "," 5 TIME ""
6 SQLNCHAR 0 10 "\r\n" 6 DURATION ""
See https://msdn.microsoft.com/en-us/library/ms191479.aspx for more details on the format of the format file.
