Bulk Insert Czech characters - sql-server

I have a tab-delimited text file (cznames.txt) with PersonID and Names containing Czech characters.
I am trying to figure out how to load it into a SQL Server table. Here's what I did:
USE myDatabase
GO
CREATE TABLE [dbo].[myNameTable](
    [ID] smallint NOT NULL,
    [NAME] [nvarchar](50) COLLATE Czech_CI_AS
) ON [PRIMARY]
I then created a format file:
bcp myDatabase.dbo.myNameTable format nul -c -f "C:\temp\Czech.fmt" -T -Smyserver -Umyuser -P1mypwd
I used the statement below to insert into the table:
BULK INSERT myDatabase.dbo.myNameTable FROM 'C:\temp\cznames.txt'
WITH (FormatFile = 'C:\temp\Czech.fmt', FIRSTROW = 2, ROWTERMINATOR = '0X0A');
I get no errors, but the characters in the table look very different from the text file.
Sample cznames.txt
ID NAME
1 Vysočina
2 Olomoucký
3 Středočeský
4 Hlavní město
Here's the format file
10.0
2
1 SQLCHAR 0 7 "\t" 1 ID ""
2 SQLCHAR 0 100 "\r\n" 2 Region Czech_CI_AS
Can anyone help me?
Thanks

Please try the following solution.
The CODEPAGE = '65001' setting is the remedy. Note that versions prior to SQL Server 2016 (13.x) don't support code page 65001 (UTF-8 encoding); a workaround for older versions is sketched after the output below.
No format file is needed.
SQL
USE tempdb;
GO

DROP TABLE IF EXISTS dbo.myNameTable;

CREATE TABLE dbo.myNameTable
(
    ID INT NOT NULL,
    NAME NVARCHAR(50) COLLATE Czech_CI_AS
);

BULK INSERT dbo.myNameTable
FROM 'e:\Temp\cznames.txt'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '0x0A',
    CODEPAGE = '65001'
);

SELECT * FROM dbo.myNameTable;
Output
+----+--------------+
| ID | NAME         |
+----+--------------+
|  1 | Vysočina     |
|  2 | Olomoucký    |
|  3 | Středočeský  |
|  4 | Hlavní město |
+----+--------------+
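If the server is older than SQL Server 2016 and CODEPAGE = '65001' is therefore unavailable, one common workaround is to re-save the text file as UTF-16 ("Unicode" in Notepad) and load it with DATAFILETYPE = 'widechar'. A minimal sketch, assuming the re-saved file lives at C:\temp\cznames_utf16.txt (a hypothetical path):
BULK INSERT dbo.myNameTable
FROM 'C:\temp\cznames_utf16.txt'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    DATAFILETYPE = 'widechar'   -- the file is UTF-16, so no CODEPAGE setting is needed
);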

Related

bcp file sometimes with double quotes

I have a CSV file with 2 columns: ID, CompanyName.
I want to bcp it into a SQL Server table with a format file. The file is comma delimited. The problem with the .csv file is that CompanyName has double quotes around it only when the value contains multiple words.
Here's an example
CompanyID,CompanyName
1000,FirstCompanyName
2000,"Testing Comma Name"
I do not know how to write a format file for this.
This is what I tried
10.0
3
1 SQLCHAR 0 10 ",\"" 1 CompanyID SQL_Latin1_General_CP1_CI_AI
2 SQLCHAR 0 0 "\"" 0 junk1 SQL_Latin1_General_CP1_CI_AI
3 SQLCHAR 0 100 "\r\n" 2 CompanyName SQL_Latin1_General_CP1_CI_AI
There are no errors. When I run this at the command prompt:
bcp "[a].b.[CompanyData]" in "C:\test.csv" -f C:\Data.fmt -t, -F2 -S "server1\prod01" -Uusername -Ppwd -e C:\Logs\error.log -o C:\Logs\outputlog.log
there are again no errors, but there is nothing in the table either.
Can someone guide me?
Thanks
MR
The bcp command-line utility cannot process a *.csv file where a column has sporadic double quotes as a text qualifier.
A possible solution would be to load the entire line (via bcp or BULK INSERT) into a single wide column in a staging table, and then, as a second step, split it into the real columns with T-SQL.
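A possible sketch of the first (loading) step, assuming the raw file sits at C:\test.csv and a hypothetical staging table dbo.Staging; the field terminator is set to a character that never occurs in the data, so each whole line lands in the single column:
CREATE TABLE dbo.Staging (line_from_file NVARCHAR(MAX));

BULK INSERT dbo.Staging
FROM 'C:\test.csv'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '\0',   -- the null terminator never appears in the text, so each line stays whole
    ROWTERMINATOR = '\n'
);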
Here is the second step; it requires SQL Server 2017 onwards (because of the TRIM() function used below).
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (line_from_file NVARCHAR(MAX));
INSERT INTO @tbl (line_from_file) VALUES
(N'1000,FirstCompanyName'),
(N'2000,"Testing Comma Name"');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

;WITH rs AS
(
    SELECT *
        , TRY_CAST('<root><r><![CDATA[' +
            REPLACE(line_from_file, @separator, ']]></r><r><![CDATA[') + ']]></r></root>' AS XML) AS xmldata
    FROM @tbl
)
-- INSERT INTO targetTable (CompanyID, CompanyName)
SELECT c.value('(r[1]/text())[1]', 'INT') AS CompanyID
     , TRIM('"' FROM c.value('(r[2]/text())[1]', 'NVARCHAR(100)')) AS CompanyName
FROM rs CROSS APPLY xmldata.nodes('/root') AS t(c);
Output
+-----------+--------------------+
| CompanyID | CompanyName        |
+-----------+--------------------+
| 1000      | FirstCompanyName   |
| 2000      | Testing Comma Name |
+-----------+--------------------+
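If the target server is SQL Server 2017 or later, an alternative worth trying is BULK INSERT with FORMAT = 'CSV', which copes with fields that are only sometimes wrapped in double quotes. A minimal sketch, assuming the same target table and that the file has a header row:
BULK INSERT [a].b.[CompanyData]
FROM 'C:\test.csv'
WITH
(
    FORMAT = 'CSV',     -- RFC 4180 parsing: double quotes around a field are optional
    FIELDQUOTE = '"',
    FIRSTROW = 2
);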

how to specify text qualified comma delimiter in BULK INSERT Format File for columns which may have null values

See this small sample of my CSV file:
"ID","TRANSACTION_TIME","CONTAINER_NUMBER","EVENT"
33115541,"2019-04-03 00:47:41.000000","MSKU1128096",
33115538,"2019-04-03 01:34:49.000000","MSKU1128096","Gate Out"
33115545,"2019-04-03 00:47:55.000000","MSKU4717839",
This is the format file I created
14.0
4
1 SQLCHAR 0 0 ",\"" 2 ID ""
2 SQLCHAR 0 0 "\",\"" 3 TRANSACTION_TIME ""
3 SQLCHAR 0 0 "\",\"" 4 CONTAINER_NUMBER ""
4 SQLCHAR 0 0 "\"\r\n" 5 EVENT SQL_Latin1_General_CP1_CI_AS
The issue is that the 4th column may have null values as you can see from rows 1 and 3 (excluding header)
See my BULK INSERT command below:
bulk insert dbo.DRISPIN_CONTAINER_HISTORY_STG1
from 'e:\dri_container_history_initial.csv'
with (
firstrow = 2,
formatfile = 'e:\container_history_initial.fmt'
)
When I run this I get the following error:
Msg 8152, Level 16, State 13, Line 305
String or binary data would be truncated.
I have also tried specifying a Prefix Length of 2, but get some different errors.
I know I could load the values, qualifiers and all, into a staging table and then strip them out. But ideally I would like to see if there is a way to do this directly with BULK INSERT or bcp.
Thanks in advance
Full CSV support was added in SQL Server 2017. I suspect that's the version used here, since the file's format version number is 14.0.
The following commands load the file using a double quote as the FIELDQUOTE character and CRLF as the row terminator:
create table testtable
(
    "ID" bigint,
    "TRANSACTION_TIME" datetime2(0),
    "CONTAINER_NUMBER" varchar(200),
    "EVENT" varchar(200)
)

bulk insert dbo.testtable
from 'c:\path\to\testcsv.csv'
with (
    format = 'csv',
    FIRSTROW = 2
)

select * from testtable
The results are:

ID        TRANSACTION_TIME     CONTAINER_NUMBER  EVENT
33115541  2019-04-03 00:47:41  MSKU1128096       NULL
33115538  2019-04-03 01:34:49  MSKU1128096       Gate Out
33115545  2019-04-03 00:47:55  MSKU4717839       NULL
FORMAT = 'CSV' still can't handle a missing newline at the end of the file

SQL Server : Bulk insert a Datatable into 2 tables

Consider this datatable:

word        wordCount  documentId
----------  ---------  ----------
Ball        10         1
School      11         1
Car         4          1
Machine     3          1
House       1          2
Tree        5          2
Ball        4          2
I want to insert this data into two tables with this structure:
Table WordDictionary
(
Id int,
Word nvarchar(50),
DocumentId int
)
Table WordDetails
(
Id int,
WordId int,
WordCount int
)
FOREIGN KEY (WordId) REFERENCES WordDictionary(Id)
But because I have thousands of records in the initial table, I have to do this in just one transaction (a batch query); using a bulk insert, for example, could serve this purpose.
The question is how I can split this data between the two tables WordDictionary and WordDetails.
For more details, the final result must look like this:
Table WordDictionary:

Id          word
----------  -------
1           Ball
2           School
3           Car
4           Machine
5           House
6           Tree

and table WordDetails:

Id          wordId   WordCount    DocumentId
----------  -------  -----------  ------------
1           1        10           1
2           2        11           1
3           3        4            1
4           4        3            1
5           5        1            2
6           6        5            2
7           1        4            2
Notice: the words in the source can be duplicated, so I must check whether a word already exists in WordDictionary before inserting, and if it does, its existing Id must be used for the row inserted into WordDetails (see the word Ball).
Finally, the million-dollar problem: this insertion must be done as fast as possible.
If you're looking to just load the tables for the first time, without any updates over time, you could potentially do it this way (I'm assuming you've already created the tables you're loading into).
You can put all of the distinct words from the datatable into the WordDictionary table first:
INSERT INTO WordDictionary (Word)
SELECT DISTINCT word
FROM datatable;
Then, after you populate WordDictionary, you can use the Id values from it and the rest of the information from datatable to load your WordDetails table:
INSERT INTO WordDetails (WordId, WordCount, DocumentId)
SELECT WD.Id AS WordId, DT.wordCount AS WordCount, DT.documentId AS DocumentId
FROM datatable AS DT
INNER JOIN WordDictionary AS WD ON WD.word = DT.word;
There is a little discrepancy between the declared table schema and your example data, but it can be resolved:
1) Setup
-- this the table with the initial data
-- drop table DocumentWordData
create table DocumentWordData
(
Word NVARCHAR(50),
WordCount INT,
DocumentId INT
)
GO
-- these are result table with extra information (identity, primary key constraints, working foreign key definition)
-- drop table WordDictionary
create table WordDictionary
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDictionary PRIMARY KEY,
Word nvarchar(50)
)
GO
-- drop table WordDetails
create table WordDetails
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDetails PRIMARY KEY,
WordId int CONSTRAINT FK_WordDetails_Word REFERENCES WordDictionary,
WordCount int,
DocumentId int
)
GO
2) The actual script to put data in the last two tables
begin tran
-- this is to make sure that if anything in this block fails, then everything is automatically rolled back
set xact_abort on
-- the dictionary is obtained by considering all distinct words
insert into WordDictionary (Word)
select distinct Word
from DocumentWordData
-- details are generating from initial data joining the word dictionary to get word id
insert into WordDetails (WordId, WordCount, DocumentId)
SELECT W.Id, DWD.WordCount, DWD.DocumentId
FROM DocumentWordData DWD
JOIN WordDictionary W ON W.Word = DWD.Word
commit
-- just to test the results
select * from WordDictionary
select * from WordDetails
I expect this script to run very fast, as long as you do not have a very large number of records (millions at most).
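If the join on the word text turns out to be the slow part on larger volumes, one optional tweak is an index on the dictionary's Word column (a sketch, not strictly required):
CREATE INDEX IX_WordDictionary_Word ON WordDictionary (Word);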
This is the query. I'm using temp tables to be able to test it.
If you use the two CTEs, you'll be able to generate the final result.
1. Setting up sample data for the test:
create table #original (word varchar(10), wordCount int, documentId int)
insert into #original values
('Ball', 10, 1),
('School', 11, 1),
('Car', 4, 1),
('Machine', 3, 1),
('House', 1, 2),
('Tree', 5, 2),
('Ball', 4, 2)
2. Use cte1 and cte2. In your real database, you need to replace #original with the actual table that holds all the initial records.
;with cte1 as (
select ROW_NUMBER() over (order by word) Id, word
from #original
group by word
)
select * into #WordDictionary
from cte1
;with cte2 as (
select ROW_NUMBER() over (order by #original.word) Id, Id as wordId,
#original.word, #original.wordCount, #original.documentId
from #WordDictionary
inner join #original on #original.word = #WordDictionary.word
)
select * into #WordDetails
from cte2
select * from #WordDetails
This will be the data in #WordDetails:
+----+--------+---------+-----------+------------+
| Id | wordId | word    | wordCount | documentId |
+----+--------+---------+-----------+------------+
| 1  | 1      | Ball    | 10        | 1          |
| 2  | 1      | Ball    | 4         | 2          |
| 3  | 2      | Car     | 4         | 1          |
| 4  | 3      | House   | 1         | 2          |
| 5  | 4      | Machine | 3         | 1          |
| 6  | 5      | School  | 11        | 1          |
| 7  | 6      | Tree    | 5         | 2          |
+----+--------+---------+-----------+------------+

SQL Server BULK INSERT mess up varbinary

I have an input file with records
1,2014030000000212,0x060000000000000000000000000000
1,2014030000000215,0x050000000000000000000000000000
1,2014030000000221,0x080000000000000000000000000000
I use a FormatFile
11.0
3
1 SQLINT 0 4 "," 1 ClientCode ""
2 SQLCHAR 0 20 "," 2 AccountID SQL_Latin1_General_CP1_CI_AS
3 SQLBINARY 0 64 "\r\n" 3 mask ""
When I use BULK INSERT TempBinaryMask FROM 'C:\Temp\BinaryData.txt' WITH (FORMATFILE = 'C:\Temp\BinaryFormat.txt'), it inserts the data, but it messes up my varbinary values, which end up looking like this:
49 2014030000000212 0x3078303630303030303030303030303030303030303030303030303030303030
49 2014030000000215 0x3078303530303030303030303030303030303030303030303030303030303030
49 2014030000000221 0x3078303830303030303030303030303030303030303030303030303030303030
I also just noticed that my ClientCode is wrong: it is 49 instead of 1. Is there something I'm doing wrong?
This is my table definition
CREATE TABLE TempBinaryMask
(
ClientCode int,
AccountID varchar(20),
mask varbinary(64)
)
For some reason, the format file was the problem.
I changed my input file to
1,2014030000000212,060000000000000000000000000000
1,2014030000000215,050000000000000000000000000000
1,2014030000000221,080000000000000000000000000000
and used
BULK INSERT TempBinaryMask from 'C:\Temp\BinaryData.txt' WITH (DATAFILETYPE='char', FIELDTERMINATOR=',')
to import the data, and it worked perfectly.
I've tried the XML format file, as well as the non-XML, and both gave me different types of errors
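A likely explanation: with SQLINT and SQLBINARY as the host data types, bcp treats the bytes in the text file as native data, so the character '1' is stored as its ASCII code 49, and the literal text '0x0600...' is stored byte by byte. A format file that declares every field as SQLCHAR (used against the data file without the 0x prefixes, as above) should, in principle, behave like DATAFILETYPE = 'char'; an untested sketch:
11.0
3
1  SQLCHAR  0  12  ","     1  ClientCode  ""
2  SQLCHAR  0  20  ","     2  AccountID   SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  64  "\r\n"  3  mask        ""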

SQL Server - Inserting default values bcp

In SQL Server, how do you insert default values while using the bcp command?
The scenario, based on the table below: while running the bcp command, the 'sno' column is an identity column whose values should increment automatically by 1, the data for the values column should come from the data file, the date column should automatically be set to today's date, and the status column should be set to Flag1.
For normal usage I know how to create a bcp format file. For the above scenario, how can I create a format file and insert data into table1?
Table format:
CREATE TABLE [dbo].[table1]
(
SNo int IDENTITY(1,1) NOT NULL,
values varchar(13) NOT NULL,
date datetime NOT NULL,
status varchar(50)
)
Table1:
sno | values | date | status
-----+----------+------------+--------
1 | 111111 | 2015-08-17 | Flag1
2 | 222222 | 2015-08-17 | Flag1
Basically, you just need to put 0 as the server column order (the sixth field on a format-file line) to prevent a column from being populated by bcp.
So assuming you have a default constraint for your [date] column:
ALTER TABLE dbo.table1
ADD CONSTRAINT DF_Table1_Date DEFAULT(SYSDATETIME()) FOR [Date]
and somehow you have also set up some way to calculate the [status] - then you could use this format file:
12.0
4
1 SQLCHAR 0 12 ";" 0 SNo ""
2 SQLCHAR 0 13 ";" 2 values SQL_Latin1_General_CP1_CI_AS
3 SQLDATETIME 0 24 ";" 0 date ""
4 SQLCHAR 0 50 "\r\n" 0 status SQL_Latin1_General_CP1_CI_AS
and thus you would really only be importing the [values] column. The SNo is set automatically by SQL Server (identity column), and the [date] column is set to the current date and time by means of the default constraint. You'll still have to find a way to fill in the [status] column upon or after insert!
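One possible way to fill in [status], if Flag1 really is a constant, is the same trick used for [date]: another default constraint, which is applied to the skipped column during the bulk load. A sketch:
ALTER TABLE dbo.table1
    ADD CONSTRAINT DF_Table1_Status DEFAULT ('Flag1') FOR [status];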
