bcp file sometimes with double quotes


I have a CSV file with two columns, CompanyID and CompanyName.
I want to bcp it into a SQL Server table using a format file. The file is comma-delimited. The problem is that CompanyName is enclosed in double quotes only when it contains multiple words.
Here's an example:
CompanyID,CompanyName
1000,FirstCompanyName
2000,"Testing Comma Name"
I do not know how to write a format file for this.
This is what I tried:
10.0
3
1 SQLCHAR 0 10 ",\"" 1 CompanyID SQL_Latin1_General_CP1_CI_AI
2 SQLCHAR 0 0 "\"" 0 junk1 SQL_Latin1_General_CP1_CI_AI
3 SQLCHAR 0 100 "\r\n" 2 CompanyName SQL_Latin1_General_CP1_CI_AI
When I run this from the command prompt:
bcp "[a].b.[CompanyData]" in "C:\test.csv" -f C:\Data.fmt -t, -F2 -S "server1\prod01" -Uusername -Ppwd -e C:\Logs\error.log -o C:\Logs\outputlog.log
there are no errors, but nothing ends up in the table either.
Can someone guide me?
Thanks
MR

The bcp command-line utility cannot process a *.csv file in which a column is only sometimes enclosed in double quotes.
A possible solution is to load each entire line into a single wide column of a staging table with bcp or BULK INSERT, and then split it into two columns with T-SQL.
Here is the second part. It requires SQL Server 2017 or later because of the TRIM() function.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (line_from_file NVARCHAR(MAX));
INSERT INTO @tbl (line_from_file) VALUES
(N'1000,FirstCompanyName'),
(N'2000,"Testing Comma Name"');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ',';
;WITH rs AS
(
SELECT *
, TRY_CAST('<root><r><![CDATA[' +
REPLACE(line_from_file, @separator, ']]></r><r><![CDATA[') + ']]></r></root>' AS XML) AS xmldata
FROM @tbl
)
-- INSERT INTO targetTable (CompanyID, CompanyName)
SELECT c.value('(r[1]/text())[1]', 'INT') AS CompanyID
, TRIM('"' FROM c.value('(r[2]/text())[1]', 'NVARCHAR(100)')) AS CompanyName
FROM rs CROSS APPLY xmldata.nodes('/root') AS t(c);
Output
+-----------+--------------------+
| CompanyID | CompanyName |
+-----------+--------------------+
| 1000 | FirstCompanyName |
| 2000 | Testing Comma Name |
+-----------+--------------------+
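If a preprocessing step outside SQL Server is acceptable, a different technique from the staging-table approach above is to let a real CSV parser resolve the optional quotes and write a file with an unambiguous delimiter. Below is a minimal sketch using Python's standard csv module; the function name normalize_csv is hypothetical, and it assumes no company name contains a tab or a line break.

```python
import csv
import io


def normalize_csv(src, dst):
    """Copy CSV rows from src to dst as tab-delimited text without quotes.

    Assumes no field contains a tab or a line break.
    """
    reader = csv.reader(src)  # accepts quoted and unquoted fields alike
    for row in reader:
        dst.write("\t".join(row) + "\r\n")  # CRLF row terminator


sample = 'CompanyID,CompanyName\r\n1000,FirstCompanyName\r\n2000,"Testing Comma Name"\r\n'
out = io.StringIO()
normalize_csv(io.StringIO(sample, newline=""), out)
print(out.getvalue())
```

With real files, open both with newline='' so Python leaves the CRLF line endings alone. The output could then be loaded with bcp in character mode (-c, whose default field terminator is a tab), keeping -F2 to skip the header row.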

Related

Bulk Insert Czech characters

I have a tab-delimited text file (cznames.txt) with PersonID and names containing Czech characters.
I am trying to figure out how to load it into a SQL Server table. Here's what I did:
USE myDatabase
Go
CREATE TABLE [dbo].[myNameTable](
[ID] smallint NOT NULL,
[NAME] [nvarchar](50) collate Czech_CI_AS
) ON [PRIMARY]
I then created a format file:
bcp myDatabase.dbo.myNameTable format nul -c -f "C:\temp\Czech.fmt" -T -Smyserver -Umyuser -P1mypwd
I used the following statement to insert into the table:
BULK INSERT myDatabase.dbo.myNameTable FROM 'C:\temp\cznames.txt'
WITH (FormatFile = 'C:\temp\Czech.fmt', FIRSTROW = 2, ROWTERMINATOR = '0X0A');
I get no errors, but the characters in the table look very different from those in the text file.
Sample cznames.txt
ID NAME
1 Vysočina
2 Olomoucký
3 Středočeský
4 Hlavní město
Here's the format file
10.0
2
1 SQLCHAR 0 7 "\t" 1 ID ""
2 SQLCHAR 0 100 "\r\n" 2 Region Czech_CI_AS
Can anyone help me?
Thanks

Please try the following solution. It does not need a format file.
The CODEPAGE = '65001' setting is the remedy: it tells BULK INSERT that the data file is UTF-8 encoded.
Versions prior to SQL Server 2016 (13.x) don't support code page 65001 (UTF-8 encoding).
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.myNameTable;
CREATE TABLE dbo.myNameTable(
ID INT NOT NULL,
NAME NVARCHAR(50) collate Czech_CI_AS
);
BULK INSERT dbo.myNameTable
FROM 'e:\Temp\cznames.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '0x0A',
CODEPAGE = '65001'
);
SELECT * FROM dbo.myNameTable;
Output
+----+--------------+
| ID | NAME |
+----+--------------+
| 1 | Vysočina |
| 2 | Olomoucký |
| 3 | Středočeský |
| 4 | Hlavní město |
+----+--------------+
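To see why the code page matters, it can help to look at the bytes. The Python sketch below assumes the file was saved as UTF-8 (which the CODEPAGE = '65001' fix suggests) and uses Windows-1250 purely as an example of a single-byte code page; which code page BULK INSERT actually assumes without the option depends on its CODEPAGE default and the server.

```python
names = ["Vysočina", "Olomoucký", "Středočeský", "Hlavní město"]

for name in names:
    raw = name.encode("utf-8")  # bytes as stored in a UTF-8 file
    # Reading UTF-8 bytes with a single-byte code page turns each accented
    # letter into two unrelated characters (mojibake).
    wrong = raw.decode("cp1250", errors="replace")
    right = raw.decode("utf-8")  # reading with the correct encoding
    print(f"{wrong!r:24} -> {right}")
```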

how to specify text qualified comma delimiter in BULK INSERT Format File for columns which may have null values

See this small sample of my CSV file:
"ID","TRANSACTION_TIME","CONTAINER_NUMBER","EVENT"
33115541,"2019-04-03 00:47:41.000000","MSKU1128096",
33115538,"2019-04-03 01:34:49.000000","MSKU1128096","Gate Out"
33115545,"2019-04-03 00:47:55.000000","MSKU4717839",
This is the format file I created
14.0
4
1 SQLCHAR 0 0 ",\"" 2 ID ""
2 SQLCHAR 0 0 "\",\"" 3 TRANSACTION_TIME ""
3 SQLCHAR 0 0 "\",\"" 4 CONTAINER_NUMBER ""
4 SQLCHAR 0 0 "\"\r\n" 5 EVENT SQL_Latin1_General_CP1_CI_AS
The issue is that the fourth column may be empty (NULL), as you can see in rows 1 and 3 (excluding the header).
Here is my BULK INSERT command:
bulk insert dbo.DRISPIN_CONTAINER_HISTORY_STG1
from 'e:\dri_container_history_initial.csv'
with (
firstrow = 2,
formatfile = 'e:\container_history_initial.fmt'
)
When I run this I get the following error:
Msg 8152, Level 16, State 13, Line 305
String or binary data would be truncated.
I have also tried specifying a Prefix Length of 2, but get some different errors.
I know I could load the values, qualifiers included, into a staging table and then strip the qualifiers out, but ideally I would like to find a way to do this with BULK INSERT or bcp.
Thanks in advance

Full CSV support was added in SQL Server 2017. I suspect that's the version used here, since the format file's version number is 14.0.
The following script creates a table and loads the file with FORMAT = 'CSV', which uses a double quote as the default FIELDQUOTE character and CRLF as the row terminator:
create table testtable
(
"ID" bigint,
"TRANSACTION_TIME" datetime2(0),
"CONTAINER_NUMBER" varchar(200),
"EVENT" varchar(200)
)
bulk insert dbo.testtable
from 'c:\path\to\testcsv.csv'
with (
format='csv',
FIRSTROW=2
)
select * from testtable
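The results are:
+----------+---------------------+------------------+----------+
| ID       | TRANSACTION_TIME    | CONTAINER_NUMBER | EVENT    |
+----------+---------------------+------------------+----------+
| 33115541 | 2019-04-03 00:47:41 | MSKU1128096      | NULL     |
| 33115538 | 2019-04-03 01:34:49 | MSKU1128096      | Gate Out |
| 33115545 | 2019-04-03 00:47:55 | MSKU4717839      | NULL     |
+----------+---------------------+------------------+----------+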
Note that FORMAT = 'CSV' still can't handle a missing newline at the end of the file.
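FORMAT = 'CSV' is documented as RFC 4180 compliant: a field may or may not be enclosed in the quote character, and an empty field after the last comma is still a field. As a rough illustration only (Python's csv module, not SQL Server's parser), this is how the sample rows split; mapping empty strings to None to mimic the NULLs above is an assumption of the sketch.

```python
import csv
import io

data = (
    '"ID","TRANSACTION_TIME","CONTAINER_NUMBER","EVENT"\r\n'
    '33115541,"2019-04-03 00:47:41.000000","MSKU1128096",\r\n'
    '33115538,"2019-04-03 01:34:49.000000","MSKU1128096","Gate Out"\r\n'
    '33115545,"2019-04-03 00:47:55.000000","MSKU4717839",\r\n'
)

reader = csv.reader(io.StringIO(data, newline=""))
header = next(reader)  # like FIRSTROW = 2: skip the header line
rows = [[field or None for field in row] for row in reader]  # empty field -> None
for row in rows:
    print(row)
```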

How to remove extension dates in SQL Server

How can I remove the date that comes before the file extension in SQL Server?
FileName | id
-------------------------+---
c:\abc_20181008.txt | 1
c:\xyz_20181007.dat | 2
c:\abc_xyz_20181007.dat | 3
c:\ab.xyz_20181007.txt | 4
Based on the above data, I want output like this:
Table: emp
FileName | id
-------------------+---
c:\abc.txt | 1
c:\xyz.dat | 2
c:\abc_xyz.dat | 3
c:\ab.xyz.txt | 4
I have tried this:
select
substring (Filename, replace(filename, '.', ''), len(filename)), id
from
emp
But this query is not returning the expected result in SQL Server.
Please tell me how to write a query to achieve this task in SQL Server.

You can use the following query:
SELECT id, filename,
LEFT(filename, LEN(filename) - i1) + RIGHT(filename, i2 - 1)
FROM emp
CROSS APPLY
(
SELECT CHARINDEX('_', REVERSE(filename)) AS i1,
PATINDEX('%[0-9]%', REVERSE(filename)) AS i2
) AS x
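The query keeps everything before the last underscore (i1, found by searching the reversed string) and everything after the last digit (i2). The same index arithmetic, transcribed into Python only to make the steps easy to follow, is sketched below; strip_date_suffix is a hypothetical name, and like the query it assumes an underscore before the date and no digits in the extension.

```python
def strip_date_suffix(filename: str) -> str:
    rev = filename[::-1]                           # REVERSE(filename)
    i1 = rev.index("_") + 1                        # CHARINDEX('_', REVERSE(filename)), 1-based
    i2 = next(i for i, ch in enumerate(rev, 1) if ch in "0123456789")  # PATINDEX('%[0-9]%', ...)
    left = filename[: len(filename) - i1]          # LEFT(filename, LEN(filename) - i1)
    right = filename[len(filename) - (i2 - 1):]    # RIGHT(filename, i2 - 1)
    return left + right


for name in [r"c:\abc_20181008.txt", r"c:\xyz_20181007.dat",
             r"c:\abc_xyz_20181007.dat", r"c:\ab.xyz_20181007.txt"]:
    print(name, "->", strip_date_suffix(name))
```

The Python version raises an exception when a name has no underscore (ValueError) or no digit (StopIteration).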

You can try this as well:
declare @t table (a varchar(50))
insert into @t values ('c:\abc_20181008.txt')
insert into @t values ('c:\abc_xyz_20181007.dat')
insert into @t values ('c:\ab.xyz_20181007.txt')
insert into @t values ('c:\ab.xyz_20182007.txt')
select replace(SUBSTRING(a,1,CHARINDEX('2',a) - 1) + SUBSTRING(a,len(a)-3,LEN(a)),'_.','.') from @t
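This shorter query relies on two assumptions: the first '2' in the string starts the date, and the extension has exactly three characters (the last four characters, dot included, are kept). A Python transcription of the same steps, with the hypothetical name strip_date_by_first_two, shows where a path containing an earlier '2' would go wrong.

```python
def strip_date_by_first_two(a: str) -> str:
    pos = a.index("2") + 1                       # CHARINDEX('2', a), 1-based
    head = a[: pos - 1]                          # SUBSTRING(a, 1, CHARINDEX('2', a) - 1)
    tail = a[len(a) - 4:]                        # SUBSTRING(a, LEN(a) - 3, LEN(a)): last 4 chars
    return (head + tail).replace("_.", ".")      # REPLACE(..., '_.', '.')


print(strip_date_by_first_two(r"c:\abc_20181008.txt"))        # behaves as intended
print(strip_date_by_first_two(r"c:\data2\abc_20181008.txt"))  # an earlier '2' cuts the name short
```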

SQL Server BULK INSERT mess up varbinary

I have an input file with these records:
1,2014030000000212,0x060000000000000000000000000000
1,2014030000000215,0x050000000000000000000000000000
1,2014030000000221,0x080000000000000000000000000000
I use this format file:
11.0
3
1 SQLINT 0 4 "," 1 ClientCode ""
2 SQLCHAR 0 20 "," 2 AccountID SQL_Latin1_General_CP1_CI_AS
3 SQLBINARY 0 64 "\r\n" 3 mask ""
When I run BULK INSERT TempBinaryMask from 'C:\Temp\BinaryData.txt' WITH (FORMATFILE = 'C:\Temp\BinaryFormat.txt'), it inserts the data, but it messes up my varbinary values, which look like this:
49 2014030000000212 0x3078303630303030303030303030303030303030303030303030303030303030
49 2014030000000215 0x3078303530303030303030303030303030303030303030303030303030303030
49 2014030000000221 0x3078303830303030303030303030303030303030303030303030303030303030
I also just noticed that my ClientCode is wrong too: it is 49 instead of 1. Is there something I'm doing wrong?
This is my table definition:
CREATE TABLE TempBinaryMask
(
ClientCode int,
AccountID varchar(20),
mask varbinary(64)
)

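The format file was the problem. It declares ClientCode as SQLINT and mask as SQLBINARY, which are native host types, so BULK INSERT apparently copied the raw bytes of the text instead of converting it: 49 is the ASCII code of the character '1', and 0x3078... are the bytes of the characters '0x...'.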
I changed my input file to
1,2014030000000212,060000000000000000000000000000
1,2014030000000215,050000000000000000000000000000
1,2014030000000221,080000000000000000000000000000
and used
BULK INSERT TempBinaryMask from 'C:\Temp\BinaryData.txt' WITH (DATAFILETYPE='char', FIELDTERMINATOR=',')
to import the data, and it worked perfectly.
I've tried the XML format file, as well as the non-XML, and both gave me different types of errors
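The byte values in the question can be checked quickly. This Python illustration assumes, as the symptoms suggest, that the characters of each field were stored as-is; mask_text is the first row's mask as it appears in the data file.

```python
# Text as it appears in the data file (first row)
client_code_text = "1"
mask_text = "0x06" + "0" * 28  # same text as 0x060000000000000000000000000000

# Assumption: the field's bytes were copied as-is, so '1' became byte 0x31
print(ord(client_code_text))  # 49

# ...and the characters '0', 'x', '0', '6', ... became bytes 30 78 30 36 ...
print(mask_text.encode("ascii").hex())

# The intended 15-byte value: parse the hex digits after the 0x prefix
intended = bytes.fromhex(mask_text[2:])
print(intended.hex())
```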

SQL Server - Inserting default values bcp

In SQL Server, how do I insert default values when using the bcp command?
In the table below, SNo is an identity column whose values should increment automatically by 1. When I run bcp, the data for the values column should come from the data file, the date column should automatically be set to today's date, and the status column should be set to Flag1.
For normal usage I know how to create a bcp format file. For this scenario, how can I create a format file and insert data into table1?
Table format:
CREATE TABLE [dbo].[table1]
(
SNo int IDENTITY(1,1) NOT NULL,
[values] varchar(13) NOT NULL,
date datetime NOT NULL,
status varchar(50)
)
Table1:
sno | values | date | status
-----+----------+------------+--------
1 | 111111 | 2015-08-17 | Flag1
2 | 222222 | 2015-08-17 | Flag1

Basically, you just need to put 0 as the server column order (the sixth value in a format-file row) for any field that bcp should not load into a table column.
So assuming you have a default constraint for your [date] column:
ALTER TABLE dbo.table1
ADD CONSTRAINT DF_Table1_Date DEFAULT(SYSDATETIME()) FOR [Date]
and that you have also set up some way to calculate [status], you could use this format file:
12.0
4
1 SQLCHAR 0 12 ";" 0 SNo ""
2 SQLCHAR 0 13 ";" 2 values SQL_Latin1_General_CP1_CI_AS
3 SQLDATETIME 0 24 ";" 0 date ""
4 SQLCHAR 0 50 "\r\n" 0 status SQL_Latin1_General_CP1_CI_AS
Thus you would really only be importing the [values] column: SNo is set automatically by SQL Server (identity column), and [date] is set to the current date and time by the default constraint. You'll still have to find a way to fill in the [status] column upon or after insert (for a constant such as Flag1, a second default constraint would do).
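Each row of a non-XML format file has eight values: host field order, host data type, prefix length, host data length, terminator, server column order, server column name, and collation. To make that layout explicit, the sketch below rebuilds the format file from the answer in Python; the helper format_file and the tuple layout are hypothetical conveniences, not part of any bcp tooling.

```python
from typing import List, Tuple

# (host type, prefix length, host data length, terminator, server column order, column name, collation)
Field = Tuple[str, int, int, str, int, str, str]


def format_file(version: str, fields: List[Field]) -> str:
    lines = [version, str(len(fields))]
    for host_order, (htype, prefix, length, term, server_order, name, coll) in enumerate(fields, 1):
        # Terminators are written in quotes with escape sequences kept as text (e.g. \r\n).
        lines.append(f'{host_order} {htype} {prefix} {length} "{term}" {server_order} {name} {coll}')
    return "\n".join(lines) + "\n"


fields = [
    ("SQLCHAR", 0, 12, ";", 0, "SNo", '""'),                # 0: not loaded, identity fills SNo
    ("SQLCHAR", 0, 13, ";", 2, "values", "SQL_Latin1_General_CP1_CI_AS"),
    ("SQLDATETIME", 0, 24, ";", 0, "date", '""'),           # 0: not loaded, left to the default
    ("SQLCHAR", 0, 50, r"\r\n", 0, "status", "SQL_Latin1_General_CP1_CI_AS"),
]
print(format_file("12.0", fields))
```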
