Removing double quotes before bulk insert - sql-server

I'm trying to load a file using BULK INSERT, but the data isn't inserted correctly because some of the values are wrapped in quotation marks.
I've tried using a format file, but it doesn't work because not ALL the rows in that column contain quotes; only some do, e.g.
columna
abc
cdf
"dfd"
dfs
"aee"
So my format file doesn't work.
My bulk insert code:
bulk insert tablename
from 'C:/...'
with
(
FIRSTROW = 2,
rowterminator = '0x0a'
,formatfile = 'file.fmt'
)
Format file:
10.0
5
1 SQLCHAR 0 1000 "," 1 "a" ""
2 SQLCHAR 0 1000 ",\"" 2 "b" ""
3 SQLCHAR 0 1000 "\",\"" 3 "d" <- has quotes ""
4 SQLCHAR 0 1000 ",\"" 4 "e" ""
5 SQLCHAR 0 1000 "\n" 5 "f"
Any ideas?

If there is no other way to remove the double quotes from the column(s), you can post-process the table and update the affected column(s) with REPLACE,
i.e.
update mytable set col1 = replace(col1,'"',''),col2 = replace(col2,'"','')

I came across this problem with data that also had quotation marks inside the values, so I couldn't use REPLACE. In case it helps, this strips only the leading and trailing quote:
CREATE TABLE SomeTable
(
ColumnA VARCHAR(5)
)
INSERT INTO SomeTable VALUES ('abc')
INSERT INTO SomeTable VALUES ('cdf')
INSERT INTO SomeTable VALUES ('"dfd"')
INSERT INTO SomeTable VALUES ('dfs')
INSERT INTO SomeTable VALUES ('"aee"')
INSERT INTO SomeTable VALUES (' efg ')
GO
SELECT * FROM SomeTable
GO
--TRIM THE DATA
UPDATE SomeTable SET ColumnA =LTRIM(RTRIM(ColumnA))
GO
--DELETE THE DELIMITERS
UPDATE SomeTable SET ColumnA = LEFT(ColumnA,LEN(ColumnA)-1) WHERE RIGHT(ColumnA,1) LIKE '"'
GO
UPDATE SomeTable SET ColumnA = RIGHT(ColumnA,LEN(ColumnA)-1) WHERE LEFT(ColumnA,1) LIKE '"'
GO
--RETRIM THE DATA
UPDATE SomeTable SET ColumnA =LTRIM(RTRIM(ColumnA))
GO
SELECT * FROM SomeTable
GO
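The trim-and-strip steps above can also be collapsed into one statement. What follows is only a minimal sketch under a few assumptions: @SomeTable is a table variable standing in for the real table, and a quote pair is removed only when the trimmed value both starts and ends with a double quote. Unlike the multi-step version, a lone leading or trailing quote is left in place, and quotes inside the value are never touched.

```sql
-- @SomeTable is a hypothetical stand-in for the real table
DECLARE @SomeTable TABLE (ColumnA varchar(10));
INSERT INTO @SomeTable (ColumnA)
VALUES ('abc'), ('"dfd"'), (' efg '), ('a"b');

UPDATE s
SET ColumnA = CASE
        -- strip the pair only when the trimmed value is wrapped in quotes
        WHEN LEN(t.v) >= 2 AND LEFT(t.v, 1) = '"' AND RIGHT(t.v, 1) = '"'
            THEN LTRIM(RTRIM(SUBSTRING(t.v, 2, LEN(t.v) - 2)))
        ELSE t.v   -- otherwise keep the trimmed value as it is
    END
FROM @SomeTable AS s
CROSS APPLY (SELECT LTRIM(RTRIM(s.ColumnA)) AS v) AS t;

SELECT ColumnA FROM @SomeTable;
```

With this sample data the stored values should end up as abc, dfd, efg and a"b.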

Related

file not having end line and " at first column and last column

I am trying to export a table to a CSV file using the bcp command in Microsoft SQL Server.
Below is the table's sample data:
Table name : XYZ
col1 col2 col3
abcd,inc. USD,inc 1234
pqrs,inc USD,inc 6789
stuv,inc USD,inc 0009
The column values contain commas, as shown above.
I have written a .fmt file like this:
test.fmt
13.0
3
1 SQLCHAR 0 4000 "\",\"" 1 col1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 4000 "\",\"" 2 col2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 4000 "\r\n" 3 col3 SQL_Latin1_General_CP1_CI_AS
Below is the command I am using:
DECLARE
@V_BCP_QUERY VARCHAR(4000),
@V_BCP_OUTPUT_FILE VARCHAR(1500),
@V_BCP_FORMAT_FILE VARCHAR(1500),
@V_BCP_COMMAND VARCHAR(4000)
begin
SET @V_BCP_QUERY='"SELECT col1,col2,col3 FROM TABS..XYZ"'
SET @V_BCP_OUTPUT_FILE='"D:\OUTPUT.csv"'
SET @V_BCP_FORMAT_FILE='"D:\test.fmt"'
SET @V_BCP_COMMAND='bcp '+@V_BCP_QUERY+' queryout '+@V_BCP_OUTPUT_FILE+' -f '+@V_BCP_FORMAT_FILE+' -T -S "DEV-CR"'
EXECUTE Master.dbo.xp_CmdShell @V_BCP_COMMAND
end
I am getting the following data in the OUTPUT.csv file:
abcd,inc.","USD,inc","1234
pqrs,inc","USD,inc","6789
stuv,inc","USD,inc","0009
There is no " at the start or end of each line.
Also, when I open the file in Excel, all rows appear on a single line.
My requirement is to export the table to a CSV file.
Kindly help.
You could hack a solution together - but using the correct tool for the job would be much easier and better in the long run.
Instead of using BCP to output individual columns, create a single column formatted with your desired result:
SELECT quotename(concat_ws(',', quotename(col1, char(34)), quotename(col2, char(34)), quotename(col3, char(34))), char(34))
FROM yourTable
This will give you a single column in your output, with double quotes around the whole string and around each column, concatenated with the ',' separator.
Sure it's ugly, but it's simple, quick and gets it done.
SELECT '"' + col1 + '",' +
'"' + col2 + '",' +
'"' + col3 + '"'
FROM Table
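One caveat with plain concatenation is that a NULL column turns the whole line into NULL, and a double quote inside a value breaks the CSV quoting. The sketch below is one possible way to guard against both, assuming the usual CSV convention of doubling embedded quotes; @XYZ is a table variable standing in for the real table, and the second sample row is invented to show the edge cases.

```sql
-- @XYZ is a hypothetical stand-in for the real table
DECLARE @XYZ TABLE (col1 varchar(50), col2 varchar(50), col3 varchar(50));
INSERT INTO @XYZ (col1, col2, col3)
VALUES ('abcd,inc.', 'USD,inc', '1234'),
       ('say "hi"', NULL, '0009');   -- invented row: embedded quote and a NULL

-- ISNULL keeps a NULL column from nulling the whole line;
-- REPLACE doubles any embedded double quote.
SELECT '"' + REPLACE(ISNULL(col1, ''), '"', '""') + '",'
     + '"' + REPLACE(ISNULL(col2, ''), '"', '""') + '",'
     + '"' + REPLACE(ISNULL(col3, ''), '"', '""') + '"' AS csv_line
FROM @XYZ;
```

The same SELECT could be used as the bcp queryout query, in which case no format file is needed because there is only one output column.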

SSRS report multivalue report filtering and grouping

Let's start with the basics. Here's the simplified structure of the data coming into the report:
ID | Tags
1 |A|
2 |A|B|
3 |B|
4 |A|C|D|
5 |B|D|
6 |D|A|C| --I added this row to show that tags could be in any order
I have a parameter on the report where users can choose one or more tags from a list (A,B,C,D)
Here's the output I'd like to see on the report. It'll be exported into Excel so I'll be using that to describe the desired output.
Sample report output: (Tag parameter selection: A and D)
Worksheet 1 = displays all records => [1,2,3,4,5,6]
Worksheet 2 = displays records that match all tags selected (must have tags for both A AND D!) => [4,6]
Worksheet 3 = displays records that have tag A => [1,2,4,6]
Worksheet 4 = displays records that have tag D => [4,5,6]
**Note: Worksheets 3 and up will show each of the tags selected in a separate worksheet, there could be 1 to N sheets.
Currently in the report I have 3 tables ready to go:
Table 1: Just displays the full query (nice and easy!) and has a PageName="All records"
Table 2: Need to filter full query down to match Worksheet 2 above and will have a PageName="Filtered records" This is problem #1! Looking for ideas on a filter query!
Table 3: Need to group the full query by Tag, but also only displays groups where the tag is in the list of tags selected in the parameter. This is problem #2! Can't just take the filter from Table 2 and then group because records would be missing (such as number 5 for tag D)
Any and all help would be greatly appreciated!!
Additional notes:
Tag delimiter could be changed (I chose | because the data has commas)
Regardless of delimiter, tags can only come back in one column (delimited list) due to aggregation in other columns
There are several questions asked here. I'll deal first with the one that will make your queries simpler.
I'm not sure if I've met your criteria as I didn't understand some of what you said but anyway, it might point you in the right direction.
Create a split function in your database if you don't already have one
If you don't have one, you can use this one I created years ago. It's not perfect but does the job for me.
CREATE FUNCTION [fnSplit](@sText varchar(8000), @sDelim varchar(20) = ' ')
RETURNS @retArray TABLE (idx smallint Primary Key, value varchar(8000))
AS
BEGIN
    DECLARE @idx smallint,
        @value varchar(8000),
        @bcontinue bit,
        @iStrike smallint,
        @iDelimlength tinyint
    IF @sDelim = 'Space'
    BEGIN
        SET @sDelim = ' '
    END
    SET @idx = 0
    SET @sText = LTrim(RTrim(@sText))
    SET @iDelimlength = DATALENGTH(@sDelim)
    SET @bcontinue = 1
    IF NOT ((@iDelimlength = 0) or (@sDelim = 'Empty'))
    BEGIN
        WHILE @bcontinue = 1
        BEGIN
            --If you can find the delimiter in the text, retrieve the first element and
            --insert it with its index into the return table.
            IF CHARINDEX(@sDelim, @sText)>0
            BEGIN
                SET @value = SUBSTRING(@sText,1, CHARINDEX(@sDelim,@sText)-1)
                BEGIN
                    INSERT @retArray (idx, value)
                    VALUES (@idx, @value)
                END
                --Trim the element and its delimiter from the front of the string.
                --Increment the index and loop.
                SET @iStrike = DATALENGTH(@value) + @iDelimlength
                SET @idx = @idx + 1
                SET @sText = LTrim(Right(@sText,DATALENGTH(@sText) - @iStrike))
            END
            ELSE
            BEGIN
                --If you can't find the delimiter in the text, @sText is the last value in
                --@retArray.
                SET @value = @sText
                BEGIN
                    INSERT @retArray (idx, value)
                    VALUES (@idx, @value)
                END
                --Exit the WHILE loop.
                SET @bcontinue = 0
            END
        END
    END
    ELSE
    BEGIN
        WHILE @bcontinue=1
        BEGIN
            --If the delimiter is an empty string, check for remaining text
            --instead of a delimiter. Insert the first character into the
            --retArray table. Trim the character from the front of the string.
            --Increment the index and loop.
            IF DATALENGTH(@sText)>1
            BEGIN
                SET @value = SUBSTRING(@sText,1,1)
                BEGIN
                    INSERT @retArray (idx, value)
                    VALUES (@idx, @value)
                END
                SET @idx = @idx+1
                SET @sText = SUBSTRING(@sText,2,DATALENGTH(@sText)-1)
            END
            ELSE
            BEGIN
                --One character remains.
                --Insert the character, and exit the WHILE loop.
                INSERT @retArray (idx, value)
                VALUES (@idx, @sText)
                SET @bcontinue = 0
            END
        END
    END
    RETURN
END
This function just splits your delimited strings into their components and returns them as a table.
We can then use CROSS APPLY to give us a result set that should be easier to work with. As an example, I recreated your sample data and then used CROSS APPLY like this:
DECLARE @t table(ID int, Tags varchar(100))
INSERT INTO @t VALUES
(1,'|A|'),
(2,'|A|B|'),
(3,'|B|'),
(4,'|A|C|D|'),
(5,'|B|D|'),
(6,'|D|A|C|')
SELECT * FROM @t t
CROSS APPLY fnSplit(Tags,'|') f
WHERE f.Value != ''
This gives us this output
ID Tags idx value
1 |A| 1 A
2 |A|B| 1 A
2 |A|B| 2 B
3 |B| 1 B
4 |A|C|D| 1 A
4 |A|C|D| 2 C
4 |A|C|D| 3 D
5 |B|D| 1 B
5 |B|D| 2 D
6 |D|A|C| 1 D
6 |D|A|C| 2 A
6 |D|A|C| 3 C
To get all records just do
SELECT DISTINCT t.* FROM @t t
CROSS APPLY fnSplit(Tags,'|') f
WHERE f.Value != ''
To get the filtered records, assuming you have a parameter called @pTags, change the dataset statement to something like
SELECT DISTINCT t.ID, f.Value FROM @t t
CROSS APPLY fnSplit(Tags,'|') f
WHERE f.Value != ''
and f.Value IN (@pTags)
As long as this is directly in your dataset query and the parameter is multi-value then this should filter correctly, use DISTINCT if required.
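Note that the IN filter returns records that have any of the selected tags, while the question's Worksheet 2 wants records that have all of them. One way to express "all" is to count the distinct matching tags per record and compare that with the number of selected tags. The sketch below rests on some assumptions: it uses the built-in STRING_SPLIT (SQL Server 2016 or later, database compatibility level 130+) instead of fnSplit so that it runs on its own, and @selected is a hypothetical table variable standing in for the multi-value report parameter.

```sql
DECLARE @t TABLE (ID int, Tags varchar(100));
INSERT INTO @t (ID, Tags) VALUES
    (1, '|A|'), (2, '|A|B|'), (3, '|B|'),
    (4, '|A|C|D|'), (5, '|B|D|'), (6, '|D|A|C|');

-- hypothetical stand-in for the tags chosen in the report parameter
DECLARE @selected TABLE (Tag varchar(100) PRIMARY KEY);
INSERT INTO @selected (Tag) VALUES ('A'), ('D');

-- keep only IDs whose tag list contains every selected tag
SELECT t.ID
FROM @t AS t
CROSS APPLY STRING_SPLIT(t.Tags, '|') AS s
JOIN @selected AS sel ON sel.Tag = s.value
GROUP BY t.ID
HAVING COUNT(DISTINCT s.value) = (SELECT COUNT(*) FROM @selected);
```

For the sample data and the selection A and D this should return IDs 4 and 6, matching Worksheet 2 in the question. The empty strings produced by the leading and trailing delimiters drop out in the join.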

Modify and Check Data while using Bulk Insert from CSV file - SQL

I'm using the method below to insert data from a CSV file into SQL Server.
BULK
INSERT tblMember
FROM 'F:\target.txt'
WITH
(
DATAFILETYPE='widechar',
CODEPAGE = 'ACP',
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\CSVDATA\ErrorRows.csv'
)
GO
I need to do two things. First, check whether all characters in column one of each CSV row are digits and, if so, insert the row. Second, I need to add a specific prefix to those characters while inserting.
01 - 123,M,A,USA
02 - H24,N,Z,USA
I need to insert only row one, because its column one contains only digits ('123'), and I need to add "D" before the number when inserting it. So after insertion the table should contain something like this:
"D123","M","A","USA"
Possible?
Let's consider a sample CSV file target-c.txt (on the C drive) which contains four lines of data. (Notice I have used target-c.txt, not target.txt.)
123,M,A,USA
H24,N,Z,USA
H25,N,V,USA
456,M,U,USA
Now create a non-XML format file (on the C drive) named targetFormat.fmt and populate it as follows:
9.0
4
1 SQLCHAR 0 100 "," 1 Col1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 Col2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3 Col3 ""
4 SQLCHAR 0 100 "\r\n" 4 Col4 SQL_Latin1_General_CP1_CI_AS
Be careful with this formatting; see the documentation on non-XML format files if you want to read more.
Change the format file according to your needs (data type, character length, etc.).
I have created a sample table tblMember (change the column names, data types, etc. as needed, and remember to change targetFormat.fmt to match).
CREATE TABLE tblMember
(
Col1 nvarchar(50),
Col2 nvarchar(50) ,
Col3 nvarchar(50) ,
Col4 nvarchar(50)
);
Then use the following query for the bulk insert (it adds the character "D" in front of Col1 when the value is all digits):
INSERT INTO tblMember(Col1,Col2,Col3,Col4)
(
select 'D'+t1.Col1 AS Col1,t1.Col2,t1.Col3,t1.Col4
from openrowset(bulk 'C:\target-c.txt'
, formatfile = 'C:\targetFormat.fmt'
, firstrow = 1) as t1
where t1.Col1 not like '%[^0-9]%' -- Col1 contains only digits (123, 456)
UNION
select t1.Col1,t1.Col2,t1.Col3,t1.Col4
from openrowset(bulk 'C:\target-c.txt'
, formatfile = 'C:\targetFormat.fmt'
, firstrow = 1) as t1
where t1.Col1 like '%[^0-9]%' -- Col1 contains at least one non-digit (H24, H25)
)
Now if you select from your table you will see the four rows, with D123 and D456 in Col1 for the all-digit rows (I have tried it and it works fine).
If you want the rows ordered, that is easy: wrap the query in parentheses and order or format it as you need.
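The query above loads every row and only prefixes the all-digit ones. If, as the question describes, rows with a non-numeric first column should be skipped entirely, the second branch of the UNION can simply be dropped. A minimal sketch of that filter follows; @staged is a hypothetical table variable used in place of the OPENROWSET source so the example runs without any files.

```sql
-- @staged stands in for the rows read through OPENROWSET(BULK ...)
DECLARE @staged TABLE (Col1 varchar(100), Col2 varchar(100),
                       Col3 varchar(100), Col4 varchar(100));
INSERT INTO @staged (Col1, Col2, Col3, Col4) VALUES
    ('123', 'M', 'A', 'USA'),
    ('H24', 'N', 'Z', 'USA'),
    ('H25', 'N', 'V', 'USA'),
    ('456', 'M', 'U', 'USA');

-- NOT LIKE '%[^0-9]%' means "contains no character outside 0-9";
-- the extra check excludes empty strings, which would also pass that test.
SELECT 'D' + Col1 AS Col1, Col2, Col3, Col4
FROM @staged
WHERE Col1 <> ''
  AND Col1 NOT LIKE '%[^0-9]%';
```

With the sample rows only the 123 and 456 lines should come back, as D123 and D456. Putting an INSERT INTO tblMember (Col1, Col2, Col3, Col4) in front of the SELECT would load just those rows.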

How can I CUT a specific string part to another column in SQL?

I have about 500 records in a table with an nvarchar column.
I want to cut a part of that data into another column. By "cut" I mean deleting it from the original column and adding it to the target column.
All the data that has to be cut is contained within brackets. The bracketed text may occur anywhere in the string.
For example, ColumnA has: SomeTest Data [I want to cut this], and I want to move [I want to cut this] (but without the brackets) to ColumnB.
How do I achieve this?
UPDATE
I eventually figured it out. The problem was that I hadn't escaped my brackets.
What I have now (and it works):
UPDATE TableA
SET TargetColumn = substring(SourceColumn,charindex('[',SourceColumn)+1,charindex(']',SourceColumn)-charindex('[',SourceColumn)-1),
SourceColumn = substring(SourceColumn, 0, charindex('[',SourceColumn))
where TableA.SourceColumn like '%\[%\]%' ESCAPE '\'
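The escaping matters because LIKE treats [ as the start of a character class. As a small illustration (the table variable @s and its rows are made up for the example), an opening bracket can be matched literally either with the ESCAPE clause, as above, or by wrapping it in its own character class, [[]:

```sql
DECLARE @s TABLE (v nvarchar(100));
INSERT INTO @s (v) VALUES
    (N'SomeTest Data [I want to cut this]'),
    (N'Example with no brackets');

-- Variant 1: '[[]' is a character class containing only '['
SELECT v FROM @s WHERE v LIKE N'%[[]%]%';

-- Variant 2: an explicit escape character, as in the UPDATE above
SELECT v FROM @s WHERE v LIKE N'%\[%\]%' ESCAPE N'\';
```

Both queries should return only the first row. A closing bracket on its own needs no escaping.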
An UPDATE statement along these lines would do it:
CREATE TABLE #Test
(
StringToCut VARCHAR(100)
,CutValue VARCHAR(100)
)
INSERT #Test
VALUES
('SomeTest Data 1 [I want to cut this 1] More Testing',NULL),
('SomeTest Data 2 [I want to cut this 2]',NULL),
('SomeTest Data 3 [I want to cut this 3] Additional Test',NULL),
('[I want to cut this 4] last test',NULL)
SELECT * FROM #Test
--Populate CutValue column based on starting position of '[' and ending position of ']'
UPDATE #Test
SET CutValue = SUBSTRING(StringToCut,CHARINDEX('[',StringToCut),(CHARINDEX(']',StringToCut)-CHARINDEX('[',StringToCut)))
--Remove the '[' ']'
UPDATE #Test
SET CutValue = REPLACE(CutValue,'[','')
UPDATE #Test
SET CutValue = REPLACE(CutValue,']','')
--Remove the bracketed text from StringToCut, keeping the text on either side of it
UPDATE #Test
SET StringToCut = LEFT(StringToCut,CHARINDEX('[',StringToCut)-1) + LTRIM(RIGHT(StringToCut,LEN(StringToCut)-CHARINDEX(']',StringToCut)))
SELECT * FROM #Test
DROP TABLE #Test
You left some questions unanswered.
How do you want to handle NULL values? I am leaving them NULL.
Where should the 'cut' string go? I am assuming "at the end"
What do you do if you find nested brackets? [[cut me]]
Do you need to remove any surrounding spaces? For example, does "The cat [blah] sleeps" become "The cat  sleeps" with two spaces before "sleeps"?
To make the operation atomic, you'll want to use a single UPDATE.
Here is a sample script to get you started.
--build a table variable with sample data
declare @t table(ikey int, sourcecolumn nvarchar(100), targetcolumn nvarchar(100));
insert into @t
select 0,'SomeTest Data [I want to cut this]','Existing Data For Row 1'
union select 1,'SomeTest [cut this too] Data2','Existing Data For Row 2'
union select 2,'[also cut this please] SomeTest Data3',null
union select 3,null,null
union select 4,null,''
union select 5,'Nested bracket example [[[within nested brackets]]] Other data',null
union select 6,'Example with no brackets',null
union select 7,'No brackets, and empty string in target',''
--show "before"
select * from @t order by ikey
--cut and paste
update @t
set
targetcolumn =
isnull(targetcolumn,'') +
case when 0 < isnull(charindex('[',sourcecolumn),0) and 0 < isnull(charindex(']',sourcecolumn),0)
then substring(sourcecolumn,charindex('[',sourcecolumn)+1,charindex(']',sourcecolumn)-charindex('[',sourcecolumn)-1)
else ''
end
,sourcecolumn =
case when sourcecolumn is null
then null
else substring(sourcecolumn,0,charindex('[',sourcecolumn)) + substring(sourcecolumn,charindex(']',sourcecolumn)+1,len(sourcecolumn))
end
--'[' starts a character class in LIKE, so match it literally with '[[]'
where sourcecolumn like '%[[]%'
and sourcecolumn like '%]%'
--show "after"
select * from @t order by ikey
And another one, in a single UPDATE statement:
CREATE TABLE #Test
(
StringToCut VARCHAR(50)
,CutValue VARCHAR(50)
)
INSERT #Test
VALUES
('SomeTest Data 1 [I want to cut this 1]',NULL),
('SomeTest Data 2 [I want to cut this 2]',NULL),
('SomeTest Data 3 [I want to cut this 3]',NULL),
('SomeTest Data 4 [I want to cut this 4]',NULL)
UPDATE #Test
SET CutValue =
SUBSTRING(StringToCut, CHARINDEX('[', StringToCut)+1, CHARINDEX(']', StringToCut) - CHARINDEX('[', StringToCut) - 1)
SELECT * FROM #Test

Importing Maxmind CSV into SQL Server

I have downloaded the GeoLiteCountry CSV file from Maxmind - http://www.maxmind.com/app/geolitecountry. Using the format given to me as standard (so that this can become an automated task), I am attempting to import all the data into a table.
I created a new table IPCountries2 which has columns exactly matching the columns provided:
FromIP varchar(50),
ToIP varchar(50),
BeginNum bigint,
EndNum bigint,
CountryCode varchar(50),
CountryName varchar(250)
Using the various chunks of code I could find, I was unable to get it working using the field terminator and row terminator:
BULK
INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n'
)
GO
The result of this was a single row inserted, all correct except that the last field had overflowed with the following lines (presumably it would have held the whole file if the column had no length limit). Also, the first cell had a quote at the start.
I looked around and found something called a format file (never used these). Made one which looks like:
10.0
6
1 SQLCHAR 0 50 "," 1 FromIP ""
2 SQLCHAR 0 50 "," 2 ToIP ""
3 SQLBIGINT 0 19 "," 3 BeginNum ""
4 SQLBIGINT 0 19 "," 4 EndNum ""
5 SQLCHAR 0 50 "," 5 CountryCode ""
6 SQLCHAR 0 250 "\n" 6 CountryName ""
but this errors on the bigint lines:
Msg 4867, Level 16, State 1, Line 1
Bulk load data conversion error (overflow) for row 1, column 3 (BeginNum).
It does that 10 times and then stops because of maximum error count.
I was able to get the first method working if I took it into Excel and re-saved, this removed the quotes. However, I don't want to rely on this method as I want this to update automatically every week and not have to open and re-save manually.
I don't mind which of the two methods I use ultimately, just so long as it works with a clean file. I had a look at their documentation but they only have code for PHP or MS Access.
Edit
Some lines from the CSV file:
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
"1.1.0.0","1.1.0.255","16842752","16843007","CN","China"
"1.1.1.0","1.1.1.255","16843008","16843263","AU","Australia"
"1.1.2.0","1.1.63.255","16843264","16859135","CN","China"
"1.1.64.0","1.1.127.255","16859136","16875519","JP","Japan"
"1.1.128.0","1.1.255.255","16875520","16908287","TH","Thailand"
Update
After some persistence I was able to get things working 95% with the original method (without the format file). However, it was changed slightly to look like this:
BULK INSERT IPCountries2
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
FIELDTERMINATOR = '","',
ROWTERMINATOR = '"'
)
GO
Everything goes in the right fields as they should, the only issue I have is in the first column there is a quote at the beginning. Some sample data:
FromIP ToIP BeginNum EndNum CountryCode Country
"2.21.248.0 2.21.253.255 34994176 34995711 FR France
"2.21.254.0 2.21.254.255 34995712 34995967 EU Europe
"2.21.255.0 2.21.255.255 34995968 34996223 NL Netherlands
Success. Searching around and some help from another forum finally got me to my solution. For those in need of a similar solution, keep reading:
I ended up using the format file method - whether it would be possible to use fieldterminators and row terminators I'm not sure.
My SQL code looks like:
CREATE TABLE #TempTable
(
DuffColumn varchar(50),
FromIP varchar(50),
ToIP varchar(50),
BeginNum bigint,
EndNum bigint,
CountryCode varchar(50),
CountryName varchar(250)
)
BULK
INSERT #TempTable
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
FORMATFILE = 'C:\Temp\format.fmt'
)
INSERT INTO IPCountries2 (FromIP, ToIP, BeginNum, EndNum, CountryCode, Country)
SELECT FromIP, ToIP, BeginNum, EndNum, CountryCode, CountryName FROM #TempTable
As found in my research, it was necessary to have a useless column which simply captured the first quote.
My format file looks like:
10.0
7
1 SQLCHAR 0 1 "" 1 DuffColumn ""
2 SQLCHAR 0 50 "\",\"" 2 FromIP ""
3 SQLCHAR 0 50 "\",\"" 3 ToIP ""
4 SQLCHAR 0 19 "\",\"" 4 BeginNum ""
5 SQLCHAR 0 19 "\",\"" 5 EndNum ""
6 SQLCHAR 0 50 "\",\"" 6 CountryCode ""
7 SQLCHAR 0 250 "\"\n" 7 CountryName ""
Note that, despite eventually being stored as BIGINT, BeginNum and EndNum are both passed in as SQLCHAR; otherwise the insert does an odd multiplication on the numbers (something about reading the value as bytes rather than digits; I didn't entirely understand it).
And that's about it. The last thing needed to automate this script fully is to truncate the table first so as to clear out old records. However, that might not suit everyone's needs.
Try this command. All I did is remove the double quotes from your FIELDTERMINATOR:
BULK
INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
Your data fields are actually terminated by commas, not commas wrapped in quotes. I also suggest building a staging/import table that matches the data types of your source file exactly, which in this case would look like:
FromIP varchar(50),
ToIP varchar(50),
BeginNum varchar(50),
EndNum varchar(50),
CountryCode varchar(50),
CountryName varchar(250)
Your source data for BeginNum and EndNum is actually string, not bigint. You can convert this data once you have it imported into your staging table.
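As a rough sketch of that conversion step: with a plain comma as the field terminator, the quotes stay in the staged values, so they need to be stripped before the numeric columns are converted. The table variable @staging below is a hypothetical stand-in for the staging table, loaded with one line from the sample file, and TRY_CONVERT (SQL Server 2012 and later) is used so that a malformed value becomes NULL instead of failing the whole statement.

```sql
-- @staging is a hypothetical stand-in for the all-varchar staging table
DECLARE @staging TABLE (
    FromIP varchar(50), ToIP varchar(50),
    BeginNum varchar(50), EndNum varchar(50),
    CountryCode varchar(50), CountryName varchar(250)
);
INSERT INTO @staging VALUES
    ('"1.0.0.0"', '"1.0.0.255"', '"16777216"', '"16777471"', '"AU"', '"Australia"');

-- strip the quotes, then convert the two numeric columns
SELECT REPLACE(FromIP, '"', '')                        AS FromIP,
       REPLACE(ToIP, '"', '')                          AS ToIP,
       TRY_CONVERT(bigint, REPLACE(BeginNum, '"', '')) AS BeginNum,
       TRY_CONVERT(bigint, REPLACE(EndNum, '"', ''))   AS EndNum,
       REPLACE(CountryCode, '"', '')                   AS CountryCode,
       REPLACE(CountryName, '"', '')                   AS CountryName
FROM @staging;
```

Prefixing the SELECT with an INSERT INTO the final table would complete the load. REPLACE is safe here only because the sample values contain no quotes of their own.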
declare @sql varchar(1000)
declare @filename varchar(100) = 'C:\Temp\GeoIPCountryWhois.csv'
set @sql =
'BULK INSERT geoip FROM ''' + @filename + '''
WITH
(
CHECK_CONSTRAINTS,
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''' + char(0x0A) + '''
)'
exec (@sql)
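On SQL Server 2017 and later there is another option worth trying: BULK INSERT accepts FORMAT = 'CSV' together with FIELDQUOTE, which makes the loader treat the double quotes as field delimiters instead of data. The statement below is an untested sketch against the file and table named earlier in this thread; the row terminator of 0x0a is an assumption about the file's line endings and may need to be '\r\n' instead.

```sql
-- Requires SQL Server 2017 or later; path and table name are taken from the thread above.
BULK INSERT IPCountries2
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
    FORMAT = 'CSV',          -- parse the file as quoted CSV
    FIELDQUOTE = '"',        -- quote character (this is also the default)
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a'   -- assumption: LF line endings
);
```

If this works for the file, neither the extra DuffColumn nor a format file should be necessary, and it also covers files where only some fields are quoted.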

Resources