Bulk insert into SQL Server a CSV with line breaks in fields - sql-server

I have a csv that looks like this:
"blah","blah, blah, blah
ect, ect","column 3"
"foo","foo, bar, baz
more stuff on another line", "another column 3"
Is it possible to import this directly into SQL Server?

Every row in your file ends with a newline (\n), but the actual rows you want to extract end with a quotation mark followed by a newline. Set ROWTERMINATOR in the BULK INSERT command to:
ROWTERMINATOR = '"\n'
EDITED: I think the bigger problem will be the commas in the text. SQL Server's BULK INSERT does not support text qualifiers, so each row will be split on every comma, without checking whether the comma is inside quotation marks or not.
You may do like this:
BULK INSERT newTable
FROM 'c:\file.txt'
WITH
(
FIELDTERMINATOR ='",',
ROWTERMINATOR = '"\n'
)
This will give you the following result:
col1 | col2 | col3
----------------------------------------------------------------
"blah | "blah, blah, blah ect, ect | "column 3
"foo | "foo, bar, baz more stuff on another line | "another column 3
All you have to do then is get rid of the quotation mark at the beginning of each cell.
For example:
UPDATE newTable
SET col1 = RIGHT(col1,LEN(col1)-1),
col2 = RIGHT(col2,LEN(col2)-1),
col3 = RIGHT(col3,LEN(col3)-1)
I think you can also do this using the bcp utility with a format file.
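For completeness, here is a rough sketch of such a non-XML format file (my assumptions: a three-column varchar target table and SQL Server 2016, hence version 13.0). The first entry is a dummy field that consumes the opening quotation mark of each row, the "," terminators swallow the separators, and the last field's terminator is the quote-plus-newline row ending:

13.0
4
1 SQLCHAR 0 0   "\""    0 dummy ""
2 SQLCHAR 0 100 "\",\"" 1 col1  SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 500 "\",\"" 2 col2  SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "\"\n"  3 col3  SQL_Latin1_General_CP1_CI_AS

Also note that on SQL Server 2017 and later, BULK INSERT has a CSV mode (per RFC 4180) that handles quoted fields natively, including embedded commas and line breaks:

BULK INSERT newTable
FROM 'c:\file.txt'
WITH (FORMAT = 'CSV', FIELDQUOTE = '"');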

Related

Cannot insert Array in Snowflake

I have a CSV file with the following data:
eno | phonelist | shots
"1" | "['1112223333','6195551234']" | "[[11,12]]"
The DDL statement I have used to create table in snowflake is as follows:
CREATE TABLE ArrayTable (eno INTEGER, phonelist ARRAY, shots ARRAY);
I need to insert the data from the CSV into the Snowflake table and the method I have used is:
create or replace stage ArrayTable_stage file_format = (TYPE=CSV)
put file://ArrayTable @ArrayTable_stage auto_compress=true
copy into ArrayTable from @ArrayTable_stage/ArrayTable.gz
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"\')
But when I try to run the code, I get the error:
Copy to table failed: 100069 (22P02): Error parsing JSON:
('1112223333','6195551234')
How to resolve this?
FIELD_OPTIONALLY_ENCLOSED_BY='\"\': based on the rows you have, that should just be '\"'
select parse_json('[\'1112223333\',\'6195551234\']');
works (the backslashes are there to get around the SQL parser),
but your output has parens ( and ), which is different.
SELECT column2, TRY_PARSE_JSON(column2) as j
FROM @ArrayTable_stage/ArrayTable.gz
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"')
WHERE j is null;
will show which values are failing to parse.
Failing that, you might want to use TO_ARRAY to parse column2, and thus insert into your table the selected/transformed data that is failing to auto-transform.
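If the single quotes inside the brackets are what break PARSE_JSON, a transformed load is one way around it. A minimal sketch, reusing the stage from the question and assuming the '|'-delimited, quote-enclosed layout shown above:

copy into ArrayTable
from (
    select $1::integer,
           to_array(parse_json(replace($2, '\'', '"'))),  -- swap ' for " so the value is valid JSON
           to_array(parse_json($3))
    from @ArrayTable_stage/ArrayTable.gz
)
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"');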

escape comma in snowflake copy into

COPY INTO @TMP_STG
FROM table
FILE_FORMAT = (
TYPE=CSV
EMPTY_FIELD_AS_NULL = false
FIELD_DELIMITER=','
)
single = false
max_file_size=4900000000;
I'm generating a file from a Snowflake table using COPY INTO, with the delimiter set to ','. But there's a column that contains commas in its values, e.g.
col1 | col2         | col3
CAD  | Toronto,ON   | 10
USD  | Dallas,Texas | 10
I was thinking of adding ESCAPE = '/' inside FILE_FORMAT, but I also noticed the docs mention using it with FIELD_OPTIONALLY_ENCLOSED_BY. Do I need to use them together? How do I make sure Toronto,ON ends up in col2 and is not split by the delimiter?
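One common approach, sketched under the assumption that whatever consumes the file accepts quoted fields: set FIELD_OPTIONALLY_ENCLOSED_BY on the unload, so Snowflake wraps the values in quotes and an embedded delimiter no longer splits the column; no separate ESCAPE character is needed for the commas:

COPY INTO @TMP_STG
FROM table
FILE_FORMAT = (
TYPE=CSV
EMPTY_FIELD_AS_NULL = false
FIELD_DELIMITER=','
FIELD_OPTIONALLY_ENCLOSED_BY='"'  -- Toronto,ON is written as "Toronto,ON"
)
single = false
max_file_size=4900000000;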

SSIS - remove character X unless it's followed by character Y

Let's say I have the following dataset imported from a textfile:
Data
--------------------
1,"John Davis","Germany"
2,"Mike Johnson","Texas, USA"
3,"Bill "The man" Taylor","France"
I am looking for a way to remove every " in the data, unless it's followed or preceded by a ,.
So in my case, the data should become:
Data
--------------------
1,"John Davis","Germany"
2,"Mike Johnson","Texas, USA"
3,"Bill The man Taylor","France"
I tried it with the import text file component in SSIS, but that gives an error when I set the column delimiter to ". If I don't set a delimiter, it sees the comma in "Texas, USA" as a split delimiter.
Any suggestions/ideas? The text file is too large to change this manually for every line, so that's not an option.
Bit of a cop-out on the last '"', but:
Create table #test ([Data] nvarchar(max))
insert into #test values ('1,"John Davis","Germany"' )
insert into #test values ('2,"Mike Johnson","Texas, USA"' )
insert into #test values ('3,"Bill "The man" Taylor","France"')
select replace(replace(replace(replace([Data],',"',',~'), '",','~,'),'"', ''),'~','"') + '"'
from #test
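A variant of the same placeholder trick that avoids the appended '"': wrap the data in sentinel commas so the leading and trailing quotes also look comma-adjacent, then strip the sentinels again (a sketch; like the original, it assumes '~' never occurs in the data):

select substring(r, 2, len(r) - 2) as [Data]
from (
    select replace(replace(replace(replace(',' + [Data] + ',', ',"', ',~'), '",', '~,'), '"', ''), '~', '"') as r
    from #test
) t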

SSIS Merge Varying Columns

Using SSIS, I am importing a .txt file, which for the most part is straightforward.
The file being imported has a set amount of columns up to a point, but there is a free text/comments field, which can repeat to unknown length, similar to below.
"000001","J Smith","Red","Free text here"
"000002","A Ball","Blue","Free text here","but can","continue"
"000003","W White","Green","Free text here","but can","continue","indefinitely"
"000004","J Roley","Red","Free text here"
What I would ideally like to do (within SSIS) is to keep the first three columns as singular columns, but to merge any free-text ones into a single column. i.e. Merge/concatenate anything which appears after the 'colour' column.
So when I load this into a SQL Server table, it appears like:
000001 | J Smith | Red | Free text here |
000002 | A Ball | Blue | Free text here but can continue |
000003 | W White | Green | Free text here but can continue indefinitely |
000004 | J Roley | Red | Free text here |
I do not see any easy solution. You can try something like below:
1. Load the complete raw data to a temp table (without any delimiter):
Steps:
Create the temp table in an Execute SQL Task (a sketch of the statement follows this list)
Create a data flow task, with a flat file source (with Ragged Right format) and
an OLEDB destination (using the #temp table created in the previous task)
Set DelayValidation=True for the connection manager and the DFT
Set RetainSameConnection=True for the connection manager
Refer to this for how to create a temp table and use it.
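A minimal sketch of that Execute SQL Task statement (assuming a single wide varchar column named Val, to match the T-SQL in step 2):

CREATE TABLE #temp ([Val] varchar(2000));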
2. Create T-SQL to separate the 3 columns (something like below)
with col1 as (
    select
        [Val],
        substring([Val], 1, charindex(',', [Val]) - 1) col1,
        len(substring([Val], 1, charindex(',', [Val]))) + 1 col1Len
    from #temp
), col2 as (
    select
        [Val],
        col1,
        substring([Val], col1Len, charindex(',', [Val], col1Len) - col1Len) as col2,
        charindex(',', [Val], col1Len) + 1 col2Len
    from col1
)
select col1, col2, substring([Val], col2Len, 200) as col3
from col2
T-SQL Output:
col1 col2 col3
"000001" "J Smith" "Red","Free text here"
"000002" "A Ball" "Blue","Free text here","but can","continue"
"000003" "W White" "Green","Free text here","but can","continue","indefinitely"
3. Use the above query in an OLEDB source in a different data flow task.
Replace the double quotes (") as per your requirement (see the sketch below).
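For instance, the final select of the CTE above could become (a sketch):

select replace(col1, '"', '') as col1,
       replace(col2, '"', '') as col2,
       replace(substring([Val], col2Len, 200), '"', '') as col3
from col2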
This was a fun exercise:
Add a data flow
Add a Script Component (select Source)
Add 4 columns to Output0: ID, Name, Color, FreeText, all of type string
edit script:
Paste the following namespaces up top:
using System.Text.RegularExpressions;
using System.Linq;
Paste the following code into CreateNewOutputRows:
string strPath = @"a:\test.txt"; // put your file path in here
var lines = System.IO.File.ReadAllLines(strPath);
foreach (string line in lines)
{
//Code I stole to read CSV
string delimiter = ",";
Regex rgx = new Regex(String.Format("(\"[^\"]*\"|[^{0}])+", delimiter));
var cols = rgx.Matches(line)
.Cast<Match>()
.Select(m => m.Value.Trim().Trim('"'))
.Where(v => !string.IsNullOrWhiteSpace(v));
//create a column counter
int ctr = 0;
Output0Buffer.AddRow();
//Preset FreeText to empty string
string FreeTextBuilder = String.Empty;
foreach( string col in cols)
{
switch (ctr)
{
case 0:
Output0Buffer.ID = col;
break;
case 1:
Output0Buffer.Name = col;
break;
case 2:
Output0Buffer.Color = col;
break;
default:
FreeTextBuilder += col + " ";
break;
}
ctr++;
}
Output0Buffer.FreeText = FreeTextBuilder.Trim();
}

Store line number of text file in table with bulk insert in tsql

I have text in a file like this:
In the name of God, the Mercy-giving, the Merciful! (1)
Praise be to God, Lord of the Universe, (2)
the Mercy-giving, the Merciful (3)
Ruler on the Day for Repayment! (4)
...
then, a table with 2 fields like this:
TextData varchar(max) NOT NULL
Id int IDENTITY(1,1) NOT NULL
When I use this query:
bulk insert MyDb.dbo.translation
FROM 'd:\IRVING.Arb'
WITH
(
ROWTERMINATOR ='\n',
codepage='1256'
)
I get (0 row(s) affected).
But when I delete the Id column from the table, all the lines are copied.
How can I store the line number of the file in the Id column?
The format of this file is really specific, so you have to use a feature called a "format file" so that SQL Server knows the mapping of your file (which part of the file belongs to which SQL column).
You could probably use something like this :
C:\test_format.fmt :
9.0
2
1 SQLCHAR 0 100 "("      1 TextData ""
2 SQLCHAR 0 12  ")\r\n"  2 Id       ""
Add this to your bulk insert :
bulk insert MyDb.dbo.translation
FROM 'd:\IRVING.Arb'
WITH
(
codepage='1256',
FORMATFILE = 'C:\test_format.fmt'
)
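Note that because Id is an IDENTITY column, BULK INSERT will normally ignore the values read from the file and generate its own; to keep the (N) numbers from the text, you would likely also need the KEEPIDENTITY option:

bulk insert MyDb.dbo.translation
FROM 'd:\IRVING.Arb'
WITH
(
codepage='1256',
FORMATFILE = 'C:\test_format.fmt',
KEEPIDENTITY
)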
Read more about format files here: http://msdn.microsoft.com/en-us/library/ms178129.aspx