Import CSV data into SQL Server - sql-server

I have data in the csv file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems.
If I use FIELDTERMINATOR=',' then it splits the first (and sometimes the last) column
The last column is an integer column but it has quotes and comma thousand separator whenever the number is greater than 1000
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.

You can parse the file with http://filehelpers.sourceforge.net/
And with that result, use the approach here: SQL Bulkcopy YYYYMMDD problem or straight into SqlBulkCopy

Use MySQL load data:
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
The part optionally enclosed by '\"', or escape character and quote, will keep the data in the first column together for the first field.
IGNORE 1 LINES will leave the field name row out.
UTF8 line is optional but good to use if names have diacritics, like in José.

Related

Changing .csv delimiter on ADF

I am trying to load a .csv table to MS SQL Server via Azure Data Factory, but I have a problem with the delimiter (;) since it appears as a character in some of the values included in some columns.
As a result, I get an error saying in the details "found more columns than expected column count".
Is there any way to change the delimiter directly on ADF before/while loading the .csv table (ex.: making it from ";" to "|||")?
Thanks in advance!
I have a problem with the delimiter (;) since it appears as a
character in some of the values included in some columns.
As you have quoted that your delimiter is ; but it is occurring as a character in some of the columns which means that there is no specific pattern of the occurrence. Hence, it is not possible in ADF.
The recommendation is to write a program using any preferred language (like python) which will iterate each row from the dataset and write a logic to replace the delimiter to ||| or you can also remove the unrequired ; and append the changes in new file. Later you can ingest this new file in ADF.

Load table issue - BCP from flat file - Sybase IQ

I am getting the below error while trying to do bcp from a flat delimited file into Sybase IQ table.
Could not execute statement.
Non-space text found after ending quote character for an enclosed field.
I couldn't observe any non space text in the file, but this error is stopping me from doing the bulk copy. | is column delimiter with " as text qualifier and \n is row delimiter.
Below is the sample template for the same, am using.
LOAD TABLE TABLE_NAME(a NULL('(null)'),b NULL('(null)'),c NULL('(null)'))
USING CLIENT FILE '/home/...../a.txt' //unix
QUOTES ON
FORMAT bcp
STRIP RTRIM
DELIMITED BY '|'
ROW DELIMITED BY '\n'
When i perform the same query with QUOTES OFF, the load was successful. But, the same query is getting failed with QUOTES ON. I would like to get quotes stripped off, as well.
Sample Data
12345|"abcde"|(null)
12346|"abcdf"|"zxf"
12347|(null)|(null)
12348|"abcdg"|"zyf"
Any leads would be helpful!
If IQ bcp is the same as ASE, then I think those '(null)' fields are being interpreted as strings, not fields that are NULL.
You'd need to stream edit out those (null).
You're on unix so use sed or perl -ne.
E.g. pipe the file through " | perl -pne 's/(null)//g'" to the loading command or filename.
QUOTES OFF might seem to work, but I wonder if when you look in your loaded data, you'll see double quotes inside the 2nd field, and '(null)' where you expect a field to be NULL.

Format fields during bulk insert SQL 2008

I am currently working on a project that requires data from a report generated by third party software to be inserted into a local SQL database. So far I have the data stored as a tab delimited .txt file and the following bulk insert SQL statement:
BULK INSERT ExampleTable
FROM 'c:\temp\Example.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
The two problems I am encountering are, quotation marks around any value that includes it's own comma, and money signs in every field that has a dollar amount.
For instance one of the columns of the table is a description field and some of the values come out looking like:
"this is an example description, some more information, I don't know why the author would use commas in the first place here"
I don't care about the description field nearly as much as other fields that include dollar amounts. Each of these fields is already prefixed with a $ sign, so I have to set them as a nvarchar instead of a decimal or a float, which would be A LOT more useful for reporting. Furthermore, when the dollar amount is greater than 1000, the field will also contain a comma, and thus, quotation marks. ex "$1,084.59"
I am familiar with SSMS, but I have never made a format or bcp file (the solutions I have found online).
Any help would be greatly appreciated.
You can use a format file, but only if your metadata remains constant, which it does not appear to be in your case. You state that the dollar amounts are enclosed in quotes only when they exceed 999 and the comma is inserted. A format file would allow you to define per column delimiters such as [,] or [","]. But if that delimiter is shifting throughout your file, you will have to pre-process the file. Text qualifiers themselves are not supported.
For reference:
CSV import in SQL Server 2008
http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html
i dont see why, but ThiefMaster deleted my answer :-(
probabaly a mistake and he did not check the link, as this link is the full answer to you question, i will try again for the last time here...
Tip: if your CSV file don't have consistent format, for example ON THE SAME COLUMN some of the values are doubleqouted and some not than this blog will help you do it in an easy way (using openrowset in the last step make it a one simple query): http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx
There is a new WIKI at: http://social.technet.microsoft.com/wiki based on this blog if you prefer to read from Microsoft site.

SQL Server CSV extract of tables with newline, doublequotes and commas in columns?

I extracted some 10 tables in CSV with " as the text qualifier. Problem is my extract does not look right in Excel because of special characters in a few columns. Some columns are breaking into a new row when it should stay in the column.
I've been doing it manually using the management studio export feature, but what's the best extract the 10 tables to CSV with the double quote qualifier using a script?
Will I have to escape commas and double quotes? Best way to do this?
How should I handle newline codes in my columns, we need them for migration to a new system, but the PM wants to open the files and make modifications using Excel. Can they have it both ways?
I understand that much of the problem is that Excel is interpreting the file where a load utility into another database might not do anything special with new line, but what about double quotes and commas in the data, if I don't care about excel, must I escape that?
Many Thanks.
If you are using SQL Server 2005 or later, the export wizard will export the excel file out for you.
Right click the database, select Tasks-> Export Data...
Set the source to be the database.
Set the destination to excel.
At the end of the wizard, select the option to create an SSIS package. You can then create a job to execute the package on a schedule or on demand.
I'd suggest never using commas for your delimiter - they show up too frequently in other places. Use a tab, since a tab isn't too easy to include in Excel tables.
Make sure you never start a field with a space unless you want that space in the field.
Try changing your text lf's into the literal text \n. That is:
You might have:
0,1,"Line 1
Line 2", 3
I suggest you want:
0 1 "Line 1\nLine 2" 3
(assuming the spacing between lines are tabs)
Good luck
As far as I know, you cannot have new line in csv columns. If you know a column could have comma, double quotes or new line, then you can use this SQL statement to extract the value as valid csv
SELECT '"' + REPLACE(REPLACE(REPLACE(CAST([yourColumnName] AS VARCHAR(MAX)), '"', '""'), char(13), ''), char(10), '') + '"' FROM yourTable.

Commas within CSV Data

I have a CSV file which I am directly importing to a SQL server table. In the CSV file each column is separated by a comma. But my problem is that I have a column "address", and the data in this column contains commas. So what is happening is that some of the data of the address column is going to the other columns will importing to SQL server.
What should I do?
For this problem the solution is very simple.
first select => flat file source => browse your file =>
then go to the "Text qualifier" by default its none write here double quote like (") and follow the instruction of wizard.
Steps are -
first select => flat file source => browse your file => Text qualifier (write only ") and follow the instruction of wizard.
Good Luck
If there is a comma in a column then that column should be surrounded by a single quote or double quote. Then if inside that column there is a single or double quote it should have an escape charter before it, usually a \
Example format of CSV
ID - address - name
1, "Some Address, Some Street, 10452", 'David O\'Brian'
New version supports the CSV format fully, including mixed use of " and , .
BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.csv'
WITH ( FORMAT='CSV');
I'd suggest to either use another format than CSV or try using other characters as field separator and/or text delimiter. Try looking for a character that isn't used in your data, e.g. |, #, ^ or #. The format of a single row would become
|foo|,|bar|,|baz, qux|
A well behave parser must not interpret 'baz' and 'qux' as two columns.
Alternatively, you could write your own import voodoo that fixes any problems. For the later, you might find this Groovy skeleton useful (not sure what languages you're fluent in though)
Most systems, including Excel, will allow for the column data to be enclosed in single quotes...
col1,col2,col3
'test1','my test2, with comma',test3
Another alternative is to use the Macintosh version of CSV, which uses TAB's as delimiters.
The best, quickest and easiest way to resolve the comma in data issue is to use Excel to save a comma separated file after having set Windows' list separator setting to something other than a comma (such as a pipe). This will then generate a pipe (or whatever) separated file for you that you can then import. This is described here.
I don't think adding quote could help.The best way I suggest is replacing the comma in the content with other marks like space or something.
replace(COLUMN,',',' ') as COLUMN
Appending a speech mark into the select column on both side works. You must also cast the column as a NVARCVHAR(MAX) to turn this into a string if the column is a TEXT.
SQLCMD -S DB-SERVER -E -Q "set nocount on; set ansi_warnings off; SELECT '""' + cast ([Column1] as nvarchar(max)) + '""' As TextHere, [Column2] As NormalColumn FROM [Database].[dbo].[Table]" /o output.tmp /s "," -W

Resources