Here is an example CSV file:
category,fruits,cost
'Fruits','Apple,banana,lemon','10.58'
I import this CSV into SQL Server 2014
by right-clicking the database in Object Explorer => Tasks => Import Data.
No matter how I play around with the column delimiter options, row 2 always ends up as
5 columns (Fruits, Apple, banana, lemon, 10.58) instead of the desired 3 columns
('Fruits', 'Apple,banana,lemon', '10.58'). In other words, I want 'Apple,banana,lemon' to stay in one column.
The solution in "How do I escape a single quote in SQL Server?" doesn't work here. Could any guru enlighten me? Python, Linux bash, SQL, or simple editor tricks are all welcome. Thank you!
No matter how I play around with column delimiter options
That's not the option you need to play with; it's the Text Qualifier. Set the Text Qualifier to a single quote (') and the file then imports easily.
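If you want to sanity-check the parsing outside the wizard, here is a minimal Python sketch; the csv module's quotechar plays the same role as the wizard's Text Qualifier (the file name is hypothetical):

import csv

# fruits.csv (hypothetical name) contains the two lines shown above.
with open("fruits.csv", newline="") as f:
    # quotechar="'" marks single quotes as the text qualifier, so the
    # commas inside 'Apple,banana,lemon' are not treated as delimiters.
    reader = csv.reader(f, delimiter=",", quotechar="'")
    for row in reader:
        print(row)

# Row 2 prints as the desired 3 columns:
# ['Fruits', 'Apple,banana,lemon', '10.58']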
bcp DBName..vieter out c:\test003.txt -c -T -t"\",\"" -r"\"\n\"" -S SERVER
The above field terminator (-t"\",\"") and row terminator (-r"\"\n\"") work and get all CSV data fields surrounded by quotation marks.
However, one of the fields in the data stores written articles, which contain quotes themselves. When I import the data into a database, it doesn't copy perfectly because the parser interprets quotation marks within the articles as field terminators. Is there an easy fix for this?
I tried multiple variations of the FIELDS TERMINATED BY, ENCLOSED BY, and ESCAPED BY options, but can't seem to get the files to import perfectly.
This is the query structure, in case you aren't familiar with it:
LOAD DATA LOCAL INFILE '/home/myinfotel/dump_new/NewsItemImages.csv' INTO TABLE NewsItemImages FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
And here is an example record that isn't copying correctly into the database:
"CP1000066268","BX101-1219_2016_205649.jpg","FILE - In this Monday, Dec. 19, 2016 file photo Maine Republican Gov. Paul LePage, right, and House Speaker Sara Gideon, D-Freeport, attend the Electoral College vote at the State House in Augusta, Maine. LePage says he had weight loss surgery and jokes that now "there's 50 less pounds of me to hate." The Republican revealed the bariatric surgery for the first time Wednesday, Jan. 11. He says he underwent the procedure on Sept. 29 and returned to work a day later. (AP Photo/Robert F. Bukaty)","","100","69","650","447","0","","","1","","1","0","live","2017-01-11 16:56:18.000"
Any help is appreciated!
Turns out exporting from SQL Server Management Studio was the solution (or I suppose using the sqlcmd command on the command line would also do the trick). Bizarre. I tried exporting to a CSV with both bcp and Excel (its built-in Get Data function), but all to no avail.
All I did was connect to the database in SSMS, spit out the table in a query, and "Save Results As" a .csv. Everything is parsed perfectly now. Here is a link that I used to guide me to this solution:
https://blog.devart.com/how-to-export-sql-server-data-from-table-to-a-csv-file.html
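If you'd rather script the export than click through SSMS, here is a minimal Python sketch of the same idea, assuming pyodbc is available (the connection string is hypothetical; the table name is the one from the LOAD DATA example):

import csv
import pyodbc

# Hypothetical connection string; adjust driver/server/database to taste.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM NewsItemImages")

with open("NewsItemImages.csv", "w", newline="", encoding="utf-8") as f:
    # QUOTE_ALL encloses every field in double quotes and doubles any
    # embedded quotes, which LOAD DATA ... ENCLOSED BY '"' understands.
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    for row in cursor:
        writer.writerow("" if v is None else v for v in row)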
I am importing some Excel spreadsheets into MS SQL Server. I load the spreadsheets, cleanse the data, and then export it to SQL using Alteryx. Some files have text columns where the cells span multiple lines (i.e. they contain newline characters, like when you press ALT+ENTER in Excel). When I export the tables to SQL and then query the table, I see lots of '_x000D_' strings that are not in the original file.
Is it some kind of newline character encoding? How do I get rid of it?
I haven't been able to replicate the error. The original file contains some letters with accents (à, á, etc.); I created multi-line spreadsheets with accented letters, but I managed to export these to SQL just fine, with no '_x000D_'.
If these were CSV files I would think of character encoding, but Excel spreadsheets? Any ideas? Thanks!
I know this is old, but: if you're using Alteryx, just run it through the "Data Cleansing" tool as the last thing prior to your export to SQL. For the field in question, tell the tool to remove new lines by checking the appropriate checkbox.
If that still doesn't work: _x000D_ is the OOXML escape for ASCII 13, the carriage return (hex 0D = decimal 13). So try running your data through a regular Formula tool, and for the [field] in question, just use the expression Replace([field], CharFromInt(13), ""), which should remove that character by replacing it with the empty string.
This worked for me:
REGEX_REPLACE([field],"_x000D_","")
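For what it's worth, the same cleanup is easy outside Alteryx too; a minimal Python sketch (the sample value is made up):

value = "Line 1_x000D_Line 2"  # hypothetical cell value
# _x000D_ is the OOXML escape for a carriage return (ASCII 13), so strip
# both the literal escape sequence and any real CR characters.
cleaned = value.replace("_x000D_", "").replace("\r", "")
print(cleaned)  # -> Line 1Line 2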
I extracted some 10 tables to CSV with " as the text qualifier. The problem is that my extract does not look right in Excel because of special characters in a few columns: some columns break onto a new row when they should stay in the column.
I've been doing it manually using the Management Studio export feature, but what's the best way to extract the 10 tables to CSV with the double-quote qualifier using a script?
Will I have to escape commas and double quotes? What's the best way to do this?
How should I handle newline codes in my columns? We need them for migration to a new system, but the PM wants to open the files and make modifications using Excel. Can we have it both ways?
I understand that much of the problem is that Excel interprets the file, where a load utility into another database might not do anything special with newlines. But what about double quotes and commas in the data? If I don't care about Excel, must I escape those?
Many Thanks.
If you are using SQL Server 2005 or later, the export wizard will export to an Excel file for you.
Right click the database, select Tasks-> Export Data...
Set the source to be the database.
Set the destination to excel.
At the end of the wizard, select the option to create an SSIS package. You can then create a job to execute the package on a schedule or on demand.
I'd suggest never using commas for your delimiter; they show up too frequently in other places. Use a tab instead, since a tab isn't easy to include in Excel tables.
Make sure you never start a field with a space unless you want that space in the field.
Try changing your text LFs into the literal text \n. That is:
You might have:
0,1,"Line 1
Line 2", 3
I suggest you want:
0 1 "Line 1\nLine 2" 3
(assuming the separators between fields are tabs)
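A minimal Python sketch of that transformation, with hypothetical file names (read standard comma/double-quote CSV, write tab-delimited with literal \n in place of real newlines):

import csv

with open("in.csv", newline="") as src, open("out.tsv", "w", newline="") as dst:
    reader = csv.reader(src)                  # comma-delimited, "-qualified
    writer = csv.writer(dst, delimiter="\t")  # tab-delimited output
    for row in reader:
        # Replace real newlines inside fields with the literal text \n
        # so every record stays on a single physical line.
        writer.writerow(v.replace("\r\n", "\\n").replace("\n", "\\n") for v in row)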
Good luck
As far as I know, you cannot reliably put newlines in CSV columns. If you know a column could contain a comma, double quotes, or a newline, then you can use this SQL statement to extract the value as valid CSV:
SELECT '"' + REPLACE(REPLACE(REPLACE(CAST([yourColumnName] AS VARCHAR(MAX)), '"', '""'), CHAR(13), ''), CHAR(10), '') + '"' FROM yourTable
I am working with SQL Server 2008. My task is to investigate an issue where full-text search (FTS) cannot find the right results for Thai.
First, I have a table with FTS enabled on the column 'ItemName', which is nvarchar. The catalog is created with the Thai language. Note that Thai is one of the languages that doesn't separate words with spaces, so 'หลวง' 'พ่อ' 'โสธร' are written together in a sentence as 'หลวงพ่อโสธร'.
In the table, there are many rows that include the word (โสธร); for example, row #1 (ItemName: 'หลวงพ่อโสธร').
On the webpage, I try to search for 'โสธร', but SQL Server cannot find it.
So I try to investigate by running the following query in SQL Server:
select * from sys.dm_fts_parser(N'"หลวงพ่อโสธร"', 1054, 0, 0)
...to see how the words are broken. The first parameter is the text to be broken; the second specifies that we're using Thai (the word breaker, and so on). Here is the result:
row#1 (display_item: 'ງลวง', source_item: 'หลวงพ่อโสธร')
row#2 (display_item: 'พຝโส', source_item: 'หลวงพ่อโสธร')
row#3 (display_item: 'ธร', source_item: 'หลวงพ่อโสธร')
Notice that the first and second rows display the wrong display_item: 'ງ' in 'ງลวง' isn't even a Thai character, and 'ຝ' in 'พຝโส' is not a Thai character either.
So the question is: where did those alien characters come from? I guess this is why I cannot search for 'โสธร': the word breaker is broken and is keeping the wrong characters in the indexes.
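To double-check that those characters really aren't Thai, I ran a quick Python sketch over the breaker output (Thai occupies U+0E00-U+0E7F; the stray characters fall in the Lao block, U+0E80-U+0EFF):

for token in ["ງลวง", "พຝโส", "ธร"]:
    for ch in token:
        # Thai block: U+0E00-U+0E7F; anything outside it is not Thai.
        block = "Thai" if "\u0e00" <= ch <= "\u0e7f" else "NOT Thai"
        print(ch, hex(ord(ch)), block)

# 'ງ' (U+0E87) and 'ຝ' (U+0E9D) print as NOT Thai (they are Lao letters).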
Please help!
This could be due to a different dialect of Thai having been selected when the index was built.
In the FTS properties, check which language/culture is selected.
I have a CSV file with quote text delimiters. Most of the 90,000 rows are fine, but a few rows have a text field that contains both a quote and a comma. For example, the field's value would be:
AB",AB
When delimited, this becomes:
"AB"",AB"
When SQL 2005 attempts to import this I get errors such as...
Messages
Error 0xc0202055: Data Flow Task: The column delimiter for column "Column 4" was not found.
(SQL Server Import and Export Wizard)
This only seems to happen when a quote and comma are in a text value together. Values like
AB"AB which becomes "AB""AB"
or
AB,AB which becomes "AB,AB"
work fine.
Here are some example rows...
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224"",E122/261,8 CO","","B","MP11"
The last row is an example of the problem - the "", causes the error.
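Interestingly, the row itself is valid CSV: the "" is just an escaped quote inside a qualified field, and a compliant parser handles it fine. A minimal Python sketch to confirm (it is the SSIS parser, not the file, that trips up):

import csv
from io import StringIO

line = '"S01266330002","CABLE:224"",E122/261,8 CO","","B","MP11"'
# Python's csv module follows the usual CSV convention: "" inside a
# quoted field is a literal double quote, not a field terminator.
print(next(csv.reader(StringIO(line))))
# -> ['S01266330002', 'CABLE:224",E122/261,8 CO', '', 'B', 'MP11']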
I've had MAJOR problems with SSIS. Things that Access, Excel, and even DTS handled very well, SSIS chokes on. Variable record-length data is another problem, but yes, these embedded qualifiers are a major problem, especially if you do not have access to the import files because they're on someone else's server that you pay to access, and they might even be 4 to 5 GB in size! You can't just do a "replace all" on that every import.
You may want to check out the Microsoft download called "UnDouble", and here is another workaround you might try.
It seems that with SSIS in SQL Server 2008, the bug is still there. I don't know why they haven't addressed this in the parser, but it's like we went back in time with SSIS in terms of basic import functionality.
UPDATE 11-18-2010: This bug still exists in SSIS. Amazing.
How about just:
Search/replace all "", with ''; (fix all the broken fields)
Search/replace all ;''; with ,"", (to "unfix" properly empty fields.)
Search/replace all '';''; with "","", (to "unfix" properly empty fields which follow a correct encapsulation of embedded delimiters.)
That converts your original to:
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224'';E122/261,8 CO","","B","MP11"
Which seems to run the gauntlet fine in SSIS. You may have to apply step 3 recursively to account for three empty fields in a row ('';'';'';, etc.), but the bottom line here is that when you have embedded text qualifiers, you have to either escape them or replace them. Let this be a lesson for your CSV creation processes going forward.
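A scripted version of the same idea, with hypothetical file names: Python's csv module parses the doubled qualifiers correctly, so you can round-trip the file and neutralize the embedded quotes before handing it to SSIS:

import csv

with open("original.csv", newline="") as src, open("fixed.csv", "w", newline="") as dst:
    reader = csv.reader(src)  # understands "" inside quoted fields
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        # Swap embedded double quotes for '' so SSIS never sees a text
        # qualifier inside a qualified field.
        writer.writerow(v.replace('"', "''") for v in row)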
Microsoft says doubled double quotes inside double-quote-delimited fields just don't work. A fix is planned for the end of 2011...
In the meantime, we will have to use workarounds like those described in the other answers.
I would just do a search/replace for ", and replace it with ,
Do you have access to the original file?