Every time I try to import an Excel file into SQL Server I get a particular error. When I edit the mappings, the default type for every numeric field is float. None of the fields in my table have decimals in them, and they aren't a money data type; they're just 8-digit numbers. I don't want my primary key stored as a float when it's an int, so how can I fix this? I get a truncation error of some sort; I'll post a screen cap if needed. Is this a common problem?
It should be noted that I cannot import Excel 2007 files (I think I've found the remedy for this), but even when I try to import .xls files, every value that contains numerals is automatically imported as a float, and when I try to change it I get an error.
http://imgur.com/4204g
SSIS doesn't implicitly convert data types, so you need to do it explicitly. The Excel connection manager can only handle a few data types and it tries to make a best guess based on the first few rows of the file. This is fully documented in the SSIS documentation.
You have several options:
Change your destination data type to float
Load to a 'staging' table with data type float using the Import Wizard, then INSERT into the real destination table using CAST or CONVERT to convert the data (see the sketch after this list)
Create an SSIS package and use the Data Conversion transformation to convert the data
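For the staging-table option, a minimal T-SQL sketch (the staging and destination table and column names here are hypothetical):

-- The staging table was loaded by the Import Wizard, so its numeric columns are FLOAT
INSERT INTO dbo.Customers (CustomerId, AccountNumber)
SELECT CAST(CustomerId AS INT),
       CAST(AccountNumber AS INT)
FROM dbo.Customers_Staging;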
You might also want to note the comments in the Import Wizard documentation about data type mappings.
Going off what Derloopkat said, which can still fail on conversion (no offense, Derloopkat) because Excel is terrible at this:
Paste from Excel into Notepad and save as normal (.txt file).
From within Excel, open said .txt file.
Select Next, as it is obviously tab delimited.
Select "none" for text qualifier, then Next again.
Select the first row, hold Shift, select the last row, and select the Text radio button. Click Finish.
It will open; check it to make sure it's accurate, and then save it as an Excel file.
There is a workaround (the equivalent T-SQL is sketched after these steps).
Import the Excel sheet with numbers as float (the default).
After importing, go to the table's Design view.
Change the DataType of the column from Float to Int or Bigint.
Save the changes.
Change the DataType of the column from Bigint to any text type (Varchar, nvarchar, text, ntext, etc.).
Save the changes.
That's it.
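If you prefer T-SQL over the Design view, the same two-step change can be sketched like this (table and column names are hypothetical):

-- Step 1: Float -> Bigint
ALTER TABLE dbo.ImportedSheet ALTER COLUMN AccountNumber BIGINT;
-- Step 2: Bigint -> a text type, if that is what you ultimately want
ALTER TABLE dbo.ImportedSheet ALTER COLUMN AccountNumber VARCHAR(20);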
When Excel finds mixed data types in the same column, it guesses the right format for the column (the majority of the values determines the type of the column) and dismisses all other values by inserting NULLs. But Excel does this badly (e.g. if a column is considered text and Excel finds a number, it decides the number is a mistake and inserts a NULL instead; or if some cells containing numbers are formatted as "text", you may get NULL values in an integer column of the database).
Solution:
Create a new Excel sheet with the names of the columns in the first row
Format the columns as text
Paste the rows without formatting (use CSV format or copy/paste in Notepad to get only text)
Note that formatting the columns on an existing Excel sheet is not enough.
There seems to be a really easy solution when dealing with data type issues.
Basically, at the end of the Excel connection string's Extended Properties, add IMEX=1:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\YOURSERVER\shared\Client Projects\FOLDER\Data\FILE.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
This will resolve data type issues such as columns where values are mixed with text and numbers.
To get to the connection property, right-click the Excel connection manager below the control flow and hit Properties; it'll be on the right, under Solution Explorer. Hope that helps.
To avoid the float type field in a simple way:
Open your Excel sheet.
Insert a blank row after the header row and type any text in all of its cells.
Right-click the heading of each column that causes a float issue, select Format Cells, then choose the Text category and press OK.
Then export the Excel sheet to your SQL Server.
This simple approach worked for me.
A workaround to consider in a pinch:
save a copy of the Excel file and change the column's format type to 'text'
copy the column values and paste them into a text editor, then save the file (call it tmp.txt)
modify the data in the text file so each value starts and ends with a character that the SQL Server import mechanism will recognize as text. If you have a fancy editor, use its included tools; I use awk in Cygwin on my Windows laptop. For example, I start and end the column value with a single quote, like "$ awk '{print "\x27"$1"\x27"}' ./tmp.txt > ./tmp2.txt"
copy and paste the data from tmp2.txt over the top of the necessary column in the Excel file, and save the Excel file
run the SQL Server import for your modified Excel file... be sure to double-check that the data type chosen by the importer is not numeric... if it is, repeat the above steps with a different set of characters
The data in the database will have the quotes once the import is done... you can update the data later to remove the quotes, or use the "replace" function in your read query, such as "replace([dbo].[MyTable].[MyColumn], '''', '')" (sketched below)
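For instance, a one-time cleanup after the import might look like this (using the hypothetical table and column names from above):

-- Strip the single quotes that were added to force text recognition
UPDATE [dbo].[MyTable]
SET [MyColumn] = REPLACE([MyColumn], '''', '');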
I have a table with a VARCHAR field that has a value like A "B" C. When I click "Download Results" in the Web UI and inspect the resulting CSV or TSV file, the value returned is A ""B"" C, meaning the sets of quotes have been duplicated. Note that I do not see this issue with a COPY INTO statement (exported to S3).
To replicate this issue easily, you can run the following in a Snowflake Web Console session and download the results to CSV:
SELECT 'A "B" C' AS QUOTE_FIELD ;
Note that double quotes are simply duplicated, so an example of two double quotes (A ""B"" C) would export as A """"B"""" C.
Does anyone know of a way to resolve this unexpected behavior?
In the CLI, it produces the expected result:
A "B" C
The TSV export looks good, but not the CSV.
Please open a Snowflake Support ticket for this.
Kindly note that the double quotes you are noticing in the CSV file are expected. By default, the escape character is a " (double quote) for CSV-formatted files.
When you import the same data into a Snowflake table, you can specify the escape character accordingly so that the data is parsed as expected and loaded into the table.
For more details, please refer to https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html#type-csv
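As a sketch (the stage, file, and table names are hypothetical), a COPY INTO that declares the fields as optionally quote-enclosed will collapse each doubled "" back to a single quote on load:

COPY INTO my_table
FROM @my_stage/downloaded_results.csv
FILE_FORMAT = (
    TYPE = CSV
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'  -- a "" inside an enclosed field is read as one literal "
    SKIP_HEADER = 1
);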
I have a CSV file I am importing through SSIS. Below is a sample of the data in my file:
"MEM1001","OTHER","P" ,20101001,20781231,,20781231,20101001,
"Medic","General >21" ,
"A100100" ,"2210",20101001,20781231
I have added , as column delimiter and " as Text Qualifier in the connection manager.
But columns like "P" ,"Medic","General >21" ,"A100100" , are still coming through enclosed in double quotes when I preview the data, while the rest of the string columns come through without double quotes.
I am guessing it has something to do with the spaces after the quotes.
Can somebody explain why this is happening, and how can I make these columns come through without double quotes when importing the data from the file to the table?
I just stumbled across this post. I had the same issues; I tried various things and could not find any other solution.
The text qualifier " only works in CSV files when the quote comes directly after the comma, with no space between the comma and the text identifier/qualifier. I have no idea why.
If you aren't able to fix the input data, an option would be to create a derived column and to replace the double quotes.
This worked for me:
How to replace double quotes in derived column transformation?
Trim(REPLACE(COLA, "\"", ""))
You should also add the Trim(); otherwise you'll have empty spaces before, and maybe after, the word. This could be problematic in a merge join (in my case it was).
I don't know why these extra spaces cause this issue.
Here is what I would do. It may not be the best idea, but it should work.
You will need to add a Script Task before the Data Flow Task that replaces all " ," and ", " with ",".
Why not just go to the Connection Manager for that csv file, click on Columns, and under the Column delimiter box just enter a space followed by a comma? Worked for me.
I have a CSV I'm trying to import into SQL using SSIS packages through code.
A line might look something like this
321,1234,"SOME MACHINE, MACHINE ACCESSORIES 1 1/2"" - 4"""
In this example they're using a double quote to symbolize inches, and trying to escape the inches double quote with another double quote. SSIS, however, does not honour this escaping and fails.
Is there any way I can still use the double quote symbol for inches and escape it within the quoted text?
Many suggestions are to replace the double quote with two single quotes. Is this the only workaround, or can I use some other escape technique?
I've seen people talk about using the Derived Column transformation but in my case SSIS fails at the Flat File Source step and I therefore cannot get to a derived column transform step.
I'm currently running a Script Task in the control flow, just before the data flow, to manipulate the CSV with some regexes to clean up the data.
I need the string to be text qualified with the 2 outer double quotes because of potential commas in the description column.
What can I do about the double quotes within the text qualified string?
Wow, I expected to be able to answer with "Just set the text qualifier", but figured you would have already tried that so I tried it before I answered. Surprise, SSIS doesn't support standard CSV files!
Looks like this is a common complaint. There is one comment in there from Microsoft about some samples that may help. Here is the CodePlex project; they mentioned that the Regular Expression Flat File Source sample and the Delimited File Reader Source sample in particular may help -- I'm guessing the Delimited File Reader would be more worthwhile.
I ran into a similar problem yesterday.
We got a CSV file that uses comma , as the delimiter and double quote " as the text qualifier, but there is a field that contains a double quote within the double quotes (a non-escaped double quote within a string).
After spending half a day searching, I came up with the solution below:
// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
// Load the file into a one-dimensional string array.
// fullFilePath is the full path + file name.
var fileContent = File.ReadAllLines(fullFilePath);
// Find double quotes within double quotes (not at the start/end of a line or next to a comma)
// and replace them with a single quote. In the verbatim string, "" denotes one literal quote.
var fileContentUpdated = fileContent.Select(
    x => new Regex(@"(?<!^)(?<!\,)""(?!\,)(?!$)")
        .Replace(x, "'")).ToArray();
// Write the string array back to the csv file.
File.WriteAllLines(fullFilePath, fileContentUpdated);
I don't see any other way to avoid the issue than replacing the double quote with something else.
This answer is not applicable to 2005 as referenced here, but in case someone comes across this while searching and is using 2008, this other question appears to have a working answer: SSIS 2008 and Undouble
There is a workaround: if, in the file connection, you remove the " as text qualifier, you can remove all the double quotes later with a derived column expression: REPLACE(Item_Name,"\"",""). The downside is that you will need to do it for every field.
I didn't find a direct way to achieve this so I wrote a script:
Add a Script component to the workflow (make sure to connect the input arrow or it won't recognize the columns)
Right click on the Script component -> Input Columns, change the column Usage Type to READWRITE
Click Ok
Edit the Script, replace double quotes with two single quotes
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Replace each embedded double quote with two single quotes
    Row.Description = Row.Description.Replace("\"", "''");
}
Probably old news now, but this issue was fixed in SQL Server 2012. I was able to import the same file on a 2012 server that failed on my 2008 server.
I have a CSV file with quote text delimiters. Most of the 90,000 rows are fine, but a few rows have a text field that contains both a quote and a comma. For example, the field's value would be:
AB",AB
When delimited, this becomes
"AB"",AB"
When SQL 2005 attempts to import this I get errors such as...
Messages
Error 0xc0202055: Data Flow Task: The column delimiter for column "Column 4" was not found.
(SQL Server Import and Export Wizard)
This only seems to happen when a quote and comma are in a text value together. Values like
AB"AB which becomes "AB""AB"
or
AB,AB which becomes "AB,AB"
work fine.
Here are some example rows...
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224"",E122/261,8 CO","","B","MP11"
The last row is an example of the problem: the "", causes the error.
I've had MAJOR problems with SSIS. Things that Access, Excel, and even DTS handled very well, SSIS chokes on. Variable record-length data is another problem, but yes, these embedded qualifiers are a major problem, especially if you do not have access to the import files because they're on someone else's server that you pay to gain access to, and they might even be 4 to 5 GB in size! Can't just do a "replace all" on that every import.
You may want to check out the item at Microsoft Downloads called "UnDouble", and here is another workaround you might try.
Seems like with SSIS in SQL Server 2008, the bug is still there. I don't know why they haven't addressed this in the parser, but it's like we went back in time with SSIS in basic import functionality.
UPDATE 11-18-2010: This bug still exists in SSIS. Amazing.
How about just:
Search/replace all "", with ''; (fix all the broken fields)
Search/replace all ;''; with ,"", (to "unfix" properly empty fields.)
Search/replace all '';''; with "","", (to "unfix" properly empty fields which follow a correct encapsulation of embedded delimiters.)
That converts your original to:
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224'';E122/261,8 CO","","B","MP11"
Which seems to run the gauntlet fine in SSIS. You may have to apply step 3 recursively to account for 3 empty fields in a row ('';'';'';, etc.), but the bottom line here is that when you have embedded text qualifiers, you have to either escape them or replace them. Let this be a lesson for your CSV creation processes going forward.
Microsoft says doubled double quotes inside double quote delimited fields just don't work. A fix is planned for the end of 2011...
In the meantime we will have to use workarounds like the ones described in the other answers.
I would just do a search/replace for ", and replace it with ,
Do you have access to the original file?