ORA-01401: inserted value too large for column CHAR

I'm pretty new to Oracle SQL and my class is going over bulk loading at the moment. I pretty much get the idea; however, I am having a little trouble getting it to read all of my records.
This is my SQL file:
PROMPT Creating Table 'CUSTOMER'
CREATE TABLE CUSTOMER
(CustomerPhoneKey CHAR(10) PRIMARY KEY
,CustomerLastName VARCHAR(15)
,CustomerFirstName VARCHAR(15)
,CustomerAddress1 VARCHAR(15)
,CutomerAddress2 VARCHAR(30)
,CustomerCity VARCHAR(15)
,CustomerState VARCHAR(5)
,CustomerZip VARCHAR(5)
);
Quick and easy. Now this is my control file to load in the data:
LOAD DATA
INFILE Customer.dat
INTO TABLE Customer
FIELDS TERMINATED BY "|"
(CustomerPhoneKey, CustomerLastName, CustomerFirstName, CustomerAddress1 , CutomerAddress2, CustomerCity, CustomerState, CustomerZip)
Then the Data File
2065552123|Lamont|Jason|NULL|161 South Western Ave|NULL|NULL|98001
2065553252|Johnston|Mark|Apt. 304|1215 Terrace Avenue|Seattle|WA|98001
2065552963|Lewis|Clark|NULL|520 East Lake Way|NULL|NULL|98002
2065553213|Anderson|Karl|Apt 10|222 Southern Street|NULL|NULL|98001
2065552217|Wong|Frank|NULL|2832 Washington Ave|Seattle|WA|98002
2065556623|Jimenez|Maria|Apt 13 B|1200 Norton Way|NULL|NULL|98003
The problem is that only the last record
2065556623|Jimenez|Maria|Apt 13 B|1200 Norton Way|NULL|NULL|98003
is being loaded. The rest end up in my bad file.
So I took a look at my log file and the errors I'm getting are
Record 1: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 2: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 3: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 4: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 5: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Table CUSTOMER: 1 Row successfully loaded. 5 Rows not loaded due
to data errors. 0 Rows not loaded because all WHEN clauses were
failed. 0 Rows not loaded because all fields were null.
On to the question. I see that CustomerZip is the problem, and initially I had it as CHAR(5). I did this because my understanding of the data type is that for numeric values like a zip code, where I won't be doing arithmetic on the value, it is better to store it as CHAR. Also, I did not use VARCHAR2(5) initially because, seeing as it is a zip code, I don't want the value to vary; it should always be 5 characters. Maybe I'm just misunderstanding this, so if anyone can clear that up, that would be awesome.
My second question is: how do I fix this problem? Given the above understanding of these data types, it doesn't make sense that neither CHAR(5) nor VARCHAR2(5) works, as I am getting the same errors for both.
It makes even less sense that one record(the last one) actually works.
Thank you for the help in advance

Your data file has extra, invisible characters. We can't see the original but presumably it was created in Windows and has CRLF new line separators; and you're running SQL*Loader in a UNIX/Linux environment that is only expecting line feed (LF). The carriage return (CR) characters are still in the file, and Oracle is seeing them as part of the ZIP field in the file.
The last line doesn't have a CRLF (or any new-line marker), so on that line - and only that line - the ZIP field is being seen as 5 characters. For all the others it's being seen as six, e.g. 98001^M.
You can read more about the default behaviour in the documentation:
On UNIX-based platforms, if no terminator_string is specified, then SQL*Loader defaults to the line feed character, \n.
On Windows NT, if no terminator_string is specified, then SQL*Loader uses either \n or \r\n as the record terminator, depending on which one it finds first in the data file. This means that if you know that one or more records in your data file has \n embedded in a field, but you want \r\n to be used as the record terminator, then you must specify it.
If you open the data file in an editor like vi or vim, you'll see those extra ^M control characters.
There are several ways to fix this. You can modify the file; the simplest way to do that is to copy and paste the data into a new file created in the environment you'll run SQL*Loader in. There are utilities to convert line endings if you prefer, e.g. dos2unix. Or your Windows editor may be able to save the file without the CRs. You could also add an extra field delimiter to the data file, as Ditto suggested.
Or you could tell SQL*Loader to expect CRLF by changing the INFILE line:
LOAD DATA
INFILE Customer.dat "str '\r\n'"
INTO TABLE Customer
...
... though that will then cause problems if you do supply a file created in Linux, without the CR characters.

There is a utility, dos2unix, that is present on almost all UNIX machines. If you run it, you can output the data file with the DOS/Windows CRLF combination stripped.
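For example, assuming the data file is named Customer.dat as in the control file above, converting it in place is a one-liner:
dos2unix Customer.dat
After that, the default LF record terminator will match and the ZIP field will no longer pick up a trailing carriage return.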

Related

Bulk import only works with 1 column csv file

Whenever I try to import a CSV file into SQL Server with more than one column, I get an error (well, nothing is imported). I know the file is terminated fine, because it works OK with one column if I modify the file and table. I am limiting the rows so it never gets to the end, and the line terminator is correct and valid (also shown by the fact that it works when there is only one column).
All I get is this and no errors
0 rows affected
I've also checked all the other various questions like this and they all point to a bad end-of-file or line terminator, but all is well here...
I have tried quotes and no quotes. For example, I have a table with 2 columns of varchar(max).
I run:
bulk insert mytable from 'file.csv' WITH (FIRSTROW=2,lastrow=4,rowterminator='\n')
My sample file is:
name,status
TEST00040697,OK
TEST00042142,OK
TEST00042782,OK
TEST00043431,BT
If I drop a column from the table and delete the second column in the CSV (ensuring it still has the same line terminator \n), it works just fine.
I have also tried specifying the 'errorfile' parameter but it never seems to write anything or even create the file.
Well, that was embarrassing.
SQL Server in its wisdom uses \t as the default field terminator even for a CSV file, but I guess when the documentation shows FORMAT = 'CSV' it's an example and not the default.
If only it produced actual proper and useful error messages...
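For anyone hitting the same thing, a minimal fix is simply to spell out the comma as the field terminator; this reuses the statement from the question:
bulk insert mytable from 'file.csv' WITH (FIRSTROW=2, LASTROW=4, FIELDTERMINATOR=',', ROWTERMINATOR='\n')
On SQL Server 2017 and later you can instead specify FORMAT = 'CSV' in the WITH clause, which also handles quoted fields.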

Loading numeric values incorrectly stored as text into SQL

I have a .csv file that I have received from a third party - so am unable to sort the issue out at source. The .csv file is separated as follows:
"1","client_name","0","0"
"2","client_name","0","0"
I have successfully loaded the file into a staging table with all columns stored as varchar(255). I am unable to convert column 4 to a decimal(10,0), which is the datatype of the live destination table.
Strangely enough, columns 3 and 4 contain only zeros; column 3 can be converted to decimal(10,0) but column 4 cannot.
ISNUMERIC(REPLACE(credit_terms, '"', ''))
Returns 0 for all rows in column 4, though 1 for all rows in column 3.
CAST(REPLACE(credit_terms, '"', '') AS DECIMAL(10,0))
Returns the error:
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.
It is strange that what looks on the surface like an identical column (3) will convert fine, but 4 will not.
I fully understand that storing numbers as text is poor practice, but am unable to resolve this as the data originates from elsewhere.
Following suggestions, I have now opened the file in a hex editor. At the end of each line is a line break that looks like this: .. (0d & 0a).
In the hex editor each row looks like this:
"1","client_name","0","0",..
"2","client_name","0","0",..
Ensure that there isn't a hidden space " " character after the last 0; this can happen a lot in excel/csv files. That would cause it to fail the conversion to numeric.
Also, make sure your index is correct. Many things start at a 0 index so are you possibly trying to convert column[4], which relates to the 5th logical column?
I HIGHLY recommend downloading a free trial of UltraEdit, opening one of the above files, and hitting Ctrl-H to go into Hex mode, which lets you view the hex/ASCII code of every character in the file (even the non-visible ones), and eyeballing whether there are any hidden characters that would cause the above conversion error.
I've seen this all the time with files generated by mainframes that add characters that hose up a subsequent import.
Also, since you have loaded it in SQL, do a quick SELECT LEN(Column4) to see if it returns anything other than 1.
Even better, do a SELECT Column4 FROM YourTable WHERE ISNUMERIC(Column4) = 0 to return a set with the offending values.
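Putting those two checks together against the staging column from the question (the staging table name here is made up for illustration):
SELECT credit_terms, LEN(credit_terms) AS char_len, DATALENGTH(credit_terms) AS byte_len
FROM StagingTable
WHERE ISNUMERIC(REPLACE(credit_terms, '"', '')) = 0
A char_len or byte_len longer than the visible value points to the kind of hidden characters described above.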
Try using TRY_CAST to find values that won't convert;
select * from MyTable
where TRY_CAST(REPLACE(credit_terms, '"', '') AS DECIMAL(10,0)) is null
After advice from this question, I found that the line break for the data was 0A and 0D.
0A is a line feed
0D is a carriage return
I was then able to replace char(10) - line feed and char(13) - carriage return as follows:
CAST(REPLACE(REPLACE(REPLACE(credit_terms,char(10),''),CHAR(13),''),'"','') AS DECIMAL(10,0))
The column will now convert to a decimal as required. Thank you for all your pointers.

Word wrap issues with SSIS Flat file destination

Background: I need to generate a text file with 5 records each of 1565 character length. This text file is further used to feed the data to a software.
Hence, there are some required fields and some optional fields. I created a query with all the fields concatenated together to get one single field. I populated optional fields with a blank.
For example:
Here is the sample input layout for each fields
Field        CharLength  Required
ID           7           Yes
Name         15          Yes
Address      15          No
DOB          10          Yes
Age          1           No
Information  200         No
IDNumber     13          Yes
and then I generated a query that concatenates the above fields for each unique ID into a single row, which looks like the following:
SELECT CAST(1 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
UNION
SELECT CAST(2 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
Then, I created an SSIS package to produce the output text file through Flat file destination delimited.
Problem:
Even though the flat file is generated with the desired length (1565), the text file looks different when word wrap is ON or OFF. When word wrap is off, I get the record on a single line. If word wrap is on, the line is broken into multiple lines. The length of the record is the same in either case.
I even tried to use VARCHAR + SPACE in the query instead of CHAR for each field, but with no success. It's still breaking the line at blank fields.
For example: Cast('' as varchar(1)) + Space(200-len(Cast('' as varchar(1)))) for Information field
Question: How do I make it stay on a single line even when word wrap is ON?
Since it's my first post, please excuse the format of the question.
The purpose of word wrap is to put characters on the next line in instances of overflow rather than creating an extremely horizontal scrolling document.
Word wrap is the additional feature of most text editors, word processors, and web browsers, of breaking lines between words rather than within words, when possible.
Because this is what word wrap is, there's nothing you can do to change its behavior. What does it matter anyway? The document will still be parsed as you would expect. Just don't turn word wrap on.
As far as I'm aware, having word wrap on or off has no impact on the document itself, it's simply a presentation option.
Applications parsing a document parse it as if word wrap were off. Something that could throw off parsing is breaks for a new line, but that is a completely different thing from word wrap.

Format fields during bulk insert SQL 2008

I am currently working on a project that requires data from a report generated by third party software to be inserted into a local SQL database. So far I have the data stored as a tab delimited .txt file and the following bulk insert SQL statement:
BULK INSERT ExampleTable
FROM 'c:\temp\Example.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
The two problems I am encountering are quotation marks around any value that includes its own comma, and dollar signs in every field that has a dollar amount.
For instance one of the columns of the table is a description field and some of the values come out looking like:
"this is an example description, some more information, I don't know why the author would use commas in the first place here"
I don't care about the description field nearly as much as the other fields that include dollar amounts. Each of these fields is already prefixed with a $ sign, so I have to set them as an nvarchar instead of a decimal or a float, which would be A LOT more useful for reporting. Furthermore, when the dollar amount is greater than 1000, the field will also contain a comma, and thus quotation marks, e.g. "$1,084.59".
I am familiar with SSMS, but I have never made a format or bcp file (the solutions I have found online).
Any help would be greatly appreciated.
You can use a format file, but only if your metadata remains constant, which it does not appear to be in your case. You state that the dollar amounts are enclosed in quotes only when they exceed 999 and the comma is inserted. A format file would allow you to define per column delimiters such as [,] or [","]. But if that delimiter is shifting throughout your file, you will have to pre-process the file. Text qualifiers themselves are not supported.
For reference:
CSV import in SQL Server 2008
http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html
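If pre-processing in T-SQL after the load is acceptable, one sketch (the staging table and column names here are hypothetical) is to keep the money columns as nvarchar in the staging table and strip the quotes, $ signs, and thousands separators before converting:
SELECT CAST(REPLACE(REPLACE(REPLACE(Amount, '"', ''), '$', ''), ',', '') AS DECIMAL(18,2)) AS Amount
FROM dbo.StagingTable
That removes the need for a format file for the money columns, though the quoted description field with embedded commas still has to be handled before or during the load, as noted above.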
I don't see why, but ThiefMaster deleted my answer :-(
Probably a mistake and he did not check the link, as that link is the full answer to your question. I will try again for the last time here...
Tip: if your CSV file doesn't have a consistent format, for example where ON THE SAME COLUMN some of the values are double-quoted and some are not, then this blog will help you do it in an easy way (using OPENROWSET in the last step makes it one simple query): http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx
There is a new wiki at http://social.technet.microsoft.com/wiki based on this blog, if you prefer to read it on Microsoft's site.

Migrating from SQL Server to Oracle: varchar length issues

I'm facing a strange issue trying to move from SQL Server to Oracle.
In one of my tables I have a column defined as NVARCHAR(255).
After reading a bit I understood that SQL Server counts characters while Oracle counts bytes.
So I defined the column in my Oracle table as VARCHAR(510), since 255*2 = 510.
But when using sqlldr to load the data from a tab-delimited text file, I get an error indicating some entries have exceeded the length of this column.
After checking in SQL Server using:
SELECT MAX(DATALENGTH(column))
FROM table
I get that the max data length is 510.
I do use the Hebrew_CI_AS collation, even though I don't think it changes anything.
I also checked in SQL Server whether any of the entries contains a TAB, but no... so I guess it's not corrupted data.
Anyone have an idea?
EDIT
After further checking I've noticed that the issue is due to the data file (in addition to the issue solved by Justin Cave's post).
I have changed the row delimiter to '^', since none of my data contains this character, and used '|^|' as the column delimiter,
creating a control file as follows:
load data
infile data.txt "str '^'"
badfile "data_BAD.txt"
discardfile "data_DSC.txt"
into table table
FIELDS TERMINATED BY '|^|' TRAILING NULLCOLS
(
col1,
col2,
col3,
col4,
col5,
col6
)
The problem is that my data contains <CR> and sqlldr, expecting a stream record format, therefore fails on the <CR>!!!! I do not want to change the data since it is textual data (error messages, for example).
What is your database character set?
SELECT parameter, value
FROM v$nls_parameters
WHERE parameter LIKE '%CHARACTERSET'
Assuming that your database character set is AL32UTF8, each character could require up to 4 bytes of storage (though almost every useful character can be represented with at most 3 bytes of storage). So you could declare your column as VARCHAR2(1020) to ensure that you have enough space.
You could also simply use character length semantics. If you declare your column VARCHAR2(255 CHAR), you'll allocate space for 255 characters regardless of the amount of space that requires. If you change the NLS_LENGTH_SEMANTICS initialization parameter from the default BYTE to CHAR, you'll change the default so that VARCHAR2(255) is interpreted as VARCHAR2(255 CHAR) rather than VARCHAR2(255 BYTE). Note that the 4000-byte limit on a VARCHAR2 remains even if you are using character length semantics.
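For example, either of these would size the column by characters rather than bytes (a sketch; the table and column names are placeholders):
-- explicit character semantics on the column
CREATE TABLE my_table (my_column VARCHAR2(255 CHAR));
-- or switch the default semantics for the session before creating the table
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
CREATE TABLE my_table (my_column VARCHAR2(255));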
If your data contains line breaks, do you need the TRAILING NULLCOLS parameter? That implies that sometimes columns may be omitted from the end of a logical row. If you combine columns that may be omitted with columns that contain line breaks and data that is not enclosed by at least an optional enclosure character, it's not obvious to me how you would begin to identify where a logical row ended and where it began. If you don't actually need the TRAILING NULLCOLS parameter, you should be able to use the CONTINUEIF parameter to combine multiple physical rows into a single logical row. If you can change the data file format, I'd strongly suggest adding an optional enclosure character.
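If you can regenerate the data file with each field enclosed in, say, double quotes, the control-file change is small; a sketch based on the FIELDS clause above:
FIELDS TERMINATED BY '|^|' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS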
The number of bytes used by an NVARCHAR field is equal to two times the number of characters plus two (see http://msdn.microsoft.com/en-us/library/ms186939.aspx), so if you make your VARCHAR field 512 you may be OK. There is also some indication that some character sets use 4 bytes per character, but I've found no indication that Hebrew is one of them.
