Inserting irregular data into SQL Server

I have a file of data that I want to regularly insert into a table in SQL Server 2008. The file contains several fixed-width text fields and one field that is free text and very long.
When I try to insert the data via SSIS, I get error messages telling me that data has been truncated. If I choose to ignore truncation, the free text field is simply not imported and I get an empty field. If I try to use BULK INSERT, I get a message saying that I am exceeding the maximum row size of 8060.
The free text field causing the problem contains a lot of white space, so one option would be to trim it before I insert it, but I am not sure how to do this.
I'm afraid I cannot post any sample data as it is of a sensitive, medical nature.
Could anyone suggest a possible solution to this problem?
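One possible approach, sketched below: stage the file into a table whose free-text column is VARCHAR(MAX) (MAX types can be stored off-row, so the 8060-byte row limit does not apply) and trim on the way into the final table. The table and column names here are hypothetical, and since SQL Server 2008 has no TRIM() function, LTRIM and RTRIM are combined:

-- Hypothetical staging table; map the file's fixed-width fields to the CHAR columns
CREATE TABLE dbo.ImportStaging (
    FixedField1 CHAR(10),
    FixedField2 CHAR(20),
    FreeText    VARCHAR(MAX)  -- MAX is stored off-row, avoiding the 8060-byte limit
);

-- Trim the free text while moving rows into the destination table
INSERT INTO dbo.FinalTable (FixedField1, FixedField2, FreeText)
SELECT FixedField1, FixedField2, LTRIM(RTRIM(FreeText))
FROM dbo.ImportStaging;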

Related

How to Identify Problematic Data in SSIS CSV to SQL

I have a Data Flow step in an SSIS package that simply reads data from a CSV with 180 or so columns and inserts it into a MS SQL Server table.
It works.
However, there's a CSV file with 110,000+ rows and it fails. In the Output window in Visual Studio there is a message that says:
The data conversion for column "Address_L2" returned status value 2 and status text "The value could not be converted because of a potential loss of data."
In the Flat File Connection Manager Editor, the data type for the column is string [DT_STR] 50. TextQualified is True.
The SQL column with the same name is a varchar(100).
Anyhow, in the Flat File Source Editor I set all Truncation errors to be ignored, so I don't think this has to do with truncation.
My problem is identifying the "offending" data.
In the same Output window it says:
... SSIS.Pipeline: "Staging Table" wrote 93217 rows.
I looked at row 93218 and a few before and after (Notepad++, Excel, SQL) and nothing caught my attention.
So I went ahead and removed rows from the CSV file up to what I thought was the offending row. When I tried the process again I got the same error, but the last entry actually inserted into the SQL table doesn't match the last, or nearly last, rows in the CSV file.
Is it because it doesn't necessarily insert them in the same order?
In any case, how do I know what the actual issue is, especially with a file this size that you can't go through it manually?
You can simply change the length of the column in the flat file connection manager to meet the destination table specifications. Just open the flat file connection manager, go to the Advanced tab and change the column length.
Note that you can select multiple columns and change their data type and length at once.
You could add an Error output to the SSIS component which is causing the error (not sure from your question whether it's the flat file source or the Staging Table destination).
Hook up the Error output to "nowhere" (I use the Konesans Trash Destination), activate a data viewer on it, and select just the problem column (along with anything that helps you identify the row) into the data viewer. Run in Visual Studio, and you'll see which rows are failing.

I have a problem inserting more than 255 chars per column into an Excel file using INSERT INTO OPENROWSET from SQL Server

I am getting an error while exporting data from SQL Server to an already created .xlsx file using openrowset.
It works fine most of the time, but when a field's data comes in as a large string, the insert into Excel fails with this error:
The statement has been terminated, string or binary data would be truncated.
Data gets inserted into the table, but while inserting into Excel, this error appears. Please help me find a solution.
As the error mentions "data would be truncated", you are probably providing a longer string value to a placeholder or field that has a smaller storage size.
For example, the source field may have data type nvarchar(max), and somewhere in your SQL or in a mapping you assign the value to a smaller data type. If the source table has a string value of 5,000 characters but during the process it is assigned to an nvarchar(4000), data truncation will occur.
I would suggest you check the data mappings in your statements.
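A minimal sketch of that mismatch (the table and variable names are made up for illustration):

-- A 5,000-character value cannot fit in an nvarchar(4000) column
CREATE TABLE dbo.MappingDemo (Val NVARCHAR(4000));
DECLARE @long NVARCHAR(MAX) = REPLICATE(CAST(N'x' AS NVARCHAR(MAX)), 5000);
INSERT INTO dbo.MappingDemo (Val) VALUES (@long);
-- Fails: String or binary data would be truncated.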
In Regedit, go to Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel and set the "TypeGuessRows" value to something greater than 8, for example 100000.

SQL Server field getting truncated

Ok, I'm using SQL Server 2008 and have a table field of type VARCHAR(MAX). The problem is that when saving information using Hibernate, the contents of the VARCHAR(MAX) field are getting truncated. I don't see any error messages on either the app server or the database server.
The content of this field is just a plain text file. The size of this text file is 383KB.
This is what I have done so far to troubleshoot this problem:
1. Changed the database field from VARCHAR(MAX) to TEXT; the same problem occurs.
2. Used SQL Server Profiler. I noticed that the full text content is being received by the database server, but for some reason the profiler freezes when trying to view the SQL with the truncation problem. Just before it froze, I did notice that the full text file content (383 KB) was being received, so it seems the problem might be on the database side.
Has anyone encountered this problem before? Any ideas what causes this truncation?
NOTE: just want to mention that I'm going into SQL Server Management Studio, copying the TEXT field content, and pasting it into TextPad. That's how I noticed it's getting truncated.
Thanks in advance.
Your problem is that you think Management Studio is going to present you with all of the data. It doesn't. Go to Tools > Options > Query Results > SQL Server. If you are using Results to Grid, change "Maximum Characters Retrieved" for "Non XML data" (just note that Results to Grid will eliminate any CR/LF). If you are using Results to Text, change "Maximum number of characters displayed in each column."
You may be tempted to enter more, but the maximum you can return within Management Studio is:
65535 for Results to Grid
8192 for Results to Text
If you really want to see all the data in Management Studio, you can try converting it to XML, but this has issues also. First set Results To Grid > XML data to 5 MB or unlimited, then do:
SELECT CONVERT(XML, column) FROM dbo.table WHERE...
Now this will produce a grid result where the link is actually clickable. Clicking it opens a new editor window (it won't be a query window, so it won't have execute buttons, IntelliSense, etc.) with your data converted to XML. This means it will replace > with &gt;, etc. Here's a quick example:
SELECT CONVERT(XML, 'bob > sally');
Result: a grid cell containing a clickable link with the value bob &gt; sally.
When you click the link, a new editor window opens showing the converted XML. (It does kind of have IntelliSense, validating XML format, so you may see squiggly underlines there.)
BACK AT THE RANCH
If you just want to sanity check and don't really want to copy all 383K elsewhere, then don't! Just check using:
SELECT DATALENGTH(column) FROM dbo.table WHERE...
This should show you that your data was captured by the database, and the problem is the tool and your method of verification.
(I've since written a tip about this here.)
Try using SELECT * FROM dbo.table FOR XML PATH.
I had a similar situation. I have an Excel sheet. A couple of columns in the sheet may have more than 255 characters, sometimes even 500. A simple workaround was to sort the rows of data, placing the rows with the most characters at the top; you actually need just one such row. When SQL Server imports the data, it recognizes that the field holds more than 255 characters and imports the entire data :)
Otherwise, they suggested using regedit to change a specific value. Didn't want to do that.
Hope this helps

SSIS: Capture Truncation Warning from Flat File Source with "Ignore Failure" Enabled

I have a 2005 SQL Server Integration Services (SSIS) package that is loading delimited flat files into some tables. A very small percentage of records have a text field that is longer than the file format specification says it can be. Rather than try to play an ongoing game of "guess the real maximum length", the customer has requested I just truncate anything over the size in the spec.
I have set the Truncation event to "Ignore Failure" in the Flat File Source Editor, and that takes care of my extra data. However, it seems to be a completely silent truncation (it does not write any warning to the log). I am concerned that if there is ever a question about what data has been truncated, I have no way to identify it.
What is a simple way to log the fact the truncation happened?
It would be enough to identify that the file had a truncated row in it, but if I could also specify the actual row that would be great. Whether it is captured as part of the built in package logging or I have to make a special call makes no difference to me.
Before you do the actual insert, have a Conditional Split task that takes the records longer than the actual field length and puts them into a logging table. Then you can truncate the data and rejoin them to the original path using a Merge or Merge Join transformation.
You can do the truncation yourself as part of the data flow. Set the flat file column width to a value that is very big (larger than any expected values). You can use a conditional split to identify rows that violate the length.
In the data flow path for invalid rows, you can record the information to your log. Then, you can convert the values to the valid length and merge them back with the valid rows. Finally, add the rows to the destination.
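A sketch of the SSIS expressions this would take, assuming the spec limit is 50 characters and the column is named FreeText (both hypothetical):

Conditional Split condition (routes over-length rows to the logging path): LEN(FreeText) > 50
Derived Column expression (truncates before merging back): SUBSTRING(FreeText,1,50)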

SQL Server: how do I find which column/row gave me an error?

Does anyone have any hints on the best way to find the source of a conversion or truncation error such as:
Error converting data type varchar to numeric.
String or binary data would be truncated. The statement has been terminated.
When I'm inserting batches of data, I'll get these types of errors, and it then becomes an educated guessing game as to which column is having problems, and then which row in my data is the culprit. Any advice?
Here is my solution (probably one you've already considered):
Isolate a sample insert query and open in Management Studio.
Comment out the second half of the column inserts. If you still get an error, you definitely have a problem in the first half. Otherwise, it's in the second half.
Keep commenting out half of your search space (a binary search) until you find at least one of the offending columns.
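For example, a sketch of the commenting-out step (the column and table names are hypothetical):

-- Second half of the columns commented out; if the error persists, the offender is in the first half
INSERT INTO dbo.Target (Col1, Col2 /*, Col3, Col4 */)
SELECT Col1, Col2 /*, Col3, Col4 */
FROM dbo.SourceBatch;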
Another thing to do is to pull the data into a separate work table with the wizard (set all the data lengths to something large, like 4000) and then run the following select on each column:
SELECT MAX(LEN(column1)) FROM worktable
Not only can you see which fields are too big for your data structure, you also have the data available to search for all the records that are too big in that field, and you know how large the destination column will need to be to accommodate the data.
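Two follow-up queries along those lines (the length of 50 and the column name are placeholders; ISNUMERIC is imperfect but is available on SQL Server 2008):

-- Rows whose value exceeds the destination column size
SELECT * FROM worktable WHERE LEN(column1) > 50;

-- Rows likely to fail a varchar-to-numeric conversion
SELECT * FROM worktable WHERE column1 IS NOT NULL AND ISNUMERIC(column1) = 0;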
