How to make SSIS truncate numeric field in Flat File Destination - sql-server

I'd like some advice about handling truncation issues in SSIS. I have a column Col1 which is MONEY in a table. I'd like to output that to a text file (fixed width, ragged right). In the output file, the column which holds Col1 must only be 8 characters wide.
In the OLEDB Data Source, Col1 is specified as:
currency [DT_CY] in both the External Columns and Output Columns tab.
In the Flat File Connection Manager's Advanced tab, Col1 is specified as:
currency [DT_CY], with InputColumnWidth set to 8.
If I populate Col1 with 123456789.00 and execute the task, the OLEDB source succeeds and passes rows to the destination, but the task fails with:
Error: 0xC02020A1 at DFT_Test, FFDEST_Test [3955]: Data conversion
failed. The data conversion for column "Col1" returned status value 4
and status text "Text was truncated or one or more characters had no
match in the target code page.". Error: 0xC02020A0 at DFT_Test,
FFDEST_Test [3955]: Cannot copy or convert flat file data for column
"Col1".
I want to avoid these truncation errors. In the Error Output of the source, I change the Truncation property for Col1 from Fail Component to Ignore Failure. I would have expected that would resolve the issue, but executing the task still gives the same error.
Can someone give some guidance about how to make SSIS simply truncate the column to 8 characters?

Use a Derived Column transformation to create a new column that is an 8-character string and populate it from the money column. Then, in the destination component, map the derived column to the Col1 destination column instead of the original column.
Or, even better, in your source component, use a SQL query that converts your money column to a varchar(8) or char(8) column.
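For example, a minimal sketch of such a source query; dbo.SourceTable is an assumed table name, and style 0 in CONVERT gives a plain value with two decimal places and no thousands separators:
-- Hedged sketch: convert the MONEY value to text and hard-truncate it to 8 characters.
SELECT CAST(LEFT(CONVERT(varchar(21), Col1, 0), 8) AS char(8)) AS Col1
FROM dbo.SourceTable;
The CAST to char(8) keeps the output column exactly 8 characters wide for the ragged-right destination.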

Related

How to pass optional column in TABLE VALUE TYPE in SQL from ADF

I have the following table value type in SQL which is used in Azure Data Factory to import data from a flat file in a bulk copy activity via a stored procedure. File 1 has all three columns in it, so this works fine. File 2 only has Column1 and Column2, but NOT Column3. I figured that since the column is defined as NULL it would be OK, but ADF complains that it's attempting to pass in 2 columns when the table type expects 3. Is there a way to reuse this type for both files and make Column3 optional?
CREATE TYPE [dbo].[TestType] AS TABLE(
Column1 varchar(50) NULL,
Column2 varchar(50) NULL,
Column3 varchar(50) NULL
)
Operation on target LandSource failed:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed with the following error: 'Trying to pass a
table-valued parameter with 2 column(s) where the corresponding
user-defined table type requires 3 column(s)
It would be nice if the Copy activity behavior were consistent regardless of whether a stored procedure with a table type is used or the activity's native bulk insert. When not using the table type and using the default bulk insert, missing columns in the source file end up as NULL in the target table without error (assuming the column is nullable).
This causes a mapping error in ADF. In the Copy activity, every column needs to be mapped, so if the source file only has two columns, the mapping fails.
I suggest you create two different Copy activities and a second, two-column table type.
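For reference, a minimal sketch of that second, two-column table type; the name TestTypeTwoCol is only illustrative:
CREATE TYPE [dbo].[TestTypeTwoCol] AS TABLE(
Column1 varchar(50) NULL,
Column2 varchar(50) NULL
)
The stored procedure for File 2 would then take this type as its table-valued parameter instead of [dbo].[TestType].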
You can pass an optional column; I've tested this successfully, but the steps are a bit complex. In my case, File 1 has all three columns, while File 2 only has Column1 and Column2, but NOT Column3. The solution uses a Get Metadata activity, a Set Variable activity, a ForEach activity, and an If Condition activity.
Please follow my steps:
Define a variable FileName for the ForEach loop to use.
In the Get Metadata1 activity, specify the file path.
In the ForEach1 activity, use @activity('Get Metadata1').output.childItems to iterate over the file list. It needs to be Sequential.
Inside the ForEach1 activity, use Set Variable1 to set the FileName variable.
In the Get Metadata2 activity, use @item().name to specify the file.
In the Get Metadata2 activity, use Column count to get the column count from the file.
In the If Condition1 activity, use @greater(activity('Get Metadata2').output.columnCount, 2) to determine whether the file has more than two columns.
In the True activity, use the FileName variable to specify the file.
In the False activity, use Additional columns to add the missing column.
When I run debug, both files are processed successfully.

Staged Internal file csv.gz giving error that file does not match size of corresponding table?

I am trying to copy a csv.gz file into a table I created to start analyzing location data for a map. I was running into an error saying there are too many characters and that I should add an on_error option. However, I am not sure that will help load the data; can you take a look?
Data source: https://data.world/cityofchicago/array-of-things-locations
SELECT * FROM staged/array-of-things-locations-1.csv.gz
CREATE OR REPLACE TABLE ARRAYLOC(name varchar, location_type varchar, category varchar, notes varchar, status1 varchar, latitude number, longitude number, location_2 variant, location variant);
COPY INTO ARRAYLOC
FROM @staged/array-of-things-locations-1.csv.gz;
CREATE OR REPLACE FILE FORMAT t_csv
TYPE = "CSV"
COMPRESSION = "GZIP"
FILE_EXTENSION= 'csv.gz';
CREATE OR REPLACE STAGE staged
FILE_FORMAT='t_csv';
COPY INTO ARRAYLOC FROM @~/staged file_format = (format_name = 't_csv');
Error message:
Number of columns in file (8) does not match that of the corresponding table (9), use file format option error_on_column_count_mismatch=false to ignore this error File '@~/staged/array-of-things-locations-1.csv.gz', line 2, character 1 Row 1 starts at line 1, column "ARRAYLOC"["LOCATION_2":8] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
Solved:
The real issue was that I needed to clean the data I was staging more thoroughly. This was my error. What I ended up changing: the column types, switching the file's quoting from " to ', and splitting one column because of a comma in the middle of the data.
CREATE OR REPLACE TABLE ARRAYLOC(name varchar, location_type varchar, category varchar, notes varchar, status1 varchar, latitude float, longitude varchar, location varchar);
COPY INTO ARRAYLOC
FROM @staged/array-of-things-locations-1.csv.gz;
CREATE or Replace FILE FORMAT r_csv
TYPE = "CSV"
COMPRESSION = "GZIP"
FILE_EXTENSION= 'csv.gz'
SKIP_HEADER = 1
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
EMPTY_FIELD_AS_NULL = TRUE;
create or replace stage staged
file_format='r_csv';
copy into ARRAYLOC from @~/staged
file_format = (format_name = 'r_csv');
SELECT * FROM ARRAYLOC LIMIT 10;
Your error doesn't say that you have too many characters but that your file has 8 columns and your table has 9 columns, so it doesn't know how to align the columns from the file to the columns in the table.
You can list out the columns specifically using a subquery in your COPY INTO statement.
Notes:
Columns from the file are positional based, so $1 is the first column in the file, $2 is the second, etc....
You can put the columns from the file in any order that you need to match your table.
You'll need to find the column that doesn't have data coming in from the file and either fill it with null or some default value. In my example, I assume it is the last column and in it I will put the current timestamp.
It helps to list out the columns of the table behind the table name, but this is not required.
Example:
COPY INTO ARRAYLOC (COLUMN1,COLUMN2,COLUMN3,COLUMN4,COLUMN5,COLUMN6,COLUMN7,COLUMN8,COLUMN9)
FROM (
SELECT $1
,$2
,$3
,$4
,$5
,$6
,$7
,$8
,CURRENT_TIMESTAMP()
FROM @staged/array-of-things-locations-1.csv.gz
);
I advise against changing the ERROR_ON_COLUMN_COUNT_MISMATCH parameter; doing so could result in data ending up in the wrong column of the table. I would also advise against changing the ON_ERROR parameter, as I believe it is best to be alerted of such errors rather than suppressing them.
Yes, setting that option should help. From the documentation:
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
Use: Data loading only
Definition: Boolean that specifies whether to generate a parsing error
if the number of delimited columns (i.e. fields) in an input file does
not match the number of columns in the corresponding table.
If set to FALSE, an error is not generated and the load continues. If
the file is successfully loaded:
If the input file contains records with more fields than columns in
the table, the matching fields are loaded in order of occurrence in
the file and the remaining fields are not loaded.
If the input file contains records with fewer fields than columns in
the table, the non-matching columns in the table are loaded with NULL
values.
This option assumes all the records within the input file are the same length (i.e. a file containing records of varying length returns an error regardless of the value specified for this parameter).
So assuming you are okay with getting NULL values for the missing column in your input data, you can use ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE to load the file successfully.
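For instance, a hedged sketch against the stage and table from the question, with the CSV options written inline (TYPE = CSV instead of a named format):
COPY INTO ARRAYLOC
FROM @staged/array-of-things-locations-1.csv.gz
FILE_FORMAT = (
TYPE = CSV
COMPRESSION = GZIP
SKIP_HEADER = 1
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
EMPTY_FIELD_AS_NULL = TRUE
);
With ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE, the ninth table column is simply left NULL for every row, as described in the documentation above.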
When viewing that table directly on data.world, there are columns named both location and location_2 with identical data. It looks like that display is erroneous, because when downloading the CSV, it has only a single location column.
I suspect that if you replace your CREATE OR REPLACE statement with the following one, which omits the creation of location_2, you'll get to where you want to go:
CREATE OR REPLACE TABLE ARRAYLOC(name varchar, location_type varchar, category varchar, notes varchar, status1 varchar, latitude number, longitude number, location variant);
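If you want to confirm how many columns the staged file actually contains before recreating the table, you can query the stage directly; a minimal sketch, assuming the named stage and the t_csv file format from the question:
-- Each $n is a positional column from the file; check which ones contain data.
SELECT t.$1, t.$2, t.$3, t.$4, t.$5, t.$6, t.$7, t.$8
FROM @staged/array-of-things-locations-1.csv.gz (FILE_FORMAT => 't_csv') t
LIMIT 10;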

Bulk Load Data Conversion Error - Can't Find Answer

For some reason I keep receiving the following error when trying to bulk insert a CSV file into SQL Express:
Bulk load data conversion error (type mismatch or invalid character for the
specified codepage) for row 2, column 75 (Delta_SM_RR).
Msg 4864, Level 16, State 1, Line 89
Bulk load data conversion error (type mismatch or invalid character for the
specified codepage) for row 3, column 75 (Delta_SM_RR).
Msg 4864, Level 16, State 1, Line 89
Bulk load data conversion error (type mismatch or invalid character for the
specified codepage) for row 4, column 75 (Delta_SM_RR).
... etc.
I have been attempting to insert this column as both decimal and numeric, and keep receiving this same error (if I take out this column, the same error appears for the subsequent column).
Please see below for an example of the data; all data points within this column contain decimals and are rounded to no more than three decimal places:
Delta_SM_RR
168.64
146.17
95.07
79.85
60.52
61.03
-4.11
-59.57
1563.09
354.36
114.78
253.46
451.5
Any sort of help or advice would be greatly appreciated, as it seems that a number of people on SO have come across this issue. Also, if anyone knows of another automated way to load a CSV into SQL Server, that would be a great help as well.
Edits:
Create Table Example_Table
(
[Col_1] varchar(255),
[Col_2] numeric(10,5),
[Col_3] numeric(10,5),
[Col_4] numeric(10,5),
[Col_5] date,
[Delta_SM_RR] numeric(10,5)
)
GO
BULK INSERT
Example_Table
FROM 'C:\pathway\file.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
);
Table Schema - This is a standalone table (further calculations and additional tables are built off of this single table, however at the time of bulk insert it is the only table)
It's likely that your data has an error in it. That is, that there is a character or value that can't be converted explicitly to NUMERIC or DECIMAL. One way to check this and fix it is to
Change [Delta_SM_RR] numeric(10,5) to [Delta_SM_RR] nvarchar(256)
Run the bulk insert
Find your error row: select * from Example_Table where [Delta_SM_RR] like '%[^-.0-9]%'
Fix the data at the source, or delete from Example_Table where [Delta_SM_RR] like '%[^-.0-9]%'
The last two statements return or delete rows where there is something other than a digit, period, or hyphen.
For your date column you can follow the same logic, changing the column to VARCHAR and then finding the bad rows with ISDATE() to spot values that can't be converted, as sketched below.
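Put together, a sketch of that diagnostic pass, reusing the table and file path from the question (treating [Col_5] the same way is illustrative):
-- Load the suspect columns as plain text so the bulk insert can succeed.
ALTER TABLE Example_Table ALTER COLUMN [Delta_SM_RR] nvarchar(256);
ALTER TABLE Example_Table ALTER COLUMN [Col_5] varchar(30);
BULK INSERT Example_Table
FROM 'C:\pathway\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
-- Rows whose Delta_SM_RR value contains anything other than a digit, period, or hyphen.
SELECT * FROM Example_Table WHERE [Delta_SM_RR] LIKE '%[^-.0-9]%';
-- Rows whose Col_5 value cannot be interpreted as a date.
SELECT * FROM Example_Table WHERE ISDATE([Col_5]) = 0;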
I'll bet anything there is some weird character in your data set. Open your data set in Notepad++ and view the data. Any aberration should become apparent very quickly! The problem is coming from Col75 and it's affecting the first several rows, and thus everything that comes after that also fails to load.
Make sure the .csv is not using text qualifiers and that none of your fields in the .csv have a comma inside the desired value.
I am struggling with this issue right now. The issue is that I have a 68 column report I am trying to import.
Column 17 is a "Description" column that has a double-quote text qualifier on top of the comma delimiters.
A bulk insert with a comma field terminator won't recognize the double-quote text qualifier and will munge all of the data to the right of the offending column.
It looks like to overcome this, you need to create a .fmt file to instruct the Bulk Insert which columns it needs to treat as simple delimited, and which columns it needs to treat as delimited and qualified (see this answer).
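If your SQL Server instance is 2017 or later, another option is BULK INSERT's built-in CSV mode, which understands a double-quote text qualifier via FORMAT = 'CSV' and FIELDQUOTE and can avoid the .fmt file; a hedged sketch reusing the table and path from the earlier question:
BULK INSERT Example_Table
FROM 'C:\pathway\file.csv'
WITH (
FORMAT = 'CSV',      -- CSV parsing mode, available from SQL Server 2017
FIELDQUOTE = '"',    -- treat double quotes as the text qualifier
FIRSTROW = 2
);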

Data flow task fails at source because of datetime conversion? SSIS 2008

I've seen this type of error (e.g. truncation) with strings before, but not with datetime fields.
I have a Data Flow task that seems to fail at the source. The OLEDB data source is a call to a procedure, and among the columns of the result set is a datetime field, GAPPOSTDT. The return value is a datetime, and the procedure runs just fine with the expected results. Not so when I run this through the Data Flow task. Looking at the advanced properties of the OLEDB source, I see the type for this field set to database timestamp [DT_DBTIMESTAMP], which seems right.
What could be causing this field not to get mapped?
I've tried simply deleting the Data Flow task and recreating it. Same issue.
See error message below.
Error: 0xC020901C at Data Flow Task, OLE DB Source [1]: There was an
error with output column "GAPOSTDT" (61) on output "OLE DB Source
Output" (11). The column status returned was: "The value could not be
converted because of a potential loss of data.". Error: 0xC0209029 at
Data Flow Task, OLE DB Source [1]: SSIS Error Code
DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "GAPOSTDT"
(61)" failed because error code 0xC0209072 occurred, and the error row
disposition on "output column "GAPOSTDT" (61)" specifies failure on
error. An error occurred on the specified object of the specified
component. There may be error messages posted before this with more
information about the failure.
[UPDATE #1]
SSIS 2008
So I changed the procedure output to return varchar(10) instead of datetime. The OLEDB Source in the dataflow now errors with the following
Error: 0xC020901C at Data Flow Task, OLE DB Source [1]: There was an error with output column "GAPOSTDT" (61) on output "OLE DB Source Output" (11). The column status returned was: "Text was truncated or one or more characters had no match in the target code page.".
Error: 0xC020902A at Data Flow Task, OLE DB Source [1]: The "output column "GAPOSTDT" (61)" failed because truncation occurred, and the truncation row disposition on "output column "GAPOSTDT" (61)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
I'm now suspecting some "goofy" characters in the data. The collation being used in the source system is SQL_Latin1_General_CP1_CI_AS.
[UPDATE #2]
Ok, I think I may have found the issue. In my procedure I have a "dummy" result set because my procedure uses temp tables, and this is one workaround with SSIS (see other topics on this). My final result and the dummy result had their columns in the wrong order, so this put data into the wrong columns. I noticed this when I added a data viewer: after it popped up, I saw data in the wrong columns. Strange, I thought; then, after reviewing my procedure, I found the culprit.
There are always problems with datetime fields... ;)
Try converting this field to a string in the source query and then apply any changes needed with a Derived Column transformation. Where I live we use the date format dd.mm.yyyy, but I'm getting at least 3 other different date formats...
Maybe this is not the best answer, but it's worth a try... it suits me well... :)
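A minimal sketch of that idea, assuming the procedure's final SELECT (or a wrapper query) can be adjusted; the style code and table name are illustrative:
-- Return the datetime as a fixed-format string; style 121 gives yyyy-mm-dd hh:mi:ss.mmm.
SELECT CONVERT(varchar(23), GAPPOSTDT, 121) AS GAPPOSTDT
FROM dbo.SomeSourceTable;
You can then reshape or cast the string in the Derived Column transformation before it reaches the destination.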
That is because the data type of the input column does not match the destination column. You mentioned you have seen this before with varchar columns; that happens when the size (data length) of the input value for that column is larger than the destination allows: if the input is varchar(30) and the destination is varchar(20), that will cause this issue. Check your data types; I am guessing there may be a conflict between datetime and datetime2, or datetimeoffset.
for example:
input varchar(30), destination varchar(30): a perfect match
input varchar(20), destination varchar(30): fine, but with a warning
input varchar(30), destination varchar(20): fine if the actual value is shorter than 20 characters, but an error if it is longer

Handling embedded new lines when creating/selecting External Tables in SQL Data Warehouse

In SQL Data Warehouse (editors, please don't change this; it is the actual name, see here) I have a JobCandidate_ext external table that looks like this.
CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_ext](
[JobCandidateID] int,
[BusinessEntityID] int,
[Resume] Varchar(8000),
[ModifiedDate] Datetime
)
WITH (
LOCATION='/[HumanResources].[JobCandidate]/data.txt',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile)
GO
The column [Resume] was an XML type in SQL Server but in SQL Data Warehouse XML types should be converted to varchar(8000) as described here.
I am using a flat file data.txt to export the data to a blob and then create an external table from it.
The [Resume] column has carriage returns in it (as expected from an XML file), and so when you run a SELECT * FROM [HumanResources].[JobCandidate_ext] you get an error. In this case:
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 2 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 0, Expected data type: INT, Offending value: some text .... (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.
I know that I cannot configure a row delimiter when creating external tables as described here.
The row delimiter must be UTF-8 and supported by Hadoop’s LineRecordReader. The row delimiter must be either '\r', '\n', or '\r\n'. These are not user-configurable.
And if you try to put quotes on each column field you get this error while selecting rows from the external table: No closing string delimiter.
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 2, Expected data type: VARCHAR(8000) collate SQL_Latin1_General_CP1_CI_AS, Offending value: 'ShaiBassli (Tokenization failed), Error: No closing string delimiter.
Is there a way to get around this issue?
Today, PolyBase does not allow for row or field delimiters inside fields i.e. it does not allow you to escape these characters. As Greg pointed out, you can vote for this functionality here: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-line-ends-within-qualified-text-f
To work around this limitation, you can either pre-process the data (using sed or tr, for example) to replace the unwanted characters before reading it with PolyBase, or you can switch to other PolyBase-supported file formats (RCFile/ORC/Parquet) to avoid dealing with row and field delimiters completely.
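A hedged sketch of the second workaround (switching formats), assuming the data has been exported to Parquet under the same data source; the format and table names are illustrative:
CREATE EXTERNAL FILE FORMAT ParquetFile
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_parquet](
[JobCandidateID] int,
[BusinessEntityID] int,
[Resume] Varchar(8000),
[ModifiedDate] Datetime
)
WITH (
LOCATION='/[HumanResources].[JobCandidate_parquet]/',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=ParquetFile)
GO
Because Parquet stores values in a binary columnar layout rather than delimited text, the embedded carriage returns in [Resume] no longer act as row delimiters.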
