SSIS Import UTF-8 TSV file into SQL Server 2014

First time user/question:
I have numerous TSV files exported from a computer forensics application, encoded as "Unicode (UTF-8)". I created a package in Visual Studio 2013 with a Flat File Connection Manager whose code page is 65001 (UTF-8) and whose advanced settings are all Unicode string [DT_WSTR]. My OLE DB destination is hooked to a table with the same Unicode settings, and the component properties have AlwaysUseDefaultCodePage set to True with a default code page of 65001.
However, the package fails with the error: "The data type for "Flat File Source.Outputs[Flat File Source Output].Columns[MyCOLUMN]" is DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead and convert the data to DT_NTEXT using the data conversion component."
I'm puzzled: how does this have anything to do with ANSI? The file was exported encoded as UTF-8. As the error suggests, I can work around this by setting the connection manager's advanced properties to string [DT_STR] and then using a Data Conversion component to convert each and every field to DT_WSTR, but that seems unnecessary.
Thank you.

Related

Cannot import long text from Excel to SQL Server using SSIS

Environment:
Microsoft® Excel® for Microsoft 365 MSO (Version 2112 Build 16.0.14729.20254) 64-bit
Microsoft SQL Server 2019 (RTM-CU14) (KB5007182) - 15.0.4188.2 (X64)
Microsoft Visual Studio 2019
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\ClickToRun\REGISTRY\MACHINE\Software\Microsoft\Office\16.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows set to 0
In Excel I formatted the cells as "Text" and I also filled the 2nd and 3rd rows with some super-long dummy text (the 1st row is the title row).
When I go to the Excel Source's Advanced Editor, I can set the Output column to Unicode text stream [DT_NTEXT] or anything else, but the External column cannot be changed to anything other than Unicode string [DT_WSTR] (255 characters), despite the registry setting that should normally allow it and despite the super-long strings in the two rows after the header row.
Then, of course, when I try to execute the SSIS task it throws the usual truncation error.
Question: What am I doing wrong, or what else should be done here to actually be able to import the data? By the way, this is supposed to be automated at some point.
Since an Excel workbook is not a database, the OLE DB provider tries to detect the most relevant metadata from the Excel worksheet and read it as tabular data, which is mostly inaccurate when handling medium and large Excel files. After spending years creating SSIS packages, I now prefer to convert the Excel file to a CSV file and import it using a Flat File Connection Manager instead, or to use a C# script to import the data.
I. Converting Excel to CSV
You can automate the process of converting Excel to CSV using a C# script:
Converting an XLSX file to a CSV file
Convert .xlsx & .xls to .csv
How to Convert Excel to CSV using Interop
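A minimal sketch of the Interop approach covered by the links above, assuming Excel is installed on the machine and using hypothetical file paths (error handling omitted):

    using Excel = Microsoft.Office.Interop.Excel;

    class ExcelToCsv
    {
        static void Main()
        {
            var excelApp = new Excel.Application();
            excelApp.DisplayAlerts = false; // suppress the overwrite prompt on SaveAs
            Excel.Workbook workbook = excelApp.Workbooks.Open(@"C:\Data\Input.xlsx");
            try
            {
                // xlCSV saves the active worksheet as a comma-separated text file.
                workbook.SaveAs(@"C:\Data\Output.csv", Excel.XlFileFormat.xlCSV);
            }
            finally
            {
                workbook.Close(false);
                excelApp.Quit();
            }
        }
    }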
After converting the Excel file to a CSV file, you can dynamically import it using a Flat File Connection Manager:
Dynamic Flat File Connections in SQL Server Integration Services
II. Using a C# script
It is good to check the following class, which is a part of the SchemaMapper project:
SchemaMapper - MsExcelImport.cs
Besides, a step-by-step guide on how to use this library can be found in the following link:
Import data from multiple files into one SQL table step by step guide
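If you'd rather not take a dependency on SchemaMapper, the general idea can be sketched with plain ADO.NET: read the worksheet through the ACE OLE DB provider and bulk-load it with SqlBulkCopy. The file, sheet, connection strings, and table names below are hypothetical:

    using System.Data;
    using System.Data.OleDb;
    using System.Data.SqlClient;

    class ExcelImport
    {
        static void Main()
        {
            // IMEX=1 asks the provider to treat mixed-type columns as text.
            string excelCs = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Input.xlsx;"
                           + "Extended Properties=\"Excel 12.0 XML;HDR=YES;IMEX=1\"";
            var table = new DataTable();
            using (var conn = new OleDbConnection(excelCs))
            using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
            {
                adapter.Fill(table); // Fill opens and closes the connection itself
            }

            using (var bulk = new SqlBulkCopy("Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI"))
            {
                bulk.DestinationTableName = "dbo.MyTable";
                bulk.WriteToServer(table);
            }
        }
    }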
III. Editing the Excel connection string
If you don't have the option to convert the Excel files to flat files, you can force the Excel connection manager to ignore the headers in the first row by adding IMEX=1 to the connection string, which tells the OLE DB provider to derive the data types from the first row (the header row, which is all strings most of the time).
To edit the ConnectionString property, click on the Excel Connection Manager and press the F4 key. In the Properties pane, you can edit the ConnectionString property.
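As an illustration, the resulting connection string would look something like this (hypothetical path; the Extended Properties value depends on the workbook format):

    Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Input.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";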
SSIS Excel Import Columns with More or Less than 255 Characters
IV. Changing columns length from advanced editor
Try changing the Excel Source column metadata from the advanced editor:
In SSIS excel datasource not taking more characters than 255
Importing Excel using SSIS may cause a headache! You can check the following questions:
Workaround for exporting data to Excel with more than 255 columns
Dynamically Creating Excel table through SSIS
SQL Server Import Wizard doesn't support importing from excel sheet with more than 255 columns
Importing Excel Data Seems to Randomly Give Null Values
Failing to read String value from an excel column
SSIS - Excel data shows as scientific notations and Null Values

Change the file encoding of the file which is created using SSIS Log provider for Text Files

I am new to SSIS. I have already designed a package and configured the SSIS Log provider for Text Files.
This works fine and log files are generated successfully.
We have a monitoring team who use this log file for monitoring. They are unable to read the log files because the file encoding is Unicode.
They are expecting a non-Unicode format for their monitoring.
I tried changing the existing log file's encoding to ANSI, but when I re-run the package the log file is re-created with Unicode encoding.
Is there any way to create log files with the SSIS Log provider for Text Files using a non-Unicode encoding? Kindly suggest a workaround; I have been unable to find a solution for the past two days.
Trying to figure out the issue
Since the SSIS Log provider for Text Files uses a File connection manager for logging purposes, you don't have the option to set the file encoding within the SSIS package, because this type of connection manager can be used for different file formats (Excel, text, ...).
From searching on this issue, it appears that if the log file is created for the first time by SSIS, it will be written as Unicode:
why are my log files getting generated with a space between every two characters?
Why is my SSIS text logfile formatted in this way?
Possible workaround
Try creating an empty text file using Notepad and saving it with ANSI encoding.
Then select this file in the SSIS logging configuration.
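If you'd rather not create the seed file by hand, here is a minimal C# sketch (e.g., for a deployment script) that pre-creates an empty log file with the ANSI (Windows-1252) code page; the path is hypothetical:

    using System.IO;
    using System.Text;

    class CreateAnsiLogFile
    {
        static void Main()
        {
            // Hypothetical path; point it at the file your File connection manager uses.
            // An empty file seeded with code page 1252 stays ANSI, and (per the
            // experiments below) the log provider then appends to it in ANSI.
            File.WriteAllText(@"C:\Logs\PackageLog.txt", string.Empty,
                              Encoding.GetEncoding(1252));
        }
    }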
Other helpful links
Change the default of encoding in Notepad
Add Logging with SSIS
Update 1 - Experiments
To test the workaround I provided, I ran the following experiments:
I added SSIS logging and created a new log file.
After executing the package, the file was created as Unicode (to check this, I opened the file in Notepad and clicked Save As; the encoding shown in the combo box was Unicode).
I then created a new file using Notepad and saved it with ANSI encoding, as mentioned above.
In SSIS, I changed the File connection manager to Use Existing instead of Create New and selected the file I had created.
After executing the package, the log was written into the file and the encoding was still ANSI.
I executed the package several times and the encoding did not change.
TL;DR: Create a file with ANSI encoding outside the SSIS package; within the package, create a File connection manager, select the Use Existing option, and choose the created file. Use this File connection manager for logging purposes.

Getting error on creating a package in SSIS VS2017

I am using VS2017 and SSMS 2017. I imported a CSV file with the FLAT FILE option in the Data Flow tab in SSIS, and I also created an ADO NET SOURCE database connection through my local server that stores the data in my database. But when I try to create a connection between FLAT FILE and ADO NET SOURCE by dragging the blue arrow from the file component to the ADO NET component, I get this error. Can anybody provide some input on how to get rid of it?
You are attempting to route the flat file source into another source component, which cannot be done. Did you mean to route the data into a destination?
The error is that you have connected a Source to a Source.
Fix it by connecting the Source to a Destination, so that the package can build successfully.
Replace your ADO.NET Source with an ADO.NET Destination and this error will be corrected.

SSIS: Code page goes back to 65001

In an SSIS package that I'm writing, I have a CSV file as a source. On the Connection Manager General page, it has 65001 as the Code page (I was testing something). Unicode is not checked.
The columns map to a SQL Server destination table with varchar (among others) columns.
There's an error at the destination: The column "columnname" cannot be processed because more than one code page (65001 and 1252) are specified for it.
My SQL columns have to be varchar, not nvarchar due to other applications that use it.
On the Connection Manager General page I then change the Code page to 1252 (ANSI - Latin I) and OK out, but when I open it again it's back to 65001. It doesn't make a difference if (just for test) I check Unicode or not.
As a note, all this started happening after the CSV file and the SQL table had columns added and removed (users, you know.) Before that, I had no issues whatsoever. Yes, I refreshed the OLE DB destination in the Advanced Editor.
This is SQL Server 2012 and whichever version of BIDS and SSIS come with it.
If it is a CSV file column (text stream [DT_TEXT]) that you want to load into a SQL varchar(max) column, change the Flat File Connection Manager Editor's Code page property to 1252 (ANSI - Latin I).
65001 Code page = Unicode (UTF-8)
Based on this Microsoft article (Flat File Connection Manager):
Code page
Specify the code page for non-Unicode text.
Also
You can configure the Flat File connection manager in the following ways:
Specify the file, locale, and code page to use. The locale is used to interpret locale-sensitive data such as dates, and the code page is used to convert string data to Unicode.
So when the flat file has a Unicode encoding (Unicode, UTF-8, UTF-16, or UTF-32), this property cannot be changed; it will always revert to the original encoding.
For more info about the code page identifiers, you can refer to this article:
Code Page Identifiers
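The identifiers that come up most often in this context: 1252 = ANSI Latin I, 65001 = UTF-8, and 1200 = UTF-16 LE (which is what the SSIS "Unicode" option corresponds to).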
I solved this in SSIS with a Derived Column Transformation.
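For illustration, an expression along the lines of (DT_STR, 255, 1252)[MyColumn] (column name and length hypothetical) casts the incoming Unicode column to a 1252 code page string before it reaches the destination.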
If it's a CSV file, you can still use code page 1252 to process it. When you open the flat file connection manager, it shows you the code page detected for the file, but you don't need to save that setting. If you have other changes to make in the connection manager, change the code page back to 1252 before you save them. It will process fine as long as the file contains no characters outside the 1252 code page.
I was running into a similar challenge, which is how I ended up on this page looking for a solution. I resolved it using a different approach.
I opened the csv in Notepad++. One of the menu options is called Encoding. If you select that, it will give you the option to "Convert to ANSI."
I knew that my file did not contain any Unicode specific characters.
When I went back to the SSIS package and edited the flat file connection, the code page automatically changed to 1252.
In my case the file was generated in Excel and (mistakenly) saved as CSV UTF-8 (Comma delimited) (*.csv) instead of simply CSV (Comma delimited) (*.csv). Once I saved the file as the correct form of CSV, the code page no longer changed from 1252 (ANSI - Latin I).

SSIS - ANSI flatfile always saved as UTF-8 (w/o BOM)

I am facing an issue with SSIS where a customer wants a file (previously delivered in UTF-8) to be delivered in ANSI-1252. No big deal, I thought: change the file connection manager and done... unfortunately it wasn't that simple. I've been stuck on this for a day and am clueless about what to try next.
The package itself:
IN: an OLE DB source with a query. The source database fields are NVARCHAR.
Next I created a Data Conversion component where I convert the incoming DT_WSTR to DT_STR using the 1252 code page.
After that is an outbound flat file destination. The flat file connection is tab delimited, using code page 1252, and I have mapped the converted columns to the columns used in this flat file.
Now when I create a new txt file from Explorer it is ANSI (as detected by Notepad++).
When the package runs, however, the file comes out as UTF-8 without BOM.
I have tried experimenting with the overwrite checkbox, as suggested in SSIS - Flat file always ANSI never UTF-8 encoded, as well as rebuilding the project from scratch and experimenting with the data conversion.
Does anyone have a suggestion on what I am missing here? The strange thing is that we have a different package with exactly the same components, built previously, and it does output an ANSI file (I checked the package from top to bottom). However, we are getting mixed results on different machines: some machines produce an ANSI file, others the UTF-8 file.
Is this solved already? My idea would be to delete the whole Data Flow Task and re-create it. I suppose the metadata is stuck and gets overwritten at each execution.
I believe you don't need to change anything in your SSIS package; just check your editor settings in Notepad++. Go to Settings --> Preferences --> New Document.
You need to uncheck the 'Apply to opened ANSI files' checkbox. (For files that contain only ASCII characters, ANSI and UTF-8 without BOM are byte-for-byte identical, so an editor can only guess which encoding it is looking at.)
Kindly check and let me know if it works for you.
