SSIS Oracle Source not pulling through Unicode characters correctly

Problem
I'm creating an SSIS package to move data from Oracle to SQL Server but am having trouble with certain Unicode characters.
The source Oracle column is NVARCHAR2; an example problem character is U+0102, but the issue applies to other characters as well. The data will be migrated to an NVARCHAR column in SQL Server, but the problem seems to occur at the point of extraction: when I preview the source in SSIS, the affected characters just show as inverted question marks, e.g.
DEMO¿DEMO
Setup
I'm using the Attunity Oracle Source task/connection, as I couldn't get the OLE DB connection working
The Oracle database has NLS_NCHAR_CHARACTERSET set to AL16UTF16
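The national character set can be confirmed with a query along these lines (this assumes access to the NLS_DATABASE_PARAMETERS view):
-- Check both the database and national character sets
SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');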
Things I've tried
Changing the DefaultCodePage value in the Advanced settings of the Source task to 1252, 65001, 1200, and 1201
Converting the source column in the SQL command text in various ways, e.g. Convert(SOURCE_COLUMN,'AL32UTF8')
Using UTL_RAW.Convert_To_Raw in the SQL command text. This generates the correct binary values (as DT_BYTES in SSIS), but I couldn't then transform it back into a DT_WSTR using either Data Conversion or Derived Column.
To rule out a general SSIS issue, I've tested extracting the same characters from a SQL Server database, and they appear correctly in the SSIS preview window.
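That check can be reproduced in T-SQL by building the test string with NCHAR (U+0102 is decimal 258; the literal below is only illustrative):
-- Should display as DEMOĂDEMO when Unicode is handled end to end
SELECT N'DEMO' + NCHAR(258) + N'DEMO' AS test_value;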
I'm using the SQL Command access mode as per below:
SELECT SOURCE_COLUMN
FROM SOURCE_TABLE;
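Another variation I could add to the SQL command text is an explicit CAST to NVARCHAR2 (the length of 100 is only a placeholder):
SELECT CAST(SOURCE_COLUMN AS NVARCHAR2(100)) AS SOURCE_COLUMN
FROM SOURCE_TABLE;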
Any help greatly appreciated.

Related

How to keep trailing space from source to destination using SSIS

Environment
Source: Oracle database via OLE DB
Destination: SQL Server 2019 via OLE DB
Tools: SSIS in Visual Studio 2019
Problem
The source value has a trailing space (e.g. "12345 "), but after loading into the target database the space is gone (e.g. "12345").
I want to keep all spaces in the source data and load them unchanged into the target table, but I cannot find any configuration or other way to preserve them.
The problem is the data type on the target table.
Oracle uses the VARCHAR2 and CHAR data types, and SSIS maps them to Unicode strings.
At first I declared the character column as VARCHAR in SQL Server, but found the trailing space missing after the ETL.
Changing the data type in SQL Server from VARCHAR to NVARCHAR resolved the problem.
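A minimal sketch of that change on the SQL Server side, assuming a hypothetical target table and column (the length is a placeholder):
-- Switch the target column from VARCHAR to NVARCHAR, which resolved the lost trailing space in this case
ALTER TABLE dbo.TARGET_TABLE
    ALTER COLUMN SOURCE_COLUMN NVARCHAR(50) NULL;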

How to change DT_STR to DT_WSTR by default in SSIS for Oracle source

We have an SSIS package on one virtual machine (call it VM1) where we pull data from an Oracle source. The Oracle data type for the column is VARCHAR2, and in SSIS it comes through as DT_WSTR and is stored in an NVARCHAR column.
When I open the same package on a different virtual machine (call it VM2), SSIS reads the column as DT_STR and the package fails with a conversion error during validation. I also get the warning pasted below when I click on Columns in the Oracle source of the Data Flow Task.
Warning - Cannot retrieve the column code page info from the OLE DB provider. If the component supports the "DefaultCodePage" property, the code page from that property will be used. Change the value of the property if the current string code page values are incorrect. If the component does not support the property, the code page from the component's locale ID will be used.
We have Oracle Java (JDK) and the Oracle client installed on both VM1 and VM2.
The OS on both VMs is Windows 7, and the SSIS packages were built with Visual Studio 2013 on both VMs.
I have had to deal with similar datatype issues between Oracle and SSIS. And with SSIS being so finicky about datatypes, I had to find a solution to implement on the Oracle side.
Before I explain my answer, I should mention that I use the Attunity Connectors for Oracle from Microsoft. I highly recommend using these connectors over the default connectors Microsoft and Oracle provide.
So, with that said, I have found two techniques that seem to work to pull data over in the correct encoding. SSIS is really bad at reading and translating metadata from the Oracle system, but explicitly CASTing to a VARCHAR2, even if the column is already a VARCHAR2, seems to be enough of a hint that SSIS knows that column will be a DT_STR type. In all of my Oracle Source tasks, I use a SQL Command rather than just choosing the table (it's a best practice), and that allows me to add in the CAST to the query. For a VARCHAR2 column, I'd do something like this:
SELECT CAST("PO Number" AS VARCHAR2(30)) AS "PONumber" FROM TABLE1
This will usually be enough. But sometimes it won't be, because Oracle allows for some weird characters in a VARCHAR2 column. If you see the error "[Oracle Source [2345]] Error: OCI error encountered. ORA-29275: partial multibyte character" even after explicitly CASTing your column to VARCHAR2, it's due to a code page mismatch. To correct it you can CONVERT the character encoding of the string like this:
SELECT CONVERT("PO Number",'AL32UTF8','WE8MSWIN1252') AS "PONumber" FROM TABLE1
AL32UTF8 is the default Unicode (UTF-8) character set that Oracle uses, and WE8MSWIN1252 is the Windows-1252 code page typically used by Western Windows systems.
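When both problems show up at once, the two techniques can be combined; a sketch using the same hypothetical column and table names as the examples above (the length of 30 is a placeholder):
SELECT CAST(CONVERT("PO Number",'AL32UTF8','WE8MSWIN1252') AS VARCHAR2(30)) AS "PONumber" FROM TABLE1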

Unwanted characters from Oracle database and text files

I have come across an SSIS loading issue where one particular column sometimes contains an unwanted character that I'm not aware of, and the load of data into SQL Server fails. This data comes from an Oracle database: it is extracted from the Oracle database as a plain text file on a Solaris platform, then brought across to a Windows platform for loading into the SQL Server database.
(The original post included a screenshot with the problem character highlighted in yellow.)
Is there any way to escape or remove this character when extracting from the Oracle database, or to escape it during loading into the SQL Server database via the SSIS package?
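One option on the Oracle side would be to strip non-printable characters in the extraction query itself, for example with REGEXP_REPLACE (the column and table names are placeholders, and the character class may need adjusting for the specific unwanted character):
-- Remove anything outside the printable character class before the data leaves Oracle
SELECT REGEXP_REPLACE(source_column, '[^[:print:]]', '') AS source_column_clean
FROM source_table;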

CSV - SSIS - SQL Server, Character Encoding Problems

I have a large set of CSV files which I am transferring to Microsoft SQL Server 2014 via Management Studio. I am using an SSIS package in Microsoft Visual Studio 2012 to achieve this. I currently have around 2 million lines of data, so I need SSIS.
The problem is that while the data in my CSVs already has encoding issues, I am making them far worse in transit.
For now what I need to do is just preserve the characters, so that whatever I see in my CSV appears in my SQL Server table. I am particularly interested in 'Benoît', which is fine in my CSV but not in my SQL table, where it becomes 'BenoŒt'. Please see the list at the bottom of my post.
I am also wondering whether, once I have imported the data in its current state, I can reliably use find and replace to deal with the existing encoding issues from my CSVs.
Character encoding is a confusing subject and I am not at all sure whether a couple of paragraphs can put me on the right track, but please help with a steer if it's possible! I am sure I need to look at the settings across both SQL Server and Visual Studio, but I'm not sure what needs tweaking or where.
"Benoît" in my CSV becomes "BenoŒt" in my SQL table
"Angélique" in my CSV becomes "AngÇŸ¶¸lique" in my SQL table
"Michèle" in my CSV becomes "MichÇŸ¶ùle" in my SQL table
"josée" in my CSV becomes "josÇŸ¶¸e" in my SQL table
"Amélie" in my CSV becomes "AmÇŸ¶¸lie" in my SQL table
First of all, make sure that your CSV files are in a Unicode encoding (open the CSV file in Notepad, choose Save As, and check the Encoding setting at the bottom). If they are not, save them as Unicode.
Also make sure that the Unicode check box is selected in the Flat File Source properties inside the SSIS package.
Your CSV file needs to be in UCS-2 Little Endian encoding.
Open it in Notepad++ to check.

Why does ADO NET Source show incorrect string data types by default?

I'm new to SSIS and I created a simple Data Flow task. The ADO NET Source shows all my varchar columns in ADO NET Source Output -> Output Columns as Unicode string [DT_WSTR]. This is incorrect; the table in SQL Server uses only varchar columns. I tried to add a Data Conversion transformation, but I still get errors about converting Unicode strings. Why is SSIS reading my table schema incorrectly?
Thanks!
DT_WSTR is an SSIS data type which I think SSIS converts to when working on the data, though I don't know why it does so in this case.
My recommendation would be to try using the OLEDB source and destination - they seem to have much more power and flexibility in SSIS.
