Unwanted characters from Oracle database and text files - sql-server

I have come across an SSIS loading issue where one particular column sometimes contains an unwanted character I can't identify, and the load into SQL Server fails. The data comes from an Oracle database: it is extracted from Oracle as a plain text file on a Solaris platform, then brought across to a Windows platform for loading into the SQL Server database.
Please find the attached image showing the issue highlighted in yellow:
Is there any way to escape or remove this character, either when extracting from the Oracle database or during the load into SQL Server via the SSIS package?
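One common workaround, independent of SSIS itself, is to clean the text extract before loading. Below is a minimal Python sketch, assuming the file is line-oriented and the offending bytes are ASCII control characters; the function names, file paths, encoding, and character range are assumptions for illustration, not details from the question:

```python
import re

# Strip ASCII control characters (other than tab, LF, CR) from each line
# of the extracted text file before the SSIS load picks it up.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean_line(line: str) -> str:
    return CONTROL_CHARS.sub("", line)

def clean_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    # errors="replace" keeps the job running even if the extract contains
    # bytes that are invalid in the assumed encoding.
    with open(src_path, "r", encoding=encoding, errors="replace") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(clean_line(line))
```

Running this as a pre-load step on Solaris or Windows would remove stray control characters while leaving tabs and line endings intact.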

SSIS Oracle Source not pulling through Unicode characters correctly

Problem
I'm creating an SSIS package to move data from Oracle to SQL Server but am having trouble with certain Unicode characters.
The source Oracle column is NVARCHAR2; an example character is U+0102, but the issue applies to other characters as well. These will be migrated to an NVARCHAR column in SQL Server, but the problem seems to be at the point of extraction: when I preview the source in SSIS, the characters in question show as inverted question marks, e.g.
DEMO¿DEMO
Setup
I'm using the Attunity Oracle Source task/connection, as I couldn't get the OLE DB connection working
Oracle Database has NLS_NCHAR_CHARACTERSET AL16UTF16
Things I've tried
Changing the DefaultCodePage value in Advanced settings of the Source task to 1252, 65001, 1200, 1201
Converting the source column in the SQL command text in various ways, e.g. Convert(SOURCE_COLUMN,'AL32UTF8')
Using UTL_RAW.Convert_To_Raw in the SQL Command text. This generates the correct binary values (as DT_BYTES in SSIS), but I couldn't then transform it back into a DT_WSTR using either Data Conversion or Derived Column.
I've tested extracting the same characters from a SQL Server database and they are appearing correctly in the SSIS preview window just to rule out an SSIS issue.
I'm using the SQL Command access mode as per below:
SELECT SOURCE_COLUMN
FROM SOURCE_TABLE;
Any help greatly appreciated.
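For what it's worth, the inverted-question-mark symptom is the classic signature of a Unicode character being forced through a single-byte code page somewhere in the pipeline. A small Python illustration of the effect (note that Python substitutes '?' where Oracle typically substitutes '¿'; the Windows-1252 choice is an assumption standing in for whichever code page the extraction layer applies):

```python
# When a Unicode string is pushed through a single-byte code page such
# as Windows-1252, characters outside that page are replaced with a
# substitution mark -- the same class of corruption seen in the preview.
s = "DEMO\u0102DEMO"  # U+0102 (Ă), the example character from the question

lossy = s.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # DEMO?DEMO
```

This is why code-page settings alone can't fix it: once the replacement happens at extraction, the original character is gone.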

Oracle to SQLServer export

I have to move data from an existing Oracle database to which I don't have direct access. The data is about 11 tables of 5 GB each. The database admin can export the tables to .csv or XML. The problem with CSV is that some of the data is textual with lots of special characters. The problem with XML is that the markup is an overhead that will significantly increase the size of the files. The DBA is not competent enough to provide a working and neat solution; he uses Toad as the database tool. Can you suggest how to perform such a migration in the best possible way?
Please refer to the steps below to migrate the data from Oracle to SQL Server.
Recommended Migration Process
To successfully migrate objects and data from Oracle databases to SQL Server, Azure SQL DB, or Azure SQL Data Warehouse, use the following process:
1. Create a new SSMA project.
2. After you create the project, you can set project conversion, migration, and type mapping options. For information about project settings, see Setting Project Options (OracleToSQL). For information about how to customize data type mappings, see Mapping Oracle and SQL Server Data Types (OracleToSQL).
3. Connect to the Oracle database server.
4. Connect to an instance of SQL Server.
5. Map Oracle database schemas to SQL Server database schemas.
6. Optionally, create assessment reports to assess database objects for conversion and estimate the conversion time.
7. Convert Oracle database schemas into SQL Server schemas.
8. Load the converted database objects into SQL Server. You can do this in one of the following ways:
   * Save a script and run it in SQL Server.
   * Synchronize the database objects.
9. Migrate data to SQL Server.
10. If necessary, update database applications.
For more details:
https://learn.microsoft.com/en-us/sql/ssma/oracle/migrating-oracle-databases-to-sql-server-oracletosql?view=sql-server-2017
After the admin exports the data to CSV, try to convert it to a character set that covers all the special characters.
Then try to follow the steps from this link: link, it might work.
If there are still special characters after the import, try to convert them manually.
Get the DBA to export the tables using the ASCII delimiters which were designed for this purpose:
Row delimiter: Decimal 30 / 0x1E
Column delimiter: Decimal 31 / 0x1F
Then you can use BCP (or any other similar product) to upload the data to SQL Server.
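Once the export uses those separators, parsing is trivial because 0x1E and 0x1F never occur in normal text, unlike commas and newlines. A small Python sketch of reading such a file (the sample data is invented for illustration):

```python
# ASCII record separator (0x1E) between rows, unit separator (0x1F)
# between columns -- the delimiters the answer above recommends.
ROW_SEP, COL_SEP = "\x1e", "\x1f"

def parse_export(text: str) -> list[list[str]]:
    rows = [r for r in text.split(ROW_SEP) if r]
    return [row.split(COL_SEP) for row in rows]

sample = "1\x1falpha\x1e2\x1fbeta, with a comma\x1e"
print(parse_export(sample))  # [['1', 'alpha'], ['2', 'beta, with a comma']]
```

With BCP, the same separators are passed as the `-r` (row terminator) and `-t` (field terminator) options.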

Can I replicate Informix 11.5 tables to SQL Server?

I am trying to find a way to, as close to real time as possible, have Informix 11.50.FC9GE database data available for SQL Server 2014 SSRS reports.
Right now, we have SSIS (Integration Packages) that are on a 4 hour schedule to go out to our 8 Informix databases via ODBC, gather all of their table data, and update tables on the SQL Server side.
So, table 'abc' exists on all 8 databases. All of that data is input into a single table on the SQL Server. As that data is gathered, an artificial column is created to say which database the data came from.
Select *, "250" as db from abc
This process takes about 1-2 hours to complete. If someone attempts to run a report during this time, they get skewed data.
My hope is to have all of the table data in the SQL Server, and only pass over changed data.
I was looking at SQL Server Replication, but it doesn't look like it can replicate from a non-SQL Server database.
I also started looking at IBM InfoSphere Change Data Capture 6.5. I installed the Access Server and Management Console on a Windows server with one of my Informix databases.
I installed InfoSphere CDC Configuration Tool (Instance) on the SQL Server with the Database entries pointing to the Informix server, but when I try to start that Instance I get the error:
IBM InfoSphere Change Data Capture could not identify a supported default database encoding. The detected encoding is null. Please override the encoding with a supported IANA encoding name that matches or is very close to your default database encoding and restart IBM InfoSphere Change Data Capture. Use dmset command line utility to override the encoding.
I found this command to enter:
dmset -I instanceName database_default_character_encoding=UTF-8
But that gives an error:
C:\Program Files (x86)\IBM\InfoSphere Change Data Capture\Replication Engine for IBM Informix Dynamic Server\bin>dmset -I vsqldev2014 database_default_character_encoding=UTF-8
There is a problem with the IBM InfoSphere Change Data Capture service.
Frankly, I probably didn't set it up right, because there are hardly any instructions out there. :(
I did find a 3rd party software that appears to work, but they are quoting tens of thousands of dollars. No way, my company would go for that.
Any help/suggestions?

Access cannot filter on Unicode characters after tables migrated to SQL Server

I have moved MS Access 2010 data to SQL Server using their tools, and now filtering by Unicode characters is not working in Access linked tables. I see the linked table column in SQL Server is "nvarchar", but in Access "Unicode compression" is set to "No" and I can't change it.
It is my understanding that the "Unicode compression" setting only affects native Access (ACE/Jet) tables and has no effect on ODBC linked tables. Instead, what you likely need to do is change the "Collation" setting of the SQL Server database itself using SQL Server Management Studio.
For example, with the SQL Server collation set to "SQL_Latin1_General_CP1_CI_AS" I cannot filter on Greek characters (e.g., 'γιορτή') from Access, but if I change the collation of the SQL Server database to "Greek_CI_AS" then that same Access filter works.
Edit re: comments
While this solution will work for single-byte code pages that are natively supported by SQL Server (e.g., Greek, which would correspond to Windows-1253), it won't work for languages that lack those code pages and must be represented either by
a code page that is not supported by SQL Server, or
Unicode.
ODBC linked tables in Access apparently do not fully support Unicode, passing search strings to SQL Server as 'text' rather than N'text', so SQL Server is forced to interpret any such text according to the selected single-byte code page (via the "Collation" setting).
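The code-page limitation is easy to demonstrate outside SQL Server. In Python, Greek text round-trips through the Greek code page (Windows-1253) but cannot be encoded in Windows-1252 at all, which is the code page used for non-Unicode data under SQL_Latin1_General_CP1_CI_AS:

```python
# 'γιορτή' survives a round trip through the Greek code page...
word = "γιορτή"
assert word.encode("cp1253").decode("cp1253") == word

# ...but Windows-1252 has no Greek letters, so encoding simply fails.
try:
    word.encode("cp1252")
except UnicodeEncodeError:
    print("no Greek in Windows-1252")
```

This is why switching the database collation to Greek_CI_AS fixes the filter for Greek but cannot help with scripts that have no single-byte code page at all.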

Character set issues with Oracle Gateways, SQL Server, and Application Express

I am migrating data from Oracle on VMS, which accesses data on SQL Server using heterogeneous services (over ODBC), to Oracle on AIX, which accesses the SQL Server via Oracle Gateways (dg4msql). The Oracle VMS database used the WE8ISO8859P1 character set. The AIX database uses WE8MSWIN1252. The SQL Server database uses "Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data" according to sp_helpsort. The SQL Server database uses nchar/nvarchar for all string columns.
In Application Express, extra characters are appearing in some cases, for example 123 shows up as %001%002%003. In sqlplus, things look ok but if I use Oracle functions like initcap, I see what appear as spaces between each letter of a string when I query the sql server database (using a database link). This did not occur under the old configuration.
I'm assuming the issue is that an nchar has extra bytes in it and the character set in Oracle can't convert it. It appears that the ODBC solution didn't support nchars so must have just cast them back to char and they showed up ok. I only need to view the sql server data so I'm open to any solution such as casting, but I haven't found anything that works.
Any ideas on how to deal with this? Should I be using a different character set in Oracle, and if so, does that apply to all schemas, since I only care about one of them?
Update: I think I can simplify this question. SQL Server table uses nchar. select dump(column) from table returns Typ=1 Len=6: 0,67,0,79,0,88 when the value is 'COX' whether I select from a remote link to sql server, cast the literal 'COX' to an nvarchar, or copy into an Oracle table as an nvarchar. But when I select the column itself it appears with extra spaces only when selecting from the remote sql server link. I can't understand why dump would return the same thing but not using dump would show different values. Any help is appreciated.
There is an incompatibility between Oracle Gateways and nchar on that particular version of SQL Server. The solution was to create views on the SQL Server side casting the nchars to varchars. Then I could select from the views via gateways and it handled the character sets correctly.
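The dump() output in the question confirms this reading: 0,67,0,79,0,88 is simply 'COX' in big-endian UTF-16 (Oracle's AL16UTF16). A quick Python check of both interpretations:

```python
# The byte sequence reported by dump() for the value 'COX'.
raw = bytes([0, 67, 0, 79, 0, 88])

# Decoded as big-endian UTF-16 it is exactly three characters...
correct = raw.decode("utf-16-be")
print(correct)  # COX

# ...but decoded byte-by-byte as a single-byte set, every other byte
# becomes a NUL that renders like a space between the letters -- the
# exact symptom described in the question.
wrong = raw.decode("latin-1")
print(repr(wrong))  # '\x00C\x00O\x00X'
```

Casting to varchar in the SQL Server views, as the answer describes, removes the zero bytes before the data crosses the gateway, which is why it fixes the display.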
You might be interested in the Oracle NLS Lang FAQ
