XML column in SSIS has byte-order-mark - sql-server

I'm using an OLE DB source in an SSIS package to pull a column from a database. The column has the XML data type, and SSIS automatically recognizes it as DT_NTEXT. It goes to a script component where I'm trying to load it into a System.Xml.XmlDocument. This is the code I'm using to get the XML data into a string:
System.Text.Encoding.Default.GetString(Row.Data.GetBlobData(0, Row.Data.Length))
Is this the correct way?
One odd thing I'm seeing is that on one server I get a byte-order mark (BOM) in the resulting string, and on another server I don't. I wouldn't mind knowing why that is, but what I really want is a way to get this string without the BOM.
Help me, Stack Overflow, you're my only hope...

This is the only way I was able to get it to work:
System.Text.UnicodeEncoding.Unicode.GetString(...).Trim()
The .Trim() removes the BOM. I'm not sure if this is the "right" way, but it's the only thing that's worked so far.
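For illustration, here is a minimal sketch of the decoding step, written in Python only to keep the example self-contained (it is not the script component's language). It assumes the DT_NTEXT blob holds UTF-16 little-endian bytes, which would explain why decoding with the system default code page is unreliable, and that any BOM appears as a leading U+FEFF character after decoding. The function name and sample data are hypothetical.

```python
BOM = "\ufeff"  # U+FEFF: how a byte-order mark appears once the bytes are decoded


def blob_to_xml_text(blob: bytes) -> str:
    """Decode a UTF-16LE blob and drop a single leading BOM, if present."""
    text = blob.decode("utf-16-le")
    # Strip the BOM explicitly instead of relying on whitespace trimming
    return text[1:] if text.startswith(BOM) else text


# Hypothetical sample data: the same document with and without a BOM
with_bom = b"\xff\xfe" + "<root/>".encode("utf-16-le")
without_bom = "<root/>".encode("utf-16-le")

print(blob_to_xml_text(with_bom))
print(blob_to_xml_text(without_bom))
```

In the script component, the corresponding calls would be System.Text.Encoding.Unicode.GetString(...) followed by TrimStart('\uFEFF'). On .NET Framework 4 and later, a parameterless Trim() no longer treats U+FEFF as white space, so removing the character explicitly is likely the safer choice.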

Related

Importing Excel Data Seems to Randomly Give Null Values

Using SSIS for Visual Studio 2017 for some Excel file imports.
I've created a package with several loop containers that call specific packages to handle some files. I have an issue with one particular package: seemingly at random, it decides that the data in some columns is NULL, and this varies per Excel file. I was/am under the impression that this is related to the registry setting TypeGuessRows (changed initially to 0, then to 1000 as a test), located at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
The reason I think this is that the various files being brought in generally have the same data, but if the first few rows of a column in the source data contain only numbers, the rows with mixed values are not brought in correctly. All other columns seem fine.
Looking at the source files, all have the same datatype.
I've tried changing the registry TypeGuessRows value and ensured that the output column property was string-based instead of numerical.
The connection string has IMEX=1
So I fixed it. Or at least found a sufficient workaround that should help anyone in my situation. I think it has to do with the cache of SSIS.
I ended up putting a sort on the problem column so that the records being read as NULL because of their differing data type are read first and are no longer treated as the exception. I will say, I tried this initially and it didn't work.
Through a little experiment of making a new data flow in the same package, I discovered that this solution actually does work, which is why I think the cache was the issue.
If anyone has any further questions on this, let me know.
This issue is related to the OLE DB provider used to read Excel files. Since Excel is not a database where each column has a specific data type, the OLE DB provider tries to identify the dominant data type in each column and replaces all values that cannot be parsed as that type with NULLs.
There are many articles found online discussing this issue and giving several workarounds (links listed below).
But after using SSIS for years, I can say that the best practice is to convert Excel files to CSV files and read them using Flat File components.
Or, if you don't have the option of converting Excel files to flat files, you can force the Excel connection manager to ignore the headers in the first row by adding HDR=NO to the connection string, and add IMEX=1 to tell the OLE DB provider to determine data types from the first row (which is the header row, so all strings most of the time). In this case all columns are imported as strings and no values are replaced with NULLs, but you will lose the headers and get an additional row (the header row is imported as data).
If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.
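As a sketch of what such a connection string might look like, the Python helper below assembles one with HDR and IMEX in the extended properties. The provider name Microsoft.ACE.OLEDB.12.0 and the "Excel 12.0 Xml" format token are assumptions that are typical for .xlsx files; the helper name and file path are hypothetical.

```python
def excel_connection_string(path: str, header: bool = False, imex: int = 1) -> str:
    """Build an ACE OLE DB connection string for an .xlsx workbook (assumed provider)."""
    hdr = "YES" if header else "NO"
    # HDR=NO treats the first row as data; IMEX=1 requests import mode
    extended = f"Excel 12.0 Xml;HDR={hdr};IMEX={imex}"
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={path};"
        f'Extended Properties="{extended}";'
    )


# Hypothetical workbook path
print(excel_connection_string(r"C:\data\import.xlsx"))
```

The resulting string can be pasted into the ConnectionString property of the Excel connection manager. With header=True it produces the HDR=YES variant used with the dummy-row workaround.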
Helpful links
SSIS Excel Data Import - Mixed data type in Rows
Mixed data types in Excel column
Importing data from Excel having Mixed Data Types in a column (SSIS)
Why SSIS always gets Excel data types wrong, and how to fix it!
EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
IMEX= 1 extended properties in ssis

Make Kingswaysoft truncate input data that is too long

I have an SSIS project that I'm using to automate pulling CRM data into a SQL Server Database using Kingswaysoft. These SSIS packages are autogenerated, so my solution to this issue needs to be compatible with that.
The description field on Contact in CRM is an nvarchar(2000), but this CRM org still has old data, and some of those old contact records have a description longer than 2000 characters. When I try to pull those using Kingswaysoft, I get this error:
Error: 0xC002F304 at Stage Data for contact, Export contact Data [2]: An error occurred with the following error message: "The input value for 'description' field (or one of its related fields) does not fit into the output buffer, please consider increasing the output column's Length property or changing its data type to one that can accommodate more data such as ntext (DT_NTEXT). This change can be done using the component's Advanced Editor window.".
This makes sense, since I'm pulling a column longer than specified in the metadata, but the problem is that I want to ignore this error, truncate the column, and continue the data load. Obviously I could set the column to DT_NTEXT and not worry about it, but since these packages are autogenerated I have no way of knowing beforehand which columns have old data and which don't, so I won't know which should be DT_NTEXT.
So is there a way to make Kingswaysoft truncate input data which is longer than what's specified in the metadata?
Thank you for choosing KingswaySoft as your integration solution. For this situation, unfortunately there is no way to make that work without making those changes in the component’s Advanced Editor.
If the source component just simply ignores the error and truncates the value, you will lose some of your data and thus affect the data integrity during the integration. Therefore, you may need to change the data type to DT_NTEXT or increase the length of this field in order to handle this situation properly. Alternatively, you can try to change the field length on your CRM side so that the SSIS package can be generated correctly.

Read/write a large text/comments cell (6k+ chars) in Excel from/to SQL Server

I am trying to alter an existing Excel 2010 workbook so that the data is hosted on MSSQL, since the data is getting too large and the workbook too slow. I'm using ADO. My issue is that I don't know how best to handle the cells that contain a large amount of comments, which also include carriage returns/line feeds and possibly other special characters. The largest cell contains approximately 6000 characters so far. I don't really expect the text within a given cell to get much bigger than that.
Questions:
What data type should I use within SQL Server to store this data? I'm concerned about special characters like carriage returns.
What is the best method to transfer data back and forth from Excel and MSSQL? I could probably use a hidden ListObject to read the data, but I'm more concerned about writing any edits back. The cell length is too long for a SQL string and I don't know how to handle the carriage returns. I keep getting an Application Object error. I don't have any problem with most cells and their data, just these large text cells that represent comment descriptions.
I don't know how to handle the initial large data dump into MSSQL. The SQL Server Import Wizard keeps failing stating there are characters not within the assigned code page. There is no indication of the row it failed on or what characters are causing the issue. Is that down to the data type I've chosen? It is currently Varchar hence my first question. Should I just use Text or NText? Won't they make the database massive? SSIS uses the import wizard so that will still fail. Anything that requires a SQL statement such as ADO or OPENROWSET won't like the length of the data unless I'm missing something.
Any suggestions/help would be much appreciated.

SSIS Adds Date to time field upon exporting to excel

I am trying to run an SSIS package to take some logging data and export it into Excel for later use with a BI tool. The data has three time fields: a start time, a finish time, and a run time. They appear to be correct coming out of my script component, and they look perfect in the data viewer.
However, when I open the file in Excel, the Run Time column has a date added to the time.
I am not sure what is causing this or how to fix it. The only thing I was able to notice was a property in the source's Advanced Editor that sets the column data type to date.
But every time I try to change it to DT_DBTIME (the same type as coming out of the script), it just switches back to the date data type.
Is there a way to prevent the adding of this date? It makes the use of the BI tool impossible. Any help would be greatly appreciated.
That seems like odd behavior to me, but try adding a Data Conversion Transformation to your package. This should force whatever type of data you want, either string or time.
Have you tried
DT_WSTR(1252)
to cast the time using Data Conversion Transformation?
I found the issue. It had something to do with the Excel connection manager automatically recognizing that field as a date-time field and therefore exporting it in that format. This change was happening in the connection between the final component and the destination, so casting did not work because the change happened after the cast.
I simply changed the xls file to a csv and used the Flat File connection manager, and that did the trick!

Mass convert all non-unicode fields to unicode in SSIS

I have quite a few tables and I'm using SSIS to bring the data from Oracle to SQL Server. In the process, I'd like to convert all varchar fields to nvarchar. I know I can use the Data Conversion transformation, but it seems the only way to do this is to set each field one by one, and then I'd have to manually set the mapping in the destination component to map to the "Copy of" field. I've got thousands of fields and it would be tedious to set it on each one... is there a way to say "if field is DT_STR convert to DT_WSTR"?
What you can do, instead of replacing varchar with nvarchar manually, is copy and save all the CREATE TABLE scripts generated by SSIS to a document before running them. Then you can do a global replace of varchar with nvarchar in the document.
Then use the amended script as a step in your SSIS package to create the tables before populating them with the data from Oracle.
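The global replace could be scripted rather than done in an editor. The following is a minimal Python sketch, assuming the saved scripts are plain T-SQL text; the function name and the sample DDL are hypothetical.

```python
import re

# Match the whole word "varchar" only; "nvarchar" has no word boundary before the "v"
VARCHAR = re.compile(r"\bvarchar\b", re.IGNORECASE)


def to_unicode_ddl(ddl: str) -> str:
    """Replace varchar column types with nvarchar in a CREATE TABLE script."""
    return VARCHAR.sub("nvarchar", ddl)


# Hypothetical generated script
ddl = """CREATE TABLE [Customers] (
    [Name] varchar(100),
    [Notes] nvarchar(400),
    [Code] VARCHAR(10)
)"""

print(to_unicode_ddl(ddl))
```

This is a plain text substitution, so it would also change the word varchar inside comments or string literals. Columns declared longer than varchar(4000) would need review afterwards, because nvarchar(n) allows at most 4000 characters.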
The proper way is to use the data conversion step...
That said, it appears that if you disable external metadata validation in SSIS, you can bypass this error. SQL Server will then use an implicit conversion to the destination type.
See this SO post for a quick explanation.
