Special character issue in slowly changing dimension? - sql-server

I am using Slowly Changing Dimension task from SSIS 2008 for delta load. Flat file is the input to slowly changing dimension task. I have observed that '--' character from file is converted into ' â€' after delta load.
Input is the flat file and destination is the database table. Flat file contains few strings having '--' character but somehow after inserting this data to table this character is getting converted to 'â€'.
What can be the issue?
Kindly help me to resolve this issue.
Regards,
Sameer K.

In essence you need to scrub these characters from the data. This can be done in several places, but it's a well accepted design pattern to populate from the source file to a staging table where you can scrub the offending characters before bringing it into your slowly changing dimension. It's also possible to scrub the file prior to import, but it's typically easier to work with the data once it's in a database rather than in a flat file. You could also include a derived column task within SSIS to extract these characters one in the SSIS Pipeline, but you would need to manage this column by column which can become difficult to maintain.

Related

Can SSIS support loading of files with varying column lengths in each row?

Currently I receive a daily file of around 750k rows and each row has a 3 character identifier at the start.
For each identifier, the number of columns can change but are specific to the identifier (e.g. SRH will always have 6 columns, AAA will always have 10 and so on).
I would like to be able to automate this file into an SQL table through SSIS.
This solution is currently built in MSACCESS using VBA just looping through recordsets using a CASE statement, it then writes a record to the relevant table.
I have been reading up on BULK INSERT, BCP (w/Format File) and Conditional Split in SSIS however I always seem to get stuck at the first hurdle of even loading the file in as SSIS errors due to variable column layouts.
The data file is pipe delimited and looks similar to the below.
AAA|20180910|POOL|OPER|X|C
SRH|TRANS|TAB|BARKING|FORM|C|1.026
BHP|1
*BPI|10|16|18|Z
BHP|2
*BPI|18|21|24|A
(* I have added the * to show that these are child records of the parent record, in this case BHP can have multiple BPI records underneath it)
I would like to be able to load the TXT file into a staging table, and then I can write the TSQL to loop through the records and parse them to their relevant tables (AAA - tblAAA, SRH - tblSRH...)
I think you should read each row as one column of type DT_WSTR and length = 4000 then you need to implement the same logic written using vba within a Script component (VB.NET / C#), there are similar posts that can give you some insights:
SSIS ragged file not recognized CRLF
SSIS reading LF as terminator when its set as CRLF
How to load mixed record type fixed width file? And also file contain two header
SSIS Flat File - CSV formatting not working for multi-line fileds
how to skip a bad row in ssis flat file source

SSIS redirect empty rows as flat file source read errors

I'm struggling to find a built-in way to redirect empty rows as flat file source read errors in SSIS (without resorting to a custom script task).
as an example, you could have a source file with an empty row in the middle of it:
DATE,CURRENCY_NAME
2017-13-04,"US Dollar"
2017-11-04,"Pound Sterling"
2017-11-04,"Aus Dollar"
and your column types defined as:
DATE: database time [DT_DBTIME]
CURRENCY_NAME: string [DT_STR]
with all that, package still runs and takes the empty row all the way to destination where it, naturally fails. I was to be able to catch it early and identify as a source read failure. Is it possible w/o a script task? A simple derived column perhaps but I would prefer if this could be configured at the Connection Manager / Flat File Source level.
The only way to not rely on a script task is to define your source flat file with only one varchar(max) column, chose a delimiter that is never used within and write all the content into a SQL Server staging table. You can then clean those empty lines and parse the rest to a relational output using SQL.
This approach is not very clean and a takes lot more effort than using a script task to dump empty lines or ones not matching a pattern. It isn't that hard to create a transformation with the script component
This being said, my advise is to document a clear interface description and distribute it to all clients using your interface. Handle all files that throw an error while reading the flat file and send a mail with the file to the responsible client with information that it doesn't follow the interface rules and needs to be fixed.
Just imagine the flat file is manually generated, even worse using something like excel, you will struggle with wrong file encoding, missing columns, non ascii characters, wrong date format etc.
You will be working on handling all exceptions caused by quality issues.
Just add a Conditional Split component, and use the following expression to split rows
[DATE] == ""
And connect the default output connector to the destination
References
Conditional Split Transformation

How can I make SSIS detect the record length when importing a flat file?

I wrote an SSIS package which imports data from a fixed record length flat file into a SQL table. Within a single file, the record length is constant, but different files may have different record lengths. Each record ends with a CR/LF. How can I make it detect where the end of the record is, and use that length when importing it?
You can use a script task. Pass a ReadWriteVariable into the script task. Let's call the ReadWriteVariable intLineLength. In the script task code, detect the location of the CR/LF and write it to intLineLength. Use the intLineLength ReadWriteVariable in following package steps to import the data.
Here is an article with some good examples: script-task-to-dynamically-build-package-variables
Hope this helps.
This may not work for everyone, but what I ended up doing was simply setting it to a delimited flat file and setting CR/LF as the row delimiter, and leaving the column delimiter and text qualifier blank. This won't work if you actually need to have it split out columns in the import, but I was already using a Derived Column task to do the actual column splitting, because my column positions are variable, so it worked fine.

What is a better alternative to Excel for loading data to a SQL Server database?

I have a huge amount of trouble loading spreadsheets into a SQL Server database.
Currently, I'm using an SSIS package to load the data and I have had to make lots of adjustments to get the data to load:
All numbers must be formatted as text (otherwise they don't load properly).
Sometimes numbers must be preceded with single quote (') to get them to load.
If a column has a mix of number cells and text cells, the text cells must come first in the file (otherwise only numbers load and text comes in as NULL).
If a user changes a column name the file will not load.
If a user changes a tab name the file won't load.
If a user adds a new column (even at the end of a sheet) the file won't load.
Extra sheets in the file is not a problem, thankfully!
Dates seem sensitive whether or not they will load properly.
Connection strings to the Excel file must include "IMEX=1" or things are worse.
Scheduled SSIS jobs must be run as 32-bit even on 64-bit system.
I've been loading the data (usually 200,000-500,000 rows per file) into a table with all fields defined as nvarchar. Then, when loaded I transfer that data in the next step of the SSIS package to the working table with typed data fields.
All of the requirements that I must put on the user for how to format the Excel file is really a pain. We usually have to send the file back multiple times until all the formatting issues are correct before the file will load. I'd like to eliminate this thrash.
I know I'm not the only one that is facing this type of problem. So, I must ask...
What is a better alternative to Excel for loading data into a SQL Server database?
Or, am I going about this the wrong way? Should I be using something other than SSIS to load Excel spreadsheets?
You can try OpenRowSet:
SELECT *
INTO SomeTable
From OpenRowSet('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=\\servername\c$\filename.xls;HDR=YES;IMEX=1', [Sheet2$])
Not really a SQL answer, but an easy one:
You could require the users to copy and paste data to an excel spreadsheet where everything but the data fields to be included are locked. This will prevent many of the pain points described.

How to import variable record length CSV file using SSIS?

Has anyone been able to get a variable record length text file (CSV) into SQL Server via SSIS?
I have tried time and again to get a CSV file into a SQL Server table, using SSIS, where the input file has varying record lengths. For this question, the two different record lengths are 63 and 326 bytes. All record lengths will be imported into the same 326 byte width table.
There are over 1 million records to import.
I have no control of the creation of the import file.
I must use SSIS.
I have confirmed with MS that this has been reported as a bug.
I have tried several workarounds. Most have been where I try to write custom code to intercept the record and I cant seem to get that to work as I want.
I had a similar problem, and used custom code (Script Task), and a Script Component under the Data Flow tab.
I have a Flat File Source feeding into a Script Component. Inside there I use code to manipulate the incomming data and fix it up for the destination.
My issue was the provider was using '000000' as no date available, and another coloumn had a padding/trim issue.
You should have no problem importing this file. Just make sure when you create the Flat File connection manager, select Delimited format, then set SSIS column length to maximum file column length so it can accomodate any data.
It appears like you are using Fixed width format, which is not correct for CSV files (since you have variable length column), or maybe you've incorrectly set the column delimiter.
Same issue. In my case, the target CSV file has header & footer records with formats completely different than the body of the file; the header/footer are used to validate completeness of file processing (date/times, record counts, amount totals - "checksum" by any other name ...). This is a common format for files from "mainframe" environments, and though I haven't started on it yet, I expect to have to use scripting to strip off the header/footer, save the rest as a new file, process the new file, and then do the validation. Can't exactly expect MS to have that out-of-the box (but it sure would be nice, wouldn't it?).
You can write a script task using C# to iterate through each line and pad it with the proper amount of commas to pad the data out. This assumes, of course, that all of the data aligns with the proper columns.
I.e. as you read each record, you can "count" the number of commas. Then, just append X number of commas to the end of the record until it has the correct number of commas.
Excel has an issue that causes this kind of file to be created when converting to CSV.
If you can do this "by hand" the best way to solve this is to open the file in Excel, create a column at the "end" of the record, and fill it all the way down with 1s or some other character.
Nasty, but can be a quick solution.
If you don't have the ability to do this, you can do the same thing programmatically as described above.
Why can't you just import it as a test file and set the column delimeter to "," and the row delimeter to CRLF?

Resources