SAP Data Services .csv data file load from Excel with special characters - sql-server

I am trying to load data from an Excel .csv file through a flat file format, used as a data source in a Data Services job data flow, which then transfers the data to a SQL Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable Schema to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, text delimiter " (double quote), language eng (English), and all else as defaults.
I have also set "write errors to file" to "yes", but it just creates an empty error file (I expected the 6,000-odd unloaded rows to be in there).
If we strip out the three columns containing special characters (visible in Excel) it loads a treat, so I think these characters are the problem.
The thing is, we need the data in those columns, and unfortunately this .csv file is as good a data source as we are likely to get. It is always likely to contain special characters in these three columns, so we need to be able to read it in if possible.
Should I try to strip the special characters out in the Query component of the data flow? Am I missing a data-cleansing trick in the query or the file format definition?

OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within text string" parameter to "Row delimiter".
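For anyone hitting the same silent row loss, one way to see exactly which records a given delimiter/qualifier combination breaks is to parse the file outside Data Services first. A minimal Python sketch (the file name, encoding, and dialect here are assumptions; match them to your file format definition):

import csv

# Count the fields in each record; records whose field count differs from
# the header are the ones the delimiter/text-qualifier settings mis-parse.
with open("source.csv", newline="", encoding="utf-8", errors="replace") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    header = next(reader)
    expected = len(header)
    for recno, row in enumerate(reader, start=2):
        if len(row) != expected:
            print(f"record {recno}: {len(row)} fields, expected {expected}")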

Related

SSIS Ignore Blank Lines

I get the following SSIS error message when my source file has blank lines at the end of the file. I don't care about the blank lines, as they don't affect the overall goal of pumping data from a text file to a database table. I'd like to ignore this message or, if it's easier, configure SSIS to ignore the blanks.
<DTS:Column DTS:ID="96" DTS:IdentificationString="Flat File Source.Outputs[Flat File Source Error Output].Columns[Flat File Source Error Output Column]"/>
I found a similar question below, but the solution isn't an SSIS one; it's one that preprocesses the text files, which would be my least favorite solution.
SSIS Import Multiple Files Ignore blank lines
If you want to exclude records with blank values, you can use a Conditional Split. Add it between your source file and your destination.
The expression can be like below :
ISNULL(Col1) && ISNULL(Col2) && ISNULL(Col3) ...
Name the output Remove Blank Lines. When you connect your Conditional Split to your destination, SSIS will ask which output of the split component should be returned. In this case, choose the Conditional Split Default Output to get all the records without blank values.
You can enable a Data Viewer before and after the Conditional Split to see the filtered output.
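To verify which rows the split would discard before wiring it up, a quick Python check that approximates the same all-columns-blank condition (the file name and comma delimiter are assumptions; SSIS's ISNULL and an empty string are not identical, but for a flat file this is usually close enough):

import csv

# Flag rows whose columns are all empty -- the rows the Conditional Split
# expression ISNULL(Col1) && ISNULL(Col2) && ... is meant to catch.
with open("source.txt", newline="") as f:
    for lineno, row in enumerate(csv.reader(f, delimiter=","), start=1):
        if all(value.strip() == "" for value in row):
            print(f"line {lineno} is blank and would be filtered out")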

Changing .csv delimiter on ADF

I am trying to load a .csv table into MS SQL Server via Azure Data Factory, but I have a problem with the delimiter (;), since it also appears as a character in some of the values in some columns.
As a result, I get an error whose details say "found more columns than expected column count".
Is there any way to change the delimiter directly in ADF before/while loading the .csv table (e.g. from ";" to "|||")?
Thanks in advance!
I have a problem with the delimiter (;) since it appears as a character in some of the values included in some columns.
As you have quoted, your delimiter is ; but it also occurs as a character inside some column values, which means there is no specific pattern to the occurrences. Hence, this is not possible in ADF itself.
The recommendation is to write a program in any preferred language (like Python) that iterates over each row of the dataset, with logic to replace the delimiter with ||| (or to remove the unwanted ; characters), and appends the changes to a new file. You can then ingest this new file in ADF.
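A minimal sketch of that preprocessing idea, assuming the file is supposed to have a fixed number of columns (the file names and the column count of 5 are hypothetical). Rows in which every ; is a real delimiter are rewritten with |||; anything ambiguous is set aside for manual review:

EXPECTED_COLUMNS = 5  # hypothetical; set this to your real column count

with open("input.csv", encoding="utf-8") as src, \
     open("output.csv", "w", encoding="utf-8") as dst, \
     open("review.csv", "w", encoding="utf-8") as review:
    for line in src:
        # If the row has exactly the expected number of delimiters,
        # every ';' is a separator and can be replaced safely.
        if line.count(";") == EXPECTED_COLUMNS - 1:
            dst.write(line.replace(";", "|||"))
        else:
            # Any extra ';' is data, not a delimiter; set the row aside.
            review.write(line)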

Can't import characters due to incorrect code page

I have an SSIS job to import data from a flat file into an SQL Server table. I'm having an issue regarding the encoding of the source file and destination table.
The file is a UTF-8 encoded CSV file with some standard accented Latin characters (ãóé, etc.). My destination table is defined with the Latin1_General_CI_AS collation, which means I can manually insert the following text with no problem: "JOÃO ANTÓNIO".
When I declare the Flat File source, it automatically detects the file as having the 65001 code page (UTF-8) and infers the string [DT_STR] data type for each column. However, the SSIS package automatically assumes the destination table has the 1252 code page, giving me the following error:
Validation error. <STEPNAME>: <STEPNAME>: The code page 65001 specified on output column "<MYCOLUMN>" (180) is not valid. Select a different code page for output column "<MYCOLUMN>".
I understand why, since the database collation implies that code page. However, if I try to set the Flat File data source to the Latin1 1252 encoding, the SSIS package executes but imports the characters incorrectly:
JOÃO ANTÓNIO (flat file) -> JOÃƒO ANTÃ“NIO (database).
I have already tried configuring the flat file source as Unicode-compliant, but then, after I configure each column with a Unicode-compliant data type, I can't update the destination step, since SSIS infers the data types directly from the database and doesn't allow me to change them.
Is there a way to keep the flat file source as CP 1252 but still import the correct characters? What am I missing here?
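For reference, the mangling shown above is the standard result of decoding UTF-8 bytes as code page 1252, which is easy to reproduce in Python:

# "JOÃO ANTÓNIO" encoded as UTF-8 and then mis-decoded as Windows-1252
# yields the same two-character artifacts seen in the database.
print("JOÃO ANTÓNIO".encode("utf-8").decode("cp1252"))
# -> JOÃƒO ANTÃ“NIO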
Thanks to Larnu's comment, I've been able to get around this problem.
Since SSIS doesn't allow implicit data conversion, I needed to set up a conversion step first (a Derived Column transformation). Since the source columns were already set up as DT_STR with code page 65001, I had to configure new derived columns from an expression, converting from the source code page into the destination code page, with the following expression:
(DT_STR, 50, 1252)<SourceColumn>
This makes a direct cast to DT_STR, stating that the column will have a maximum size of 50 characters and that the data will be represented with the 1252 code page.
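If the re-encoding ever needs to happen outside SSIS instead, the file itself can be transcoded before the load. A rough sketch (the paths are placeholders; characters with no 1252 equivalent are replaced rather than failing the run):

# Re-encode a UTF-8 CSV as Windows-1252 ahead of the SSIS load.
with open("source_utf8.csv", encoding="utf-8") as src, \
     open("source_1252.csv", "w", encoding="cp1252", errors="replace") as dst:
    for line in src:
        dst.write(line)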

Uploading excel file to sql server [duplicate]

Every time I try to import an Excel file into SQL Server I get a particular error. When I try to edit the mappings, the default type for all numerical fields is float. None of the fields in my table have decimals in them, and they aren't a money data type; they're only 8-digit numbers. Since I don't want my primary key stored as a float when it's an int, how can I fix this? It gives me a truncation error of some sort; I'll post a screen cap if needed. Is this a common problem?
It should be noted that I cannot import Excel 2007 files (I think I've found the remedy for this), but even when I try to import .xls files, every value that contains numerals is automatically imported as a float, and when I try to change it I get an error.
http://imgur.com/4204g
SSIS doesn't implicitly convert data types, so you need to do it explicitly. The Excel connection manager can only handle a few data types and it tries to make a best guess based on the first few rows of the file. This is fully documented in the SSIS documentation.
You have several options:
Change your destination data type to float
Load to a 'staging' table with data type float using the Import Wizard and then INSERT into the real destination table using CAST or CONVERT to convert the data (a sketch of this follows below)
Create an SSIS package and use the Data Conversion transformation to convert the data
You might also want to note the comments in the Import Wizard documentation about data type mappings.
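For the staging-table option, the second pass might look like the following once the wizard load has finished. This is only a sketch: the table and column names are hypothetical, and pyodbc is used merely as one convenient way to run the statement:

import pyodbc

# Hypothetical names: dbo.StagingTable holds the float column the wizard
# loaded; dbo.FinalTable is the real destination with an int key.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
conn.execute(
    "INSERT INTO dbo.FinalTable (Id) "
    "SELECT CAST(FloatId AS int) FROM dbo.StagingTable"
)
conn.commit()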
Going off of what Derloopkat said, which can still fail on conversion (no offense, Derloopkat) because Excel is terrible at this:
Paste from Excel into Notepad and save as a normal .txt file.
From within Excel, open said .txt file.
Select Next, as it is obviously tab delimited.
Select "none" for the text qualifier, then Next again.
In the data preview, select the first column, hold Shift, select the last column, and select the Text radio button. Click Finish.
It will open; check it to make sure it's accurate, and then save it as an Excel file.
There is a workaround.
Import the Excel sheet with numbers as float (the default).
After importing, go to the table's Design view.
Change the data type of the column from Float to Int or Bigint.
Save the changes.
Change the data type of the column from Bigint to any text type (Varchar, Nvarchar, Text, Ntext, etc.).
Save the changes.
That's it.
When Excel finds mixed data types in the same column, it guesses the right format for the column (the majority of the values determines the column's type) and dismisses all other values by inserting NULLs. But Excel does this badly: e.g., if a column is considered text and Excel finds a number, it decides the number is a mistake and inserts a NULL instead; or if some cells containing numbers are formatted as "text", you may get NULL values in an integer column of the database.
Solution:
Create a new Excel sheet with the names of the columns in the first row
Format the columns as text
Paste the rows without formatting (use CSV format or copy/paste via Notepad to get plain text only)
Note that formatting the columns of an existing Excel sheet is not enough.
There seems to be a really easy solution when dealing with data type issues.
Basically, at the end of the Excel connection string, add IMEX=1 to the Extended Properties:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\YOURSERVER\shared\Client Projects\FOLDER\Data\FILE.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
This will resolve data type issues such as columns where values are mixed with text and numbers.
To get to the connection property, right-click the Excel connection manager below the control flow and hit Properties; it'll be on the right, under Solution Explorer. Hope that helps.
To avoid float type fields in a simple way:
Open your Excel sheet.
Insert a blank row after the header row and type some text in all of its cells.
Right-click the header of each column that causes a float issue, select Format Cells, choose the Text category, and press OK.
Then export the Excel sheet to your SQL Server.
This simple way worked for me.
A workaround to consider in a pinch:
Save a copy of the Excel file and change the column's format type to 'text'.
Copy the column values and paste them into a text editor, then save the file (call it tmp.txt).
Modify the data in the text file to start and end with a character, so that the SQL Server import mechanism will recognize it as text. If you have a fancy editor, use its included tools; I use awk in Cygwin on my Windows laptop. For example, I start and end the column value with a single quote, like: awk '{print "\x27"$1"\x27"}' ./tmp.txt > ./tmp2.txt
Copy and paste the data from tmp2.txt over the top of the necessary column in the Excel file, and save the Excel file.
Run the SQL Server import for your modified Excel file. Be sure to double-check that the data type chosen by the importer is not numeric; if it is, repeat the above steps with a different set of characters.
The data in the database will have the quotes once the import is done. You can update the data later to remove the quotes, or use the "replace" function in your read query, such as replace([dbo].[MyTable].[MyColumn], '''', '').
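The same quoting step can be done without awk; a Python equivalent of the tmp.txt -> tmp2.txt transformation above:

# Wrap each value in single quotes so the importer treats the column as text.
with open("tmp.txt") as src, open("tmp2.txt", "w") as dst:
    for line in src:
        dst.write("'" + line.rstrip("\n") + "'\n")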

Why am I getting 0 rows processed in SSIS?

I am creating an SSIS package using the MS Visual Studio 2012 Shell with .NET Framework 4.6.01055. The SSIS package has a Data Flow task with Flat File Source, Data Source Row Count, Final Data Set Count, and OLE DB Destination components. It connects to a SQL database and I have checked that the connection tests successfully.
I have a flat file connection manager which picks up a text file. The Preview section of the flat file connection manager editor only shows the header columns. The error message is only at warning level: [Flat File Source [10]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct. The file itself has a total of 19 rows, the first being the header row.
I have spaces in the header names of the source file, so I edited the file to remove the spaces from the final column; that did not cure the issue. The last column is a date, but I am designating it with an OutputColumnWidth of 50 and a data type of string [DT_STR]. I have the row delimiter as {CR}{LF} and the column delimiter as {|}. The file name does not change when the package runs.
In the General section of the editor: locale = English; Unicode is not checked; code page = 1252 (ANSI-Latin1); format = Delimited; text qualifier = none; header row delimiter = {CR}{LF} (I have tried just CR or LF as well); header rows to skip = 0 (I have tried 1 as well, since there is only one header row); and I have checked "Column names in the first data row".
Why am I not getting data in my preview section? And why is it thinking I only have a header?
It seems to me that your text file does not have a matching EOL marker, and so SSIS never splits the lines (and treats the file as one big header).
Try opening the file in a text editor that lets you see the EOL marker. I know that Notepad++ can do this for you.
Notepad++ will also let you change the file's encoding, in case that is also a problem.
NB: The problem could also be that you are not specifying the correct column delimiter. If the delimiter you specify in SSIS doesn't match the characters in the file, SSIS will also think that you have a single header row where everything is in the first column.
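If no such editor is handy, a couple of lines of Python will also reveal the raw line endings (the file name is a placeholder):

# Print the raw bytes of the first few lines; b'\r\n' vs b'\n' is the
# difference between a {CR}{LF} and an {LF} row delimiter.
with open("source.txt", "rb") as f:
    for _ in range(3):
        print(repr(f.readline()))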
Just to add to the other answer:
I had the same problem; when I opened the file in Notepad, it became clear that there was a trailing empty line at the bottom.
So: make sure the last line of the file actually contains text.
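If the trailing blank lines keep reappearing, they can also be trimmed automatically before the load; a minimal sketch (the file name is a placeholder):

# Drop empty lines at the end of the file and rewrite it in place.
with open("source.txt") as f:
    lines = f.readlines()
while lines and lines[-1].strip() == "":
    lines.pop()
with open("source.txt", "w") as f:
    f.writelines(lines)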
