CSV - SSIS - SQL Server, Character Encoding Problems - sql-server

I have a large set of CSV files which I am transferring to Microsoft SQL Server 2014 via Management Studio. I am using an SSIS package in Microsoft Visual Studio 2012 to achieve this. I currently have around 2 million lines of data, so I need SSIS.
The problem I have is that while the data in my CSV already has encoding issues, I am making them far worse in transit.
For now what I need to do is just maintain the characters, so that whatever I see in my CSV appears in my SQL Server table. I am particularly interested in 'Benoît', which is fine in my CSV but not in my SQL table, where it becomes 'BenoŒt'. Please see the list at the bottom of my post.
I am also wondering whether, once I have imported the data in its current state, I can reliably use find and replace to deal with the existing encoding issues from my CSVs.
Character encoding is a confusing subject and I am not at all sure whether a couple of paragraphs can put me on the right track. Please help with a steer (if it's possible!). I am sure I need to look at the settings across both SQL Server and Visual Studio, but am not sure what needs tweaking or where.
"Benoît" in my CSV becomes "BenoŒt" in my SQL table
"Angélique" in my CSV becomes "AngÇŸ¶¸lique" in my SQL table
"Michèle" in my CSV becomes "MichÇŸ¶ùle" in my SQL table
"josée" in my CSV becomes "josÇŸ¶¸e" in my SQL table
"Amélie" in my CSV becomes "AmÇŸ¶¸lie" in my SQL table

First of all, make sure that your CSV files are in a Unicode encoding (open a CSV file in Notepad -> Save As -> check the Encoding dropdown at the bottom). If they are not, save them as Unicode.
Also make sure that in the Flat File Source properties inside the SSIS package, the Unicode check box is selected.
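If you'd rather script the re-save than click through Notepad for every file, a Python sketch of the same conversion (file names are illustrative; "Unicode" in Notepad's Save As dialog means UTF-16 LE with a byte-order mark):

```python
# Stand-in for the real UTF-8 source file.
sample = 'id,name\n1,Benoît\n2,Angélique\n'
with open('input.csv', 'w', encoding='utf-8') as f:
    f.write(sample)

# Re-encode: read as UTF-8, write as UTF-16 with BOM, which is what
# Notepad's "Save As -> Unicode" produces.
with open('input.csv', encoding='utf-8') as src, \
     open('input_utf16.csv', 'w', encoding='utf-16') as dst:
    dst.write(src.read())
```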

Your CSV file needs to be in UCS-2 Little Endian encoding.
Please open it in Notepad++ to check.

Related

SSIS Oracle Source not pulling through Unicode characters correctly

Problem
I'm creating an SSIS package to move data from Oracle to SQL Server but am having trouble with certain Unicode characters.
The source Oracle database column is NVARCHAR2, and an example problem character is U+0102, though the issue also applies to other characters. These will be migrated to an NVARCHAR column in SQL Server, but the problem seems to be at the point of extraction: when I preview the source in SSIS, the characters in question just show as inverted question marks, e.g.
DEMO¿DEMO
Setup
I'm using the Attunity Oracle Source task/connection, as I couldn't get the OLE DB connection working.
The Oracle database has NLS_NCHAR_CHARACTERSET AL16UTF16.
Things I've tried
Changing the DefaultCodePage value in the Advanced settings of the Source task to 1252, 65001, 1200, 1201.
Converting the source column in the SQL command text in various ways, e.g. Convert(SOURCE_COLUMN,'AL32UTF8').
Using UTL_RAW.Convert_To_Raw in the SQL command text. This generates the correct binary values (as DT_BYTES in SSIS), but I couldn't then transform it back into a DT_WSTR using either Data Conversion or Derived Column.
I've tested extracting the same characters from a SQL Server database and they are appearing correctly in the SSIS preview window just to rule out an SSIS issue.
I'm using the SQL Command access mode as per below:
SELECT SOURCE_COLUMN
FROM SOURCE_TABLE;
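For what it's worth, an inverted question mark is Oracle's replacement character: it appears when a character-set conversion is forced through a character set that can't represent the character, which would mean the corruption happens before SSIS ever sees the data. A Python sketch of the same failure mode, using Windows-1252 as an assumed lossy target (Python substitutes '?' where Oracle substitutes '¿'):

```python
# U+0102 (Ă) has no representation in Windows-1252; a lossy
# conversion replaces it rather than preserving it.
s = 'DEMO\u0102DEMO'
lossy = s.encode('cp1252', errors='replace').decode('cp1252')
```

If that's what is happening, the fix is in the Oracle-side NLS settings or the driver's character-set negotiation, not in the SSIS code page.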
Any help greatly appreciated.

unwanted characters from Oracle database and text files

I have come across an SSIS loading issue where one particular column sometimes contains an unwanted character I can't identify, and the load of data into SQL Server fails. The data comes from an Oracle database: it is extracted in a plain text format on a Solaris platform, then brought across to a Windows platform for loading into the SQL Server database.
Please find attached an image showing the issue highlighted in yellow:
Is there any possibility of escaping this character when extracting from the Oracle database, or escaping it during loading into the SQL Server database via the SSIS package?

SQL Server Management Studio Issues

I accidentally deleted years of data from a table in SQL Server Management Studio. Unfortunately the person in this position before me didn't back anything up, and neither did I before I tried to fix an issue. I understand that it cannot be retrieved from SQL Server, but I have all the data I need in a separate file on my desktop. Is there any way to get that data and input it back into the table in SQL Server? Or is there a query I can run to input the data into the table again? I'm not sure if I am making any sense :/
You can also use Management Studio without SSIS. Right-click the database in SSMS and select Tasks -> Import Data. You should then be able to select the type of source (flat file) and the format. The rest of the wizard is pretty self-explanatory.
If it is a flat file like .txt or .csv, or even an Excel file (.xls), you can build an SSIS package and dump the data into a new table. It depends on what kind of data you have in hand.

Is it possible to compress data in SQL Server using zlib?

Is it possible to compress data (programmatically) in SQL Server using zlib compression? It's just one column in a specific table that needs compressing.
I'm not a SQL Server person myself. I've tried to find out whether there's anything available, but haven't had much joy.
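SQL Server 2016 and later have built-in COMPRESS()/DECOMPRESS() T-SQL functions that emit GZIP-format data (GZIP wraps the same DEFLATE stream that zlib produces); on earlier versions, one option is to compress the column value client-side before the INSERT and store it in a VARBINARY(MAX) column. A Python sketch of the client-side approach, purely as an illustration of the round trip:

```python
import zlib

# Compress the column value client-side before inserting it into a
# VARBINARY(MAX) column; decompress after selecting it back.
original = 'some long column value ' * 100
blob = zlib.compress(original.encode('utf-8'), level=9)
restored = zlib.decompress(blob).decode('utf-8')
```

The trade-off is that the column is opaque to T-SQL: you can't filter or index on the compressed value without decompressing in the client.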
Thanks

How do you import UTF-8 flat files into SQL Server 2008 R2?

I have a bunch of UTF-8 encoded flat files that need to be imported into a SQL Server 2008 R2 database. Bulk inserts are not able to identify the delimiters, nor do they seem to accept UTF-8.
I understand that there are a number of articles on how SQL Server 2008 deals with UTF-8 encoding, but I'm looking for updated answers, as most of those articles are old.
Is there anything I can do to get these flat files into the database, either by converting them before an insert or via a process that runs during the insert?
I want to stay away from manually converting each one. Furthermore, the SSIS packages that I've attempted to create can read and separate the data; they just can't seem to move it. :(
The flat files are generated by Java. Converting the Java environment from UTF-8 to any other encoding has been unsuccessful.
NOTE
I have no intention of storing UTF-8 data. My delimiter is coming in funky because it's UTF-8: SQL Server cannot read the characters when separating the columns and rows. That's it.
Not true; you simply need to choose code page 65001.
Convert your data file to UTF-16 Little Endian (it must be exactly Little Endian),
and use bcp with the -w option.
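The endianness caveat above is real: bcp's -w option expects UTF-16 little-endian, and a big-endian file read as LE produces garbage. A quick Python illustration of why the byte order matters:

```python
# UTF-16 LE puts the low byte of each code unit first: 'B' (U+0042)
# is stored as 42 00, while big-endian stores 00 42. A tool expecting
# LE will misread a BE file entirely.
le = 'Benoît'.encode('utf-16-le')
be = 'Benoît'.encode('utf-16-be')
```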
Just for reference, in case someone googles this and lands here like me.
I've tried the accepted answer a dozen times with no success. In my case, my data file was a .csv flat file which had a lot of accented characters, like ç é ã á.
I also noted that no matter what encoding I chose, the import was performed using the 1252 (ANSI - Latin 1) encoding.
So the solution was to convert my .csv file, before importing, from UTF-8 to that very same 1252 (ANSI - Latin 1) encoding. I did the conversion using Notepad++.
After converting it, I did the regular import (through the SSMS Tasks -> "Import Data" wizard), selecting the 1252 (ANSI - Latin 1) encoding, and everything was imported correctly.
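The Notepad++ step can also be scripted. A minimal Python sketch of the same conversion, noting that the "(ANSI - Latin 1)" code page is Windows-1252 (file names are illustrative; a character with no 1252 mapping raises UnicodeEncodeError rather than being silently mangled):

```python
# Stand-in for the real UTF-8 source file.
sample = 'nome\njosée\nAmélie\n'
with open('input.csv', 'w', encoding='utf-8') as f:
    f.write(sample)

# Re-encode: read as UTF-8, write as Windows-1252 for the wizard.
with open('input.csv', encoding='utf-8') as src, \
     open('input_1252.csv', 'w', encoding='cp1252') as dst:
    dst.write(src.read())
```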
Environment:
SQL Server Web 2016
SQL Server Management Studio v17.9.1
Notepad++ v7.7.1
Also, this answers the original OP's question too:
Is there anything I can to do in order to get these flat files into the database either by converting them before an insert or a process to run during the insert?
Because it didn't work at first, I want to add to Arthur's answer, as mentioned in the comments by live-love:
You should change the string data types to NVARCHAR.
You can do this by selecting Unicode string (DT_WSTR) on the Advanced tab for the specified columns.
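A rough analogy for why DT_WSTR/NVARCHAR matters here: VARCHAR with a Latin-1 collation behaves like a single-byte code page and cannot hold anything outside it, while NVARCHAR (UTF-16) round-trips all of the characters in question. An illustrative Python sketch of that difference:

```python
name = 'Angélique \u0102'  # mixes a cp1252 character with one outside it

# NVARCHAR-like storage (UTF-16) preserves everything...
assert name.encode('utf-16-le').decode('utf-16-le') == name

# ...while a single-byte code page cannot represent U+0102 at all.
try:
    name.encode('cp1252')
    survived = True
except UnicodeEncodeError:
    survived = False
```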