PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a file containing regional-language characters (UTF-8 encoded) in a bytea column, but I am not able to store the data as the user input it.
For example, this is what I get in the request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in the value attribute. Because of this I am not able to retrieve the text as the user entered it.
What do I need to do?
PS: My DB is UTF8 encoded.

The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that, change the settings of the DB driver or choose a different encoding/escaping for bytea.
Or simply use a proper field type for the XML data, such as varchar or xml.
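As a minimal sketch (assuming a hypothetical table submissions with a bytea column form_data, in a UTF8-encoded database), you can convert the stored bytes back to text on retrieval, or migrate the column to a text-oriented type once and for all:
-- decode the bytea back into text on the way out
SELECT convert_from(form_data, 'UTF8') FROM submissions;
-- or change the column type so no bytea escaping happens at all
ALTER TABLE submissions
    ALTER COLUMN form_data TYPE xml
    USING convert_from(form_data, 'UTF8')::xml;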

Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal escaping, so Postgres stores your data correctly. PostgreSQL escapes all non-printable bytes in bytea output; for more details see the PostgreSQL documentation, especially section 8.4.2 on binary data types.

Related

PDO DBLIB multibyte (chinese) character encoding - SQL server

On a Linux machine, I am using PDO DBLIB to connect to an MSSQL database and insert data into a table with the SQL_Latin1_General_CP1_CI_AS collation. The problem is that when I try to insert Chinese (multibyte) characters, they are stored garbled, as 哈市香åŠåŒºç æ±Ÿè·¯å·.
My (part of) code is as follows:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);
$query = "
INSERT INTO UserSignUpInfo
(FirstName)
VALUES
(:firstname)";
$STH = $DBH->prepare($query);
$STH->bindParam(':firstname', $firstname);
What I've tried so far:
Converting $firstname to UTF-16LE with mb_convert_encoding and CASTing as VARBINARY in the query, like:
$firstname = mb_convert_encoding($firstname, 'UTF-16LE', 'UTF-8');
VALUES
(CAST(:firstname AS VARBINARY));
This inserts the characters properly until the string also contains some non-multibyte characters, which break the PDO execute.
Setting my connection as utf8:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;charset=UTF-8;", $myUser, $myPass);
$DBH->exec('SET CHARACTER SET utf8');
$DBH->query("SET NAMES utf8");
Setting the client charset to UTF-8 in my freetds.conf, which had no impact.
Is there any way at all, to insert multibyte data in that SQL database? Is there any other workaround? I've thought of trying PDO ODBC or even mssql, but thought it's better to ask here before wasting any more time.
Thanks in advance.
EDIT:
I ended up using MSSQL and the N data type prefix. I will switch to PDO_ODBC and try it when I have more time. Thanks everyone for the answers!
Is there any way at all, to insert multibyte data in [this particular] SQL database? Is there any other workaround?
If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux (only for 64-bit Red Hat Enterprise Linux and 64-bit SUSE Linux Enterprise) which support Unicode.
If you can change to PDO_ODBC, then the N-prefix for inserting Unicode is going to work.
If you can change the affected table away from the SQL_Latin1_General_CP1_CI_AS collation (the SQL Server default) to a Unicode-capable column type, that would be ideal.
Your case is more restricted: the input string mixes multibyte and non-multibyte characters, it must be saved to a Latin-collation table, the N data type prefix isn't working, and you don't want to change away from PDO DBLIB (understandable, since Microsoft's Unicode-capable ODBC driver is barely supported on Linux). Here is one workaround.
Conditionally encode the input string as Base64. After all, that is how pictures are safely transported inline in emails.
Working Example:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);
$query = "
INSERT INTO [StackOverflow].[dbo].[UserSignUpInfo]
([FirstName])
VALUES
(:firstname)";
$STH = $DBH->prepare($query);
$firstname = "输入中国文字!Okay!";
/* First, check whether the string contains any multibyte characters at all */
if (strlen($firstname) != strlen(utf8_decode($firstname))) {
    /* If so, encode the string as Base64 before binding it */
    $firstname = base64_encode($firstname);
}
$STH->bindParam(':firstname', $firstname);
$STH->execute();
Then, to go backwards, you can test for Base64 strings and decode only those, without damaging your existing entries, like so:
while ($row = $STH->fetch()) {
    $entry = $row[0];
    if (base64_encode(base64_decode($entry, true)) === $entry) {
        /* Decoding and re-encoding a true Base64 string returns the original entry */
        print_r(base64_decode($entry) . PHP_EOL);
    } else {
        /* Earlier entries that were never encoded fall through gracefully */
        print_r($entry . PHP_EOL);
    }
}
Entries will be saved like this:
Guan Tianlang
5pys6Kqe44KS5a2maGVsbG8=
But you can easily convert them back to:
Guan Tianlang
输入中国文字!Okay!
Collation shouldn't matter here.
Double-byte characters need to be stored in nvarchar, nchar, or ntext fields. You don't need to perform any casting.
The N data type prefix stands for "National", and it causes SQL Server to store text as Unicode (UTF-16).
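To illustrate (a hedged sketch; the table variable and sample data are made up), here is the difference under a Latin1 default collation:
DECLARE @demo TABLE (plain varchar(20), wide nvarchar(20));
-- the same Unicode literal goes into both columns
INSERT INTO @demo VALUES (N'输入中国文字', N'输入中国文字');
-- plain comes back as '??????' (lossy code-page conversion); wide survives intact
SELECT plain, wide FROM @demo;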
Edit:
PDO_DBLIB does not support Unicode, and is now deprecated.
If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux which support Unicode.
Microsoft - SQL Server ODBC Driver Documentation
Blog - Installing and Using the Microsoft SQL Server ODBC Driver for Linux
You can use a Unicode-compatible data type for the table column to support foreign languages (exceptions are shown in EDIT 2).
(char, varchar, text) Versus (nchar, nvarchar, ntext)
Non-Unicode:
Best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters."
Unicode:
Best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly."
Example:
I also tried an example (the screenshots were originally attached) that is helpful for foreign-language insertion issues like the one in this question. The column, defined as nvarchar, does support the Chinese language.
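Since the screenshots are not reproduced here, a rough equivalent of what they showed (table and data are illustrative):
CREATE TABLE LanguageDemo (Name nvarchar(50));
INSERT INTO LanguageDemo (Name) VALUES (N'输入中国文字');
-- the Chinese text comes back intact because the column is nvarchar
SELECT Name FROM LanguageDemo;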
EDIT 1:
Another related issue is discussed here
EDIT 2 :
Unicode unsupported scripts are shown here
Just use nvarchar, ntext, or nchar, and when you want to insert, use:
INSERT INTO UserSignUpInfo
(FirstName)
VALUES
(N'firstname');
The N prefix marks the string as a Unicode literal and is standard worldwide.
Ref :
https://aalamrangi.wordpress.com/2012/05/13/storing-and-retrieving-non-english-unicode-characters-hindi-czech-arabic-etc-in-sql-server/
https://technet.microsoft.com/en-us/library/ms191200(v=sql.105).aspx
https://irfansworld.wordpress.com/2011/01/25/what-is-unicode-and-non-unicode-data-formats/
This link explains Chinese characters in MySQL: Can't insert Chinese character into MySQL.
You have to create the table with CHARACTER SET = utf8:
CREATE TABLE table_name (...) CHARACTER SET = utf8;
Use UTF-8 on the connection when you insert into the table:
SET NAMES utf8;
INSERT INTO table_name (ABC) VALUES (VAL);
and create the database with CHARACTER SET utf8 COLLATE utf8_general_ci.
Then you can insert Chinese characters into the table.
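Putting those steps together, a minimal sketch (database, table, and column names are hypothetical):
CREATE DATABASE demo CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE demo.names (val VARCHAR(50)) CHARACTER SET utf8;
-- make the client connection speak UTF-8 before inserting
SET NAMES utf8;
INSERT INTO demo.names (val) VALUES ('中文');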

How to read Arabic characters from varchar datatype?

I have an old system that uses the varchar datatype in its database to store Arabic names. Now the names appear in the database like this:
"ãíÓÇÁ ÇáãÈíÖíä"
Now I am building a new system using VB.NET. How can I read these names so they appear as Arabic characters?
I should also point out that even though the old system stores the data as shown above, it displays the characters correctly.
How can I display the names properly in the new system and in SQL Server Management Studio?
Have you tried nvarchar? You may find some useful information at the link below:
When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server?
I faced the same problem, and I solved it in two steps:
1. Change the data type of the column in the DB to nvarchar.
2. Use encoding to convert the data into Arabic.
I used the following function:
private string GetDataWithArabic(string srcData)
{
    // The mojibake string was produced by reading the raw bytes as Latin-1,
    // so use ISO-8859-1 to recover those original bytes...
    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    // ...then reinterpret them with the system's default ANSI code page
    // (Windows-1256 on an Arabic-locale system)
    Encoding unicode = Encoding.Default;
    byte[] unicodeBytes = iso.GetBytes(srcData);
    return unicode.GetString(unicodeBytes);
}
But make sure you run this method only once on the DB data, because it will corrupt the data if applied twice.
I think your answer is here: "storing and retrieving non english characters" http://aalamrangi.wordpress.com/2012/05/13/storing-and-retrieving-non-english-unicode-characters-hindi-czech-arabic-etc-in-sql-server/

SQL Server Bulk Import With Format File of UTF-8 Data

I have been referring to the following page:
http://msdn.microsoft.com/en-us/library/ms178129.aspx
I simply want to bulk import some data from a file that contains Unicode characters. I have tried encoding the actual data file in UCS-2, UTF-8, etc., but nothing works. I have also modified the format file to use SQLNCHAR, but it still doesn't work and gives the error:
Bulk load data conversion error (truncation) for row 1, column 1
I think it has to do with this statement from the above link:
For a format file to work with a Unicode character data file, all the
input fields must be Unicode text strings (that is, either fixed-size
or character-terminated Unicode strings).
What exactly does this mean? I thought it means every character string needs to be a fixed 2 bytes, which encoding the file in UCS-2 should handle.
This blog post was really helpful and solved my problem:
http://blogs.msdn.com/b/joaol/archive/2008/11/27/bulk-insert-using-unicode-data-files.aspx
Something else to note - a Java class was generating the data file. In order for the above solution to work, the data file needed to be encoded in UTF-16LE, which can be set in the constructor of OutputStreamWriter (for example).
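For completeness, a hedged sketch of the T-SQL side (table name and file path are placeholders); DATAFILETYPE = 'widechar' tells SQL Server that the data file is UTF-16LE:
BULK INSERT dbo.ImportTarget
FROM 'C:\data\import.txt'
WITH (DATAFILETYPE = 'widechar',
      FIELDTERMINATOR = '\t',
      ROWTERMINATOR = '\n');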
In SQL Server 2012 I imported a .csv file saved with Notepad++, encoded in UCS-2, containing special Spanish characters.

insert special character in my sql server database

ANSWER:
Sorry about this sort of question, guys. I assumed it wouldn't work if I entered the special character directly into my query string, but it does. So all you need to do is locate the special character, copy it, and paste it into your query, and it works :)
QUESTION CHANGED:
I want to enter a character in the database which is the standard trademark symbol (®) using a direct query, and have it read back correctly. How can I do this?
PREVIOUS QUESTION:
How can I enter a special character, ®, into a varchar column in SQL Server so that it is read correctly? (There is also a line below this symbol which I am unable to paste here.)
Also, I am unable to find the character sequence for that symbol; where can I look for it?
The symbol is the standard ® symbol, which hangs at the top, with a line below it just like an underscore.
Thanks
EDIT 1: I am talking about a direct query to the database.
You can use this T-SQL query:
INSERT INTO dbo.YourTable(UnicodeCol)
VALUES(nchar(0x00AE))
® is the Unicode character with code 0x00AE
But of course, since this is a Unicode character, the column you're inserting into must be of type nvarchar (not varchar).
You can convert it to Unicode NCR format before you store it in the database, or just encode it with the relevant function of the language you are using, like JavaScript's encodeURIComponent or PHP's urlencode.
You can use N ahead of the data. This query might be helpful to you:
insert into product_master(product_name) values(N'कंप्यूटर')

Accessing varchar field containing extended ASCII characters using ADO

I'm using ADO to connect to a SQL Server 6.5 database and extract data from a column storing text data (the field type returns as adLongVarChar).
The column data was updated from an old legacy DOS system and contains a few extended ASCII characters like 0xFB (square-root glyph in Code Page 437).
The problem is that when I read the field's Value property, the 0xFB is rendered as a "v" character (0x76), which I guess is the nearest 7-bit ASCII match for a square-root glyph.
I have tried using an ADO Stream object to access the field with a charset of "x-ansi" but I'm still receiving the "v" character instead of the 0xFB character. It looks like the "v" is set in the field before I can access it.
Can anyone suggest how I might get the proper character using ADO or is there some other property I need to modify to tell the SQL/ADO connection to leave the encoding alone and stop being "helpful"?
Thanks
Found the answer: I needed to add an "Auto Translate=0;" property to the connection string.
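For reference, a sketch of what the resulting connection string might look like (assuming the SQLOLEDB provider; server, database, and credentials are placeholders):
Provider=SQLOLEDB;Data Source=myServer;Initial Catalog=myDb;User ID=myUser;Password=myPass;Auto Translate=0;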
