I'm using ADO to connect to an SQL 6.5 Server and extract data from a column storing text data (field type returns as adLongVarChar).
The column data was updated from an old legacy DOS system and contains a few extended ASCII characters like 0xFB (square-root glyph in Code Page 437).
The problem is when I read the Field's Value property the 0xFB is rendered as a "v" character (0x76) which I guess is the nearest match from a square-root glyph into standard 7-bit ASCII.
I have tried using an ADO Stream object to access the field with a charset of "x-ansi" but I'm still receiving the "v" character instead of the 0xFB character. It looks like the "v" is set in the field before I can access it.
Can anyone suggest how I might get the proper character using ADO or is there some other property I need to modify to tell the SQL/ADO connection to leave the encoding alone and stop being "helpful"?
Thanks
Found the answer - I needed to add an "Auto Translate=0;" property to the connection string
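For anyone who wants to see the mismatch outside ADO, here is a small Python sketch. It only illustrates the code page arithmetic, not what the OLE DB provider does internally, and it assumes the client's ANSI code page is Windows-1252 (the question does not say). Python's "replace" handler substitutes "?" where the Windows best-fit mapping apparently picked "v".

```python
# Byte 0xFB as written by the legacy DOS system (Code Page 437).
raw = bytes([0xFB])

# Interpreted as CP437, the byte is the square-root glyph U+221A.
as_cp437 = raw.decode("cp437")

# Windows-1252 (assumed client code page) has no square-root glyph,
# so any translating layer has to substitute something for it.
translated = as_cp437.encode("cp1252", errors="replace")

# Left untranslated, the same byte reads as U+00FB under Windows-1252.
untranslated = raw.decode("cp1252")

print(as_cp437, translated, untranslated)
```

With translation switched off the byte value 0xFB survives the round trip, and it is then up to the application to interpret it as CP437.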
I want to read data from text file (.csv), truncate one of the column to 1000 characters and push into SQL table using SSIS Package.
The input (DT_TEXT) is of length 11,000 characters but my Challenge is ...
SSIS can convert to (DT_STR) only if Max length is 8,000 characters.
String operations cannot be performed on Stream (DT_TEXT data type)
Got a workaround/solution now;
I truncate the text in Flat File Source and selected the option to Ignore the Error;
Please share if you find a better solution!
FYI:
To help anyone else that finds this, I applied a similar concept more generally in a data flow when consuming a text stream [DT_TEXT] in a Derived Column Transformation task to transform it to [DT_WSTR] type to my defined length. This more easily calls out the conversion taking place.
Expression: (DT_WSTR,1000)(DT_STR,1000,1252)myLargeTextColumn
Data Type: Unicode string [DT_WSTR]
Length: 1000
*I used 1252 codepage since my DT_TEXT is UTF-8 encoded.
For this Derived Column, I also set the TruncationRowDisposition to RD_IgnoreFailure in the Advanced Editor (this can also be done in Configure Error Output by setting Truncation to "Ignore failure").
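As a rough analogue of that cast-and-truncate step outside SSIS, here is a Python sketch. The function name `truncate_text` and its defaults are made up for illustration. It also shows why the code page in the cast matters: bytes that are really UTF-8 come out differently when decoded as 1252.

```python
def truncate_text(raw: bytes, codepage: str = "cp1252", max_len: int = 1000) -> str:
    """Decode a text stream with the given code page, then keep at most max_len characters."""
    text = raw.decode(codepage, errors="replace")
    return text[:max_len]

# An 11,000-character input, as in the question, cut down to 1000 characters.
sample = ("x" * 11000).encode("cp1252")
shortened = truncate_text(sample)
print(len(shortened))

# The same UTF-8 bytes decoded with two different code pages.
utf8_bytes = "\u00e9".encode("utf-8")
as_1252 = truncate_text(utf8_bytes, "cp1252")   # two characters
as_utf8 = truncate_text(utf8_bytes, "utf-8")    # one character
print(as_1252, as_utf8)
```

For plain ASCII content the two decodings agree, so the choice of code page only shows up once the column contains accented or other non-ASCII characters.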
(I'd post images but apparently I need to boost my rep)
Any idea why the Turkish Lira symbol is replaced by a question mark when I insert it into a table in the database? See the image below.
This is not a font issue. This is a Unicode (UTF-16) vs 8-bit Code Page character set issue (i.e. NVARCHAR vs VARCHAR). The character you are trying to use does not exist in the particular Code Page indicated by the default Collation of the DB in which you are executing this query. The Code Page used by the DB's default Collation is relevant here since your string literal is not prefixed with an upper-case "N". If it was, then the string would be interpreted as being Unicode and no conversion would take place. But since you are passing in a non-Unicode string, it will be forced into the current DB's default Collation's Code Page as the query is parsed. Any characters not available in that Code Page, and not having a Best-fit mapping, get turned into "?".
You can run the following to see for yourself:
SELECT '₺';
PRINT '₺';
It both prints AND displays in the results grid as ?
If you want to see what character SQL Server thinks it is, run the following:
SELECT ASCII('₺');
And it will return: 63
If you want to see what character has an ASCII value of 63, run this:
SELECT CHAR(63);
And it will return: ?
Now run this:
SELECT N'₺';
PRINT N'₺';
This will both print and display in the results grid correctly.
To see what character value the symbol really is, run the following:
SELECT UNICODE(N'₺'), UNICODE('₺');
This will return: 8378 and 63
But isn't 63 the question mark? Yes. That is because not prefixing the string literal '₺' with a capital "N" tells SQL Server that it is VARCHAR and so it gets translated to the default unknown character.
Now, if you were to execute this VARCHAR version in a DB that had a Collation tied to a Code Page that had this character, then it would work even when not prefixing the string literal with an upper-case "N". However, at the moment, I cannot find any Code Page used within SQL Server that supports this character. So, it might be a Unicode-only character, at least as far as SQL Server is concerned.
The way to fix this is:
Change the datatype of the field to NVARCHAR (I see in a comment on the question that the field is currently VARCHAR). If the field is VARCHAR then even if you use the N prefix on the string, the character will still get stored as ?, unless the Code Page specified by the Collation of the column supports this character, but again, I think this might be a Unicode-only character.
Change your INSERT statement to prefix the string field with a capital "N": (73, 4, N'(3) ₺'). Even if you change the field to NVARCHAR, if you don't prefix the string with N then SQL Server will translate the character to ? first and then insert the ?. This is because the query gets parsed before it gets executed, and parsing (for non-Unicode string literals and variables) is done in the Code Page of the DB's default Collation.
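The same numbers can be reproduced outside SQL Server. This Python sketch is only an analogy for the VARCHAR vs NVARCHAR behaviour, and it assumes the database's default Collation maps to Code Page 1252, which the question does not confirm.

```python
lira = "\u20ba"  # Turkish Lira sign

# Same value that UNICODE(N'...') reports.
print(ord(lira))  # 8378

# Forcing the character into an 8-bit code page that lacks it
# (1252 is an assumption) leaves only the "unknown" character.
as_varchar = lira.encode("cp1252", errors="replace")
print(as_varchar, as_varchar[0])  # b'?' 63

# Kept as UTF-16, which is what NVARCHAR stores, nothing is lost.
as_nvarchar = lira.encode("utf-16-le")
print(as_nvarchar.decode("utf-16-le") == lira)  # True
```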
Probably for the same reason my browser isn't displaying it in the title for this question: It isn't in the application's character set (or maybe not supported by the font).
In this case, my browser shows some numbers in a box (denoting the character code).
SQL-server is translating it to a known character instead.
Ensure you're storing it in a field that supports the character in its character set (I think UTF-8 is sufficient)
I am using PostgreSQL 9.0 and am trying to store a bytea file which contains certain special characters (regional language characters - UTF8 encoded). But I am not able to store the data as input by the user.
For example :
what I get in request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in value tag. With this issue I am not able to retrieve the proper text input by the user.
Please suggest what do I need to do?
PS: My DB is UTF8 encoded.
The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that - change the settings of the DB driver or choose a different encoding/escaping for bytea.
Or just use proper field types for the XML data - like varchar or XML.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal encoding, so PostgreSQL stores your data correctly. PostgreSQL escapes all non-printable characters; for more details see the PostgreSQL documentation, especially section 8.4.2.
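If you do have to work with the escaped form on the client, it can be turned back into text. A minimal Python sketch, assuming the escaped value arrives as an ASCII string and the underlying bytes are UTF-8; the helper name `decode_bytea_escape` is made up for this example and is not part of any driver.

```python
import re

def decode_bytea_escape(escaped: str) -> bytes:
    """Undo bytea 'escape' output: \\ooo octal triplets and doubled backslashes."""
    def repl(match):
        if match.group(1) is not None:
            # Three octal digits encode one byte.
            return bytes([int(match.group(1), 8)])
        # A doubled backslash stands for one literal backslash.
        return b"\\"
    return re.sub(rb"\\([0-7]{3})|\\\\", repl, escaped.encode("ascii"))

stored = r"\340\252\215\340\252\257\340\253\207\340\252\215"
raw = decode_bytea_escape(stored)
text = raw.decode("utf-8")
print(text)
```

The twelve escaped bytes decode to the four Gujarati characters from the request, which confirms that nothing was lost in storage.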
ANSWER :
Sorry about this sort of question, guys. I assumed that it wouldn't work if I entered the special character directly into the string in my query, but it does. So all you need to do is locate the special character, copy it and paste it into your query, and it works :)
folks,
QUESTION CHANGED:
I want to enter a special character, the standard registered trademark symbol (®), into the database using a direct query and have it read back correctly. How can I do this?
PREVIOUS QUESTION:
how can i enter a special character in SQL Server in varchar column... ® (there is also a line below this symbol which I am unable to paste here) so that it is read correctly.
Also, I am unable to find the character code for that symbol. Is there anywhere I can look it up?
The symbol is standard ® symbol which hangs on the top and there is a line below it just like an underscore.
Thanks
EDIT 1: I am talking about a direct query to the database.
You can use this T-SQL query:
INSERT INTO dbo.YourTable(UnicodeCol)
VALUES(nchar(0x00AE))
® is the Unicode character with code 0x00AE
But of course - since this is a Unicode character, the column you're inserting into must be of type NVARCHAR (not VARCHAR)
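A quick way to check the code point, and whether a given 8-bit code page can hold it, is a few lines of Python. This is only a sketch; Code Page 1252 is an assumption about the column's collation. Under that assumption the symbol does have a single-byte form, which would fit the asker's report that pasting it directly into the query worked.

```python
# Same code point that nchar(0x00AE) produces.
registered = chr(0x00AE)
print(registered, hex(ord(registered)))

# In Code Page 1252 (assumed) the symbol occupies one byte.
single_byte = registered.encode("cp1252")
print(single_byte)
```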
You can convert it to Unicode NCR (numeric character reference) format before you store it in the database, or just encode it with the relevant functions of the language you are using, like JavaScript's encodeURIComponent or PHP's urlencode.
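To show what those two encodings look like, here is a Python sketch using only the standard library (the JavaScript and PHP functions named above behave similarly for this character). Both forms are plain ASCII, so they survive any column type, but they have to be decoded again on the way out.

```python
import html
from urllib.parse import quote, unquote

symbol = "\u00ae"  # registered sign

# Numeric character reference (NCR) form.
ncr = symbol.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Percent-encoded form of the UTF-8 bytes.
url_encoded = quote(symbol)

print(ncr, url_encoded)

# Both are reversible.
print(html.unescape(ncr) == symbol, unquote(url_encoded) == symbol)
```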
You can use 'N' ahead of data.
This query might be helpful to you.
insert into product_master(product_name) values(N'कंप्यूटर')
My database field is set to utf8_general_ci and my websites encoding is utf8.
The £ symbol is coming up as a black diamond with a question mark through the center.
I tried changing it to &pound; in the database and it just outputted &pound;
I tried a string replace:
$row['Information'] = str_replace("£", "&pound;", $row['Information']);
Nothing seems to work, any ideas?
I tried changing it to &pound; in the database
Don't. The database should contain raw text, never HTML-encoded content. The time to HTML-encode (using htmlspecialchars()) is when you insert some raw text into HTML at the output templating stage, and not before. Even if you got this to work, you'd only have fixed one character; the other 107025 non-ASCII characters would still break.
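The "store raw, escape on output" rule looks the same in any language. Here is a minimal Python sketch of it, with `html.escape` standing in for PHP's htmlspecialchars(); the sample string is made up.

```python
import html

# Raw text exactly as it would sit in the database.
stored = "Price: \u00a35 <b>only</b>"

# Escaping happens only when the text is placed into HTML.
rendered = html.escape(stored)
print(rendered)
```

Only the markup-significant characters are escaped. The pound sign passes through untouched, so it still depends on the page and the connection agreeing on one encoding.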
Clearly there is a mismatch of encodings here; you must ensure you use the same encoding (preferably UTF-8) everywhere, in particular:
the encoding you've saved the PHP file in, if it contains any non-ASCII characters;
the charset declared on the output page (by Content-Type <meta> or header(), preferably both; if you only use a <meta> to set it and the server is incorrectly configured it may set its own charset overriding yours);
the encoding of the column in the database (each column has its own collation, so just setting it on the table is ineffective);
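the encoding used by PHP to talk to MySQL. This should be set using mysql_set_charset (or mysqli_set_charset if you are on the mysqli extension, since the old mysql_* functions were removed in PHP 7).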
Unfortunately, none of these settings default to UTF-8.
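Both symptoms of such a mismatch are easy to reproduce. A short Python sketch, using Latin-1 as the "wrong" encoding (an assumption; the question does not say which encoding is actually leaking in):

```python
pound = "\u00a3"

# Latin-1 bytes read as UTF-8: the lone byte is invalid, so it is shown
# as U+FFFD, the black diamond with a question mark.
latin1_bytes = pound.encode("latin-1")
shown = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin-1: two characters appear instead of one.
utf8_bytes = pound.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

print(latin1_bytes, repr(shown))
print(utf8_bytes, repr(mojibake))
```

The black diamond in the question matches the first case: a single-byte pound sign reaching a page that is declared as UTF-8.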
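Before communicating with your database, you need to send the query:
SET NAMES 'utf8'
(MySQL names the character set utf8, not UTF-8; the hyphenated form is rejected as an unknown character set.)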
It tells the database to use utf8 encoding for all queries on this connection.