Alright
so we had a problem recently
In reporting services some of the String Columns were appearing as gibberish Chinese characters.
On further investigation we found it is the hyphen. Well that's what we though first. On further investigation we found it a dash (or en dash) . Basically the reason this has happened is people copy pasting values into this column from word which converts hyphens into dashes automatically.
But if you look at the database they both look the same. Though on the application side you can see the difference.
How do I replace the dash with a normal hyphen.
If you copy the value in put it in SQL server. A hyphen is gray while a dash is black
but they both look exactly the same (i.e not bigger or smaller). Problem is I can't write a REPLACE script then (they are the freakin same)
REPLACE ('-' with '-')
is there a way special characters like the dash can be identified in SQL server?
SQL Server v 2005
You can use NCHAR(8211) for the en dash, or NCHAR(8212) for em dash.
Related
I am importing some Excel spreadsheets into a MS SQL Server. I load the spreadsheets, cleanse the data and then export it to SQL using Alteryx. Some files have text columns where the cells span multiple lines (i.e. with new line characters, like when you press ALT + ENTER in Excel). When I export the tables to SQL and then query the table, I see lots of '_x000D_' which are not in the original file.
Is it some kind of newline character encoding? How do I get rid of it?
I haven't been able to replicate the error. The original file contains some letters with accents (à á etc); I created multi-line spreadsheets with accented letters, but I managed to export these to SQL just fine, with no 'x000D'.
If these were CSV files I would think of character encoding, but Excel spreadsheets? Any ideas? Thanks!
I know this is old, but: if you're using Alteryx, just run it through the "Data Cleansing" tool as the last thing prior to your export to SQL. For the field in question, tell the tool to remove new lines by checking the appropriate checkbox.
If that still doesn't work... 0x000D is basically ASCII 13; (Hex "D" = Int 13)... so try running your data through a regular Formula tool, and for the [field] in question, just use the expression Replace([field],CharFromInt(13),""), which should remove that character by replacing it with the empty string.
This worked for me:
REGEX_REPLACE([field],"_x000D_","")
After importing some data into a Sql 2014 database, I realized that there are some fields in which the data replaced German characters such as (ü, ß, ä, ö, etc) with some weird characters. Ex.
München should be München
ChiemgaustraÃe should be Chiemgaustraße
Königstr should be Königstr
I would like to replace these characters with the right German letter. Ex.
ü -> ü
à - > ß
ö -> ö
However when I run queries like the following to try to identify which rows have these characters, the queries returns 0 rows.
select address
from Directory
where street like N'%ChiemgaustraÃe 50%'
select address
from Directory
where street like N'%ü%'
Is there a query I can run to identify and replace these characters?
I must clarify that most of the data was imported correctly, in fact I believe the strange characters were already part of the original data.
Also, I think I can export the data to a text file, replace the characters and re-import, but I was wondering if there is a way to do it directly in sql.
Thanks in advance for the help.
I couldn't get it fix using only sql.
FutbolFa suggestion worked for the most part but there were a couple of symbols, in particular "Ã" that wasn't picked up by any query a tried. I ended up exporting the data to a text file and replacing the symbols there. Then I just re-imported the info.
Any idea why the Turkish Lira symbol is replaced by a question mark when I insert it in a table in the database. See the image below
This is not a font issue. This is a Unicode (UTF-16) vs 8-bit Code Page character set issue (i.e. NVARCHAR vs VARCHAR). The character you are trying to use does not exist in the particular Code Page indicated by the default Collation of the DB in which you are executing this query. The Code Page used by the DB's default Collation is relevant here since your string literal is not prefixed with an upper-case "N". If it was, then the string would be interpreted as being Unicode and no conversion would take place. But since you are passing in a non-Unicode string, it will be forced into the current DB's default Collation's Code Page as the query is parsed. Any characters not available in that Code Page, and not having a Best-fit mapping, get turned into "?".
You can run the following to see for yourself:
SELECT '₺';
PRINT '₺';
It both prints AND displays in the results grid as ?
If you want to see what character SQL Server thinks it is, run the following:
SELECT ASCII('₺');
And it will return: 63
If you want to see what character has an ASCII value of 63, run this:
SELECT CHAR(63);
And it will return: ?
Now run this:
SELECT N'₺';
PRINT N'₺';
This will both print and display in the results grid correctly.
To see what character value the symbol really is, run the following:
SELECT UNICODE(N'₺'), UNICODE('₺');
This will return: 8378 and 63
But isn't 63 the question mark? Yes. That is because not prefixing the string literal '₺' with a capital "N" tells SQL Server that it is VARCHAR and so it gets translated to the default unknown character.
Now, if you were to execute this VARCHAR version in a DB that had a Collation tied to a Code Page that had this character, then it would work even when not prefixing the string literal with an upper-case "N". However, at the moment, I cannot find any Code Page used within SQL Server that supports this character. So, it might be a Unicode-only character, at least at far as SQL Server is concerned.
The way to fix this is:
Change the datatype of the field to NVARCHAR (I see in a comment on the question that the field is currently VARCHAR). If the field is VARCHAR then even if you use the N prefix on the string, the character will still get stored as ?, unless the Code Page specified by the Collation of the column supports this character, but again, I think this might be a Unicode-only character.
Change your INSERT statement to prefix the string field with a capital "N": (73, 4, N'(3) ₺'). Even if you change the field to NVARCHAR, if you don't prefix the string with N then SQL Server will translate the character to ? first and then insert the ?. This is because the query gets parsed before it gets executed, and parsing (for non-Unicode string literals and variables) is done in the Code Page of the DB's default Collation
Probably for the same reason my browser isn't displaying it in the title for this question: It isn't in the application's character set (or maybe not supported by the font).
In this case, my browser shows some numbers in a box (denoting the character code).
SQL-server is translating it to a known character instead.
Ensure you're storing it in a field that supports the character in it's character set (I think UTF-8 is sufficient)
I have been struggling with the following query:
select * from table_name where contains((field1,field2),'"S.E.N.S"');
Following is the text I have in field 1:
"S.E.N.S Productions"
If I search for "production" I get the result, but not for "S.E.N.S".
Any idea regarding how to get the desired result ? Thanks.
Update: Sql Server version is 2005 with SP3.
Update: Well, it is quite weird. When I set the full-text to use noiseENG.txt the query in my question works fine. But everything stored in the database is in Turkish and all the settings are set accordingly, including noiseTRK.txt. "S.E.N.S" is not a word in Turkish nor in English as far as I know. I can set it to noiseENG.txt to get it work, but I doubt that would be appropriate. Can anyone know/think of a reason why noiseTRK.txt would break in above query ? Thanks.
P.S. I have tested with unmodified noise files as well as single-letters-removed version of Turkish noise file.
Out of the box, it may not work for SQL 2005.
As I recall, for the string "S.E.N.S Productions", the SQL 2005 FTS (Full Text Search) word breaker will ignore the punctiation, and break the string into individual letters (S E N S) and the word "Productions". Single letters are, by default, noise words, and therefore won't be included in the index.
You can do a couple things:
Edit your noise word list and remove the single letters and rebuild your index
Upgrade to a later version of SQL, as the word breaking behaviour changes to corretly return your result in 2008.
I am working with SQL Server 2008. My task is to investigate the issue where FTS cannot find the right result for Thai.
First, I have the table which enables the FTS on the column 'ItemName' which is nvarchar. The Catalog is created with the Thai Language. Note that the Thai language is one of the languages that doesn't separate the word by spaces, so 'หลวง' 'พ่อ' 'โสธร' are written like this in a sentence: 'หลวงพ่อโสธร'
In the table, there are many rows that include the word (โสธร); for example row#1 (ItemName: 'หลวงพ่อโสธร')
On the webpage, I try to search for 'โสธร' but SQL Server cannot find it.
So I try to investigate it by trying the following query in SQL Server:
select * from sys.dm_fts_parser(N'"หลวงพ่อโสธร"', 1054, 0, 0)
...to see how the words are broken. The first one is the text to be broken. The second parameter is to specify that we're using Thai (WorkBreaker, so on). Here is the result:
row#1 (display_item: 'ງลวง', source_item: 'หลวงพ่อโสธร')
row#2 (display_item: 'พຝโส', source_item: 'หลวงพ่อโสธร')
row#3 (display_item: 'ธร', source_item: 'หลวงพ่อโสธร')
Notice that the first and second row display the wrong display_item 'ງ' in the 'ງลวง' isn't even Thai characters. 'ຝ' in 'พຝโส' is not a Thai character either.
So the question is where did those alien characters come from? I guess this why I cannot search for 'โสธร' because the word breaker is broken and keeping the wrong character in the indexes.
Please help!
This should be due to the different Dialect of thai selected while the indexing was applied.
From FTS properties check what is your selected language / culture