Escaping converting Danish characters by JTidy - jtidy

I'm using JTidy to convert an HTML page to XHTML. The HTML contains Danish characters, and JTidy converts them into other character sequences.
For example:
Word "Observér" is converted to "Observér".
Is there a way to avoid this?

Related

moment js not properly translating the date format for japanese locale

I need to format a date for the Japanese locale, but the output is displayed incorrectly. I also tried changing the browser's locale, but it does not work in either Chrome or IE.
app.filter('japan', function() {
  return function(dateString, format) {
    return moment().locale('ja').format('LLLL');
  };
});
Output for the format is 2016蟷エ6譛�20譌・蜊亥燕11譎N蛻� 譛域屆譌・
Required output is 2016年6月20日午前11時30分 月曜日
This isn't an issue with Moment. It's an encoding problem known as mojibake and can happen when your page has an encoding that doesn't correctly handle the characters you are using. In general, it's preferable to use a neutral encoding like UTF-8 or UTF-16 (UTF-8 is the de-facto standard), and from the comments above, it sounds like this did indeed fix your issue.
Additionally, it is a good idea to set a lang="" attribute on the element containing your localized content (you can do this as high up as the <html> element), because certain characters can have different appearances depending on the locale.
To take your text as an example, the top-right portion of the character 曜 looks like 羽 with lang="zh", but looks like two side-by-side ヨs with lang="ja" (the language code for Japanese is ja; jp is the country code).
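The garbled output in the question is consistent with UTF-8 bytes being decoded as Shift_JIS. Here is a minimal Python sketch of that effect, assuming the wrong encoding is code page 932; the function name make_mojibake is made up for illustration and is not part of Moment or AngularJS.

```python
def make_mojibake(text: str, wrong_encoding: str = "cp932") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec.

    Bytes that are invalid in the wrong codec become U+FFFD, which is
    what the replacement characters in the garbled output suggest.
    """
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    original = "2016年6月20日"
    garbled = make_mojibake(original)
    # ASCII digits survive; the kanji turn into unrelated characters.
    print(original)
    print(garbled)
```

Serving the page as UTF-8 and declaring it (for example with a charset meta tag or the Content-Type header) avoids the mismatch in the first place.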

angularjs resource slash parameters

I am using $resource to make a REST API call.
My call to that resource looks like this:
Client.get({parametres : param})
My problem is that param contains a "\" character, which makes the call fail with a
400 Bad Request
response.
How can I escape the "\" character?
Thanks.
encodeURIComponent should do the trick.
The encodeURIComponent() method encodes a Uniform Resource Identifier (URI) component by replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character (will only be four escape sequences for characters composed of two "surrogate" characters).
As per: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
Client.get({ parameters: encodeURIComponent(param) });
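To see what percent-encoding does to a backslash, here is a rough Python analogue using urllib.parse.quote. This is only an illustration of the encoding itself, not AngularJS code; encode_component is a hypothetical helper name, and quote with safe="" escapes a slightly different set of punctuation than encodeURIComponent does.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    """Percent-encode a single URI component as UTF-8.

    safe="" means "/" is escaped as well, which is what you want
    for a value that must stay inside one path or query component.
    """
    return quote(value, safe="")


if __name__ == "__main__":
    raw = "folder\\file name"
    encoded = encode_component(raw)
    print(encoded)           # the backslash becomes %5C
    print(unquote(encoded))  # decoding restores the original string
```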

OSX: CGGlyph to UniChar

I am currently writing a simple bitmap font generator using CoreGraphics and CoreText. I am retrieving the kerning table of a font with:
CFDataRef kernTable = CTFontCopyTable(m_ctFontRef, kCTFontTableKern, kCTFontTableOptionNoOptions);
and then parsing it, which works fine. The kerning pairs give me glyph indices (i.e. CGGlyph), and I need to translate them to Unicode (i.e. UniChar), which unfortunately does not seem to be easy. The closest I got was using:
CGFontCopyGlyphNameForGlyph
to retrieve the glyph name of the CGGlyph, but I don't know how to convert the name to Unicode, as the names are really just strings such as quoteleft. Another thing I thought about was parsing the kCTFontTableCmap myself to map each glyph to its Unicode code point manually, but that seems like a ton of extra work for the task. Is there a simple way of doing this?
Thanks!
I don't know a direct method to get the Unicode for a given glyph, but you could
build a mapping in the following way:
Get all characters of the font with CTFontCopyCharacterSet().
Map all these Unicode characters to their glyph with CTFontGetGlyphsForCharacters().
For each Unicode character and its glyph, store the mapping glyph -> Unicode
in a dictionary.
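The last step (inverting the character-to-glyph mapping) can be sketched in Python as follows. The CoreText calls themselves are not reproduced here; the input pairs are hypothetical sample values standing in for what CTFontGetGlyphsForCharacters() would return, and invert_glyph_map is an illustrative name. Several code points can share one glyph, so the sketch keeps a list per glyph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def invert_glyph_map(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Turn (code_point, glyph_index) pairs into glyph_index -> code points."""
    glyph_to_chars: Dict[int, List[int]] = defaultdict(list)
    for code_point, glyph in pairs:
        glyph_to_chars[glyph].append(code_point)
    # Sort so that the lowest code point comes first for each glyph.
    return {glyph: sorted(cps) for glyph, cps in glyph_to_chars.items()}


if __name__ == "__main__":
    # Hypothetical data: space and no-break space sharing glyph 3.
    sample = [(0x0020, 3), (0x00A0, 3), (0x0041, 36)]
    for glyph, code_points in invert_glyph_map(sample).items():
        print(glyph, [hex(cp) for cp in code_points])
```

If a kerning pair refers to a glyph with more than one code point, you would probably want to emit the pair for each of them.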

Unicode/special characters in help_text for Django form?

I am trying to add a special character (specifically the ndash) to a Model field's help_text. I'm using it in the Form output so I tried what seemed intuitive for the HTML:
help_text='2 &ndash; 30 characters'
Then I tried:
help_text='2 \2013 30 characters'
Still no luck. Thoughts?
Django escapes all HTML by default. Try wrapping your string in mark_safe.
You almost had it on your second try. First, in Python 2 you need to declare the string as Unicode by prefixing it with a u (in Python 3, string literals are already Unicode). Second, you wrote the code point wrong: it needs a prefix as well, namely \u.
help_text=u'2\u201330 characters'
Now it will work and has the added benefit of not polluting the string with HTML character entities. Remember that field value could be used elsewhere, not just in the Form display output. This tip is universal for using Unicode characters in Python.
Further reading:
Unicode literals in Python, which mentions other codepoint prefaces (\x and \U)
PEP263 has simple instructions for using actual raw Unicode characters in a source file.
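As a small sketch in Python 3 (where the u prefix is optional), the following shows three equivalent ways to write the en dash and why the '\2013' attempt failed: Python reads \201 as an octal escape and leaves the 3 as a literal digit. The constant names are just for illustration.

```python
import unicodedata

# Three equivalent spellings of the same string in Python 3.
HELP_TEXT_ESCAPE = "2\u201330 characters"       # \u + four hex digits
HELP_TEXT_NAMED = "2\N{EN DASH}30 characters"   # Unicode character name
HELP_TEXT_LITERAL = "2–30 characters"           # raw character, UTF-8 source

# The attempt from the question: \201 is an octal escape (U+0081),
# followed by an ordinary "3".
BROKEN = "2 \2013 30 characters"

if __name__ == "__main__":
    print(HELP_TEXT_ESCAPE == HELP_TEXT_NAMED == HELP_TEXT_LITERAL)
    print(unicodedata.name(HELP_TEXT_ESCAPE[1]))
    print(repr(BROKEN))
```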

What encoding does the System.Windows.Forms.RichTextBox use for unicode chars?

I've got a WinForms RichTextBox in my application. When I enter the Chinese text "蜜蜜蜜蜜", the control uses the following RTF:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fmodern\fprq6\fcharset134 SimSun;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17\'c3\'db\'c3\'db\'c3\'db\'c3\'db\f1\par
}
The test string is the same character four times. Its Unicode value is 34588 (0x871C). So how is it that the character is being stored as "\'c3\'db" in the RTF? What kind of encoding is that?
RTF is old, older than Job and considerably predates Unicode. I think it is using code page 936, a double-byte character set for Simplified Chinese (the \fcharset134 in the font table is the GB2312 charset). Your snippet shows it using c3db for the character, which matches the glyph shown in this table.
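You can check the code page 936 theory with a few lines of Python, assuming Python's cp936 codec matches what the control uses. The helper rtf_hex_escape is a made-up name and only mimics the \'xx notation; it escapes every byte, including ASCII, which a real RTF writer would not do.

```python
def rtf_hex_escape(text: str, codepage: str = "cp936") -> str:
    """Encode text in the given code page and write each byte as \\'xx."""
    return "".join("\\'%02x" % byte for byte in text.encode(codepage))


if __name__ == "__main__":
    char = "\u871c"  # the character from the question, 34588 decimal
    print(ord(char), hex(ord(char)))
    print(rtf_hex_escape(char))
    # Decoding the two bytes with the same code page gives the character back.
    print(b"\xc3\xdb".decode("cp936") == char)
```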

Resources