Unicode/special characters in help_text for Django form? - django-models

I am trying to add a special character (specifically the ndash) to a Model field's help_text. I'm using it in the Form output so I tried what seemed intuitive for the HTML:
help_text='2 – 30 characters'
Then I tried:
help_text='2 \2013 30 characters'
Still no luck. Thoughts?

django escapes all html by default. try wrapping your string in mark_safe

You almost had it on your second try. First you need to declare the string as Unicode by prefacing it with a u. Second, you wrote the codepoint wrong. It needs a preface as well; like \u.
help_text=u'2\u201330 characters'
Now it will work and has the added benefit of not polluting the string with HTML character entities. Remember that field value could be used elsewhere, not just in the Form display output. This tip is universal for using Unicode characters in Python.
Further reading:
Unicode literals in Python, which mentions other codepoint prefaces (\x and \U)
PEP263 has simple instructions for using actual raw Unicode characters in a source file.

Related

moment js not properly translating the date format for japanese locale

I need to translate the date format to Japanese locale but its showing output wrongly.I also tried by changing the locale of the browser but its not working in both chrome and IE
app.filter('japan', function() {
return function(dateString, format) {
return moment().locale('ja').format('LLLL');
};
})
Output for the format is 2016蟷エ6譛�20譌・蜊亥燕11譎N蛻� 譛域屆譌・
Required output is 2016年6月20日午前11時30分 月曜日
This isn't an issue with Moment. It's an encoding problem known as mojibake and can happen when your page has an encoding that doesn't correctly handle the characters you are using. In general, it's preferable to use a neutral encoding like UTF-8 or UTF-16 (UTF-8 is the de-facto standard), and from the comments above, it sounds like this did indeed fix your issue.
Additionally, it is a good idea to set a lang="" attribute on the element containing your localized content (you can do this as high up as the <html> element), because certain characters can have different appearances depending on the locale.
To take your text as an example, the top-right portion of the character 曜 looks like 羽 with lang="zh", but looks like two side-by-side ヨs with lang="jp".

XmStringGenerate() in XmMULTIBYTE_TEXT or XmWIDECHAR_TEXT mode

I'm trying to display some Unicode (Cyrillic, actually) using XmLabel and a server-side XLFD font (-monotype-arial-medium-r-normal--*-90-*-*-p-*-iso10646-1). Whenever I use XmStringCreate() or XmStringCreateLtoR() as an XmString factory, the result meets my expectations.
When I try to use XmStringGenerate() factory, however, passing in either XmMULTIBYTE_TEXT for a multi-byte Unicode string, or XmWIDECHAR_TEXT for a wide string, garbage is rendered onto the screen, regardless of the font used (I tried both UTF-8 and single-byte Cyrillic server-side fonts).
The result can be seen below (the 1st 2 lines are ok, 2nd through 6th labels were created with XmStringGenerate() and are obviously not ok):
The complete code (requires Motif 2.1+ and a C99-compliant compiler) is here.
Can anyone suggest a working XmStringGenerate() example suitable for displaying Unicode characters (not just ISO-8859-1)?
XmMULTIBYTE_TEXT is locale-dependent, as n.m suggested, and, aside from CJK (i. e. for Roman and Slavic languages), can only be used in UTF-8 locales. Core X11 fonts can be specified as either fonts (XmFONT_IS_FONT):
-monotype-arial-medium-r-normal--*-90-*-*-p-*-iso10646-1
or font sets (XmFONT_IS_FONTSET):
-monotype-arial-medium-r-normal--*-90-*-*-p-*-*-*:
Speaking of XmWIDECHAR_TEXT mode, it seems impossible to specify a proper font with an explicit encoding, but setting a font set instead works perfectly for Motif 2.1 through 2.3.

OSX: CGGlyph to UniChar

I am currently writing a simple bitmap font generator using CoreGraphics and CoreText. I am retrieving the kerning table of a font with:
CFDataRef kernTable = CTFontCopyTable(m_ctFontRef, kCTFontTableKern, kCTFontTableOptionNoOptions);
and then parse it which works fine. The kerning pairs give me the glyph indices (i.e. CGGlyph) for the kerning pairs, and I need to translate them to unicode (i.e. UniChar), which unfortunately does not seem super easy. The closest I got was using:
CGFontCopyGlyphNameForGlyph
to retrieve the glyph name of the CGGlyph, but I don't know how to convert the name to unicode, as they are really just strings such as quoteleft. Another thing I though about was parsing the kCTFontTableCmap myself to manually do the mapping from the glyph to the unicode id, but that seems to be a ton of extra work for the task. Is there any simple way of doing this?
Thanks!
I don't know a direct method to get the Unicode for a given glyph, but you could
build a mapping in the following way:
Get all characters of the font with CTFontCopyCharacterSet().
Map all these Unicode characters to their glyph with CTFontGetGlyphsForCharacters().
For each Unicode character and its glyph, store the mapping glyph -> Unicode
in a dictionary.

WPF Flowdocument - Prevent line break before % sign

I've got a FlowDocument generating a document for a client, and it's getting a line break that they don't like. Is there any way to mark a section of text that it should avoid line breaks? Something like this:
<Paragraph>Here is a paragraph where there should be <span NoLineBreak=True>no line break</span> in a certain part.</Paragraph>
Obviously, a Span doesn't have a NoLineBreak property, but I'm wondering if there's some equivilant functionality available, or if someone can get me started on a way of implementing a SpanWithNoLineBreak class or RunWithNoLineBreak class?
UPDATE
Actually, one issue I'm having is with a percent sign, where there isn't even a space:
<Paragraph>When I print and ½% I want the one-half and '%' symbols to not line break between them.</Paragraph>
The & #x00BD; is the unicode for a ½ symbol. I'm getting a line wrap between the 1/2 and the % even though there's no space between them.
The Unicode character "Word Joiner" (U+2060) is intended for just this purpose. It "does not normally produce any space but prohibits a line break on either side of it" (Wikipedia). You place it between U+00BD and '%' to prevent a line break between them.
Unfortunately, WPF (or perhaps the typical fonts supplied with Windows) don't support it properly, and instead render it as a square box. As an alternative, you could use U+FEFF; the use of this character as a zero-width non-breaking space is now deprecated (it's reserved for use as a byte-order mark), but it worked as a line-break-preventer for me.
Finally, there are some other characters that can also be used for this purpose: U+202F (narrow no-break space) also prevents breaking, but also renders as a very thin space. U+00A0 (no-break space) prevents breaking and displays as a normal space.
Try replacing the spaces with non-breaking spaces.
EDIT: Well there's always the backup plan of just putting in TextBlocks in your FlowDocument with TextWrapping=NoWrap, but I'd try to find a better way...

What encoding does the System.Windows.Forms.RichTextBox use for unicode chars?

I've got a WinForms RichTextBox in my application. When I enter the Chinese text "蜜蜜蜜蜜", the control uses the following RTF:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fmodern\fprq6\fcharset134 SimSun;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17\'c3\'db\'c3\'db\'c3\'db\'c3\'db\f1\par
}
The test string is the same character four times. It's Unicode value is 34588 (0x871C). So how is it that the character is being stored as "\'c3\'db" in the RTF? What kind of encoding is that?
RTF is old, older than Job and considerably predates Unicode. I think it using code page 936, a double-byte character set for Simplified Chinese. Your snippet shows it using c3db for the character, it matches the glyph shown in this table.

Resources