I am currently writing a simple bitmap font generator using CoreGraphics and CoreText. I am retrieving the kerning table of a font with:
CFDataRef kernTable = CTFontCopyTable(m_ctFontRef, kCTFontTableKern, kCTFontTableOptionNoOptions);
and then parse it, which works fine. The kerning pairs give me the glyph indices (i.e. CGGlyph) of each pair, and I need to translate them to Unicode characters (i.e. UniChar), which unfortunately does not seem easy. The closest I got was using:
CGFontCopyGlyphNameForGlyph
to retrieve the glyph name of the CGGlyph, but I don't know how to convert the name to Unicode, since the names are just strings such as quoteleft. I also thought about parsing the kCTFontTableCmap table myself to map glyphs to Unicode code points manually, but that seems like a lot of extra work for the task. Is there a simple way of doing this?
Thanks!
I don't know a direct method to get the Unicode character for a given glyph, but you could build a mapping in the following way:
Get all characters of the font with CTFontCopyCharacterSet().
Map all these Unicode characters to their glyphs with CTFontGetGlyphsForCharacters().
For each Unicode character and its glyph, store the mapping glyph -> Unicode in a dictionary.
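A minimal sketch of the reverse-mapping step, written in Python rather than C to keep it short; it is not CoreText code. The dictionary char_to_glyph is a hypothetical stand-in for the pairs that CTFontCopyCharacterSet() and CTFontGetGlyphsForCharacters() would give you, and every glyph ID in it is invented. Each glyph maps to a list because a font can draw several characters with the same glyph.

```python
# Hypothetical forward map: character -> glyph ID. In the real program these
# pairs would come from CTFontCopyCharacterSet() + CTFontGetGlyphsForCharacters();
# the glyph numbers here are made up for illustration.
char_to_glyph = {
    "A": 36,
    "V": 57,
    "\u2018": 180,   # LEFT SINGLE QUOTATION MARK, glyph name "quoteleft"
    " ": 3,
    "\u00a0": 3,     # NO-BREAK SPACE sharing the SPACE glyph in this made-up font
    "\u2603": 0,     # glyph 0 (.notdef) means "no glyph"; included to show the filter
}


def invert_cmap(forward):
    """Build glyph ID -> list of characters.

    Several characters can share one glyph, so each value is a list.
    Glyph 0 (.notdef) is skipped because it does not identify a real glyph.
    """
    reverse = {}
    for ch, glyph in sorted(forward.items()):
        if glyph == 0:
            continue
        reverse.setdefault(glyph, []).append(ch)
    return reverse


glyph_to_chars = invert_cmap(char_to_glyph)

# Translate a kerning pair (left glyph, right glyph) back to characters.
left, right = 36, 57
print(glyph_to_chars.get(left), glyph_to_chars.get(right))  # ['A'] ['V']
```

Glyphs that no cmap entry points at (ligatures, alternates and other glyphs reachable only through substitution) will not appear in the reverse map, so a kerning pair can still reference a glyph with no Unicode value; get() returning None is the signal for that case.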
I have a file which is read by STM32 and it displays the contents on a GLCD.
It displays the glyph for each code point in the Unicode stream separately, even when the characters should combine (e.g. क + ् + त = क्त): it shows क ् त instead of क्त.
I have done some reading on this and found that every font uses a character map (cmap table) to map a character encoding (e.g. Unicode) to its glyphs. I tried writing a cmap table in C for Devanagari, but it became an extensive list. Is there any logic I'm missing that would simplify my cmap table, or my goal of mapping Unicode characters to glyphs?
You'll have to do some work, and I'm not even sure the code will fit in an STM32, except perhaps a big one. Have a look at https://www.freedesktop.org/wiki/Software/HarfBuzz/, a text shaper for many scripts, including Indic ones.
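Why a cmap alone cannot produce क्त: there is no single code point for that conjunct, so its glyph has no cmap entry of its own; in an OpenType font it is normally reached through substitution (GSUB) lookups applied during shaping, which is the work HarfBuzz does. The Python sketch below is only an illustration, not STM32 code, and the glyph IDs in demo_cmap are made up. It shows that the conjunct is stored as three code points, so a per-code-point lookup can only ever yield three glyphs.

```python
import unicodedata

# The conjunct क्त is stored as three code points: KA + VIRAMA + TA.
conjunct = "\u0915\u094D\u0924"

for ch in conjunct:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Hypothetical glyph IDs standing in for a font's cmap entries.
demo_cmap = {"\u0915": 101, "\u094D": 102, "\u0924": 103}


def naive_glyph_ids(text, cmap):
    """One glyph per code point, which is all a bare cmap lookup can give.

    Code points missing from the cmap fall back to glyph 0 (.notdef).
    """
    return [cmap.get(ch, 0) for ch in text]


print(naive_glyph_ids(conjunct, demo_cmap))  # [101, 102, 103]: three glyphs, no conjunct
```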
I'm trying to display some Unicode (Cyrillic, actually) using XmLabel and a server-side XLFD font (-monotype-arial-medium-r-normal--*-90-*-*-p-*-iso10646-1). Whenever I use XmStringCreate() or XmStringCreateLtoR() as an XmString factory, the result meets my expectations.
When I try to use XmStringGenerate() factory, however, passing in either XmMULTIBYTE_TEXT for a multi-byte Unicode string, or XmWIDECHAR_TEXT for a wide string, garbage is rendered onto the screen, regardless of the font used (I tried both UTF-8 and single-byte Cyrillic server-side fonts).
The result can be seen below (the 1st 2 lines are ok, 2nd through 6th labels were created with XmStringGenerate() and are obviously not ok):
The complete code (requires Motif 2.1+ and a C99-compliant compiler) is here.
Can anyone suggest a working XmStringGenerate() example suitable for displaying Unicode characters (not just ISO-8859-1)?
XmMULTIBYTE_TEXT is locale-dependent, as n.m suggested, and apart from CJK (i.e. for Roman and Slavic languages) it can only be used in UTF-8 locales. Core X11 fonts can be specified either as fonts (XmFONT_IS_FONT):
-monotype-arial-medium-r-normal--*-90-*-*-p-*-iso10646-1
or font sets (XmFONT_IS_FONTSET):
-monotype-arial-medium-r-normal--*-90-*-*-p-*-*-*:
Speaking of XmWIDECHAR_TEXT mode, it seems impossible to specify a proper font with an explicit encoding, but setting a font set instead works perfectly for Motif 2.1 through 2.3.
I am trying to add a special character (specifically the ndash) to a Model field's help_text. I'm using it in the Form output so I tried what seemed intuitive for the HTML:
help_text='2 &ndash; 30 characters'
Then I tried:
help_text='2 \2013 30 characters'
Still no luck. Thoughts?
Django escapes all HTML by default. Try wrapping your string in mark_safe.
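You almost had it on your second try. First, you need to declare the string as Unicode by prefixing it with u (this is needed in Python 2; in Python 3 every string literal is already Unicode). Second, you wrote the code point escape wrong: as written, \2013 is read as the octal escape \201 followed by a literal 3. The code point needs a prefix as well, namely \u.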
help_text=u'2\u201330 characters'
Now it will work and has the added benefit of not polluting the string with HTML character entities. Remember that field value could be used elsewhere, not just in the Form display output. This tip is universal for using Unicode characters in Python.
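A small Python 3 sketch (no Django needed) showing what each of the two literals actually contains: the second try silently becomes a control character followed by "3", while the \u form yields the en dash. Since \u always consumes exactly four hex digits, "\u201330" is an en dash followed by "30".

```python
import unicodedata

# Second try from the question: "\201" is parsed as an octal escape, so the
# string contains chr(0o201) (U+0081, a control character) followed by "3".
second_try = "2 \2013 30 characters"

# \u takes exactly four hex digits: U+2013 EN DASH.
fixed = "2 \u2013 30 characters"

print(repr(second_try[2:4]))       # '\x813'
print(unicodedata.name(fixed[2]))  # EN DASH
print(len("\u201330"))             # 3: the dash plus "3" and "0"
```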
Further reading:
Unicode literals in Python, which mentions the other code point escape prefixes (\x and \U)
PEP263 has simple instructions for using actual raw Unicode characters in a source file.
I've got a WinForms RichTextBox in my application. When I enter the Chinese text "蜜蜜蜜蜜", the control uses the following RTF:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fmodern\fprq6\fcharset134 SimSun;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17\'c3\'db\'c3\'db\'c3\'db\'c3\'db\f1\par
}
The test string is the same character four times. Its Unicode value is 34588 (0x871C). So how is it that the character is being stored as "\'c3\'db" in the RTF? What kind of encoding is that?
RTF is old, older than Job, and considerably predates Unicode. I think it is using code page 936, a double-byte character set for Simplified Chinese: the \f0 font is declared with \fcharset134 (GB2312), which corresponds to that code page. Your snippet shows it using 0xC3DB for the character, which matches the glyph shown in this table.
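A sketch of how an RTF reader gets back to U+871C: collect the \'hh bytes from the run and decode them with the font's code page instead of the document's \ansicpg1252. It assumes Python's 'gbk' codec (which also accepts the alias 'cp936') is an adequate stand-in for Windows code page 936; the helper name decode_rtf_hex is hypothetical and the regex handles only \'hh escapes, not full RTF.

```python
import re


def decode_rtf_hex(run, codepage="gbk"):
    # Collect every \'hh escape in the run as one byte, then decode the bytes
    # with the code page of the font they belong to (\fcharset134 -> cp936).
    data = bytes(int(h, 16) for h in re.findall(r"\\'([0-9a-fA-F]{2})", run))
    return data.decode(codepage)


# The character run from the RichTextBox output in the question.
run = r"\'c3\'db\'c3\'db\'c3\'db\'c3\'db"
text = decode_rtf_hex(run)

print(text == "\u871c" * 4)
print(hex(ord(text[0])), ord(text[0]))
```

Decoding the same bytes as code page 1252 would produce Latin-1-looking garbage, which is why the per-font \fcharset matters more than the document-level \ansicpg here.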
I'm working on a legacy VB.NET WinForms app and would like to have up and down arrows in my button controls.
I think I need some sort of escape sequence to get the equivalent of &uparr; and &dnarr;?
Open up "Character Map" (from Programs->Accessories->System Tools on WinXP). You can find all sorts of interesting characters there.
Sometimes you'll want to use unusual fonts like Webdings or Wingdings, but be careful to only use fonts that will be on the users' machines.
You can hold ALT and type the Unicode value of the character you want. Consult this table, specifically the "Arrows" section, and convert the value from hex to decimal.
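The hex-to-decimal step for the two arrows, sketched in Python: U+2191 and U+2193 are the standard up and down arrows from the Unicode Arrows block. In VB.NET itself you can skip the Alt-key typing and set the button text directly with ChrW(&H2191) and ChrW(&H2193); the font used by the button still has to contain those glyphs.

```python
import unicodedata

# Up and down arrows from the Unicode "Arrows" block (U+2190..U+21FF).
arrows = {"up": 0x2191, "down": 0x2193}

for label, code in arrows.items():
    ch = chr(code)
    # The same number written in hex (as in the code charts) and in decimal.
    print(f"{label}: U+{code:04X} = {code} decimal, {unicodedata.name(ch)}")
# up: U+2191 = 8593 decimal, UPWARDS ARROW
# down: U+2193 = 8595 decimal, DOWNWARDS ARROW
```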