Having problems printing åäö (├Ñ ├à | ├ñ ├ä | ├Â ├û) - c

I can't print the Swedish letters åäö.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("å Å | ä Ä | ö Ö");
    return 0;
}
The output I get is:
├Ñ ├à | ├ñ ├ä | ├Â ├û
I don't understand what is wrong. I've searched Google and Stack Overflow, but found nothing. Maybe there is something wrong with UTF-8?
Other information that might be useful:
I'm using Windows 10 and Atom.
SOLUTION
Go to:
System language settings -> Administrative language settings -> Change system locale...
Now check the following box:
[Beta: Use Unicode UTF-8 for worldwide language support]
This fixed my problem and I am now able to use UTF-8 characters.

The Windows command window (terminal, console, whatever you call it) has supported UTF-8 for several years now, at least since Windows 7 in my experience. You need to set the code page:
mode con cp select=65001
Additionally, you can set the output code page programmatically:
SetConsoleOutputCP(CP_UTF8);
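For example, a minimal sketch combining this with the original program (assuming the source file itself is saved as UTF-8 and you are compiling for Windows):

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* Tell the console to interpret the bytes we write as UTF-8. */
    SetConsoleOutputCP(CP_UTF8);
    printf("å Å | ä Ä | ö Ö\n");
    return 0;
}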

Related

Compare data in bytes to a hex number

I need to read data from stdin, where I need to check whether 2 bytes are equal to 0x0001 or 0x0002 and then, depending on the value, execute one of two types of code. Here is what I tried:
#include <stdint.h>
#include <unistd.h>

uint16_t type;
uint16_t type1 = 0x0001;
while ((read(0, &type, sizeof(type))) > 0) {
    if (type == type1) {
        // do smth
    }
}
I'm not sure what numeric representation read uses, since printing the value of type both as decimal and as hex gives something completely different, even when I write 0001 on stdin.
The characters 0x0001 and 0x0002 are control characters in UTF-8; you won't be able to type them manually.
If you are trying to compare character input, use fgets and atoi (or strtol) to convert the character-based input from stdin into an integer; or, if you want to keep using read, compare against the characters '0' '0' '0' '1'/'2' instead of the numeric values to get proper results.
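For illustration, a minimal sketch of the text-based approach (assuming the input arrives as decimal digits followed by a newline; fgets/strtol are used here rather than gets):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char line[32];
    /* Read a text line such as "0001" and convert it to a number. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        uint16_t type = (uint16_t)strtol(line, NULL, 10);
        if (type == 1) {
            /* do smth */
        } else if (type == 2) {
            /* do smth else */
        }
    }
    return 0;
}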
You didn't say what kind of OS you're using. If you're using Unix or Linux or MacOS, at the command line, it can be pretty easy to rig up simple ways of testing this sort of program. If you're using Windows, though, I'm sorry, my answer here is not going to be very useful.
(I'm also going to be making heavy use of the Unix/Linux "pipe" concept, that is, the vertical bar | which sends the output of one program to the input of the next. If you're not familiar with this concept, my answer may not make much sense at first.)
If it were me, after compiling your program to a.out, I would just run
echo 00 00 00 01 00 02 | unhex | a.out
In fact, I did compile your program to a.out, and tried this, and... it didn't work. It didn't work because my machine (like yours) uses little-endian byte order, so what I should have done was
echo 00 00 01 00 02 00 | unhex | a.out
Now it works perfectly.
The only problem, as you've discovered if you tried it, is that unhex is not a standard program. It's one of my own little utilities in my personal bin directory. It's massively handy, and it'd be super useful if something like it were standard, but it's not. (There's probably a semistandard Linux utility for doing this sort of thing, but I don't know what it is.)
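(For reference, a filter like unhex can be tiny; this is just a sketch of the idea, not the author's actual utility:)

#include <stdio.h>

/* Read whitespace-separated pairs of hex digits from stdin
   and write the corresponding raw bytes to stdout. */
int main(void) {
    unsigned int byte;
    while (scanf(" %2x", &byte) == 1)
        putchar((int)byte);
    return 0;
}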
What is pretty standard, and you probably can use it, is a base-64 decoder. The problem there is that there's no good way to hand-generate the input you need. But I can generate it for you (using my unhexer). If you can run
echo AAABAAIA | base64 -d | a.out
that will perform an equivalent test of your program. AAABAAIA is the base-64 encoding of the three 16-bit little-endian words 0000 0001 and 0002, as you can see by running
echo AAABAAIA | base64 -d | od -h
On some systems, the base64 command uses a -D option, rather than -d, to request decoding. That is, if you get an error message like "invalid option -- d", you can try
echo AAABAAIA | base64 -D | a.out

In C, printf number formatting will not work over mrsh

Using this printf format for numbers works fine:
printf("\r\n <%s>\t AmountOfMalloc %'.ld", HostName, GetMalloc());
Output is like this, which is fine:
AmountOfMalloc 17.220.149.424
Calling the same app remotely via mrsh in a script produces the number without grouping, like this:
AmountOfMalloc 17220149424
Environment is SUSE Linux Enterprise Server 15 SP2 in VMware Workstation 15.5.7 on Windows 10 LTSC 2019, 4 cores, 6 GB RAM.
Has anyone experienced this issue, and is there a possible solution?
The difference is caused by different environment locale settings.
You can see the settings that are applied when the program runs the way you want by running the locale command. It will output several lines, e.g. for the money and number format settings, but usually they all have the same value, such as "en_US.UTF-8".
The easiest way to apply the setting to your remotely run command is to prefix the line like this:
LC_ALL=en_US.UTF-8 /path/to/program
From the examples you provide, the locale you want is probably different from en_US.UTF-8, so use the value you get from locale.
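On the C side, the same effect can come from setlocale; a minimal sketch (assuming a glibc system where the %' grouping flag is supported and the desired locale is installed):

#include <stdio.h>
#include <locale.h>

int main(void) {
    /* Pick up LC_ALL / LC_NUMERIC from the environment (e.g. de_DE.UTF-8). */
    setlocale(LC_ALL, "");
    /* The ' flag requests locale-specific digit grouping (POSIX/glibc). */
    printf("AmountOfMalloc %'lld\n", 17220149424LL);
    return 0;
}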
Thanks a lot, locale was the problem; I did not see any reason why.
Adding the following to the C source before the printf works fine:
setlocale(LC_NUMERIC, "de_DE.UTF-8");
setlocale(LC_ALL, "de_DE.UTF-8");
Thanks.

Enabling ANSI escape code processing on the modern Windows command line? [C]

I've heard that modern Windows consoles support ANSI escape codes (COLORS), but you have to enable them.
Using the 19042.746 build of Windows 10, it should just be a simple matter of enabling it with SetConsoleMode(consoleHandle, ENABLE_VIRTUAL_TERMINAL_PROCESSING); // from windows.h, but after setting it, the console still doesn't process ANSI escape colors. What am I missing?
Real-life example:
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
SetConsoleMode(hConsole, ENABLE_VIRTUAL_TERMINAL_PROCESSING);
printf("\033[32mThis is green");
Prints out
[32mThis is green
Sources:
SetConsoleMode function
Bash tips: Colors and formatting (ANSI/VT100 Control sequences)
You also need ENABLE_PROCESSED_OUTPUT; passing only ENABLE_VIRTUAL_TERMINAL_PROCESSING to SetConsoleMode replaces the existing flags. Use GetConsoleMode first to read the current mode (which usually already contains ENABLE_PROCESSED_OUTPUT), and then add ENABLE_VIRTUAL_TERMINAL_PROCESSING to that mode:
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD mode = 0;
GetConsoleMode(hConsole, &mode); // read the current mode first
SetConsoleMode(hConsole, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
printf("\033[32mThis is green");
Result: the text is printed in green.
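For completeness, a minimal self-contained sketch of the fix (error handling kept to a bare minimum):

#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;

    /* Read the current mode so the existing flags are preserved. */
    if (hConsole == INVALID_HANDLE_VALUE || !GetConsoleMode(hConsole, &mode))
        return 1;

    /* Add virtual terminal processing on top of whatever was already set. */
    if (!SetConsoleMode(hConsole, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        return 1;

    printf("\033[32mThis is green\033[0m\n");
    return 0;
}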

How are various glyphs encoded inside a PDF content stream?

I am working on a program that outputs PDF documents. Given a sequence of UTF-8 encoded characters and the name of a font that shall be used to render it, I would like to show the appropriate glyphs that make up the actual content of the document. I would like to be able to display national characters such as č or ö. It would be great to support ligatures like ae or ffi.
The problem is, I do not know how the actual glyphs to be shown are specified (inside a content stream, for example).
If I, for example, want to display the string "Hello World", I need not worry about encoding; I simply write (Hello World)Tj. The PDF reader will then use the appropriate font to render this string.
But what if I wanted to show the string
It is difficult to read the PDF specification all day. Prostě dočista nemožné!
with the ligatures ffi, fi and ea, and the Czech national symbols ě, č and é, in a given font, how would I proceed?
I am trying to get through the PDF specification, but it is not easy.
How do I find out the "code of the glyph" that corresponds to a given character or ligature?
How is this code encoded within a PDF content stream?
Help is much appreciated.
Edit: I may have overestimated the problem. Counting the glyphs needed to display a "common European document", I cannot think of a way this number could exceed 256. If my assumptions are correct, I can remap the encoding of the font completely. This should be sufficient to cover all common symbols of the Latin alphabet, numbers, punctuation, common symbols like ( and [, and still leave plenty of room for national symbols, ligatures and other elements of high-quality typography. (I can implement a priority queue to select the most used ligatures if the total number of glyphs exceeds 256.)
That being said, I do not think I need to use the CID-keyed fonts.
Still I wonder how to map UTF-8 encoded characters onto the glyphs of an arbitrary font. I have the AFM of the font available. For the DejaVu font, for example, the character information goes like this:
C 63 ; WX 536 ; N question ; B 67 -15 488 743 ;
C 64 ; WX 1000 ; N at ; B 65 -174 930 705 ;
C 65 ; WX 722 ; N A ; B -6 0 732 730 ;
But after the 256th character is mapped, the codes are -1:
C 255 ; WX 564 ; N ydieresis ; B -3 -223 563 767 ;
C -1 ; WX 722 ; N Amacron ; B -6 0 732 899 ;
C -1 ; WX 596 ; N amacron ; B 49 -15 568 746 ;
For example, if I had the sequence 11100010 10000010 10101100 (the Euro sign) in my input, how would I know which glyph name it corresponds to so that I can map it in the /Encoding dictionary?
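Decoding that byte sequence to a Unicode code point is the mechanical part (a minimal sketch, hard-coding the example bytes); the open question is the step from the code point to a glyph name:

#include <stdio.h>

int main(void) {
    /* The three UTF-8 bytes from the example: 11100010 10000010 10101100. */
    unsigned char b[] = { 0xE2, 0x82, 0xAC };
    /* Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx -> 16-bit code point. */
    unsigned int cp = ((b[0] & 0x0Fu) << 12) | ((b[1] & 0x3Fu) << 6) | (b[2] & 0x3Fu);
    printf("U+%04X\n", cp);   /* prints U+20AC */
    return 0;
}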
Encoding varies based on the font type. Typically, there is a font resource that is defined as the current font and within that font dictionary is a reference to a base font and a means of describing the encoding (via the /Encoding key). If that key doesn't exist, the encoding will be "standard", but you can use other simple encodings such as /MacRoman and /WinAnsi for the value of the encoding, or you can specify a standard encoding and an encoding delta to show the differences.
Easy so far, as long as you're working with 8-bit characters. Many early apps would create a couple of different fonts: one with, say, Roman encoding, and another that maps Roman characters to otherwise unavailable characters. To do that, your encoding delta would include references to the ligatures and other typically non-encoded symbols. This works great for Type 1 fonts, but is specifically contraindicated by the spec in the section on TrueType fonts:
A nonsymbolic font should specify MacRomanEncoding or WinAnsiEncoding as the value of its Encoding entry, with no Differences array
This is vastly different when you want to use, say, Unicode, in which case you would be using a CID font (a font based on character IDs). In that case there is a procedure referenced by the font which is used to map from a character encoding in your string to a character ID in your font (and vice versa). I would strongly recommend that you read and fully understand section 9.7 of the PDF specification on composite fonts, which describes everything you need in order to encode UTF-16BE into strings and get them to render properly in PDF. It is decidedly non-trivial, in that there are a lot of details that, if missed, will result in a blank rendered page in Acrobat.
As a software engineer who professionally writes code that produces and consumes PDF, let me state that when I get tasked with having to put special cases in my code to deal with non-spec-compliant PDF, a little piece of me dies inside. Please, please, don't even think of releasing any documents you produce into the wild until they pass Preflight at the least. This is not the same as "Acrobat renders it so it must be OK." Let me give you an example - I've seen a number of files in the wild that include fonts missing key elements of the FontDescriptor dictionary, including /Ascent, /Descent, /CapHeight, etc. These render in Acrobat, but are in violation of the spec, since each of those is required. I know how Acrobat handles that - it comes with an enormous database of font metrics and looks up the value if it can't find it in the file (heck, it might even ignore the metrics in the file). I don't have that luxury, so I have to take a number of (potentially expensive/invalid) stopgap measures.
You might want to consider using a library to do this work for you - maybe iText, which has a decent enough licensing scheme for education because, I get it, you're a student. There are some C-based libraries too. Maybe you can figure out a way to make Ghostscript do your bidding.
If you are unwilling or unable to follow my advice with regard to cleaving to the specification or using a library which ostensibly does so, please do me the favor of at least filling out the /Creator and /Producer strings in the Document Information Dictionary referenced by the trailer (see sections 14.3.3 and 7.5.5). That way, when I have to parse/consume/manipulate your documents, I will have a way to directly cast aspersions on your parentage.
Let's go top down and start with the page object - I'm using output from my own library and am stripping out what I think you don't need:
1 0 obj <<
/Type /Page
/Parent 18 0 R
/Resources <<
/Font <<
/U0 13 0 R
>>
/ProcSet [ /PDF /Text ]
>>
/MediaBox [ 0 0 612 792 ]
/Contents 19 0 R
/Dur -1
>>
endobj
U0 is a reference to a font that will be used for unicode text.
The content stream is intended to print the following text: Greek: Γειά σου κόσμος.
BT /U0 24 Tf 72 670 Td
(\000G\000r\000e\000e\000k\000:\000 \003\223\003\265\003\271\003\254\000 \003\303\003\277\003\305\000 \003\272\003\314\003\303\003\274\003\277\003\302)
Tj ET
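(As an aside, producing such a string is mostly mechanical once the code points are known; a sketch, assuming BMP-only text and that with /Identity-H the two string bytes per glyph are simply the UTF-16BE code units, as in this example:)

#include <stdio.h>

/* Write a PDF literal string for an /Identity-H encoded font:
   each code point becomes two big-endian bytes, octal-escaped here
   (printable bytes could also be written literally, as above). */
static void emit_identity_h(const unsigned int *cps, int n) {
    putchar('(');
    for (int i = 0; i < n; i++)
        printf("\\%03o\\%03o", (cps[i] >> 8) & 0xFF, cps[i] & 0xFF);
    puts(")");
}

int main(void) {
    unsigned int text[] = { 0x0047, 0x0393 };  /* 'G', Greek capital Gamma */
    emit_identity_h(text, 2);                  /* prints (\000\107\003\223) */
    return 0;
}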
The font dictionary referenced looks like this:
13 0 obj <<
/BaseFont /DejaVuSansCondensed
/DescendantFonts [ 4 0 R ]
/ToUnicode 14 0 R
/Type /Font
/Subtype /Type0
/Encoding /Identity-H
>>
endobj
The /ToUnicode entry points to a stream containing the following CMap (PostScript-like) code:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
1 beginbfrange
<0000> <FFFF> <0000>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
This CMap format is defined by the CID font specification.
The DescendantFonts array points to this object:
4 0 obj <<
/Subtype /CIDFontType2
/Type /Font
/BaseFont /DejaVuSansCondensed
/CIDSystemInfo 7 0 R
/FontDescriptor 8 0 R
/DW 1000
/W 9 0 R
/CIDToGIDMap 10 0 R
>>
The CIDToGIDMap is a compressed stream with the actual map; the CIDSystemInfo is <</Registry (Adobe) /Ordering (UCS) /Supplement 0>> (it's a reference because I share it among all Unicode fonts that I output). The FontDescriptor is straightforward boilerplate, and the W array is derived from the font metrics.
With all this detail, are you beginning to understand why I don't say lightly, "walk away before you pollute my environment any further"?
I'm really beginning to question the nature of this assignment. Writing a simple PDF is one thing, but writing code that can handle full Unicode in any arbitrary OpenType/TrueType font requires you to understand the CID spec and the TrueType spec (hint: I have a full TrueType parser that can extract all the metrics for any glyph in a font so that I can output the /W array).
If, however, you are required to output only Type 1 fonts, well my friend, your life just got a whole lot easier, because you would take your entire UTF-8 stream, read it as Unicode, and for every unique character that comes in, build a map from the Unicode character to a glyph name and an internal character number by using this table. The internal character number is essentially the unique index of the character that came in, mod 256. So, for example, if you have fewer than 257 unique characters on the page, you will have exactly one font that is encoded to map to the characters in the order that they arrived. If you had "abcba" for input, the output string in the PDF would be (\000\001\002\001\000) and would map to a font with an encoding dictionary whose Differences array would be [0 /a /b /c]. If you have n unique characters where n > 256, you're going to have (n / 256) + 1 fonts, each with its own encoding.
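A sketch of that remapping idea (the glyph-name lookup is stubbed out; a real implementation would consult something like the Adobe Glyph List):

#include <stdio.h>

/* Stand-in for a real Unicode-to-glyph-name lookup; only a few entries. */
static const char *glyph_name_for(unsigned int cp) {
    static char buf[2];
    if (cp >= 'a' && cp <= 'z') { buf[0] = (char)cp; buf[1] = '\0'; return buf; }
    if (cp == 0x20AC) return "Euro";
    return ".notdef";
}

int main(void) {
    /* Pretend the UTF-8 input has already been decoded to code points. */
    const unsigned int input[] = { 'a', 'b', 'c', 'b', 'a' };
    const int len = 5;
    unsigned int mapped[256];  /* internal code -> Unicode code point */
    int count = 0;

    printf("PDF string: (");
    for (int i = 0; i < len; i++) {
        int code = -1;
        for (int j = 0; j < count; j++)        /* seen this character before? */
            if (mapped[j] == input[i]) { code = j; break; }
        if (code < 0) { code = count; mapped[count++] = input[i]; }
        printf("\\%03o", code);                /* emit the internal code */
    }
    printf(")\n");

    printf("/Differences [ 0");
    for (int i = 0; i < count; i++)
        printf(" /%s", glyph_name_for(mapped[i]));
    printf(" ]\n");                            /* -> [ 0 /a /b /c ] */
    return 0;
}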
If your teacher/professor wants anything but Type 1 fonts in a short period of time, s/he has unrealistic expectations for the students and/or low expectations for the quality of output. You should ask whether you are required to handle CID fonts, and if you are, then your professor is at the very least a sadist. It took me, a seasoned professional, about 4 days to write a TrueType parser for extracting widths. I had the advantage of (1) using a managed language (C#), which cut down on the concerns that will be biting your ass in C and also let me use reflection to automate parsing, and (2) when I don't have interruptions, I write solid code about 10-20 times faster than a typical student, so my 32 hours would translate into 320 student hours, more or less (then again, my code has different constraints than yours: it has to consume any crap font it gets, gracefully), so let's call it 200 or less if you're allowed to steal something like stb. That's just for getting one particular element in the font descriptor.

Streaming a remote file

I'm new to FMOD, and I'm trying to use it for a simple application.
I just need to open a remote music file (mostly MP3; if it helps, I can transcode on the server to always have MP3).
When I try to
FMOD_System_CreateSound(system, "http://somewhere.com/song.mp3", FMOD_SOFTWARE | FMOD_2D | FMOD_CREATESTREAM, 0, &song);
That works fine; it opens and plays the MP3.
But when I try to do what I really need:
FMOD_System_CreateSound(system, "http://somewhere.com/somepage.view?id=4324324324556546456457567456ef3345&var=thing", FMOD_SOFTWARE | FMOD_2D | FMOD_CREATESTREAM, 0, &song);
It just doesn't work.
That link, for example, would return a stream.mp3 file, but FMOD just fails on it.
Is there a way to make it work?
I guess the problem is that FMOD just doesn't find the filename in the link, but I can't change the link :/
If it's not possible, is there a way to make FMOD work with curl (curl downloads the file perfectly), for example a callback to feed it each part of the file?
Thanks
The main issue with session-ID-based URLs is that they can get quite long. Old versions of FMOD only supported 256 characters (causing truncation and failure to load), but any recent supported version allows up to 1024 characters.
I would recommend updating to a more recent version of FMOD and reporting back if you have any trouble.
