unwanted output of printf - c

I am working on legacy source code that computes data.
To debug a few error conditions, I added the following printf to the code:
printf("What???!!!!....\n");
The logs are written to a file, and I was searching it for the string "What???!!!!....", but I never found it because the output actually came out as:
What??|!!!....
I have already wasted a lot of time because of this unwanted output.
Can someone please help me to identify the reason for this?

The output is caused by trigraphs: in the string, the sequence
??! is replaced by |
Check your makefile for -trigraphs
And make sure to use more sensible prints from now on :-)

In the olden days, keyboards didn't necessarily include all of the characters required to write C programs. To allow those without the right keyboard to program, the earliest versions of C compilers used trigraphs and digraphs, uncommon two- or three-character combinations that would translate directly to a possibly absent key. Here is a list of digraphs and trigraphs for C:
http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C
??! is in the list, and it translates to | in the preprocessor.
One way to fix this is described in the article I linked above: separate the question marks with a \, or close the string and reopen it between the question marks. This is likely your best choice, given that you're working with legacy code.
Often, you can also disable digraphs and trigraphs with compiler switches. Consult your documentation for those details.
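As a minimal sketch of the two source-level workarounds (the function names here are made up for illustration and are not from the question), both spellings below should produce the intended text whether or not the compiler has trigraph replacement switched on:

/* Sketch only: marker_escaped, marker_split and markers_match_intent
   are hypothetical helper names. */
#include <string.h>

/* Workaround 1: escape the last question mark so the three-character
   sequence never appears in the source text. */
const char *marker_escaped(void)
{
    return "What??\?!!!!....";
}

/* Workaround 2: split the literal. Adjacent string literals are joined
   only after trigraph replacement has already been done. */
const char *marker_split(void)
{
    return "What??" "?!!!!....";
}

/* Returns 1 when both spellings yield the text we meant to log. */
int markers_match_intent(void)
{
    /* Built character by character so this check contains no trigraph. */
    const char expected[] = {'W', 'h', 'a', 't', '?', '?', '?', '!', '!',
                             '!', '!', '.', '.', '.', '.', '\0'};
    return strcmp(marker_escaped(), expected) == 0
        && strcmp(marker_split(), expected) == 0;
}

Something like printf("%s\n", marker_escaped()); then writes a line you can actually grep for in the log.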

Trigraphs (three-character sequences) and digraphs (two-character sequences) were added to C so that people whose character set or keyboard lacks some of the characters outside the ISO 646 invariant set can still type them.
Here's a paragraph from the "Digraphs and trigraphs" Wikipedia page that explains it clearly:
The basic character set of the C programming language is a subset of
the ASCII character set that includes nine characters which lie
outside the ISO 646 invariant character set. This can pose a problem
for writing source code when the encoding (and possibly keyboard)
being used does not support any of these nine characters. The ANSI C
committee invented trigraphs as a way of entering source code using
keyboards that support any version of the ISO 646 character set.
To print the question marks literally, you can escape the last one, or use string concatenation:
printf("What??\?!!!!....\n");
printf("What??" "?!!!!....\n");

Related

What is the use of \n while scanning two integers like in scanf("%d \n %d")?

I found the above type of code in a pre-completed portion of a coding question on HackerRank. I was wondering what the \n would do. Does it make any difference?
Read some good C reference website, and perhaps the C11 standard n1570 and probably Modern C.
The documentation of scanf(3) explains what is happening for \n in the format control string. It is handled like a space and matches a sequence of space characters (such as ' ', or '\t', or '\n') in the input stream.
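A small sketch to check this yourself, using sscanf so the input is a fixed string (the helper names are made up for illustration). Since %d already skips leading white space on its own, the format with " \n " and the plain "%d %d" should parse the same inputs the same way:

#include <stdio.h>

/* The format from the question: white space, newline, white space between
   the two conversions. Returns the number of successful conversions. */
int scan_with_newline(const char *input, int *a, int *b)
{
    return sscanf(input, "%d \n %d", a, b);
}

/* The usual spelling with a single space between the conversions. */
int scan_with_space(const char *input, int *a, int *b)
{
    return sscanf(input, "%d %d", a, b);
}

Feeding both functions an input such as "3\n\n\t 4" or "3 4" gives two conversions and the same pair of values either way.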
If you explicitly want to parse lines, you could first read each line with fgets(3) or getline(3) and then parse it with sscanf(3), or use a parser generator such as GNU bison.
Don't forget to handle error cases. See errno(3). Consider documenting using EBNF notation the valid inputs of your program.
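To illustrate the error handling, here is a rough sketch of validating one line that is supposed to hold exactly two integers. It assumes the line was already read with something like fgets(3), and it uses strtol(3) with errno instead of sscanf(3) so that out-of-range numbers can be detected. The names parse_int and parse_pair_line are hypothetical:

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse one decimal int starting at *cursor and advance *cursor past it.
   Returns 0 on success, -1 on missing digits or an out-of-range value. */
static int parse_int(const char **cursor, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(*cursor, &end, 10);
    if (end == *cursor)                  /* no digits were consumed */
        return -1;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;
    *out = (int)value;
    *cursor = end;
    return 0;
}

/* Accept a line holding exactly two integers, optionally followed by
   white space. Returns 0 on success, -1 on any other input. */
int parse_pair_line(const char *line, int *a, int *b)
{
    const char *p = line;

    if (parse_int(&p, a) != 0 || parse_int(&p, b) != 0)
        return -1;
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return *p == '\0' ? 0 : -1;
}

A caller would typically loop over fgets(3), pass each buffer to parse_pair_line, and report the offending line when it returns -1.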
For inspiration, study the source code of existing open source programs, including GNU bash or GNU make. Be aware that in 2020 UTF-8 should be used everywhere (then you might want to use libunistring, whose source code you could study and improve, since it is free software).
If you use Linux, consider using gdb(1) or ltrace(1) to understand the behavior of your program. Of course, read the documentation of your C compiler (perhaps GCC) and debugger (perhaps GDB).

Why use `code' for embracing code in a comment?

I just read some glibc 2.22 source code (the source file at /sysdeps/posix/readdir.c) and came across this comment:
/* The only version of `struct dirent*' that lacks `d_reclen' is fixed-size. */
(Newline removed.)
The weird emphasis of the type and identifier bugs me. Why not use just single-quotes or des accents graves? Is there some specific reason behind this? Might it be some character set conversion mistake?
I also searched the glibc style guide but didn't find anything concerning code formatting in comments.
Convention.
As you no doubt know, comments are ignored by the C compiler. They make no difference, but the developer who wrote that comment probably preferred their appearance to plain single quotes.
ASCII
Using non-ASCII characters (Unicode) in source code is a relatively new practice (more so where English-authored source code is concerned), and there are still compatibility issues remaining in many programming language implementations. Unicode in program input/output is a different thing entirely (and that isn't perfect either). In program source code, Unicode characters are still quite uncommon, and I doubt we'll see them make much of an appearance in older code like the POSIX header files for some time yet.
Source code filters
There are some source code filters, such as documentation generators like the well-known Javadoc, that look for specific comment strings, such as /** to open a comment. Some of these programs may treat your `quoted strings' specially, but that quoting convention is older than most (all?) of the source code filters that might give them special treatment, so that's probably not it.
Backticks for command substitution
There is a strong convention in many scripting languages (as well as StackExchange markdown!) to use backticks (``) to execute commands and include the output, such as in shell scripts:
echo "The current directory is `pwd`"
Which would output something like:
The current directory is /home/type_outcast
This may be part of the reason behind the convention, however I believe Cristoph has a point as well, about the quotes being balanced, similar to properly typeset opening and closing quotation marks.
So, again, and in a word: `convention'.
This goes back to early computer fonts, where backtick and apostrophe were displayed as mirror images. In fact, early versions of the ASCII standard blessed this usage.
Paraphrased from RFC 20, which is easier to get at than the actual standards like USAS X3.4-1968:
Column/Row Symbol Name
2/7 ' Apostrophe (Closing Single Quotation Mark Acute Accent)
6/0 ` Grave Accent (Opening Single Quotation Mark)
This heritage can also be seen in tools like troff, m4 and TeX, which also used this quoting style originally.
Note that syntactically, there is a benefit to having different opening and closing marks: they can be nested properly.

Print Unicode characters in C, using ncurses

I have to draw a box in C, using ncurses;
First, I have defined some values for simplicity:
#define RB "\e(0\x6a\e(B" (ASCII 188,Right bottom, for example)
I have compiled with gcc, over Ubuntu, with -finput-charset=UTF-8 flag.
But, if I try to print with addstr or printw, I get the hexa code.
What am I doing wrong?
ncurses defines the values ACS_HLINE, ACS_VLINE, ACS_ULCORNER, ACS_URCORNER, ACS_LLCORNER and ACS_LRCORNER. You can use those constants in addch and friends, which should result in your seeing the expected box characters. (There's lots more ACS characters; you'll find a complete list in man addch.)
ncurses needs to know what it is drawing because it needs to know exactly where the cursor is all the time. Outputting console control sequences is not a good idea; if ncurses knows how to handle the sequence, it has its own abstraction for the feature and you should use that abstraction. The ACS ("alternate character set") defines are one of those abstractions.
A few issues:
if your program writes something like "\e(0\x6a\e(B" using addstr, then ncurses (any curses implementation) will translate the individual characters to printable form as described in the addch manual page.
ncurses supports line-drawing for commonly-used pseudo-graphics using symbols (such as ACS_HLINE) which are predefined characters with the A_ALTCHARSET attribute combined. You can read about those in the Line Graphics section of the addch manual page.
the code 0x6a is ASCII j, which (given a VT100-style mapping) would be the lower right corner. The curses symbol for that is ACS_LRCORNER.
you cannot write the line-drawing characters with addstr; use addch or addchstr instead. There are also functions oriented to line-drawing (see box and friends).
running in Ubuntu, your locale encoding is probably UTF-8. To make your program work properly, it should initialize the locale as described in the Initialization section of the ncurses manual page. In particular:
setlocale(LC_ALL, "");
Also, your program should link against the ncursesw library (-lncursesw) to use UTF-8, rather than just ncurses (-lncurses).
when compiling on Ubuntu, to use the proper header definitions, you should define _GNU_SOURCE.
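To make the 0x6a point from the list above concrete, here is a small lookup sketch of the usual VT100 line-drawing letters and the curses symbol that stands for the same glyph. It is plain C with no curses calls, the function name is made up, and it covers only the six symbols needed for a box:

#include <stddef.h>

/* Map a VT100 special-graphics letter to the name of the matching curses
   ACS symbol, or NULL when the letter is not one of the box pieces. */
const char *acs_name_for_vt100_code(char code)
{
    switch (code) {
    case 'j': return "ACS_LRCORNER";   /* lower right corner */
    case 'k': return "ACS_URCORNER";   /* upper right corner */
    case 'l': return "ACS_ULCORNER";   /* upper left corner  */
    case 'm': return "ACS_LLCORNER";   /* lower left corner  */
    case 'q': return "ACS_HLINE";      /* horizontal line    */
    case 'x': return "ACS_VLINE";      /* vertical line      */
    default:  return NULL;
    }
}

So the RB macro from the question corresponds to passing ACS_LRCORNER to addch rather than sending the escape sequence yourself.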
BTW, I'm probably arriving somewhat late to the party, but I'll share some experience that may or may not help with your "box drawing" needs.
As of 2020 I'm involved in a funny project on my own mixing Swift + Ncurses (under OSX for now, but thinking about mixing it with linux). Apparently it works flawlessly.
The thing is, as I'm using Swift, internally it all reduces to "importing .h and .c" files from some Darwin.ncurses library the MacOS Xcode/runtime offers.
That means (I hope) my newly acquired skills might be useful for you because apparently we're using the very same .h and .c files for our ncurses needs. (or at least they should be really similar)
Said that:
As of now, I "ignored" ACS_corner chars (I can't find them under swift/Xcode/Darwin.ncurses runtime !!!) in favour of pure UTF "corner chars", which also exist in the unicode pointspace, look:
https://en.wikipedia.org/wiki/Box-drawing_character
What does it mean? Whenever I want to use some drawing box chars around I just copy&paste pure UTF-8 chars into my strings, and I send these very strings onto addstr.
Why does it work? Because as someone also answered above, before initializing ncurses with initscr(), I just claimed "I want a proper locale support" in the form of a setlocale(LC_ALL, ""); line.
What did I achieve? Apparently pure magic. And very comfortable one, as I just copy paste box chars inside my normal strings. At least under Darwin.ncurses/OSX Mojave I'm getting, not only "bounding box chars", but also full UTF8 support.
Try the "setlocale(LC_ALL, ""); initscr();" approach and tell us if "drawing boxes" works also for you under a pure C environment just using UTF8 bounding box chars.
Greetings and happy ncursing!

Does any C standard header include ASCII aliases?

Would help to avoid doing things like,
#define ESC (27)
#define DEL (127)
Edit: Looking for either a C standard header or a POSIX C header with this.
Sadly, no. C is described abstractly in terms of a source character set and an execution character set, both of which may vary. The characters it uses are not the complete set offered by ASCII. In fact the version of ASCII current at the time of the first C compilers didn't even have '#' yet.
Searching for "posix character set" turned this up. http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap06.html
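Since nothing standard provides these, the usual fallback is a small project-local header. A sketch of what that might contain (the ASCII_ prefix and the helper function are my own naming, not from any standard header), using an enum rather than #define so a debugger can show the names:

/* Hypothetical project-local names for a few ASCII control codes.
   The values are the ASCII code points, so this assumes an
   ASCII-compatible execution character set. */
enum ascii_control {
    ASCII_NUL = 0x00,
    ASCII_BEL = 0x07,
    ASCII_BS  = 0x08,
    ASCII_HT  = 0x09,
    ASCII_LF  = 0x0A,
    ASCII_CR  = 0x0D,
    ASCII_ESC = 0x1B,   /* 27 */
    ASCII_DEL = 0x7F    /* 127 */
};

/* Returns nonzero when c is an ASCII control code (0x00..0x1F or DEL). */
int is_ascii_control(int c)
{
    return (c >= 0x00 && c < 0x20) || c == ASCII_DEL;
}

For the codes that have character escapes, such as '\a', '\b', '\t', '\n' and '\r', the escapes are the more portable spelling; ESC and DEL have no standard escape, which is why people end up defining them.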

Getting a Dev-C++ built program to output UNICODE characters to the Windows command line

If you can answer any of my questions, that would be awesome.
Here's the scoop: I'm teaching an intro to programming class in Thailand to 11th graders. It's been going great so far, their level of English is high enough that I can teach in English and have them write programs in English and everything is fine and dandy.
However, as speakers of a language with non-Latin characters, I feel that they should at least learn what UNICODE is. I won't test them on it or bog them down with implementation details, but I want to show them an example of a UNICODE program that can do I/O with Thai characters.
I'm operating under the following constraints, none of which can be changed (at least for this semester):
The program must run on Windows 7
The program must be in C (not C++)
We must use Dev-C++ (v. 4.9.9.3) as our IDE (I'm going to try and convince the admins to change for next semester, but they may not want to)
The program should output to the Command Line (I'd like it to "look like" the programs we've been writing so far)
I want it to be easy to set up and run, though I'm not opposed to including a Batch file to do some setup work for the kids.
Here's how far I've gotten, and the questions I have:
In Control Panel > Regions > Administrative > Language for non-UNICODE programs is set to Thai.
I used "chcp 874" to set the Thai codepage in the Command Line, but characters typed on the keyboard appear as garbage. Is this maybe because the keyboard mappings are wrong, or do I have to change something else?
I wrote a program with the line: printf("\u0E01\n"); which prints ก, the first letter in the Thai alphabet. Is that the right syntax?
I received a compiler warning that "Universal Characters are only supported in C++ and C99." Does Dev-C++ not compile to C99? Is there a way I could get a C99 compiler for it?
I ran the code and got garbage characters. I imagine this could be because of the compiler, the command line, or any number of other things.
I'd love to end this course with a program that outputs สวัสดีโลก, the Thai equivalent of "Hello World!" I've done tons of googling, but every answer I've found either doesn't work in this specific case or involved a different IDE.
Ok, here's my bit of help. I don't use Dev-C++ as my IDE, so I can't help you with IDE specific things, but the following is standard to most C/C++ compilers:
wprintf is the printf implementation for wide characters (unicode).
When using wide characters you will use wchar_t in place of char for defining strings.
so you might do something like this
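#include <stdio.h>
#include <stdlib.h>   /* for system() */
#include <wchar.h>
int main(void) {
    /* The L prefix makes this a wide string literal of wchar_t elements. */
    const wchar_t *str = L"สวัสดีโลก";
    /* %ls is the standard conversion for a wchar_t string. */
    wprintf(L"%ls\n", str);
    system("pause");  /* Windows only: keeps the console window open */
    return 0;
}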
wprintf is most likely what you're looking for.
Other functions for printing and manipulating wide strings can be found by researching the wchar.h header file.
Reference:
wprintf - C++ Reference
Using L before the quotations means you intend to define a wide string. (unicode)
Hope that helps,
-Dave
I have never used DEV-C++ IDE :-) However, after reading up on it a bit I see that
Dev-C++ version 4.9.9.3 uses the gcc-3.5.4 MinGW port, for which universal character support has the status "Done"; see http://gcc.gnu.org/gcc-3.4/c99status.html for details. You have to change the IDE configuration so that -std=c99 is part of the compiler flags.
Hopefully that will do the trick.
I will try to fiddle with it on my own system and see how far we can get. Will update the answer if I find more clues :-)
If you need to change the code page in a console C program, you can add the header <stdlib.h> and the line system("CHCP 874"); at the beginning of the program.
If you need a free compiler conforming with C99 under windows, you can try Pelles C:
http://www.christian-heffner.de/index.php?page=download&lang=en
It conforms fully to C99.
You have to use wide-string constants, that have the following syntax:
L"Wide string\n"
Instead of printf(), you need to use wprintf() and the like.
http://pubs.opengroup.org/onlinepubs/7908799/xsh/wchar.h.html
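Putting these pieces together, here is a rough sketch that spells the greeting with universal character names (\uXXXX), so the source file stays pure ASCII and does not depend on how the editor saves it. This needs a C99-capable compiler mode, the function names are made up, and whether the console actually shows Thai glyphs still depends on the code page and font:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* "Hello World" in Thai as a wide string, one \u escape per code point. */
const wchar_t *thai_hello(void)
{
    return L"\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35\u0E42\u0E25\u0E01";
}

/* Print the greeting. Returns what wprintf returns, which is negative
   when the text cannot be converted for the current locale. */
int print_thai_hello(void)
{
    setlocale(LC_ALL, "");   /* adopt the user's locale for the conversion */
    return wprintf(L"%ls\n", thai_hello());
}

Calling print_thai_hello() from main and checking for a negative return value gives the students a quick way to tell "conversion failed" apart from "console shows the wrong glyphs".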

Resources