I'm trying to replicate some printf functionality for educational purposes, but I've run into some printf behavior that I can't understand. I'll try to explain with a simple example.
I have this call:
printf(" %c %c %c", 0, 1, 2); // yes, parameters are ints not chars.
The output seems normal, only 3 spaces, numbers are ignored.
But taking printf output to a file, then using "cat -e file" does this:
^@ ^A ^B
^@ for 0, ^A for 1, ^B for 2, and so on.
My question is: what are those symbols, and how do they relate to the values?
My own printf does this as well, except for the 0, which is treated as a '\0' char. I need to mimic printf exactly, so I need to understand what is going on there.
I've searched for those symbols but can't find anything. They are not memory garbage, because the results are always the same.
The -e option to cat tells it to use a particular notation for printing non-printable characters. ^@ represents the value 0, ^A represents the value 1, and ^B represents the value 2. Those are exactly the values you gave.
cat simply uses caret notation to display non-printable characters. ^A represents 1 and ^Z represents 26. ^@ is 0, and the rest are:
27 - ^[
28 - ^\
29 - ^]
30 - ^^
31 - ^_
127 - ^?
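The mapping in the list above is mechanical: for codes 0 to 31, cat prints ^ followed by the character whose code is 64 higher, and DEL (127) becomes ^?. Below is a minimal sketch of that rule, assuming 7-bit input; caret_notation is a hypothetical helper name. It does not copy GNU cat -v's other rules (cat -v leaves TAB and newline alone and gives bytes above 127 an M- prefix).

```c
/* Hypothetical helper: writes the caret notation of a 7-bit byte into
   out (3 bytes is enough) and returns out. Assumes c < 128. */
const char *caret_notation(unsigned char c, char out[3])
{
    if (c < 32) {                 /* control codes 0..31 -> ^@ .. ^_ */
        out[0] = '^';
        out[1] = (char)(c + '@'); /* '@' is 64, so 0 -> '@', 1 -> 'A' */
        out[2] = '\0';
    } else if (c == 127) {        /* DEL -> ^? */
        out[0] = '^';
        out[1] = '?';
        out[2] = '\0';
    } else {                      /* 32..126: the character itself */
        out[0] = (char)c;
        out[1] = '\0';
    }
    return out;
}
```

Flipping bit 6 (c ^ 0x40) gives the same second character for both ranges, which is why DEL shows up as ^?.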
What I meant is: why does printf print ^@ when I'm getting '\0'?
printf("%c", 0); prints the null character to stdout. What you see when viewing stdout is not part of the output itself, but an implementation detail of the shell/terminal program. Print the return value of printf() to see how many characters it believes it printed, perhaps on stderr to keep it separate from stdout.
int count = printf(" %c %c %c", 0, 1, 2);
fprintf(stderr, "<%d>\n", count);
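The same count can be checked without a terminal in the way, because snprintf applies printf's formatting rules but writes into an array. The sketch below (format_with_nul is a hypothetical name) formats the question's arguments and reports both the returned count and what strlen() sees in the result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the question's arguments into buf
   (at least 7 bytes) and returns snprintf's count; *c_string_len
   receives what strlen() reports for the same buffer. */
int format_with_nul(char *buf, size_t size, size_t *c_string_len)
{
    /* %c with the argument 0 writes a real 0 byte, and it is counted */
    int count = snprintf(buf, size, " %c %c %c", 0, 1, 2);
    /* strlen() stops at the first 0 byte, i.e. right after the space */
    *c_string_len = strlen(buf);
    return count;
}
```

For this call the count is 6 while strlen() reports 1, so any code that walks the result until '\0' stops after the first space.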
The output seems normal, only 3 spaces, numbers are ignored.
"Seems" deserves more detail. How was only 3 determined?
But taking printf output to a file, ...
What was the length of the file? 6?
... then using "cat -e file" does this:
Refer to @dbush's good answer for why you now see " ^@ ^A ^B".
"I'm using memset to replace the character." is unclear, as there is no memset() in the question.
Got it solved. Thanks to the explanations here, I realized that even though I was printing the resulting string with write(), I was using a pointer to iterate over it, so it never got a chance to pass over that null character. Passing the length explicitly instead:
write(STDOUT_FD, delta, d_length);
With that, write() does the job correctly:
make test > check.txt && cat -e check.txt
own: ^@ ^A ^B$
lib: ^@ ^A ^B$
Also, now I know about caret notation. Thanks, everyone!
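The fix above comes down to handing write() an explicit byte count instead of stopping at '\0'. Below is a minimal sketch of such a write loop, assuming POSIX write(); write_all is a hypothetical name. It also retries short writes, which write() is allowed to perform. STDOUT_FD in the code above is presumably a project-defined macro; POSIX names the descriptor STDOUT_FILENO.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly len bytes from buf to fd,
   0 bytes included. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;               /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A call such as write_all(STDOUT_FILENO, buf, count), with count taken from the formatting step rather than from strlen(), reproduces the library's output byte for byte.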
Related
I just had a question about C. I have a file that has text in the format of:
7034327874
5408438437
3267239807
1824566789
I was wondering how I could read in the data and print it to another file as:
703-432-7874
540-843-8437
326-723-9807
182-456-6789
Just consider these numbers as strings of characters (each character happens to be a digit). Then do character processing on them.
For example:
char str[16];
memset (str, 0, sizeof(str));
if (scanf(" %15[0-9]", str) > 0)
printf("%.3s", str);
should, if fed with 0734327574, output 073 (notice that your example doesn't explain what should happen in that case, and I am guessing one way of handling it. My guess could be wrong if 0734327574 is actually meant as an octal number for the decimal number 124891004).
The rest is an exercise for the reader. Of course you need to carefully read the documentation of memset, scanf, and printf. Don't forget to end printf format strings with \n or to call fflush on stdout (which is often line-buffered).
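For readers who want to check their work, here is one possible sketch, assuming every input number is exactly ten digits (the question does not say what to do otherwise). format_phone and convert_stream are hypothetical names; the core idea is the "%.3s" precision used in the snippet above, applied to three offsets into the same string.

```c
#include <stdio.h>
#include <string.h>

/* Turns a 10-digit string such as "7034327874" into "703-432-7874".
   Returns 0 on success, -1 if the input is not exactly ten digits
   or out is too small. */
int format_phone(const char *digits, char *out, size_t outsize)
{
    if (strlen(digits) != 10 || strspn(digits, "0123456789") != 10)
        return -1;
    /* "%.3s" prints at most 3 characters starting at the given pointer */
    int n = snprintf(out, outsize, "%.3s-%.3s-%.4s",
                     digits, digits + 3, digits + 6);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}

/* Reads digit runs from in and writes each formatted number to out,
   one per line. Stops at end of file or at the first token that is
   not a digit run. Returns the number of lines written. */
int convert_stream(FILE *in, FILE *out)
{
    char digits[16], formatted[16];
    int lines = 0;
    /* " %15[0-9]" skips whitespace, then reads up to 15 digits */
    while (fscanf(in, " %15[0-9]", digits) == 1) {
        if (format_phone(digits, formatted, sizeof formatted) == 0) {
            fprintf(out, "%s\n", formatted);
            lines++;
        }
    }
    return lines;
}
```

Treating the numbers as strings keeps leading zeros intact, which an integer conversion would lose.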
Remember that numbers don't have digits. Only their notation have digits. The number 20, written in Roman notation XX, in English twenty, in French vingt, in hexadecimal 0x14, in octal 024 (or even 24₈), in binary 10100, as the simple arithmetic expression 3*7-1, is still the same number (which happens to be twice the number of fingers I have on my hands, and is also the number of arrondissements in Paris).
Here is the code
#include <stdio.h>
int main()
{
printf ("\t%d\n",printf("MILAN"));
printf ("\t%c",printf("MILAN"));
}
Here is the output
$gcc -o main *.c
$main
MILAN 5
MILAN |
Now the question is
Why does printf output | when we print its return value as a character (with the %c format specifier)?
What is the relation between 5 and | here?
Your question really boils down to the behaviour of printf("%c", 5);.
The actual output is a function of the character encoding used by your platform. If it's ASCII (very common) then it will output the control character ENQ. What you actually get as an output will depend on your command shell, and a vertical bar is not an unreasonable choice.
printf returns the number of characters printed. In this case, that number is 5, as you've seen. The second printf passes that int to %c, which converts it to unsigned char and writes that byte; C does not stop you from doing that. On your computer, it shows up as a |. I see it rendered as a blank character. As @Bathsheba says, the integer 5 corresponds to a control character in ASCII, and the rendering for those is system-dependent.
Here's an ASCII table, if you're curious about other numbers.
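Instead of consulting a table by hand, the relation between a small integer and its control character can be looked up in code. A minimal sketch, assuming ASCII; ascii_control_name is a hypothetical helper that returns the standard abbreviation for codes 0 to 31 and 127.

```c
#include <stddef.h>

/* Returns the standard ASCII abbreviation for control codes 0-31 and
   127, or NULL for any other value. */
const char *ascii_control_name(int c)
{
    static const char *const names[32] = {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US"
    };
    if (c >= 0 && c < 32)
        return names[c];
    if (c == 127)
        return "DEL";
    return NULL;                /* printable or outside 7-bit ASCII */
}
```

For example, printf("%d is %s\n", 5, ascii_control_name(5)) prints "5 is ENQ", the character the MILAN example sends to the terminal.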
I have this piece of C code
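#include <stdio.h>

int main(void)
{
    int x;
    int c; /* getchar() returns int so that EOF stays distinct from every byte */
    for (x = 1; (c = getchar()) != EOF; printf("%d ", x++))
        ; /* empty body: the loop header does all the work */
    return 0;
}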
It prints x from 1 to n, where n is the number of characters read from standard input by getchar().
For normal input such as the single character 'a', it reads 2 characters, 'a' and '\n', so the count is 2. I understand that.
But when I press PgUp or PgDn on the keyboard, the terminal shows "^[[5~" or "^[[6~", so I expected the count to be 6:
5 from "^[[5~" and 1 for the '\n' terminator.
But the count is only 5. So what is actually happening?
How is ^[[5~ only 4 characters?
I thought '~' might be treated as a sentinel character or something, but if I enter "1~" the count is 3.
Friendly note: I am in danger of being blocked from asking anymore. I searched for the answer but couldn't find it. If you think this question should be removed, please tell me in the comments.
The caret character (^) is traditionally used to print ASCII control characters, which don't have a standard visual representation. ^[ is a single character, ESC (0x1b).
As a note, you could print out the numeric values for each of the characters you read to see if there are any surprises (in this case, you'd find the low value that doesn't match the display).
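Following that suggestion, here is a minimal sketch that prints the decimal value of every byte read; dump_byte_values is a hypothetical name. Fed the bytes the question describes for PgUp followed by Enter (ESC [ 5 ~ and a newline; the exact sequence depends on the terminal), it writes five values, the first being 27, the ESC that the terminal displays as ^[.

```c
#include <stdio.h>

/* Writes the decimal value of each byte from in to out, separated by
   spaces and followed by a newline. Returns the number of bytes read. */
long dump_byte_values(FILE *in, FILE *out)
{
    long count = 0;
    int c; /* int, not char, so EOF cannot be confused with a real byte */
    while ((c = getc(in)) != EOF) {
        if (count > 0)
            fputc(' ', out);    /* separator between values */
        fprintf(out, "%d", c);  /* getc returns the byte as 0..255 */
        count++;
    }
    fputc('\n', out);
    return count;
}
```

Calling dump_byte_values(stdin, stdout) and pressing PgUp, Enter, then the end-of-file key shows the raw bytes directly.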
I have a character in a char array that I read with fgets(). It contains a character that strlen() counts, so I decided to print the int value of this char to see where the problem is.
As a char I can see nothing. I thought it was whitespace, but I'm not sure. I would like someone to tell me what it is and explain why it is there.
printf("%d", (int) input[6]); // prints the value 10
The value 10 is the ASCII value for the newline character (LF, or linefeed). Closely related is character 13, which is CR, or carriage return, which, on Windows systems, often precedes the LF character. I would suggest getting a copy of the ASCII table (they're all over the web) and referencing it from time to time.
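Character 10 can be represented by '\n' in C code, as well as '\012' and '\x0a'.
Character 13 (carriage return) can be represented by '\r', '\015', and '\x0d'. (Spellings such as '\u000a' are not valid in C11, which forbids universal character names below U+00A0 other than $, @ and `.)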
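Assuming the array was filled by a line-reading function such as fgets(), which keeps the line terminator, a common fix is to cut the string at the first CR or LF. A minimal sketch; chomp is a hypothetical helper name.

```c
#include <string.h>

/* Cuts s at the first CR or LF, which for a line read by fgets() is
   the line terminator ("\n", or "\r\n" from a Windows file).
   Returns the new length, i.e. what strlen() reports afterwards. */
size_t chomp(char *s)
{
    size_t len = strcspn(s, "\r\n"); /* index of first CR/LF, or strlen */
    s[len] = '\0';
    return len;
}
```

After chomp(input), strlen(input) no longer counts the newline.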
It is the newline character (LF, line feed, also called NL, new line) in ASCII. An ASCII table lists all of the values.
As already pointed out by the others, the character 10 in ASCII is LF (line feed).
If you wanted printf to output the character itself (not its ordinal value), you could use the %c format specifier to print a single character.
Example:
printf("-%c-", input[6]);
should yield:
-
-
I.e. two dashes separated by a line feed. Keep in mind that on Windows the outcome depends on how your C runtime handles a single LF without CR, since a line break there is customarily represented by CRLF rather than just LF, which is the standard on Unix-like systems. The notable exception was classic Mac OS, which used only CR to encode a line break.
Background: My current project is coding a tokenizer using C.
The program takes a single string as a command-line argument. It ignores whitespace, then reads each character from left to right and categorizes valid tokens by type as they come. Tokens that are found not to be valid are output as "error".
If an escape character is encountered, it should be labelled as such, and then its value should be printed in hex.
Since the program reads the characters in the string one at a time, I believe it is reading '\' alone instead of '\t' together as a single character. I've tried using a lookahead character:
if(currentCharacter == '\')
{
//then using an index-lookahead with strcat to combine '\' + 't'
}
but I'm not getting the output I expect.
I've also tried using:
if( iscntrl(currentCharacter) ){
printf("escapeChar [0x%02x]\n", currentCharacter);
}
But like I said, since the string is reading one character at a time, the output is not correct. I'm still learning C, so any guidance would be very appreciated. Let me know if any clarifications are needed.
Example Input:
" Hello2 0x234 2asdf \t"
Example Output:
word "Hello2"
hex "0x234"
error "2asdf"
escapeChar "0x09"
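In a C string literal, \t is 1 character, not 2: the tab character, with ASCII code 0x09. A command-line argument is not a C literal, though. A shell such as bash passes "\t" inside double quotes unchanged, as a backslash followed by t, so in that case your program really does see two characters.
If you want to check whether a character is equal to \ in C, you have to escape it, e.g.: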
if (currentCharacter == '\\')
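Since the argument may contain either a real tab byte or, if the shell passed \t through unchanged, a backslash followed by t, a tokenizer can accept both forms. A minimal sketch; escape_value and format_escape are hypothetical helpers, and only a few backslash sequences are handled as an illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* If p points at an escape, returns its character code and stores in
   *len how many input characters it used; otherwise returns -1.
   Accepts a real control character (e.g. the tab byte from a C
   literal "\t") or a two-character sequence such as '\\' 't'. */
int escape_value(const char *p, int *len)
{
    unsigned char c = (unsigned char)p[0];
    if (c != '\0' && iscntrl(c)) {   /* the terminator is not an escape */
        *len = 1;
        return c;
    }
    if (c == '\\') {                 /* lookahead at the next character */
        switch (p[1]) {
        case 't':  *len = 2; return '\t';
        case 'n':  *len = 2; return '\n';
        case 'r':  *len = 2; return '\r';
        case '\\': *len = 2; return '\\';
        default:   break;
        }
    }
    return -1;
}

/* Formats code in the example output's style: "0x" plus at least two
   lowercase hex digits. Returns snprintf's count. */
int format_escape(int code, char *out, size_t size)
{
    return snprintf(out, size, "0x%02x", (unsigned)code);
}
```

When escape_value returns a code, the tokenizer can print escapeChar followed by the quoted result of format_escape, which is "0x09" for a tab, and then advance by *len characters.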