What does \X mean in C

I know that \x introduces the hexadecimal representation of an ASCII character in a literal:
printf("%c",'\x41'); // A
printf("%c",'\X41'); // 1
Why does the second line print 1?
reference: http://c.comsci.us/etymology/literals.html (see most bottom table)

This is not standard, so results will vary.
Given that \X is not a valid escape sequence, your particular compiler is choosing the last character in the single quotes and using that.
So, if you did: '\X12345' you would probably get 5. There is nothing special about \X. If you take that out of your dodgy character literal, you'll get the same result.

Try compiling with the -Wall option.
aaa.c: In function ‘main’:
aaa.c:6:14: warning: unknown escape sequence: '\X' [enabled by default]
aaa.c:6:14: warning: multi-character character constant [-Wmultichar]
The first message shows that \X is not recognized as an escape sequence.
The second shows that the literal is then treated as a multi-character character constant.
As @paddy said, the last character in the single quotes is used, so the character '1' is printed.
Set of escape sequences:
\a    alert (bell) character    \\      backslash
\b    backspace                 \?      question mark
\f    formfeed                  \'      single quote
\n    newline                   \"      double quote
\r    carriage return           \ooo    octal number
\t    horizontal tab            \xhh    hexadecimal number
\v    vertical tab

The first line, with \x41, is correct: 0x41 is the ASCII value of 'A', and so 'A' is printed.
The second line is invalid, as \X (upper case) is not a standard escape sequence. What happens now is dependent on your compiler; GCC warns about the unknown escape sequence, treats the literal as a multi-character constant, and ends up printing only the last character, '1'.
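As a minimal sketch of the valid spellings, assuming an ASCII execution character set, the helper below (its name is made up for illustration) checks that the lowercase hexadecimal escape, the octal escape, and the plain literal all denote the same value. The uppercase \X form is deliberately left out because its result is compiler-specific.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when the three spellings of 'A' agree.
   Assumes an ASCII execution character set, where 'A' is 0x41 (octal 101). */
int spellings_of_A_agree(void)
{
    return '\x41' == 'A' && '\101' == 'A' && 'A' == 65;
}

/* Prints the character written with the lowercase \x escape. */
void print_hex_escape(void)
{
    printf("%c\n", '\x41');
}
```

Calling print_hex_escape() on an ASCII system prints the letter A followed by a newline.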

Related

C - format specifier for scanf?

float lat, lon;
char info[50];
scanf("%f, %f, %49[^\n]", &lat, &lon, info);
In the above snippet, what kind of format specifier is %49[^\n]?
I do understand that it is the format specifier for the character array, which is going to accept input of up to 49 characters (plus the sentinel \0), and [^\n] looks like it's a regex (although I had read somewhere that scanf doesn't support regex) or a character set that expands to "any character" that is not a newline \n. Am I correct?
Also, why is there no s in the format specifier for writing into array info?
The program this snippet is from works. But is this good C style?
The specifier %[ is a different conversion specifier from %s, even if it also must be paired with an argument of type char * (or wchar_t *). See e.g. the table here
[set] matches a non-empty sequence of characters from the set of characters.
If the first character of the set is ^, then all characters not in the set are matched. If the set begins with ] or ^] then the ] character is also included into the set. It is implementation-defined whether the character - in the non-initial position in the scanset may be indicating a range, as in [0-9]. If width specifier is used, matches only up to width. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters).
My apologies, I incorrectly answered below. If you can skip to the end, I'll give you the correct answer.
*** Incorrect Answer Begins ***
It would not be a proper format specifier, as there is no type.
%[parameter][flags][width][.precision][length]type
are the rules for a format statement. As you can see, the type is non-optional. The author of this format item is thinking they can combine regex with printf, when the two have entirely different processing rules (and printf doesn't follow regex's patterns).
*** Correct Answer Begins ***
scanf uses different format string rules than printf. Within scanf's man page is this addition to printf's rules:
[
Matches a nonempty sequence of characters from the specified set
of accepted characters; the next pointer must be a pointer to char,
and there must be enough room for all the characters in the string,
plus a terminating null byte. The usual skip of leading white space is
suppressed. The string is to be made up of characters in (or not in) a
particular set; the set is defined by the characters between the open
bracket [ character and a close bracket ] character. The set excludes
those characters if the first character after the open bracket is a
circumflex (^). To include a close bracket in the set, make it the
first character after the open bracket or the circumflex; any other
position will end the set. The hyphen character - is also special;
when placed between two other characters, it adds all intervening
characters to the set. To include a hyphen, make it the last character
before the final close bracket. For instance, [^]0-9-] means the set
"everything except close bracket, zero through nine, and hyphen". The
string ends with the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
This basically means that scanf supports a small subset of regex-like matching (character sets), but not regular expressions in general.
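To see the scanset at work without typing input, here is a small sketch that applies the same format string to a string via sscanf. The function name parse_position and the sample input are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parses "lat, lon, free text" from a string.
   Returns the number of fields sscanf converted (3 on full success). */
int parse_position(const char *line, float *lat, float *lon, char info[50])
{
    /* %49[^\n] stores at most 49 characters that are not a newline,
       then appends the terminating null byte. */
    return sscanf(line, "%f, %f, %49[^\n]", lat, lon, info);
}
```

Given the input "42.5, -71.25, near the harbour" followed by a newline and more text, the third field receives "near the harbour" and everything from the newline onward is left unread.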

\\\ prints just one backslash?

In C, writing three backslashes, like so:
#include <stdio.h>
int main() {
    printf(" \\\ ");
}
prints out just one backslash in the output. Why and how does this work?
That sequence is:
A space
A double backslash, which encodes a single backslash in the runtime string
A backslash followed by a space, which is not a standard escape sequence and should give you a diagnostic
The C11 draft says (in note 77):
The semantics of these characters were discussed in 5.2.2. If any other
character follows a backslash, the result is not a token and a diagnostic is
required.
On godbolt.org I got:
<source>:8:14: warning: unknown escape sequence '\ ' [-Wunknown-escape-sequence]
So you seem to be using a non-conforming compiler, which chooses to implement undefined backslash sequences by just letting the character through.
That is printing:
a space
a backslash
an "escaped space"
The third backslash is interpreted together with the space that follows it, as "backslash space".
C11; 6.4.4.4 Character constants:
The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\.
So, to represent a single backslash, it's necessary to place a double backslash \\ in the source code. To print two backslashes you need four: \\\\. In your code the extra \ is followed by a space character, which isn't a valid escape sequence.
It is indeed a very simple operation in C. A \ begins an escape sequence. Hence the statement below will print two backslashes.
printf(" \\\\ ");
For example, some characters in C are represented with a backslash, like the end-of-line character \n or the end-of-string character \0. But what do you do if you want to print such a sequence as it is? You need to put an extra backslash in front of it:
printf("\\n"); // will print \n
But
printf("\n"); // will print a newline character, hence you don't see anything visible in the output
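One way to check how many characters a literal really holds is to measure it with strlen. The helper names below are made up for this sketch. Each returns the number of characters stored for a literal discussed above.

```c
#include <string.h>

/* Four backslashes in the source encode two backslash characters. */
size_t len_two_backslashes(void)  { return strlen("\\\\"); }

/* An escaped backslash followed by the letter n: two characters. */
size_t len_backslash_then_n(void) { return strlen("\\n"); }

/* The newline escape sequence: one character. */
size_t len_newline(void)          { return strlen("\n"); }
```

The first two helpers return 2 and the last returns 1, which matches the rule that every backslash in the output costs two in the source.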

Multi character constant warning for escaped \t

If I write putchar('\\t'); while trying to print "\t" instead of an actual tab, I get the multi character constant warning. On the other hand, if I write putchar('\\'); I get no warning. Upon looking in the ASCII table, there is no character '\\', only '\'. So why is there no warning? Why is '\\' one character but '\\t' is more than one? Can a backslash only be used to escape one following character?
You cannot print \ and t with one putchar invocation, since putchar writes exactly one character to the standard output. Use 2:
putchar('\\');
putchar('t');
Another option would be to use the string "\\t" with fputs:
fputs("\\t", stdout);
There is no warning for '\\' because that is how you write the character literal for the character \. In ASCII this is synonymous with '\134' and '\x5c'.
From C11 6.4.4.4 paragraphs 2 and 4:
2
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'. [...] With a few exceptions detailed later, the elements of the sequence are any members of the source character set; they are mapped in an implementation-defined manner to members of the execution character set.
[...]
4
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
The reason why you get a warning for this is that the behaviour is wholly implementation-defined. In C11 J.3.4 the following is listed as implementation-defined behaviour:
The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character (6.4.4.4).
Since '\\' contains an escape sequence that maps to a single-byte execution character \, there are no implementation-defined pitfalls and nothing to warn about; but '\\t' contains 2 characters, \ and t, and it wouldn't do what you want portably.
\\ is one character, t is one character, so that is clearly two characters.
\\ is an escape sequence, just like \t; it means \.
If you want to print the two characters \ and t, you clearly need either two calls to putchar() or a function that takes a string argument, such as "\\t".
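A short sketch of both points, assuming ASCII for the numeric comparison: '\\' is a single character constant, and writing \ followed by t takes two single-character writes. The function names are made up for illustration.

```c
#include <stdio.h>

/* '\\' is one character constant: the escape sequence for a backslash. */
int backslash_constant(void)
{
    return '\\';
}

/* Writes a backslash followed by the letter t, one character per call.
   Returns 0 on success, EOF if either write fails. */
int put_backslash_t(FILE *out)
{
    if (putc('\\', out) == EOF) return EOF;
    if (putc('t', out) == EOF) return EOF;
    return 0;
}
```

Calling put_backslash_t(stdout) writes the two characters \ and t, with no tab and no multi-character constant involved.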
https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences

sizeof(string) not including a "\" sign

I've been going through strlen and sizeof for strings (char arrays) and I don't quite get one thing.
I have the following code:
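#include <stdio.h>
#include <string.h>

int main() {
    char str[]="gdb\0eahr";
    /* sizeof and strlen both yield size_t, which is printed with %zu */
    printf("sizeof=%zu\n",sizeof(str));
    printf("strlen=%zu\n",strlen(str));
    return 0;
}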
the output of the code is:
sizeof=9
strlen=3
At first I was pretty sure that 2 separate characters \ followed by 0 wouldn't actually act as a NUL (\0) but I managed to figure that it does.
The thing is that I have no idea why sizeof shows 9 and not 10.
Since sizeof counts the amount of used bytes by the data type why doesn't it count the byte for the \?
In a following example:
char str[]="abc";
printf("sizeof=%zu\n",sizeof(str));
That would print out "4" because of the NUL value terminating the array, so why is the \ not being counted?
In a character or string constant, the \ character marks the beginning of an escape sequence, used to represent character values for which there isn't a symbol in the source character set. For example, the escape sequence \n represents the newline character, \b represents the backspace character, \0 represents the zero-valued character (which is also the string terminator), etc.
In the string literal "gdb\0eahr", the escape sequence \0 maps to a single 0-valued character; the actual contents of str are {'g', 'd', 'b', 0, 'e', 'a', 'h', 'r', 0}.
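The byte layout described above can be checked directly. This is a small sketch with made-up helper names, using the same literal as the question.

```c
#include <string.h>

static const char str[] = "gdb\0eahr";

/* Bytes in the array: 8 characters from the literal plus the implicit terminator. */
size_t array_size(void)    { return sizeof str; }

/* strlen stops at the first zero byte, which is the embedded one after "gdb". */
size_t string_length(void) { return strlen(str); }

/* The byte after "gdb" is zero, and 'e' follows it. */
int embedded_zero_ok(void) { return str[3] == 0 && str[4] == 'e'; }
```

Here array_size() yields 9 and string_length() yields 3, matching the output shown in the question.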
It seems you already have the answer:
At first I was pretty sure that 2 separate characters "\" followed by "0" wouldn't actually act as a NULL "\0" but I managed to figure that it does.
The sequence \0 is an octal escape sequence for the byte 0. So while there are two characters in the code to denote this, it translated to a single byte in the string.
So you have 7 alphabetic characters, a null byte in the middle, and a null byte at the end. That's 9 bytes.
Why should char str[]="gdb\0eahr"; be 10 bytes with sizeof operator? It is 9 bytes because there are 8 string elements + trailing zero.
\0 is only 1 character, not 2. \'s purpose is to escape characters, therefore you might see some of these: \t, \n, \\ and others.
strlen returns 3 because the string terminator is at position str[3].
A single \ acts as the escape character and is not itself counted in the string's size. If you want a literal \ in your string, you have to write it twice, as \\, which then yields a single printable \ character.
The C compiler scans text strings as a part of compiling the source code, and during the scan any special escape sequences of characters are turned into a single character. The backslash symbol (\) is used to indicate the start of an escape sequence.
There are several formats for escape sequences. The most basic is a backslash followed by one of several special letters. These two characters are then translated into a single character. Some of these are:
'\n' is turned into a line feed character (0x0A or decimal 10)
'\t' is turned into a tab character (0x09 or decimal 9)
'\r' is turned into a carriage return character (0x0D or decimal 13)
'\\' is turned into a backslash character (0x5C)
This escape sequence idea was used back in the old days so that when a line of text was printed to a teletype machine or printer or a CRT terminal, the programmer could use these and other special command code characters to set where the next character would be printed or to cause the device to do some physical action like ring a bell or feed the paper to the next line.
The escape character also allowed you to embed a double quote (") or a single quote (') into a text string so that you could print text that contain quote marks.
In addition to the above special sequences of a backslash followed by a letter, there was also a way to specify any character by using a backslash followed by one to three octal digits (0 through 7). So you could specify a line feed character either by using '\n' or by using '\12', where 12 is the octal representation of the hexadecimal value A or the decimal value 10.
Then the ability to use a hexadecimal escape sequence was introduced with the backslash followed by the letter x followed by one or more hexadecimal digits. So you can write a line feed character with '\n' or '\12' or '\xa'.
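The digit limits matter when ordinary characters follow a numeric escape. The sketch below (helper names are made up, ASCII assumed for the newline comparison) shows that an octal escape stops after three digits, while a hexadecimal escape keeps consuming hex digits until the literal is split.

```c
#include <string.h>

/* An octal escape uses at most three digits, so "\1234" is the
   character with octal value 123 followed by the digit '4'. */
size_t octal_then_digit_length(void) { return strlen("\1234"); }

/* A hex escape takes every hex digit that follows it. Splitting the
   literal ends the escape, so this is 0x41 followed by 'b'. */
size_t hex_then_letter_length(void) { return strlen("\x41" "b"); }

/* The three spellings of the line feed character agree (ASCII assumed). */
int newline_spellings_agree(void) { return '\n' == '\12' && '\12' == '\xa'; }
```

Both length helpers return 2. Writing the second literal as one piece would make the b part of the hexadecimal escape instead of a separate character.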
See also Escape sequences in C in Wikipedia.

Concatenating two characters to create escape sequence

The idea is to try to create escape sequences the 'human' way. For example, I use two characters to create '\n': the '\' and the 'n'.
What I'm thinking about is char array[3]={'\\','n','\0'};
so I can change 'n' character and still use it as an escape sequence.
When I printf(array) it now prints:
\n
and I'd like it to go to next line.
For example:
what if I wanted to check manually what every letter in alphabet does when used as escape sequence with a loop?
for(char c='a';c<='z';c++)
{
/* create escape sequence with that letter */
/* print that escape sequence and see what it does */
}
It's not an assignment and has no practical use (at least not yet); it's just a theoretical question whose answer I couldn't find anywhere or figure out myself.
The escape sequence represents a single character and is evaluated at compile time. You cannot have a literal string interpreted as an escape sequence at run time.
For example '\n' is a newline (or line-feed character - 0x0A in ASCII)
Note that:
char array[3]={'\\','n','\0'};
is equivalent to:
char array[3] = "\\n" ;
so perhaps unsurprisingly when you printf(array) it prints \n - that is what you have asked it to do.
Undefined escape sequences require a diagnostic from the compiler (GCC, for example, warns about an unknown escape sequence), so you might simply write:
char c;
c = '\a' ;
c = '\b' ;
... // etc.
and see which lines the compiler baulks at. However that is not the complete story because some escape sequences require operands, for example \x on its own has no meaning, whereas \xab is the character represented by 0xab (171 decimal). Others are not even letters. Most are related to white-space, and their effect may be dependent on the terminal or console capabilities of the execution platform. So a naive investigation may not generate the results you seek, because it does not account for the language semantics or platform capabilities.
All supported escape sequences are in fact well defined - you'll find few surprises except perhaps those related to platform capabilities (for example if your target has no means to generate a beep, \a will have no useful effect):
\a Beep
\b Backspace
\f Form-feed
\n Newline
\r Carriage return
\t Horizontal tab
\v Vertical tab
\\ Backslash
\' Single quotation mark
\" Double quotation mark
\0 ASCII 0x00 (null terminator)
\ooo Octal representation
\xdd Hexadecimal representation
What about writing your own printf()?
There you can check for a '\' followed by an 'n', print only the characters that come before that pair, and finally call printf("\n");
Regards
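Since the compiler only translates escape sequences inside literals, doing it at run time means writing the translation yourself. The following is a minimal sketch of that idea, not a complete implementation: the names simple_escape and unescape are made up, and only the single-letter escapes are handled (no octal or hexadecimal forms).

```c
#include <stddef.h>

/* Hypothetical helper: maps the letter after a backslash to the character
   the compiler would produce for that simple escape; returns -1 otherwise. */
int simple_escape(char letter)
{
    switch (letter) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    default:  return -1;
    }
}

/* Copies src to dst, replacing backslash-letter pairs with the matching
   character. dst must have room for strlen(src) + 1 bytes. Pairs that are
   not simple escapes are copied unchanged. Returns the length written. */
size_t unescape(char *dst, const char *src)
{
    size_t n = 0;
    while (*src != '\0') {
        int c;
        if (src[0] == '\\' && src[1] != '\0' && (c = simple_escape(src[1])) != -1) {
            dst[n++] = (char)c;
            src += 2;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

With the array from the question, {'\\','n','\0'}, unescape produces a one-character string holding a newline, and looping simple_escape over 'a' to 'z' shows which letters form a simple escape.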

Resources