How to write ascii symbol into char table? - c

Let's assume that I've got such a table:
char *table = "abcdef\n";
How can I put '\n' as decimal subsituide (eg. in hex I put \x0D, in oct I put \015, but what about decimal?)

'/n' is not a character control, but a multi-character constant, so I assume you're talking about '\n' instead.
There is no escape notation in decimal, so simply use \n, the octal (\012 or \12) or the hexadecimal (\x0a) escape notation.
Learn more about escape sequences.

Or you can define one Macro #define NEW_LINE "\n".
then use char *table = "abcdef"NEW_LINE;
And if you cout <<table; out put will be:
abcdef
_
It means the value of table is "abcdef\n"

Related

sizeof(string) not including a "\" sign

I've been going through strlen and sizeof for strings (char arrays) and I don't quite get one thing.
I have the following code:
int main() {
char str[]="gdb\0eahr";
printf("sizeof=%u\n",sizeof(str));
printf("strlen=%u\n",strlen(str));
return 0;
}
the output of the code is:
sizeof=9
strlen=3
At first I was pretty sure that 2 separate characters \ followed by 0 wouldn't actually act as a NUL (\0) but I managed to figure that it does.
The thing is that I have no idea why sizeof shows 9 and not 10.
Since sizeof counts the amount of used bytes by the data type why doesn't it count the byte for the \?
In a following example:
char str[]="abc";
printf("sizeof=%u\n",sizeof(str));
that would print out "4" because of the NUL value terminating the array so why is \ being not counted?
In a character or string constant, the \ character marks the beginning of an escape sequence, used to represent character values for which there isn't a symbol in the source character set. For example, the escape sequence \n represents the newline character, \b represents the backspace character, \0 represents the zero-valued character (which is also the string terminator), etc.
In the string literal "gdb\0eahr", the escape sequence \0 maps to a single 0-valued character; the actual contents of str are {'g', 'd', 'b', 0, 'e', 'a', 'h', 'r', 0}.
It seems you already have the answer:
At first I was pretty sure that 2 separate characters "\" followed by "0" wouldn't actually act as a NULL "\0" but I managed to figure that it does.
The sequence \0 is an octal escape sequence for the byte 0. So while there are two characters in the code to denote this, it translated to a single byte in the string.
So you have 7 alphabetic characters, a null byte in the middle, and a null byte at the end. That's 9 bytes.
Why should char str[]="gdb\0eahr"; be 10 bytes with sizeof operator? It is 9 bytes because there are 8 string elements + trailing zero.
\0 is only 1 character, not 2. \'s purpose is to escape characters, therefore you might see some of these: \t, \n, \\ and others.
Strlen returns 3 because you have string termination at position str[3].
Single sequence of \ acts as escape character and is not part of string size. If you want to literally use \ in your string, you have to write it twice in sequence like \\, then this is single char of \ printable char.
The C compiler scans text strings as a part of compiling the source code and during the scan any special, escape sequences of characters are turned into a single character. The symbol backslash (\) is used to indicate the start of an escape sequence.
There are several formats for escape sequences. The most basic is a backslash followed by one of several special letters. These two characters are then translated into a single character. Some of these are:
'\n' is turned into a line feed character (0x0A or decimal 10)
'\t' is turned into a tab character (0x09 or decimal 9)
'\r' is turned into a carriage return character (0x0D or decimal 13)
'\\' is turned into a backslash character (0x5C)
This escape sequence idea was used back in the old days so that when a line of text was printed to a teletype machine or printer or a CRT terminal, the programmer could use these and other special command code characters to set where the next character would be printed or to cause the device to do some physical action like ring a bell or feed the paper to the next line.
The escape character also allowed you to embed a double quote (") or a single quote (') into a text string so that you could print text that contain quote marks.
In addition to the above special sequences of backslash followed by a letter there was also a way to specify any character by using a backslash followed by one up to three octal digits (0 through 7). So you could specify a line feed character by either using '\n' or you could use '\12' where 12 is the octal representation of the hexadecimal value A or the decimal value 10.
Then the ability to use a hexadecimal escape sequence was introduced with the backslash followed by the letter x followed by one or more hexadecimal digits. So you can write a line feed character with '\n' or '\12' or '\xa'.
See also Escape sequences in C in Wikipedia.

Hex value in C string

I need to prepare constant array of ANSI C strings that contains bytes from range of 0x01 to 0x1a. I made custom codepage, so those values represents different characters (i.e. 0x09 represents Š). I'd like to initialise the array in that way:
static const char* brands[] = {
"Škoda",
//etc...
};
How can I put 0x09 instead of Š in "Škoda"?
Recommend not using "\x09koda", use octal or spaced strings.
The problem is that hexadecimal escape sequences are not limited in their length. So if the next char is a hexadecimal character, problems occur. Use octal, which is limited to 3. Or use separated strings. The compiler will concatenate then, but the escape sequence will not accidentally run too far.
// problematic
"\x09Czech"
^^^^^--- The escape sequence is \x09C, but \0x09 was hoped for
// recommend octal
"\0111234"
^^^^--- The escape sequence is \011
// recommend spaced strings
"\x09" "Czech"
Very simple
"Škoda" -> "\x09koda"
How can I put 0x09 instead of Š in "Škoda"?
"\x09koda"
Have a look at escape squences - i.e \x09
For hex escape you want to use the \Xnnn, for octal just \nnn and for unicode \Unnnn

Confused about C string constants

When I came across this C language implementation of Porters Stemming algorithm I found a C-ism I was confused about.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void test( char *s )
{
int len = s[0];
printf("len= %i\n", len );
printf("s[len] = %c\n", s[len] );
}
int main()
{
test("\07" "abcdefg");
return 0;
}
and output:
len = 7
s[len] = g
However, when I input
test("\08" "abcdefgh");
or any string constant that is longer than 7 with the corresponding length in the first pair of parenthesis ( i.e. test("\09" "abcdefghi"); the output is
len = 0
s[len] =
But any input like test("\01" "abcdefgh"); prints out the character in that position ( if we call the first character position 1 and not 0 for the moment )
It appears if test( char *s ) reads the number in the first pair of parenthesis ( how it does this I am not sure since I thought s[0] would be able to only read a single char, i.e. the '\' ) and prints the last character at that index + 1 of the string constant in the second pair of parenthesis.
My question is this: It seems as if we are passing two string constants into test( char *s ). What exactly is happening here, meaning, how does the compiler seem to "split" up the string over two pairs of parenthesis? Another question one might have is, is a string of the form "blah" "abcdefg" one consecutive block of memory? It may be the case that I have overlooked something elementary, but even so I would like to know what I overlooked. I know this is a basic concept but I could not find a clear example or situation on the web that explains this and in all honesty I don't follow the output. Any helpful comments are welcomed.
There are at least three things going on here:
Literal strings juxtaposed against one another are concatenated by the compiler. "a" "b" is exactly the same as "ab".
The backslash is an escape character, which means it is not copied literally into the resulting string. The notation \01 means "the character with ASCII value 1".
The notation \0... means an octal character constant. Octal numbers are base 8, made up from digits that range from 0 through 7 inclusive. 8 is not a valid octal constant, so "\08" does not follow "\07".
The problem is not in the length of the string, but in the \o syntax for specifying non-printable values in string literals. \o, \oo, and \ooo denote octal constants, i.e. a single character whose value is written in base 8. Since 08 in \08 doesn't represent a valid base 8 number, it is interpreted as \0 followed by the ASCII character 8.
To fix the problem, represent 8 as \10 or \010:
test("\007" "abcdefg");
test("\010" "abcdefgh");
...or switch to hexadecimal, where the \x prefix makes the base more explicit to the casual reader:
test("\x07" "abcdefg");
test("\x08" "abcdefgh");
test("\x09" "abcdefghi");
test("\x0a" "abcdefghij");
...
\number in a character or string literal is means the character whose code is the value number. number is interpreted in octal, so the first non-octal digit terminates the number. So "\07" is a one-character string containing the character with code 7, but \08 is a two-character string containing the character with code 0 followed by the digit 8.
Additionally, code 0 the null terminator that's used in C to indicate the end of the string. So that second string ends at the beginning, because its first byte is the terminator. This why the length of the string in your second example is 0.
When two or more string literals are adjacent (separated only by white-space), the compiler will join them into a single string. Therefore "\07" "abcdefg" is equivalent to "\07abcdefg".
"\07" is an octal escape. An octal escape ends after three digits or with first non-octal character. So, when you enter "\08", 8 is a non octal character therefore escape ends and 0 is stored at s[0].
Now, len is 0 and printing s[len] will try to print the character at s[0] which has a non printable ASCII code (Only character above ASCII value above 32 are printable).

Concatenating two characters to create escape sequence

Idea is to try to create escape sequences 'human' way. For example, I use two characters to create '\n', the '\' and 'n'.
What I'm thinking about is char array[3]={'\\','n','\0'};
so I can change 'n' character and still use it as an escape sequence.
When I printf(array) it now prints:
\n
and I'd like it to go to next line.
For example:
what if I wanted to check manually what every letter in alphabet does when used as escape sequence with a loop?
for(char='a';char<='z';char++)
{
/* create escape sequence with that letter */
/* print that escape sequence and see what it does */
}
It's not an assignment,has no practical use (at least not yet), but just a theoretical question that I couldn't find answer anywhere, nor figure it out myself.
The escape sequence represents a single character and is evaluated at compile time. You cannot have a literal string interpreted as an escape sequence at run time.
For example '\n' is a newline (or line-feed character - 0x0A in ASCII)
Note that:
char array[3]={'\\','n','\0'};
is equivalent to:
char array[3] = "\\n" ;
so perhaps unsurprisingly when you printf(array) it prints \n - that is what you have asked it to do.
Undefined escape sequences simply won't compile, so you might simply:
char = '\a' ;
char = '\b' ;
... // etc.
and see which lines the compiler baulks at. However that is not the complete story because some escape sequences require operands, for example \x on its own has no meaning, whereas \xab is the character represented by 0xab (171 decimal). Others are not even letters. Most are related to white-space, and their effect may be dependent on the terminal or console capabilities of the execution platform. So a naive investigation may not generate the results you seek, because it does not account for the language semantics or platform capabilities.
All supported escape sequences are in fact well defined - you'll find few surprises except perhaps those related to platform capabilities (for example if your target has no means to generate a beep, \a will have no useful effect):
\a Beep
\b Backspace
\f Form-feed
\n Newline
\r Carriage return
\t Horizontal tab
\v Vertical tab
\\ Backslash
\' Single quotation mark
\" Double quotation mark
\0 ASCII 0x00 (null terminator)
\ooo Octal representation
\xdd Hexadecimal representation
What about writing your own printf()?
Where you can check for a '\' followed by a 'n' and than only print from char[0] to '\''n'. Finally add "printf("\n");
mfg

character constant:\000 \xhh

Can anyone please explain the usage of the character constant \000 and \xhh ie octal numbers and hexadecimal numbers in a character constant?
In C, strings are terminated by a character with the value zero (0). This could be written like this:
char zero = 0;
but this doesn't work inside strings. There is a special syntax used in string literals, where the backslash works as an escape sequence introduction, and is followed by various things.
One such sequence is "backslash zero", that simply means a character with the value zero. Thus, you can write things like this:
char hard[] = "this\0has embedded\0zero\0characters";
Another sequence uses a backslash followed by the letter 'x' and one or two hexadecimal digits, to represent the character with the indicated code. Using this syntax, you could write the zero byte as '\x0' for instance.
EDIT: Re-reading the question, there's also support for such constants in base eight, i.e. octal. They use a backslash followed by the digit zero, just as octal literal integer constants. '\00' is thus a synonym for '\0'.
This is sometimes useful when you need to construct a string containing non-printing characters, or special control characters.
There's also a set of one-character "named" special characters, such as '\n' for newline, '\t' for TAB, and so on.
Those would be used to write otherwise nonprintable characters in the editor. For standard chars, that would be the various control characters, for wchar it could be characters not represented in the editor font.
For instance, this compiles in Visual Studio 2005:
const wchar_t bom = L'\xfffe'; /* Unicode byte-order marker */
const wchar_t hamza = L'\x0621'; /* Arabic Letter Hamza */
const char start_of_text = '\002'; /* Start-of-text */
const char end_of_text = '\003'; /* End-of-text */
Edit: Using octal character literals has an interesting caveat. Octal numbers can apparantly not be more than three digits long, which artificially restricts the characters we can enter.
For instance:
/* Letter schwa; capital unicode code point 0x018f (octal 0617)
* small unicode code point 0x0259 (octal 1131)
*/
const wchar_t Schwa2 = L'\x18f'; /* capital letter Schwa, correct */
const wchar_t Schwa1 = L'\617'; /* capital letter Schwa, correct */
const wchar_t schwa1 = L'\x259'; /* small letter schwa, correct */
const wchar_t schwa2 = L'\1131'; /* letter K (octal 113), incorrect */
Octal is base 8 (using digits 0-7) so each digit is 3 bits:
\0354 = 11 101 100
Hexadecimal is base 16 (using digits 0-9,A-F) and each digit is 4 bits:
\x23 = 0010 0011
Inside C strings (char arrays/pointers), they are generally used to encode bytes that can't be easily represented.
So, if you want a string which uses ASCII codes like STX and ETX, you can do:
char *msg = "\x02Here's my message\x03";

Resources