Why does *strptr = 0 truncate the string? (C)
Why does the ASCII value 0x30 of the digit 0 become the 0 of the null terminator?
I'm confused about the difference between the number 0, the character 0 in a string, and the string terminator 0.
Your explanation will be appreciated.
More precisely, there are three lexical elements that contain a zero character: 0 (unquoted), '0' (quoted, typically (but not always) equal to 48 or 0x30 unquoted) and '\0' (equal to 0, but in character notation).
The question is talking about two distinct values...'0' != '\0'. Forget about 30, 48, etc. Just remember '0' and '\0' are different characters, and '\0' is a string terminator that has a value of 0...
I think you meant to use '0' (emphasis on the quotation marks).
All standard library string routines treat the character '\0' as the string terminator, so if you put it at the beginning of a string, they all see no data to process: the first character is a terminator, so the string is effectively empty. And yes, per the standard, '\0' is a character that has the value 0. As a result, '\0' == 0 is true.
Related
I have a simple question that I couldn't find the answer for by googling -- is it possible to escape the C language's string null terminator '\0', so that we can include it in a string?
Note that a "string" with an embedded NUL is no longer a string. You cannot safely use it as an argument to functions declared in <string.h>, for example
char embeddednul[] = "zero\0one\0two\0"; // embeddednul[12] = embeddednul[13] = 0
printf("len: %zu\n", strlen(embeddednul)); // 4??
char tmp[1000] = {0};
strcpy(tmp, embeddednul); // copies 'z', 'e', 'r', 'o', and 0
char *p = embeddednul;
while (*p) {
while (*p) putchar(*p++); // prints zero
putchar('\n'); // then one
p++; // then two
}
Use the \ to escape the \ like so
printf("\\0");
You can embed a null character in a string.
But there is no way to escape the null character.
To "escape" something means to remove its usual or its special interpretation, or to give it some other interpretation.
In C string and character constants, the backslash character \ gives a special meaning to the character following it. For example, n is normally an ordinary letter, but \n is the newline character. 0 is normally an ordinary digit character, but \0 is the character with the numeric value 0, and is the NUL or end-of-string character.
So if you write something like
char *twostrings = "two\0strings";
you successfully embed a null character into the string. But you haven't "escaped the null character" — you have in fact escaped the zero, to turn it into a null character.
Now, what if you wanted to "escape the null character" — that is, to remove its special meaning as the end-of string terminator? And the answer is, no, there is no way to do that. A null character is always treated as the end of a string, by any normal C function that deals with strings. Given the twostrings variable initialized as above, if you write
printf("%zu\n", strlen(twostrings));
you're going to get 3, because strlen stops at the first null character it finds. If you write
char onestring[10];
strcpy(onestring, twostrings);
printf("%s\n", onestring);
you're going to get "two", because strcpy stops at the first null character it finds. You'll get the same thing if you write
printf("%s\n", twostrings);
If you tried to "escape the null character" by writing
twostrings = "two\\0strings";
what would actually happen is that you would escape the backslash, removing its special meaning. You'd get an actual backslash character and an actual 0 character in the string, with no extra null character at all.
What is the difference between the '\0' character and the '\n' character in the C programming language?
'\0' is the NUL character (ASCII 0), which is also the string terminator (C strings are NUL-terminated): if you have a string like "this is\0a string", the string functions will ignore the part after the '\0' (even though it is still present in the generated code).
'\n' is a newline (ASCII 10). It is noteworthy that in some circumstances this newline character can be transformed. For example, on Windows, where the newline in files is indicated by the "\r\n" sequence (two bytes: ASCII 13, carriage return, followed by ASCII 10, line feed), if you write a string containing a '\n' character to a file (e.g. using fprintf()), it will automatically be converted to a "\r\n" sequence if the file is open in text mode (which is the default).
'\0' is a null: this terminates a string. '\n' is a newline
'\0' is the null character, which indicates the end of a string in C. (printf("%s") will stop printing at the first occurrence of \0 in the string.)
'\n' is a newline, which will simply make the text continue on the next line when printing.
\0 is the null byte, used to terminate strings.
\n is the newline character, 10 in ASCII, used (on Unix) to separate lines.
'\0' is a character constant that is written as an octal-escape-sequence. Its value is 0. It is not the same as '0'. The latter has the value 48 in ASCII or 240 in EBCDIC.
'\n' is a character constant that is written as a simple-escape-sequence and denotes the new-line character. Its value is 10 in ASCII.
#include <stdio.h>
#include <string.h>
int main(void)
{
printf("%zu\n", sizeof(' ')); /* sizeof yields size_t, so use %zu */
printf("%zu\n", sizeof(""));
}
output:
4
1
Why is the output 4 for the first printf? Moreover, if I write '' I get the error "empty character constant", but an empty pair of double quotes "" compiles without any error. Why?
' ' is an integer character constant, which has type int (it is not converted; that is its type). The second, "", is a string literal that contains only one character, the null character, and since sizeof(char) is guaranteed to be 1, the size of the whole array is 1 as well.
' ' is an integer character constant, which has type int (hence 4 bytes on your machine). "" is an empty string literal, which still occupies 1 byte for the terminating '\0'.
Here in below check the difference
#include<stdio.h>
int main()
{
char a= 'b';
printf("%zu %zu %zu", sizeof(a), sizeof('b'), sizeof("a"));
return 0;
}
Here a is defined as a char, whose size is 1 byte.
But 'b' is a character constant. A character constant is an integer: its value is the numeric value of the character in the machine's character set. The size of a character constant is therefore the size of an int, which is 4 bytes here.
"a" is a string literal: an array of char whose size is the number of characters plus one for the terminating \0 (NUL). Here it is 2.
This is answered in Size of character ('a') in C/C++
In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.
A space, or any other single character in single quotes, actually has type int, and its value is that character's code in the execution character set (ASCII on most systems). So its size is sizeof(int), which is 4 bytes on most machines.
Only when you create a char variable and store a character in it is it stored in 1 byte of memory.
char ch;
ch=' ';
printf("%zu",sizeof(ch));
//outputs 1
For anything to be a string, it must be terminated with a null character represented as '\0'.
If we write the string "hello", it is actually stored as 'h' 'e' 'l' 'l' 'o' '\0', so that the system knows the string ends after the 'o' and stops reading when it reaches the null character. The length of this string is still 5 if you use the strlen() function, but sizeof("hello") is 6 bytes.
When we create an empty string, like "", its length is 0 but its size is 1 byte, because it must terminate where it starts, i.e. at character 0.
Hence an empty string consists of only one character, that is null character, giving size 1 byte.
From C Traps and Pitfalls
Single and double quotes mean very different things in C.
A character enclosed in single quotes is just another way of writing the integer that corresponds to the given character in an ASCII implementation. Thus ' ' means exactly the same thing as 32.
On the other hand, a string enclosed in double quotes is a short-hand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. Thus "", the empty string, still contains the '\0' character, so its size is one.
In the first case the operand is a character constant. A character constant has type int (its value is the character's code), so sizeof gives the size of an int, which is 4 here.
In the second case the operand is a string literal with no characters in it, i.e. the empty string. It still contains the terminating null character, so its size is 1.
When I came across this C implementation of Porter's stemming algorithm, I found a C-ism I was confused about.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void test( char *s )
{
int len = s[0];
printf("len = %i\n", len );
printf("s[len] = %c\n", s[len] );
}
int main()
{
test("\07" "abcdefg");
return 0;
}
and output:
len = 7
s[len] = g
However, when I input
test("\08" "abcdefgh");
or any string constant that is longer than 7 characters with the corresponding length in the first pair of quotes (e.g. test("\09" "abcdefghi");), the output is
len = 0
s[len] =
But any input like test("\01" "abcdefgh"); prints out the character in that position ( if we call the first character position 1 and not 0 for the moment )
It appears that test( char *s ) reads the number in the first pair of quotes (how it does this I am not sure, since I thought s[0] would only be able to read a single char, i.e. the '\') and prints the character at that position in the string constant in the second pair of quotes.
My question is this: it seems as if we are passing two string constants into test( char *s ). What exactly is happening here? How does the compiler seem to "split" the string over two pairs of quotes? Another question one might have: is a string of the form "blah" "abcdefg" one consecutive block of memory? It may be that I have overlooked something elementary, but even so I would like to know what I overlooked. I know this is a basic concept, but I could not find a clear example on the web that explains it, and in all honesty I don't follow the output. Any helpful comments are welcome.
There are at least three things going on here:
Literal strings juxtaposed against one another are concatenated by the compiler. "a" "b" is exactly the same as "ab".
The backslash is an escape character, which means it is not copied literally into the resulting string. The notation \01 means "the character with ASCII value 1".
The notation \0... means an octal character constant. Octal numbers are base 8, made up from digits that range from 0 through 7 inclusive. 8 is not a valid octal constant, so "\08" does not follow "\07".
The problem is not in the length of the string, but in the \o syntax for specifying non-printable values in string literals. \o, \oo, and \ooo denote octal constants, i.e. a single character whose value is written in base 8. Since 08 in \08 doesn't represent a valid base 8 number, it is interpreted as \0 followed by the ASCII character 8.
To fix the problem, represent 8 as \10 or \010:
test("\007" "abcdefg");
test("\010" "abcdefgh");
...or switch to hexadecimal, where the \x prefix makes the base more explicit to the casual reader:
test("\x07" "abcdefg");
test("\x08" "abcdefgh");
test("\x09" "abcdefghi");
test("\x0a" "abcdefghij");
...
\number in a character or string literal means the character whose code is the value number. number is interpreted in octal and is at most three digits long, so the first non-octal digit terminates the number. So "\07" is a one-character string containing the character with code 7, but "\08" is a two-character string containing the character with code 0 followed by the digit 8.
Additionally, code 0 is the null terminator that's used in C to indicate the end of a string. So that second string ends at the beginning, because its first byte is the terminator. This is why the length of the string in your second example is 0.
When two or more string literals are adjacent (separated only by white-space), the compiler will join them into a single string. Therefore "\07" "abcdefg" is equivalent to "\07abcdefg".
"\07" is an octal escape. An octal escape ends after three digits or with first non-octal character. So, when you enter "\08", 8 is a non octal character therefore escape ends and 0 is stored at s[0].
Now len is 0, and printing s[len] prints the character at s[0], which is the null character and has no printable representation (the printable ASCII characters are those with codes 32 through 126).
Is it right to say that, in general, the null terminator of a C string literal is automatically added by the compiler?
So in the following example:
char * str = "0124";
printf("%x", str[str[3] - str[2] + str[4]]);
the output is always 32?
Thanks.
First question: yes
Second question: yes, on an ASCII system. You calculate '4' - '2' + '\0', which as integers is 0x34 - 0x32 + 0 = 2, so you get str[2], which is '2', which is 0x32.
That '4' - '2' equals 2 is guaranteed by C, but if you ran your code on an EBCDIC system, '2' would be 0xf2, so the output would be f2.
Yes, the compiler does add the null terminator. Thus the string literal occupies 5 bytes, held in an array with static storage duration (not on the stack); str is just a pointer to that array.
By the looks of it, with that string literal, (str[3] - str[2] + str[4]) evaluates to (52 - 50 + 0), so you are accessing str[2], which will print 0x32 in hex.
The terminating null character is added by the compiler; 6.4.5p6:
6 - In translation phase 7, a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals. The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. [...]
The printf output will be the character code of the 2 character on your system. The characters 0 to 9 are guaranteed to have contiguous codes (5.2.1p3), but not to have any particular value.