In C one can do this
printf("%c", *("hello there"+7));
which prints h
How can a read-only string literal like "hello there" be used almost like a pointer? How does this work exactly?
Using 'anonymous' string literals can be fun.
It's common to express dates with the appropriate ordinal suffix. (Eg "1st of May" or "25th of December".)
The following 'collapses' the 'Day of Month' value (1-31) down to values 0-3, then uses that value to index into a "segmented" string literal. This works!
// Folding DoM 'down' to use a compact array of suffixes.
i = DoM;
if( i > 20 ) i %= 10; // Change 21-99 to 0-9.
if( i > 3 ) i = 0; // Every other one ends with "th"
// 0 1 2 3
suffix = &"th\0st\0nd\0rd"[ i * 3 ]; // Acknowledge 3byte regions.
A string literal is a character array (char[]) and is thus implicitly cast to a char pointer (char *) to the first element of the array.
Thus, in the example in the question ("hello there"+7), 7 is added to a pointer to the first character (h) giving a pointer to the 7th character (counting zero based) which also happens to be a h (the "h" in "there").
Notice that the pointer is to char, not const char. However, it is important to know that writing at the location pointed at by a string literal is undefined behavior which means that each compiler implementation is free to define its own behavior in that case. Depending on the compiler implementation, it may be impossible (the string literal may be stored in read-only memory), it may have unforeseen side-effects, it may change the string string literal without any side-effects or ... basically anything.
It is allowed for two identical or overlapping string literals such as "hello there" and "there" to share the same memory location. Hence, the following expressions may be either true or false depending on the compiler implementation:
"hello" == "hello"
"hello there" + 6 == "there"
While you know how it is stored you will understand.
String constants are stored in .rodata section, seperate from code which stored in .text section. So when the program is running, it need to know the address of those string constants when using them, and, length of strings and arrays are not all the same, there is no simple way to get them (integer and float can be stored in and passed by register), thus "strings are visited thought pointer same as arrays".
Actually values not able to be hard encodered in instructions are all stored in sections such as .data and .rodata.
Related
This question already has answers here:
what should strlen() really return in this code?
(4 answers)
Closed 6 years ago.
The C code:
char c = 'a';
char *p = &c;
printf("%lu\n",strlen(p));
And I get a result 7 and I have no idea how this 7 come out.
The variable p points to a single character, not to a null terminated string. So when you call strlen on it, it attempts to access whatever memory is after c. This invokes undefined behavior.
What's happening in this particular case is that after a in memory there are six non-zero bytes followed by one zero byte, so you get 7. You can't however depend on this behavior. For example, if you add more local variables before and after a, even unused ones, you'll probably get a different result.
Remember that strings in C are really called null-terminated byte strings. All strings are terminated with a single '\0' character, meaning that a single-character string actually is two characters: The single character plus the terminator.
When you have the pointer pointing to c in your code, you don't have two characters, only the single characters contained in c. You don't know if there is a terminator after that character in memory, so when strlen looks for that terminator it will pass the character and go out into memory not belonging to any string to look for it, and you will have undefined behavior.
To try an illustrate what you have, take a look at this "graphical" representation:
+---+ +-----+----------------------
| p | --> | 'a' | indeterminate data...
+---+ +-----+----------------------
That's basically how it looks like in memory. The variable p points to the location where your character is stored, but after that in memory is just indeterminate data. This will be seemingly random, and you can not tell where there will be a byte corresponding to a string terminator character.
There's no way to say why strlen get the value 7 from, except that it finds six non-terminator bytes in the indeterminate data after your character. Next time you run it, or if you run it on a different system, you might get a completely different result.
Because strlen finds first null character starting from passed address. You pass address of only character, so strlen tries to look forward in memory until first null char in memory after variable c.
Anyway, you cannot be sure about result and it depends on compiler and all code you wrote. Moreover, you program can even fail with memory exception.
I am applying printf and/or other functions to a certain string of characters, read from a file. I want to skip the first 5 characters under certain conditions. Now I thought to be clever by, if the conditions apply, increasing the string pointer by 5:
if (strlen(nav_code) == 10 ) {nav_code = 5+nav_code;}
but the compiler refuses this:
error: assignment to expression with array type
What have I misunderstood? How to make my idea work - or is it a bad idea anyway?
It's probably becuase nav_code is not a pointer but a character array like char nav_code[50]. Try the following:
char nav_code[50];
char *nav_code_ptr = nav_code;
if (strlen(nav_code_ptr) == 10 ) {nav_code_ptr += 5;}
// forth on, use nav_code_ptr instead of nav_code
I am applying printf and/or other functions to a certain string of characters, read from a file. I want to skip the first 5 characters under certain conditions.
If printf is all what you need, then sure you can skip the first 5 characters.
Given nav_code is string (either char array or char pointer), then:
printf( "%s", nav_code + 5 ); // skip the first 5 characters
Of course you need to make sure your string has more than 5 characters, otherwise it's flat out illegal as out-of-bound access.
In your code, nav_code is an array and arrays cannot be assigned.
Instead, use a pointer, initialize that with the address of the first element of the array, make pointer arithmetic on that pointer and store the updated result back to the pointer.
I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situations. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, give you much, much more ways to shoot you in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct and it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret"--a user's password or key is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know when the string is terminanted and would try to access bytes (or chars) while they are not 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided a good C compiler should issue at the minimum a warning and probably an error. However it looks like your compiler is truncating the string to the array size and is not adding the additional zero string terminator. And you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well ,the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended explicitly, so the string is not properly terminated when passed to `printf()".
Reason this does not produce an error is likely some legacy code which did exploit this to safe some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, it simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give warning like:
string for array of chars is too long
Because the behavior is Undefined, still compiler may pass it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'V'
name[3] '\0'
And so the output is
Giv
Not Give as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv
I construct an array with:
char *state[] = {"California", "Oregon", "Texas"};
I want to get the length of California which should be 10 but when I do sizeof(state[0]), it just gives me 8 ( I think this means 8 bytes since the size of a char is 1 byte). But why 8 though instead of 10? I'm still able to print out each chars of California by looping through state[0][i].
I'm new to C, can someone please explain this to me?
The simplest explanation is that sizeof is a compile-time evaluated expression. Therefore it knows nothing about the length of a string which is essentially something that needs to be evaluated at run-time.
To get the length of a string, use strlen. That returns the length of a string not including the implicit null-terminator that tells the C runtime where the end of the string is.
One other thing, it's a good habit to get into using const char* [] when setting up a string array. This reinforces the fact that it's undefined behaviour to modify any of the array contents.
Hi everybody and thanks in advance for any help, this is the situation:
#define N 12
[..]
char vect[N][2];
char strng[2];
[..]
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
Now, if in string[2] I have "c 2", what I expect in vect[i][0] is '2' and in vect[i][1] 'c'.
I use code::blocks and watching vect I have instead "2#", but it could be "2À" as well.
Can you help me? Where am I wrong?
Array indexes goes from zero up to the size minus one. So using e.g. strng[2] you access the third entry in the two-entry array. Accessing an array out of bounds leads to undefined behavior and the data will be indeterminate.
You should also remember that all strings in C are one more character than reported by e.g. strlen, and that extra character is a special terminator character. So if you want a two-character string, you really need three characters: Two for the string, and one for the terminator.
Rewrite these statements
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
the following way
vect[i][0]=strng[1]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];
provided that string contains two characters { 'c', '2' }.
Take into account that array string can not have string literal "c 2", because you defined it as
char strng[2];
that is it can contain only two characters.
If you want that the array would contain indeed "c 2" then you have to define it either as
char strng[3];
or as
char strng[4];
if you want to include the terminating zero.
In this case you may write
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];
Assuming strng literally contains "c 2", then your memory is the issue. strng[2] contains 3 cells iirc. 2 for holding chars and then a null terminator (ie \0). so when you try to access strng[2], (which you cant because you can only go to N-1 cells, where N is the number allocated for it) it contains undefined results, since it isnt null terminated and you are reaching beyond memory you allocated