needed explanation for a c-program - c

In a C book I bought, an exercise asks:
What is the output of the following code snippet?
printf(3+"Welcome"+2);
The answer I got is me (by executing it in TC++),
but I can't figure out the actual mechanism.
Please explain the mechanism behind it.

It's called pointer arithmetic: 2+3=5, and "me" is the rest of the string starting at offset 5.
PS: throw away that book.

When this is compiled, the "Welcome" string literal is stored as an array of char; in this expression it decays to a char * pointing to the first character of the string. In C, as with any pointer, you can do pointer arithmetic on it. This means pointer + 5 points 5 elements beyond pointer.
Therefore ("Welcome" + 5) will point 5 characters past the "W", to the substring "me".
On a side note, as others have suggested, this doesn't sound like a good book.

A string (like "Welcome") is an array of characters terminated by the NUL-character (so it's actually "Welcome\0").
What you are doing is accessing the character at index 5 (3 + 2 = 5). Because array indices start at 0, this is the sixth character, 'm'.
printf will continue to read until it hits the NUL character.

Related

How char array behaves for longer strings?

I originally asked this as one of several questions here, but people asked me to post them separately. Hence this question.
Consider below code lines:
char a[5] = "geeks"; //1
char a3[] = {'g','e','e','k','s'}; //d
printf("a:%s,%u\n",a,sizeof(a)); //5
printf("a3:%s,%u\n",a3,sizeof(a3)); //j
printf("a[5]:%d,%c\n",a[5],a[5]);
printf("a3[5]:%d,%c\n",a3[5],a3[5]);
Output:
a:geeksV,5
a3:geeks,5
a[5]:86,V
a3[5]:127,
However the output in original question was:
a:geeks,5
a3:geeksV,5
The question 1 in original question was:
Does line #1 add \0? Notice that sizeof prints 5 in line #5, indicating \0 is not there. But then, why doesn't #5 print something like geeksU, as line #j does? I feel \0 does indeed get added in line #1, but is not counted by sizeof while it is seen by printf. Am I right?
After realizing that the output changed (on the same online compiler) when I extracted only the code lines related to the first question of the original post, I now wonder what is going on here. I believe this is undefined behavior per the C standard. Can someone shed more light, possibly for another compiler?
Sorry again for asking a second question.
char a[5] = "geeks"; //1
Here, you specify the array's size as '5', and initialize it with 5 characters.
Therefore, you do not have a "C string", which by definition is terminated by a NUL (0).
printf("a:%s,%u\n",a,sizeof(a)); //5
The array itself still has a size of 5, which is correctly reported by the sizeof operator (although the matching format specifier for a size_t value is %zu, not %u). Your call to printf with %s, however, is undefined behaviour and could print anything after the array's contents: it will just keep looking at the next address until it finds a 0 somewhere. That could happen immediately, or it could print a million garbage characters, or it could cause a segfault or some other crash.
char a3[] = {'g','e','e','k','s'}; //d
Because you don't specify the array's size, the compiler will, through the initialization syntax, determine the size of the array. However, the way you chose to initialize a3, it will still only provide 5 bytes of length.
The reason for that is that your initialization just is an initialization list, and not a "string". Therefore, your subsequent call to printf also is undefined behaviour, and it is just luck that at the position a3[5] there seems to be a 0 in your case.
Effectively, both examples have the very same error.
You could fix it like this:
char a3[] = "geeks";
Using a string literal for initialization of the array with unspecified size will cause the compiler to allocate enough memory to hold the string and the additional NUL-terminator, and sizeof (a3) will now yield 6.
"geeks" here is a string literal in C.
When you write "geeks", the compiler automatically adds the null character to the end, which makes the literal 6 characters long.
But you are using it to initialize char a[5], which only has room for 5 of them. The initialization itself is allowed in C; the undefined behaviour comes later, when the unterminated array is passed to printf with %s.
As mentioned by @DavidBowling, the following rule applies in this case
(C99 standard, section 6.7.8, paragraph 14):
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So the characters of "geeks" will be copied into the array a, but the null character will not be copied.
So in this case when you try to print the array, it will continue printing until it encounters a \0 in the memory.
From the further print statements it is seen that a[5] has the value V. Presumably the next byte on your system is \0 and the array print stops.
So, in your system, at that instance, "geeksV" is printed.

Why I get a length of 7 when I apply strlen() to a pointer to a single char? [duplicate]

The C code:
char c = 'a';
char *p = &c;
printf("%lu\n",strlen(p));
I get 7 as the result, and I have no idea where this 7 comes from.
The variable p points to a single character, not to a null terminated string. So when you call strlen on it, it attempts to access whatever memory is after c. This invokes undefined behavior.
What's happening in this particular case is that after c in memory there are six non-zero bytes followed by one zero byte, so you get 7. You can't, however, depend on this behavior. For example, if you add more local variables before and after c, even unused ones, you'll probably get a different result.
Remember that strings in C are really called null-terminated byte strings. All strings are terminated with a single '\0' character, meaning that a single-character string actually is two characters: the single character plus the terminator.
When you have the pointer pointing to c in your code, you don't have two characters, only the single character contained in c. You don't know if there is a terminator after that character in memory, so when strlen looks for that terminator it will pass the character and go out into memory not belonging to any string to look for it, and you will have undefined behavior.
To try to illustrate what you have, take a look at this "graphical" representation:
+---+ +-----+----------------------
| p | --> | 'a' | indeterminate data...
+---+ +-----+----------------------
That's basically what it looks like in memory. The variable p points to the location where your character is stored, but what comes after that in memory is just indeterminate data. It will be seemingly random, and you cannot tell where there will be a byte corresponding to a string terminator character.
There's no way to say where strlen gets the value 7 from, except that it finds six non-terminator bytes in the indeterminate data after your character. Next time you run it, or if you run it on a different system, you might get a completely different result.
This happens because strlen searches for the first null character starting from the address it is given. You pass the address of a single character, so strlen keeps looking forward in memory until it finds the first null character after the variable c.
In any case, you cannot be sure about the result; it depends on the compiler and on all the code you wrote. Moreover, your program can even fail with a memory exception.

C character array and its length

I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situation. Languages like C, however, do less hand-holding, which on the one hand allows more flexibility, but on the other gives you many more ways to shoot yourself in the foot with subtle issues such as this one. In other words, the fact that your code compiles doesn't mean it's correct, or that it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret" (a user's password or key) is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know where the string is terminated and would keep accessing bytes (chars) for as long as they are not 0.
As Andrei said, it apparently just happened that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler, and relying on it is undefined behaviour.
There is, for instance, a chance that printf accesses an address your program is not allowed to read. This would result in a crash.
In C, text strings are stored as zero-terminated arrays of characters. This means that the end of a text string is indicated by a special character with a numeric value of zero (0).
So the array of characters used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
The C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided, the initializer needs 5 bytes but the array has only 4. C permits this (a C++ compiler would reject it), though a good C compiler can warn about it. The compiler stores the four characters and does not add the zero string terminator, and you are getting lucky in that there is a zero value just after the end of the array. I suppose there is also the possibility that the C compiler is adding an additional array element anyway, but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf(), not for the initializer. The C standard places no requirements on it (in fact, the standard explicitly says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
In such cases no terminator is appended, so the string is not properly terminated when passed to printf().
The reason this does not produce an error is likely legacy code that exploited this to save some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, the compiler simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
may give a warning like:
string for array of chars is too long
Because the behavior is undefined, the compiler may still accept it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'v'
name[3] '\0'
And so the output is
Giv
Not Givi as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv

C language, char=char with unexpected results

Hi everybody and thanks in advance for any help, this is the situation:
#define N 12
[..]
char vect[N][2];
char strng[2];
[..]
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
Now, if strng contains "c 2", what I expect in vect[i][0] is '2' and in vect[i][1] 'c'.
I use code::blocks and watching vect I have instead "2#", but it could be "2À" as well.
Can you help me? Where am I wrong?
Array indexes go from zero up to the size minus one. So by using e.g. strng[2] you access the third entry of a two-entry array. Accessing an array out of bounds leads to undefined behavior, and the data will be indeterminate.
You should also remember that all strings in C are one more character than reported by e.g. strlen, and that extra character is a special terminator character. So if you want a two-character string, you really need three characters: Two for the string, and one for the terminator.
Rewrite these statements
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
the following way
vect[i][0]=strng[1]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];
provided that strng contains two characters { 'c', '2' }.
Take into account that the array strng cannot hold the string literal "c 2", because you defined it as
char strng[2];
that is, it can contain only two characters.
If you want the array to really contain "c 2", then you have to define it either as
char strng[3];
or as
char strng[4];
if you want to include the terminating zero.
In this case you may write
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];
Assuming strng is literally meant to contain "c 2", then your memory is the issue. char strng[2] has exactly 2 cells, with no extra room for a null terminator (i.e. \0), so it cannot hold the three characters of "c 2" at all. When you try to access strng[2] (which you can't, because you can only go up to cell N-1, where N is the number of cells allocated), you get undefined results, since you are reaching beyond the memory you allocated.

Pointers in c (how to point to the first char in a string with a pointer pointing somewhere else in the same string)

If I have a pointer that is pointing somewhere in a string, let's say at the third letter (we do not know the letter position, so basically we don't know it is the third letter), and we want it to point back to the first letter so we can make the string NULL, how do we do that?
For example:
if we have ascii as a pointer
ascii is pointing now somewhere in the string, and i want it to point at the first char of the string how do i do that?
(Note:
I tried saying
int len = strlen(ascii);
ascii -= len;
ascii = '0';
but it is not working, it changes wherever the pointer is to 0 but not the first char to 0)
You cannot. C and C++ go by the "no hidden costs" rule, which means, among other things, that no one else is going to secretly store pointers to the beginnings of your strings for you. Another common example is array sizes; you have to store them yourself as well.
First of all, in your code, if ascii is a pointer to char, that should be
*ascii = '\0';
not what you wrote. Your code sets the pointer itself to the character '0', which means it's pointing to a bad place!
Second of all, strlen returns the length of the string you are pointing to. Imagine the string is 10 characters long, and you are pointing at the third character. strlen will return 8 (since the first two characters have been removed from the calculation). Subtracting this from where you are pointing will point you to 6 characters before the start of the string. Draw a picture to help see this.
IMHO, without having some other information, it is not possible to achieve what you are wanting to do.
From what I said above, you should be able to work out what you need to do as long as you keep the original length of the string for example.
No. You can, however, make it point at the last character of the string, which is right before the '\0' in every string (that is why it's called a zero-terminated string).
What you could do is instead of a char* you could use a pointer to a struct which contains the information you need and the string.
It's not guaranteed to work, but you might get away with backing up until you find a null character, and then moving forward one.
while(*ascii) ascii--;
ascii++;

Resources