Here is a small function from the book Head First C.
This function should display a string backward on the screen.
void print_reverse(char *s)
{
size_t len = strlen(s);
char *t = s + len - 1;
while ( t >= s )
{
printf("%c", *t);
t -- ;
}
puts("");
}
Unfortunately, I don't understand how it reverses the string.
size_t len = strlen(s); // computes the number of characters in the string
char *t = s + len - 1; // 't' is a pointer to type char
I don't understand what values are used for 's' in this equation. I've read that when the name of an array variable is assigned to a pointer, it actually refers to the address of the first character, i.e. array[0]. So does 's' have the value 0 here, or does it have the integer value of a particular character?
For example, take the word s[] = "hello", so s[0] = 'h', which is 104 in decimal. Adding strlen(s) to that should give s + len - 1 = 104 + 6 - 1 = 109 (-1 because I assume I have to subtract the '\0' character that strlen takes into account). But 109 is 'm'. I don't see how this equation traverses the string.
while (t >= s); I assume this means "while t is not equal to zero". Is that correct?
Thank you!
First things first, s is a pointer and not a normal char variable.
So, when you assign the memory address of a string to s, it holds the address of the string's first character.
Pointer arithmetic: by adding 1 to a pointer, you make it point to the next memory location. Recall that strings are stored in contiguous memory locations.
So, if s points to "Hello",
printf("%c", *s) will print 'H'
printf("%c", *(s+1)) will print 'e'.
Now, we have the length (=5) in len. When we add len - 1 to s, we make it point 4 locations ahead. It now points to 'o'.
Then with while(t >= s) we compare two pointers (t and s), print the value at the address pointed to by t, and decrement t. The loop keeps going while t is at or after s, so the last character printed is the one at s, the first element.
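The arithmetic in the question (104 + 6 - 1) adds to the character's value instead of to the address. Here is a minimal sketch of the difference. The two helper names are made up for this illustration and are not from the book.

```c
/* Hypothetical helpers, named only for this illustration. */

/* Adds 1 to the pointer: moves one element forward, then reads that char. */
char char_after_first(const char *s)
{
    return *(s + 1);
}

/* Adds 1 to the character: reads the first char, then adds 1 to its
   numeric code. The pointer does not move. */
int first_char_code_plus_one(const char *s)
{
    return *s + 1;
}
```

With "Hello", the first call gives 'e' (the next character in memory), while the second gives the code of 'H' plus one, which has nothing to do with the rest of the string.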
Illustration:
Initial condition:
H e l l o
^       ^
s       t
Now we print *t and decrement it.
Output: o
H e l l o
^     ^
s     t
We continue further:
Output: olle
H e l l o
^
s, t
Now t == s, so the loop body runs one last time and prints 'H' (output: olleH). After that decrement t < s, and we stop.
void print_reverse(char *s)
here s is the pointer to the beginning of the string
size_t len = strlen(s);
this equals the number of characters in the string (\0 not counted)
char *t = s + len - 1;
At this point, t is a new pointer which points at the last element of the string (read something about pointer arithmetic if this is not clear to you)
while ( t >= s )
{
printf("%c", *t);
t -- ;
}
In this loop t gets decremented at each iteration, so that every time it points at the previous character in the string.
At the last iteration, t==s, which means that you are printing the first element of the string.
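To make the walk easy to check, here is a sketch that stores the characters in a buffer instead of printing them. The name reverse_into and the out parameter are assumptions for this example. Unlike the book's loop, it tests for t == s before decrementing, so t never moves before the start of the string, and it also copes with an empty string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the characters of s into out in reverse
   order. out must have room for strlen(s) + 1 bytes. Returns out. */
char *reverse_into(const char *s, char *out)
{
    size_t len = strlen(s);
    char *w = out;

    if (len > 0) {
        const char *t = s + len - 1;   /* address of the last character */
        for (;;) {
            *w++ = *t;                 /* same step as printf("%c", *t) */
            if (t == s)                /* first character just handled */
                break;
            t--;                       /* move one char to the left */
        }
    }
    *w = '\0';
    return out;
}
```

Calling reverse_into("hello", buf) with a buffer of at least 6 bytes leaves "olleh" in buf.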
s contains the address of the first character of the array, not the first character itself. An address is not an offset from the start of the array, but an arbitrary (to the user) value. So when you add len - 1 to s, the result is a pointer to the character at index len - 1.
Put another way, this:
char *t = s + len - 1;
Is the same as this:
char *t = &s[len - 1];
The condition while (t >= s) evaluates to true as long as t points to a memory location at or after the start of the array.
It reverses the string by setting the pointer t to the last character of the string and then doing the following:
1. Print the character that t points to.
2. Decrease t by one (make it move one char to the left).
3. Go to (1) unless t now points before the first character of the string.
s is a pointer to the first character of the string. A string is simply a sequence of characters in memory terminated by the NUL character ('\0'). When t == s, t points to the first character; when t < s, t points to the byte before the first character, and this byte is not part of the string.
And when the loop terminates, t won't be zero. t and s are pointers, that means their values are memory addresses. The value of s is the memory address of the first character of the string and this address will certainly not be zero.
I want to read my string backwards. I did that, but I don't understand how two parts of my code work.
char s1[] = "ABC";
printf("%s", s1);
size_t len = strlen(s1);
printf("\n%d", len);
char *t = s1 + len - 1;
printf("\n%s\n", t);
while (t>=s1)
{
printf("%c", *t);
t = t - 1;
}
First: How does t point to the letter C?
Second: How is it possible to add the variable len, which holds an integer, to an array that holds characters? Is it because pointer t adds to the address by using pointer arithmetic?
This
char s1[] = "ABC";
looks like
0x100 0x101 0x102 0x103 . . . .Assume 0x100 is base address of s1
--------------------------------
| A | B | C | \0 |
--------------------------------
s1
Here s1, which is a char array, starts at base address 0x100 (let's assume).
I want to read my string backwards
For this you need a pointer to location 0x102, i.e. the last element of the array. That is what these two lines do:
size_t len = strlen(s1); /* finding length of the string in s1, i.e. it returns 3 here */
char *t = s1 + len - 1; /* (0x100 + 3*1) - 1, i.e. char pointer t points to 0x102 */
Now it looks like
0x100 0x101 0x102 0x103 . . . .
--------------------------------
| A | B | C | \0 |
--------------------------------
  s1      t  <-- t points here
Now when you do *t it prints the char value at t location i.e value at 0x102 i.e it prints C, in the next iteration you need to print the char one position back, so for that you are doing t = t - 1;.
Note : Here
char *t = s1 + len - 1;
s1 decays to a char pointer and len is an integer variable, so when you do pointer arithmetic the offset is automatically scaled by the size of the data the pointer points to. For example,
char *t = s1 + len;
evaluated as
t = 0x100 + 3*sizeof(*s1); ==> 0x100 + 3*1 ==> 0x103
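The scaling by sizeof can be checked directly by measuring the distance in bytes. This is only a sketch, and both helper names are invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes lie between base and base + n
   when the elements are ints. */
size_t bytes_advanced_int(const int *base, size_t n)
{
    /* Casting to char * lets us count the distance in single bytes. */
    return (size_t)((const char *)(base + n) - (const char *)base);
}

/* Same measurement for chars: sizeof(char) is 1, so the result is n. */
size_t bytes_advanced_char(const char *base, size_t n)
{
    return (size_t)((base + n) - base);
}
```

For an int array the result is n * sizeof(int), while for a char array such as s1 it is just n, which is why s1 + 3 lands on 0x103 in the picture above. The pointer base + n must stay within the array (or one past its end).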
char s1[] = "ABC";
s1 is an array of 4 characters char[4] with values {'A','B','C','\0'}
size_t len = strlen(s1);
s1 "decays" (read: is automagically converted) from an array of type to a pointer to type. So s1 decays from an array of 4 characters into a pointer to the first character of the array.
The strlen counts the number of bytes before encountering the null byte delimiter '\0'. Starting from 'A' we can count 'A', 'B', 'C' - that's len = 3.
On most architectures a pointer is represented as an ordinary integer address. You can add to pointers and subtract from them, and use uintptr_t to convert them to an integer. But adding to them without a cast uses "pointer arithmetic", which means that the numeric value of (int*)5 + 2 is equal to 5 + 2 * sizeof(int).
char *t = s1 + len - 1;
s1 decays to the pointer to the first character inside the s1 array, that's 'A'. We add + (len = 3), that means that s1 + 3 points to the byte holding '\0' inside the s1 = (char[4]){'A','B','C','\0'} array. Then we subtract - 1, so t will now point to a byte holding the character 'C' inside the s1 array.
while (t >= s1) {
... *t ...
t = t - 1;
}
Start: s1 points to 'A'. t points to 'C'.
while: t is greater than s1. By two. t - s1 = 2, i.e. s1 + 2 = t
loop: *t is equal to 'C'.
decrement: t--, so now t will point to 'B'.
while: t is greater than s1. By one.
loop: *t is equal to 'B'.
decrement: Then t--, so now t will point to 'A'.
while: Now t is equal to s1. Both point to the first character of the array.
loop: *t is equal to 'A'.
decrement: Then t--, so now t will point to an unknown location before the array. As pointers (on most architectures) are simple integers, you can decrement and increment them as normal variables.
while: t is now lower than s1. Loop terminates.
Notes:
printf("\n%d", len); is undefined behavior and spawns nasal demons. Use printf("\n%zu", len); to print a size_t variable.
You can print a pointer value by using the %p specifier and casting to void *: printf("%p", (void*)t)
t = s1 - 1. Forming a pointer to one element before an array is undefined behavior in C. That happens on the last pass through the loop, when t = t - 1 runs while t == s1. Change to a do { .. } while loop.
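One way to sidestep the s1 - 1 problem, different from the do/while suggestion, is to start t at the terminating '\0' and decrement before reading. This is a sketch under that assumption; reverse_copy and its out buffer are names made up here so the result can be inspected instead of printed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies s reversed into out, which must have room
   for strlen(s) + 1 bytes. Returns out. */
char *reverse_copy(const char *s, char *out)
{
    const char *t = s + strlen(s);   /* points at the '\0', still inside the array */
    char *w = out;

    while (t != s)
        *w++ = *--t;                 /* step left first, then read */

    *w = '\0';
    return out;
}
```

Here t only ever takes values from s up to s + strlen(s), so no pointer before the array is formed, and an empty string simply skips the loop.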
In C, arrays are a bit strange. An array-of-x can decay into a pointer-to-x very easily, for example when passing it to another routine or, as in your case, when adding to it. So you are correct: it is pointer arithmetic. (In pointer arithmetic you can add pointers and integers to get pointers.)
Okay, say I have a sequence "abcdefg" and I do
char* s = strdup("abcdefg");
char* p;
char* q;
p = strchr(s, 'c');// -> cdefg
q = strchr(p, 'd');// -> defg
And I want to display s - p basically abcdefg - cdefg = ab, can I do this using pointer arithmetic?
You can do:
printf("%.*s", (int)(p - s), s);
This prints s with a maximum length of p - s which is the number of characters from s to p.
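The same "%.*s" precision trick also works with snprintf, which makes it easy to check the result. A minimal sketch; text_before is a hypothetical helper name, and the NULL check for a missing character is an addition that the snippet above does not have.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the part of s that comes before the first
   occurrence of c into out. Returns out, or NULL if c does not occur. */
char *text_before(const char *s, char c, char *out, size_t out_size)
{
    const char *p = strchr(s, c);
    if (p == NULL)
        return NULL;

    /* The precision (p - s) caps how many chars of s are written. */
    snprintf(out, out_size, "%.*s", (int)(p - s), s);
    return out;
}
```

With s = "abcdefg" and c = 'c', out ends up holding "ab".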
You cannot. The string is saved as a sequence of letters ending in a zero byte. If you print a pointer as a string, you tell it to print the sequence up to the zero byte, which is how you get cdefg and defg. There is no sequence of bytes in memory that holds 'a', 'b', 0, so you would need to copy that into a new char array. p - s will simply give you 2 (the distance between the two pointers); no amount of pointer arithmetic will copy a sequence for you.
To get a substring, see Strings in C, how to get subString - copy with strncpy, then place a null terminator.
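The copy-then-terminate approach could look roughly like this. copy_range is a made-up name, and the caller is assumed to provide a destination that is large enough.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the chars from begin up to (not including)
   end into dst and adds the terminator. dst must have room for
   (end - begin) + 1 bytes. Returns dst. */
char *copy_range(char *dst, const char *begin, const char *end)
{
    size_t n = (size_t)(end - begin);   /* pointer difference = char count */

    strncpy(dst, begin, n);             /* copies at most n chars */
    dst[n] = '\0';                      /* strncpy may not terminate, so do it here */
    return dst;
}
```

For the example in the question, copy_range(dst, s, p) with p = strchr(s, 'c') leaves "ab" in dst and does not modify s.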
Okay, if you need to cut p from s, the following code should work:
char s[100]; // Use a writable array, not a string literal
strcpy(s, "abcdefg"); // fill the array (char s[100] = "abcdefg"; would also work)
char* p = strchr(s, 'c');// -> cdefg
*p = 0; // cut the found substring off from main string
printf("%s\n", s); // -> ab
So I was studying a code about a custom linux shell and I am having a hard time understanding this section:
// add null to the end
char *end;
end = tokenized + strlen(tokenized) - 1;
end--;
*(end + 1) = '\0';
I don't understand what decreasing a char pointer yields or how this section works in general. I get that it points end at the last position of the tokenized array, but I don't understand the following two lines. If anything similar has been posted, I don't mind being pointed to the links (although I did a good amount of research). Thank you!
Also a quick question is: I don't believe end is an array. Am I wrong on this?
Decreasing the pointer moves its location in memory to the preceding address. In the case of a char * string, end will now point to its preceding character.
// add null to the end
// declare a `char *`
char *end;
// set `end` to point to the last character of `tokenized`
end = tokenized + strlen(tokenized) - 1;
// decrease `end`; now points to the character before the character it was pointing to
end--;
// set the character after the one `end` points to to `NUL`
*(end + 1) = '\0';
I commented your code as I understand it...
char *end;
end = tokenized + strlen(tokenized) - 1;
end--;
*(end + 1) = '\0';
strlen(tokenized)
This is the offset location of the null-terminator in the tokenized string.
Meaning, if you increment the pointer by this offset (the number of non-null characters) you end up with a pointer to the index right after the last character. To get the index of the last character itself, you subtract one from the offset.
Let offset = strlen(tokenized) - 1
tokenized + offset
This means the pointer tokenized is moved by an offset. If the pointed-to type is 1 byte wide, the address increases by 1 per step; if it is 2 bytes wide, by 2; and so on. This is because if you have, e.g., an array of ints, you want to land on whole integers when taking offsets from that array pointer. An int is at least 2 bytes in size, so the pointer will move at least 2 bytes further when incremented.
end--
Same thing as above, this decrements the pointer by one, since we moved to the last character of the string using our offset we are now at the second to last character in the string, not much to say other than this is equivalent to end = end - 1;.
*(end + 1) = '\0'
Again, we move an offset of 1 ahead with the pointer, so we once again point at the last character of the string. This is rather redundant since we only just decremented the pointer by the same offset. The only difference here is that the pointer end itself is not changed.
Then we dereference the pointer and write to it, which means we change the value the pointer currently points at, namely the last character of the string. We change this to '\0', which moves the terminating null byte to this location, effectively shortening the string by cutting off the last character.
The code here is equivalent to
size_t len = strlen(tokenized);
tokenized[len - 1] = '\0';
char *end = tokenized + len - 2; // we still have this pointer
Note that we do -2 now because we include the end--; statement.
The current end remains pointing at the last character of the now shortened string.
An illustration of what is happening:
tokenized = "hello world"; // [h e l l o ' ' w o r l d \0]
tokenized = "hello worl"; // [h e l l o ' ' w o r l \0 \0]
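Written as a small function, the chopping step might look like this. chop_last is a hypothetical name, and the length check is an addition: the shell snippet as shown would write before the buffer if tokenized were empty.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes the last character of str in place.
   str must be writable (an array or allocated memory, not a literal). */
void chop_last(char *str)
{
    size_t len = strlen(str);

    if (len > 0)
        str[len - 1] = '\0';   /* overwrite the last char with the terminator */
}
```

Given char buf[] = "hello world";, calling chop_last(buf) leaves "hello worl" in buf. This is a common way to drop a trailing newline, which may be what the shell code is doing, though the snippet does not say.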
I don't believe end is an array. Am I wrong on this?
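You are right, end is a pointer, not an array. The two are easy to confuse because an array decays to a pointer to its first element in most expressions, but they are different things: sizeof gives different results for them, and an array cannot be reassigned the way a pointer can.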
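The sizeof difference is easy to see in a short sketch. Both function names are invented for this example.

```c
#include <stddef.h>

/* sizeof applied to the array itself: the whole storage, terminator included. */
size_t size_of_array(void)
{
    char s1[] = "ABC";
    return sizeof s1;      /* 4: 'A', 'B', 'C', '\0' */
}

/* sizeof applied to a pointer into that array: just the size of a pointer. */
size_t size_of_pointer(void)
{
    char s1[] = "ABC";
    char *end = s1;        /* s1 decays to a pointer to its first element */
    return sizeof end;     /* sizeof(char *), whatever the platform uses */
}
```

The first result depends on the string's length, the second only on the platform's pointer size.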
I got this code snippet from here:
int main(int argc, char *argv[])
{
for (int i = 1; i < argc; ++i) {
char *pos = argv[i];
while (*pos != '\0') {
printf("%c\n", *(pos++));
}
printf("\n");
}
}
I have two questions:
Why are we starting the iterations of for loop at i=1, why not
start it at i=0, especially when we are ending the iterations at
i<argc and Not at i<=argc?
The second, third and fourth last lines of code! In char *pos =
argv[i];, we declare a pointer type variable and assign it a
pointer to a commandline parameter passed when running the program.
Then in while (*pos != '\0'), *pos dereferences the pointer
stored in pos, so *pos contains the actual value pointed by the
pointer stored in pos.
Then in printf("%c\n", *(pos++));, we have *(pos++), and
that is the actual question: (a) Why did he increment pos, and
(b) what is the meaning of dereferencing (pos++) with the
dereference operator *?
Why are we starting the iterations of for loop at i=1, why not start it at i=0, especially when we are ending the iterations at i<argc and Not at i<=argc?
We start at 1 because argv[0] holds the name of the program itself, which we don't care about. Ignoring the first element of an array does not move the index of the last element.
We have argc elements stored in argv[]. Therefore we mustn't run until i==argc but need to stop one element earlier, just as with every other array.
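The same index range can be tried out without a command line by passing a hand-built argv. A sketch only; total_argument_length is a made-up helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds up the lengths of the real arguments,
   skipping argv[0] (the program name). */
size_t total_argument_length(int argc, char *argv[])
{
    size_t total = 0;

    /* Valid argument strings live at indices 1 .. argc - 1. */
    for (int i = 1; i < argc; ++i)
        total += strlen(argv[i]);

    return total;
}
```

With argc == 3 and argv holding "prog", "ab", "cde", the loop visits only "ab" and "cde" and returns 5. With argc == 1 the loop body never runs.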
The second, third and fourth last lines of code! In char *pos = argv[i];, we declare a pointer type variable and assign it a pointer
to a commandline parameter passed when running the program.
Correct. pos is a pointer and points to the start of the i-th string passed via the command line.
Then in while (*pos != '\0'), *pos dereferences the pointer stored in
pos, so *pos contains the actual value pointed by the pointer stored
in pos.
*pos is the character pos currently points to, which at the start of each inner loop is the first character of the string we are inspecting.
Then in printf("%c\n", *(pos++));, we have *(pos++), and that is the
actual question: (a) Why did he increment pos, and (b) what is the
meaning of dereferencing (pos++) with the dereference operator *?
You have 2 things here:
1. (pos++): pos is a pointer to char, and ++ increments it to point to the next element, i.e. the next char. Because this is a post-increment, the expression still yields the old value of pos.
2. The value of pos (before the post-increment) is taken and dereferenced to read the char at that position.
As a result the while loop will read all characters, while the for loop handles all strings.
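To watch *(pos++) at work without a terminal, here is a sketch that stores each character it reads. The function name and the dst buffer are assumptions for the example; the loop itself has the same shape as the one in the question.

```c
#include <stddef.h>

/* Hypothetical helper: walks src with *(pos++), storing every char it
   reads into dst (which needs room for the string plus '\0').
   Returns the number of chars visited. */
size_t copy_with_post_increment(const char *src, char *dst)
{
    const char *pos = src;
    size_t visited = 0;

    while (*pos != '\0') {
        char c = *(pos++);      /* read the char at pos, then move pos forward */
        dst[visited++] = c;
    }
    dst[visited] = '\0';
    return visited;
}
```

For "abc" it visits 3 characters and dst ends up as "abc", which shows that no character is skipped: the dereference uses the pointer value from before the increment.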
For the first part,
Why are we starting the iterations of for loop at i=1, why not start it at i=0, especially when we are ending the iterations at i<argc and Not at i<=argc?
Because, for hosted environment, argv[0] represents the executable name. Here, we're only interested in supplied command line arguments other than the executable name itself.
Quoting C11, chapter §5.1.2.2.1
If the value of argc is greater than zero, the string pointed to by argv[0]
represents the program name; [....] If the value of argc is
greater than one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters.
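Point to note: using i<=argc as the loop condition would be wrong. C arrays use 0-based indexing, so the argument strings sit at indices 0 through argc-1; argv[argc] is a null pointer, and the inner loop would dereference it.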
For the second part,
(a) Why did he increment pos, and (b) what is the meaning of dereferencing (pos++) with the dereference operator *?
*(pos++), can also be read as *pos; pos++; which, reads the current value from the memory location pointed to by pos and then advances pos by one element.
To elaborate, at the beginning of each iteration of the for loop, by saying
char *pos = argv[i];
pos holds the pointer to the start of the string which holds the supplied program parameter, and by continuous increment (up to the null terminator '\0') we're basically traversing the string, and by dereferencing we're reading the value at those locations.
Just for the sake of completeness, let me state that the inner loop
char *pos = argv[i];
while (*pos != '\0') {
printf("%c\n", *(pos++));
}
can be replaced with
puts(argv[i]);
if you want each argument printed on one line rather than one character per line.
Why are we starting the iterations of for loop at i=1, why not start
it at i=0, especially when we are ending the iterations at i<argc and Not at i<=argc?
Because the first parameter is the name of the program, and it seems the author is not interested in it.
Second case is basically similar to the following (argv[i] basically being a char *):
So you have something like this:
char * c = "Hello";
and then char * p = c;
Now you have
+------------------+
| H e l l o \0     |
| ^                |
| |                |
+------------------+
  |
  |
  +
  p
When you do p++ you have
+------------------+
| H e l l o \0     |
|   ^              |
|   |              |
+------------------+
    |
  +-+
  +
  p
If you do now *p - the value you get is 'e'.
*(p++) is basically same as above two steps, just due to post increment, first the value where p points will be retrieved (before increment), and then p will advance.
Then in printf("%c\n", *(pos++));, we have *(pos++), and that is the
actual question: (a) Why did he increment pos, and (b) what is the
meaning of dereferencing (pos++) with the dereference operator *?
So in the while loop the author is traversing through the whole string until he meets null terminator '\0' and printing each character.
This on the other hand is repeated for each parameter in argv using the for loop.
Why are we starting the iterations of for loop at i=1, why not start
it at i=0, especially when we are ending the iterations at i < argc and
Not at i <= argc?
Note that argc counts the name of the program being executed too, which is the first entry (index zero). So the actual arguments start at 1 and end at argc - 1.
A note here: the command line arguments are stored in an array of pointers to char, i.e.
argv[0] -> "YourProgramName"
argv[1] -> "YourFirstArgument"
.
.
argv[argc-1] -> "YourLastArgument" // Remember argc-1 is the last argument
and so on. Note that each argument is a null-terminated string.
So in
char *pos = argv[i]; // Create another pointer to each string
while (*pos != '\0') {
printf("%c\n", *(pos++)); // Note %c, you're printing char by char.
}
You're just printing character by character using the format specifier %c in printf. So you need to dereference character by character in the while loop and this answers
a) Why did he increment pos, and
b) what is the meaning of dereferencing (pos++) with the dereference operator *?
char *start = str;
char *end = start + strlen(str) - 1; /* -1 for \0 */
char temp;
How does that find the end of the string? If the string is giraffe, start holds that string, then you have:
char *end = "giraffe" + 7 - 1;
How does that give you the last char in giraffe? (e)
Here's how "giraffe" is laid out in memory, with each number giving that character's index.
g i r a f f e \0
0 1 2 3 4 5 6 7
The last character e is at index 6. str + 6, alternatively written as &str[6], yields the address of the last character, not the character itself. To get the character you need to dereference that address, so *(str + 6) or str[6] (add a *, or remove the &).
In English, here are various ways to access parts of the string:
str == str + 0 == &str[0]       address of character at index 0
str + 6 == &str[6]              address of character at index 6
*str == *(str + 0) == str[0]    character at index 0 ('g')
*(str + 6) == str[6]            character at index 6 ('e')
If the string is giraffe, start holds that string
Not exactly, start is a char *, so it holds a pointer, not a string. It points to the first character of "giraffe": g.
start + 1 is also a char * pointer and it points to the next element of size char, in this case the i.
start + 5 is a pointer to the second f.
start + 6 is a pointer to the e.
start + 7 is a pointer to the special character \0 or NUL, which denotes the end of a string in C.
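The start, end and temp variables in the question look like the setup for an in-place reversal that swaps characters from both ends. That is a guess, since the rest of the code is not shown. Under that assumption, a sketch could be:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical completion of the snippet: reverses str in place by
   swapping the outermost characters and moving both pointers inward.
   str must be writable (not a string literal). */
void reverse_in_place(char *str)
{
    size_t len = strlen(str);
    if (len < 2)
        return;                        /* nothing to swap */

    char *start = str;
    char *end = start + len - 1;       /* last character, index len - 1 */
    char temp;

    while (start < end) {
        temp = *start;
        *start++ = *end;               /* write, then move start right */
        *end-- = temp;                 /* write, then move end left */
    }
}
```

With char buf[] = "giraffe";, end starts at the e (index 6) and the call leaves "effarig" in buf. The early return for short strings keeps start + len - 1 from being computed for an empty string.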
start is a pointer to the string "giraffe" in memory, not the string itself.
Likewise end is a pointer to the start of the string plus 7, minus one. The minus one is there because indexing starts at zero: start + 7 is the null terminator, so start + 6 is the last character, e.
There is no string type in C, just char, and arrays of char.
I would tell you to google "c strings", but since that's apparently a term that you need safesearch for these days, here's a link: http://www.cprogramming.com/tutorial/lesson9.html