What is the outcome of decreasing a char pointer? - c

So I was studying some code for a custom Linux shell, and I am having a hard time understanding this section:
// add null to the end
char *end;
end = tokenized + strlen(tokenized) - 1;
end--;
*(end + 1) = '\0';
I don't understand what decrementing a char pointer does, or how this section works in general. I get that it points end at the last character of the tokenized array, but I don't understand the following two lines. If something similar has already been posted, feel free to link it (although I did a good amount of research). Thank you!
Also, a quick question: I don't believe end is an array. Am I wrong about this?

Decrementing a pointer moves it to the preceding element in memory. For a char * string, that is the preceding byte, so end now points to the previous character.
// add null to the end
// declare a `char *`
char *end;
// set `end` to point to the last character of `tokenized`
end = tokenized + strlen(tokenized) - 1;
// decrement `end`; it now points to the character before the one it was pointing to
end--;
// set the character after the one `end` points to to `NUL`
*(end + 1) = '\0';
I commented your code as I understand it...

char *end;
end = tokenized + strlen(tokenized) - 1;
end--;
*(end + 1) = '\0';
strlen(tokenized)
This is the offset (index) of the null terminator in the tokenized string.
In other words, if you advance the pointer by this offset (the number of non-null characters), you end up with a pointer to the position right after the last character. To get a pointer to the last character itself, you subtract one from the offset.
Let offset = strlen(tokenized) - 1
tokenized + offset
This yields a pointer offset elements past tokenized (tokenized itself is not changed). Pointer arithmetic counts in elements, not bytes: if the pointer points to 1-byte objects, adding 1 moves it by 1 byte; if it points to 2-byte objects, by 2 bytes; and so on. This way, when you take offsets from a pointer into an array of ints, you always land on whole ints. An int pointer moves by sizeof(int) bytes (commonly 4) each time it is incremented.
end--
Same idea as above: this decrements the pointer by one element. Since the offset moved us to the last character of the string, we are now at the second-to-last character. This is equivalent to end = end - 1;.
*(end + 1) = '\0'
Again, we move an offset of 1 ahead with the pointer, so we once again point at the last character of the string. This is rather redundant since we only just decremented the pointer by the same offset. The only difference here is that the pointer end itself is not changed.
Then we dereference the pointer and write through it, which changes the value it points to, namely the last character of the string. We set it to '\0', which moves the terminating null byte to this position and effectively shortens the string by cutting off the last character.
The code here is equivalent to
size_t len = strlen(tokenized);
tokenized[len - 1] = '\0';
char *end = tokenized + len - 2; // we still have this pointer
Note that we do -2 now because we include the end--; statement.
The current end remains pointing at the last character of the now shortened string.
An illustration of what is happening (the space in the middle is shown as ' '):
before: "hello world" // [h e l l o ' ' w o r l d \0]
after:  "hello worl"  // [h e l l o ' ' w o r l \0 \0]
I don't believe end is an array. Am I wrong on this?
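You are right: end is a pointer (char *), not an array; it points into the array tokenized. Arrays and pointers are closely related, because in most expressions an array name is converted to a pointer to its first element, which is why tokenized + n works. They are still different things: sizeof an array gives the size of the whole array while sizeof a pointer gives the size of the pointer, and you cannot assign to an array name.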

Related

display a string backward on the screen

Here is a small function from the book Head First C.
This function should display a string backward on the screen.
#include <stdio.h>
#include <string.h>

void print_reverse(char *s)
{
    size_t len = strlen(s);
    char *t = s + len - 1;
    while ( t >= s )
    {
        printf("%c", *t);
        t--;
    }
    puts("");
}
Unfortunately, I don't understand how it reverses the string.
size_t len = strlen(s); // computes the number of characters in the string
char *t = s + len - 1;  // 't' is a pointer to a type char
I don't understand what values are used for 's' in this equation. I've read that when the name of an array variable is assigned to a pointer, it actually refers to the address of the first character, i.e. array[0]. So does 's' have the value 0 here, or does it have the integer value of a particular character?
For example, say the word is s[] = "hello", so s[0] = 'h'. Adding strlen(s) to s[0] should yield 104 (in decimal), thus s + len - 1 = 104 + 6 - 1 = 109 (-1 because I assume I have to subtract the '\0' character that strlen takes into account). But 109 is 'm'. I don't see how this equation traverses the string.
while (t >= s): I assume this means "while t is not equal to zero". Is that correct?
Thank you!
First things first, s is a pointer and not a normal char variable.
So when you assign the address of a string to s, s contains the address of its first character.
Pointer arithmetic: by adding 1 to a pointer, you make it point to the next memory location. Recall that strings are stored in contiguous memory locations.
So, if s points to "Hello",
printf(*s) will print 'H'
printf(*(s+1)) will print 'e'.
Now, we have the length (=5) in len. When we add len - 1 to s, we make it point 5 locations ahead. It now points to 'o'.
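Then, with while (t >= s), we compare the two pointers t and s: each iteration prints the character t points to and then decrements t, until t has moved past s, which points to the first element.
Illustration:
Initial condition:
H e l l o
^       ^
s       t
Now we print *t and decrement t.
Output: o
H e l l o
^     ^
s     t
We continue in the same way, printing l, l and finally H when t == s:
Output: lleH
H e l l o
^
s, t
After 'H' is printed, t is decremented to point before s, so t >= s is false and the loop stops. (Strictly speaking, forming a pointer to before the first element of an array is undefined behaviour in C, even though this loop usually works in practice.)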
void print_reverse(char *s)
Here s is the pointer to the beginning of the string.
size_t len = strlen(s);
This equals the number of characters in the string (\0 not counted).
char *t = s + len - 1;
At this point, t is a new pointer which points at the last element of the string (read something about pointer arithmetic if this is not clear to you)
while ( t >= s )
{
printf("%c", *t);
t -- ;
}
In this loop t gets decremented at each iteration, so that every time it points at the previous character in the string.
At the last iteration, t==s, which means that you are printing the first element of the string.
s contains the address of the first character of the array, not the first character itself. An address is not an offset from the start of the array, but an arbitrary (to the user) value. So when you add len - 1 to s, the result is a pointer to the character at index len - 1.
Put another way, this:
char *t = s + len - 1;
Is the same as this:
char *t = &s[len - 1];
The condition while (t >= s) evaluates to true as long as t points to a memory location at or after the start of the array.
It reverses the string by setting the pointer t to the last character of the string and then doing the following:
1. Print the character that t points to.
2. Decrease t by one (make it move one char to the left).
3. Go to step 1 unless t now points before the first character of the string.
s is a pointer to the first character of the string. A string is simply a sequence of characters in memory terminated by the NUL character ('\0'). When t == s, t points to the first character; when t < s, t points to the byte before the first character, and that byte is not part of the string.
And when the loop terminates, t won't be zero. t and s are pointers, that means their values are memory addresses. The value of s is the memory address of the first character of the string and this address will certainly not be zero.

What does this block of C code do in order to find the end character?

char *start = str;
char *end = start + strlen(str) - 1; /* -1 for \0 */
char temp;
How does that find the end of the string? If the string is giraffe, start holds that string, then you have:
char *end = "giraffe" + 7 - 1;
How does that give you the last char in giraffe? (e)
Here's how "giraffe" is laid out in memory, with each number giving that character's index.
g i r a f f e \0
0 1 2 3 4 5 6 7
The last character e is at index 6. str + 6, alternatively written as &str[6], yields address of the last character. This is the address of the last character, not the character itself. To get the character you need to dereference that address, so *(str + 6) or str[6] (add a *, or remove the &).
Here are various equivalent ways to access parts of the string:
str  == str + 0    == &str[0]   address of the character at index 0
        str + 6    == &str[6]   address of the character at index 6
*str == *(str + 0) == str[0]    character at index 0 ('g')
        *(str + 6) == str[6]    character at index 6 ('e')
If the string is giraffe, start holds that string
Not exactly, start is a char *, so it holds a pointer, not a string. It points to the first character of "giraffe": g.
start + 1 is also a char * pointer and it points to the next element of size char, in this case the i.
start + 5 is a pointer to the second f.
start + 6 is a pointer to the e.
start + 7 is a pointer to the special character \0 or NUL, which denotes the end of a string in C.
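start is a pointer to the string "giraffe" in memory, not the string itself.
Likewise, end is a pointer to the start of the string plus 6: strlen("giraffe") is 7, which does not count the terminating \0, and subtracting 1 gives the index of the last character, e, because indices start at 0.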
There is no string type in C, just char, and arrays of char.
I would tell you to google "c strings", but since that's apparently a term that you need safesearch for these days, here's a link: http://www.cprogramming.com/tutorial/lesson9.html

String reverse in C

I have code for reversing a string. Say I type 'ABC'; the output will be 'CBA'. However, there are some lines I don't quite understand.
1 #include <stdio.h>
2 #include <string.h>
3
4 void print_reverse(char *s) {
5 size_t len = strlen(s);
6
7 char *t = s + len-1;
8 while(t >= s) {
9 printf("%c", *t);
10 t = t-1;
11 }
12 puts("");
13 }
14
15 int main()
16 {
17 char charinput[100];
18 printf("Enter character you want to reverse:");
19 fgets(charinput, 100, stdin);
20 print_reverse(charinput);
21 getchar();
22 }
What do lines 7 and 8 do? What would the value of the pointer t be?
The posted code uses the following algorithm:
Line 7: sets the pointer t to the last character in the string (note: this will be a newline character if the user entered fewer than 99 characters, because fgets keeps the newline). The -1 moves one character back from the terminating null character.
Lines 8-10: This is the core of the reversal loop. The pointer t is repeatedly tested against the address of the beginning of the string: the condition checks whether the address in t is greater than or equal to the starting address of the string. As long as it is, the loop body runs and the character at the address held in t is sent to stdout via printf(). The address in t is then decremented by one element, which for char is always one byte (sizeof(char) is 1 by definition), and the loop repeats. Only when t contains an address before s does the loop break (note: this is not standard-conforming; see below for why).
Something you should know about this loop (and should point out to the author if it isn't you): the final pointer comparison is not standard-conforming. For pointers into an array, the standard only defines relational comparison when both pointers point to elements of the same array or one element past its end; here that range runs from the base address of charinput (the address passed in through s) up to and including one element past the end of charinput. This code compares t against s and breaks the loop only when t is less than s. But as soon as t is less than s, it no longer holds an address in the range from charinput through one past the end of charinput, so the comparison is undefined. In fact, merely computing that out-of-range value with t = t-1 is already undefined behaviour.
One way to do this correctly is the following:
t = s + len;
while (t-- > s)
printf("%c", *t);
Edit: after prodding from Paul Hankin and a journey into the standard, I rewrote the code above to fix an undefined-behaviour case I had missed: when t reaches s, the comparison t-- > s fails, but the post-decrement still moves t to one before s. The updated code is below:
t = s + len;
while (t != s)
printf("%c", *--t);
This also works for zero-length strings. It works as follows:
1. t is set to the address of the terminating null character of the string.
2. Enter the loop; the condition is to continue as long as the address in t is not equal to the base address s.
3. Decrement t, then dereference the resulting address to obtain the current character, and send it to printf.
4. Loop around for the next iteration.
Let's understand it step by step:
len = strlen(s) assigns the length in bytes of the string s points to (not counting the '\0') to len (say len is 10).
s points to the first character of the string. Let's assume the address of the first element is 100; then s contains 100.
Adding len-1 to s gives 109.
Now, the line 7
char *t = s + len-1;
makes t point to the element at address 109, i.e. the last element of the string.
Line 8
while(t >= s) {
means the loop will continue until t points to something before the first element of the string.
Line 7: pointer t points to the last character (s+len-1).
Line 8: repeat the loop body while the address in t is greater than or equal to the address in s. Suppose s points to the first input character at address 1101; the next character is at 1101+1 = 1102, the third at 1102+1 = 1103, and so on. So in line 7, t points to 1101 + len - 1, which is 1101+10-1 = 1110 if your input is 10 characters long.
Line 9: print the character at the address held in t.
Line 10: t is decremented by 1 and now points to the character immediately to its left.
Lines 9 and 10 are repeated while the address in t is greater than or equal to the address in s (1101 in my illustration).
t starts to point at the last character of the string s and in the following loop is decreased until it points to the first character. For each loop iteration the character is printed.
Line 7 sets the pointer t to point to the last character of the string s. Line 8 is a while loop that goes backward through the string until the beginning. The pointer t is the current position in the string, and the character at that position is output on line 9.
char *t = s + len-1; : To point to the last character of string s
while(t >= s) : To scan all the characters of string s in reverse order (as s points to first character and we have made t point to last character in line 7).
Hope this helps.

Resetting a char pointer to the top of an array

I am writing a function and I need to count the length of an array:
while(*substring){
substring++;
length++;
}
Now, when I exit the loop, will that pointer still point to the start of the array? For example, if the array is "Hello", when I exit the loop will the pointer be pointing at H or at the NULL?
If it is pointing at NULL, how do I make it point at H?
Strings in C are stored with a null character (denoted \0) at the end.
Thus, one might declare a string as follows.
char *str="Hello!";
In memory, this will look like Hello!\0 (or rather, the character code of each letter followed by a zero byte).
Your code looks like this:
substring=str;
length=0;
while(*substring){
substring++;
length++;
}
When you reach the end of this loop, *substring will be equal to 0 and substring will contain the address of the 0 character mentioned above. The value of substring will not change unless you explicitly do so.
To make it point at the beginning of the string again, you could use substring - length: subtracting an integer from a pointer moves it back by that many elements, as long as it stays within the same array. Alternatively, you could save the location before you begin:
beginning=str;
substring=str;
length=0;
while(*substring){
substring++;
length++;
}
substring=beginning;
It's pointing at the null terminator of the array. Just remember the position in another variable, or subtract length from the pointer.
A pointer, once moved, does not automatically move back. So once the while loop ends, the pointer points at '\0', the character that terminates the string (not NULL, which is the null pointer constant).
To move back to the start of the string, subtract the string length, which you are already computing by incrementing the length variable.
Sample code:
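#include <stdio.h>
#include <string.h>

int main(void)
{
    char name1[] = "test program"; /* sized to fit the string and its '\0' */
    char *name = name1;
    size_t len = strlen(name);
    while (*name)
    {
        name++;
    }
    name = name - len;             /* back to the first character */
    printf("\n%s\n", name);
    return 0;
}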
Hope this helps...
At the end of the loop, *substring will be 0. That's the condition for the loop to end:
while(*substring)
So while( (the value pointed to by substring) is not equal to 0), do stuff
Once *substring becomes 0 (the end of the string), the loop exits, so substring points at the terminating '\0' character.
If you want to bring it back to H, do substring -= length;
However, the function you are writing already exists: it's declared in string.h as size_t strlen(const char *). size_t is an unsigned integer type large enough to hold the size of any object; it is typically 32 bits on 32-bit systems and 64 bits on 64-bit systems.

C - start traversing from the middle of a string

Just double-checking, because I keep mixing up C with C++ and C#: say I have a string that I am parsing using strcspn(). It returns the length of the string up to the first delimiter it finds. Using strncpy (is that C++ only, or is it available in C too?), I copy the first part of the string somewhere else and have a variable store my position. Let's say strcspn returned 10 (so the delimiter is the 10th character).
Now, my code does some other stuff, and eventually I want to keep traversing the string. Do I have to copy the second half of the string and then call strcspn() from the beginning? Can I just make a pointer, point it at the 11th character of my string, and pass that to strcspn() (I guess something like char* pos = str[11])? Is there something simpler I'm missing?
You can get a pointer to a location in the middle of the string and you don't need to copy the second half of the string to do it.
char * offset = str + 10;
and
char * offset = &str[10];
mean the same thing and both do what you want.
You mean str[9] for the 10th char, or str[10] for the 11th, but yes you can do that.
Just be careful that you are not accessing beyond the length of the string and beyond the size of memory allocated.
It sounds like you are tokenizing, so I would suggest using strtok directly instead. It would be cleaner, and it already handles both things you want to do (strcspn + strncpy, and continuing to parse after the delimiter). Note that strtok modifies the string it parses by writing '\0' over the delimiters.
You can call strcspn again with (str + 11) as the first argument, but make sure the length of str is greater than 11.
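size_t len = strlen(str);
size_t n = strcspn(str, pattern);               /* str[n] is the first delimiter, or '\0' */
while (n < len)
{
    size_t n2 = strcspn(str + n + 1, pattern);  /* skip the delimiter itself */
    /* the next token is str[n + 1] .. str[n + n2] */
    n += n2 + 1;                                /* str[n] is now the next delimiter, or '\0' */
}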
Note: char *pos = str[11] is wrong, because str[11] is a char, not a pointer. Use char *pos = str + 11; (or &str[11]) instead.
