C string removal - how does this code work? - c

I have a working piece of C code that completely removes every second char from a character array, making the original array half the size (half+1 if the size was odd)
..but I cannot figure out how it works.
void del_str(char string[]) {
int i,j;
for(i=0, j=0; string[i]!=0; i++) {
if(i%2==0) {
string[j++]=string[i];
}
}
string[j]=0;
}
example input: 'abcdefgh'
output from that: 'aceg'
what I thought the output would be: 'aacceegg'
The line I don't understand is
string[j++]=string[i];
I can write code that omits every second char, so the output would be:
'a c e g '
but I can't wrap my head around this.
Alternatively, how would you write a program that completely deletes every n-th char and their space in the original array? (producing the same output as the above code)

This code uses two position indexes, i and j. Initially, both indexes are initialized to zero. Index i is used for reading; index j is used for writing.
Index i is incremented at each step of the loop, because the increment i++ is in the header of the loop. Index j, on the other hand, is incremented every other iteration, because j++ happens only when i is even. Index i is always as close or closer to the end of the string than index j, because it moves "faster".
A null terminator is placed at the final position of j after the loop to mark the new end of the string.
Perhaps it would be easier to see with a small example. Consider the initial string of "abcdef". Its representation in memory at the beginning of the algorithm is as follows:
'a' 'b' 'c' 'd' 'e' 'f' '\0'
Here is how it would change after each step of the loop:
'a' 'b' 'c' 'd' 'e' 'f' '\0'
'a' 'c' 'c' 'd' 'e' 'f' '\0'
'a' 'c' 'e' 'd' 'e' 'f' '\0'
'a' 'c' 'e' '\0' 'e' 'f' '\0'
Since C strings ignore everything after '\0', the "tail" of 'e' 'f' '\0' is not considered part of the string.
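The step-by-step picture above can be reproduced with a small tracing variant of the function. This is only a sketch: the name del_str_traced and the printing format are made up for illustration, and the algorithm itself is unchanged from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same algorithm as del_str, but prints the whole buffer after each write.
   del_str_traced is a hypothetical name used only for this sketch. */
void del_str_traced(char string[]) {
    size_t len, i, j, k;
    assert(string != NULL);
    len = strlen(string);          /* original length, so the stale tail is shown too */
    for (i = 0, j = 0; string[i] != 0; i++) {
        if (i % 2 == 0) {
            string[j++] = string[i];   /* write at j, then advance j */
            printf("i=%zu j=%zu:", i, j);
            for (k = 0; k < len; k++)
                printf(" '%c'", string[k]);
            printf("\n");
        }
    }
    string[j] = 0;                 /* everything after this is ignored by string functions */
}
```

Calling it on a modifiable array such as char buf[] = "abcdef"; leaves buf holding "ace", while the old 'e' and 'f' stay in memory after the terminator.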

You are forgetting the if(i%2==0) which is crucial. This line basically skips every second character. Then, the line you have identified can be split in 2 parts:
string[j]=string[i]; // Overwrite the character at position j
j++; // Now increase j
The variable i keeps track of the position of the string you have reached while reading it; j is used for writing. In the end, you write 0 at position j to terminate the string.

i and j are not the clearest variable names. Perhaps renaming j to something like output would help.
Note when j is updated: only when i is at an even position. So i walks through the whole string while j brings up the rear, copying the character at position i back to position j and overwriting what was there before. The last line, string[j]=0, terminates your string.
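As for the follow-up question about deleting every n-th character, one possible generalization is sketched below. It assumes "every n-th" means the characters at 1-based positions n, 2n, 3n, and so on (which matches the original code for n = 2); the function name delete_every_nth and the index names read and write are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Removes the characters at 1-based positions n, 2n, 3n, ... in place.
   delete_every_nth is a hypothetical name; n must be at least 1. */
void delete_every_nth(char s[], size_t n) {
    size_t read, write = 0;
    assert(s != NULL && n >= 1);
    for (read = 0; s[read] != '\0'; read++) {
        if ((read + 1) % n != 0)       /* keep it unless it is an n-th character */
            s[write++] = s[read];
    }
    s[write] = '\0';                   /* new end of the string */
}
```

With n = 2 and "abcdefgh" this produces "aceg", the same result as del_str; with n = 3 it produces "abdegh". The write index never overtakes the read index, so nothing is overwritten before it has been read.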

Related

What index does the i-th element of an array refer to in C?

I've stumbled upon an assignment in a piece of code that adds a null character to an array, line[i] = '\0', to explicitly mark it as a string. That raised a question for me: since the null character sits exactly at the end of any string, how do we know that storing \0 in the i-th element of line puts it at the last position? As I see it, i could be any index into line. So does the i-th index of an array refer to the last position, or what?
Code like this usually appears just after code that has used the same index variable i to construct the string.
For example:
char string[10];
int i = 0;
string[i++] = 'a';
string[i++] = 'b';
string[i++] = 'c';
string[i] = '\0';
Or, more realistically:
char line[100];
int i = 0;
int c;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
line[i] = '\0';
This second example reads one line of text from standard input and stores it in the line array as a proper, null-terminated string.
(In real code, of course, you also have to worry about the possibility of overflowing the array.)
To make things really clear, you can imagine writing code like this more explicitly, with a separate variable to hold the length of the string. For example:
i = 0;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
int length_of_string = i;
line[length_of_string] = '\0';
When you see that line
line[length_of_string] = '\0';
it makes it more obvious that the \0 terminator is being stored at a spot in the string that someone has actually determined to be the length of the string. But as you can see, since the variable length_of_string has just been set based on the value of i after the loop, it's perfectly equivalent to just write
line[i] = '\0';
There's sort of an academic-sounding term called loop invariant, but code like this ends up being a perfect example of what it means, and it's worth thinking about for a moment. A loop invariant is something you can say about a loop that's true at all times, for every trip through the loop, at the beginning or the end or in the middle of the loop. For the read-a-line loop I've just shown, the loop invariant is:
i always contains the number of characters that have been read into the string line.
Let's look at all of the ways this "loop invariant" is true. To make things very clear, I'm going to write the loop again, with some comments to make it clear what I mean by the "top" and "bottom" of the loop:
i = 0;
while((c = getchar()) != EOF && c != '\n') {
/* top of loop */
line[i++] = c;
/* bottom of loop */
}
Before the loop runs, the string is empty, so i starts out as 0.
At the top of the loop, before the line[i++] = c step, i still has the value it did last time through the loop.
In the middle of the loop, the line line[i++] = c simultaneously stores the character c into the line array (and at the right spot!), and increments i.
At the bottom of the loop, after the line[i++] = c step, i contains the updated number of characters in the string.
After the loop (and this was your question), since i still contains the number of characters that have been read and stored into line, it's precisely the right index to use to null-terminate the string, with the line line[i] = '\0'.
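The invariant can also be written down in code. The sketch below is an assumption-laden variant of the read-a-line loop: it reads from a FILE * instead of calling getchar() directly, takes the buffer capacity as a parameter to avoid overflow, and uses hypothetical names (read_line, in, cap).

```c
#include <assert.h>
#include <stdio.h>

/* Reads at most cap-1 characters of one line from `in` into `line`,
   then null-terminates it. Returns the number of characters stored.
   read_line and its parameter names are invented for this sketch. */
size_t read_line(FILE *in, char *line, size_t cap) {
    size_t i = 0;
    int c;
    assert(in != NULL && line != NULL && cap > 0);
    while (i + 1 < cap && (c = getc(in)) != EOF && c != '\n') {
        line[i++] = (char)c;   /* store at index i, then count the character */
        assert(i < cap);       /* invariant: i characters stored, room left for '\0' */
    }
    line[i] = '\0';            /* i is the length, so it is also the terminator's index */
    return i;
}
```

If the buffer fills up before the newline arrives, the rest of the line is simply left unread in the stream; a real program would have to decide what to do with it.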
The other thing that's worth paying attention to here is that the line in the middle of the loop, that simultaneously stores the next character into the line array, at the right spot, and increments i at the same time, is, once again:
line[i++] = c;
My question for you to think about is, what if I had instead written
line[++i] = c; /* WRONG */
It can be hard, at first, to really understand the difference between i++ and ++i, to understand why you would care, to understand why you might pick one over the other. This code here, I think, is an example that really makes the point.
(For extra credit, think about this: What if arrays in C were 1-based, instead of 0-based? What parts of the read-one-line loop would change, and is it still possible to maintain all facets of the loop invariant?)
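For anyone who wants to check their answer to the i++ versus ++i question, here is a small sketch that runs both forms on the same input. The functions fill_post and fill_pre are hypothetical helpers that copy from a source string rather than from standard input, purely so the effect is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

/* Uses line[i++] = c: store at index i, then increment. */
size_t fill_post(char *line, const char *src) {
    size_t i = 0;
    assert(line != NULL && src != NULL);
    for (; *src != '\0'; src++)
        line[i++] = *src;
    line[i] = '\0';
    return i;
}

/* Uses line[++i] = c: increment first, then store.
   Index 0 is never written, and the final line[i] = '\0'
   overwrites the last character that was stored. */
size_t fill_pre(char *line, const char *src) {
    size_t i = 0;
    assert(line != NULL && src != NULL);
    for (; *src != '\0'; src++)
        line[++i] = *src;
    line[i] = '\0';
    return i;
}
```

Starting from a buffer filled with '.' characters, fill_post with "abc" yields "abc", while fill_pre yields ".ab": the first slot keeps whatever was there before and the 'c' is lost to the terminator. With ++i, i no longer equals the number of characters stored before the next write, so the invariant breaks.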
If you have an already existing string and you just want it to be terminated with \0 at the index after its last character, write a function to determine this position. E.g. check the char at the current position and check whether the next position contains a legitimate value. You can then go through the whole string to determine the last position, return a pointer to the last position + 1, and set your terminator there. If you work with a variety of predefined strings, this would be the most scalable approach for me.

How does this C iteration work?

Sorry if this seems like a stupid question, but I came across this code that transforms a mixed-case string to a lower-case one, I understand it except the string iteration:
for (int i=0; str[i]; i++) {
str[i] = tolower(str[i]);
}
In my understanding the expression str[i] means continue iterating if str[i] exists, is that correct? And does C not check the boundaries of an array which means that the loop code go on forever?
Having the guard as str[i] is the same as str[i] != '\0', where '\0' is the null character that terminates a string. The guard of a for loop evaluates to either true or false (nonzero or 0 in C). Simply using str[i] checks whether the character is nonzero (true) or the null terminator (false), which marks the end of the string.
If you're new to C strings, you can also use strlen() from <string.h> for your guard. This function returns the length of the string. Your code would then look like this:
for (int i=0; i < strlen(str); i++) {
str[i] = tolower(str[i]);
}
Although this is valid, it calls strlen() in the condition on every iteration, which may rescan the whole string each time. The first approach is simpler and more C-like.
The condition str[i] tests for the end of the string. C-strings are null-terminated, so when the character '\0' is reached, the loop terminates.
No, C does not check array bounds.
It will not go on forever because every C string has to end with '\0'. So it loops until str[i] is 0.
In C, every C string, e.g. a string constant you type, is represented with a trailing 0. For example:
char a[] = {'x', 'y', 'z', 0};
for
char a[] = "xyz";
Therefore, the loop terminates when it reaches the last element, since that element is the null character (0), which evaluates as false.
You are right that the condition str[i] means "keep iterating while str[i] is nonzero". However, the loop will NOT go on forever: once i reaches the length of the string, str[i] is the terminating '\0', which is 0, so the condition fails. C does no bounds checking here; the loop stops only because of that terminator. Thus, the loop body executes once for each character before the terminator.
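Putting the answers together, a minimal sketch of the lowercase loop as a function might look like this. The name lower_in_place is hypothetical, and the cast to unsigned char is an addition not present in the question: tolower() is only defined for values representable as unsigned char (or EOF), so the cast guards against negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lowercases a null-terminated string in place.
   lower_in_place is a hypothetical name for this sketch. */
void lower_in_place(char *str) {
    assert(str != NULL);
    /* The condition str[i] != '\0' is what ends the loop;
       C itself never checks the array bounds. */
    for (size_t i = 0; str[i] != '\0'; i++) {
        str[i] = (char)tolower((unsigned char)str[i]);
    }
}
```

The argument must be a modifiable array, for example char s[] = "MiXeD";, not a string literal.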

Augmenting 2D array value crashes program

I'm creating a letter frequency counter in C that keeps track of how many times a character is used in a given string. A 2d array keeps track of the data while the program loops through each character:
char* input = "The cat jumped over the fence";
int inputlength = (int) strlen(input);
//keeps track of how many times each character is used
int letterfrequencies[26][2] = {
{'a',0},
{'b',0},
{'c',0},
{'d',0},
{'e',0},
{'f',0},
{'g',0},
{'h',0},
{'i',0},
{'j',0},
{'k',0},
{'l',0},
{'m',0},
{'n',0},
{'o',0},
{'p',0},
{'q',0},
{'r',0},
{'s',0},
{'t',0},
{'u',0},
{'v',0},
{'w',0},
{'x',0},
{'y',0},
{'z',0}
};
int currentchar=0;
int letternum=0; //character position in char counting array
for (int i=0; i<inputlength; i++) {
currentchar=input[i];
letternum=0;
while (currentchar!=letterfrequencies[letternum][0]) {
letternum++;
}
printf(" Found a character ");
letterfrequencies[letternum][1]++; //Add to char counting array
printf("\n");
}
On the first iteration of the loop (I'm using the xCode debugger with break points), everything works as expected. However, after the first iteration, the line:
letterfrequencies[letternum][1]++;
crashes the program, saying Thread 1: EXC_BAD_ACCESS (code=2, address=0x7fff5fc2e84c). If I comment the line out, everything runs through without an issue.
What could be causing this?
The input sentence contains both upper- and lower-case letters, as well as spaces, but your letterfrequencies array only contains lower-case letters. So think about what happens when you search for an upper-case letter or a space: it won't be found, so the while loop keeps going and runs past the end of your array, leading to undefined behavior.
Use the isspace function to check for spaces, and use tolower to convert capital letters to lower-case letters.
The problem arises when you handle the space character; you should check whether the character is a (lowercase) letter.
As a side note, you don't need a 2-d array to store the frequency of letters. Instead, use:
int letterfrequencies[26] = {0};
Assuming the letter is currentchar, increment letterfrequencies[currentchar - 'a']. For instance, if the letter is 'z', letterfrequencies[25] is incremented because 'z' - 'a' is 25.
Variable letternum grows beyond 25 because, for example, the input contains 'T', and the while loop compares character codes, so the test looks like while (84 != 97) and so on. ASCII has separate codes for lowercase and capital letters, so 'T' never matches any entry in the table.
Also, the table has no entry for the space character ' ', so the search runs off the end there as well.
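Combining the suggestions from these answers, a corrected counter could be sketched as follows. The function name count_letters is made up for this example, and the code assumes that 'a' through 'z' have contiguous character codes, which holds for ASCII but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts the letters a-z in input, ignoring case and skipping everything else.
   counts must have room for 26 ints. count_letters is a hypothetical name.
   Assumes 'a'..'z' are contiguous, as in ASCII. */
void count_letters(const char *input, int counts[26]) {
    assert(input != NULL && counts != NULL);
    for (int k = 0; k < 26; k++)
        counts[k] = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        int ch = tolower((unsigned char)input[i]);
        if (ch >= 'a' && ch <= 'z')    /* spaces, digits, punctuation are skipped */
            counts[ch - 'a']++;
    }
}
```

Because every index is checked to be in the range 0 to 25 before it is used, no input string can push the access out of bounds, which was the cause of the crash in the question.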

Meaning of a C statement involving char arrays

I am working on an algorithm for a project and I ran across some code that I think may be helpful. However, as I try to read the code, I am having some difficulty understanding a statement in the code. Here is the code.
int firstWord[MAX_WORD_SIZE] = {0}, c = 0;
while (word1[c] != '\0') //word1 is char[] sent as a parameter
{
firstWord[word1[c]-'a']++;
c++;
}
I understand (I hope correctly) that the first line is setting up an integer array of my max size, and initializing elements to zero along with making the initial counter value "c" zero.
I understand that the while loop is looping through all of the characters of the word1[] array until it hits the final char '\0'
I am confused on the line firstWord[word1[c]-'a']++;
word1[c] should give a char, so what does doing the -'a' do? Does this cast the char to an integer which would allow you to access an element of the firstWord[] array and increment using ++? If so, which element or integer is given by word1[c]-'a'
word1[c]-'a' is the difference between the character at position c of word1 and the integer value of 'a'. The loop uses that difference as an index to count the occurrences of each letter in the word.
So if word1[c] is 'b', then the value of word1[c]-'a' will be ('b' - 'a') = 1. So the number of occurrences of 'b' in the word will be incremented by 1.
This is a program that counts the number of letters a to z from a word. The key point here is, 'a' - 'a' has a value of 0, and 'b' - 'a' has a value of 1, etc.
For instance, if word1[c] is the letter 'd', then 'd' - 'a' is 3, so it would increment firstWord[3]. When the word has been iterated character by character, firstWord[3] contains the number of letter 'd' in the word.
It seems this code is doing a letter count.
1. So what does doing the -'a' do?
If word1[c] is 'a', then word1[c]-'a' is 0.
2. Does this cast the char to an integer which would allow you to access an element of the firstWord[] array and increment using ++?
Yes, it is integer promotion.
3. If so, which element or integer is given by word1[c]-'a'?
If word1[c] is 'a', then word1[c]-'a' is 0.
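The index arithmetic described in these answers can be isolated in a tiny helper. This is a sketch, not part of the original code: letter_index is a hypothetical name, the range check is an addition (the code in the question indexes firstWord without one), and the mapping relies on 'a' through 'z' being contiguous, which is true for ASCII but not required by the C standard.

```c
#include <assert.h>

/* Maps 'a'..'z' to 0..25 and returns -1 for any other character.
   letter_index is a hypothetical helper for this sketch. */
int letter_index(char ch) {
    assert('z' - 'a' == 25);   /* sanity check for the ASCII-style layout assumed here */
    if (ch < 'a' || ch > 'z')
        return -1;             /* not a lowercase letter: no valid array slot */
    return ch - 'a';           /* both operands are promoted to int before subtracting */
}
```

For example, letter_index('d') is 3, which is why firstWord[3] ends up holding the number of times 'd' occurs. An uppercase letter or a space gives -1, a case the loop in the question would turn into an out-of-range index.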

C pointers: difference between while(*s++) { ;} and while(*s) { s++;}

I'm going through K & R, and am having difficulty with incrementing pointers. Exercise 5.3 (p. 107) asks you to write a strcat function using pointers.
In pseudocode, the function does the following:
Takes 2 strings as inputs.
Finds the end of string one.
Copies string two onto the end of string one.
I got a working answer:
void strcats(char *s, char *t)
{
while (*s) /* finds end of s*/
s++;
while ((*s++ = *t++)) /* copies t to end of s*/
;
}
But I don't understand why this code doesn't also work:
void strcats(char *s, char *t)
{
while (*s++)
;
while ((*s++ = *t++))
;
}
Clearly, I'm missing something about how pointer incrementation works. I thought the two forms of incrementing s were equivalent. But the second code only prints out string s.
I tried a dummy variable, i, to check whether the function went through both loops. It did. I read over the sections 5.4 and 5.5 of K & R, but I couldn't find anything that sheds light on this.
Can anyone help me figure out why the second version of my function isn't doing what I would like it to? Thanks!
edit: Thanks everyone. It's incredible how long you can stare at a relatively simple error without noticing it. Sometimes there's no better remedy than having someone else glance at it.
This:
while(*s++)
;
due to post-increment, locates the nul byte at the end of the string, then increments s once more before exiting the loop. t is copied after the nul:
scontents␀tcontents␀
Printing s will stop at the first nul.
This:
while(*s)
s++;
breaks from the loop when the 0 is found, so you are left pointing at the nul byte. t is copied over the nul:
scontentstcontents␀
It's an off-by-one issue. Your second version increments the pointer every time the test is evaluated. The original increments one fewer time -- the last time when the test evaluates to 0, the increment isn't done. Therefore in the second version, the new string is appended after the original terminating \0, while in the first version, the first character of the new string overwrites that \0.
This:
while (*s)
s++;
stops as soon as *s is '\0', at which point it leaves s there (because it doesn't execute the body of the loop).
This:
while (*s++)
;
stops as soon as *s is '\0', but still executes the postincrement ++, so s ends up pointing right after the '\0'. So the string-terminating '\0' never gets overwritten, and it still terminates the string.
There's one less operation in while (*s) ++s; When *s is zero, then the loop breaks, while the form while (*s++) breaks but still increments s one last time.
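Strictly speaking, the latter form leaves the pointer one past the terminator, which may be one past the end of the object. Such a pointer may be formed, but dereferencing it or writing through it is undefined behavior. This is contrived, of course, but here's an example: char x = 0, * p = &x; while (*p++) { } leaves p pointing one past x.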
Independent of that, it's best to write clean, readable and deliberate code rather than trying to outsmart yourself. Sometimes you can write nifty code in C that is actually elegant, and other times it's better to spell something out properly. Use your judgement, and ask someone else for feedback (or watch their faces as they look at your code).
let's assume the following characters in memory:
Address 0x00 0x01 0x02 0x03
------- ---- ---- ---- ----
0x8000 'a' 'b' 'c' 0
0x8004 ...
While the loop executes, this is what happens in memory:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0
8. s = 0x8004
9. end loop
While evaluating, *s++ advances the pointer even if the value of *s is 0.
// move s forward until it points one past a 0 character
while (*s++);
It doesn't work as intended because s ends up pointing one past the terminator instead of at it.
In summary, the appended characters land after the original '\0', so they never become part of the target string. This happens because the while loop oversteps the '\0' by one position.
You can avoid it by using the code below, which stops on the terminator:
while (*s)
s++;
It executes as follows in memory:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0
8. end loop
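The difference between the two loops can be observed directly by inspecting the bytes of the destination buffer afterwards. The sketch below repeats both versions from the question under hypothetical names (strcats_ok and strcats_off_by_one) and takes t as const char *, which is a small change from the original signature. The destination must be large enough for both strings plus the extra byte the second version uses.

```c
#include <assert.h>
#include <stddef.h>

/* Working version: stops ON the terminator, then copies over it. */
void strcats_ok(char *s, const char *t) {
    assert(s != NULL && t != NULL);
    while (*s)                 /* body is skipped when *s is 0, so s stays on the '\0' */
        s++;
    while ((*s++ = *t++))
        ;
}

/* Non-working version: the post-increment also runs when *s is 0,
   so s ends up one PAST the terminator before the copy starts. */
void strcats_off_by_one(char *s, const char *t) {
    assert(s != NULL && t != NULL);
    while (*s++)
        ;
    while ((*s++ = *t++))
        ;
}
```

With char a[8] = "ab"; the first function turns the buffer into "abcd" when given "cd". With the second one the buffer holds 'a' 'b' '\0' 'c' 'd' '\0', so printing it still shows only "ab".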
