Pointer De-referencing

#include<stdlib.h>
#include<stdio.h>
#define NO_OF_CHARS 256
/* Returns an array of size 256 containing the count
of each character in the passed char array */
int *getCharCountArray(char *str)
{
int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
int i;
for (i = 0; *(str+i); i++)
count[*(str+i)]++;
return count;
}
/* The function returns index of first non-repeating
character in a string. If all characters are repeating
then returns -1 */
int firstNonRepeating(char *str)
{
int *count = getCharCountArray(str);
int index = -1, i;
for (i = 0; *(str+i); i++)
{
if (count[*(str+i)] == 1)
{
index = i;
break;
}
}
free(count); // To avoid memory leak
return index;
}
/* Driver program to test above function */
int main()
{
char str[] = "geeksforgeeks";
int index = firstNonRepeating(str);
if (index == -1)
printf("Either all characters are repeating or string is empty");
else
printf("First non-repeating character is %c", str[index]);
getchar();
return 0;
}
I really can't grasp the following lines:
count[*(str+i)]++;
and
int *getCharCountArray(char *str)
{
int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
int i;
for (i = 0; *(str+i); i++)
count[*(str+i)]++;
return count;
}
The program is used to find the first Non-Repeating character in the string.

*(str+i) is the same as str[i]. The line:
for (i = 0; *(str+i); i++)
is the same as:
for (i = 0; str[i]; i++)
The statements in the loop will be executed as long as str[i] evaluates to non-zero. Since C strings are arrays of characters that are terminated by a null character, the for loop will be executed for each character in str. It will stop when the end of the string is reached.
count[*(str+i)]++;
is the same as:
count[str[i]]++;
If str[i] is 'a', this line will increment the value of count['a'], which is count[97] in ASCII encoding.
At the end of the loop, count will be filled with integers that represent the number of times a particular character appears in str.

I really can't grasp the following lines:
count[*(str+i)]++;
Work from the outside in:
since str is a pointer to char and i is an int, str + i is a pointer to the char that is i chars after the one str itself points to
*(str+i) dereferences pointer str+i, meaning it evaluates to the char the pointer points to. This is exactly equivalent to str[i].
count[*(str+i)] uses the char at index i in string str as an index into dynamic array count. The expression designates the int at that index (since count points to an array of ints). See also below.
count[*(str+i)]++ evaluates to the int at index *(str+i) in the array count points to. As a side effect, it increments that array element by one after the value of the expression is determined. This overall expression is present in your code exclusively for its side effect.
It is important to note that although space is reserved in array count for counting appearances of 256 distinct char values, the expression you asked about is not a safe way to count all of them. That's because type char can be implemented as a signed type (at the C implementer's discretion), and it is common for it to be implemented that way. In that case, only the non-negative char values correspond to array elements, and undefined behavior will result if the input string contains others. Safer would be:
#include <stdint.h>
/* ... */
count[(uint8_t) *(str+i)]++;
i.e. the same as the original, except for explicitly casting each character of the input string to an unsigned 8-bit value.
Overall, the function simply creates an array of 256 ints, one for each possible char value, and scans the string to count the number of occurrences of each char value that appears in it. It then returns this array of occurrence counts.

This code is equivalent to the confusing loop you posted. Does it help?
*(str + i) is a confusing way of expressing str[i] and, IMO, inappropriate here.
for (i = 0; str[i] != '\0'; ++i)
{
char curr_char = str[i];
++count[curr_char];
}

In a for loop there are three things we need to consider:
1) Initialization of the counter variable (i in your example). 2) Condition (*(str+i)). 3) Increment/decrement part (i++).
The for loop keeps executing as long as the condition is true (i.e. any non-zero value), so *(str+i) provides a non-zero value until the end of the string is reached.
count[*(str+i)]++; // steps through the string character by character and increments the count entry for each character it meets

count[*(str+i)]++ is the same as count[*(str+i)]=count[*(str+i)]+1
Now consider one scenario:
char str[] = "aaab";
*(str+i), i.e. str[i], yields a char such as 'a' or 'b'.
So for the first three characters
count[*(str+i)]++ means count['a']++, that is:
count['a']=count['a']+1 // count of 'a' becomes 1
count['a']=count['a']+1 // count of 'a' becomes 2
count['a']=count['a']+1 // count of 'a' becomes 3
and likewise for the other characters.
So count[*(str+i)]++ updates the number of occurrences of each character in count.

Related

How to count the number of distinct characters in common between two strings?

How can a program count the number of distinct characters in common between two strings?
For example, if s1="connect" and s2="rectangle", the count is being displayed as 5 but the correct answer is 4; repeating characters must be counted only once.
How can I modify this code so that the count is correct?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i,j,count=0;
char s1[100],s2[100];
scanf("%s",s1);//string 1 is inputted
scanf("%s",s2);//string 2 is taken as input
for(i=1;i<strlen(s1);i++)
{
for(j=1;j<strlen(s2);j++)
{
if(s1[i]==s2[j])//compare each char of both the strings to find common letters
{
count++;//count the common letters
break;
}
}
}
printf("%d",count);//display the count
}
The program is to take two strings as input and display the count of the common characters in those strings. Please let me know what's the problem with this code.

If repeating characters must be ignored, the program must 'remember' the characters which were already encountered. You could do this by storing the characters which were processed into a character array and then consulting this array while processing the other characters.
You could use a counter variable to keep track of the number of common characters like
int ctr=0;
char s1[100]="connect", s2[100]="rectangle", t[100]="";
Here, t is the character array where the examined characters will be stored. Its size is the same as that of the larger of the other two character arrays.
Now use a loop like
for(int i=0; s1[i]; ++i)
{
if(strchr(t, s1[i])==NULL && strchr(s2, s1[i])!=NULL)
{
t[ctr++]=s1[i];
t[ctr]=0;
}
}
t initially has an empty string. Characters which were previously absent in t are added to it via the body of the loop which will be executed only if the character being examined (ie, s1[i]) is not in t but is present in the other string (ie, s2).
strchr() is a function with a prototype
char *strchr( const char *str, int c );
strchr() finds the first occurrence of c in the string pointed to by str. It returns NULL if c is not present in str.
Your usage of scanf() may cause trouble.
Use
scanf("%99s",s1);
(where 99 is one less than the size of the array s1) instead of
scanf("%s",s1);
to prevent overflow problems. And check the return value of scanf() and see if it's 1. scanf() returns the number of successful assignments that it made.
Or use fgets() to read the string.
Read this post to see more about this.
And note that array indexing starts from 0. So in your loops, the first character of each string is not checked.
So it should've been something like
for(i=0;i<strlen(s1);i++)
instead of
for(i=1;i<strlen(s1);i++)

Here's a solution that avoids quadratic O(N²) or cubic O(N³) time algorithms — it is linear time, requiring one access to each character in each of the input strings. The code uses a pair of constant strings rather than demanding user input; an alternative might take two arguments from the command line and compare those.
#include <limits.h>
#include <stdio.h>
int main(void)
{
int count = 0;
char bytes[UCHAR_MAX + 1] = { 0 };
char s1[100] = "connect";
char s2[100] = "rectangle";
for (int i = 0; s1[i] != '\0'; i++)
bytes[(unsigned char)s1[i]] = 1;
for (int j = 0; s2[j] != '\0'; j++)
{
int k = (unsigned char)s2[j];
if (bytes[k] == 1)
{
bytes[k] = 0;
count++;
}
}
printf("%d\n",count);
return 0;
}
The first loop records which characters are present in s1 by setting an appropriate element of the bytes array to 1. It doesn't matter whether there are repeated characters in the string.
The second loop detects when a character in s2 was in s1 and has not been seen before in s2, and then both increments count and marks the character as 'no longer relevant' by setting the entry in bytes back to 0.
At the end, it prints the count — 4 (with a newline at the end).
The use of (unsigned char) casts is necessary in case the plain char type on the platform is a signed type and any of the bytes in the input strings are in the range 0x80..0xFF (equivalent to -128..-1 if the char type is signed). Using negative subscripts would not lead to happiness. The code does also assume that you're working with a single-byte code set, not a multi-byte code set (such as UTF-8). Counts will be off if you are dealing with multi-byte characters.
The code in the question is at minimum a quadratic algorithm because for each character in s1, it could step through all the characters in s2 only to find that it doesn't occur. That alone requires O(N²) time. Both loops also use a condition based on strlen(s1) or strlen(s2), and if the optimizer does not recognize that the value returned is the same each time, then the code could scan each string on each iteration of each loop.
Similarly, the code in the other two answers as I type (Answer 1 and Answer 2) is also quadratic or worse because of its loop structures.
At the scale of 100 characters in each string, you probably won't readily spot the difference, especially not in a single iteration of the counting. If the strings were bigger — thousands or millions of bytes — and the counts were performed repeatedly, then the difference between the linear and quadratic (or worse) algorithms would be much bigger and more easily detected.
I've also played marginally fast'n'loose with the Big-O notation. I'm assuming that N is the size of the strings, and they're sufficiently similar in size that treating N₁ (the length of s1) as approximately equal to N₂ (the length of s2) isn't going to be a major problem. The 'quadratic' algorithms might be more formally expressed as O(N₁•N₂) whereas the linear algorithm is O(N₁+N₂).

Based on what you expect as output, you should keep track of which chars you have already used from the second string. You can achieve this as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i, j, count = 0, skeep;
char s1[100], s2[100], s2Used[100] = {0};
scanf("%s", s1); //string 1 is inputted
scanf("%s", s2); //string 2 is taken as input
for (i = 0; i<strlen(s1); i++)
{
skeep = 0;
for (j = 0; j < i; j++)
{
if (s1[j] == s1[i])
{
skeep = 1;
break;
}
}
if (skeep)
continue;
for (j = 0; j<strlen(s2); j++)
{
if (s1[i] == s2[j] && s2Used[j] == 0) //compare each char of both the strings to find common letters
{
//printf("%c\n", s1[i]);
s2Used[j] = 1;
count++;//count the common letters
break;
}
}
}
printf("%d", count);//display the count
}

Explanation with arrays and strings

I have found a code that shows the frequency of a character in a string. Specifically,
#include <stdio.h>
int main(){
char string[100];
int i, frequency[256] = {0};
printf("Enter a String\n");
gets(string);
for(i=0; string[i]!=0; i++){
frequency[string[i]]=frequency[string[i]]+1;
}
printf("\nCharacter Frequency\n");
for(i=0; i < 256; i++){
if(frequency[i] != 0){
printf("%5c%10d\n", i, frequency[i]);
}
}
return 0;
}
However, I do not understand this:
frequency[string[i]]=frequency[string[i]]+1;
What does it do? How does it behave? I believe that string[i] is the length of frequency? But I am not sure.

Here, the value of string[i] serves as the index for the array frequency.
By saying
frequency[string[i]]=frequency[string[i]]+1;
you're trying to increment the value of the frequency[string[i]] element by 1.
This can also be re-written as
frequency[string[i]]++;
Having said that,
Never use gets(), it seriously suffers from buffer overflow issues. Use fgets() instead.
int main() should be int main(void) at least to conform to the standard.
It is a good practice to always initialize your local variables, like char string[100] = {0};
Link to the ASCII table, for your reference.

char values can be used as array index. string[i] is a char and it is being used as array index in the statement
frequency[string[i]]=frequency[string[i]]+1;
So, if string[i] = 'c' and its occurrence in the string for the ith iteration is 1, then the above expression will increment the frequency of character 'c', i.e. frequency[string[i]], by 1.
In this case frequency[string[i]] is equivalent to frequency['c'], which is in turn equivalent to frequency[99], where 99 is the ASCII code of character 'c'.

Concatenating (adding) two character strings

I was told to write a program containing a concatenate function. This program should collect the input strings using fgets (&s1[0], len1+1, stdin)
and then add the two to each other to produce a final product.
My problem falls in that the program compiles but it doesn't display anything on the screen whatsoever, here's what I've got. I couldn't see how I could get it solved without this method of approach.
//function to terminate the program incase reach of 0
int str_len (char s[])
{
int i=0;
while (s[i]= NULL)
++i;
return i+1;
}
char string_cat (char*s1, char*s2)
{
//ADDING THE TWO STRINGS
int str_len(char s[])
char *s1 [80]= {'\0'};
char *s2 [40]= {'\0'};
int len1=str_len(s1);
int len2=str_len(s2);
if (int x=0; len1+len2<80; \0;
return;
}
int main ()
{
char string_cat(char*s1,char*s2)
int str_len(char s[])
//RECIVING THE STRINGS TO ADD
char s1 [80];
char s2 [40];
int i=0;
for (i; i !=0; ++i)
{
printf("What is the first sentence?: ")
fgets(*s1[0], 75+1, stdin);
printf("What is the second sentence?:")
fgets(*s2[0],35+1,stdin);
string_cat(*s1,*s2);
printf("The two sentences added together produce the following: %c",s1 )
}
++i
return 0;
}

Aside from the mistake with the for loop that others have pointed out, the while loop in your str_len function is wrong.
You should've used while (s[i] != '\0') instead of while (s[i] = NULL). One equals sign, "=", is assignment; two equals signs, "==", is comparison; and exclamation-equals, "!=", means not equal. NULL is meant for pointers; to detect the end of a string, compare against the null character '\0'.
Secondly, in your string_cat function you redeclare s1 and s2 as new arrays whose first element is '\0'. With str_len corrected as described above, this will always give a length of 0. Left uncorrected (and if the compiler accepts it at all), the assignment s[i] = NULL evaluates to 0, so the loop body never runs and the function always returns 1.
Thirdly (still in the string_cat function), your if (int x=0; len1+len2<80; \0; doesn't make sense. You're not doing any concatenation in this function at all.
Sorry for not providing you with the solution, as this is a simple exercise. I feel I would be spoiling you if I were to provide you with the code.

First problem is here
int i=0;
for (i; i !=0; ++i)
You set the variable i to 0, and then you check that it does not equal 0. This check obviously does not pass, because i equals 0.
The second problem is also the loop. I can't really see why you need the loop at all, because i is not used for anything except the increment. So as far as I can tell, the loop is not needed at all.

Your code has a lot of compilation errors. Please copy and paste the code you actually compiled.
Check this line of code
int i=0;
for (i; i !=0; ++i)
Because of this you are not getting anything. In the for loop you have the condition i != 0, which always fails, so execution never enters the loop.

C—Infinite loop, I think?

I'm having an issue with a program in C and I think that a for loop is the culprit, but I'm not certain. The function is meant to take a char[] that has already been passed through a reverse function, and write it into a different char[] with all trailing white space characters removed. That is to say, any ' ' or '\t' characters that lie between a '\n' and any other character shouldn't be part of the output.
It works perfectly if there are no trailing white space characters, as in re-writing an exact duplicate of the input char[]. However, if there are any, there is no output at all.
The program is as follows:
#include<stdio.h>
#define MAXLINE 1000
void trim(char output[], char input[], int len);
void reverse(char output[], char input[], int len);
main()
{
int i, c;
int len;
char block[MAXLINE];
char blockrev[MAXLINE];
char blockout[MAXLINE];
char blockprint[MAXLINE];
i = 0;
while ((c = getchar()) != EOF)
{
block[i] = c;
++i;
}
printf("%s", block); // for debugging purposes
reverse(blockrev, block, i); // reverses block for trim function
trim(blockout, blockrev, i);
reverse(blockprint, blockout, i); // reverses it back to normal
// i also have a sneaking suspicion that i don't need this many arrays?
printf("%s", blockprint);
}
void trim(char output[], char input[], int len)
{
int i, j;
i = 0;
j = 0;
while (i <= len)
{
if (input[i] == ' ' || input[i] == '\t')
{
if (i > 0 && input[i-1] == '\n')
for (; input[i] == ' ' || input[i] == '\t'; ++i)
{
}
else
{
output[j] = input[i];
++i;
++j;
}
}
else
{
output[j] = input[i];
++i;
++j;
}
}
}
void reverse(char output[], char input[], int len)
{
int i;
for (i = 0; len - i >= 0; ++i)
{
output[i] = input[len - i];
}
}
I should note that this is a class assignment that doesn't allow the use of string functions, hence why it's so roundabout.

Change
for (i; input[i] == ' ' || input[i] == '\t'; ++i);
to
for (; i <= len && (input[i] == ' ' || input[i] == '\t'); ++i);
With the first method, if the whitespace is at the end, the loop will iterate indefinitely. Not sure how you didn't get an out of bounds access exception, but that's C/C++ for you.
Edit As Arkku brought up in the comments, make sure your character array is still NUL-terminated (the \0 character), and you can check on that case instead. Make sure you're not trimming the NUL character from the end either.

Declaring your main() function simply as main() is an obsolete style that should not be used. The function must be declared either as int main(void) or as int main(int argc, char *argv[]).
Your input process does not null-terminate your input. This means that what you're working with is not a "string", because a C string, by definition, is an array of char whose last element is a null character ('\0'). Instead, what you've got are simple arrays of char. This wouldn't be a problem as long as you're expecting that, and indeed your code is passing array lengths about, but you're also trying to print it with printf(), which requires C strings, not simple arrays of char.
Your reverse() function has an off-by-one error, because you aren't accounting for the fact that C arrays are zero-indexed, so what you're reversing is always one byte longer than your actual input.
What this means is that if you call reverse(output, input, 10), your code will start by assigning the value at input[10] to output[0], but input[10] is one past the end of your data, and since you didn't initialize your arrays before starting to fill them, that's an indeterminate value. In my testing, that indeterminate value happens, coincidentally, to have zero values much of the time, which means that output[0] gets filled with a null ('\0').
You need to be subtracting one more from the index into the input than you actually are. The loop-termination condition in the reverse() function is also wrong, to compensate for this: that condition should be len - i > 0, not len - i >= 0.
Your trim() function is unnecessarily complex. Additionally, it too has an incorrect loop condition to compensate for the off-by-one error in reverse(). The loop should be while ( i < len ), not while ( i <= len ).
Additionally, the trim() function has the ability to reduce the size of your data, but you don't provide a way to retain that information. (I see in the comments of Arkku's answer that you've corrected for this already. Good.)
Once you've fixed the issue with not keeping track of your data's size changes, and the off-by-one error which is copying indeterminate data (which happens, coincidentally, to be a null) from the end of the blockout array to the beginning of the blockprint array when you do the second reverse(), and you fix the incorrect <= condition in trim() and the incorrect >= condition in reverse(), and you null-terminate your byte array before passing it to printf(), your program will work.

(Moved from comments to an answer)
My guess is that the problem is outside this function, and is caused by the fact that in the described problem cases the output is shorter than the input. Since you are passing the length of the string as an argument, you need to calculate the length of the string after trim, because it may have changed...
For instance, passing an incorrect length to reverse can cause the terminating NUL character (and possibly some leftover whitespace) to end up at the beginning of the string, thus making the output appear empty.
edit: After seeing the edited question with the code of reverse included, in addition to the above problem, your reverse puts the terminating NUL as the first character of the reversed string, which causes it to be the empty string (in some cases your second reverse puts it back, so you don't see it without printing the output of the first reverse). Note that input[len] contains the '\0', not the last character of the string itself.
edit 2: Furthermore, you are not actually terminating the string in block before using it. It may be the case that the uninitialised array often happens to contain zeroes that serve to terminate the string, but for the program to be correct you absolutely need to terminate it with block[i] = '\0'; immediately after the input loop. Similarly you need to ensure NUL-termination of the outputs of reverse and trim (in case of trim it seems to me that this already happens as a side-effect of having the loop condition i <= len instead of i < len, but it's not a sign of good code that it's hard to tell).

memcmp with arrays of arrays

In C, I want to check a given array of chars for an arbitrary letter, and change it according to what it is. For example, the characters "a" or "A" would be changed to "4" (the character representing 4). This is a coding exercise for me :)
The code is as follows:
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <zlib.h>
#define NUM_BUFFERS 8
#define BUFFER_LENGTH 1024
char buffArrays[NUM_BUFFERS][BUFFER_LENGTH];
int main(int argc, const char* argv[])
{
const char a[] = "a";
gzFile file;
file = gzopen("a.txt", "rb"); //contains 8 lines of 1024 'a's
int counter = 0;
while(counter < NUM_BUFFERS)
{
gzread(file, buffArrays[counter], BUFFER_LENGTH - 1);
counter++;
}
counter = 0;
while(counter < NUM_BUFFERS)
{
int i = 0;
for( i; i < BUFFER_LENGTH; i++ )
{
int *changed = &buffArrays[counter][i];
if( memcmp(&a, changed, 1) == 0 )
printf("SUCCESS\n");
}
counter++;
}
gzclose(file);
return 0;
}
This code never reaches the "SUCCESS" part. This says to me that either
(1) the value of changed is not pointing to the correct thing
(2) the pointer &a is incorrect
(3) I am completely wrong and it is something else
Any help would be appreciated.

Two things.
The following assigns the value 0x61 or 'a' to the character string.
const char a[] = 'a';
You probably rather meant to write
const char a = 'a'; /* assign a character to a character */
or
const char a[] = "a"; /* assign a string to a string */
The next problem is the following statement. Here you assign the address of a char to a pointer to int, which invokes undefined behavior because the next statement reads beyond the bounds of your valid memory.
int *changed = &bufferArrays[counter][i];
Here you compare the first four bytes starting from both addresses, but both variables are only one byte wide.
if( memcmp(&a, changed, 4) == 0 )
If you only want to know whether there is an 'a' in any of your buffers, why don't you just do this:
int i, j;
for (i = 0; i < NUM_BUFFERS; i++)
for (j = 0; j < BUFFER_LENGTH; j++)
if (bufferArrays[i][j] == 'a') printf("got it!\n");

This:
bufferArrays[counter] = "a"; //all the buffers contain one "a"
is wrong, since bufferArrays[counter] is not a character pointer but a character array. You need:
strcpy(bufferArrays[counter], "a");
Also, you don't show readTOmodify, so that part is a bit hard to understand.
Further, strings are best compared with strcmp(), which compares character-by-character and stops at the terminating '\0'. You use memcmp(), and I don't understand the reason for the 4 which is the number of bytes you're comparing.

1) bufferArrays[counter] = "a"; //all the buffers contain one "a"
This is not ok, you have to use strcpy to copy strings:
strcpy(bufferArrays[counter],"a"); //all the buffers contain one "a"
2)
#define BUFFER_LENGTH 1
Here's a problem. Buffer length should be at least 2 if you want to store just one char (for the extra null-termination).
3) In both of your loops, you never change counter, which leads to infinite loop.
Where's your code? I don't see any function surrounding it.
EDIT:
To assign you can also use:
while(counter < NUM_BUFFERS)
{
bufferArrays[counter][0] = 'a'; //all the buffers contain one "a"
counter++;
}
In any case, you have to have Buffer length as 2 if you want use it as a C-string.

The statement
bufferArrays[counter] = "a";
is not legal. It assigns a pointer to a single char and should give a compiler error (or at least a warning). Instead try
bufferArrays[counter] = 'a';
Also, in the while loops (both of them) you do not increase counter and so loop over the same index over and over forever.
Edit: Further problems
The condition where you do the comparison is flawed as well:
memcmp(&a, changed, 4)
The above doesn't compare pointers, it compares the contents of what the pointers point to, and you compare four bytes while the contents is only a single byte. Besides, you can't compare the pointers, as they will be different: the contents of the variable a are stored at a different location than the contents of bufferArrays[counter][i].