memcmp with arrays of arrays - c

In C, I want to check a given array of chars for an arbitrary letter, and change it according to what it is. For example, the characters "a" or "A" would be changed to "4" (the character representing 4). This is a coding exercise for me :)
The code is as follows:
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <zlib.h>
#define NUM_BUFFERS 8
#define BUFFER_LENGTH 1024
char buffArrays[NUM_BUFFERS][BUFFER_LENGTH];
int main(int argc, const char* argv[])
{
const char a[] = "a";
gzFile file;
file = gzopen("a.txt", "rb"); //contains 8 lines of 1024 'a's
int counter = 0;
while(counter < NUM_BUFFERS)
{
gzread(file, buffArrays[counter], BUFFER_LENGTH - 1);
counter++;
}
counter = 0;
while(counter < NUM_BUFFERS)
{
int i = 0;
for( i; i < BUFFER_LENGTH; i++ )
{
int *changed = &buffArrays[counter][i];
if( memcmp(&a, changed, 1) == 0 )
printf("SUCCESS\n");
}
counter++;
}
gzclose(file);
return 0;
}
This code never reaches the "SUCCESS" part. This says to me that either
(1) the value of changed is not pointing to the correct thing
(2) the pointer &a is incorrect
(3) I am completely wrong and it is something else
Any help would be appreciated.

Two things.
The following assigns the value 0x61 or 'a' to the character string.
const char a[] = 'a';
You probably rather meant to write
const char a = 'a'; /* assign a character to a character */
or
const char a[] = "a"; /* assign a string to a string */
The next problem is the following statement. It stores the address of a char in a pointer to int. The types are incompatible, and the comparison in the next statement then reads beyond the bounds of your valid memory, which invokes undefined behavior.
int *changed = &bufferArrays[counter][i];
Here you compare the first four bytes starting at both addresses, although both objects are only one byte wide.
if( memcmp(&a, changed, 4) == 0 )
If you only want to know whether there is an 'a' somewhere in your buffers, why not simply do this:
int i, j;
for (i = 0; i < NUM_BUFFERS; i++)
for (j = 0; j < BUFFER_LENGTH; j++)
if (bufferArrays[i][j] == 'a') printf("got it!\n");

This:
bufferArrays[counter] = "a"; //all the buffers contain one "a"
is wrong, since bufferArrays[counter] is not a character pointer but a character array. You need:
strcpy(bufferArrays[counter], "a");
Also, you don't show readTOmodify, so that part is a bit hard to understand.
Further, strings are best compared with strcmp(), which compares character-by-character and stops at the terminating '\0'. You use memcmp(), and I don't understand the reason for the 4, which is the number of bytes you're comparing.

1) bufferArrays[counter] = "a"; //all the buffers contain one "a"
This is not ok, you have to use strcpy to copy strings:
strcpy(bufferArrays[counter],"a"); //all the buffers contain one "a"
2)
#define BUFFER_LENGTH 1
Here's a problem. Buffer length should be at least 2 if you want to store just one char (for the extra null-termination).
3) In both of your loops, you never change counter, which leads to infinite loop.
Where's your code? I don't see any function surrounding it.
EDIT:
To assign you can also use:
while(counter < NUM_BUFFERS)
{
bufferArrays[counter][0] = 'a'; //all the buffers contain one "a"
counter++;
}
In any case, the buffer length has to be at least 2 if you want to use it as a C-string.

The statement
bufferArrays[counter] = "a";
is not legal. bufferArrays[counter] is an array, and arrays cannot be assigned to, so this should give a compiler error (or at least a warning). To store a single char in the first element, try instead
bufferArrays[counter][0] = 'a';
Also, in the while loops (both of them) you do not increase counter and so loop over the same index over and over forever.
Edit: Further problems
The condition where you do the comparison is flawed as well:
memcmp(&a, changed, 4)
The above doesn't compare pointers; it compares the contents of what the pointers point to, and you compare four bytes while the contents are only a single byte. Besides, you can't compare the pointers, as they will be different; the contents of the variable a are stored at a different location than the contents of bufferArrays[counter][i].

Related

For loop is running too often

I have a for loop which should run 4 times but is running 6 times.
Could you please explain the behaviour?
This is strange because stringarr1 is not changed.
Edit: I want to remove all '!' from my first string and want to save the letters in a second string.
#include <stdio.h>
#include <math.h>
#include <string.h>
int main(){
char stringarr1[] = "a!bc";
char stringarr2[] = "";
printf("%d\n", strlen(stringarr1)); // length --> 4
for (size_t i = 0; i < strlen(stringarr1); i++)
{
printf("i: %d\n", i);
if (stringarr1[i] != '!') {
stringarr2[strlen(stringarr2)] = stringarr1[i];
printf("info: != '!'\n");
}
}
}
You are overrunning the buffer for stringarr2 (length 1), which is in this case corrupting the memory-adjacent stringarr1, causing the string length to change by overwriting its nul terminator.
Then because you are reevaluating the string length on each iteration, the loop will run for a non-deterministic number of iterations - in your case just 6, but it could be worse; the behaviour you have observed is just one of several possibilities - it is undefined.
Apart from correcting the buffer length for stringarr2, it is best practice to evaluate loop-invariants once (although in this case the string length is not invariant due to a bug). So the following:
const size_t length = strlen( stringarr1 ) ;
for( size_t i = 0; i < length; i++ )
{
...
will run for 4 iterations regardless of the buffer overrun bug because the length is not reevaluated following the corruption. Re-evaluating loop-invariants can lead to very slow code execution.
Your code can run any number of times. You write beyond the end of stringarr2 so you may be smashing the stack and overwriting local variables. What you meant to do is probably something like this:
#include <stdio.h>
#include <string.h>
int main(void){
char stringarr1[] = "a!bc";
char stringarr2[10];
size_t len = strlen(stringarr1);
size_t j = 0; // next free position in stringarr2
printf("%zu\n", len); // length --> 4
for (size_t i = 0; i < len; i++)
{
printf("i: %zu\n", i);
if (stringarr1[i] != '!') {
stringarr2[j++] = stringarr1[i]; // append, then advance
printf("info: != '!'\n");
}
}
stringarr2[j] = '\0'; // terminate the result so it is a valid string
}
Like others said, it is not really clear what you are trying to accomplish here. But in C, a declaration like char s[] = "string" only allocates enough memory to store whatever is on the right hand side of the assignment. If that is an empty string like in your case, only a single byte is allocated, to store the end of string 'null' character. You need to either explicitly specify, like I did, the number of bytes to allocate as the array size, or use dynamic memory allocation.
The problem is that you're writing past the end of stringarr2. This triggers undefined behaviour.
To fix this, you need to allocate sufficient memory for stringarr2.
First, we must allocate the string to be long enough.
char stringarr1[] = "a!bc";
//save this in a variable beforehand because strlen loops over the string every time it is called
size_t len = strlen(stringarr1);
char stringarr2[1024] = { 0 };
{ 0 } initializes all characters in the string to 0, which means the last one will always be a null terminator after we add characters. This tells C string functions where the string ends.
Now we can put stuff in there. It seems like you're trying to append, so keep a separate iterator for the 2nd string. This is more efficient than calling strlen every loop.
for(size_t i = 0, j = 0; i < len; i++){
printf("i: %zu\n", i);
if (stringarr1[i] != '!') {
stringarr2[j++] = stringarr1[i];
printf("info: != '!'\n");
}
}

Mystery of the mysterious P

Background:
I'm trying to create a program that takes a user name(assuming that input is clean), and prints out the initials of the name.
Objective:
Trying my hand out at C programming with CS50
Getting myself familiar with malloc & realloc
Code:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
string prompt(void);
char *getInitials(string input);
char *appendArray(char *output,char c,int count);
//Tracks # of initials
int counter = 0;
int main(void){
string input = prompt();
char *output = getInitials(input);
for(int i = 0; i < counter ; i++){
printf("%c",toupper(output[i]));
}
}
string prompt(void){
string input;
do{
printf("Please enter your name: ");
input = get_string();
}while(input == NULL);
return input;
}
char *getInitials(string input){
bool initials = true;
char *output;
output = malloc(sizeof(char) * counter);
for(int i = 0, n = strlen(input); i < n ; i++){
//32 -> ASCII code for spacebar
//9 -> ASCII code for tab
if(input[i] == 32 || input[i] == 9 ){
//Next char after spaces/tab will be initial
initials = true;
}else{//Not space/tab
if(initials == true){
counter++;
output = appendArray(output,input[i],counter);
initials = false;
}
}
// eprintf("Input[i] is : %c\n",input[i]);
// eprintf("Counter is : %i\n",counter);
// eprintf("i is : %i\n",i);
// eprintf("n is : %i\n",n);
}
return output;
}
char *appendArray(char *output,char c,int count){
// allocate an array of some initial (fairly small) size;
// read into this array, keeping track of how many elements you've read;
// once the array is full, reallocate it, doubling the size and preserving (i.e. copying) the contents;
// repeat until done.
//pointer to memory
char *data = malloc(0);
//Increase array size by 1
data = realloc(output,sizeof(char) * count);
//append the latest initial
strcat(data,&c);
printf("Value of c is :%c\n",c);
printf("Value of &c is :%s\n",&c);
for(int i = 0; i< count ; i++){
printf("Output: %c\n",data[i]);
}
return data;
}
Problem:
The output is not what I expected, as there is a mysterious P appearing in the output.
E.g. when I enter the name Barack Obama, instead of getting the result BO, I get the result BP, and the same happens for whatever name I choose to enter, with the last initial always being P.
Output:
Please enter your name: Barack Obama
Value of c is :B
Value of &c is :BP
Output: B
Value of c is :O
Value of &c is :OP
Output: B
Output: P
BP
What i've done:
I've traced the problem to the appendArray function, and more specifically to the value of &c (address of c), though I have no idea what's causing the P to appear, what it means, why it appears and how I can get rid of it.
The P shows up no matter what I input.
Insights as to why it's happening and what i can do to solve it will be much appreciated.
Thanks!
Several issues, in decreasing order of importance...
First issue - c in appendArray is not a string - it is not a sequence of character values terminated by a 0. c is a single char object, storing a single char value.
When you try to print c as a string, as in
printf("Value of &c is :%s\n",&c);
printf writes out the sequence of character values starting at the address of c until it sees a 0-valued byte. For whatever reason, the byte immediately following c contains the value 80, which is the ASCII (or UTF-8) code for the character 'P'. The next byte contains a 0 (or there's a sequence of bytes containing non-printable characters, followed by a 0-valued byte).
Similarly, using &c as the argument to strcat is inappropriate, since c is not a string. Instead, you should do something like
data[count-1] = c;
Secondly, if you want to treat the data array as a string, you must make sure to size it at least 1 more than the number of initials and write a 0 to the final element:
data[count] = 0; // after all initials have been stored to data; needs count + 1 bytes
Third,
char *data = malloc(0);
serves no purpose, the behavior is implementation-defined, and you immediately overwrite the result of malloc(0) with a call to realloc:
data = realloc(output,sizeof(char) * count);
So, get rid of the malloc(0) call altogether; either just initialize data to NULL, or initialize it with the realloc call:
char *data = realloc( output, sizeof(char) * count );
Fourth, avoid using "magic numbers" - numeric constants with meaning beyond their immediate, literal value. When you want to compare against character values, use character constants. IOW, change
if(input[i] == 32 || input[i] == 9 ){
to
if ( input[i] == ' ' || input[i] == '\t' )
That way you don't have to worry about whether the character encoding is ASCII, UTF-8, EBCDIC, or some other system. ' ' means space everywhere, '\t' means tab everywhere.
Finally...
I know part of your motivation for this exercise is to get familiar with malloc and realloc, but I want to caution you about some things:
realloc is potentially an expensive operation, it may move data to a new location, and it may fail. You really don't want to realloc a buffer a byte at a time. Instead, it's better to realloc in chunks. A typical strategy is to multiply the current buffer size by some factor > 1 (typically doubling):
char *tmp = realloc( data, current_size * 2 );
if ( tmp )
{
current_size *= 2;
data = tmp;
}
You should always check the result of a malloc, calloc, or realloc call to make sure it succeeded before attempting to access that memory.
Minor stylistic notes:
Avoid global variables where you can. There's no reason counter should be global, especially since you pass it as an argument to appendArray. Declare it local to main and pass it as an argument (by reference) to getInitials:
int main( void )
{
int counter = 0;
...
char *output = getInitials( input, &counter );
for(int i = 0; i < counter ; i++)
{
printf("%c",toupper(output[i]));
}
...
}
/**
* The "string" typedef is an abomination that *will* lead you astray,
* and I want to have words with whoever created the CS50 header.
*
* They're trying to abstract away the concept of a "string" in C, but
* they've done it in such a way that the abstraction is "leaky" -
* in order to use and access the input object correctly, you *need to know*
* the representation behind the typedef, which in this case is `char *`.
*
* Secondly, not every `char *` object points to the beginning of a
* *string*.
*
* Hiding pointer types behind typedefs is almost always bad juju.
*/
char *getInitials( const char *input, int *counter )
{
...
(*counter)++; // parens are necessary here
output = appendArray(output,input[i],*counter); // need leading * here
...
}

Reversing a string on C

I'm new to coding and I'm working through K&R for C, but I have some simple questions that are tripping me up. I know it may be a very stupid question, but again, I'm new, and if you can explain it in a way that a noob would understand I would appreciate it.
I just want to store "4321" in srev[], but it doesn't print anything. I know there are other ways to reverse a string, but I would like to know why this one doesn't work. Thanks.
#include <stdio.h>
#define MAXL 1000
char s[MAXL] = "1234";
char srev[MAXL];
main(){
int i =0;
for(i=0; 4>=i; ++i){
srev[i] = s[4-i];
}
printf("srev[]: %s", srev);
}
To expand upon the comment by Dunno: the string "1234" in C is five bytes long. The fifth byte s[4] is a zero byte denoting string termination.
Your code copies that zero byte to srev[0], so now you have a C string that terminates before it has even begun.
Use i<4 in your for loop (and adjust the arithmetic to 3-i accordingly) so that you only copy the non-zero bytes. Then set srev[4] = '\0'; explicitly to terminate the new string in the correct place.
In your for loop the last thing you do is put s[4] into srev[0]. That element (the fifth, because arrays are zero-indexed) is the string's null terminator. That means that the first thing in srev tells printf to stop printing.
Change your loop to this:
for(i=0; 3>=i; ++i){
srev[i] = s[3-i];
}
or:
for(i=0; 4 > i; ++i){
srev[i] = s[3-i];
}
because s[4] == '\0', which marks the end of the character string. If you assign the null terminator to the first element of a string, you are telling it: "it's the end, accept no more characters":
#include <stdio.h>
#define MAXL 1000
char s[MAXL] = "1234";
char srev[MAXL];
main(){
int i = 0;
for(i=0; 4 > i; ++i){
srev[i] = s[3-i]; // 3 - 0 = 3, so we start at s[3] = '4'; s[4] = '\0' is never copied
}
printf("srev[]: %s", srev);
printf("\n\n");
}

Pointer De-referencing

#include<stdlib.h>
#include<stdio.h>
#define NO_OF_CHARS 256
/* Returns an array of size 256 containing count
of characters in the passed char array */
int *getCharCountArray(char *str)
{
int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
int i;
for (i = 0; *(str+i); i++)
count[*(str+i)]++;
return count;
}
/* The function returns index of first non-repeating
character in a string. If all characters are repeating
then returns -1 */
int firstNonRepeating(char *str)
{
int *count = getCharCountArray(str);
int index = -1, i;
for (i = 0; *(str+i); i++)
{
if (count[*(str+i)] == 1)
{
index = i;
break;
}
}
free(count); // To avoid memory leak
return index;
}
/* Driver program to test above function */
int main()
{
char str[] = "geeksforgeeks";
int index = firstNonRepeating(str);
if (index == -1)
printf("Either all characters are repeating or string is empty");
else
printf("First non-repeating character is %c", str[index]);
getchar();
return 0;
}
I really can't grasp the following lines:
count[*(str+i)]++;
and
int *getCharCountArray(char *str)
{
int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
int i;
for (i = 0; *(str+i); i++)
count[*(str+i)]++;
return count;
}
The program is used to find the first Non-Repeating character in the string.
*(str+i) is same as str[i]. The line:
for (i = 0; *(str+i); i++)
is the same as:
for (i = 0; str[i]; i++)
The statements in the loop will be executed as long as str[i] evaluates to non-zero. Since C strings are arrays of characters that are terminated by a null character, the for loop will be executed for each character in str. It will stop when the end of the string is reached.
count[*(str+i)]++;
is the same as:
count[str[i]]++;
If str[i] is 'a', this line will increment the value of count['a'], which is count[97] in ASCII encoding.
At the end of the loop, count will be filled with integers that represent the number of times a particular character appears in str.
I really can't grasp the following lines:
count[*(str+i)]++;
Work from the outside in:
since str is a pointer to char and i is an int, str + i is a pointer to the char that is i chars after the one str itself points to
*(str+i) dereferences pointer str+i, meaning it evaluates to the char the pointer points to. This is exactly equivalent to str[i].
count[*(str+i)] uses the char at index i in string str as an index into dynamic array count. The expression designates the int at that index (since count points to an array of ints). See also below.
count[*(str+i)]++ evaluates to the int at index *(str+i) in the array count points to. As a side effect, it increments that array element by one after the value of the expression is determined. This overall expression is present in your code exclusively for its side effect.
It is important to note that although space is reserved in array count for counting appearances of 256 distinct char values, the expression you asked about is not a safe way to count all of them. That's because type char can be implemented as a signed type (at the C implementer's discretion), and it is common for it to be implemented that way. In that case, only the non-negative char values correspond to array elements, and undefined behavior will result if the input string contains others. Safer would be:
#include <stdint.h>
/* ... */
count[(uint8_t) *(str+i)]++;
i.e. the same as the original, except for explicitly casting each character of the input string to an unsigned 8-bit value.
Overall, the function simply creates an array of 256 ints, one for each possible char value, and scans the string to count the number of occurrences of each char value that appears in it. It then returns this array of occurrence counts.
This code is equivalent to the confusing loop you posted. Does it help?
*(str + i) is a confusing way of expressing str[i] and IMO inappropriate here.
for (i = 0; str[i] != '\0'; ++i)
{
char curr_char = str[i];
++count[curr_char];
}
Explanation of the for loop
A for loop has three parts to consider:
1) Initialization of the counter variable (i in your example)
2) Condition (*(str+i))
3) Increment/decrement part (i++)
The loop keeps executing as long as the condition is true (i.e. any non-zero value), so *(str+i) keeps the loop running until the terminating null character of the string is reached.
count[*(str+i)]++; // counts occurrences: increments the counter that belongs to the current character
count[*(str+i)]++ is equivalent to count[*(str+i)] = count[*(str+i)] + 1
Now consider one scenario:
char str[] = "aaab";
*(str+i), i.e. str[i], yields the characters 'a', 'a', 'a', 'b' in turn.
So for the first three iterations count[*(str+i)]++ means count['a']++:
count['a'] = count['a'] + 1 // count['a'] becomes 1
count['a'] = count['a'] + 1 // count['a'] becomes 2
count['a'] = count['a'] + 1 // count['a'] becomes 3
and likewise for the other characters.
So count[*(str+i)]++ updates the number of occurrences of each character in count.

Cast an int array to string, then print with printf, without allocating new memory

I thought I had this solved, but apparently, I was incorrect. The question is... what did I miss?
Assignment description:
You are to create a C program which fills an integer array with integers and then you are to cast it as a string and print it out. The output of the string should be your first and last name with proper capitalization, spacing and punctuation. Your program should have structure similar to:
main()
{
int A[100];
char *S;
A[0]=XXXX;
A[1]=YYYY;
...
A[n]=0; -- because C strings are terminated with NULL
...
printf("My name is %s\n",S);
}
Response to my submission:
You still copied memory cells to other, which is not expected. You use different space for the integer array as the string which does not follow the requirements. Please follow the instructions carefully next time.
My submission
Note that the first time I submitted, I simply used malloc on S, and copied casted values from A to S. The response was that I could not use malloc or allocate new space. This requirement was not in the problem description above.
Below was my second and final submission, which is the submission being referred to in the submission response above.
#include <stdio.h>
/* Main Program*/
int main (int arga, char **argb){
int A[100];
char *S;
A[0] = 68;
A[1] = 117;
/** etc. etc. etc. **/
A[13] = 115;
A[14] = 0;
// Point a char pointer to the first integer
S = (char *) A;
// For generality, in C, [charSize == 1 <= intSize]
// This is the ratio of intSize over charSize
int ratio = sizeof(int);
// Copy the i'th (char sized) set of bytes into
// consecutive locations in memory.
int i = 0;
// Using the char pointer as our reference, each set of
// bits is then i*ratio positions away from the i'th
// consecutive position in which it belongs for a string.
while (S[i*ratio] != 0){
S[i] = S[i*ratio];
i++;
}
// a sentinel for the 'S string'
S[i] = 0;
printf("My name is %s\n", S);
return 0;
}// end main
It looks like you've got the core idea down: the space for one integer will hold many chars. I believe you just need to pack the integer array "by hand" instead of in the while loop. Assuming a 4-byte integer on a little-endian machine, give this a shot.
#include <stdio.h>
int main()
{
int x[50];
x[0] = 'D' | 'u' << 8 | 's' << 16 | 't' << 24;
x[1] = 0;
char *s = (char*)x;
printf("Name: %s\n", s);
return 0;
}
It sounds like your professor wanted you to put 4 bytes into each int instead of having an array of n "1 byte" ints that you later packed into the first n bytes of the array using the while loop. Per Hurkyl's comment, the solution to this assignment would be platform dependent, meaning that it will differ from machine to machine. I'm assuming your instructor had the class ssh into and use a specific machine?
In any case, assuming you're on a little endian machine, say you wanted to type out the string: "Hi Dad!". Then a snippet of the solution would look something like this:
// Precursor stuff
A[0] = 0x44206948; // Hi D
A[1] = 0x216461; // ad!
A[2] = 0; // Null terminated
char *S = (char *)A;
printf("My string: %s\n", S);
// Other stuff
