How to Delete Duplicate Elements from Dynamically Allocated String Array in C

I have created a program in C that reads in a word file and counts how many words are in that file, along with how many times each word occurs.
When I run it through Valgrind I either get too many bytes lost or a Segmentation Fault.
How can I remove a duplicate element from a dynamically allocated array and free the memory as well?
Gist: wordcount.c
int tokenize(Dictionary **dictionary, char *words, int total_words)
{
char *delim = " .,?!:;/\"\'\n\t";
char **temp = malloc(sizeof(char) * strlen(words) + 1);
char *token = strtok(words, delim);
*dictionary = (Dictionary*)malloc(sizeof(Dictionary) * total_words);
int count = 1, index = 0;
while (token != NULL)
{
temp[index] = (char*)malloc(sizeof(char) * strlen(token) + 1);
strcpy(temp[index], token);
token = strtok(NULL, delim);
index++;
}
for (int i = 0; i < total_words; ++i)
{
for (int j = i + 1; j < total_words; ++j)
{
if (strcmp(temp[i], temp[j]) == 0) // <------ segmentation fault occurs here
{
count++;
for (int k = j; k < total_words; ++k) // <----- loop to remove duplicates
temp[k] = temp[k+1];
total_words--;
j--;
}
}
int length = strlen(temp[i]) + 1;
(*dictionary)[i].word = (char*)malloc(sizeof(char) * length);
strcpy((*dictionary)[i].word, temp[i]);
(*dictionary)[i].count = count;
count = 1;
}
free(temp);
return 0;
}
Thanks in advance.

Without A Minimal, Complete, and Verifiable example, there is no guarantee that additional problems do not originate elsewhere in your code, but the following need careful attention:
char **temp = malloc(sizeof(char) * strlen(words) + 1);
Above you are allocating pointers, not characters, so your allocation is too small by a factor of sizeof (char*) (sizeof (char) is always 1). To prevent such problems, if you use sizeof *thepointer, you will always have the correct size, e.g.
char **temp = malloc (sizeof *temp * (strlen(words) + 1));
(the + 1 is needed only if you plan on providing a sentinel NULL as the final pointer; otherwise it is unnecessary. It must sit inside the parentheses so that it adds one pointer rather than one byte. You must also validate the return (see below))
Next:
*dictionary = (Dictionary*)malloc(sizeof(Dictionary) * total_words);
There is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?. Further, if *dictionary was previously allocated elsewhere, the allocation above creates a memory leak because you lose the reference to the original pointer. If it has been previously allocated, you need realloc, not malloc. And if it wasn't allocated, a better way of writing it would be:
*dictionary = malloc (sizeof **dictionary * total_words);
You must also validate that the allocation succeeds before attempting to use the block of memory, e.g.
if (! *dictionary) {
perror ("malloc - *dictionary");
exit (EXIT_FAILURE);
}
In:
temp[index] = (char*)malloc(sizeof(char) * strlen(token) + 1);
sizeof(char) is always 1 and can be omitted. Better written as:
temp[index] = malloc (strlen(token) + 1);
or better, allocate and validate in a single block:
if (!(temp[index] = malloc (strlen(token) + 1))) {
perror ("malloc - temp[index]");
exit (EXIT_FAILURE);
}
then
strcpy(temp[index++], token);
Next, while total_words may be equal to the words in temp, you have only validated that you have index number of words. That combined with your original allocation times sizeof (char) instead of sizeof (char *), makes it no wonder there can be segfaults where you attempt to iterate over your list of pointers in temp. Better:
for (int i = 0; i < index; ++i)
{
for (int j = i + 1; j < index; ++j)
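(The same applies to your k loop as well, which also reads temp[k+1] one element past the end of the list on its last iteration.) Additionally, since you have allocated each temp[index], when you shuffle pointers with temp[k] = temp[k+1]; the first assignment overwrites the pointer to the duplicate held in temp[j], causing a memory leak for every duplicate you remove. The duplicate temp[j] should be freed before the shift begins; the remaining pointers are only moved down one slot, so they must not be freed.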
While you are updating total_words--, there still to this point has never been a validation that index == total_words, and in the event they are not, you can have no confidence in total_words or that you won't segfault attempting to iterate over uninitialized pointers as the result.
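The points above about freeing the duplicate and not reading past the end can be sketched as follows. This is a minimal sketch, not the asker's code: the helper name remove_duplicates and its signature are hypothetical, and it assumes every element of the array was allocated individually with malloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: removes duplicate strings from words[0..n-1] in place.
 * Assumes every words[i] was allocated with malloc. Returns the new count. */
size_t remove_duplicates (char **words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t j = i + 1;
        while (j < n) {
            if (strcmp (words[i], words[j]) == 0) {
                free (words[j]);    /* release the duplicate before its pointer is overwritten */
                /* shift the tail down one slot; moves n - j - 1 pointers,
                 * so nothing beyond the last valid element is read */
                memmove (&words[j], &words[j + 1], (n - j - 1) * sizeof *words);
                n--;                /* one fewer valid element; re-check the same j */
            }
            else
                j++;
        }
    }
    return n;
}
```

The caller keeps using the returned count for every later loop, including the one that finally frees the remaining strings and the array itself.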
The rest appears workable, but after the changes above are made, you should ensure that there are no additional changes needed. Look things over and let me know if you need additional help. (And with an MCVE, I'm happy to help further.)
Additional Problems
I apologize for the delay, real-world called -- and this took a lot longer than anticipated, because what you have is an awkward slow-motion logical train-wreck. First and foremost, while there is nothing wrong with reading an entire text file into a buffer with fread -- the buffer is NOT nul-terminated and therefore cannot be used with any functions expecting a string. Yes, strtok, strcpy or any string function will read past the end of word_data looking for the nul-terminating character (well out into memory you don't own), resulting in a segfault.
Your various scattered +1 tacked onto your malloc allocations now make a little more sense, as it appears you were looking for where you needed to add an additional character to make sure you could nul-terminate word_data, but couldn't quite figure out where it went. (don't worry, I straightened that out for you, but it is a big hint that you are probably going about this in the wrong way -- reading with POSIX getline or fgets is probably a better approach than the file-at-once for this type of text processing)
That is literally just the tip of the iceberg in the problems encountered in your code. As hinted at earlier, in tokenize, you failed to validate that index equals total_words. This ends up being important given your choice of delim, which includes the ASCII apostrophe (or single-quote). This causes your index to exceed the word_count any time a possessive or contraction is encountered in the buffer (e.g. "can't" is split into "can" and "t", "Peter's" is split into "Peter" and "s", etc.). You will have to decide how you want to resolve this; I have simply removed the single quote for now.
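The effect of the apostrophe in delim is easy to check in isolation. The sketch below uses a hypothetical helper, count_tokens, which is not part of the original program; it tokenizes a private copy (strtok modifies its input) and assumes the text fits in a 255-character buffer.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok produces for a given
 * delimiter set. Works on a private copy because strtok modifies its input.
 * Assumption: text longer than 255 characters is truncated. */
size_t count_tokens (const char *text, const char *delim)
{
    char copy[256];
    size_t count = 0;

    strncpy (copy, text, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (char *tok = strtok (copy, delim); tok; tok = strtok (NULL, delim))
        count++;
    return count;
}
```

For the input "Peter can't go", the delimiter set that includes the apostrophe yields 4 tokens (Peter, can, t, go), while the set without it yields 3, which matches what a whitespace-based word count reports.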
Your logic in both tokenize and count_words was difficult to follow, and just wrong in some aspects, and your return type (void) for read_file provided absolutely no way to indicate success (or failure) within. Always choose a return type that provides meaningful information from which you can determine if a critical function has succeeded or failed (reading your data qualifies as critical).
If it provides a return -- use it. This applies to all functions that can fail (including functions like fseek)
Returning 0 from tokenize misses the return of the number of words (allocated structs) in dictionary, leaving you unable to properly free the information and leaving you to guess at some number to display (e.g. for (int i = 0; i < 333; ++i) in main()). You need to track the number of dictionary structs and member word that are allocated in tokenize (keep an index, say dindex). Then returning dindex to main() (assigned to hello in your code) provides the information you need to iterate over the structs in main() to output your information, as well as to free each allocated word before freeing the pointers.
If you don't have an accurate count of the number of allocated dictionary structs back in main(), you have failed in the two responsibilities you have regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed. If you don't know how many blocks there are, then you haven't done (1) and can't do (2).
This is a nit about style, and while not an error, the standard coding style for C avoids the use of Initialcaps, camelCase or MixedCase variable names in favor of all lower-case while reserving upper-case names for use with macros and constants. It is a matter of style -- so it is completely up to you, but failing to follow it can lead to the wrong first impression in some circles.
Rather than carry on for another handful of paragraphs, I've reworked your example for you and added a few comments inline. Go through it; I haven't punishingly tested it for all corner-cases, but it should be a sound base to build from. You will note in going through it that your count_words and tokenize have been simplified. Try to understand why each change was made, and ask if you have any questions:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
typedef struct{
char *word;
int count;
} dictionary_t;
char *read_file (FILE *file, char **words, size_t *length)
{
size_t size = *length = 0;
if (fseek (file, 0, SEEK_END) == -1) {
perror ("fseek SEEK_END");
return NULL;
}
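long pos = ftell (file);
if (pos == -1) { /* ftell reports failure with -1 */
perror ("ftell");
return NULL;
}
size = (size_t)pos;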
if (fseek (file, 0, SEEK_SET) == -1) {
perror ("fseek SEEK_SET");
return NULL;
}
/* +1 needed to nul-terminate buffer to pass to strtok */
if (!(*words = malloc (size + 1))) {
perror ("malloc - size");
return NULL;
}
if (fread (*words, 1, size, file) != size) {
perror ("fread words");
free (*words);
return NULL;
}
*length = size;
(*words)[*length] = 0; /* nul-terminate buffer - critical */
return *words;
}
int tokenize (dictionary_t **dictionary, char *words, int total_words)
{
// char *delim = " .,?!:;/\"\'\n\t"; /* don't split on apostrophes */
char *delim = " .,?!:;/\"\n\t";
char **temp = malloc (sizeof *temp * total_words);
char *token = strtok(words, delim);
int index = 0, dindex = 0;
if (!temp) {
perror ("malloc temp");
return -1;
}
if (!(*dictionary = malloc (sizeof **dictionary * total_words))) {
perror ("malloc - dictionary");
return -1;
}
while (token != NULL)
{
if (!(temp[index] = malloc (strlen (token) + 1))) {
perror ("malloc - temp[index]");
exit (EXIT_FAILURE);
}
strcpy(temp[index++], token);
token = strtok (NULL, delim);
}
if (total_words != index) { /* validate total_words = index */
fprintf (stderr, "error: total_words != index (%d != %d)\n",
total_words, index);
/* handle error */
}
for (int i = 0; i < total_words; i++) {
int found = 0, j = 0;
for (; j < dindex; j++)
if (strcmp((*dictionary)[j].word, temp[i]) == 0) {
found = 1;
break;
}
if (!found) {
if (!((*dictionary)[dindex].word = malloc (strlen (temp[i]) + 1))) {
perror ("malloc (*dictionary)[dindex].word");
exit (EXIT_FAILURE);
}
strcpy ((*dictionary)[dindex].word, temp[i]);
(*dictionary)[dindex++].count = 1;
}
else
(*dictionary)[j].count++;
}
for (int i = 0; i < total_words; i++)
free (temp[i]); /* you must free storage for words */
free (temp); /* before freeing pointers */
return dindex;
}
int count_words (char *words, size_t length)
{
int count = 0;
char previous_char = ' ';
while (length--) {
if (isspace (previous_char) && !isspace (*words))
count++;
previous_char = *words++;
}
return count;
}
int main (int argc, char **argv)
{
char *word_data = NULL;
int word_count, hello;
size_t length = 0;
dictionary_t *dictionary = NULL;
FILE *input = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!input) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!read_file (input, &word_data, &length)) {
fprintf (stderr, "error: read_file failed.\n");
return 1;
}
if (input != stdin) fclose (input); /* close file if not stdin */
word_count = count_words (word_data, length);
printf ("wordct: %d\n", word_count);
/* number of dictionary words returned in hello */
if ((hello = tokenize (&dictionary, word_data, word_count)) <= 0) {
fprintf (stderr, "error: no words or tokenize failed.\n");
return 1;
}
for (int i = 0; i < hello; ++i) {
printf("%-16s : %d\n", dictionary[i].word, dictionary[i].count);
free (dictionary[i].word); /* you must free word storage */
}
free (dictionary); /* free pointers */
free (word_data); /* free buffer */
return 0;
}
Let me know if you have further questions.

There are a few things that you need to do to make your code work:
Fix the memory allocation of temp by replacing sizeof(char) with sizeof(char *) like so:
char **temp = malloc(sizeof(char *) * strlen(words) + 1);
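Keep sizeof(Dictionary) in the memory allocation of dictionary, because *dictionary points to an array of Dictionary structs rather than an array of pointers (sizeof(Dictionary *) would under-allocate). Only total_words changes, since it is now dereferenced:
*dictionary = (Dictionary*)malloc(sizeof(Dictionary) * (*total_words));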
Pass the address of word_count when calling tokenize:
int hello = tokenize(&dictionary, word_data, &word_count);
Replace all occurrences of total_words in tokenize function with (*total_words). In the tokenize function signature, you can replace int total_words with int *total_words.
You should also replace the hard-coded value of 333 in your for loop in the main function with word_count.
After you make these changes, your code should work as expected. I was able to run it successfully with these changes.

Related

Dynamic growing string array memory issues

I'm working on a crosswords program in which a word dictionary is necessary. I'm trying to load a jspell dictionary file into a dynamic string array, but I keep getting the error
malloc(): mismatching next->prev_size (unsorted)
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "dictionary.h"
void dict_init(Dictionary * dict, char * dict_dir, size_t w_len)
{
printf("dictionary.c (dict_init): initializing dictionary.\n");
/*Adjust this value to control the initial array size*/
size_t init_size = 1000;
/*initialize dictionary file directory*/
dict->dir = malloc(strlen(dict_dir) * sizeof(char) + 1);
strcpy(dict->dir, dict_dir);
/*create memory for words array*/
dict->words = malloc(init_size * sizeof(char *));
/*initialize array size*/
dict->size = init_size;
/*initilize word length*/
dict->w_len = w_len;
/*initialize word counter*/
dict->counter = 0;
/*load words into dictionary*/
dict_load(dict);
printf("dictionary.c (dict_init): dictionary initialized.\n");
}
void dict_add(Dictionary * dict, char * word)
{
char ** dictionary = dict->words;
/*check if word array is full*/
if(dict->counter == dict->size)
{
/*increrase size of dictionary*/
dict->size *= 1.5;
dict->words = realloc(dict->words, dict->size * sizeof(char *));
}
/*add word to dictionary*/
dictionary[dict->counter] = malloc(strlen(word) * sizeof(char) + 1);
strcpy(dictionary[dict->counter], word);
dict->counter++;
free(word);
}
void dict_free(Dictionary * dict)
{
free(dict->words);
}
void dict_load(Dictionary * dict)
{
FILE * fp;
char * line = NULL;
char * word = NULL;
size_t len = 0;
ssize_t read;
fp = fopen(dict->dir, "r");
/*check if file exists*/
if (fp == NULL)
{
perror("ERROR: File not found.");
exit(EXIT_FAILURE);
}
/*discard first line*/
if(strstr(dict->dir, ".dic"))
getline(&line, &len, fp);
/*read file lines*/
while ((read = getline(&line, &len, fp)) != -1)
{
if(((strstr(line, "[CAT=punct") == NULL) && (word = parse_line(line, dict->w_len)) != NULL)) {
dict_add(dict, word);
}
}
fclose(fp);
free(line);
printf("dictionary.c (dict_load): dictionary loaded %ld words.\n", dict->counter);
}
char * parse_line(char * line, size_t w_len)
{
int i;
char s_tmp[101] = "";
char * dlm_slash, * dlm_space, * dlm_tab , *substring;
/*get delimiter pointer*/
dlm_slash = strchr(line, '/');
dlm_space = strchr(line, ' ');
dlm_tab = strchr(line, '\t');
/*check if delimiter exists in line*/
if(dlm_slash != NULL)
i = (int)(dlm_slash - line);
else if(dlm_space != NULL)
i = (int)(dlm_space - line);
else if(dlm_tab != NULL)
i = (int)(dlm_tab - line);
else
{
/*replace '\n' with '\0'*/
line[strcspn(line, "\n")] = '\0';
i = strlen(line);
}
strncpy(s_tmp, line, i);
substring = malloc(sizeof(char) * strlen(s_tmp) + 1);
strncpy(substring, s_tmp, strlen(s_tmp));
/*lowercase word*/
lower_case(substring);
if((is_valid(substring) == 0) && (strlen(substring) <= w_len))
return substring;
free(substring);
return NULL;
}
Here's the basic problem, I think:
void dict_add(Dictionary * dict, char * word) {
char ** dictionary = dict->words; /* **** 1 **** */
/*check if word array is full*/
if(dict->counter == dict->size)
{
/*increrase size of dictionary*/
dict->size *= 1.5; /* **** 2 **** */
dict->words = realloc(dict->words, dict->size * sizeof(char *));
/* **** 3 **** */
}
/*add word to dictionary*/
This one is the problem:
dictionary[dict->counter] = malloc(strlen(word) * sizeof(char) + 1);
strcpy(dictionary[dict->counter], word);
dict->counter++;
free(word); /* **** 4 **** */
}
The problem is that dictionary was saved before you called realloc. realloc might make a brand-new memory allocation, in which case it will automatically free() the old one after copying its contents into the new one. So any copy of the pointer which you made before calling realloc might end up pointing to unallocated memory. Writing to unallocated memory is a big no-no; in this particular case, you're probably overwriting malloc's bookkeeping information about the unallocated block, which is why it detects the problem and complains. Count yourself lucky: lots of memory corruption problems go undetected for quite a while until the factory explodes.
Some other issues which I noticed while writing this, with numbered comments in the source:
1. There's actually no need for the variable dictionary at all.
2. dict->size is an integer. Forcing conversion to a floating point number and then truncating back to an integer is not very useful. Prefer dict->size += dict->size/2;. Even better would be to first make sure that dict->size isn't so big that increasing it will cause integer wraparound. (This is not undefined behaviour on unsigned types like size_t, but it's not going to produce correct results.)
3. Here you could actually use a temporary, because realloc might return NULL indicating a memory allocation failure. If that happens, the original allocation is not automatically freed, and you don't have a way to free it. (Actually you do, since you have a variable confusingly called dictionary, but in point 1 I recommended that you get rid of it.) A more idiomatic call would be:
if(dict->counter == dict->size) {
/*increrase size of dictionary*/
dict->size += dict->size / 2; /* See point 2, above */
char** new_words = realloc(dict->words, dict->size * sizeof(*new_words));
if (new_words == NULL) {
/* Report allocation error and free all the memory you've allocated */
/* Then probably exit(1) but if this were a library function, just
* return some kind of failure indication so that the caller can do
* their own clean-up.
*/
}
dict->words = new_words;
}
dict->words[dict->counter] = word; /* See point 4, below */
4. You're freeing word here because it was allocated in parse_line(). But if you know you're going to free it anyway, there wasn't much point making a copy of it first. You might as well just use it. (But you need to document the fact that this function takes ownership of the word passed as an argument.)
It might be considered cleaner to do the copy as you do but then not free the argument, leaving it for the caller to do that. That would have the advantage of allowing the caller to provide a word which hadn't been dynamically allocated, or use the word for some other purpose.
5. (Not indicated in this snippet, but nonetheless important.) Every block of allocated memory must be freed. So your program should execute free exactly as many times as it executed malloc. But you don't do that; you just free the array of word pointers, and let the words pointed to in that array leak. You should fix that. (Note that you don't need an extra call to free for a call to realloc, since realloc itself frees the old block if it allocates a new one. You only need to match the initial malloc with a free.)
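Putting the points above together, here is a minimal sketch of a growable word list. It is an illustration under stated assumptions, not the asker's dictionary.h: the WordList type and the wordlist_add and wordlist_free names are hypothetical stand-ins. It takes the copy-and-leave-the-argument-to-the-caller option, grows through a temporary pointer, checks for wraparound, and frees every word before the pointer array.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the asker's Dictionary struct */
typedef struct {
    char **words;    /* array of owned, malloc'ed strings */
    size_t size;     /* allocated slots */
    size_t counter;  /* used slots */
} WordList;

/* Stores a private copy of word; the caller keeps ownership of word.
 * Returns 0 on success, -1 on failure (the list stays valid either way). */
int wordlist_add (WordList *list, const char *word)
{
    if (list->counter == list->size) {
        /* grow by about 1.5x; the + 1 guarantees progress for tiny sizes */
        size_t new_size = list->size ? list->size + list->size / 2 + 1 : 8;
        if (new_size < list->size || new_size > SIZE_MAX / sizeof *list->words)
            return -1;                       /* growth would wrap around */
        char **tmp = realloc (list->words, new_size * sizeof *tmp);
        if (tmp == NULL)
            return -1;                       /* old block is still valid */
        list->words = tmp;                   /* only now replace the pointer */
        list->size = new_size;
    }
    char *copy = malloc (strlen (word) + 1);
    if (copy == NULL)
        return -1;
    strcpy (copy, word);
    list->words[list->counter++] = copy;     /* index via list->words, never a stale copy */
    return 0;
}

/* Frees every stored word, then the pointer array itself */
void wordlist_free (WordList *list)
{
    for (size_t i = 0; i < list->counter; i++)
        free (list->words[i]);
    free (list->words);
    list->words = NULL;
    list->size = list->counter = 0;
}
```

A list starts out zero-initialized (WordList list = { NULL, 0, 0 };), which works because realloc with a NULL pointer behaves like malloc.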

Why does my string_split implementation not work?

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.
The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.
Although the algorithm may be iffy, the logic is definitely sound as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main and set it equal to my function. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant by the term Undefined Behavior.
Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.
When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.
Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.
Edit: should mention that with the following code,
ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEBUG
#ifdef DEBUG
#define _D if (1)
#else
#define _D if (0)
#endif
char **str_split(char[], char, int*);
int count_char(char[], char);
int main(void) {
int num_strings = 0;
char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);
if (result == NULL) {
printf("result is NULL\n");
return 0;
}
if (num_strings > 0) {
for (int i = 0; i < num_strings; i++) {
printf("\"%s\" \n", result[i]);
}
}
free(result);
return 0;
}
char **str_split(char string[], char delim, int *num_strings) {
int num_delim = count_char(string, delim);
*num_strings = num_delim + 1;
if (*num_strings < 2) {
return NULL;
}
//return value
char **ret = malloc((*num_strings) * sizeof(char*));
if (ret == NULL) {
_D printf("ret is null.\n");
return NULL;
}
int slen = strlen(string);
char buffer[slen];
/* b is the buffer index, c is the index for **ret */
int b = 0, c = 0;
for (int i = 0; i < slen + 1; i++) {
char cur = string[i];
if (cur == delim || cur == '\0') {
_D printf("Copying content of buffer to ret[%i]\n", c);
//char *tmp = malloc(sizeof(char) * slen + 1);
//strcpy(tmp, buffer);
//ret[c] = tmp;
ret[c] = buffer;
_D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
//free(tmp);
c++;
b = 0;
continue;
}
//otherwise
_D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);
buffer[b] = cur;
buffer[b+1] = '\0'; /* extend the null char */
b++;
_D printf("Buffer is now equal to: \"%s\"\n", buffer);
}
return ret;
}
int count_char(char base[], char c) {
int count = 0;
int i = 0;
while (base[i] != '\0') {
if (base[i++] == c) {
count++;
}
}
_D printf("Found %i occurence(s) of '%c'\n", count, c);
return count;
}
You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.
To get around this requires one of the following:
Allow the function to modify the input string (i.e. replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that modifying a string literal is undefined behavior in C, so you cannot supply one as you are doing here; you would instead need to do:
char my_string[] = "Helo_World_poopy_pants";
char **result = str_split(my_string, '_', &num_strings);
In this case, the function should also make it clear that a string literal is not acceptable input, and define its first parameter as const char* string (instead of char string[]).
Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.
It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.
Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:
When you allocate the pointer array, also make it large enough to hold a copy of the string:
// Allocate storage for `num_strings` pointers, plus a copy of the original string,
// then copy the string into memory immediately following the pointer storage.
char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
char *buffer = (char*)&ret[*num_strings];
strcpy(buffer, string);
Now, do all your string operations on buffer. For example:
// Extract all delimited substrings. Here, buffer will always point at the
// current substring, and p will search for the delimiter. Once found,
// the substring is terminated, its pointer appended to the substring array,
// and then buffer is pointed at the next substring, if any.
int c = 0;
for(char *p = buffer; *buffer; ++p)
{
if (*p == delim || !*p) {
char *next = p;
if (*p) {
*p = '\0';
++next;
}
ret[c++] = buffer;
buffer = next;
}
}
When you need to clean up, it's just a single call to free, because everything was stored together.
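Assembled into one function, the single-allocation strategy might look like the sketch below. The name str_split_single is hypothetical, and the scanning loop is a slightly different variant from the fragment above: it records the start of each substring as soon as a delimiter is found, so empty fields and a trailing delimiter are kept and the number of pointers written always matches the count.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: splits string on delim. The pointer array and a private copy of
 * the text live in ONE malloc'ed block, so the caller releases everything
 * with a single free(). Returns NULL on allocation failure. */
char **str_split_single (const char *string, char delim, int *num_strings)
{
    size_t len = strlen (string);
    int n = 1;                          /* one more substring than delimiters */

    for (size_t i = 0; i < len; i++)
        n += (string[i] == delim);

    char **ret = malloc ((size_t)n * sizeof *ret + len + 1);
    if (ret == NULL)
        return NULL;

    char *buffer = (char *)&ret[n];     /* text starts right after the pointers */
    memcpy (buffer, string, len + 1);   /* includes the terminating '\0' */

    int c = 0;
    ret[c++] = buffer;                  /* first substring starts at the copy */
    for (char *p = buffer; *p; ++p) {
        if (*p == delim) {
            *p = '\0';                  /* terminate the current substring */
            ret[c++] = p + 1;           /* next one begins after the delimiter */
        }
    }
    *num_strings = n;
    return ret;
}
```

Because the input is only read, a string literal is acceptable here, and main can keep its single free(result).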
The string pointers you store into the ret array with ret[c] = buffer; all point to the same automatic array, which goes out of scope when the function returns. Using them afterwards has undefined behavior. You should allocate these strings with strdup().
Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?
Here is a simpler implementation:
#include <stdlib.h>
#include <string.h>
char **str_split(const char *string, char delim, int *num_strings) {
int i, n, from, to;
char **res;
for (n = 1, i = 0; string[i]; i++)
n += (string[i] == delim);
*num_strings = 0;
res = malloc(sizeof(*res) * n);
if (res == NULL)
return NULL;
for (i = from = to = 0;; from = to + 1) {
for (to = from; string[to] != delim && string[to] != '\0'; to++)
continue;
res[i] = malloc(to - from + 1);
if (res[i] == NULL) {
/* allocation failure: free memory allocated so far */
while (i > 0)
free(res[--i]);
free(res);
return NULL;
}
memcpy(res[i], string + from, to - from);
res[i][to - from] = '\0';
i++;
if (string[to] == '\0')
break;
}
*num_strings = n;
return res;
}
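Since this version allocates each substring separately, a single free(result) in main would leak the strings. A small companion function keeps the cleanup in one place; the name str_split_free is hypothetical and not part of the answer above.

```c
#include <stdlib.h>

/* Hypothetical companion to str_split: frees the n strings, then the array.
 * Passing NULL is allowed and does nothing. */
void str_split_free (char **res, int n)
{
    if (res == NULL)
        return;
    for (int i = 0; i < n; i++)
        free (res[i]);
    free (res);
}
```

In the asker's main, str_split_free(result, num_strings); would then replace the lone free(result);.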

dynamically allocating my 2d array in c

Any hints on how I would dynamically allocate myArray so that I can enter any number of strings and have them stored correctly?
int main()
{
char myArray[1][1]; //how to dynamically allocate the memory?
counter = 0;
char *readLine;
char *word;
char *rest;
printf("\n enter: ");
ssize_t buffSize = 0;
getline(&readLine, &buffSize, stdin);//get user input
//tokenize the strings
while(word = strtok_r(readLine, " \n", &rest )) {
strcpy(myArray[counter], word);
counter++;
readLine= rest;
}
//print the elements user has entered
int i =0;
for(i = 0;i<counter;i++){
printf("%s ",myArray[i]);
}
printf("\n");
}
Use realloc like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
char **myArray = NULL;
char *readLine = NULL;
size_t buffSize = 0;
size_t counter = 0;
char *word, *rest, *p;
printf("\n enter: ");
getline(&readLine, &buffSize, stdin);
p = readLine;
while(word = strtok_r(p, " \n", &rest )) {
myArray = realloc(myArray, (counter + 1) * sizeof(*myArray));//check omitted
myArray[counter++] = strdup(word);
p = NULL;
}
free(readLine);
for(int i = 0; i < counter; i++){
printf("<%s> ", myArray[i]);
free(myArray[i]);
}
printf("\n");
free(myArray);
}
Here is one way you might approach this problem. If you are going to dynamically allocate storage for an unknown number of words of unknown length, you can start with a buffSize that seems reasonable, allocate that much space for the readLine buffer, and grow this memory as needed. Similarly, you can choose a reasonable size for the number of words expected, and grow word storage as needed.
In the program below, myArray is a pointer to pointer to char. arrSize is initialized so that pointers to 100 words may be stored in myArray. First, readLine is filled with an input line. If more space than provided by the initial allocation is required, the memory is realloced to be twice as large. After reading in the line, the memory is again realloced to trim it to the size of the line (including space for the '\0').
strtok_r() breaks the line into tokens. The pointer store is used to hold the address of the memory allocated to hold the word, and then word is copied into this memory using strcpy(). If more space is needed to store words, the memory pointed to by myArray is realloced and doubled in size. After all words have been stored, myArray is realloced a final time to trim it to its minimum size.
When doing this much allocation, it is nice to write functions which allocate memory and check for errors, so that you don't have to do this manually every allocation. xmalloc() takes a size_t argument and an error message string. If an allocation error occurs, the message is printed to stderr and the program exits. Otherwise, a pointer to the allocated memory is returned. Similarly, xrealloc() takes a pointer to the memory to be reallocated, a size_t argument, and an error message string. Note here that realloc() can return a NULL pointer if there is an allocation error, so you need to assign the return value to a temporary pointer to avoid a memory leak. Moving realloc() into a separate function helps protect you from this issue. If you assigned the return value of realloc() directly to readLine, for example, and if there were an allocation error, readLine would no longer point to the previously allocated memory, which would be lost. This function prints the error message and exits if there is an error.
Also, you need to free all of these memory allocations, so this is done before the program exits.
This method is more efficient than reallocing memory for every added character in the line, and for every added pointer to a word in myArray. With generous starting values for buffSize and arrSize, you may only need the initial allocations, which are then trimmed to final size. Of course, there are still the individual allocations for each of the individual words. You could also use strdup() for this part, but you would still need to remember to free those allocations as well. Still, not nearly as many allocations will be needed as when readLine and myArray are grown one char or one pointer at a time.
#define _POSIX_C_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void * xmalloc(size_t size, char *msg);
void * xrealloc(void *ptr, size_t size, char *msg);
int main(void)
{
char **myArray;
size_t buffSize = 1000;
size_t arrSize = 100;
size_t charIndex = 0;
size_t wordIndex = 0;
char *readLine;
char *inLine;
char *word;
char *rest;
char *store;
/* Initial allocations */
readLine = xmalloc(buffSize, "Allocation error: readLine");
myArray = xmalloc(sizeof(*myArray) * arrSize,
"Allocation error: myArray\n");
/* Get user input */
printf("\n enter a line of input:\n");
int c;
while ((c = getchar()) != '\n' && c != EOF) {
if (charIndex + 1 >= buffSize) { // keep room for '\0'
buffSize *= 2;
readLine = xrealloc(readLine, buffSize,
"Error in readLine realloc()\n");
}
readLine[charIndex++] = c;
}
readLine[charIndex] = '\0'; // add '\0' terminator
/* If you must, trim the allocation now */
readLine = xrealloc(readLine, strlen(readLine) + 1,
"Error in readLine trim\n");
/* Tokenize readLine */
inLine = readLine;
while((word = strtok_r(inLine, " \n", &rest)) != NULL) {
store = xmalloc(strlen(word) + 1, "Error in word allocation\n");
strcpy(store, word);
if (wordIndex >= arrSize) {
arrSize *= 2;
myArray = xrealloc(myArray, sizeof(*myArray) * arrSize,
"Error in myArray realloc()\n");
}
myArray[wordIndex] = store;
wordIndex++;
inLine = NULL;
}
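/* You can trim this allocation, too. Skip it when no words were read:
realloc() with size 0 may return NULL, which xrealloc() treats as an error */
if (wordIndex > 0) {
myArray = xrealloc(myArray, sizeof(*myArray) * wordIndex,
"Error in myArray trim\n");
}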
/* Print words */
for(size_t i = 0; i < wordIndex; i++){
printf("%s ",myArray[i]);
}
printf("\n");
/* Free allocated memory */
for (size_t i = 0; i < wordIndex; i++) {
free(myArray[i]);
}
free(myArray);
free(readLine);
return 0;
}
void * xmalloc(size_t size, char *msg)
{
void *temp = malloc(size);
if (temp == NULL) {
fprintf(stderr, "%s\n", msg);
exit(EXIT_FAILURE);
}
return temp;
}
void * xrealloc(void *ptr, size_t size, char *msg)
{
void *temp = realloc(ptr, size);
if (temp == NULL) {
fprintf(stderr, "%s\n", msg);
exit(EXIT_FAILURE);
}
return temp;
}
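If exiting on failure is too drastic for your program, the temporary-pointer pattern described above can also be written out directly. The following is only a minimal sketch and not part of the program above; grow_buffer() is a hypothetical helper name, and the starting capacity of 16 bytes is an arbitrary assumption. It doubles a buffer and leaves the original allocation untouched when realloc() fails:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity of *buf (in bytes).
 * Returns 1 on success, 0 on failure. On failure *buf and *cap are
 * unchanged, so the caller can still use or free the old block. */
int grow_buffer(char **buf, size_t *cap)
{
    assert(buf != NULL && cap != NULL);

    size_t newcap = (*cap == 0) ? 16 : *cap * 2;
    if (newcap < *cap)                  /* size_t overflow */
        return 0;

    char *tmp = realloc(*buf, newcap);  /* never assign straight to *buf */
    if (tmp == NULL)
        return 0;                       /* old block is still valid */

    *buf = tmp;
    *cap = newcap;
    return 1;
}
```

Because the old pointer survives a failed call, the caller can decide whether to free it, report the error, or carry on with the data read so far.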
I suggest you first scan the data and then call malloc() with the appropriate size.
Otherwise, you can use realloc() to reallocate memory as you go through the data.
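A rough sketch of the scan-first approach, assuming the whole input is already in one string: the first pass counts the words without modifying the string, and the second pass allocates exactly that many pointers. The names count_words() and split_words() are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: count the tokens in s without modifying it. */
size_t count_words(const char *s, const char *delims)
{
    size_t n = 0;
    s += strspn(s, delims);             /* skip leading delimiters */
    while (*s != '\0') {
        n++;
        s += strcspn(s, delims);        /* skip the word itself */
        s += strspn(s, delims);         /* skip delimiters after it */
    }
    return n;
}

/* Pass 2: allocate exactly *count pointers and copy each word.
 * Returns NULL on allocation failure or when there are no words. */
char **split_words(const char *s, const char *delims, size_t *count)
{
    assert(s != NULL && delims != NULL && count != NULL);

    size_t n = count_words(s, delims);
    *count = 0;
    if (n == 0)
        return NULL;

    char **words = malloc(n * sizeof *words);
    if (words == NULL)
        return NULL;

    s += strspn(s, delims);
    for (size_t i = 0; i < n; i++) {
        size_t len = strcspn(s, delims);
        words[i] = malloc(len + 1);
        if (words[i] == NULL) {         /* undo the partial work */
            while (i > 0)
                free(words[--i]);
            free(words);
            return NULL;
        }
        memcpy(words[i], s, len);
        words[i][len] = '\0';
        s += len;
        s += strspn(s, delims);
    }
    *count = n;
    return words;
}
```

The caller frees each words[i] and then words itself. The trade-off is that the input is scanned twice, in exchange for never needing realloc().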

Read unknown number of lines from stdin, C

I have a problem with reading stdin of unknown size. In fact it's a table in a .txt file, which I feed to stdin by redirecting with '<' table.txt. My code should look like this:
#include <stdio.h>
#include <string.h>
int main(int argc,char *argv[])
{
char words[10][1024];
int i=0;
while(feof(stdin)==0)
{
fgets(words[i],100,stdin);
printf("%s", words[i]);
i++;
}
return 0;
}
But there is a problem: I don't know the number of lines, which in this case is 10 (we do know the maximum number of characters in a line: 1024).
It would be great if someone knows the solution. Thanks in advance.
You have hit on one of the issues that plagues all new C programmers: how do I dynamically allocate all the memory I need, freeing myself from static limits, while still keeping track of my collection of 'stuff' in memory? This problem usually presents itself when you need to read an unknown number of 'things' from an input. The initial options are (1) declare some limit big enough to work (defeating the purpose), or (2) dynamically allocate pointers as needed.
Obviously, the goal is (2). However, you then run into the problem of "How do I keep track of what I've allocated?" This in itself is an issue that dogs beginners. The problem being, if I dynamically allocate using a bunch of pointers, how do I iterate over the list to get my 'stuff' back out? Also, you have to allocate some initial number of pointers (unless using an advanced data structure like a linked list), so the next question is "What do I do when I run out?"
The usual solution is to allocate an initial set of pointers, then when the limit is reached, reallocate to twice as many as original, and keep going. (as Grayson indicated in his answer).
However, there is one more trick for iterating over the list to get your 'stuff' back out that is worth understanding. Yes, you can allocate with malloc and keep track of the number of pointers used, but you can free yourself from tying a counter to your list of pointers by initially allocating with calloc. That not only allocates space, but also sets the allocated memory to all-bits-zero, which reads back as NULL pointers on common platforms. This allows you to iterate over your list with a simple while (pointer != NULL), provided you always leave at least one unused pointer at the end to act as the terminator. This provides many benefits when it comes to passing your collection of pointers to functions, etc. The downside (a minimal one) is that you get to write a reallocation scheme that uses calloc to allocate new space when needed. (bummer, I get to get smarter -- but I have to work to do it...)
You can evaluate whether to use malloc/realloc off-the-shelf, or whether to reallocate using calloc and a custom reallocate function depending on what your requirements are. Regardless, understanding both, just adds more tools to your programming toolbox.
OK, enough jabber, where is the example in all this blather?
Both of the following examples simply read all lines from any text file and print the lines (with pointer index numbers) back to stdout. Both expect that you will provide the filename to read as the first argument on the command line. The only difference between the two is that the second has the reallocation with calloc done in a custom reallocation function. They both allocate 255 pointers initially and double the number of pointers each time the limit is hit. (For fun, you can set MAXLINES to something small like 10 and force repeated reallocations to test.)
first example with reallocation in main()
#define _POSIX_C_SOURCE 200809L /* for getline(), strdup(), ssize_t */
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
#define MAXLINES 255
void free_buffer (char **buffer)
{
register int i = 0;
while (buffer[i])
{
free (buffer[i]);
i++;
}
free (buffer);
}
int main (int argc, char **argv) {
if (argc < 2) {
fprintf (stderr, "Error: insufficient input. Usage: %s input_file\n", argv[0]);
return 1;
}
char *line = NULL; /* NULL forces getline to allocate the line buffer */
ssize_t read = 0; /* number of characters read by getline */
size_t n = 0; /* size of the buffer at line; getline updates it as needed */
char **filebuf = NULL;
char **rtmp = NULL;
int linecnt = 0;
size_t limit = MAXLINES;
size_t newlim = 0;
FILE *ifp = fopen(argv[1],"r");
if (!ifp)
{
fprintf(stderr, "\nerror: failed to open file: '%s'\n\n", argv[1]);
return 1;
}
filebuf = calloc (MAXLINES, sizeof (*filebuf)); /* allocate MAXLINES pointers */
while ((read = getline (&line, &n, ifp)) != -1) /* read each line in file with getline */
{
if (line[read - 1] == 0xa) { line[read - 1] = 0; read--; } /* strip newline */
if (linecnt >= (limit - 1)) /* test if linecnt at limit, reallocate */
{
newlim = limit * 2; /* set new number of pointers to 2X old */
if ((rtmp = calloc (newlim, sizeof (*filebuf)))) /* calloc to set to NULL */
{
/* copy original filebuf to newly allocated rtmp */
if (memcpy (rtmp, filebuf, linecnt * sizeof (*filebuf)) == rtmp)
{
free (filebuf); /* free original filebuf */
filebuf = rtmp; /* set filebuf equal to new rtmp */
}
else
{
fprintf (stderr, "error: memcpy failed, exiting\n");
return 1;
}
}
else
{
fprintf (stderr, "error: rtmp allocation failed, exiting\n");
return 1;
}
limit = newlim; /* update limit to new limit */
}
filebuf[linecnt] = strdup (line); /* copy line (strdup allocates) */
linecnt++; /* increment linecnt */
}
fclose(ifp);
if (line) free (line); /* free memory allocated to line */
linecnt = 0; /* reset linecnt to iterate filebuf */
printf ("\nLines read in filebuf buffer:\n\n"); /* output all lines read */
while (filebuf[linecnt])
{
printf (" line[%d]: %s\n", linecnt, filebuf[linecnt]);
linecnt++;
}
printf ("\n");
free_buffer (filebuf); /* free memory allocated to filebuf */
return 0;
}
second example with reallocation in custom function
#define _POSIX_C_SOURCE 200809L /* for getline(), strdup(), ssize_t */
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
#define MAXLINES 255
/* function to free allocated memory */
void free_buffer (char **buffer)
{
register int i = 0;
while (buffer[i])
{
free (buffer[i]);
i++;
}
free (buffer);
}
/* custom realloc using calloc/memcpy */
char **recalloc (size_t *lim, char **buf)
{
size_t newlim = *lim * 2; /* size_t, to match *lim and calloc's parameter */
char **tmp = NULL;
if ((tmp = calloc (newlim, sizeof (*buf))))
{
if (memcpy (tmp, buf, *lim * sizeof (*buf)) == tmp)
{
free (buf);
buf = tmp;
}
else
{
fprintf (stderr, "%s(): error, memcpy failed, exiting\n", __func__);
return NULL;
}
}
else
{
fprintf (stderr, "%s(): error, tmp allocation failed, exiting\n", __func__);
return NULL;
}
*lim = newlim;
return tmp;
}
int main (int argc, char **argv) {
if (argc < 2) {
fprintf (stderr, "Error: insufficient input. Usage: %s input_file\n", argv[0]);
return 1;
}
char *line = NULL; /* NULL forces getline to allocate the line buffer */
ssize_t read = 0; /* number of characters read by getline */
size_t n = 0; /* size of the buffer at line; getline updates it as needed */
char **filebuf = NULL;
int linecnt = 0;
size_t limit = MAXLINES;
FILE *ifp = fopen(argv[1],"r");
if (!ifp)
{
fprintf(stderr, "\nerror: failed to open file: '%s'\n\n", argv[1]);
return 1;
}
filebuf = calloc (MAXLINES, sizeof (*filebuf)); /* allocate MAXLINES pointers */
while ((read = getline (&line, &n, ifp)) != -1) /* read each line in file with getline */
{
if (line[read - 1] == 0xa) { line[read - 1] = 0; read--; } /* strip newline */
if (linecnt >= (limit - 1)) /* test if linecnt at limit, reallocate */
{
filebuf = recalloc (&limit, filebuf); /* reallocate filebuf to 2X size */
if (!filebuf)
{
fprintf (stderr, "error: recalloc failed, exiting.\n");
return 1;
}
}
filebuf[linecnt] = strdup (line); /* copy line (strdup allocates) */
linecnt++; /* increment linecnt */
}
fclose(ifp);
if (line) free (line); /* free memory allocated to line */
linecnt = 0; /* reset linecnt to iterate filebuf */
printf ("\nLines read in filebuf buffer:\n\n"); /* output all lines read */
while (filebuf[linecnt])
{
printf (" line[%d]: %s\n", linecnt, filebuf[linecnt]);
linecnt++;
}
printf ("\n");
free_buffer (filebuf); /* free memory allocated to filebuf */
return 0;
}
Take a look at both examples. Know that there are many, many ways to do this. These examples show just one approach, using a few more tricks than you will normally find. Give them a try. Drop a comment if you need more help.
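For comparison, the same NULL-terminated growth can probably be done with plain realloc() if the new slots are cleared by hand. This is only a sketch under that assumption; grow_nullterm() is a hypothetical name and not part of either example above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alternative to a calloc/memcpy reallocation: double the
 * pointer array with realloc() and set every new slot to NULL, so that
 * loops of the form while (buf[i]) still find a terminator.
 * Returns the new array, or NULL on failure (buf is then still valid). */
char **grow_nullterm(char **buf, size_t *lim)
{
    assert(buf != NULL && lim != NULL && *lim > 0);

    size_t newlim = *lim * 2;
    char **tmp = realloc(buf, newlim * sizeof *tmp);
    if (tmp == NULL)
        return NULL;

    for (size_t i = *lim; i < newlim; i++)
        tmp[i] = NULL;                  /* realloc leaves these uninitialized */

    *lim = newlim;
    return tmp;
}
```

Since a failed call leaves buf usable, the caller should assign the result to a temporary pointer first and only overwrite its own pointer on success.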
I suggest that you use malloc and realloc to manage your memory. Keep track of how big your array is or how many entries it has, and call realloc to double its size whenever the array is not big enough.
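A minimal sketch of that approach, using only standard C (fgets, malloc, realloc). The helper name read_lines() and the starting capacity of 8 are assumptions for illustration; the 1024-character line limit is taken from the question, and longer lines would be split across entries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read every line from fp into a growing array.
 * Stores the number of lines in *count. Returns NULL on allocation
 * failure (after freeing everything) or when the input is empty. */
char **read_lines(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);

    char buf[1024 + 2];                 /* line + '\n' + '\0' */
    char **lines = NULL;
    size_t used = 0, cap = 0;

    *count = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (used == cap) {              /* array is full: double it */
            size_t newcap = cap ? cap * 2 : 8;
            char **tmp = realloc(lines, newcap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            lines = tmp;
            cap = newcap;
        }
        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
        lines[used] = malloc(strlen(buf) + 1);
        if (lines[used] == NULL)
            goto fail;
        strcpy(lines[used], buf);
        used++;
    }
    *count = used;
    return lines;

fail:
    while (used > 0)
        free(lines[--used]);
    free(lines);
    return NULL;
}
```

Calling read_lines(stdin, &n), printing lines[0] through lines[n - 1], and then freeing each lines[i] followed by lines would take the place of the fixed words[10][1024] array from the question.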
OP appears to need to store the data somewhere.
#include <stdio.h>
#include <string.h>
#define N 100000u
char BABuffer[N];
int main(int argc, char *argv[]) {
size_t lcount = 0;
size_t ccount = 0;
char words[1024 + 2];
while(fgets(words, sizeof words, stdin) != NULL) {
size_t len = strlen(words);
if (ccount + len >= N - 1) {
fputs("Too much!\n", stderr);
break;
}
memcpy(&BABuffer[ccount], words, len);
ccount += len;
lcount++;
}
BABuffer[ccount] = '\0';
printf("Read %zu lines.\n", lcount);
printf("Read %zu char.\n", ccount);
fputs(BABuffer, stdout);
return 0;
}
Note: ccount includes the end-of-line character(s).

array of strings access error

When I print char** surname and char** first, I get some strange outputs. I am not sure if I am doing the malloc correctly or if I'm doing something else incorrectly.
The Input -> names1.txt
The outputs
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main ()
{
int size, i;
char **surname, **first, *middle_init, dummy, str[80];
FILE *fp_input = fopen("names1.txt", "r");
fscanf(fp_input, "%d%c", &size, &dummy); // gets size of array from file
/* dynamic memory allocation */
middle_init = (char*)malloc(size * sizeof(char));
surname = (char**)malloc(size * sizeof(char*));
first = (char**)malloc(size * sizeof(char*));
for (i = 0; i < size; i++)
{
surname[i] = (char*)malloc(17 * sizeof(char));
first[i] = (char*)malloc(17 * sizeof(char));
} // for
/* reads from file and assigns value to arrays */
i = 0;
strcpy(middle_init, "");
while (fgets(str, 80, fp_input) != NULL)
{
surname[i] = strtok(str, ", \n");
first[i] = strtok(NULL, ". ");
strcat(middle_init, strtok(NULL, ". "));
i++;
} // while
/* prints arrays */
for (i = 0; i < size; i++)
printf("%s %s\n", surname[i], first[i]);
return 0;
} // main
A casual look at the code suggests:
You must use strcpy() or a variant on the theme to copy the string found by strtok() into the surname, etc.
The way you've written it, you throw away your allocated memory.
You get the repeated output because you're storing pointers to the string you use to hold the line in the surname and first arrays. That string only holds the last line when you do the printing. This and the previous point are corollaries of the first point.
You only allocate a single character for each middle initial, and you then use strcat() to treat them as strings. I recommend treating middle initials as strings, much like the other names. Or, since you aren't required to print them, you might decide to ignore middle initials altogether.
Using 17 instead of enum { NAME_LENGTH = 17 }; or equivalent is not a good idea.
There are undoubtedly other issues too.
I guess you've not reached structures in your course of study yet. If you have covered structures, you should probably use a structure type to represent a complete name, and use a single array of names instead of parallel arrays. This will likely simplify memory management too; you'd use fixed size array elements in the structure, so you'd only have to make one allocation for each name.
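To illustrate the structure suggestion, here is one possible layout, offered as a sketch rather than a finished design. The type struct Name, the helpers set_field() and name_set(), and the NAME_SIZE value of 25 are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NAME_SIZE = 25 };                /* assumed maximum, including '\0' */

/* One complete name. Fixed-size members mean that a single allocation
 * covers a whole array of names. */
struct Name {
    char surname[NAME_SIZE];
    char first[NAME_SIZE];
    char middle_init;                   /* '\0' when absent */
};

/* Copy src into a fixed-size field; returns 0 if it does not fit. */
static int set_field(char *dst, const char *src)
{
    if (strlen(src) >= NAME_SIZE)
        return 0;
    strcpy(dst, src);
    return 1;
}

/* Fill one element; returns 1 on success, 0 if a name is too long. */
int name_set(struct Name *n, const char *surname, const char *first, char mi)
{
    assert(n != NULL && surname != NULL && first != NULL);

    if (!set_field(n->surname, surname) || !set_field(n->first, first))
        return 0;
    n->middle_init = mi;
    return 1;
}
```

With this layout, an array for size names is one call, struct Name *names = malloc(size * sizeof *names);, and it is released with one free(names);, instead of one allocation per string in two parallel arrays.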
The revised program further below, which keeps the parallel arrays from the question, produces the output:
Ryan Elizabeth
McIntyre O
Cauble-Chantrenne Kristin
Larson Lois
Thorpe Trinity
Ruiz Pedro
In this code, the err_exit() function is very valuable because it turns error reporting into a one-line call rather than a 4-line paragraph, which means you're more likely to do the error checking. It is a basic use of variable-length argument lists, and you may not understand it yet, but it is extremely convenient and powerful. The only functions that could be error checked but aren't are fclose() and printf(). If you're reading a file, there's little benefit to checking fclose(); if you're writing and fclose() fails, you may have run out of disk space or something like that, and it is probably appropriate to report the error. You could add <errno.h> to the list of headers and report on errno and strerror(errno) if you wanted to improve the error reporting more. The code frees the allocated memory; valgrind gives it a clean bill of health.
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void err_exit(const char *fmt, ...);
int main(void)
{
enum { NAME_SIZE = 25 };
const char *file = "names1.txt";
int size, i;
char **surname, **first, str[80];
FILE *fp_input = fopen(file, "r");
if (fp_input == NULL)
err_exit("Failed to open file %s\n", file);
if (fgets(str, sizeof(str), fp_input) == 0)
err_exit("Unexpected EOF on file %s\n", file);
if (sscanf(str, "%d", &size) != 1)
err_exit("Did not find integer in line: %s\n", str);
if (size <= 0 || size > 1000)
err_exit("Integer %d out of range 1..1000\n", size);
if ((surname = (char**)malloc(size * sizeof(char*))) == 0 ||
(first = (char**)malloc(size * sizeof(char*))) == 0)
err_exit("Memory allocation failure\n");
for (i = 0; i < size; i++)
{
if ((surname[i] = (char*)malloc(NAME_SIZE * sizeof(char))) == 0 ||
(first[i] = (char*)malloc(NAME_SIZE * sizeof(char))) == 0)
err_exit("Memory allocation failure\n");
}
for (i = 0; i < size && fgets(str, sizeof(str), fp_input) != NULL; i++)
{
char *tok_s = strtok(str, ",. \n");
char *tok_f = strtok(NULL, ". ");
if (tok_s == 0 || tok_f == 0)
err_exit("Failed to read surname and first name from: %s\n", str);
if (strlen(tok_s) >= NAME_SIZE || strlen(tok_f) >= NAME_SIZE)
err_exit("Name(s) %s and %s are too long (max %d)\n", tok_s, tok_f, NAME_SIZE-1);
strcpy(surname[i], tok_s);
strcpy(first[i], tok_f);
}
if (i != size)
err_exit("Only read %d names\n", i);
fclose(fp_input);
/* prints arrays */
for (i = 0; i < size; i++)
printf("%s %s\n", surname[i], first[i]);
for (i = 0; i < size; i++)
{
free(surname[i]);
free(first[i]);
}
free(surname);
free(first);
return 0;
}
static void err_exit(const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
exit(1);
}
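The errno-based reporting mentioned above might look something like the following. This is a sketch, not part of the program above: the names vformat_sys(), format_sys() and err_sysexit() are hypothetical, and the 512-byte message buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<formatted message>: <strerror(errnum)>" in buf. */
static void vformat_sys(char *buf, size_t size, int errnum,
                        const char *fmt, va_list args)
{
    assert(buf != NULL && size > 0);

    int n = vsnprintf(buf, size, fmt, args);
    if (n < 0) {                        /* formatting failed: start empty */
        buf[0] = '\0';
        n = 0;
    }
    if ((size_t)n < size)               /* room left for the system text */
        snprintf(buf + n, size - (size_t)n, ": %s", strerror(errnum));
}

/* Same formatting without exiting, so it can be used or checked alone. */
void format_sys(char *buf, size_t size, int errnum, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vformat_sys(buf, size, errnum, fmt, args);
    va_end(args);
}

/* Like err_exit(), but appends the description of the current errno. */
void err_sysexit(const char *fmt, ...)
{
    int errnum = errno;                 /* save before other calls change it */
    char msg[512];
    va_list args;

    va_start(args, fmt);
    vformat_sys(msg, sizeof msg, errnum, fmt, args);
    va_end(args);
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}
```

A call such as err_sysexit("Failed to open file %s", file); placed right after a failed fopen() would then print the file name followed by the system's explanation of the failure. Note that the format string here carries no trailing newline, since the function adds one.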
here:
surname[i] = (char*)malloc(17 * sizeof(char));
first[i] = (char*)malloc(17 * sizeof(char));
..
surname[i] = strtok(str, ", \n");
first[i] = strtok(NULL, ". ");
you allocate memory for surname[i] and first[i] but never use it, because you then overwrite those pointers with the values returned by strtok, which leaks the allocations. You should not store those values anyway: strtok returns pointers into str, the line buffer, and fgets overwrites that buffer on the next iteration. You could use strdup instead:
while (fgets(str, 80, fp_input) != NULL) {
surname[i] = strdup(strtok(str, ", \n"));
first[i] = strdup(strtok(NULL, ". "));
middle_init[i] = strtok(NULL, ". ")[0];
i++;
} // while
/* prints arrays */
for (i = 0; i < size; i++)
printf("%s %s %c\n", surname[i], first[i], middle_init[i]);
strdup will allocate memory and copy the string, so you avoid hard-coding the string length too. You should free that memory when you're done. Also note that middle_init is a char array, so I just assign one char per name.
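One caveat with the loop above: strtok returns NULL when a token is missing, and passing NULL to strdup or indexing it with [0] is undefined behaviour. A more defensive variant could look like the sketch below. The helper names dup_token() and parse_name() are hypothetical, and the "Surname, First M." line format is an assumption, since the contents of names1.txt are not shown. It uses malloc and strcpy instead of strdup so that it relies on standard C only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a token into freshly allocated memory.
 * Returns NULL if malloc fails. */
static char *dup_token(const char *tok)
{
    char *copy = malloc(strlen(tok) + 1);
    if (copy != NULL)
        strcpy(copy, tok);
    return copy;
}

/* Split one "Surname, First M." line (modified in place by strtok).
 * On success returns 1 and stores two heap copies the caller must free;
 * *mi receives the middle initial, or '\0' if the line has none.
 * Returns 0 if a name is missing or an allocation fails. */
int parse_name(char *line, char **surname, char **first, char *mi)
{
    assert(line != NULL && surname != NULL && first != NULL && mi != NULL);

    char *tok_s = strtok(line, ", \n");
    char *tok_f = (tok_s != NULL) ? strtok(NULL, ". \n") : NULL;
    if (tok_f == NULL)
        return 0;                       /* surname or first name missing */
    char *tok_m = strtok(NULL, ". \n"); /* may legitimately be NULL */

    char *s = dup_token(tok_s);
    char *f = dup_token(tok_f);
    if (s == NULL || f == NULL) {
        free(s);                        /* free(NULL) is a no-op */
        free(f);
        return 0;
    }
    *surname = s;
    *first = f;
    *mi = (tok_m != NULL) ? tok_m[0] : '\0';
    return 1;
}
```

Inside the fgets loop, a call like parse_name(str, &surname[i], &first[i], &middle_init[i]) would replace the three strtok lines, and a return value of 0 tells you the line was malformed instead of crashing.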

Resources