I'm trying to malloc an array of strings in a struct, and it doesn't work well.
I also want to check the size of the arrays, but I don't get the right values.
What is wrong with my code?
By the way, I know I should free the allocated memory.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct{
char** string;
} Strt;
int main(int agrc, char *argv[]) {
int i;
int size1 = 0, size2 = 0;
char** arrr;
Strt* arr = malloc(sizeof(Strt));
printf("number of arrays: ");
scanf("%d", &size1);
printf("size of each array: ");
scanf("%d", &size2);
arr->string = (char**)malloc(size1 * sizeof(char));
printf("size of string: %d\n", sizeof(arr->string));
for(i = 0; i < size1; i++){
arr->string[i] = (char*)malloc(size2 * sizeof(char));
}
return 0;
}
It looks like you are wanting to allocate for an unknown number of strings. You are thinking correctly that you will want to use a pointer-to-pointer-to char (e.g. char **arr;), but the "train fell off the tracks" so to speak when you went to implement the logic.
Let's look at what you need to do to read and store an unknown number of strings (or any object for that matter). It is a two-step process where you:
allocate storage for pointers, one for each object you wish to store, and
allocate a block of storage for each object, copying the object to the newly allocated block and assigning the beginning address of that block to one of your pointers.
To grow the number of objects stored, you realloc() more pointers and keep going until you run out of objects to store.
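For the fixed-count case in the question, the two steps could be sketched as below. Note that in the posted code `malloc(size1 * sizeof(char))` reserves size1 bytes rather than size1 pointers, and `sizeof(arr->string)` reports the size of the pointer itself, not the number of elements. The names `alloc_strings` and `free_strings` are hypothetical helpers made up for this illustration; this is a minimal sketch, not the only way to do it.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n strings of len bytes each (step 1 + step 2). */
char **alloc_strings (size_t n, size_t len)
{
    /* step 1: one pointer per string -- sizeof *arr is sizeof (char *) */
    char **arr = malloc (n * sizeof *arr);
    if (!arr)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        /* step 2: a block for each string, zeroed so it starts as "" */
        arr[i] = calloc (len, 1);
        if (!arr[i]) {              /* on failure, free what was allocated */
            while (i--)
                free (arr[i]);
            free (arr);
            return NULL;
        }
    }
    return arr;
}

/* Hypothetical helper: free each string, then the block of pointers. */
void free_strings (char **arr, size_t n)
{
    if (!arr)
        return;
    for (size_t i = 0; i < n; i++)
        free (arr[i]);
    free (arr);
}
```

Since sizeof cannot recover the element count from a pointer, you keep the counts (size1 and size2 in the question) in your own variables.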
When you realloc(), you always use a temporary pointer. Why? Because when (not if) realloc() fails, it returns NULL and if you are simply calling realloc() with the pointer itself, you overwrite the address to the current block of memory with NULL losing the original pointer that can now no longer be freed.
So instead of:
arr = realloc (arr, /* how big */);
You do:
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
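If you prefer to keep that pattern in one place, it can be wrapped in a small function. The sketch below assumes a hypothetical helper named `grow_ptrs` (not part of any library) that is called with n greater than zero; on failure the caller's pointer is left untouched, so it can still be used and freed.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *arr to hold n pointers (n > 0 assumed).
 * Returns 1 on success, 0 on failure (leaving *arr untouched and valid). */
int grow_ptrs (char ***arr, size_t n)
{
    void *tmp = realloc (*arr, n * sizeof **arr);   /* temporary pointer */

    if (!tmp)           /* original block still owned by *arr */
        return 0;

    *arr = tmp;         /* only now replace the caller's pointer */
    return 1;
}
```

Starting from `char **arr = NULL;` works because realloc() behaves like malloc() when passed a NULL pointer.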
Now if I understand correctly that you need to read and store an unknown number of strings (I am not 100% clear on that), you do not need a struct. Look at Strt -- it is a single-member struct. The struct serves no purpose there. You simply want a pointer to a block of pointers which you can grow. So char **arr; is all that is needed. Other than that, you need a counter to keep track of how many strings are stored, and for the actual read of each string, simply use a fixed-size buffer with fgets() or use POSIX getline(). So your setup for beginning to read strings could be:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[MAXC], /* buffer to hold each line read */
**arr = NULL; /* pointer-to-pointer-to char */
size_t nstr = 0; /* counter for number of strings stored */
...
Above you will read all input into buf, then you can trim the '\n' included by fgets(), obtaining the length of the input so you can then allocate length + 1 characters of storage for the string (+1 for the nul-terminating character) assigning the beginning address of the new block to the next available pointer. You then copy from buf to your newly allocated block and increment your counter.
(A simple way to trim the '\n' and obtain the length of the input is with strcspn())
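As a small illustration of the strcspn() approach, here is a sketch of a helper (the name `trim_eol` is made up for this example). strcspn() returns the number of characters before the first character found in the reject set, or the full length if none is found, so the same index serves as both the trim position and the resulting length.

```c
#include <string.h>

/* Hypothetical helper: overwrite the first '\r' or '\n' in s with '\0'
 * and return the resulting length of s. */
size_t trim_eol (char *s)
{
    size_t len = strcspn (s, "\r\n");   /* chars before first \r or \n */

    s[len] = 0;     /* if no line ending was found, this rewrites the '\0' */
    return len;
}
```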
Putting those pieces together, your read, trim, reallocation, allocation of the new block and copy to it could be done with:
for (;;) { /* loop continually reading input */
size_t len; /* var to hold length of string after \n removal */
fputs ("enter string (EOF to quit): ", stdout); /* prompt */
if (!fgets (buf, MAXC, stdin)) { /* read line (or EOF) */
puts ("(all done)\n");
break;
}
buf[(len = strcspn (buf, "\r\n"))] = 0; /* trim \n, save len */
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
if (!(arr[nstr] = malloc (len + 1))) { /* allocate/validate block for string */
perror ("malloc-arr[nstr]");
break;
}
memcpy (arr[nstr++], buf, len + 1); /* copy string to allocated block */
}
(note: the manual EOF is generated by the user pressing Ctrl + d -- or Ctrl + z on Windows)
All that remains is outputting the stored strings (or using them however you need) and then freeing the allocated memory:
for (size_t i = 0; i < nstr; i++) { /* loop outputting strings */
puts (arr[i]);
free (arr[i]); /* free string */
}
free (arr); /* free pointers */
}
If you put the whole program together you would have:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[MAXC], /* buffer to hold each line read */
**arr = NULL; /* pointer-to-pointer-to char */
size_t nstr = 0; /* counter for number of strings stored */
for (;;) { /* loop continually reading input */
size_t len; /* var to hold length of string after \n removal */
fputs ("enter string (EOF to quit): ", stdout); /* prompt */
if (!fgets (buf, MAXC, stdin)) { /* read line (or EOF) */
puts ("(all done)\n");
break;
}
buf[(len = strcspn (buf, "\r\n"))] = 0; /* trim \n, save len */
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
if (!(arr[nstr] = malloc (len + 1))) { /* allocate/validate block for string */
perror ("malloc-arr[nstr]");
break;
}
memcpy (arr[nstr++], buf, len + 1); /* copy string to allocated block */
}
for (size_t i = 0; i < nstr; i++) { /* loop outputting strings */
puts (arr[i]);
free (arr[i]); /* free string */
}
free (arr); /* free pointers */
}
(note: while you would normally want to allocate blocks of pointers, keeping track of the number available and the number used and only reallocating when used == available to minimize the number of reallocations needed -- when taking user input, there is no efficiency to be gained due to the delay caused by the user in pecking out the input. But know you can optimize by minimizing the number of reallocations needed)
Example Use/Output
$ ./bin/allocate_p2p
enter string (EOF to quit): My dog
enter string (EOF to quit): has fleas...
enter string (EOF to quit): My cat
enter string (EOF to quit): has none...
enter string (EOF to quit): Lucky cat!
enter string (EOF to quit): (all done)
My dog
has fleas...
My cat
has none...
Lucky cat!
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/allocate_p2p
==16556== Memcheck, a memory error detector
==16556== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16556== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16556== Command: ./bin/allocate_p2p
==16556==
enter string (EOF to quit): My dog
enter string (EOF to quit): has fleas...
enter string (EOF to quit): My cat
enter string (EOF to quit): has none...
enter string (EOF to quit): Lucky cat!
enter string (EOF to quit): (all done)
My dog
has fleas...
My cat
has none...
Lucky cat!
==16556==
==16556== HEAP SUMMARY:
==16556== in use at exit: 0 bytes in 0 blocks
==16556== total heap usage: 12 allocs, 12 frees, 2,218 bytes allocated
==16556==
==16556== All heap blocks were freed -- no leaks are possible
==16556==
==16556== For counts of detected and suppressed errors, rerun with: -v
==16556== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if I read between the lines correctly as to what you needed, or if I missed. I'm happy to help further if what you need is slightly different. However looking at what you were trying to do, this seemed like what you were aiming at. Let me know if you have further questions.
Related
The script successfully prints the text file. However, I want to store what is in the text file in an array. I have looked in a lot of places, but I do not fully understand the information I have come across. Is there any way I can get some guidance?
#include <stdio.h>
#include <stdlib.h>
int main()
{
// OPENS THE FILE
FILE *fp = fopen("/classes/cs3304/cs330432/Programs/StringerTest/people.txt", "r");
size_t len = 1000;
char *word = malloc(sizeof(char) * len);
// CHECKS IF THE FILE EXISTS, IF IT DOESN'T IT WILL PRINT OUT A STATEMENT SAYING SO
if (fp == NULL)
{
printf("file not found");
return 0;
}
while(fgets(word, len, fp) != NULL)
{
printf("%s", word);
}
free(word);
}
the text file has the following in it(just a list of words):
endorse
vertical
glove
legend
scenario
kinship
volunteer
scrap
range
elect
release
sweet
company
solve
elapse
arrest
witch
invasion
disclose
professor
plaintiff
definition
bow
chauvinist
Let's see if we can't get you straightened out. First, you are thinking in the right direction, and you should be commended for using fgets() to read each line into a fixed buffer (character array), and then you need to collect and store all of the lines so that they are available for use by your program -- that appears to be where the wheels fell off.
Basic Outline of Approach
In an overview, when you want to handle an unlimited number of lines, you have two different types of blocks of memory you are going to allocate and manage. The first is a block of memory you allocate that will hold some number of pointers (one for each line you will store). It doesn't matter how many you initially allocate, because you will keep track of the number allocated (number available) and the number used. When (used == available) you will realloc() a bigger block of memory to hold more pointers and keep on going.
The second type of block of memory you will handle is the storage for each line. No mystery there. You will allocate storage for each character (+1 for the nul-terminating character) and you will copy the line from your fixed buffer to the allocated block.
The two blocks of memory work together, because to create your collection, you simply assign the address for the block of memory holding the line of data to the next available pointer.
Let's think through a short example where we declare char **lines; as the pointer to the block of memory holding pointers. Then say we allocate two pointers initially, so we have valid pointers available for lines[0] and lines[1]. We track the number of pointers available with nptrs and the number used with used. So initially nptrs = 2; and used = 0;.
When we read our first line with fgets(), we will trim the '\n' from the end of the string and then get the length of the string (len = strlen(buffer);). We can then allocate storage for the string assigning the address of the allocated block to our first pointer, e.g.
lines[used] = malloc (len + 1);
and then copy the contents of buffer to lines[0], e.g.
memcpy (lines[used], buffer, len + 1);
(note: there is no reason to call strcpy() and have it scan for end-of-string again, we already know how many characters to copy -- including the nul-terminating character)
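The allocate-then-copy pair above can be sketched as one function. The name `dup_len` is hypothetical, and the sketch assumes src[len] is already the nul-terminating character (which holds after the trim described above), so copying len + 1 bytes brings the terminator along.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string whose length is already known.
 * Assumes src[len] == '\0'. Returns NULL if allocation fails. */
char *dup_len (const char *src, size_t len)
{
    char *p = malloc (len + 1);     /* +1 for the nul-terminating char */

    if (p)
        memcpy (p, src, len + 1);   /* copies the terminator as well */

    return p;                       /* caller is responsible for free() */
}
```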
Finally, all that is needed to keep our counters happy is to increment used by one. We store the next line the same way, and on the 3rd iteration used == nptrs so we realloc() more pointers (generally just doubling the number of pointers each time a realloc() is required). That is a good balance between calls to realloc() and growth of the number of pointers -- but you are free to increment the allocation any way you like -- but avoid calling realloc() for every line.
So you keep reading lines, checking if realloc() is required, reallocating if needed, and allocating for each line assigning the starting address to each of your pointers in turn. The only additional note is that when you realloc() you always use a temporary pointer so when realloc() fails and returns NULL, you do not overwrite your original pointer with NULL losing the starting address to the block of memory holding pointers -- creating a memory leak.
Implementation
The details were left out of the overview, so let's look at a short example to read an unknown number of lines from a file (each line being 1024 characters or less) and storing each line in a collection using a pointer-to-pointer to char as described above. Don't use Magic-Numbers in your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
Don't hardcode filenames in your code either; that is what argc and argv are for in int main (int argc, char **argv). Pass the filename to read as the first argument to the program (or read from stdin by default if no argument is given):
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
(note: you should not need to recompile your program just to read from a different filename)
Now allocate and Validate your initial number of pointers
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
Read each line, trim the '\n' from the end and get the number of characters that remain after the '\n' has been removed (you can use strcspn() to do it all at once):
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
Next we check if a reallocation is needed and if so reallocate using a temporary pointer:
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
Now allocate and Validate the storage for the line and copy the line to the new storage, incrementing used when done -- completing your read-loop.
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
Now you can use the lines stored in lines as needed in your program, remembering to free the memory for each line when done and then finally freeing the block of pointers, e.g.
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
That's all that is needed. Now you can go read the 324,000 words in /usr/share/dict/words (or perhaps on your system /var/lib/dict/words depending on distro) and you will not have any problems doing so.
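The optional step mentioned in the code comment (resizing to exactly the number of pointers used once reading is done) could look something like the sketch below. The helper name `shrink_to_fit` is made up for this example. A failed shrink is harmless here because the larger original block remains valid, so the function simply hands it back.

```c
#include <stdlib.h>

/* Hypothetical helper: shrink the pointer block to exactly `used` pointers.
 * If used is 0 or realloc fails, the original block is returned unchanged. */
char **shrink_to_fit (char **lines, size_t used)
{
    void *tmp;

    if (used == 0)          /* avoid a zero-size realloc */
        return lines;

    tmp = realloc (lines, used * sizeof *lines);
    if (!tmp)               /* original block is still valid */
        return lines;

    return tmp;
}
```

You would call it as lines = shrink_to_fit (lines, used); after the read loop, which is safe because the function never returns NULL for a valid block.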
Input File
A short example file:
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/fgets_lines_dyn_simple dat/captnjack.txt
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156== Memcheck, a memory error detector
==8156== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8156== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8156== Command: ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156==
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
==8156==
==8156== HEAP SUMMARY:
==8156== in use at exit: 0 bytes in 0 blocks
==8156== total heap usage: 9 allocs, 9 frees, 5,796 bytes allocated
==8156==
==8156== All heap blocks were freed -- no leaks are possible
==8156==
==8156== For counts of detected and suppressed errors, rerun with: -v
==8156== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
The Full Code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
Look things over and let me know if you have any questions. If you also wanted to read lines of unknown length (millions of characters long), you would simply loop doing the same thing allocating and reallocating for each line until the '\n' character was found (or EOF) marking the end of the line. It is no different in principle than what we have done above for the pointers.
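To give a rough idea of what reading a line of unknown length could look like, here is a minimal sketch. The function name `read_line_dyn` and the constant `CHUNK` are assumptions made for this example, and for simplicity it reallocates once per chunk read rather than doubling.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 16    /* hypothetical read size, deliberately small */

/* Hypothetical helper: read one line of any length from fp.
 * Returns an allocated string without the '\n' (caller frees), or
 * NULL on EOF with nothing read or on allocation failure. */
char *read_line_dyn (FILE *fp)
{
    char chunk[CHUNK], *line = NULL;
    size_t len = 0;

    while (fgets (chunk, CHUNK, fp)) {          /* read next piece of line */
        size_t n = strcspn (chunk, "\n");       /* chars before any '\n' */
        int eol = chunk[n] == '\n';             /* did we reach end of line? */
        void *tmp = realloc (line, len + n + 1);

        if (!tmp) {                             /* validate reallocation */
            free (line);
            return NULL;
        }
        line = tmp;
        memcpy (line + len, chunk, n);          /* append piece to line */
        len += n;
        line[len] = 0;                          /* keep line nul-terminated */

        if (eol)                                /* '\n' found, line complete */
            break;
    }
    return line;
}
```

A final line with no trailing '\n' is still returned, because the loop exits when fgets() reports EOF with the partial line already stored.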
Considering the code provided by @David C. Rankin in this previous answer:
How to count only words that start with a Capital in a list?
How do you optimise this code to include Memory Allocation for much larger text files? With this code below it will complete for small .txt files.
However, what is the best way to set memory allocation to this code so that C (Programming Language) does not run out of memory. Is it best to use linked lists?
/**
* C program to count occurrences of all words in a file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#define MAX_WORD 50 /* max word size */
#define MAX_WORDS 512 /* max number of words */
#ifndef PATH_MAX
#define PATH_MAX 2048 /* max path (defined for Linux in limits.h) */
#endif
typedef struct { /* use a struct to hold */
char word[MAX_WORD]; /* lowercase word, and */
int cap, count; /* if it appears capitalized, and its count */
} words_t;
char *strlwr (char *str) /* no need for unsigned char */
{
char *p = str;
while (*p) {
*p = tolower(*p);
p++;
}
return str;
}
int main (void) {
FILE *fptr;
char path[PATH_MAX], word[MAX_WORD];
size_t i, len, index = 0;
/* Array of struct of distinct words, initialized all zero */
words_t words[MAX_WORDS] = {{ .word = "" }};
/* Input file path */
printf ("Enter file path: ");
if (scanf ("%s", path) != 1) { /* validate every input */
fputs ("error: invalid file path or cancellation.\n", stderr);
return 1;
}
fptr = fopen (path, "r"); /* open file */
if (fptr == NULL) { /* validate file open */
fputs ( "Unable to open file.\n"
"Please check you have read privileges.\n", stderr);
exit (EXIT_FAILURE);
}
while (index < MAX_WORDS && /* protect array bounds */
fscanf (fptr, "%s", word) == 1) { /* while valid word read */
int iscap = 0, isunique = 1; /* is capital, is unique flags */
if (isupper (*word)) /* is the word uppercase */
iscap = 1;
/* remove all trailing punctuation characters */
len = strlen (word); /* get length */
while (len && ispunct(word[len - 1])) /* only if len > 0 */
word[--len] = 0;
strlwr (word); /* convert word to lowercase */
/* check if word exists in list of all distinct words */
for (i = 0; i < index; i++) {
if (strcmp(words[i].word, word) == 0) {
isunique = 0; /* set unique flag zero */
if (iscap) /* if capital flag set */
words[i].cap = iscap; /* set capital flag in struct */
words[i].count++; /* increment word count */
break; /* bail - done */
}
}
if (isunique) { /* if unique, add to array, increment index */
memcpy (words[index].word, word, len + 1); /* have len */
if (iscap) /* if cap flag set */
words[index].cap = iscap; /* set capital flag in struct */
words[index++].count++; /* increment count & index */
}
}
fclose (fptr); /* close file */
/*
* Print occurrences of all words in file.
*/
puts ("\nOccurrences of all distinct words with Cap in file:");
for (i = 0; i < index; i++) {
if (words[i].cap) {
strcpy (word, words[i].word);
*word = toupper (*word);
/*
* %-15s prints string in 15 character width.
* - is used to print string left align inside
* 15 character width space.
*/
printf("%-15s %d\n", word, words[i].count);
}
}
return 0;
}
Example Use/Output
Using your posted input
$ ./bin/unique_words_with_cap
Enter file path: dat/girljumped.txt
Occurrences of all distinct words with Cap in file:
Any 7
One 4
Some 10
The 6
A 13
Since you already have an answer using a fixed-size array of struct to hold the information, changing from using the fixed-size array where storage is automatically reserved for you on the stack, to dynamically allocated storage where you can realloc as needed, simply requires initially declaring a pointer-to-type rather than array-of-type, and then allocating storage for each struct.
Where before, with a fixed-size array of 512 elements you would have:
#define MAX_WORDS 512 /* max number of words */
...
/* Array of struct of distinct words, initialized all zero */
words_t words[MAX_WORDS] = {{ .word = "" }};
When dynamically allocating, simply declare a pointer-to-type and provide an initial allocation of some reasonable number of elements, e.g.
#define MAX_WORDS 8 /* initial number of struct to allocate */
...
/* pointer to allocated block of max_words struct initialized zero */
words_t *words = calloc (max_words, sizeof *words);
(note: you can allocate with either malloc, calloc or realloc, but only calloc allocates and also sets all bytes zero. In your case since you want the .cap and .count members initialized zero, calloc is a sensible choice)
It's worth pausing a bit to understand that whether you use a fixed-size array or an allocated block of memory, you are accessing your data through a pointer to the first element. The only real difference is the compiler reserving storage for your array on the stack with a fixed array, and you being responsible for reserving storage for it through allocation.
Access to the elements will be exactly the same because on access, an array is converted to a pointer to the first element. See: C11 Standard - 6.3.2.1 Other Operands - Lvalues, arrays, and function designators(p3) Either way you access the memory through a pointer to the first element. When dynamically allocating, you are assigning the address of the first element to your pointer rather than the compiler reserving storage for the array. Whether it is an array with storage reserved for you, or you declare a pointer and assign an allocated block of memory to it -- how you access the elements will be identical. (pause done)
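A tiny sketch can make the point concrete. The function below (the name `sum_ints` is made up for this example) only ever sees a pointer to the first element, so it cannot tell, and does not care, whether the storage behind it is a fixed-size array or an allocated block.

```c
#include <stddef.h>

/* Hypothetical helper: sum n ints reached through a pointer to the
 * first element. Works identically for arrays and allocated blocks. */
int sum_ints (const int *a, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += a[i];        /* same indexing either way */

    return sum;
}
```

Given int fixed[3] = { 1, 2, 3 }; and int *dyn = malloc (3 * sizeof *dyn); filled with the same values, both sum_ints (fixed, 3) and sum_ints (dyn, 3) go through exactly the same code.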
When you allocate, it is up to you to validate that the allocation succeeds. So you would follow your allocation with:
if (!words) { /* validate every allocation */
perror ("calloc-words");
exit (EXIT_FAILURE);
}
You are already keeping track of index telling you how many struct you have filled, you simply need to add one more variable to track how many struct are available (size_t max_words = MAX_WORDS; gives you the 2nd variable set to the initial allocation size MAX_WORDS). So your test for "Do I need to realloc now?" is simply when filled == available, or in your case if (index == max_words).
Since you now have the ability to realloc, your read loop no longer has to protect your array bounds and you can simply read each word in the file, e.g.
while (fscanf (fptr, "%s", word) == 1) { /* while valid word read */
int iscap = 0, isunique = 1; /* is capital, is unique flags */
...
Now all that remains is the index == max_words test before you fill another element. You can either place the test and realloc before the for and if blocks for handling isunique, which is fine, or you can actually place it within the if (isunique) block since technically unless you are adding a unique word, no realloc will be required. The only difference it makes is a corner-case where index == max_words and you call realloc before your for loop where the last word is not-unique, you may make one call to realloc where it wasn't technically required (think through that).
To prevent that one realloc too many, place the test and realloc immediately before the new element will be filled, e.g.
if (isunique) { /* if unique, add to array, increment index */
if (index == max_words) { /* is realloc needed? */
/* always use a temporary pointer with realloc */
void *tmp = realloc (words, 2 * max_words * sizeof *words);
if (!tmp) {
perror ("realloc-words");
break; /* don't exit, original data still valid */
}
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
}
memcpy (words[index].word, word, len + 1); /* have len */
if (iscap) /* if cap flag set */
words[index].cap = iscap; /* set capital flag in struct */
words[index++].count++; /* increment count & index */
}
Now let's look closer at the reallocation itself, e.g.
if (index == max_words) { /* is realloc needed? */
/* always use a temporary pointer with realloc */
void *tmp = realloc (words, 2 * max_words * sizeof *words);
if (!tmp) { /* validate every allocation */
perror ("realloc-words");
break; /* don't exit, original data still valid */
}
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
}
The realloc call itself is void *tmp = realloc (words, 2 * max_words * sizeof *words);. Why not just words = realloc (words, 2 * max_words * sizeof *words);? Answer: You never realloc the pointer itself, and always use a temporary pointer. Why? When it cannot grow the block in place, realloc allocates new storage, copies the existing data to the new storage and then frees the old block of memory. When (not If) realloc fails, it returns NULL and does not touch the old block of memory. If you blindly assign NULL to your existing pointer words, you have just overwritten the address to your old block of memory with NULL, creating a memory leak because you no longer have a reference to the old block of memory and it cannot be freed. So lesson learned, always realloc with a temporary pointer!
If realloc succeeds, what then? Pay close attention to the lines:
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
The first simply assigns the address for the new block of memory created and filled by realloc to your words pointer. (words now points to a block of memory with twice as many elements as it had before).
The second line -- recall, realloc and malloc do not initialize the new memory to zero. If you want to initialize the memory to zero (which for your .cap and .count members is really helpful), you have to do that yourself with memset. So what needs to be set to zero? All the memory that wasn't in your original block. Where is that? Well, it starts at words + max_words. How many zeros do I have to write? You have to fill all memory above words + max_words to the end of the block. Since you doubled the size, you simply have to zero what was the original size starting at words + max_words, which is max_words * sizeof *words bytes of memory. (remember we used 2 * max_words * sizeof *words as the new size, and we have NOT updated max_words yet, so it still holds the original size)
Lastly, now it is time to update max_words. Here just make it match whatever you added to your allocation in realloc above. I simply doubled the size of the current allocation each time realloc is called, so to update max_words to the new allocation size, you simply multiply by 2 with max_words *= 2;. You can add as little or as much memory as you like each time. You could scale by 3/2, you could add a fixed number of elements (say 10), it is completely up to you, but avoid calling realloc to add 1 element each time. You can do it, but allocation and reallocation are relatively expensive operations, so it is better to add a reasonably sized block each time you realloc, and doubling is a reasonable balance between memory growth and the number of times realloc is called.
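The three steps (assign, zero the new half, update the count) can be seen in isolation in the sketch below. It uses a plain block of int instead of words_t to stay short, and the helper name `grow_zeroed` is hypothetical; the order of operations is the same as in the snippet above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: double the block at *arr (currently *nelem ints)
 * and zero the newly added half. Returns 1 on success; on failure
 * returns 0 and leaves *arr and *nelem exactly as they were. */
int grow_zeroed (int **arr, size_t *nelem)
{
    /* always use a temporary pointer with realloc */
    void *tmp = realloc (*arr, 2 * *nelem * sizeof **arr);

    if (!tmp)
        return 0;

    *arr = tmp;                                         /* 1. assign new block */
    memset (*arr + *nelem, 0, *nelem * sizeof **arr);   /* 2. zero the new half */
    *nelem *= 2;                                        /* 3. update the count */

    return 1;
}
```

Because *nelem is updated last, it still holds the old element count when memset runs, which is what makes *arr + *nelem the start of the new memory.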
Putting it all together, you could do:
/**
* C program to count occurrences of all words in a file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#define MAX_WORD 50 /* max word size */
#define MAX_WORDS 8 /* initial number of struct to allocate */
#ifndef PATH_MAX
#define PATH_MAX 2048 /* max path (defined for Linux in limits.h) */
#endif
typedef struct { /* use a struct to hold */
char word[MAX_WORD]; /* lowercase word, and */
int cap, count; /* if it appears capitalized, and its count */
} words_t;
char *strlwr (char *str) /* convert str to lowercase in place */
{
char *p = str;
while (*p) {
*p = tolower((unsigned char)*p); /* cast: negative char values are UB for tolower */
p++;
}
return str;
}
int main (void) {
FILE *fptr;
char path[PATH_MAX], word[MAX_WORD];
size_t i, len, index = 0, max_words = MAX_WORDS;
/* pointer to allocated block of max_words struct initialized zero */
words_t *words = calloc (max_words, sizeof *words);
if (!words) { /* validate every allocation */
perror ("calloc-words");
exit (EXIT_FAILURE);
}
/* Input file path */
printf ("Enter file path: ");
if (scanf ("%s", path) != 1) { /* validate every input */
fputs ("error: invalid file path or cancellation.\n", stderr);
return 1;
}
fptr = fopen (path, "r"); /* open file */
if (fptr == NULL) { /* validate file open */
fputs ( "Unable to open file.\n"
"Please check you have read privileges.\n", stderr);
exit (EXIT_FAILURE);
}
while (fscanf (fptr, "%49s", word) == 1) { /* while valid word read (width is MAX_WORD - 1) */
int iscap = 0, isunique = 1; /* is capital, is unique flags */
if (isupper ((unsigned char)*word)) /* does the word start with an uppercase letter */
iscap = 1;
/* remove all trailing punctuation characters */
len = strlen (word); /* get length */
while (len && ispunct((unsigned char)word[len - 1])) /* only if len > 0 */
word[--len] = 0;
strlwr (word); /* convert word to lowercase */
/* check if word exists in list of all distinct words */
for (i = 0; i < index; i++) {
if (strcmp(words[i].word, word) == 0) {
isunique = 0; /* set unique flag zero */
if (iscap) /* if capital flag set */
words[i].cap = iscap; /* set capital flag in struct */
words[i].count++; /* increment word count */
break; /* bail - done */
}
}
if (isunique) { /* if unique, add to array, increment index */
if (index == max_words) { /* is realloc needed? */
/* always use a temporary pointer with realloc */
void *tmp = realloc (words, 2 * max_words * sizeof *words);
if (!tmp) { /* validate every allocation */
perror ("realloc-words");
break; /* don't exit, original data still valid */
}
words = tmp; /* assign reallocated block to words */
/* (optional) set all new memory to zero */
memset (words + max_words, 0, max_words * sizeof *words);
max_words *= 2; /* update max_words to reflect new limit */
}
memcpy (words[index].word, word, len + 1); /* have len */
if (iscap) /* if cap flag set */
words[index].cap = iscap; /* set capital flag in struct */
words[index++].count++; /* increment count & index */
}
}
fclose (fptr); /* close file */
/*
* Print occurrences of all words in file.
*/
puts ("\nOccurrences of all distinct words with Cap in file:");
for (i = 0; i < index; i++) {
if (words[i].cap) {
strcpy (word, words[i].word);
*word = toupper (*word);
/*
* %-15s prints string in 15 character width.
* - is used to print string left align inside
* 15 character width space.
*/
printf("%-15s %d\n", word, words[i].count);
}
}
free (words);
return 0;
}
Example Use/Output
Where with your sample data you would get:
$ ./bin/unique_words_with_cap_dyn
Enter file path: dat/girljumped.txt
Occurrences of all distinct words with Cap in file:
Any 7
One 4
Some 10
The 6
A 13
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/unique_words_with_cap_dyn
==7962== Memcheck, a memory error detector
==7962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7962== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==7962== Command: ./bin/unique_words_with_cap_dyn
==7962==
Enter file path: dat/girljumped.txt
Occurrences of all distinct words with Cap in file:
Any 7
One 4
Some 10
The 6
A 13
==7962==
==7962== HEAP SUMMARY:
==7962== in use at exit: 0 bytes in 0 blocks
==7962== total heap usage: 4 allocs, 4 frees, 3,912 bytes allocated
==7962==
==7962== All heap blocks were freed -- no leaks are possible
==7962==
==7962== For counts of detected and suppressed errors, rerun with: -v
Above you can see there were 4 allocations and 4 frees (original allocation of 8, realloc at 8, 16 & 32) and you can see there were 0 errors.
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have any questions.
However, what is the best way to set memory allocation to this code so that C (Programming Language) does not run out of memory.
Notice that most computers, even cheap laptops, have quite a lot of RAM. In practice, you could expect to be able to allocate at least a gigabyte of memory. That is a lot for textual file processing!
A large human-written text file is the Bible. As a rule of thumb, that text takes about 16 megabytes (to a factor of two). For most computers, that is a quite small amount of memory today (my AMD2970WX has more than that in its CPU cache).
Is it best to use linked lists?
The practical consideration is more algorithmic time complexity than memory consumption. For example, searching something in a linked list has linear time. And going thru a list of a million words does take some time (even if computers are fast).
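To make the complexity point concrete, here is a minimal sketch of one alternative to a linear scan: keep the words in a sorted array and look them up with the standard bsearch, which takes logarithmic time. The names sort_words and find_word are hypothetical, and the sketch assumes the array is sorted once after loading (or kept sorted on insertion).

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort/bsearch over an array of const char *. */
static int cmp_str (const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp (*sa, *sb);
}

/* Sort the array of n words in place so it can be binary-searched. */
void sort_words (const char **words, size_t n)
{
    qsort (words, n, sizeof *words, cmp_str);
}

/* Return the index of key in the sorted array, or -1 if it is absent.
 * Each lookup needs O(log n) comparisons instead of O(n). */
long find_word (const char **words, size_t n, const char *key)
{
    const char **hit = bsearch (&key, words, n, sizeof *words, cmp_str);
    return hit ? (long)(hit - words) : -1;
}
```

A hash table would make lookups cheaper still, but the sorted array has the advantage of needing nothing beyond the standard library.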
You may want to read more about:
flexible array members (use that instead in your words_t).
string duplication routines like strdup or asprintf. Even if you don't have them, reprogramming them is a fairly easy task.
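As a rough illustration of both items, the sketch below stores each word in a flexible array member, so a node occupies exactly as much memory as its word needs, and it reimplements a strdup-like routine for systems that lack one. The names word_fam_t, word_new and my_strdup are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of words_t using a flexible array member. */
typedef struct {
    int cap, count;
    char word[];            /* flexible array member, must be last (C99) */
} word_fam_t;

/* Allocate one node sized for s. Caller frees it with free(). */
word_fam_t *word_new (const char *s)
{
    size_t len = strlen (s);
    word_fam_t *w = malloc (sizeof *w + len + 1);  /* struct + word + nul */
    if (!w)
        return NULL;
    w->cap = 0;
    w->count = 1;
    memcpy (w->word, s, len + 1);                  /* copies the nul too */
    return w;
}

/* Portable stand-in for strdup. Caller frees the copy. */
char *my_strdup (const char *s)
{
    size_t len = strlen (s) + 1;
    char *copy = malloc (len);
    return copy ? memcpy (copy, s, len) : NULL;
}
```

Because each node has a different size, such nodes are stored through pointers (an array of word_fam_t * or a list), not in a contiguous array of structs.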
But you still want to avoid memory leaks and also, and even more importantly, undefined behavior.
Read How to debug small programs. Tools like valgrind, the clang static analyzer, the gdb debugger, the address sanitizer, etc.. are very useful to learn and use.
At last, read carefully, and in full, Norvig's Teach yourself programming in 10 years. That text is thought provoking, and its appendix at least is surprisingly close to your questions.
PS. I leave you to guess and estimate the total amount of text, in bytes, you are capable of reading during your entire life. That size is surprisingly small and probably fits in any smartphone today. On today's devices, text is really cheap. Photos and videos are not.
NB. "What is the best way" types of questions are too broad, off-topic here, a matter of opinion, and related to the P vs NP question, to Rice's theorem and to the halting problem. These questions usually have no clear answer and are often unsolvable in general: it is difficult to prove that a better answer could not be thought of in a dozen years (even if, for some such questions, you can get a proof today: e.g. comparison-based sorting is proven to require on the order of n log n comparisons in the worst case).
Is there a function which I can use that which will allow me to replace a specific texts.
For example:
char *test = "^Hello world^"; would be replaced with char *test = "<s>Hello world</s>";
Another example: char *test2 = "This is ~my house~ bud" would be replaced with char *test2 = "This is <b>my house</b> bud"
Before you can begin to replace substrings within a string, you have to understand what you are dealing with. In your example you want to know whether you can replace characters within a string, and you give as an example:
char *test = "^Hello world^";
Declared and initialized as shown above, test points to a string literal, which is stored in read-only memory on virtually all systems. Any attempt to modify a string literal invokes Undefined Behavior (and most likely a Segmentation Fault).
As noted in the comments, test could be declared and initialized as a character array, e.g. char test[] = "^Hello world^";, which ensures that test is modifiable, but that does not address the problem that your replacement strings are longer than the substrings being replaced.
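A minimal sketch of the distinction: the hypothetical helper below modifies its argument in place, so it is only valid on writable storage such as a character array, and it can only handle replacements of the same length.

```c
#include <stddef.h>

/* Replace every occurrence of 'from' with 'to' in the writable string s.
 * Returns the number of characters replaced. The string must NOT be a
 * string literal, and the length of s never changes. */
size_t replace_char (char *s, char from, char to)
{
    size_t n = 0;
    for (; *s; s++) {
        if (*s == from) {
            *s = to;
            n++;
        }
    }
    return n;
}
```

Calling it on char test[] = "^Hello world^"; is fine, whereas passing a char * that points at the literal "^Hello world^" would be Undefined Behavior.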
To handle the additional characters, you have two options (1) you can declare test[] to be sufficiently large to accommodate the substitutions, or (2) you can dynamically allocate storage for the replacement string, and realloc additional memory if you reach your original allocation limit.
For instance if you limit the code associated with test to a single function, you could declare test with a sufficient number of characters to handle the replacements, e.g.
#define MAXC 1024 /* define a constant for the maximum number of characters */
...
char test[MAXC] = "^Hello world^";
You would then simply need to keep track of the original string length plus the number of characters added with each replacement and ensure that the total never exceeds MAXC-1 (reserving space for the nul-terminating character).
However, if you decided to move the replacement code to a separate function, you now have the problem that you cannot return a pointer to a locally declared array (because the locally declared array is declared within the function stack space, which is destroyed (released for reuse) when the function returns). A locally declared array has automatic storage duration. See: C11 Standard - 6.2.4 Storage durations of objects
To avoid the problem of a locally declared array not surviving the function return, you can simply dynamically allocate storage for your new string which results in the new string having allocated storage duration which is good for the life of the program, or until the memory is freed by calling free(). This allows you to declare and allocate storage for a new string within a function, make your substring replacements, and then return a pointer to the new string for use back in the calling function.
For your circumstance, a simple declaration of a new string within a function and allocating twice the amount of storage as the original string is a reasonable approach to take. (You still must keep track of the number of bytes of memory you use, but you then have the ability to realloc additional memory if you should reach your original allocation limit.) This process can continue and accommodate any number of strings and substitutions, up to the available memory on your system.
While there are a number of ways to approach the substitutions, simply searching the original string for each substring, and then copying the text up to the substring to the new string, then copying the replacement substring allows you to "inch-worm" from the beginning to the end of your original string making replacement substitutions as you go. The only challenge you have is keeping track of the number of characters used (so you can reallocate if necessary) and advancing your read position within the original from the beginning to the end as you go.
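For comparison, here is a sketch of a different technique from the grow-as-you-go approach described above: a two-pass version that first counts the matches to compute the exact output size, then allocates once and copies. The function name replace_alt is hypothetical, and the sketch does not guard against size_t overflow for extremely long inputs.

```c
#include <stdlib.h>
#include <string.h>

/* Replace each 'find' in 's' with 'r1' and 'r2', alternating.
 * Pass 1 computes the exact size, pass 2 copies, so no realloc is needed.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *replace_alt (const char *s, const char *find,
                   const char *r1, const char *r2)
{
    size_t findlen, r1len, r2len, total, hits = 0;
    const char *p, *hit;
    char *out, *np;

    if (!s || !find || !r1 || !r2 || !*find)   /* empty find never advances */
        return NULL;
    findlen = strlen (find);
    r1len = strlen (r1);
    r2len = strlen (r2);

    total = strlen (s) + 1;                    /* pass 1: exact size */
    for (p = s; (hit = strstr (p, find)); p = hit + findlen) {
        total = total - findlen + (hits % 2 ? r2len : r1len);
        hits++;
    }
    if (!(out = malloc (total)))
        return NULL;

    np = out;                                  /* pass 2: copy pieces */
    hits = 0;
    for (p = s; (hit = strstr (p, find)); p = hit + findlen) {
        size_t rlen = hits % 2 ? r2len : r1len;
        memcpy (np, p, (size_t)(hit - p));     /* text before the match */
        np += hit - p;
        memcpy (np, hits % 2 ? r2 : r1, rlen); /* the replacement */
        np += rlen;
        hits++;
    }
    strcpy (np, p);                            /* tail after last match */
    return out;
}
```

The trade-off is that the input is scanned twice, in exchange for a single allocation of exactly the right size and no pointer fix-ups after a reallocation.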
Your example somewhat complicates the process by needing to alternate between one of two replacement strings as you work your way down the string. This can be handled with a simple toggle flag. (a variable you alternate 0,1,0,1,...) which will then determine the proper replacement string to use where needed.
The ternary operator (e.g. test ? if_true : if_false;) can help reduce the number of if (test) { if_true; } else { if_false; } blocks you have sprinkled through your code, but it's up to you. If the if (test) {} format is more readable to you, use that; otherwise, use the ternary.
The following example takes the (1) original string, (2) the find substring, (3) the 1st replacement substring, and (4) the 2nd replacement substring as arguments to the program. It allocates for the new string within the strreplace() function, makes the substitutions requested and returns a pointer to the new string to the calling function. The code is heavily commented to help you follow along, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* replace all instances of 'find' in 's' with 'r1' and `r2`, alternating.
* allocate memory, as required, to hold string with replacements,
* returns allocated string with replacements on success, NULL otherwise.
*/
char *strreplace (const char *s, const char *find,
const char *r1, const char *r2)
{
const char *p = s, /* pointer to s */
*sp = s; /* 2nd substring pointer */
char *newstr = NULL, /* newstring pointer to allocate/return */
*np = newstr; /* pointer to newstring to fill */
size_t newlen = 0, /* length for newstr */
used = 0, /* amount of allocated space used */
slen, findlen, r1len, r2len; /* lengths, set after validation */
int toggle = 0; /* simple 0/1 toggle flag for r1/r2 */
/* validate BEFORE calling strlen: strlen (NULL) is undefined behavior */
if (s == NULL || *s == 0) { /* validate s not NULL or empty */
fputs ("strreplace() error: input NULL or empty\n", stderr);
return NULL;
}
if (!find || !*find || !r1 || !r2) { /* an empty find would loop forever */
fputs ("strreplace() error: find empty or argument NULL\n", stderr);
return NULL;
}
slen = strlen (s); /* length of s */
findlen = strlen (find); /* length of find string */
r1len = strlen (r1); /* length of replace string 1 */
r2len = strlen (r2); /* length of replace string 2 */
newlen = slen * 2; /* double length of s for newstr */
newstr = calloc (1, newlen); /* allocate twice length of s */
if (newstr == NULL) { /* validate ALL memory allocations */
perror ("calloc-newstr");
return NULL;
}
np = newstr; /* initialize newpointer to newstr */
/* locate each substring using strstr */
while ((sp = strstr (p, find))) { /* find beginning of each substring */
size_t len = sp - p; /* length to substring */
/* check if realloc needed? */
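if (used + len + (toggle ? r2len : r1len) + 1 > newlen) {
size_t need = used + len + (toggle ? r2len : r1len) + 1;
void *tmp;
while (newlen < need) /* one doubling may not be enough */
newlen *= 2;
tmp = realloc (newstr, newlen); /* realloc to temp */
if (!tmp) { /* validate realloc succeeded */
perror ("realloc-newstr");
free (newstr); /* don't leak the original block */
return NULL;
}
newstr = tmp; /* assign realloc'ed block to newstr */
np = newstr + used; /* block may have moved, re-point np */
}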
strncpy (np, p, len); /* copy from pointer to substring */
np += len; /* advance newstr pointer by len */
*np = 0; /* nul-terminate (already done by calloc) */
strcpy (np, toggle ? r2 : r1); /* copy r2/r1 string to end */
np += toggle ? r2len : r1len; /* advance newstr pointer by r12len */
*np = 0; /* <ditto> */
p += len + findlen; /* advance p by len + findlen */
used += len + (toggle ? r2len : r1len); /* update used characters */
toggle = toggle ? 0 : 1; /* toggle 0,1,0,1,... */
}
/* handle segment of s after last find substring */
slen = strlen (p); /* get remaining length */
if (slen) { /* if not at end */
if (used + slen + 1 > newlen) { /* check if realloc needed? */
void *tmp = realloc (newstr, used + slen + 1); /* realloc */
if (!tmp) { /* validate */
perror ("realloc-newstr");
free (newstr); /* don't leak the original block */
return NULL;
}
newstr = tmp; /* assign */
np = newstr + used; /* block may have moved, re-point np */
newlen = used + slen + 1; /* update (not required here, know why?) */
}
strcpy (np, p); /* add final segment to string */
*(np + slen) = 0; /* nul-terminate */
}
return newstr; /* return newstr */
}
int main (int argc, char **argv) {
const char *s = NULL,
*find = NULL,
*r1 = NULL,
*r2 = NULL;
char *newstr = NULL;
if (argc < 5) { /* validate required no. of arguments given */
fprintf (stderr, "error: insufficient arguments,\n"
"usage: %s <string> <find> <rep1> <rep2>\n", argv[0]);
return 1;
}
s = argv[1]; /* assign arguments to pointers */
find = argv[2];
r1 = argv[3];
r2 = argv[4];
newstr = strreplace (s, find, r1, r2); /* replace substrings in s */
if (newstr) { /* validate return */
printf ("oldstr: %s\nnewstr: %s\n", s, newstr);
free (newstr); /* don't forget to free what you allocate */
}
else { /* handle error */
fputs ("strreplace() returned NULL\n", stderr);
return 1;
}
return 0;
}
(above, the strreplace function uses pointers to walk ("inch-worm") down the original string making replacement, but you can use string indexes and index variables if that makes more sense to you)
(also note the use of calloc for the original allocation. calloc allocates and sets the new memory to all zero, which can help ensure you don't forget to nul-terminate your string, but note any memory added by realloc will not be zeroed unless you manually zero it with memset or the like. The code above manually terminates the new string after each copy, so you can use either malloc or calloc for the allocation)
Example Use/Output
First example:
$ ./bin/str_substr_replace2 "^Hello world^" "^" "<s>" "</s>"
oldstr: ^Hello world^
newstr: <s>Hello world</s>
Second example:
$ ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
==8962== Memcheck, a memory error detector
==8962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8962== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8962== Command: ./bin/str_substr_replace2 This\ is\ ~my\ house~\ bud ~ \<b\> \</b\>
==8962==
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
==8962==
==8962== HEAP SUMMARY:
==8962== in use at exit: 0 bytes in 0 blocks
==8962== total heap usage: 1 allocs, 1 frees, 44 bytes allocated
==8962==
==8962== All heap blocks were freed -- no leaks are possible
==8962==
==8962== For counts of detected and suppressed errors, rerun with: -v
==8962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have any further questions.
I am trying to create a dynamic character "string pointer"/array and my code will not print values if the characters typed in exceed 249 characters. I am just wondering if there is a maximum return length for a character array/"string pointer".
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *input() {
char *inpt;
char check;
int i;
int count;
i = 0;
count = 1;
check = '\0';
inpt = (char*)malloc(1);
while (check != '\n') {
check = getchar();
if (check == '\n') {
break;
} else {
inpt[i] = check;
i++;
count++;
inpt = realloc(inpt, count);
}
}
inpt[i] = '\0';
char *retrn;
retrn = inpt;
free(inpt);
printf("%d \n", i);
return retrn;
}
int main(int argc, char **argv) {
char *name;
printf("Please print name: \n");
name = input();
printf("%s is the name \n", name);
return 0;
}
The problem is not with the length of the string you attempt to return, but that you return a pointer to memory that no longer is allocated to you:
char *retrn;
retrn = inpt;
free(inpt);
return retrn;
When you do retrn = inpt you don't copy the memory; instead you have two pointers pointing to the same memory. Then you free that memory and return a pointer to the newly freed memory. That pointer of course cannot be dereferenced, and any attempt to do so leads to undefined behavior.
The solution is not any temporary variable like retrn, but to simply not free the memory in the input function. Instead return inpt and in the calling function (main in your case) you free the memory.
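The ownership rule can be shown with a small, unrelated sketch: the function allocates and returns the block without freeing it, and the caller frees it when done. The function name join_name is hypothetical and is not part of the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Build "first last" in newly allocated memory.
 * Returns NULL on allocation failure. The caller owns the result
 * and must pass it to free() when it is no longer needed. */
char *join_name (const char *first, const char *last)
{
    size_t flen = strlen (first), llen = strlen (last);
    char *full = malloc (flen + 1 + llen + 1);   /* space + nul */
    if (!full)
        return NULL;
    memcpy (full, first, flen);
    full[flen] = ' ';
    memcpy (full + flen + 1, last, llen + 1);    /* copies the nul too */
    return full;        /* NOT freed here: ownership passes to caller */
}
```

In the caller this becomes char *n = join_name ("George", "Butte"); followed by use of n and finally free (n);.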
Continuing from my comment, there are a number of schemes to allocate memory dynamically. One thing you want to avoid from an efficiency standpoint is needlessly reallocating for every character. Rather than call realloc for every character added to name, allocate a reasonable number of characters to hold name, and if you reach that amount, then reallocate, doubling the current allocation size, update your variable holding the current size and keep going.
You already have an array index, so there is no need to keep a separate count. Just use your array index as the counter, ensuring you have at least index + 1 characters available to provide space to nul-terminate inpt.
There is no need to keep separate pointers in input(). Just allocate for inpt and return inpt as the pointer to your block of memory when done. (Don't forget to free (name); in main(), which will free the memory you allocated in input.)
Never realloc the pointer directly. (e.g. DON'T inpt = realloc (inpt, size);) If realloc fails it returns NULL, causing the loss of the pointer to the allocated block inpt referenced prior to the realloc call. Instead use a temporary pointer, validate that realloc succeeded, and then assign the new block to inpt (example below).
Putting it all together, you could do something similar to:
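The temporary-pointer rule can be isolated in a pair of small helpers. This is only a sketch with hypothetical names (grow_buffer, append_char); it assumes the buffer starts as NULL with a capacity of 0 or as a block previously returned by malloc.

```c
#include <stdlib.h>

/* Double the capacity of *buf (or set it to 1 if it was 0).
 * Returns 1 on success. On failure returns 0 and leaves *buf and *cap
 * unchanged, so the existing block is neither lost nor leaked. */
int grow_buffer (char **buf, size_t *cap)
{
    size_t newcap = *cap ? *cap * 2 : 1;
    void *tmp = realloc (*buf, newcap);   /* never assign to *buf directly */
    if (!tmp)
        return 0;
    *buf = tmp;
    *cap = newcap;
    return 1;
}

/* Append c to the string in *buf, keeping it nul-terminated.
 * *len is the current length (excluding the nul). Returns 1 on success. */
int append_char (char **buf, size_t *cap, size_t *len, char c)
{
    while (*len + 2 > *cap)               /* room for c and the nul */
        if (!grow_buffer (buf, cap))
            return 0;
    (*buf)[(*len)++] = c;
    (*buf)[*len] = '\0';
    return 1;
}
```

Because a failed call changes nothing, the caller can still print or free whatever was collected before memory ran out.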
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MEMSZ 32 /* initial allocation size (must be at least 1) */
char *input (void)
{
char *inpt = NULL;
int check; /* int, not char, so EOF can be detected reliably */
size_t mem = MEMSZ, ndx = 0;
if (!(inpt = malloc (mem))) { /* allocate/validate mem chars */
fprintf (stderr, "input() error: virtual memory exhausted.\n");
return NULL;
}
/* you must check for EOF in addition to '\n' */
while ((check = getchar()) && check != '\n' && check != EOF)
{ /* check index + 1 to ensure space to nul-terminate */
if (ndx + 1 == mem) { /* if mem limit reached realloc */
void *tmp = realloc (inpt, mem * 2); /* use tmp ptr */
if (!tmp) { /* validate reallocation */
fprintf (stderr, "realloc(): memory exhausted.\n");
break; /* on failure, preserve existing chars */
}
inpt = tmp; /* assign new block of memory to inpt */
mem *= 2; /* set mem to new allocation size */
}
inpt[ndx++] = check; /* assign, increment index */
}
inpt[ndx] = 0; /* nul-terminate */
return inpt; /* return pointer to allocated block */
}
int main (void)
{
char *name = NULL;
printf ("Please enter name: ");
if (!(name = input())) /* validate input() succeeded */
return 1;
printf ("You entered : %s\n", name);
free (name); /* don't forget to free name */
return 0;
}
Example Use/Output
$ ./bin/entername
Please enter name: George Charles Butte
You entered : George Charles Butte
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to write beyond/outside the bounds of your allocated block of memory, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/entername
==2566== Memcheck, a memory error detector
==2566== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2566== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2566== Command: ./bin/entername
==2566==
Please enter name: George Charles Butte
You entered : George Charles Butte
==2566==
==2566== HEAP SUMMARY:
==2566== in use at exit: 0 bytes in 0 blocks
==2566== total heap usage: 1 allocs, 1 frees, 32 bytes allocated
==2566==
==2566== All heap blocks were freed -- no leaks are possible
==2566==
==2566== For counts of detected and suppressed errors, rerun with: -v
==2566== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Let me know if you have any additional questions.
It's most likely due to using freed memory. Your assignment of inpt to retrn is not creating another copy. You'll get undefined behaviour, perhaps including what you are experiencing.
Is there a maximum return length for a character array in C
The maximum size of a character array is SIZE_MAX. SIZE_MAX is at least 65535.
I am just wondering if there is a maximum return length for a character array/"string pointer".
For a string, its maximum size is SIZE_MAX and the maximum length is SIZE_MAX - 1.
There are intrinsic limits for the size of an array:
available memory is limited: malloc() and realloc() may return NULL if the request cannot be honored due to lack of core memory. You should definitely check for malloc() and realloc() success.
system quotas may limit the amount of memory available to your process to a lower number than actual physical memory installed or virtual memory accessible in the system.
the maximum size for an array is the maximum value for the type size_t: SIZE_MAX which has a minimum value of 65535, but you use type int for your requests to malloc() or realloc(), that may have a smaller range than size_t. Type int is 32 bits on most current desktop systems where size_t may be 64 bits and available memory may be much more than 2GB. Use size_t instead of int.
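As a small sketch of the size_t point, the hypothetical helper below takes its sizes as size_t and refuses any request whose byte count would not fit in a size_t, instead of silently wrapping around.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n elements of 'size' bytes each.
 * Returns NULL if n * size would overflow size_t or if malloc fails. */
void *alloc_array (size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)   /* n * size would overflow */
        return NULL;
    return malloc (n * size);
}
```

calloc performs an equivalent overflow check in conforming implementations, so it is a reasonable alternative when zeroed memory is acceptable.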
Note however that your problem comes from a much simpler bug: you free the memory block you allocated for the string and return a copy of the pointer, which now points to freed memory. Accessing this memory has undefined behavior, which can be anything, including apparent correct behavior up to 249 bytes and failure beyond.
Note also that you should use type int for check and compare the return value of getchar() to EOF to avoid an endless loop if the input does not contain a newline (such as an empty file).
Here is a corrected version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *input(void) {
char *p = malloc(1); /* simplistic reallocation, 1 byte at a time */
size_t i = 0; /* use size_t for very large input */
int c; /* use int to detect EOF reliably */
if (p == NULL) {
return NULL; /* allocation error */
}
while ((c = getchar()) != EOF && c != '\n') {
char *newp = realloc(p, i + 2);
if (newp == NULL) {
free(p); /* avoid a memory leak */
return NULL; /* reallocation error */
}
p = newp;
p[i++] = c;
}
if (i == 0 && c == EOF) {
free(p);
return NULL; /* end of file */
}
p[i] = '\0';
return p;
}
int main(int argc, char **argv) {
char *name;
printf("Please print name: ");
name = input();
if (name == NULL) {
printf("input() returned NULL\n");
} else {
printf("%s is the name\n", name);
free(name);
}
return 0;
}
I have to get names with a known number of names from input as one string each separated by a space, I have to dynamically allocate memory for an array of strings where each string gets a name,
char** names;
char ch;
names = malloc(N*sizeof(char*); /*N is defined*/
for(i=0; i<N; i++) {
Now I have to allocate for each string without using a defined number:
i=0, j=0;
while ((ch=getchar) != '\n') {
while (ch != ' ') {
names[i][j++] = ch;
}
if (ch == ' ') {
names[i][j] = '\0';
i++}}
if (ch == '\n')
names[i][j] = '\0';
This is the classic question of how to handle dynamic allocation and reallocation to store an unknown number of strings (with the twist of separating each string into individual tokens before saving them to the array). It is worth understanding this process in detail as it will serve as the basis for just about any other circumstance where you are reading an unknown number of values (whether they are structs, floats, characters, etc.).
There are a number of different types of data structures you can employ, lists, trees, etc., but the basic approach is by creating an array of pointer-to-pointer-to-type (with type being char in this case) and then allocating space for, filling with data, and assigning the starting address for the new block of memory to each pointer as your data is read. The short-hand for pointer-to-pointer-to-type is simply double-pointer (e.g. char **array;, which is technically a pointer-to-pointer-to-char or pointer-to-char* if you like)
The general, and efficient, approach to allocating memory for an unknown number of lines is to first allocate a reasonably anticipated number of pointers (1 for each anticipated token). This is much more efficient than calling realloc and reallocating the entire collection for every token you add to your array. Here, you simply keep a counter of the number of tokens added to your array, and when you reach your original allocation limit, you simply reallocate twice the number of pointers you currently have. Note, you are free to add any incremental amount you choose. You can simply add a fixed amount each time, or you can use some scaled multiple of the original; it's up to you. The realloc to twice the current is just one of the standard schemes.
What is "a reasonably anticipated number of pointers?" It's no precise number. You simply want to take an educated guess at the number of tokens you roughly expect and use that as an initial number for allocating pointers. You wouldn't want to allocate 10,000 pointers if you only expect 100. That would be horribly wasteful. Reallocation will take care of any shortfall, so a rough guess is all that is needed. If you truly have no idea, then allocate some reasonable number, say 64 or 128, etc. You can simply declare the limit as a constant at the beginning of your code, so it is easily adjusted. e.g.:
#define MAXPTR 128
or accomplish the same thing using an anonymous enum
enum { MAXPTR = 128 };
When allocating your pointers originally, and as part of your reallocation, you can benefit by setting each pointer to NULL. This is easily accomplished for the original allocation. Simply use calloc instead of malloc. On reallocation, it requires that you set all new pointers allocated to NULL. The benefit it provides is that the first NULL acts as a sentinel indicating the point at which your valid pointers stop. As long as you ensure you have at least one NULL preserved as a sentinel, you can iterate without knowing the precise number of pointers filled. e.g.:
size_t i = 0;
while (array[i]) {
... do your stuff ...
i++; /* advance to the next pointer */
}
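As a concrete instance of the sentinel loop, here is a minimal sketch of a helper (the name count_until_null is hypothetical) that counts the filled pointers by walking the array until the first NULL. It relies on the assumption stated above that at least one NULL always follows the last valid pointer.

```c
#include <stddef.h>

/* Count the leading non-NULL pointers in array.
 * The array MUST contain a NULL sentinel after the last valid pointer,
 * otherwise the loop reads past the end of the allocation. */
size_t count_until_null (char **array)
{
    size_t i = 0;
    while (array[i])
        i++;
    return i;
}
```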
When you are done using the allocated memory, you want to ensure you free the memory. While in a simple piece of code the memory is freed on exit, get in the habit of tracking the memory you allocate and freeing it when it is no longer needed.
As for this particular task, you will want to read a line of an unknown number of characters into memory and then tokenize (separate) the string into tokens. getline will read and allocate memory sufficient to hold a line of any length. You can do the same thing with any of the other input functions; you just have to code the repeated checks and reallocations yourself. If getline is available, use it (it is a POSIX function, not part of standard C, so it is provided by glibc, the BSDs and macOS but not by every C library). Then it is just a matter of separating the input into tokens with strtok or strsep. You will then want to duplicate each token to preserve it in its own block of memory and assign the location to your array of tokens. The following provides a short example.
Included in the example are several helper functions for opening files, allocating and reallocating. All they do is simple error checking which help keep the main body of your code clean and readable. Look over the example and let me know if you have any questions.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXL 64 /* initial number of pointers */
/* simple helper/error check functions */
FILE *xfopen (const char *fn, const char *mode);
void *xcalloc (size_t n, size_t s);
void *xrealloc_dp (void *ptr, size_t *n);
int main (int argc, char **argv) {
char **array = NULL;
char *line = NULL;
size_t i, idx = 0, maxl = MAXL, n = 0;
ssize_t nchr = 0;
FILE *fp = argc > 1 ? xfopen (argv[1], "r") : stdin;
array = xcalloc (maxl, sizeof *array); /* allocate maxl pointers */
while ((nchr = getline (&line, &n, fp)) != -1)
{
while (nchr > 0 && (line[nchr-1] == '\r' || line[nchr-1] == '\n'))
line[--nchr] = 0; /* strip carriage return or newline */
char *p; /* pointer to use with strtok */
for (p = strtok (line, " \n"); p; p = strtok (NULL, " \n")) {
if (!(array[idx] = strdup (p))) { /* allocate & copy, check for failure */
fprintf (stderr, "error: strdup failed, virtual memory exhausted.\n");
exit (EXIT_FAILURE);
}
/* check limit reached - reallocate */
if (++idx == maxl) array = xrealloc_dp (array, &maxl);
}
}
free (line); /* free memory allocated by getline */
if (fp != stdin) fclose (fp);
for (i = 0; i < idx; i++) /* print all tokens */
printf (" array[%2zu] : %s\n", i, array[i]);
for (i = 0; i < idx; i++) /* free all memory */
free (array[i]);
free (array);
return 0;
}
/* fopen with error checking */
FILE *xfopen (const char *fn, const char *mode)
{
FILE *fp = fopen (fn, mode);
if (!fp) {
fprintf (stderr, "xfopen() error: file open failed '%s'.\n", fn);
// return NULL;
exit (EXIT_FAILURE);
}
return fp;
}
/* simple calloc with error checking */
void *xcalloc (size_t n, size_t s)
{
void *memptr = calloc (n, s);
if (memptr == 0) {
fprintf (stderr, "xcalloc() error: virtual memory exhausted.\n");
exit (EXIT_FAILURE);
}
return memptr;
}
/* realloc array of pointers ('ptr') to twice the current
 * number of pointers ('*n'). Note: 'n' is a pointer
 * to the current number so that its updated value is preserved.
 * no pointer size is required as it is known (simply the size
 * of a pointer).
 */
void *xrealloc_dp (void *ptr, size_t *n)
{
void **p = ptr;
void *tmp = realloc (p, 2 * *n * sizeof tmp);
if (!tmp) {
fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
exit (EXIT_FAILURE);
}
p = tmp;
memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
*n *= 2;
return p;
}
Input File
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Output
$ ./bin/getline_strtok <dat/captnjack.txt
array[ 0] : This
array[ 1] : is
array[ 2] : a
array[ 3] : tale
array[ 4] : Of
array[ 5] : Captain
array[ 6] : Jack
array[ 7] : Sparrow
array[ 8] : A
array[ 9] : Pirate
array[10] : So
array[11] : Brave
array[12] : On
array[13] : the
array[14] : Seven
array[15] : Seas.
Memory/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block of memory so that (2) it can be freed when it is no longer needed. It is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory and to confirm that you have freed all the memory you have allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform. They are all simple to use. Just run your program through it.
$ valgrind ./bin/getline_strtok <dat/captnjack.txt
==26284== Memcheck, a memory error detector
==26284== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26284== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26284== Command: ./bin/getline_strtok
==26284==
array[ 0] : This
array[ 1] : is
<snip>
array[14] : Seven
array[15] : Seas.
==26284==
==26284== HEAP SUMMARY:
==26284== in use at exit: 0 bytes in 0 blocks
==26284== total heap usage: 18 allocs, 18 frees, 708 bytes allocated
==26284==
==26284== All heap blocks were freed -- no leaks are possible
==26284==
==26284== For counts of detected and suppressed errors, rerun with: -v
==26284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
What you want to confirm each time is "All heap blocks were freed -- no leaks are possible" and "ERROR SUMMARY: 0 errors from 0 contexts".
How about growing the buffer gradually, for example, by doubling the size of buffer when the buffer becomes full?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *read_string(void) {
size_t allocated_size = 2;
size_t read_size = 0;
char *buf = malloc(allocated_size); /* allocate initial buffer */
if (buf == NULL) return NULL;
for(;;) {
/* read next character */
int input = getchar();
if (input == EOF || isspace(input)) break;
/* if there isn't enough buffer */
if (read_size >= allocated_size - 1) {
/* allocate new buffer */
char *new_buf = malloc(allocated_size *= 2);
if (new_buf == NULL) {
/* failed to allocate */
free(buf);
return NULL;
}
/* copy data read to new buffer */
memcpy(new_buf, buf, read_size);
/* free old buffer */
free(buf);
/* assign new buffer */
buf = new_buf;
}
buf[read_size++] = input;
}
buf[read_size] = '\0';
return buf;
}
int main(void) {
int N = 5;
int i;
char** names;
names = malloc(N*sizeof(char*));
if(names == NULL) return 1;
for(i=0; i<N; i++) {
names[i] = read_string();
}
for(i = 0; i < N; i++) {
puts(names[i] ? names[i] : "NULL");
free(names[i]);
}
free(names);
return 0;
}
Note: They say you shouldn't cast the result of malloc() in C.
For a known number of strings, you have allocated the char ** correctly:
char** names;
names = (char**) malloc(N*sizeof(char*));
Note, because the cast is not necessary in C, you could write it like this:
names = malloc(N*sizeof(char*));
For allocating memory as you read the file, for strings of unknown length, use the following approach:
1. allocate a buffer using [m][c]alloc of a known starting size (calloc is cleaner)
2. read into the buffer until you run out of space.
3. use realloc to increase the size of the buffer by some increment (double it)
4. repeat steps 2 and 3 until the file is read
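The steps above can be sketched roughly as follows. The function name read_all and the starting size of 64 bytes are assumptions made for illustration; the realloc goes through a temporary pointer so the original block can still be freed if the call fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* hypothetical helper: read everything from fp into one NUL-terminated
 * buffer. returns NULL on allocation failure; the caller frees the result.
 * if len is not NULL, it receives the number of bytes read. */
char *read_all (FILE *fp, size_t *len)
{
    size_t cap = 64, used = 0, n;           /* 64 is an arbitrary start size */
    char *buf = calloc (cap, 1);            /* step 1: initial buffer */

    if (!buf)
        return NULL;

    /* step 2: read into the free space, keeping one byte for '\0' */
    while ((n = fread (buf + used, 1, cap - used - 1, fp)) > 0) {
        used += n;
        if (used == cap - 1) {              /* step 3: buffer is full */
            void *tmp = realloc (buf, cap * 2);   /* temp pointer */
            if (!tmp) {
                free (buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }                                       /* step 4: loop until input ends */

    buf[used] = '\0';
    if (len)
        *len = used;

    return buf;
}
```

Doubling keeps the number of realloc calls proportional to the logarithm of the input size; a fixed increment would also work, at the cost of more calls on large inputs.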
Also, when working with a buffer of unknown length whose contents you would like pre-set to zero, consider using calloc() over malloc(). It is a cleaner option.
When you say,
char** names;
char ch;
names = malloc(N*sizeof(char*));
This creates names, a pointer-to-pointer-to-char that points to a block large enough to hold N addresses, one per string.
Ex: if you have 32 strings, then N is 32.
So, 32 * sizeof(char*)
and sizeof(char*) is 4 bytes on a typical 32-bit system (8 bytes on a 64-bit one)
Hence, 128 bytes will be allocated (256 on a 64-bit system)
After that you did this,
names[i][j++] = ch;
The above expression is wrong.
names[i] is still an uninitialized pointer at this point, so you are writing char data through an address that does not refer to any storage of yours.
You need to allocate a block of memory for each names[i] before storing characters in it.
Or you need to assign to each names[i] the address of a substring within the main string.
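A minimal sketch of the first option, assuming every name fits in len characters (alloc_names and free_names are hypothetical helper names, not taken from the question):

```c
#include <stdlib.h>

/* hypothetical helper: allocate n pointers, then a zeroed block of
 * len + 1 chars for each one. returns NULL on failure. */
char **alloc_names (size_t n, size_t len)
{
    size_t i;
    char **names = malloc (n * sizeof *names);  /* block of n pointers */

    if (!names)
        return NULL;

    for (i = 0; i < n; i++) {
        names[i] = calloc (len + 1, 1);         /* storage for string i */
        if (!names[i]) {                        /* undo earlier allocations */
            while (i--)
                free (names[i]);
            free (names);
            return NULL;
        }
    }

    return names;
}

/* hypothetical helper: free each string, then the pointer block */
void free_names (char **names, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        free (names[i]);
    free (names);
}
```

Once names[i] points at its own block, names[i][j++] = ch; is valid as long as j stays below len, which leaves room for the terminating '\0'.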
Use readline() (from the GNU Readline library) or getline() (POSIX) to acquire a pointer to an allocated block of memory that contains the data.
Then use something like sscanf() or strtok() to extract the individual name strings into members of an array.
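A rough sketch of the sscanf() route, assuming names are separated by whitespace and are shorter than 64 characters (split_names and NAME_LEN are hypothetical names). The %n conversion reports how many characters the call consumed, which lets you advance through the line one name at a time.

```c
#include <stdio.h>

#define NAME_LEN 64   /* assumed maximum name length, including '\0' */

/* hypothetical helper: copy up to max whitespace-separated names from
 * line into names[][NAME_LEN]; returns the number of names stored.
 * a name longer than 63 characters would be split into pieces. */
size_t split_names (const char *line, char names[][NAME_LEN], size_t max)
{
    size_t count = 0;
    int used = 0;     /* characters consumed by each sscanf call */

    while (count < max &&
           sscanf (line, "%63s%n", names[count], &used) == 1) {
        line += used; /* advance past the name just read */
        count++;
    }

    return count;
}
```

Here line would be the buffer filled by getline(). Unlike strtok(), this leaves the original line unmodified, at the cost of a fixed upper bound on the length of each name.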