I have to read a known number of names from input as one string, with the names separated by spaces, and I have to dynamically allocate memory for an array of strings where each string holds one name:
char** names;
char ch;
names = malloc(N*sizeof(char*)); /*N is defined*/
for(i=0; i<N; i++) {
Now I have to allocate memory for each string without using a fixed size:
i=0, j=0;
while ((ch=getchar()) != '\n') {
while (ch != ' ') {
names[i][j++] = ch;
}
if (ch == ' ') {
names[i][j] = '\0';
i++}}
if (ch == '\n')
names[i][j] = '\0';
This is the classic question of how to handle dynamic allocation and reallocation to store an unknown number of strings (with the twist that each line must be separated into individual tokens before saving them to the array). It is worth understanding this process in detail, as it will serve as the basis for just about any other circumstance where you are reading an unknown number of values (whether they are structs, floats, characters, etc.).
There are a number of different data structures you can employ (lists, trees, etc.), but the basic approach is to create a pointer-to-pointer-to-type (with type being char in this case) that holds the address of an allocated block of pointers. As your data is read, you allocate space for each item, fill it with data, and assign the starting address of that new block of memory to the next available pointer. The short-hand for pointer-to-pointer-to-type is simply double-pointer (e.g. char **array;, which is technically a pointer-to-pointer-to-char, or pointer-to-char* if you like).
The general, and efficient, approach to allocating memory for an unknown number of tokens is to first allocate a reasonably anticipated number of pointers (1 for each anticipated token). This is much more efficient than calling realloc and reallocating the entire collection for every token you add to your array. You simply keep a counter of the number of tokens added to your array, and when you reach your current allocation limit, you reallocate twice the number of pointers you currently have. Note, you are free to grow by any amount you choose: you can add a fixed amount each time, or use some scaled multiple of the original. Reallocating to twice the current size is just one of the standard schemes.
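A minimal sketch of the doubling scheme just described, written as a hypothetical helper grow_ptrs() (the name and its return convention are assumptions, not part of the answer's code below). It reallocates through a temporary pointer, so the old block is not lost if realloc fails, and it sets each new slot to NULL so a NULL sentinel keeps working:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: double the number of pointers in *arr.
 * On success *arr and *cap are updated, the new slots are set to NULL,
 * and 0 is returned. On failure *arr is left untouched (still valid and
 * still owned by the caller) and -1 is returned. */
int grow_ptrs (char ***arr, size_t *cap)
{
    assert (arr && cap && *cap > 0);

    /* refuse sizes whose byte count would overflow size_t */
    if (*cap > SIZE_MAX / 2 / sizeof **arr)
        return -1;

    size_t newcap = *cap * 2;
    /* realloc through a temporary so a failure does not lose the old block */
    void *tmp = realloc (*arr, newcap * sizeof **arr);
    if (!tmp)
        return -1;

    *arr = tmp;
    /* realloc leaves the added memory uninitialized: NULL each new slot */
    for (size_t i = *cap; i < newcap; i++)
        (*arr)[i] = NULL;
    *cap = newcap;
    return 0;
}
```

A caller would keep its count and cap side by side and call grow_ptrs (&array, &cap) whenever count == cap, before storing the next token.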
What is "a reasonably anticipated number of pointers"? It is not a precise number. You simply want to take an educated guess at the number of tokens you roughly expect and use that as the initial number of pointers to allocate. You wouldn't want to allocate 10,000 pointers if you only expect 100; that would be horribly wasteful. Reallocation will take care of any shortfall, so a rough guess is all that is needed. If you truly have no idea, allocate some reasonable number, say 64 or 128. You can declare the limit as a constant at the beginning of your code so it is easily adjusted, e.g.:
#define MAXPTR 128
or accomplish the same thing using an anonymous enum:
enum { MAXPTR = 128 };
When allocating your pointers originally, and as part of each reallocation, you can benefit by setting every pointer to NULL. For the original allocation this is easy: simply use calloc instead of malloc. On reallocation, realloc does not initialize the added memory, so you must set all of the newly allocated pointers to NULL yourself. The benefit is that the first NULL acts as a sentinel marking the point where your valid pointers stop. As long as you ensure at least one NULL is preserved as a sentinel, you can iterate without knowing the precise number of pointers filled, e.g.:
size_t i = 0;
while (array[i]) {
    ... do your stuff ...
    i++;
}
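A self-contained version of the sentinel loop, as a hypothetical function print_until_null(), assuming the array was allocated with calloc (or NULL-filled after each realloc) so at least one NULL follows the last valid pointer:

```c
#include <assert.h>
#include <stdio.h>

/* Walk a NULL-terminated array of strings, printing each one, and return
 * how many valid (non-NULL) pointers precede the sentinel. Assumes the
 * caller guaranteed at least one NULL slot after the last valid pointer. */
size_t print_until_null (char **array)
{
    assert (array != NULL);

    size_t i = 0;
    while (array[i]) {                /* first NULL marks the end */
        printf ("array[%2zu] : %s\n", i, array[i]);
        i++;                          /* without this the loop never ends */
    }
    return i;
}
```

The returned count is a convenient by-product: it is the number of filled pointers, recovered without a separate counter.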
When you are done using the allocated memory, make sure you free it. While in a simple program the memory is freed on exit anyway, get in the habit of tracking the memory you allocate and freeing it when it is no longer needed.
As for this particular task, you will want to read a line of an unknown number of characters into memory and then tokenize (separate) the string into tokens. getline will read the line and allocate memory sufficient to hold a string of any size. You can do the same thing with any of the other input functions; you just have to code the repeated checks and reallocations yourself. If getline is available (it is specified by POSIX, so glibc, the BSDs and macOS provide it, but MSVC on Windows does not), use it. Then it is just a matter of separating the input into tokens with strtok or strsep. You will then want to duplicate each token (e.g. with strdup) to preserve it in its own block of memory and assign that address to the next pointer in your array of tokens. The following provides a short example.
Included in the example are several helper functions for opening files, allocating and reallocating. All they do is simple error checking, which helps keep the main body of your code clean and readable. Look over the example and let me know if you have any questions.
#define _POSIX_C_SOURCE 200809L /* expose getline() and strdup() in strict ISO modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXL 64 /* initial number of pointers */
/* simple helper/error check functions */
FILE *xfopen (const char *fn, const char *mode);
void *xcalloc (size_t n, size_t s);
void *xrealloc_dp (void *ptr, size_t *n);
int main (int argc, char **argv) {
char **array = NULL;
char *line = NULL;
size_t i, idx = 0, maxl = MAXL, n = 0;
ssize_t nchr = 0;
FILE *fp = argc > 1 ? xfopen (argv[1], "r") : stdin;
array = xcalloc (maxl, sizeof *array); /* allocate maxl pointers */
while ((nchr = getline (&line, &n, fp)) != -1)
{
while (nchr > 0 && (line[nchr-1] == '\r' || line[nchr-1] == '\n'))
line[--nchr] = 0; /* strip carriage return or newline */
char *p = line; /* pointer to use with strtok */
for (p = strtok (line, " \n"); p; p = strtok (NULL, " \n")) {
if (!(array[idx] = strdup (p))) { /* allocate & copy, validate */
    perror ("strdup");
    exit (EXIT_FAILURE);
}
idx++;
/* check limit reached - reallocate */
if (idx == maxl) array = xrealloc_dp (array, &maxl);
}
}
free (line); /* free memory allocated by getline */
if (fp != stdin) fclose (fp);
for (i = 0; i < idx; i++) /* print all tokens */
printf (" array[%2zu] : %s\n", i, array[i]);
for (i = 0; i < idx; i++) /* free all memory */
free (array[i]);
free (array);
return 0;
}
/* fopen with error checking */
FILE *xfopen (const char *fn, const char *mode)
{
FILE *fp = fopen (fn, mode);
if (!fp) {
fprintf (stderr, "xfopen() error: file open failed '%s'.\n", fn);
// return NULL;
exit (EXIT_FAILURE);
}
return fp;
}
/* simple calloc with error checking */
void *xcalloc (size_t n, size_t s)
{
void *memptr = calloc (n, s);
if (memptr == 0) {
fprintf (stderr, "xcalloc() error: virtual memory exhausted.\n");
exit (EXIT_FAILURE);
}
return memptr;
}
/* realloc array of pointers ('ptr') to twice the current
 * number of pointers ('*n'). Note: 'n' is a pointer
 * to the current number so that its updated value is preserved.
 * No element size is required, as it is known (simply the size
 * of a pointer).
 */
void *xrealloc_dp (void *ptr, size_t *n)
{
void **p = ptr;
void *tmp = realloc (p, 2 * *n * sizeof tmp);
if (!tmp) {
fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
exit (EXIT_FAILURE);
}
p = tmp;
memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
*n *= 2;
return p;
}
Input File
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Output
$ ./bin/getline_strtok <dat/captnjack.txt
array[ 0] : This
array[ 1] : is
array[ 2] : a
array[ 3] : tale
array[ 4] : Of
array[ 5] : Captain
array[ 6] : Jack
array[ 7] : Sparrow
array[ 8] : A
array[ 9] : Pirate
array[10] : So
array[11] : Brave
array[12] : On
array[13] : the
array[14] : Seven
array[15] : Seas.
Memory/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed. It is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory and to confirm that you have freed all the memory you have allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform. They are all simple to use; just run your program through it.
$ valgrind ./bin/getline_strtok <dat/captnjack.txt
==26284== Memcheck, a memory error detector
==26284== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26284== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26284== Command: ./bin/getline_strtok
==26284==
array[ 0] : This
array[ 1] : is
<snip>
array[14] : Seven
array[15] : Seas.
==26284==
==26284== HEAP SUMMARY:
==26284== in use at exit: 0 bytes in 0 blocks
==26284== total heap usage: 18 allocs, 18 frees, 708 bytes allocated
==26284==
==26284== All heap blocks were freed -- no leaks are possible
==26284==
==26284== For counts of detected and suppressed errors, rerun with: -v
==26284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
What you want to confirm each time is "All heap blocks were freed -- no leaks are possible" and "ERROR SUMMARY: 0 errors from 0 contexts".
How about growing the buffer gradually, for example, by doubling the size of buffer when the buffer becomes full?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *read_string(void) {
size_t allocated_size = 2;
size_t read_size = 0;
char *buf = malloc(allocated_size); /* allocate initial buffer */
if (buf == NULL) return NULL;
for(;;) {
/* read next character */
int input = getchar();
if (input == EOF || isspace(input)) break;
/* if there isn't enough buffer */
if (read_size >= allocated_size - 1) {
/* allocate new buffer */
char *new_buf = malloc(allocated_size *= 2);
if (new_buf == NULL) {
/* failed to allocate */
free(buf);
return NULL;
}
/* copy data read to new buffer */
memcpy(new_buf, buf, read_size);
/* free old buffer */
free(buf);
/* assign new buffer */
buf = new_buf;
}
buf[read_size++] = input;
}
buf[read_size] = '\0';
return buf;
}
int main(void) {
int N = 5;
int i;
char** names;
names = malloc(N*sizeof(char*));
if(names == NULL) return 1;
for(i=0; i<N; i++) {
names[i] = read_string();
}
for(i = 0; i < N; i++) {
puts(names[i] ? names[i] : "NULL");
free(names[i]);
}
free(names);
return 0;
}
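Note: in C you shouldn't cast the result of malloc(): a void * converts to any object pointer type implicitly, and in C89 the cast could hide a missing #include <stdlib.h>.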
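For comparison, a sketch of the same idea using realloc(), which may be able to extend the block in place and does the copy for you when it cannot. read_word() is a hypothetical variant of read_string() that takes the stream as a parameter (an assumption made so it is not tied to stdin):

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of read_string(): reads characters from fp up to
 * the next whitespace or EOF, growing the buffer with realloc() instead
 * of malloc() + memcpy() + free(). Returns a nul-terminated string the
 * caller must free, or NULL if an allocation fails. */
char *read_word (FILE *fp)
{
    assert (fp != NULL);

    size_t cap = 2, len = 0;
    char *buf = malloc (cap);
    if (!buf)
        return NULL;

    int c;                                   /* int, so EOF is representable */
    while ((c = fgetc (fp)) != EOF && !isspace (c)) {
        if (len + 1 >= cap) {                /* keep room for the '\0' */
            void *tmp = realloc (buf, cap * 2);
            if (!tmp) {                      /* old buffer is still valid */
                free (buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char) c;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_word (stdin) N times behaves like the read_string() loop in main() above; as with read_string(), consecutive whitespace characters produce empty strings.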
For a known number of strings, you have allocated the char ** correctly:
char** names;
names = (char**) malloc(N*sizeof(char*));
Note, because the cast is not necessary in C, you could write it like this:
names = malloc(N*sizeof(char*));
For allocating memory as you read the file, for strings of unknown length, use the following approach:
1. Allocate a buffer of a known starting size using malloc() or calloc() (calloc() is cleaner).
2. Read into the buffer until you run out of space.
3. Use realloc() to increase the size of the buffer by some increment (e.g. double it).
4. Repeat steps 2 and 3 until the file is read.
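A minimal sketch of those steps applied to a whole file, assuming a hypothetical helper slurp() that uses fread() for step 2 (the starting size of 64 bytes is an arbitrary choice):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp into one heap buffer. Returns a
 * nul-terminated buffer the caller must free (byte count in *nbytes),
 * or NULL on allocation or read error. */
char *slurp (FILE *fp, size_t *nbytes)
{
    assert (fp && nbytes);

    size_t cap = 64, len = 0;
    char *buf = calloc (cap, 1);              /* step 1: initial buffer */
    if (!buf)
        return NULL;

    for (;;) {
        /* step 2: read into the free space, leaving 1 byte for '\0' */
        len += fread (buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)                    /* short read: EOF or error */
            break;
        /* step 3: buffer full, double it via a temporary pointer */
        void *tmp = realloc (buf, cap * 2);
        if (!tmp) {
            free (buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }                                         /* step 4: loop until EOF */
    if (ferror (fp)) {
        free (buf);
        return NULL;
    }
    buf[len] = '\0';
    *nbytes = len;
    return buf;
}
```

The byte count is reported separately because a binary file may itself contain '\0' bytes, so strlen() on the result is not reliable in general.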
Also, when working with buffers of unknown length whose contents you would like pre-set (zeroed), consider using calloc() over malloc(). It is a cleaner option.
When you say,
char** names;
char ch;
names = malloc(N*sizeof(char*));
You created a names variable, a double pointer capable of storing the addresses of N strings.
For example, if you have 32 strings, then N is 32,
so 32 * sizeof(char*) bytes are requested.
sizeof(char*) is 4 bytes on a typical 32-bit system (8 bytes on most 64-bit systems),
hence 128 bytes (or 256 bytes) will be allocated.
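The same arithmetic in code, as a hypothetical helper alloc_names(); computing the size from sizeof (char *) rather than hard-coding 4 keeps it correct on both 32-bit and 64-bit targets:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n pointers to char and report
 * how many bytes that took. sizeof (char *) is commonly 4 on 32-bit
 * targets and 8 on 64-bit targets, so 32 pointers need 128 or 256 bytes.
 * Note that only the pointers are allocated, not the strings. */
char **alloc_names (size_t n, size_t *bytes)
{
    assert (bytes != NULL);

    *bytes = n * sizeof (char *);
    char **names = malloc (*bytes);  /* no cast needed in C */
    return names;                    /* caller checks for NULL and frees */
}
```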
After that you did this,
names[i][j++] = ch;
The above expression is the wrong way to use it,
because names[i] has never been assigned: it is an uninitialized pointer, so names[i][j++] = ch writes a char through a garbage address.
You need to allocate memory for each names[i] before writing to it,
or assign to each names[i] the address of a substring within the main string.
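A sketch of the second option (pointing each names[i] at a substring of the main string), assuming the whole line is already in a modifiable char array; split_names() is a hypothetical name. strtok() replaces each separator with '\0', so no per-name allocation is needed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a modifiable string in place: strtok() overwrites each separator
 * with '\0' and names[i] is set to the start of each word. No memory is
 * allocated per name, so the pointers are only valid while 'line' exists
 * and is not modified. Returns the number of names stored (at most max). */
size_t split_names (char *line, char **names, size_t max)
{
    assert (line && names);

    size_t n = 0;
    for (char *p = strtok (line, " \n"); p && n < max; p = strtok (NULL, " \n"))
        names[n++] = p;
    return n;
}
```

The trade-off is lifetime: if the line buffer is reused for the next read, every names[i] changes with it, which is why the other answers copy each token into its own allocated block.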
Use readline() (from the GNU Readline library) or POSIX getline() to acquire a pointer to a memory allocation that contains the data.
Then use something like sscanf() or strtok() to extract the individual name strings into members of an array.
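A sketch of the sscanf() route, assuming the line has already been read (e.g. with getline()); scan_names() and the 63-character name limit are hypothetical choices. The %n conversion reports how many characters were consumed, so the pointer can be advanced past each name:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAMELEN 64   /* hypothetical longest name accepted, incl. '\0' */

/* Walk 'line' with sscanf(), using %n to learn how many characters each
 * conversion consumed, and store a heap copy of every name in names[].
 * Returns the number stored (at most max); each names[i] must be freed. */
size_t scan_names (const char *line, char **names, size_t max)
{
    assert (line && names);

    char word[NAMELEN];
    int used;
    size_t n = 0;

    /* field width 63 must stay NAMELEN - 1 */
    while (n < max && sscanf (line, "%63s%n", word, &used) == 1) {
        size_t len = strlen (word);
        if (!(names[n] = malloc (len + 1)))
            break;                        /* keep what was stored so far */
        memcpy (names[n++], word, len + 1);
        line += used;                     /* advance past this name */
    }
    return n;
}
```

A name longer than 63 characters would be split across two entries; adjust NAMELEN and the field width together if that matters.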
Related
The program successfully prints the text file, however I want to store what is in the text file into an array. I have looked in a lot of places, but I am not really understanding the information I have come across. Is there any way I can get some guidance?
#include <stdio.h>
#include <stdlib.h>
int main()
{
// OPENS THE FILE
FILE *fp = fopen("/classes/cs3304/cs330432/Programs/StringerTest/people.txt", "r");
size_t len = 1000;
char *word = malloc(sizeof(char) * len);
// CHECKS IF THE FILE EXISTS, IF IT DOESN'T IT WILL PRINT OUT A STATEMENT SAYING SO
if (fp == NULL)
{
printf("file not found");
return 0;
}
while(fgets(word, len, fp) != NULL)
{
printf("%s", word);
}
free(word);
}
The text file has the following in it (just a list of words):
endorse
vertical
glove
legend
scenario
kinship
volunteer
scrap
range
elect
release
sweet
company
solve
elapse
arrest
witch
invasion
disclose
professor
plaintiff
definition
bow
chauvinist
Let's see if we can't get you straightened out. First, you are thinking in the right direction, and you should be commended for using fgets() to read each line into a fixed buffer (character array). Next, you need to collect and store all of the lines so that they are available for use by your program -- that appears to be where the wheels fell off.
Basic Outline of Approach
In an overview, when you want to handle an unlimited number of lines, you have two different types of blocks of memory you are going to allocate and manage. The first is a block of memory you allocate that will hold some number of pointers (one for each line you will store). It doesn't matter how many you initially allocate, because you will keep track of the number allocated (number available) and the number used. When (used == available) you will realloc() a bigger block of memory to hold more pointers and keep on going.
The second type of block of memory you will handle is the storage for each line. No mystery there. You will allocate storage for each character (+1 for the nul-terminating character) and copy the line from your fixed buffer to the allocated block.
The two blocks of memory work together, because to create your collection, you simply assign the address for the block of memory holding the line of data to the next available pointer.
Let's think through a short example where we declare char **lines; as the pointer to the block of memory holding pointers. Then say we allocate two-pointers initially, we have valid pointers available for lines[0] and lines[1]. We track the number of pointers available with nptrs and the number used with used. So initially nptrs = 2; and used = 0;.
When we read our first line with fgets(), we will trim the '\n' from the end of the string and then get the length of the string (len = strlen(buffer);). We can then allocate storage for the string assigning the address of the allocated block to our first pointer, e.g.
lines[used] = malloc (len + 1);
and then copy the contents of buffer to lines[0], e.g.
memcpy (lines[used], buffer, len + 1);
(note: there is no reason to call strcpy() and have it scan for end-of-string again, we already know how many characters to copy -- including the nul-terminating character)
Finally, all that is needed to keep our counters happy is to increment used by one. We store the next line the same way, and on the 3rd iteration used == nptrs so we realloc() more pointers (generally just doubling the number of pointers each time a realloc() is required). That is a good balance between calls to realloc() and growth of the number of pointers -- but you are free to increment the allocation any way you like -- but avoid calling realloc() for every line.
So you keep reading lines, checking if realloc() is required, reallocating if needed, and allocating for each line assigning the starting address to each of your pointers in turn. The only additional note is that when you realloc() you always use a temporary pointer so when realloc() fails and returns NULL, you do not overwrite your original pointer with NULL losing the starting address to the block of memory holding pointers -- creating a memory leak.
Implementation
The details were left out of the overview, so let's look at a short example to read an unknown number of lines from a file (each line being 1024 characters or less) and storing each line in a collection using a pointer-to-pointer to char as described above. Don't use Magic-Numbers in your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
Don't hardcode filenames in your code either; that is what argc and argv are for in int main (int argc, char **argv). Pass the filename to read as the first argument to the program (or read from stdin by default if no argument is given):
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
(note: you should not need to recompile your program just to read from a different filename)
Now allocate and validate your initial number of pointers:
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
Read each line, trim the '\n' from the end, and get the number of characters that remain after the '\n' has been removed (you can use strcspn() to do it all at once):
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
Next we check if a reallocation is needed and if so reallocate using a temporary pointer:
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
Now allocate and validate the storage for the line, copy the line to the new storage, and increment used when done -- completing your read-loop:
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
Now you can use the lines stored in lines as needed in your program, remembering to free the memory for each line when done and then finally freeing the block of pointers, e.g.
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
That's all that is needed. Now you can go read the 324,000 words in /usr/share/dict/words (or perhaps on your system /var/lib/dict/words depending on distro) and you will not have any problems doing so.
Input File
A short example file:
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/fgets_lines_dyn_simple dat/captnjack.txt
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156== Memcheck, a memory error detector
==8156== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8156== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8156== Command: ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156==
line[ 0] : This is a tale
line[ 1] : Of Captain Jack Sparrow
line[ 2] : A Pirate So Brave
line[ 3] : On the Seven Seas.
==8156==
==8156== HEAP SUMMARY:
==8156== in use at exit: 0 bytes in 0 blocks
==8156== total heap usage: 9 allocs, 9 frees, 5,796 bytes allocated
==8156==
==8156== All heap blocks were freed -- no leaks are possible
==8156==
==8156== For counts of detected and suppressed errors, rerun with: -v
==8156== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
The Full Code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NPTRS 2 /* initial no. of pointers to allocate (lines) */
int main (int argc, char **argv) {
char buf[MAXC], /* fixed buffer to read each line */
**lines = NULL; /* pointer to pointer to hold collection of lines */
size_t nptrs = NPTRS, /* number of pointers available */
used = 0; /* number of pointers used */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate/validate block holding initial nptrs pointers */
if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
perror ("malloc-lines");
exit (EXIT_FAILURE);
}
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
size_t len;
buf[(len = strcspn (buf, "\n"))] = 0; /* trim \n, save length */
if (used == nptrs) { /* check if realloc of lines needed */
/* always realloc using temporary pointer (doubling no. of pointers) */
void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
if (!tmp) { /* validate reallocation */
perror ("realloc-lines");
break; /* don't exit, lines still good */
}
lines = tmp; /* assign reallocated block to lines */
nptrs *= 2; /* update no. of pointers allocated */
/* (optionally) zero all newly allocated memory here */
}
/* allocate/validate storage for line */
if (!(lines[used] = malloc (len + 1))) {
perror ("malloc-lines[used]");
break;
}
memcpy (lines[used], buf, len + 1); /* copy line from buf to lines[used] */
used += 1; /* increment used pointer count */
}
/* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
/* use lines as needed (simply outputting here) */
for (size_t i = 0; i < used; i++) {
printf ("line[%3zu] : %s\n", i, lines[i]);
free (lines[i]); /* free line storage when done */
}
free (lines); /* free pointers when done */
}
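The full code leaves the "(optionally) realloc to 'used' pointers" step as a comment. A minimal sketch of that step, as a hypothetical helper shrink_to_fit() that could be called right after the read loop, e.g. lines = shrink_to_fit (lines, &nptrs, used);:

```c
#include <assert.h>
#include <stdlib.h>

/* Optional final step: shrink the block of pointers so it holds exactly
 * 'used' pointers. If realloc() fails (or used is 0) the original block
 * is kept, since it is still valid and merely larger than necessary. */
char **shrink_to_fit (char **lines, size_t *nptrs, size_t used)
{
    assert (nptrs && used <= *nptrs);

    if (used == 0 || used == *nptrs)
        return lines;                  /* nothing to trim */

    void *tmp = realloc (lines, used * sizeof *lines);
    if (!tmp)
        return lines;                  /* keep the larger, still-valid block */

    *nptrs = used;
    return tmp;
}
```

Skipping the shrink when used is 0 avoids realloc (ptr, 0), whose behavior is implementation-defined.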
Look things over and let me know if you have any questions. If you also wanted to read lines of unknown length (millions of characters long), you would simply loop doing the same thing allocating and reallocating for each line until the '\n' character was found (or EOF) marking the end of the line. It is no different in principle than what we have done above for the pointers.
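A sketch of that idea for a single line, assuming a hypothetical read_long_line() that grows its buffer with realloc() and keeps calling fgets() until the '\n' (or EOF) is seen. CHUNK is deliberately tiny here just so the growth path runs; a real program would start larger:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 8   /* deliberately small so the growth path is exercised */

/* Read one line of any length using fgets() chunks, doubling the buffer
 * with realloc() until the '\n' (or EOF) is found. The '\n' is removed.
 * Returns a heap string the caller must free, or NULL at EOF with nothing
 * read, on a read error, or on allocation failure. */
char *read_long_line (FILE *fp)
{
    assert (fp != NULL);

    size_t cap = CHUNK, len = 0;
    char *buf = malloc (cap);
    if (!buf)
        return NULL;

    while (fgets (buf + len, (int) (cap - len), fp)) {
        len += strlen (buf + len);
        if (len && buf[len - 1] == '\n') {   /* end of line found */
            buf[--len] = '\0';
            return buf;
        }
        if (len + 1 < cap)                   /* short read: EOF without '\n' */
            break;
        void *tmp = realloc (buf, cap * 2);  /* full: grow and keep reading */
        if (!tmp) {
            free (buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (len == 0 || ferror (fp)) {           /* nothing read, or read error */
        free (buf);
        return NULL;
    }
    return buf;                              /* last line had no '\n' */
}
```

Each returned line is already heap-allocated, so it could be assigned directly to lines[used] instead of being copied from a fixed buf.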
I'm trying to malloc an array of strings in a struct and it doesn't work well.
I also want to check the size of the arrays, but I don't get the right values.
What is wrong with my code?
By the way, I know I should free the allocated memory.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct{
char** string;
} Strt;
int main(int agrc, char *argv[]) {
int i;
int size1 = 0, size2 = 0;
char** arrr;
Strt* arr = malloc(sizeof(Strt));
printf("number of arrays: ");
scanf("%d", &size1);
printf("size of each array: ");
scanf("%d", &size2);
arr->string = (char**)malloc(size1 * sizeof(char));
printf("size of string: %d\n", sizeof(arr->string));
for(i = 0; i < size1; i++){
arr->string[i] = (char*)malloc(size2 * sizeof(char));
}
return 0;
}
It looks like you are wanting to allocate for an unknown number of strings. You are thinking correctly that you will want to use a pointer-to-pointer-to char (e.g. char **arr;), but the "train fell off the tracks" so to speak when you went to implement the logic.
Let's look at what you need to do to read and store an unknown number of strings (or any object for that matter). It is a two-step process where you:
allocate storage for pointers, one for each object you wish to store, and
allocate a block of storage for each object, copying the object to the newly allocated block and assigning the beginning address of that block to one of your pointers.
To grow the number of objects stored, you realloc() more pointers and keep going until you run out of objects to store.
When you realloc(), you always use a temporary pointer. Why? Because when (not if) realloc() fails, it returns NULL and if you are simply calling realloc() with the pointer itself, you overwrite the address to the current block of memory with NULL losing the original pointer that can now no longer be freed.
So instead of:
arr = realloc (arr, /* how big */);
You do:
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
Now if I understand correctly (though I am not 100% clear on it), you need to read and store an unknown number of strings, and for that you do not need a struct. Look at Strt -- it is a single-member struct. The struct serves no purpose there. You simply want a pointer to a block of pointers which you can grow, so char **arr; is all that is needed. Other than that, you need a counter to keep track of how many strings are stored, and for the actual read of each string, simply use a fixed size buffer with fgets() or use POSIX getline(). So your setup for beginning to read strings could be:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[MAXC], /* buffer to hold each line read */
**arr = NULL; /* pointer-to-pointer-to char */
size_t nstr = 0; /* counter for number of strings stored */
...
Above you will read all input into buf, then you can trim the '\n' included by fgets(), obtaining the length of the input so you can then allocate length + 1 characters of storage for the string (+1 for the nul-terminating character) assigning the beginning address of the new block to the next available pointer. You then copy from buf to your newly allocated block and increment your counter.
(A simple way to trim the '\n' and obtain the length of the input is with strcspn())
Putting those pieces together, your read, trim, reallocation, allocation of the new block and copy to it could be done with:
for (;;) { /* loop continually reading input */
size_t len; /* var to hold length of string after \n removal */
fputs ("enter string (EOF to quit): ", stdout); /* prompt */
if (!fgets (buf, MAXC, stdin)) { /* read line (or EOF) */
puts ("(all done)\n");
break;
}
buf[(len = strcspn (buf, "\r\n"))] = 0; /* trim \n, save len */
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
if (!(arr[nstr] = malloc (len + 1))) { /* allocate/validate block for string */
perror ("malloc-arr[nstr]");
break;
}
memcpy (arr[nstr++], buf, len + 1); /* copy string to allocated block */
}
(note: the manual EOF is generated by the user pressing Ctrl + d -- or Ctrl + z on Windows)
All that remains is outputting the stored string (or using them however you need) and then freeing the allocated memory:
for (size_t i = 0; i < nstr; i++) { /* loop outputting strings */
puts (arr[i]);
free (arr[i]); /* free string */
}
free (arr); /* free pointers */
}
If you put the whole program together you would have:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[MAXC], /* buffer to hold each line read */
**arr = NULL; /* pointer-to-pointer-to char */
size_t nstr = 0; /* counter for number of strings stored */
for (;;) { /* loop continually reading input */
size_t len; /* var to hold length of string after \n removal */
fputs ("enter string (EOF to quit): ", stdout); /* prompt */
if (!fgets (buf, MAXC, stdin)) { /* read line (or EOF) */
puts ("(all done)\n");
break;
}
buf[(len = strcspn (buf, "\r\n"))] = 0; /* trim \n, save len */
/* always realloc using temp pointer to avoid mem-leak on realloc failure. */
void *tmp = realloc (arr, (nstr + 1) * sizeof *arr); /* realloc pointers */
if (!tmp) { /* VALIDATE EVERY ALLOCATION */
perror ("realloc-tmp");
break;
}
arr = tmp; /* assign reallocated block of pointers to arr */
if (!(arr[nstr] = malloc (len + 1))) { /* allocate/validate block for string */
perror ("malloc-arr[nstr]");
break;
}
memcpy (arr[nstr++], buf, len + 1); /* copy string to allocated block */
}
for (size_t i = 0; i < nstr; i++) { /* loop outputting strings */
puts (arr[i]);
free (arr[i]); /* free string */
}
free (arr); /* free pointers */
}
(note: while you would normally want to allocate blocks of pointers, keeping track of the number available and the number used and only reallocating when used == available to minimize the number of reallocations needed, when taking user input there is no efficiency to be gained, due to the delay caused by the user pecking out the input. But know that you can optimize by minimizing the number of reallocations needed.)
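To put a number on that trade-off, a small sketch (reallocs_doubling() is a hypothetical helper) counting the realloc() calls the doubling scheme needs, compared with the one-realloc-per-string approach used above, which needs n calls for n strings:

```c
#include <assert.h>
#include <stddef.h>

/* Count how many realloc() calls are needed to hold n items when the
 * capacity starts at 'initial' and is doubled each time it is exhausted
 * (the initial malloc() itself is not counted). */
size_t reallocs_doubling (size_t n, size_t initial)
{
    assert (initial > 0);

    size_t cap = initial, calls = 0;
    while (cap < n) {
        cap *= 2;
        calls++;
    }
    return calls;
}
```

For example, reallocs_doubling (1000, 2) is 9 (2, 4, 8, ..., 1024), versus 1000 calls when growing by one pointer at a time.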
Example Use/Output
$ ./bin/allocate_p2p
enter string (EOF to quit): My dog
enter string (EOF to quit): has fleas...
enter string (EOF to quit): My cat
enter string (EOF to quit): has none...
enter string (EOF to quit): Lucky cat!
enter string (EOF to quit): (all done)
My dog
has fleas...
My cat
has none...
Lucky cat!
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/allocate_p2p
==16556== Memcheck, a memory error detector
==16556== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16556== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16556== Command: ./bin/allocate_p2p
==16556==
enter string (EOF to quit): My dog
enter string (EOF to quit): has fleas...
enter string (EOF to quit): My cat
enter string (EOF to quit): has none...
enter string (EOF to quit): Lucky cat!
enter string (EOF to quit): (all done)
My dog
has fleas...
My cat
has none...
Lucky cat!
==16556==
==16556== HEAP SUMMARY:
==16556== in use at exit: 0 bytes in 0 blocks
==16556== total heap usage: 12 allocs, 12 frees, 2,218 bytes allocated
==16556==
==16556== All heap blocks were freed -- no leaks are possible
==16556==
==16556== For counts of detected and suppressed errors, rerun with: -v
==16556== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if I read between the lines correctly as to what you needed, or if I missed. I'm happy to help further if what you need is slightly different. However, looking at what you were trying to do, this seemed like what you were aiming at. Let me know if you have further questions.
I have a question about reading a file character by character and counting it in C
Here's my code:
void read_in(char** quotes){
FILE *frp = fopen(IN_FILE, "r");
char c;
size_t tmp_len =0, i=0;
//char* tmp[100];
//char* quotes[MAX_QUOTES];
//char str = fgets(str, sizeof(quotes),frp);
while((c=fgetc(frp)) != EOF){
if(frp == NULL){
printf("File is empty!");
fclose(frp); exit(1);
}
else{
if(c != '\n'){
printf("%c",c);
c=fgetc(frp);
tmp_len++;
}
}
char* tmp = (char*)calloc(tmp_len+1, sizeof(char));
fgets(tmp, sizeof(tmp), frp);
strcpy((char*)quotes[i], tmp);
printf("%s\n", (char*)quotes[i]);
i++;
}
}
It doesn't work but I don't understand why.
Thank you
From your question and through the comments, it is relatively clear you want to read all quotes (lines) in a file into dynamically allocated storage (screen 1) and then sort the lines by length and output the first 5 shortest lines (screen 2) saving the 5 shortest lines to a second output file (this part is left to you). Reading and storing all lines from a file isn't difficult -- but it isn't trivial either. It sounds basic, and it is, but it requires that you use all of the basic tools needed to interface with persistent storage (reading the file from disk/storage media) and your computer's memory subsystem (RAM) -- correctly.
Reading each line from a file isn't difficult, but like anything in C, it requires you to pay attention to the details. You can read from a file using character-oriented input functions (fgetc(), getc(), etc.), formatted-input functions (fscanf()), or line-oriented input functions (fgets() or POSIX getline()). Reading lines from a file is generally done with line-oriented functions, but there is nothing wrong with using a character-oriented approach either. In fact, you can relatively easily write a function based around fgetc() that will read each line from a file for you.
In the trivial case where you know the maximum number of characters for the longest line in the file, you can use a 2D array of characters to store the entire file. This simplifies the process by eliminating the need to allocate storage dynamically, but has a number of disadvantages like each line in the file requiring the same storage as the longest line in the file, and by limiting the size of the file that can be stored to the size of your program stack. Allocating storage dynamically with (malloc, calloc, or realloc) eliminates these disadvantages and inefficiencies allowing you to store files up to the limit of the memory available on your computer. (there are methods that allow both to handle files of any size by using sliding-window techniques well beyond your needs here)
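For comparison, a minimal sketch of that trivial fixed-size case, assuming hypothetical limits MAXL (lines) and MAXC (characters per line) and a hypothetical read2d() helper. Declaring the array static in the caller keeps it off the stack, but every row still costs MAXC bytes and the limits are fixed at compile time:

```c
#include <stdio.h>
#include <string.h>

#define MAXL 64     /* hypothetical max no. of lines */
#define MAXC 256    /* hypothetical max chars per line (incl. '\n' and nul) */

/* read up to MAXL lines from 'fp' into the fixed 2D array 'a', removing
 * the trailing '\n'. returns the number of lines stored. lines that do not
 * fit in MAXC chars are split across rows by fgets (not handled here). */
size_t read2d (char a[][MAXC], FILE *fp)
{
    size_t n = 0;
    while (n < MAXL && fgets (a[n], MAXC, fp)) {
        a[n][strcspn (a[n], "\n")] = 0;     /* trim trailing newline */
        n++;
    }
    return n;
}
```

A caller would declare static char lines[MAXL][MAXC]; and call read2d (lines, fp).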
There is nothing difficult about handling dynamically allocated memory, or in copying or storing data within it on a character-by-character basis. That said, the responsibility for each allocation, tracking the amount of data written to each allocated block, reallocating to resize the block to ensure no data is written outside the bounds of each block and then freeing each allocated block when it is no longer needed -- is yours as the programmer. C gives the programmer the power to use each byte of memory available, and also places on the programmer the responsibility to use the memory correctly.
The basic approach to storing a file is simple. You read each line from the file, allocating/reallocating storage for each character until a '\n' or EOF is encountered. To coordinate all lines, you allocate a block of pointers, and you assign the address for each block of memory holding a line to a pointer, in sequence, reallocating the number of pointers required as needed to hold all lines.
Sometimes a picture really is worth 1000 words. With the basic approach you declare a pointer (to what?) a pointer so you can allocate a block of memory containing pointers to which you will assign each allocated line. For example, you could declare, char **lines; A pointer-to-pointer is a single pointer that points to a block of memory containing pointers. Then the type for each pointer for lines will be char * which will point to each block holding a line from the file, e.g.
char **lines;
       |
       |     allocated
       |      pointers    allocated blocks holding each line
     lines --> +----+     +-----+
               | p1 | --> | cat |
               +----+     +-----+--------------------------------------+
               | p2 | --> | Four score and seven years ago our fathers |
               +----+     +-------------+------------------------------+
               | p3 | --> | programming |
               +----+     +-------------------+
               | .. |     |        ...        |
               +----+     +-------------------+
               | pn | --> | last line read |
               +----+     +----------------+
You can make lines a bit more flexible to use by allocating 1 additional pointer and initializing that pointer to NULL which allows you to iterate over lines without knowing how many lines there are -- until NULL is encountered, e.g.
               | .. |     |        ...        |
               +----+     +-------------------+
               | pn | --> | last line read |
               +----+     +----------------+
               |pn+1|     | NULL |
               +----+     +------+
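With the NULL sentinel in place, code that receives lines can walk the pointers without a separate count. A minimal sketch, using a hypothetical total_chars() helper that sums the lengths of all stored lines:

```c
#include <stddef.h>
#include <string.h>

/* walk a NULL-terminated block of string pointers without a separate
 * count, returning the total number of characters in all strings */
size_t total_chars (char **lines)
{
    size_t total = 0;
    for (char **p = lines; *p; p++)     /* stop at the sentinel NULL */
        total += strlen (*p);
    return total;
}
```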
While you can put this all together in a single function, to help the learning process (and for practical reusability) it is often easier to break this up into two functions: one that reads and allocates storage for each line, and a second function that calls the first, allocating pointers and assigning the address of each allocated block holding a line read from the file to the next pointer in turn. When you are done, you have an allocated block of pointers where each pointer holds the address of (points to) an allocated block holding a line from the file.
You have indicated you want to read from the file with fgetc() one character at a time. There is nothing wrong with that, and there is little penalty to this approach, since the underlying stdio library provides a read buffer that you are actually reading from rather than reading from disk one character at a time. (the buffer size varies between C library implementations; stdio.h provides the BUFSIZ macro, the buffer size used by setbuf(), on both Linux and Windows compilers)
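Since fgetc() returns either a character (as an unsigned char converted to int) or EOF, the variable receiving it must be an int; the char c in the question cannot reliably tell EOF apart from a valid character. A minimal sketch of character-by-character counting, using a hypothetical count_lines_chars() helper:

```c
#include <stdio.h>

/* count lines and characters (excluding '\n') in 'fp' one char at a time.
 * 'c' must be an int: fgetc() returns each character as an unsigned char
 * converted to int, or EOF, and a plain char cannot reliably hold both. */
void count_lines_chars (FILE *fp, size_t *nlines, size_t *nchars)
{
    int c, last = '\n';     /* 'last' detects a final line missing its '\n' */
    *nlines = *nchars = 0;
    while ((c = fgetc (fp)) != EOF) {
        if (c == '\n')
            (*nlines)++;
        else
            (*nchars)++;
        last = c;
    }
    if (last != '\n')       /* count final line without trailing newline */
        (*nlines)++;
}
```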
There are virtually an unlimited number of ways to write a function that allocates storage to hold a line and then reads a line from the file one character at-a-time until a '\n' or EOF is encountered. You can return a pointer to the allocated block holding the line and pass a pointer parameter to be updated with the number of characters contained in the line, or you can have the function return the line length and pass the address-of a pointer as a parameter to be allocated and filled within the function. It is up to you. One way would be:
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
...
/** read line from 'fp' stored in allocated block assigned to '*s' and
 *  return length of string stored on success. on EOF with no characters
 *  read, or on failure, return -1 and set *s to NULL. Block of memory sized
 *  to accommodate exact length of string with nul-terminating char. unless
 *  -1 returned, *s guaranteed to contain nul-terminated string (empty-string
 *  allowed). caller responsible for freeing allocated memory.
 */
ssize_t fgetcline (char **s, FILE *fp)
{
    int c;                              /* char read from fp (int to hold EOF) */
    size_t n = 0, size = LINSZ;         /* no. of chars and allocation size */
    void *tmp = realloc (NULL, size);   /* tmp pointer for realloc use */
    *s = NULL;                          /* *s stays NULL unless a line is stored */
    if (!tmp)                           /* validate every allocation/reallocation */
        return -1;
    *s = tmp;                           /* assign allocated block to pointer */
    while ((c = fgetc(fp)) != '\n' && c != EOF) {   /* read chars until \n or EOF */
        if (n + 1 == size) {            /* check if realloc required */
            /* realloc using temporary pointer */
            if (!(tmp = realloc (*s, size + LINSZ))) {
                free (*s);              /* on failure, free partial line */
                *s = NULL;              /* don't leave a dangling pointer */
                return -1;
            }
            *s = tmp;                   /* assign reallocated block to pointer */
            size += LINSZ;              /* update allocated size */
        }
        (*s)[n++] = c;                  /* assign char to index, increment */
    }
    (*s)[n] = 0;                        /* nul-terminate string */
    if (n == 0 && c == EOF) {           /* if nothing read and EOF, free mem return -1 */
        free (*s);
        *s = NULL;                      /* don't leave a dangling pointer */
        return -1;
    }
    if ((tmp = realloc (*s, n + 1)))    /* final realloc to exact length */
        *s = tmp;                       /* assign reallocated block to pointer */
    return (ssize_t)n;                  /* return length (excluding nul-terminating char) */
}
(note: ssize_t is a POSIX signed integer type, the signed counterpart of size_t, used here so the function can return -1. It is provided in the sys/types.h header. You can adjust the type as desired.)
The fgetcline() function makes one final call to realloc to shrink the allocation to the exact number of characters needed to hold the line plus the nul-terminating character.
The function called to read all lines in the file, allocating and reallocating pointers as required, does essentially the same thing for pointers that fgetcline() above does for characters. It allocates some initial number of pointers and then begins reading lines from the file, reallocating twice the number of pointers each time more are needed. It also keeps one additional pointer set to NULL as a sentinel that allows iterating over all pointers until NULL is reached (this is optional). The parameter n is updated with the number of lines stored to make that count available back in the calling function. This function too can be written in a number of different ways; one would be:
/** read each line from 'fp' and store in allocated block, returning pointer
 *  to allocated block of pointers to each stored line with the final pointer
 *  after the last stored string set to NULL as a sentinel. '*n' must be 0 on
 *  entry and is updated to the number of allocated and stored lines
 *  (excluding the sentinel NULL). returns valid pointer on success, NULL if
 *  the initial allocation fails. caller is responsible for freeing both
 *  allocated lines and pointers.
 */
char **readfile (FILE *fp, size_t *n)
{
    size_t nptrs = LINSZ;           /* no. of allocated pointers */
    char **lines = malloc (nptrs * sizeof *lines);  /* allocated block of pointers */
    void *tmp = NULL;               /* temp pointer for realloc use */
    if (!lines) {                   /* validate initial allocation */
        perror ("malloc-lines");
        return NULL;
    }
    /* read each line from 'fp' into allocated block, assign to next pointer */
    while (fgetcline (&lines[*n], fp) != -1) {
        lines[++(*n)] = NULL;       /* set next pointer NULL as sentinel */
        if (*n + 1 >= nptrs) {      /* check if realloc required */
            /* realloc using temporary pointer to prevent memory leak on failure */
            if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
                perror ("realloc-lines");
                return lines;       /* return original pointer on failure */
            }
            lines = tmp;            /* assign reallocated block to pointer */
            nptrs *= 2;             /* update no. of pointers allocated */
        }
    }
    lines[*n] = NULL;               /* ensure sentinel is NULL after the failed read */
    /* final realloc sizing exact no. of pointers required */
    if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
        return lines;               /* return original block on failure */
    return tmp;                     /* return updated block of pointers on success */
}
Note that the function above takes an open FILE* parameter rather than a filename to open within the function. You generally want to open the file in the calling function and validate that it is open for reading before calling a function to read all the lines. If the file cannot be opened in the caller, there is no reason to make the function call to read the lines from the file to begin with.
With a way to read and store all lines from your file done, you next need to turn to sorting the lines by length so you can output the 5 shortest lines (quotes). Since you will normally want to preserve the lines from your file in order, the easiest way to sort the lines by length while preserving the original order is to make a copy of the pointers and sort the copy by line length. For example, your lines pointer can continue to hold the pointers in original order, while the set of pointers sortedlines holds the pointers in order sorted by line length, e.g.
int main (int argc, char **argv) {
char **lines = NULL, /* pointer to allocated block of pointers */
**sortedlines = NULL; /* copy of lines pointers to sort by length */
After reading the file and filling the lines pointer, you can copy the pointers to sortedlines (including the sentinel NULL), e.g.
/* allocate storage for copy of lines pointers (plus sentinel NULL) */
if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
perror ("malloc-sortedlines");
return 1;
}
/* copy pointers from lines to sorted lines (plus sentinel NULL) */
memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
Then you simply call qsort to sort the pointers in sortedlines by length. Your only job with qsort is to write the compare function. The prototype for the compare function is:
int compare (const void *a, const void *b);
Both a and b will be pointers to the elements being sorted. In your case with char **sortedlines;, the elements are pointer-to-char, so a and b will both have type pointer-to-pointer-to-char (passed as const void *). You simply write a compare function that returns less than zero if the length of the line pointed to by a is less than that of b (a sorts first), zero if the lengths are the same, and greater than zero if the length of a is greater than that of b (b sorts first). Writing the comparison as the difference of two conditionals rather than simply a - b avoids potential overflow (and, for unsigned types like size_t, wraparound), e.g.
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
    /* a & b are pointer-to-pointer to char */
    char *pa = *(char * const *)a,      /* pa is pointer to string */
         *pb = *(char * const *)b;      /* pb is pointer to string */
    size_t lena = strlen(pa),           /* length of pa */
           lenb = strlen(pb);           /* length of pb */
    /* for numeric types returning result of (a > b) - (a < b) instead
     * of result of a - b avoids potential overflow. returns -1, 0, 1.
     */
    return (lena > lenb) - (lena < lenb);
}
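To see why the subtraction is avoided: size_t is unsigned, so lena - lenb wraps around instead of going negative, and converting the wrapped value to int is implementation-defined (for large differences it can even come out as 0 or with the wrong sign). A minimal sketch contrasting the two forms; both helper names are hypothetical, and only the relational form is safe to use:

```c
#include <stddef.h>

/* naive comparison (don't use): size_t subtraction wraps around because it
 * is unsigned, so lena - lenb is never negative and the conversion to int
 * is implementation-defined */
int cmp_subtract (size_t lena, size_t lenb)
{
    return (int)(lena - lenb);
}

/* comparison used in complength(): each relational expression yields 0 or
 * 1, so the result is always -1, 0 or 1 with no wraparound or overflow */
int cmp_relational (size_t lena, size_t lenb)
{
    return (lena > lenb) - (lena < lenb);
}
```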
Now you can simply pass the collection of objects, the number of objects, the size of each object, and the function used to compare the objects to qsort. It doesn't matter what you need to sort; it works the same way every time. There is no reason you should ever need to "go write" a sort (except for educational purposes); that is what qsort is provided for. For example, here with sortedlines, all you need is:
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
Now you can display all lines by iterating through lines and display all lines in ascending line length through sortedlines. Obviously to display the first 5 lines, just iterate over the first 5 valid pointers in sortedlines. The same applies to opening another file for writing and writing those 5 lines to a new file. (that is left to you)
That's it. Is any of it difficult -- No. Is it trivial to do -- No. It is a basic part of programming in C that takes work to learn and to understand, but that is no different than anything worth learning. Putting all the pieces together in a working program to read and display all lines in a file and then sort and display the first 5 shortest lines you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
    /* a & b are pointer-to-pointer to char */
    char *pa = *(char * const *)a,      /* pa is pointer to string */
         *pb = *(char * const *)b;      /* pb is pointer to string */
    size_t lena = strlen(pa),           /* length of pa */
           lenb = strlen(pb);           /* length of pb */
    /* for numeric types returning result of (a > b) - (a < b) instead
     * of result of a - b avoids potential overflow. returns -1, 0, 1.
     */
    return (lena > lenb) - (lena < lenb);
}
/** read line from 'fp' stored in allocated block assigned to '*s' and
 *  return length of string stored on success. on EOF with no characters
 *  read, or on failure, return -1 and set *s to NULL. Block of memory sized
 *  to accommodate exact length of string with nul-terminating char. unless
 *  -1 returned, *s guaranteed to contain nul-terminated string (empty-string
 *  allowed). caller responsible for freeing allocated memory.
 */
ssize_t fgetcline (char **s, FILE *fp)
{
    int c;                              /* char read from fp (int to hold EOF) */
    size_t n = 0, size = LINSZ;         /* no. of chars and allocation size */
    void *tmp = realloc (NULL, size);   /* tmp pointer for realloc use */
    *s = NULL;                          /* *s stays NULL unless a line is stored */
    if (!tmp)                           /* validate every allocation/reallocation */
        return -1;
    *s = tmp;                           /* assign allocated block to pointer */
    while ((c = fgetc(fp)) != '\n' && c != EOF) {   /* read chars until \n or EOF */
        if (n + 1 == size) {            /* check if realloc required */
            /* realloc using temporary pointer */
            if (!(tmp = realloc (*s, size + LINSZ))) {
                free (*s);              /* on failure, free partial line */
                *s = NULL;              /* don't leave a dangling pointer */
                return -1;
            }
            *s = tmp;                   /* assign reallocated block to pointer */
            size += LINSZ;              /* update allocated size */
        }
        (*s)[n++] = c;                  /* assign char to index, increment */
    }
    (*s)[n] = 0;                        /* nul-terminate string */
    if (n == 0 && c == EOF) {           /* if nothing read and EOF, free mem return -1 */
        free (*s);
        *s = NULL;                      /* don't leave a dangling pointer */
        return -1;
    }
    if ((tmp = realloc (*s, n + 1)))    /* final realloc to exact length */
        *s = tmp;                       /* assign reallocated block to pointer */
    return (ssize_t)n;                  /* return length (excluding nul-terminating char) */
}
/** read each line from 'fp' and store in allocated block, returning pointer
 *  to allocated block of pointers to each stored line with the final pointer
 *  after the last stored string set to NULL as a sentinel. '*n' must be 0 on
 *  entry and is updated to the number of allocated and stored lines
 *  (excluding the sentinel NULL). returns valid pointer on success, NULL if
 *  the initial allocation fails. caller is responsible for freeing both
 *  allocated lines and pointers.
 */
char **readfile (FILE *fp, size_t *n)
{
    size_t nptrs = LINSZ;           /* no. of allocated pointers */
    char **lines = malloc (nptrs * sizeof *lines);  /* allocated block of pointers */
    void *tmp = NULL;               /* temp pointer for realloc use */
    if (!lines) {                   /* validate initial allocation */
        perror ("malloc-lines");
        return NULL;
    }
    /* read each line from 'fp' into allocated block, assign to next pointer */
    while (fgetcline (&lines[*n], fp) != -1) {
        lines[++(*n)] = NULL;       /* set next pointer NULL as sentinel */
        if (*n + 1 >= nptrs) {      /* check if realloc required */
            /* realloc using temporary pointer to prevent memory leak on failure */
            if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
                perror ("realloc-lines");
                return lines;       /* return original pointer on failure */
            }
            lines = tmp;            /* assign reallocated block to pointer */
            nptrs *= 2;             /* update no. of pointers allocated */
        }
    }
    lines[*n] = NULL;               /* ensure sentinel is NULL after the failed read */
    /* final realloc sizing exact no. of pointers required */
    if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
        return lines;               /* return original block on failure */
    return tmp;                     /* return updated block of pointers on success */
}
/** free all allocated memory (both lines and pointers) */
void freelines (char **lines, size_t nlines)
{
    for (size_t i = 0; i < nlines; i++)     /* loop over each pointer */
        free (lines[i]);                    /* free allocated line */
    free (lines);                           /* free pointers */
}
int main (int argc, char **argv) {
    char **lines = NULL,            /* pointer to allocated block of pointers */
         **sortedlines = NULL;      /* copy of lines pointers to sort by length */
    size_t n = 0;                   /* no. of pointers with allocated lines */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    if (!fp) {                      /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    lines = readfile (fp, &n);      /* read all lines in file, fill lines */
    if (fp != stdin)                /* close file if not stdin */
        fclose (fp);
    if (!lines)                     /* validate lines were read */
        return 1;
    /* allocate storage for copy of lines pointers (plus sentinel NULL) */
    if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
        perror ("malloc-sortedlines");
        freelines (lines, n);       /* free lines before exiting on failure */
        return 1;
    }
    /* copy pointers from lines to sorted lines (plus sentinel NULL) */
    memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
    qsort (sortedlines, n, sizeof *sortedlines, complength);    /* sort by length */
    /* output all lines from file (first screen) */
    puts ("All lines:\n\nline : text");
    for (size_t i = 0; i < n; i++)
        printf ("%4zu : %s\n", i + 1, lines[i]);
    /* output first five shortest lines (second screen) */
    puts ("\n5 shortest lines:\n\nline : text");
    for (size_t i = 0; i < (n >= NSHORT ? NSHORT : n); i++)
        printf ("%4zu : %s\n", i + 1, sortedlines[i]);
    freelines (lines, n);           /* free all allocated memory for lines */
    free (sortedlines);             /* free block of pointers */
    return 0;
}
(note: the program reads from the filename passed as the first argument, or from stdin if no argument is given)
Example Input File
$ cat dat/fleascatsdogs.txt
My dog
My fat cat
My snake
My dog has fleas
My cat has none
Lucky cat
My snake has scales
Example Use/Output
$ ./bin/fgetclinesimple dat/fleascatsdogs.txt
All lines:

line : text
   1 : My dog
   2 : My fat cat
   3 : My snake
   4 : My dog has fleas
   5 : My cat has none
   6 : Lucky cat
   7 : My snake has scales

5 shortest lines:

line : text
   1 : My dog
   2 : My snake
   3 : Lucky cat
   4 : My fat cat
   5 : My cat has none
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so that (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice, and similar memory checkers exist for most other platforms. They are all simple to use: just run your program through it.
$ valgrind ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900== Memcheck, a memory error detector
==5900== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5900== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5900== Command: ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900==
All lines:

line : text
   1 : My dog
   2 : My fat cat
   3 : My snake
   4 : My dog has fleas
   5 : My cat has none
   6 : Lucky cat
   7 : My snake has scales

5 shortest lines:

line : text
   1 : My dog
   2 : My snake
   3 : Lucky cat
   4 : My fat cat
   5 : My cat has none
==5900==
==5900== HEAP SUMMARY:
==5900== in use at exit: 0 bytes in 0 blocks
==5900== total heap usage: 21 allocs, 21 frees, 7,938 bytes allocated
==5900==
==5900== All heap blocks were freed -- no leaks are possible
==5900==
==5900== For counts of detected and suppressed errors, rerun with: -v
==5900== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
There is a lot here, and as with any "how do I do X?" question, the devil is always in the details: the proper use of each function, the proper validation of each input or allocation/reallocation. Each part is just as important as the others to ensure your code does what you need it to do, in a defined way. Look things over, take your time to digest the parts, and let me know if you have further questions.
If you are on Linux (or another POSIX system), you can use getline instead of fgetc and fgets, because getline takes care of allocating (and reallocating) the line buffer for you.
Example:
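#define _GNU_SOURCE             /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    FILE *fp;
    char *line = NULL;          /* getline() allocates/reallocates this buffer */
    size_t len = 0;             /* current size of the allocated buffer */
    ssize_t read;               /* chars read (including '\n'), or -1 */
    if (argc != 2)
    {
        printf("usage: rf <filename>\n");
        exit(EXIT_FAILURE);
    }
    fp = fopen(argv[1], "r");
    if (fp == NULL)
    {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    while ((read = getline(&line, &len, fp)) != -1) {
        printf("Retrieved line of length %zd :\n", read);   /* %zd for ssize_t */
        printf("%s", line);
    }
    free(line);                 /* one buffer, reused for every line */
    fclose(fp);
    exit(EXIT_SUCCESS);
}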
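Note that getline() keeps the trailing '\n' in the buffer (fgetcline() above discards it), and it reuses one buffer for every line, so each line must be copied if you want to keep it. A minimal sketch, assuming POSIX getline() and strdup(), that stores every line with the newline removed in a NULL-terminated block of pointers grown by doubling, as readfile() does; readlines_getline() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L   /* getline() and strdup() (POSIX.1-2008) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* read all lines from 'fp' with getline(), strip each trailing '\n', and
 * store a copy of each in a NULL-terminated block of pointers. '*n' is set
 * to the number of lines. returns NULL on allocation failure. */
char **readlines_getline (FILE *fp, size_t *n)
{
    char *buf = NULL, **lines = NULL;
    size_t bufsz = 0, nptrs = 0;
    ssize_t len;
    *n = 0;
    while ((len = getline (&buf, &bufsz, fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            buf[len - 1] = 0;               /* remove trailing newline */
        if (*n + 1 >= nptrs) {              /* room for line + sentinel? */
            size_t newptrs = nptrs ? nptrs * 2 : 8;
            void *tmp = realloc (lines, newptrs * sizeof *lines);
            if (!tmp)
                goto fail;                  /* old 'lines' still valid */
            lines = tmp;
            nptrs = newptrs;
        }
        if (!(lines[*n] = strdup (buf)))    /* copy: buf is reused */
            goto fail;
        (*n)++;
    }
    free (buf);
    if (!lines && !(lines = malloc (sizeof *lines)))    /* empty input */
        return NULL;
    lines[*n] = NULL;                       /* sentinel */
    return lines;
fail:
    free (buf);
    for (size_t i = 0; i < *n; i++)
        free (lines[i]);
    free (lines);
    *n = 0;
    return NULL;
}
```

The caller frees each lines[i] and then lines, exactly as freelines() does.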
Is there a function I can use that will allow me to replace specific text?
For example:
char *test = "^Hello world^"; would be replaced with char *test = "<s>Hello world</s>";
Another example: char *test2 = "This is ~my house~ bud" would be replaced with char *test2 = "This is <b>my house</b> bud"
Before you can begin to replace substrings within a string, you have to understand what you are dealing with. In your example you want to know whether you can replace characters within a string, and you give as an example:
char *test = "^Hello world^";
Declared and initialized as shown above, test is a pointer to a string literal, which on virtually all systems is stored in read-only memory. Any attempt to modify a string literal invokes Undefined Behavior (and most likely a Segmentation Fault).
As noted in the comments, test could be declared and initialized as a character array, e.g. char test[] = "^Hello world^";, to ensure that test is modifiable, but that does not address the problem that your replacement strings are longer than the substrings being replaced.
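A minimal sketch of the difference, using a hypothetical replace_char() helper that changes characters in place. It only handles same-length substitutions, and only on writable storage such as a char test[] array, never on a pointer to a string literal:

```c
#include <string.h>

/* replace every 'from' character in the modifiable string 's' with 'to'.
 * 's' must point to writable memory (e.g. a char array), never to a
 * string literal. returns the number of characters replaced. */
size_t replace_char (char *s, char from, char to)
{
    size_t count = 0;
    for (; *s; s++) {
        if (*s == from) {
            *s = to;
            count++;
        }
    }
    return count;
}
```

Calling replace_char() with char *test = "^Hello world^"; would compile, but writing through that pointer is Undefined Behavior.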
To handle the additional characters, you have two options (1) you can declare test[] to be sufficiently large to accommodate the substitutions, or (2) you can dynamically allocate storage for the replacement string, and realloc additional memory if you reach your original allocation limit.
For instance if you limit the code associated with test to a single function, you could declare test with a sufficient number of characters to handle the replacements, e.g.
#define MAXC 1024 /* define a constant for the maximum number of characters */
...
char test[MAXC] = "^Hello world^";
You would then simply need to keep track of the original string length plus the number of characters added with each replacement and ensure that the total never exceeds MAXC-1 (reserving space for the nul-terminating character).
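A minimal sketch of that bookkeeping, assuming a hypothetical append_bounded() helper that only appends when the result plus the nul-terminating character still fits in MAXC:

```c
#include <string.h>

#define MAXC 1024   /* maximum number of characters, as above */

/* append 'src' to the string in 'dst' (an array of MAXC chars) only if the
 * result, plus the nul-terminating char, fits. 'used' holds strlen(dst) and
 * is updated. returns 0 on success, -1 (dst unchanged) if it would not fit. */
int append_bounded (char *dst, size_t *used, const char *src)
{
    size_t len = strlen (src);
    if (*used + len > MAXC - 1)         /* reserve 1 char for nul */
        return -1;
    memcpy (dst + *used, src, len + 1); /* copy including nul */
    *used += len;
    return 0;
}
```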
However, if you decided to move the replacement code to a separate function, you now have the problem that you cannot return a pointer to a locally declared array, because the locally declared array is declared within the function stack space, which is destroyed (released for reuse) when the function returns. A locally declared array has automatic storage duration. See: C11 Standard - 6.2.4 Storage durations of objects
To avoid the problem of a locally declared array not surviving the function return, you can simply dynamically allocate storage for your new string which results in the new string having allocated storage duration which is good for the life of the program, or until the memory is freed by calling free(). This allows you to declare and allocate storage for a new string within a function, make your substring replacements, and then return a pointer to the new string for use back in the calling function.
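A minimal sketch of the difference, with a hypothetical concat_alloc() helper: the broken version returning a local array is shown only in a comment, while the working version returns allocated storage the caller must free():

```c
#include <stdlib.h>
#include <string.h>

/* WRONG (shown only in this comment): returning a local array
 *     char *bad (void) { char buf[64] = "text"; return buf; }
 * 'buf' has automatic storage duration and no longer exists after return. */

/* return a newly allocated copy of 'a' followed by 'b', or NULL on failure.
 * allocated storage duration: valid until the caller calls free(). */
char *concat_alloc (const char *a, const char *b)
{
    size_t la = strlen (a), lb = strlen (b);
    char *s = malloc (la + lb + 1);     /* +1 for nul-terminating char */
    if (!s)
        return NULL;
    memcpy (s, a, la);
    memcpy (s + la, b, lb + 1);         /* copies b's nul as well */
    return s;
}
```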
For your circumstance, simply declaring a new string within a function and allocating twice the amount of storage as the original string is a reasonable approach. (you still must keep track of the number of bytes you use, but you then have the ability to realloc additional memory if you reach your original allocation limit) This process can continue and accommodate any number of strings and substitutions, up to the memory available on your system.
While there are a number of ways to approach the substitutions, simply searching the original string for each substring, and then copying the text up to the substring to the new string, then copying the replacement substring allows you to "inch-worm" from the beginning to the end of your original string making replacement substitutions as you go. The only challenge you have is keeping track of the number of characters used (so you can reallocate if necessary) and advancing your read position within the original from the beginning to the end as you go.
Your example somewhat complicates the process by needing to alternate between one of two replacement strings as you work your way down the string. This can be handled with a simple toggle flag (a variable you alternate 0,1,0,1,...) which then determines the proper replacement string to use where needed.
The ternary operator (e.g. test ? if_true : if_false) can help reduce the number of if (test) { if_true; } else { if_false; } blocks you have sprinkled through your code. It's up to you: if the if (test) {} format is more readable to you, use that; otherwise, use the ternary.
The following example takes the (1) original string, (2) the find substring, (3) the 1st replacement substring, and (4) the 2nd replacement substring as arguments to the program. It allocates for the new string within the strreplace() function, makes the substitutions requested and returns a pointer to the new string to the calling function. The code is heavily commented to help you follow along, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* replace all instances of 'find' in 's' with 'r1' and 'r2', alternating.
 * allocate memory, as required, to hold string with replacements,
 * returns allocated string with replacements on success, NULL otherwise.
 */
char *strreplace (const char *s, const char *find,
                  const char *r1, const char *r2)
{
    const char *p = s,              /* pointer to s */
               *sp = s;             /* 2nd substring pointer */
    char *newstr = NULL,            /* newstring pointer to allocate/return */
         *np = NULL;                /* pointer to newstring to fill */
    size_t newlen = 0,              /* allocated size of newstr */
           used = 0,                /* amount of allocated space used */
           slen, findlen, r1len, r2len;
    int toggle = 0;                 /* simple 0/1 toggle flag for r1/r2 */
    /* validate s and find not NULL or empty (validate before strlen, and
     * an empty 'find' would make strstr() match forever) */
    if (s == NULL || *s == 0 || find == NULL || *find == 0) {
        fputs ("strreplace() error: input NULL or empty\n", stderr);
        return NULL;
    }
    slen = strlen (s);              /* length of s */
    findlen = strlen (find);        /* length of find string */
    r1len = strlen (r1);            /* length of replace string 1 */
    r2len = strlen (r2);            /* length of replace string 2 */
    newlen = slen * 2;              /* double length of s for newstr */
    newstr = calloc (1, newlen);    /* allocate twice length of s */
    if (newstr == NULL) {           /* validate ALL memory allocations */
        perror ("calloc-newstr");
        return NULL;
    }
    np = newstr;                    /* initialize newpointer to newstr */
    /* locate each substring using strstr */
    while ((sp = strstr (p, find))) {   /* find beginning of each substring */
        size_t len = sp - p,                    /* length to substring */
               rlen = toggle ? r2len : r1len;   /* length of replacement */
        /* realloc (doubling) until there is room for segment + replacement */
        while (used + len + rlen + 1 > newlen) {
            void *tmp = realloc (newstr, newlen * 2);   /* realloc to temp */
            if (!tmp) {             /* validate realloc succeeded */
                perror ("realloc-newstr");
                free (newstr);      /* don't leak the original block */
                return NULL;
            }
            newstr = tmp;           /* assign realloc'ed block to newstr */
            newlen *= 2;            /* update newlen */
            np = newstr + used;     /* block may have moved, reset np */
        }
        strncpy (np, p, len);       /* copy from pointer to substring */
        np += len;                  /* advance newstr pointer by len */
        *np = 0;                    /* nul-terminate (realloc'ed memory isn't zeroed) */
        strcpy (np, toggle ? r2 : r1);  /* copy r2/r1 string to end */
        np += rlen;                 /* advance newstr pointer by rlen */
        *np = 0;                    /* <ditto> */
        p += len + findlen;         /* advance p by len + findlen */
        used += len + rlen;         /* update used characters */
        toggle = toggle ? 0 : 1;    /* toggle 0,1,0,1,... */
    }
    /* handle segment of s after last find substring */
    slen = strlen (p);              /* get remaining length */
    if (slen) {                     /* if not at end */
        if (used + slen + 1 > newlen) {     /* check if realloc needed? */
            void *tmp = realloc (newstr, used + slen + 1);  /* realloc */
            if (!tmp) {             /* validate */
                perror ("realloc-newstr");
                free (newstr);      /* don't leak the original block */
                return NULL;
            }
            newstr = tmp;           /* assign */
            newlen = used + slen + 1;   /* update (not required here, know why?) */
            np = newstr + used;     /* block may have moved, reset np */
        }
        strcpy (np, p);             /* add final segment to string */
        *(np + slen) = 0;           /* nul-terminate */
    }
    return newstr;                  /* return newstr */
}
int main (int argc, char **argv) {
const char *s = NULL,
*find = NULL,
*r1 = NULL,
*r2 = NULL;
char *newstr = NULL;
if (argc < 5) { /* validate required no. of arguments given */
fprintf (stderr, "error: insufficient arguments,\n"
"usage: %s <string> <find> <rep1> <rep2>\n", argv[0]);
return 1;
}
s = argv[1]; /* assign arguments to pointers */
find = argv[2];
r1 = argv[3];
r2 = argv[4];
newstr = strreplace (s, find, r1, r2); /* replace substrings in s */
if (newstr) { /* validate return */
printf ("oldstr: %s\nnewstr: %s\n", s, newstr);
free (newstr); /* don't forget to free what you allocate */
}
else { /* handle error */
fputs ("strreplace() returned NULL\n", stderr);
return 1;
}
return 0;
}
(above, the strreplace function uses pointers to walk ("inch-worm") down the original string making the replacements, but you can use string indexes and index variables if that makes more sense to you)
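For comparison, here is a minimal sketch of the same r1/r2 alternating replacement written with index variables instead of pointers. strreplace_idx is a hypothetical name, and to keep it short it swaps in a different allocation strategy: a first pass counts the matches so the exact output size can be allocated once, with no realloc at all. Like the strstr loop, it matches left to right without overlaps.

```c
#include <stdlib.h>
#include <string.h>

/* strreplace_idx: hypothetical index-based variant. Replaces successive
 * occurrences of find with r1, r2, r1, r2, ... Counts matches first so the
 * exact size can be allocated once (no realloc). Returns NULL on invalid
 * input or allocation failure; the caller must free the result. */
char *strreplace_idx (const char *s, const char *find,
                      const char *r1, const char *r2)
{
    size_t slen, flen, r1len, r2len, nmatch = 0, i, n = 0, outlen;
    char *out;

    if (!s || !find || !*find || !r1 || !r2)    /* empty find would loop */
        return NULL;
    slen = strlen (s);
    flen = strlen (find);
    r1len = strlen (r1);
    r2len = strlen (r2);

    /* pass 1: count non-overlapping matches, scanning left to right */
    for (i = 0; i + flen <= slen; ) {
        if (strncmp (s + i, find, flen) == 0) {
            nmatch++;
            i += flen;
        }
        else
            i++;
    }

    /* matches alternate: (nmatch + 1) / 2 get r1, nmatch / 2 get r2 */
    outlen = slen - nmatch * flen
           + (nmatch + 1) / 2 * r1len + nmatch / 2 * r2len;
    if (!(out = malloc (outlen + 1)))           /* +1 for the '\0' */
        return NULL;

    /* pass 2: copy characters, substituting r1/r2 at each match */
    for (i = 0, nmatch = 0; i < slen; ) {
        if (i + flen <= slen && strncmp (s + i, find, flen) == 0) {
            int use_r2 = (int)(nmatch++ % 2);   /* 0,1,0,1,... like toggle */
            size_t rlen = use_r2 ? r2len : r1len;
            memcpy (out + n, use_r2 ? r2 : r1, rlen);
            n += rlen;
            i += flen;
        }
        else
            out[n++] = s[i++];
    }
    out[n] = '\0';                              /* nul-terminate */
    return out;
}
```

Counting first costs a second scan of s but avoids every realloc; the pointer version trades that scan for occasional reallocation. Either is reasonable for short strings.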
(also note the use of calloc for the original allocation. calloc allocates and sets the new memory to all zero, which can help ensure you don't forget to nul-terminate your string, but any memory added by realloc will not be zeroed unless you manually zero it with memset or the like. The code above manually terminates the new string after each copy, so you can use either malloc or calloc for the allocation)
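If you do want the calloc guarantee to carry over after growing a buffer, you have to zero the added bytes yourself. A minimal sketch, where grow_zeroed is a hypothetical helper (not part of strreplace above) that reallocates through a temporary pointer and then uses memset on only the newly added region:

```c
#include <stdlib.h>
#include <string.h>

/* grow_zeroed: hypothetical helper. Grows *buf from *size bytes to newsize
 * bytes and zeroes only the newly added bytes, since realloc (unlike calloc)
 * leaves them uninitialized. Returns 0 on success, -1 on failure; on
 * failure *buf and *size are unchanged and *buf is still valid. */
int grow_zeroed (char **buf, size_t *size, size_t newsize)
{
    void *tmp;

    if (newsize <= *size)               /* nothing to add */
        return 0;
    tmp = realloc (*buf, newsize);      /* realloc to a temporary pointer */
    if (!tmp)
        return -1;                      /* original block untouched */
    memset ((char *)tmp + *size, 0, newsize - *size);  /* zero new bytes */
    *buf = tmp;
    *size = newsize;
    return 0;
}
```

The existing bytes are preserved by realloc; only the range from the old size to the new size is cleared.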
Example Use/Output
First example:
$ ./bin/str_substr_replace2 "^Hello world^" "^" "<s>" "</s>"
oldstr: ^Hello world^
newstr: <s>Hello world</s>
Second example:
$ ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
==8962== Memcheck, a memory error detector
==8962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8962== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8962== Command: ./bin/str_substr_replace2 This\ is\ ~my\ house~\ bud ~ \<b\> \</b\>
==8962==
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
==8962==
==8962== HEAP SUMMARY:
==8962== in use at exit: 0 bytes in 0 blocks
==8962== total heap usage: 1 allocs, 1 frees, 44 bytes allocated
==8962==
==8962== All heap blocks were freed -- no leaks are possible
==8962==
==8962== For counts of detected and suppressed errors, rerun with: -v
==8962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have any further questions.
I am trying to create a dynamic character "string pointer"/array, and my code will not print the values if the characters typed in exceed 249 characters. I am just wondering if there is a maximum return length for a character array/"string pointer".
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *input() {
char *inpt;
char check;
int i;
int count;
i = 0;
count = 1;
check = '\0';
inpt = (char*)malloc(1);
while (check != '\n') {
check = getchar();
if (check == '\n') {
break;
} else {
inpt[i] = check;
i++;
count++;
inpt = realloc(inpt, count);
}
}
inpt[i] = '\0';
char *retrn;
retrn = inpt;
free(inpt);
printf("%d \n", i);
return retrn;
}
int main(int argc, char **argv) {
char *name;
printf("Please print name: \n");
name = input();
printf("%s is the name \n", name);
return 0;
}
The problem is not with the length of the string you attempt to return, but that you return a pointer to memory that is no longer allocated to you:
char *retrn;
retrn = inpt;
free(inpt);
return retrn;
When you do retrn = inpt you don't copy the memory; instead you have two pointers pointing to the same memory. Then you free that memory and return a pointer to the newly freed memory. That pointer of course cannot be dereferenced, and any attempt to do so leads to undefined behavior.
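To make the distinction concrete: assigning one pointer to another copies only the address. If you actually wanted retrn to be an independent copy (not needed here, since returning inpt itself is enough), you would have to allocate new memory and copy the characters into it. A minimal sketch, where dup_string is a hypothetical helper similar to POSIX strdup:

```c
#include <stdlib.h>
#include <string.h>

/* dup_string: hypothetical helper (similar to POSIX strdup). Returns a new,
 * independent copy of src, or NULL on allocation failure. Freeing src
 * afterwards does not affect the copy. */
char *dup_string (const char *src)
{
    size_t len = strlen (src);
    char *copy = malloc (len + 1);      /* +1 for the nul-terminator */

    if (copy)
        memcpy (copy, src, len + 1);    /* copies the '\0' as well */
    return copy;
}
```

With a real copy like this, freeing the original would be safe, but it doubles the allocation work for no benefit in input().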
The solution is not a temporary variable like retrn, but simply not to free the memory in the input function. Instead, return inpt and free the memory in the calling function (main in your case).
Continuing from my comment, there are a number of schemes to allocate memory dynamically. One thing you want to avoid from an efficiency standpoint is needlessly reallocating for every character. Rather than call realloc for every character added to name, allocate a reasonable number of characters to hold name, and if you reach that amount, then reallocate, doubling the current allocation size, update your variable holding the current size and keep going.
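To put a number on the efficiency difference, a small sketch that counts how many realloc calls the doubling scheme needs. count_reallocs_doubling is a hypothetical helper; it follows the same rule as the code below (reallocate when index + 1 reaches the allocated size, so there is always room for the nul-terminator) and ignores size_t overflow for brevity. With one realloc per character, as in the question, storing n characters takes n calls.

```c
#include <stddef.h>

/* count_reallocs_doubling: hypothetical helper. Returns how many realloc
 * calls the doubling scheme makes to hold n characters plus a '\0',
 * starting from an initial allocation of init bytes (init >= 1).
 * Ignores size_t overflow for brevity. */
size_t count_reallocs_doubling (size_t n, size_t init)
{
    size_t mem = init, calls = 0;

    while (mem < n + 1) {       /* need room for n chars + '\0' */
        mem *= 2;               /* double the current allocation */
        calls++;
    }
    return calls;
}
```

Starting from 32 bytes, reading 1000 characters takes 5 reallocations (32, 64, 128, 256, 512, 1024), compared with 1000 for the one-byte-at-a-time approach.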
You already have an array index, so there is no need to keep a separate count. Just use your array index as the counter, ensuring you have at least index + 1 characters available to provide space to nul-terminate inpt.
There is no need to keep separate pointers in input(). Just allocate for inpt and return inpt as the pointer to your block of memory when done. (Don't forget to free (name); in main(), which will free the memory you allocated in input().)
Never realloc the pointer directly (e.g. DON'T inpt = realloc (inpt, size);). If realloc fails it returns NULL, and you lose the only pointer to the block inpt referenced before the realloc call, leaking that memory. Instead use a temporary pointer, validate that realloc succeeded, and then assign the new block to inpt (example below).
Putting it all together, you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MEMSZ 32 /* initial allocation size (must be at least 1) */
char *input (void)
{
char *inpt = NULL;
int check; /* int, not char, so EOF can be detected reliably */
size_t mem = MEMSZ, ndx = 0;
if (!(inpt = malloc (mem))) { /* allocate/validate mem chars */
fprintf (stderr, "input() error: virtual memory exhausted.\n");
return NULL;
}
/* you must check for EOF in addition to '\n' */
while ((check = getchar()) != '\n' && check != EOF)
{ /* check index + 1 to insure space to nul-terminate */
if (ndx + 1 == mem) { /* if mem limit reached realloc */
void *tmp = realloc (inpt, mem * 2); /* use tmp ptr */
if (!tmp) { /* validate reallocation */
fprintf (stderr, "realloc(): memory exhausted.\n");
break; /* on failure, preserve existing chars */
}
inpt = tmp; /* assign new block of memory to inpt */
mem *= 2; /* set mem to new allocation size */
}
inpt[ndx++] = check; /* assign, increment index */
}
inpt[ndx] = 0; /* nul-terminate */
return inpt; /* return pointer to allocated block */
}
int main (void)
{
char *name = NULL;
printf ("Please enter name: ");
if (!(name = input())) /* validate input() succeeded */
return 1;
printf ("You entered : %s\n", name);
free (name); /* don't forget to free name */
return 0;
}
Example Use/Output
$ ./bin/entername
Please enter name: George Charles Butte
You entered : George Charles Butte
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to write beyond/outside the bounds of your allocated block of memory, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/entername
==2566== Memcheck, a memory error detector
==2566== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2566== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2566== Command: ./bin/entername
==2566==
Please enter name: George Charles Butte
You entered : George Charles Butte
==2566==
==2566== HEAP SUMMARY:
==2566== in use at exit: 0 bytes in 0 blocks
==2566== total heap usage: 1 allocs, 1 frees, 32 bytes allocated
==2566==
==2566== All heap blocks were freed -- no leaks are possible
==2566==
==2566== For counts of detected and suppressed errors, rerun with: -v
==2566== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Let me know if you have any additional questions.
It's most likely due to using freed memory. Assigning inpt to retrn does not create another copy, it only copies the pointer. You'll get undefined behaviour, perhaps including what you are experiencing.
Is there a maximum return length for a character array in C
The maximum size of a character array is SIZE_MAX. SIZE_MAX is at least 65535.
I am just wondering if there is a maximum return length for a character array/"string pointer".
For a string, its maximum size is SIZE_MAX and the maximum length is SIZE_MAX - 1.
There are intrinsic limits for the size of an array:
available memory is limited: malloc() and realloc() may return NULL if the request cannot be honored due to lack of core memory. You should definitely check for malloc() and realloc() success.
system quotas may limit the amount of memory available to your process to a lower number than actual physical memory installed or virtual memory accessible in the system.
the maximum size for an array is the maximum value for the type size_t: SIZE_MAX, which has a minimum value of 65535. But you use type int for your requests to malloc() and realloc(), which may have a smaller range than size_t. Type int is 32 bits on most current desktop systems, whereas size_t may be 64 bits and available memory may be much more than 2GB. Use size_t instead of int.
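When growing a buffer, doing the size arithmetic in size_t also lets you check for wrap-around before it happens (with int, overflowing a signed value is undefined behavior). A minimal sketch, where next_capacity is a hypothetical helper that doubles a capacity only if the result still fits in size_t:

```c
#include <stddef.h>
#include <stdint.h>     /* SIZE_MAX */

/* next_capacity: hypothetical helper. Computes the doubled capacity for a
 * growing buffer in size_t, refusing to wrap around. Returns 0 and stores
 * the result in *next on success, or -1 if cur * 2 would exceed SIZE_MAX. */
int next_capacity (size_t cur, size_t *next)
{
    if (cur == 0) {             /* start from at least 1 byte */
        *next = 1;
        return 0;
    }
    if (cur > SIZE_MAX / 2)     /* cur * 2 would overflow size_t */
        return -1;
    *next = cur * 2;
    return 0;
}
```

A failed check here (or a NULL from realloc) is the point at which the program has genuinely reached a limit; a plain int counter would misbehave long before that on a 64-bit system.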
Note however that your problem comes from a much simpler bug: you free the memory block you allocated for the string and return a copy of the pointer, which now points to freed memory. Accessing this memory has undefined behavior, which can be anything, including apparently correct behavior up to 249 bytes and failure beyond.
Note also that you should use type int for check and compare the return value of getchar() to EOF to avoid an endless loop if the input does not contain a newline (such as an empty file).
Here is a corrected version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *input(void) {
char *p = malloc(1); /* simplistic reallocation, 1 byte at a time */
size_t i = 0; /* use size_t for very large input */
int c; /* use int to detect EOF reliably */
if (p == NULL) {
return NULL; /* allocation error */
}
while ((c = getchar()) != EOF && c != '\n') {
char *newp = realloc(p, i + 2);
if (newp == NULL) {
free(p); /* avoid a memory leak */
return NULL; /* reallocation error */
}
p = newp;
p[i++] = c;
}
if (i == 0 && c == EOF) {
free(p);
return NULL; /* end of file */
}
p[i] = '\0';
return p;
}
int main(int argc, char **argv) {
char *name;
printf("Please print name: ");
name = input();
if (name == NULL) {
printf("input() returned NULL\n");
} else {
printf("%s is the name\n", name);
free(name);
}
return 0;
}
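The corrected version above still grows the buffer one byte per character. A sketch combining its EOF handling (int for the character, size_t for sizes) with the doubling scheme of the MEMSZ version above, reading from any FILE * so it works with stdin or a file. read_line and the initial size of 32 are assumptions, not taken from either answer; unlike the MEMSZ version, it frees the buffer and returns NULL if realloc fails rather than keeping the characters read so far.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>     /* SIZE_MAX */

/* read_line: hypothetical helper. Reads one line (without the '\n') from
 * fp, doubling the buffer through a temporary pointer as needed. Returns
 * NULL on allocation failure or on EOF before any character was read;
 * the caller must free the result. */
char *read_line (FILE *fp)
{
    size_t mem = 32, ndx = 0;           /* initial size is an assumption */
    char *buf = malloc (mem);
    int c;                              /* int so EOF is detected */

    if (!buf)
        return NULL;
    while ((c = fgetc (fp)) != EOF && c != '\n') {
        if (ndx + 1 == mem) {           /* keep room for the '\0' */
            void *tmp;
            if (mem > SIZE_MAX / 2) {   /* doubling would overflow */
                free (buf);
                return NULL;
            }
            tmp = realloc (buf, mem * 2);   /* realloc to temp pointer */
            if (!tmp) {                 /* buf is still valid here */
                free (buf);
                return NULL;
            }
            buf = tmp;
            mem *= 2;
        }
        buf[ndx++] = (char)c;
    }
    if (ndx == 0 && c == EOF) {         /* nothing read: end of file */
        free (buf);
        return NULL;
    }
    buf[ndx] = '\0';                    /* nul-terminate */
    return buf;
}
```

In main() you would call it as name = read_line (stdin); and, as before, free (name); when done. An empty line returns an empty string, while end of file with no input returns NULL.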