While searching through this board for information about reading a full file into memory using C, I came across a use of fread() that I haven't seen before. I'm trying to understand it.
My questions are:
Is there a name/term for what is being done here?
What is happening when the size_t used is being added to the char *data and how is this considered a valid void *ptr by fread?
I'm going to put the code from the author's post in here and I'll link to the post as well. Unfortunately, the post is old, locked, and I don't have enough points here to leave a comment asking for clarification on it.
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
/* Size of each input chunk to be
read and allocate for. */
#ifndef READALL_CHUNK
#define READALL_CHUNK 262144
#endif
#define READALL_OK 0 /* Success */
#define READALL_INVALID -1 /* Invalid parameters */
#define READALL_ERROR -2 /* Stream error */
#define READALL_TOOMUCH -3 /* Too much input */
#define READALL_NOMEM -4 /* Out of memory */
/* This function returns one of the READALL_ constants above.
If the return value is zero == READALL_OK, then:
(*dataptr) points to a dynamically allocated buffer, with
(*sizeptr) chars read from the file.
The buffer is allocated for one extra char, which is NUL,
and automatically appended after the data.
Initial values of (*dataptr) and (*sizeptr) are ignored.
*/
int readall(FILE *in, char **dataptr, size_t *sizeptr)
{
char *data = NULL, *temp;
size_t size = 0;
size_t used = 0;
size_t n;
/* None of the parameters can be NULL. */
if (in == NULL || dataptr == NULL || sizeptr == NULL)
return READALL_INVALID;
/* A read error already occurred? */
if (ferror(in))
return READALL_ERROR;
while (1) {
if (used + READALL_CHUNK + 1 > size) {
size = used + READALL_CHUNK + 1;
/* Overflow check. Some ANSI C compilers
may optimize this away, though. */
if (size <= used) {
free(data);
return READALL_TOOMUCH;
}
temp = realloc(data, size);
if (temp == NULL) {
free(data);
return READALL_NOMEM;
}
data = temp;
}
n = fread(data + used, 1, READALL_CHUNK, in);
if (n == 0)
break;
used += n;
}
if (ferror(in)) {
free(data);
return READALL_ERROR;
}
temp = realloc(data, used + 1);
if (temp == NULL) {
free(data);
return READALL_NOMEM;
}
data = temp;
data[used] = '\0';
*dataptr = data;
*sizeptr = used;
return READALL_OK;
}
Link: C Programming: How to read the whole file contents into a buffer
What is happening when the size_t used is being added to the char *data and how is this considered a valid void *ptr by fread?
In practice(*), a pointer is just a number, which references an address in (virtual) memory. What's being done here is simple pointer arithmetic: You can add an integer to a pointer, which increases its value, so if your pointer pointed to address 1000 and you add 20, it now points to address 1020. Since used is always the number of bytes read so far, you point this many bytes into the data buffer.
But there's one more thing: this only works as described if the type the pointer points to has a size of 1 byte (as char does(*)). When you do pointer arithmetic, you don't increase the pointer by that many bytes, but by multiples of the pointed-to type's size, so you always end up pointing to the start of an element in your array, and not somewhere in the middle of one if you're dealing with int. I.e. if you have int *x which points to address 1000, sizeof(int) is 4, and you do x += 20, then x will point to address 1080, which is where x[20] would be located.
and how is this considered a valid void *ptr by fread?
Considering "pointers are just numbers", fread doesn't care how you arrived at that pointer value. As long as there is valid memory to write to, it will happily accept whatever you pass it.
(*) Assuming a modern architecture accessible by mere mortals.
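As a minimal sketch of the pointer arithmetic described above (the helper names append_chunk and int_byte_offset are made up for illustration and are not part of the quoted code), the following shows that buf + used is the same address as &buf[used], and that arithmetic on an int * is scaled by sizeof(int):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into buf, starting
 * `used` bytes past the beginning, and return the new fill level.
 * buf + used is the same address as &buf[used]; a char * converts
 * implicitly to the void * that memcpy (or fread) expects.
 * The caller must guarantee that used + len bytes fit in buf. */
size_t append_chunk(char *buf, size_t used, const char *src, size_t len)
{
    assert(buf != NULL && src != NULL);
    memcpy(buf + used, src, len);
    return used + len;
}

/* Hypothetical helper: distance in bytes between base and base + index.
 * Arithmetic on an int * moves in steps of sizeof(int), not 1 byte.
 * base + index must stay inside (or one past the end of) the array. */
size_t int_byte_offset(const int *base, size_t index)
{
    const int *elem = base + index;
    return (size_t)((const char *)elem - (const char *)base);
}
```

On a platform where sizeof(int) is 4, int_byte_offset(arr, 20) yields 80, which corresponds to the 1000 to 1080 example above.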
Related
Imagine you have this code and don't really know what to expect behind the char pointer (a terminated string, or an array of chars that is not terminated). Is it possible to use the strlen function in a safe way that handles an unterminated string (preventing overflows if the input is not a terminated string)? Or can you fix it only by knowing the size of what you pass as input, so that the function becomes foo(char *c, size_t MAXSIZE)?
void foo(char *c) {
    size_t a = strlen(c); /* undefined behavior if c is not null-terminated */
}
We never know what really is behind a pointer. It could also only be a pointer to a single character.
Even by passing a size, you could imagine someone passing a bad pointer and an unrelated value.
C is not a safe language; it does not have runtime type checks. You can do anything you want, and you can't prevent others from doing anything they want with your functions.
It's not safe to use just strlen if it's not a null-terminated byte string. According to cppreference:
The behavior is undefined if str is not a pointer to a null-terminated byte string.
If you want to cover the case of a byte string that is not null-terminated, you can use size_t strnlen_s( const char *str, size_t strsz ) (part of the optional Annex K of C11, so not available on every implementation), which works just like normal strlen with the exception:
that the function returns zero if str is a null pointer and returns strsz if the null character was not found in the first strsz bytes of str.
It is not possible unless you pass the size.
You can avoid crashing, at least - by asking the operating system how much readable memory there is at this address. (Windows: call VirtualQuery. Linux: read /proc/self/maps). But it's not helpful. There can be lots of readable memory after your string that's totally unrelated to your string but just happened to get allocated after it. Finding out how much memory is safe to read doesn't tell you how long the string is.
To check if the buffer contains a null terminator within a maximum number of characters, the memchr function can be used. The following function safe_strlen behaves like the strnlen_s function defined by Annex K of the C specification, and uses memchr to find the position of the first (if any) null terminator in the buffer.
#include <stdint.h>
#include <string.h>
/**
* Get the length of a possibly null terminated string, clamped to a maximum.
*
* If \p s is not NULL, searches up to \p maxsize bytes from \p s to find the
* first null terminator, if any.
*
* \param s Start of string.
* \param maxsize Maximum number of bytes to search.
*
* \return 0 if \p s is \c NULL.
* \return \p maxsize if null terminator not found.
* \return length of null terminated string if null terminator found.
*/
size_t safe_strlen(const char *s, size_t maxsize)
{
size_t length = 0;
if (s)
{
#if PTRDIFF_MAX < SIZE_MAX
/* May need to search the buffer in chunks. */
while (maxsize)
#endif
{
const char *e;
size_t pos;
#if PTRDIFF_MAX < SIZE_MAX
if (maxsize > PTRDIFF_MAX)
{
/* Limit size of chunk. */
pos = PTRDIFF_MAX;
}
else
#endif
{
/* This is the final chunk. */
pos = maxsize;
}
/* Search for null terminator in chunk. */
e = memchr(s, 0, pos);
if (e) {
/* Null terminator found. */
pos = e - s; /* position of null terminator in chunk */
#if PTRDIFF_MAX < SIZE_MAX
/* Make this the final chunk. */
maxsize = pos;
#endif
}
/* Update returned length. */
length += pos;
#if PTRDIFF_MAX < SIZE_MAX
/* Advance to next chunk. */
s += pos;
maxsize -= pos;
#endif
}
}
return length;
}
The code is complicated by the necessity to deal with buffer sizes larger than PTRDIFF_MAX if PTRDIFF_MAX is less than SIZE_MAX. The core functionality without the extra safety checks is as follows:
/* less safe version of the above - may result in undefined behavior. */
size_t less_safe_strlen(const char *s, size_t maxsize)
{
size_t length = 0;
if (s)
{
const char *e = memchr(s, 0, maxsize);
if (e)
{
length = e - s;
}
else
{
length = maxsize;
}
}
return length;
}
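One way a caller might use a bounded length like this is to make a terminated copy of a buffer that may lack a terminator. The sketch below is an illustration under stated assumptions, not part of the answer above: the name bounded_dup is hypothetical, and it relies only on memchr, malloc and memcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, null-terminated copy of
 * at most maxsize bytes of s. Returns NULL if s is NULL or allocation
 * fails. The caller is responsible for freeing the result. */
char *bounded_dup(const char *s, size_t maxsize)
{
    if (s == NULL)
        return NULL;

    /* Length is the offset of the first '\0', or maxsize if none found. */
    const char *e = memchr(s, 0, maxsize);
    size_t len = e ? (size_t)(e - s) : maxsize;

    if (len == (size_t)-1)          /* len + 1 would wrap around to 0 */
        return NULL;

    char *copy = malloc(len + 1);   /* +1 for the terminator we add */
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    assert(strlen(copy) <= maxsize);
    return copy;
}
```

The copy is always safe to pass to strlen, whether or not the source buffer was terminated.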
When allocating memory for a string, do the following 2 ways give the same result?
char *s = "abc";
char *st1 = (char *)malloc(sizeof(char)*strlen(s));
char *st2 = (char *)malloc(sizeof(s));
In other words, does allocating memory based on the size of its characters give the same result as allocating based on the size of the whole string?
If I use the latter method, is it still possible for me to add to that memory character by character, such as:
*st = 'a';
st++;
*st = 'b';
or do I have to add a whole string at once now?
Let's see if we can't get you straightened out on your question and on allocating (and reallocating) storage. To begin, when you declare:
char *s = "abc";
You have declared a pointer to char s and you have assigned the starting address for the String Literal "abc" to the pointer s. Whenever you attempt to use sizeof() on a_pointer, you get sizeof(a_pointer) which is typically 8-bytes on x86_64 (or 4-bytes on x86, etc..)
If you take sizeof("abc"); you are taking the size of a character array with size 4 (e.g. {'a', 'b', 'c', '\0'}), because a string literal is an array of char initialized to hold the string "abc" (including the nul-terminating character). Also note that on virtually all systems a string literal is created in read-only memory; attempting to modify it is undefined behavior, so treat it as immutable.
If you want to allocate storage to hold a copy of the string "abc", you must allocate strlen("abc") + 1 characters (the +1 for the nul-terminating character '\0' -- which is simply ASCII 0, see ASCII Table & Description).
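The three sizes discussed so far can be compared side by side. This is only a sketch; the function names are hypothetical and exist just to make each expression easy to check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof applied to the literal itself: its type is the array char[4]. */
size_t size_of_literal(void)
{
    return sizeof("abc");
}

/* sizeof applied to a pointer: the size of the pointer object,
 * no matter how long the string it points to is. */
size_t size_of_pointer(void)
{
    const char *s = "abc";
    return sizeof(s);
}

/* Bytes needed to hold a copy of s: its characters plus the '\0'. */
size_t bytes_for_copy(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

Here size_of_literal() and bytes_for_copy("abc") both give 4, while size_of_pointer() gives sizeof(char *), which is unrelated to the string's length.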
Whenever you allocate memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed. So if you allocate for char *st = malloc (len + 1); characters, you do not want to iterate with the pointer st (e.g. no st++). Instead, declare a second pointer, char *p = st; and you are free to iterate with p.
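A minimal sketch of that two-pointer pattern, assuming a hypothetical helper named copy_by_cursor, might look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into a newly allocated block one
 * character at a time. Returns NULL if the allocation fails.
 * The caller frees the returned pointer. */
char *copy_by_cursor(const char *src)
{
    assert(src != NULL);

    char *st = malloc(strlen(src) + 1);   /* +1 for the terminator */
    if (st == NULL)
        return NULL;

    char *p = st;                         /* iterate with p, never with st */
    while (*src != '\0')
        *p++ = *src++;
    *p = '\0';                            /* nul-terminate the copy */

    return st;                            /* still the start of the block */
}
```

Because st itself is never advanced, it remains valid to pass to free() when the copy is no longer needed.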
Also, in C, there is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?.
If you want to add to an allocation, you use realloc(), which resizes the block, moving it to a new location and copying your existing data if it cannot be extended in place. When using realloc(), you always reallocate using a temporary pointer (e.g. don't st = realloc (st, new_size);) because when realloc() fails, it returns NULL, and if you assign that to your pointer st, you have just lost the original pointer and created a memory leak. Instead, use a temporary pointer, e.g. void *tmp = realloc (st, new_size); then validate that realloc() succeeded before assigning st = tmp;
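The temporary-pointer rule can be wrapped in a small function. This is a sketch only; resize_block is a hypothetical name, not a library function:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resize the block *buf points to.
 * Returns 0 on success and updates *buf.
 * Returns -1 on failure and leaves *buf untouched, so the original
 * block is still valid and must still be freed by the caller. */
int resize_block(char **buf, size_t new_size)
{
    assert(buf != NULL && new_size > 0);

    void *tmp = realloc(*buf, new_size);  /* never assign straight to *buf */
    if (tmp == NULL)
        return -1;

    *buf = tmp;                           /* safe: realloc succeeded */
    return 0;
}
```

Any other pointer that pointed into the old block must be re-derived from *buf after a successful call, since the block may have moved.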
Now, reading between the lines, that is where you are going with your example. The following shows how it can be done, keeping track of the amount of memory allocated and the amount of memory used. When used + 1 == allocated, you reallocate more memory (remembering to keep 1 byte available for the nul-terminating character).
A short example would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define THISMANY 23
int main (void) {
char *s = "abc", *st, *p; /* string literal and pointer st */
size_t len = strlen(s), /* length of s */
allocated = len + 1, /* number of bytes in new block allocated */
used = 0; /* number of bytes in new block used */
st = malloc (allocated); /* allocate storage for copy of s */
p = st; /* pointer to allocate, preserve st */
if (!st) { /* validate EVERY allocation */
perror ("malloc-st");
return 1;
}
for (int i = 0; s[i]; i++) { /* copy s to new block of memory */
*p++ = s[i]; /* (could use strcpy) */
used++; /* advance counter */
}
*p = 0; /* nul-terminate copy */
for (size_t i = 0; i < THISMANY; i++) { /* loop THISMANY times */
if (used + 1 == allocated) { /* check if realloc needed (remember '\0') */
/* always realloc using temporary pointer */
void *tmp = realloc (st, 2 * allocated); /* realloc 2X current */
if (!tmp) { /* validate EVERY reallocation */
perror ("realloc-st");
break; /* don't exit, original st still valid */
}
st = tmp; /* assign reallocated block to st */
p = st + used; /* realloc may move the block, so re-derive p from st */
allocated *= 2; /* update allocated amount */
}
*p++ = 'a' + used++; /* assign new char, increment used */
}
*p = 0; /* nul-terminate */
printf ("result st : %s\n" /* output final string, length, allocated */
"length st : %zu bytes\n"
"final size : %zu bytes\n", st, strlen(st), allocated);
free (st); /* don't forget to free what you have allocated */
}
Example Use/Output
$ ./bin/sizeofs
result st : abcdefghijklmnopqrstuvwxyz
length st : 26 bytes
final size : 32 bytes
Look things over and let me know if this answered your questions, and if not, leave a comment and I'm happy to help further.
If you are still shaky on what a pointer is, and would like more information, here are a few links that provide basic discussions of pointers that may help. Difference between char pp and (char) p? and Pointer to pointer of structs indexing out of bounds(?)... (ignore the titles, the answers discuss pointer basics)
I have a question about reading a file character by character and counting it in C
here's my code down below
void read_in(char** quotes){
FILE *frp = fopen(IN_FILE, "r");
char c;
size_t tmp_len =0, i=0;
//char* tmp[100];
//char* quotes[MAX_QUOTES];
//char str = fgets(str, sizeof(quotes),frp);
while((c=fgetc(frp)) != EOF){
if(frp == NULL){
printf("File is empty!");
fclose(frp); exit(1);
}
else{
if(c != '\n'){
printf("%c",c);
c=fgetc(frp);
tmp_len++;
}
}
char* tmp = (char*)calloc(tmp_len+1, sizeof(char));
fgets(tmp, sizeof(tmp), frp);
strcpy((char*)quotes[i], tmp);
printf("%s\n", (char*)quotes[i]);
i++;
}
}
It doesn't work but I don't understand why.
Thank you
From your question and through the comments, it is relatively clear you want to read all quotes (lines) in a file into dynamically allocated storage (screen 1) and then sort the lines by length and output the first 5 shortest lines (screen 2) saving the 5 shortest lines to a second output file (this part is left to you). Reading and storing all lines from a file isn't difficult -- but it isn't trivial either. It sounds basic, and it is, but it requires that you use all of the basic tools needed to interface with persistent storage (reading the file from disk/storage media) and your computer's memory subsystem (RAM) -- correctly.
Reading each line from a file isn't difficult, but like anything in C, it requires you to pay attention to the details. You can read from a file using character-oriented input functions (fgetc(), getc(), etc..), you can use formatted-input functions (fscanf()) and you can use line-oriented input functions such as (fgets() or POSIX getline()). Reading lines from a file is generally done with line-oriented functions, but there is nothing wrong with using a character-oriented approach either. In fact you can relatively easily write a function based around fgetc() that will read each line from a file for you.
In the trivial case where you know the maximum number of characters for the longest line in the file, you can use a 2D array of characters to store the entire file. This simplifies the process by eliminating the need to allocate storage dynamically, but has a number of disadvantages like each line in the file requiring the same storage as the longest line in the file, and by limiting the size of the file that can be stored to the size of your program stack. Allocating storage dynamically with (malloc, calloc, or realloc) eliminates these disadvantages and inefficiencies allowing you to store files up to the limit of the memory available on your computer. (there are methods that allow both to handle files of any size by using sliding-window techniques well beyond your needs here)
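For comparison, the fixed-size approach could be sketched as below. The limit MAXC and the name read_fixed are assumptions made for this illustration, and fgets() is used here instead of fgetc() to keep it short:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXC 64   /* assumed maximum line length, including '\n' and '\0' */

/* Hypothetical helper: read up to maxlines lines from fp into a fixed
 * 2D array, removing the trailing newline from each. Returns the number
 * of rows filled. A line longer than MAXC - 1 characters is split
 * across several rows, which is one of the drawbacks of this approach. */
size_t read_fixed(FILE *fp, char lines[][MAXC], size_t maxlines)
{
    assert(fp != NULL && lines != NULL);

    size_t n = 0;
    while (n < maxlines && fgets(lines[n], MAXC, fp) != NULL) {
        lines[n][strcspn(lines[n], "\n")] = '\0';   /* trim the newline */
        n++;
    }
    return n;
}
```

Every row costs MAXC bytes regardless of how short the line is, which is the inefficiency that dynamic allocation avoids.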
There is nothing difficult about handling dynamically allocated memory, or in copying or storing data within it on a character-by-character basis. That said, the responsibility for each allocation, tracking the amount of data written to each allocated block, reallocating to resize the block to ensure no data is written outside the bounds of each block and then freeing each allocated block when it is no longer needed -- is yours, the programmer. C gives the programmer the power to use each byte of memory available, and also places on the programmer the responsibility to use the memory correctly.
The basic approach to storing a file is simple. You read each line from the file, allocating/reallocating storage for each character until a '\n' or EOF is encountered. To coordinate all lines, you allocate a block of pointers, and you assign the address for each block of memory holding a line to a pointer, in sequence, reallocating the number of pointers required as needed to hold all lines.
Sometimes a picture really is worth 1000 words. With the basic approach you declare a pointer to (what?) a pointer, so you can allocate a block of memory containing pointers, to each of which you will assign an allocated line. For example, you could declare char **lines; A pointer-to-pointer is a single pointer that points to a block of memory containing pointers. Each pointer in lines then has type char * and points to a block holding a line from the file, e.g.
char **lines;
|
|         allocated
|         pointers    allocated blocks holding each line
lines --> +----+     +-----+
          | p1 | --> | cat |
          +----+     +-----+--------------------------------------+
          | p2 | --> | Four score and seven years ago our fathers |
          +----+     +-------------+------------------------------+
          | p3 | --> | programming |
          +----+     +-------------------+
          | .. |     |       ...         |
          +----+     +-------------------+
          | pn | --> | last line read |
          +----+     +----------------+
You can make lines a bit more flexible to use by allocating 1 additional pointer and initializing that pointer to NULL which allows you to iterate over lines without knowing how many lines there are -- until NULL is encountered, e.g.
          | .. |     |       ...         |
          +----+     +-------------------+
          | pn | --> | last line read |
          +----+     +----------------+
          |pn+1|     | NULL |
          +----+     +------+
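With the sentinel in place, a caller can walk the block of pointers without being told how many lines it holds. A minimal sketch, using the hypothetical name count_lines:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the strings in a block of pointers that
 * ends with a NULL sentinel. The sentinel itself is not counted. */
size_t count_lines(char *const *lines)
{
    assert(lines != NULL);

    size_t n = 0;
    for (char *const *p = lines; *p != NULL; p++)   /* stop at sentinel */
        n++;
    return n;
}
```

This only works if the pointer after the last line really is NULL; an uninitialized or freed pointer in that slot would send the loop past the end of the block.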
While you can put this all together in a single function, to help the learning process (and just for practical reusability), it is often easier to break this up into two functions: one that reads and allocates storage for each line, and a second that calls the first, allocating pointers and assigning the address of each allocated block holding a line read from the file to the next pointer in turn. When you are done, you have an allocated block of pointers where each of the pointers holds the address of (points to) an allocated block holding a line from the file.
You have indicated you want to read from the file with fgetc() and read a character at a time. There is nothing wrong with that, and there is little penalty to this approach since the underlying I/O subsystem provides a read-buffer that you are actually reading from rather than reading from disk one character at-a-time. (the size varies between compilers, but is generally provided through the BUFSIZ macro, both Linux and Windows compilers provide this)
There are virtually an unlimited number of ways to write a function that allocates storage to hold a line and then reads a line from the file one character at-a-time until a '\n' or EOF is encountered. You can return a pointer to the allocated block holding the line and pass a pointer parameter to be updated with the number of characters contained in the line, or you can have the function return the line length and pass the address-of a pointer as a parameter to be allocated and filled within the function. It is up to you. One way would be:
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
...
/** read line from 'fp' stored in allocated block assigned to '*s' and
* return length of string stored on success, on EOF with no characters
* read, or on failure, return -1. Block of memory sized to accommodate
* exact length of string with nul-terminating char. unless -1 returned,
* *s guaranteed to contain nul-terminated string (empty-string allowed).
* caller responsible for freeing allocated memory.
*/
ssize_t fgetcline (char **s, FILE *fp)
{
int c; /* char read from fp */
size_t n = 0, size = LINSZ; /* no. of chars and allocation size */
void *tmp = realloc (NULL, size); /* tmp pointer for realloc use */
if (!tmp) /* validate every allocation/reallocation */
return -1;
*s = tmp; /* assign reallocated block to pointer */
while ((c = fgetc(fp)) != '\n' && c != EOF) { /* read chars until \n or EOF */
if (n + 1 == size) { /* check if realloc required */
/* realloc using temporary pointer */
if (!(tmp = realloc (*s, size + LINSZ))) {
free (*s); /* on failure, free partial line */
*s = NULL; /* don't leave a dangling pointer with the caller */
return -1; /* return -1 */
}
*s = tmp; /* assign reallocated block to pointer */
size += LINSZ; /* update allocated size */
}
(*s)[n++] = c; /* assign char to index, increment */
}
(*s)[n] = 0; /* nul-terminate string */
if (n == 0 && c == EOF) { /* if nothing read and EOF, free mem return -1 */
free (*s);
*s = NULL; /* keeps the caller's sentinel slot NULL instead of dangling */
return -1;
}
if ((tmp = realloc (*s, n + 1))) /* final realloc to exact length */
*s = tmp; /* assign reallocated block to pointer */
return (ssize_t)n; /* return length (excluding nul-terminating char) */
}
(note: the ssize_t is a signed type providing the range of size_t that essentially allows the return of -1. it is provided in the sys/types.h header. you can adjust the type as desired)
The fgetcline() function makes one final call to realloc to shrink the allocation to the exact number of characters needed to hold the line and the nul-terminating character.
The function called to read all lines in the file, allocating and reallocating pointers as required, does essentially the same thing that fgetcline() above does for characters. It allocates some initial number of pointers and then begins reading lines from the file, doubling the number of pointers each time more are needed. It also sets one additional pointer to NULL as a sentinel that allows iterating over all pointers until NULL is reached (this is optional). The parameter n is updated with the number of lines stored to make that count available back in the calling function. This function too can be written in a number of different ways; one would be:
/** read each line from `fp` and store in allocated block returning pointer to
* allocated block of pointers to each stored line with the final pointer
* after the last stored string set to NULL as a sentinel. 'n' is updated to
* the number of allocated and stored lines (excluding the sentinel NULL).
* returns valid pointer on success, NULL otherwise. caller is responsible for
* freeing both allocated lines and pointers.
*/
char **readfile (FILE *fp, size_t *n)
{
size_t nptrs = LINSZ; /* no. of allocated pointers */
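char **lines = malloc (nptrs * sizeof *lines); /* allocated block of pointers */
void *tmp = NULL; /* temp pointer for realloc use */
if (!lines) { /* validate every allocation */
perror ("malloc-lines");
return NULL;
}
*n = 0; /* start the line count at zero */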
/* read each line from 'fp' into allocated block, assign to next pointer */
while (fgetcline (&lines[*n], fp) != -1) {
lines[++(*n)] = NULL; /* set next pointer NULL as sentinel */
if (*n + 1 >= nptrs) { /* check if realloc required */
/* allocate using temporary pointer to prevent memory leak on failure */
if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
perror ("realloc-lines");
return lines; /* return original pointer on failure */
}
lines = tmp; /* assign reallocated block to pointer */
nptrs *= 2; /* update no. of pointers allocated */
}
}
/* final realloc sizing exact no. of pointers required */
if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
return lines; /* return original block on failure */
return tmp; /* return updated block of pointers on success */
}
Note above, the function takes an open FILE* parameter for the file rather than taking a filename to open within the function. You generally want to open the file in the calling function and validate that it is open for reading before calling a function to read all the lines. If the file cannot be opened in the caller, there is no reason to make the function call to read the lines from the file to begin with.
With a way to read and store all lines from your file done, you next need to turn to sorting the lines by length so you can output the 5 shortest lines (quotes). Since you will normally want to preserve the lines from your file in order, the easiest way to sort the lines by length while preserving the original order is to make a copy of the pointers and sort that copy by line length. For example, your lines pointer can continue to contain the pointers in original order, while the set of pointers sortedlines can hold the pointers in order sorted by line length, e.g.
int main (int argc, char **argv) {
char **lines = NULL, /* pointer to allocated block of pointers */
**sortedlines = NULL; /* copy of lines pointers to sort by length */
After reading the file and filling the lines pointer, you can copy the pointers to sortedlines (including the sentinel NULL), e.g.
/* allocate storage for copy of lines pointers (plus sentinel NULL) */
if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
perror ("malloc-sortedlines");
return 1;
}
/* copy pointers from lines to sorted lines (plus sentinel NULL) */
memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
Then you simply call qsort to sort the pointers in sortedlines by length. Your only job with qsort is to write the compare function. The prototype for the compare function is:
int compare (const void *a, const void *b);
Both a and b will be pointers-to elements being sorted. In your case with char **sortedlines;, the elements will be pointer-to-char, so a and b will both have type pointer-to-pointer to char. You simply write a compare function so it will return less than zero if the length of the line pointed to by a is less than that of b (already in the right order), return zero if the lengths are the same (no action needed) and return greater than zero if the length of a is greater than that of b (a swap is required). Writing the compare as the difference of two conditionals rather than a simple a - b prevents any potential overflow, e.g.
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
/* a & b are pointer-to-pointer to char */
char *pa = *(char * const *)a, /* pa is pointer to string */
*pb = *(char * const *)b; /* pb is pointer to string */
size_t lena = strlen(pa), /* length of pa */
lenb = strlen(pb); /* length of pb */
/* for numeric types returning result of (a > b) - (a < b) instead
* of result of a - b avoids potential overflow. returns -1, 0, 1.
*/
return (lena > lenb) - (lena < lenb);
}
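The same idiom applies to any numeric type. As a sketch with int values (compare_int and sort_ints are hypothetical names used only here), where x - y would overflow for operands such as INT_MAX and INT_MIN:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* qsort compare for int that never subtracts its operands, so it
 * cannot overflow. Returns -1, 0 or 1. */
int compare_int(const void *a, const void *b)
{
    int x = *(const int *)a,
        y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Hypothetical wrapper: sort n ints in ascending order. */
void sort_ints(int *v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof *v, compare_int);
}
```

With x = INT_MAX and y = INT_MIN the subtraction x - y is signed overflow, which is undefined behavior, while the two comparisons are always well defined.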
Now you can simply pass the collection of objects, the number of objects, the size of each object and the function to use to sort the objects to qsort. It doesn't matter what you need to sort -- it works the same way every time. There is no reason you should ever need to "go write" a sort (except for educational purposes) -- that is what qsort is provided for. For example, here with sortedlines, all you need is:
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
Now you can display all lines by iterating through lines and display all lines in ascending line length through sortedlines. Obviously to display the first 5 lines, just iterate over the first 5 valid pointers in sortedlines. The same applies to opening another file for writing and writing those 5 lines to a new file. (that is left to you)
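The part left as an exercise could take many forms. One possible sketch, assuming a hypothetical helper named write_first that takes an already opened output stream and the NULL-terminated sortedlines block, is:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write at most `limit` strings from a
 * NULL-terminated block of pointers to `out`, one per line.
 * Stops early at the sentinel or on a write error.
 * Returns the number of lines written. */
size_t write_first(FILE *out, char *const *sorted, size_t limit)
{
    assert(out != NULL && sorted != NULL);

    size_t i = 0;
    while (i < limit && sorted[i] != NULL) {
        if (fprintf(out, "%s\n", sorted[i]) < 0)   /* write failed */
            break;
        i++;
    }
    return i;
}
```

A caller would open the output file, validate the open, call something like write_first (fp, sortedlines, NSHORT), and then check the return of fclose() to catch errors that only surface when the stream is flushed.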
That's it. Is any of it difficult -- No. Is it trivial to do -- No. It is a basic part of programming in C that takes work to learn and to understand, but that is no different than anything worth learning. Putting all the pieces together in a working program to read and display all lines in a file and then sort and display the first 5 shortest lines you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#define NSHORT 5 /* no. of shortest lines to display */
#define LINSZ 128 /* initial allocation size for each line */
/** compare function for qsort, takes pointer-to-element in a & b */
int complength (const void *a, const void *b)
{
/* a & b are pointer-to-pointer to char */
char *pa = *(char * const *)a, /* pa is pointer to string */
*pb = *(char * const *)b; /* pb is pointer to string */
size_t lena = strlen(pa), /* length of pa */
lenb = strlen(pb); /* length of pb */
/* for numeric types returning result of (a > b) - (a < b) instead
* of result of a - b avoids potential overflow. returns -1, 0, 1.
*/
return (lena > lenb) - (lena < lenb);
}
/** read line from 'fp' stored in allocated block assigned to '*s' and
* return length of string stored on success, on EOF with no characters
* read, or on failure, return -1. Block of memory sized to accommodate
* exact length of string with nul-terminating char. unless -1 returned,
* *s guaranteed to contain nul-terminated string (empty-string allowed).
* caller responsible for freeing allocated memory.
*/
ssize_t fgetcline (char **s, FILE *fp)
{
int c; /* char read from fp */
size_t n = 0, size = LINSZ; /* no. of chars and allocation size */
void *tmp = realloc (NULL, size); /* tmp pointer for realloc use */
if (!tmp) /* validate every allocation/reallocation */
return -1;
*s = tmp; /* assign reallocated block to pointer */
while ((c = fgetc(fp)) != '\n' && c != EOF) { /* read chars until \n or EOF */
if (n + 1 == size) { /* check if realloc required */
/* realloc using temporary pointer */
if (!(tmp = realloc (*s, size + LINSZ))) {
free (*s); /* on failure, free partial line */
*s = NULL; /* don't leave a dangling pointer with the caller */
return -1; /* return -1 */
}
*s = tmp; /* assign reallocated block to pointer */
size += LINSZ; /* update allocated size */
}
(*s)[n++] = c; /* assign char to index, increment */
}
(*s)[n] = 0; /* nul-terminate string */
if (n == 0 && c == EOF) { /* if nothing read and EOF, free mem return -1 */
free (*s);
*s = NULL; /* keeps the caller's sentinel slot NULL instead of dangling */
return -1;
}
if ((tmp = realloc (*s, n + 1))) /* final realloc to exact length */
*s = tmp; /* assign reallocated block to pointer */
return (ssize_t)n; /* return length (excluding nul-terminating char) */
}
/** read each line from `fp` and store in allocated block returning pointer to
* allocated block of pointers to each stored line with the final pointer
* after the last stored string set to NULL as a sentinel. 'n' is updated to
* the number of allocated and stored lines (excluding the sentinel NULL).
* returns valid pointer on success, NULL otherwise. caller is responsible for
* freeing both allocated lines and pointers.
*/
char **readfile (FILE *fp, size_t *n)
{
size_t nptrs = LINSZ; /* no. of allocated pointers */
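char **lines = malloc (nptrs * sizeof *lines); /* allocated block of pointers */
void *tmp = NULL; /* temp pointer for realloc use */
if (!lines) { /* validate every allocation */
perror ("malloc-lines");
return NULL;
}
*n = 0; /* start the line count at zero */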
/* read each line from 'fp' into allocated block, assign to next pointer */
while (fgetcline (&lines[*n], fp) != -1) {
lines[++(*n)] = NULL; /* set next pointer NULL as sentinel */
if (*n + 1 >= nptrs) { /* check if realloc required */
/* allocate using temporary pointer to prevent memory leak on failure */
if (!(tmp = realloc (lines, 2 * nptrs * sizeof *lines))) {
perror ("realloc-lines");
return lines; /* return original pointer on failure */
}
lines = tmp; /* assign reallocated block to pointer */
nptrs *= 2; /* update no. of pointers allocated */
}
}
/* final realloc sizing exact no. of pointers required */
if (!(tmp = realloc (lines, (*n + 1) * sizeof *lines)))
return lines; /* return original block on failure */
return tmp; /* return updated block of pointers on success */
}
/** free all allocated memory (both lines and pointers) */
void freelines (char **lines, size_t nlines)
{
for (size_t i = 0; i < nlines; i++) /* loop over each pointer */
free (lines[i]); /* free allocated line */
free (lines); /* free pointers */
}
int main (int argc, char **argv) {
char **lines = NULL, /* pointer to allocated block of pointers */
**sortedlines = NULL; /* copy of lines pointers to sort by length */
size_t n = 0; /* no. of pointers with allocated lines */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
if (!(lines = readfile (fp, &n))) /* read all lines in file, fill lines */
return 1;
if (fp != stdin) /* close file if not stdin */
fclose (fp);
/* allocate storage for copy of lines pointers (plus sentinel NULL) */
if (!(sortedlines = malloc ((n + 1) * sizeof *sortedlines))) {
perror ("malloc-sortedlines");
return 1;
}
/* copy pointers from lines to sorted lines (plus sentinel NULL) */
memcpy (sortedlines, lines, (n + 1) * sizeof *sortedlines);
qsort (sortedlines, n, sizeof *sortedlines, complength); /* sort by length */
/* output all lines from file (first screen) */
puts ("All lines:\n\nline : text");
for (size_t i = 0; i < n; i++)
printf ("%4zu : %s\n", i + 1, lines[i]);
/* output first five shortest lines (second screen) */
puts ("\n5 shortest lines:\n\nline : text");
for (size_t i = 0; i < (n >= NSHORT ? NSHORT : n); i++)
printf ("%4zu : %s\n", i + 1, sortedlines[i]);
freelines (lines, n); /* free all allocated memory for lines */
free (sortedlines); /* free block of pointers */
}
(note: the program reads from the file named by the first argument, or from stdin if no argument is given)
Example Input File
$ cat dat/fleascatsdogs.txt
My dog
My fat cat
My snake
My dog has fleas
My cat has none
Lucky cat
My snake has scales
Example Use/Output
$ ./bin/fgetclinesimple dat/fleascatsdogs.txt
All lines:
line : text
1 : My dog
2 : My fat cat
3 : My snake
4 : My dog has fleas
5 : My cat has none
6 : Lucky cat
7 : My snake has scales
5 shortest lines:
line : text
1 : My dog
2 : My snake
3 : Lucky cat
4 : My fat cat
5 : My cat has none
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900== Memcheck, a memory error detector
==5900== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5900== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5900== Command: ./bin/fgetclinesimple dat/fleascatsdogs.txt
==5900==
All lines:
line : text
1 : My dog
2 : My fat cat
3 : My snake
4 : My dog has fleas
5 : My cat has none
6 : Lucky cat
7 : My snake has scales
5 shortest lines:
line : text
1 : My dog
2 : My snake
3 : Lucky cat
4 : My fat cat
5 : My cat has none
==5900==
==5900== HEAP SUMMARY:
==5900== in use at exit: 0 bytes in 0 blocks
==5900== total heap usage: 21 allocs, 21 frees, 7,938 bytes allocated
==5900==
==5900== All heap blocks were freed -- no leaks are possible
==5900==
==5900== For counts of detected and suppressed errors, rerun with: -v
==5900== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
There is a lot here, and as with any "how do I do X?" question, the devil is always in the detail: the proper use of each function and the proper validation of each input or allocation/reallocation. Each part is just as important as the other to ensure your code does what you need it to do -- in a defined way. Look things over, take your time to digest the parts, and let me know if you have further questions.
If you are using Linux (or another system that provides POSIX getline), you can use getline instead of fgetc and fgets, because getline allocates and grows the line buffer for you.
Example:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read;
if (argc != 2)
{
printf("usage: rf <filename>\n");
exit(EXIT_FAILURE);
}
fp = fopen(argv[1], "r");
if (fp == NULL)
{
perror("fopen");
exit(EXIT_FAILURE);
}
while ((read = getline(&line, &len, fp)) != -1) {
printf("Retrieved line of length %zd :\n", read); /* read is ssize_t, so use %zd */
printf("%s", line);
}
free(line);
fclose(fp);
exit(EXIT_SUCCESS);
}
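If you want to keep the lines rather than just print them, the same getline loop can hand each buffer over to a growing array of pointers. The following is only a minimal sketch under stated assumptions: the function name getlines and the starting capacity of 8 are made up for illustration, and each stored line keeps its trailing newline.

```c
#define _POSIX_C_SOURCE 200809L  /* request getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of fp into an allocated array of
 * pointers. Stores the number of lines in *n. Returns NULL on allocation
 * failure; the caller frees each line and then the array itself. */
char **getlines (FILE *fp, size_t *n)
{
    size_t cap = 8, len = 0;               /* assumed starting capacity */
    char *line = NULL;
    char **lines = malloc (cap * sizeof *lines);

    *n = 0;
    if (!lines)
        return NULL;

    while (getline (&line, &len, fp) != -1) {
        if (*n == cap) {                   /* array full: double it */
            void *tmp = realloc (lines, 2 * cap * sizeof *lines);
            if (!tmp) {                    /* on failure free everything */
                while (*n)
                    free (lines[--(*n)]);
                free (lines);
                free (line);
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }
        lines[(*n)++] = line;              /* the array now owns this buffer */
        line = NULL;                       /* make getline allocate a new one */
        len = 0;
    }
    free (line);                           /* buffer left over from the final call */
    return lines;
}
```

Resetting line to NULL and len to 0 after each store is what makes getline allocate a fresh buffer on the next call instead of overwriting the line you just saved.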
How do I copy the values in an array of char pointers into another array? I am really lost please help me out. This is what I have so far.
char **copy_values(char **values) {
char **val;
//do something
return val;
}
To fully copy the pointers and values (what they point to), you need to know three things:
1. How many pointers do I have?
2. Do I need to also copy the things they point to? And if so,
3. How many of my pointers point to something I need to copy?
When you declare a pointer to pointer to char, it is simply an uninitialized pointer. To provide the ability to address more than a single address, you then allocate some needed number of pointers (say MAXPTRS). For instance:
#define MAXPTRS 32
...
char **values = NULL;
values = malloc (MAXPTRS * sizeof *values); /* allocate MAXPTRS pointers */
if (values == NULL) { /* validate/handle error */
perror ("malloc-values");
/* handle error */
}
You now have MAXPTRS (32) pointers to work with. If you then allocate storage for say 8 of them, e.g.
size_t count = 0;
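/* buf is a char array declared elsewhere; stop before running out of pointers */
while (count < MAXPTRS && fgets (buf, sizeof buf, stdin)) {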
size_t len = strlen (buf);
values[count] = malloc (len + 1);
if (!values[count]) {
perror ("malloc-values[count]");
break;
}
memcpy (values[count++], buf, len + 1);
}
So at this point in the example, you have 32 pointers allocated and 8 of those pointers pointing to blocks of memory holding strings read from stdin.
If you then want to copy the entire structure, you must not only copy the original 32 pointers, but also the 8 blocks of memory the initialized pointers point to. If you skip copying the blocks and simply assign pointers (e.g. val[i] = values[i]), both arrays share the same storage: any change to the string values[2] points to is also visible through val[2] (this may, or may not, be what you want).
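The sharing that results from plain pointer assignment can be seen in a few lines. This is just an illustrative sketch; the function name shallow_copy_aliases and the two-element arrays are invented for the demonstration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: returns 1 if a change made through the original
 * pointer is visible through a shallow copy, 0 if not, -1 on failure. */
int shallow_copy_aliases (void)
{
    char *values[2] = { NULL, NULL };
    char *val[2] = { NULL, NULL };
    int aliased;

    values[0] = malloc (8);             /* room for "My dog" plus NUL */
    if (!values[0])
        return -1;
    strcpy (values[0], "My dog");

    val[0] = values[0];                 /* shallow: copies the address only */
    values[0][3] = 'h';                 /* change via the original: "My hog" */

    aliased = strcmp (val[0], "My hog") == 0;

    free (values[0]);                   /* one block, so exactly one free */
    return aliased;
}
```

Because there is only one block of memory, it must be freed only once, and after the free neither values[0] nor val[0] may be used.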
To fully copy your values pointer to pointer to char and return val containing the same number of pointers with a copy of each string allocated and contained in the original values array of pointers, you would need a "deep copy" where you copy not only the pointers, but duplicate the contents as well, e.g.
char **copystrings (const char **values, size_t nptrs, size_t filled)
{
char **val = NULL;
val = malloc (nptrs * sizeof *val); /* allocate nptrs pointers */
if (!val) { /* validate */
perror ("malloc-val");
return NULL;
}
for (size_t i = 0; i < filled; i++) { /* loop over each filled ptr */
size_t len = strlen (values[i]); /* get length of string */
val[i] = malloc (len + 1); /* allocate storage for val[i] */
if (!val[i]) { /* validate */
perror ("malloc-val[i]");
break;
}
memcpy (val[i], values[i], len + 1); /* copy to val[i] */
}
return val; /* return val (may contain less than filled allocated) */
}
(note: not compiled, also it would be advisable to pass filled as a pointer and update filled with the value of i before returning to provide a means of validating that all blocks of memory for filled pointers were duplicated. Otherwise, a malloc failure of val[i] would result in less than filled being allocated and copied and no way to tell -- unless you used calloc to zero the new memory when you allocate for val)
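A sketch of the variant described in the note might look like the following. The name copystrings_counted is hypothetical, and the parameter is declared char *const * (rather than const char **) as an assumption, so that an ordinary char ** can be passed without a cast.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of copystrings. On entry *filled is the number of
 * strings in values; on return it is the number actually duplicated.
 * calloc zeroes the block, so every unused pointer in val is NULL. */
char **copystrings_counted (char *const *values, size_t nptrs, size_t *filled)
{
    size_t i;
    char **val;

    if (!values || !filled || *filled > nptrs)  /* reject bad arguments */
        return NULL;

    val = calloc (nptrs ? nptrs : 1, sizeof *val);
    if (!val) {
        perror ("calloc-val");
        *filled = 0;
        return NULL;
    }

    for (i = 0; i < *filled; i++) {
        size_t len = strlen (values[i]);        /* length of original string */
        val[i] = malloc (len + 1);
        if (!val[i]) {
            perror ("malloc-val[i]");
            break;                              /* i holds the count copied */
        }
        memcpy (val[i], values[i], len + 1);    /* copy including the NUL */
    }

    *filled = i;                                /* report how many were copied */
    return val;
}
```

The caller can then compare the updated filled against the count it passed in to learn whether every string was duplicated.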
A deep copy allows you to modify the copied values without altering the data at the addresses pointed to by the pointers in values. If you are not modifying the data (e.g. you simply want to sort the pointers with qsort), then there is no need for a "deep copy" and you need only allocate for the filled number of pointers and assign the address from values to val.
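For the sort-only case, a shallow copy followed by qsort could be sketched like this. The names cmp_len and sorted_by_length are made up for illustration, and ordering by string length is just one possible comparison.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparison: order two strings by length, shortest first.
 * qsort passes pointers to the array elements, i.e. pointers to char *. */
static int cmp_len (const void *a, const void *b)
{
    size_t la = strlen (*(char *const *)a),
           lb = strlen (*(char *const *)b);

    return (la > lb) - (la < lb);
}

/* Hypothetical helper: return a shallow copy of the first 'filled'
 * pointers, sorted by string length. Only the returned block is new;
 * the strings are still owned by the original array. */
char **sorted_by_length (char *const *values, size_t filled)
{
    char **val = malloc ((filled ? filled : 1) * sizeof *val);

    if (!val)
        return NULL;
    if (filled)
        memcpy (val, values, filled * sizeof *val);   /* copy addresses only */
    qsort (val, filled, sizeof *val, cmp_len);

    return val;
}
```

When you are done, free only the returned block of pointers; the strings themselves are freed through the original array.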
Understanding what you need and the differences in how you go about achieving it is key.
(thanks to #4386427 for catching a couple of omissions)
How do I copy the values in an array of char pointers into another array?
Something like:
char **copy_values(char **values, int size) {
char **val = malloc(size * sizeof *val);
if (val) memcpy(val, values, size * sizeof *val);
return val;
}
Remember to free the allocated memory when you are done with it. Note that this copies only the pointers, not the strings they point to.
When I compile and run this code, I get an error. The error message is:
realloc(): invalid next size: 0x0000000002119010
The file input has about 4000 words.
I debugged it, but I can not find any problem.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#define BUF_LEN 10 //buf length
int main(int argc, char *argv[])
{
int file_d; //file descriptor
char *ct_head; //content_first_char
char *content;
ssize_t read_len = BUF_LEN; //had read length
int mem_block_count = 0;
file_d = open("input", O_RDONLY);
if (file_d < 0)
{
perror("open");
exit (1);
}
content = ct_head = (char *) malloc(sizeof(char) * BUF_LEN);
mem_block_count = 1;
while (read_len == BUF_LEN)
{
read_len = read(file_d, ct_head, BUF_LEN);
if (read_len < 0)
{
perror("read");
exit(2);
}
if (read_len == BUF_LEN)
{
ct_head = (char *)realloc(content, sizeof(char) *(++mem_block_count));
ct_head = &content[(mem_block_count-1) * BUF_LEN];
}
else
ct_head[read_len] = '\0';
}
printf("%s", content);
close(file_d);
free(content);
return 0;
}
I'm not sure what your problem is but these lines:
ct_head = (char *)realloc(content, sizeof(char) *(++mem_block_count));
ct_head = &content[(mem_block_count-1) * BUF_LEN];
are very dodgy. After the first line, ct_head points to the reallocated block and content may be dangling (if realloc moved the block, the old address is no longer valid). The second line then indexes through content and overwrites ct_head, losing the only pointer to the reallocated memory.
I suspect you may just have memory corruption in your program?
I think
ct_head = (char *)realloc(content, sizeof(char) *(++mem_block_count));
should be:
content = (char *)realloc(content, sizeof(char) *(++mem_block_count) * BUF_LEN);
if (content == NULL)
{
// do something if the realloc fails
}
The first time you use realloc, it allocates only 2 bytes because, going into the call, sizeof(char) is 1 and mem_block_count is also 1 (and it is then pre-incremented to 2).
This means the next read will overrun the buffer it has allocated. I suspect you need to also multiply by BUF_LEN in your realloc.
Edit
I've just realised, it's even worse: straight after allocating content to 2 bytes, you set ct_head to BUF_LEN bytes beyond the start of content. This means your read overwrites an area that is totally outside the buffer.
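Putting those fixes together, a corrected read loop might look like the sketch below. It is not the asker's code: the function name read_all is hypothetical, it keeps reading until read returns 0 instead of stopping at the first short read, and it does not retry on EINTR.

```c
#include <stdlib.h>
#include <unistd.h>

#define BUF_LEN 10  /* block size, as in the question */

/* Hypothetical helper: read everything from fd into one NUL-terminated
 * buffer. Stores the byte count in *len. Returns NULL on failure; the
 * caller frees the returned buffer. */
char *read_all (int fd, size_t *len)
{
    size_t cap = BUF_LEN, used = 0;
    char *content = malloc (cap + 1);          /* +1 for the terminator */

    if (!content)
        return NULL;

    for (;;) {
        ssize_t n = read (fd, content + used, cap - used);

        if (n < 0) {                           /* read error */
            free (content);
            return NULL;
        }
        if (n == 0)                            /* end of file */
            break;

        used += (size_t)n;
        if (used == cap) {                     /* full: grow by one block */
            char *tmp = realloc (content, cap + BUF_LEN + 1);
            if (!tmp) {                        /* original block still valid */
                free (content);
                return NULL;
            }
            content = tmp;
            cap += BUF_LEN;
        }
    }

    content[used] = '\0';
    *len = used;
    return content;
}
```

The write position is always computed as content + used after any realloc, so there is no second pointer into the block that can go stale, and the size passed to realloc is in bytes rather than in blocks.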
You've already got some good answers here, but here's a little bit of advice that won't fit in a comment. In C, sizeof(char) is always 1. It is 1 by definition, so it is as redundant as using (1 * BUF_LEN). Also in C, you don't have to [and shouldn't] cast the result of malloc. Doing so can mask a fundamental error (see note 1 below).
If you want to allocate space depending on the size of a type, use:
ct_head = malloc(sizeof(*ct_head) * BUF_LEN);
That way, if the type of ct_head changes, you'll still be allocating enough space, without having to change all the calls to malloc and realloc.
1. If you do not include the appropriate header for malloc, then a C89 compiler will assume that malloc returns int (C99 and later no longer allow implicit function declarations), and this may cause issues on platforms where the size of an int differs from the size of a pointer type. Also, conversions from integer types to pointer types are implementation-defined.