I'm new to C and I'm reading "The C Programming Language" by K&R to learn it. I had a question about this example function appearing on pg 109 of the 2nd edition:
/* readlines: read input lines */
int readlines(char *lineptr[], int maxlines)
{
int len, nlines;
char *p, line[MAXLEN];
nlines = 0;
while ((len = getline(line, MAXLEN)) > 0)
if (nlines >= maxlines || (p = alloc(len)) == NULL)
return -1;
else {
line[len-1] = '\0'; /* delete newline */
strcpy(p, line);
lineptr[nlines++] = p;
}
return nlines;
}
I was wondering why *p is at all necessary here? p is allocated memory and then line is copied into it. Why can't just line be used, so at the end lineptr[nlines++] = p could be replaced by lineptr[nlines++] = line.
If you don't allocate memory for each line, you'll end up with lineptr being an array full of pointers to just the last line you read (not to mention to stack memory which is likely to be overwritten). Allocating memory for each line as you read makes the returned array make sense. As an example, let's say that line happens to get allocated on the stack at address 0x1000. If you make your suggested change, the resulting lineptr array for an 8 line file would be:
0x1000, 0x1000, 0x1000, 0x1000, 0x1000, 0x1000, 0x1000, 0x1000
Yowch! Allocating memory for each line as you read it, and then copying the line into that allocated memory is the only solution.
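To see the difference concretely, here is a minimal sketch. The helper names store_shared and store_copies are hypothetical, malloc stands in for K&R's alloc, and the shared buffer is made static only so its address can still be inspected after the call.

```c
#include <stdlib.h>
#include <string.h>

#define MAXLEN 100

/* Wrong: every slot receives the address of the same buffer. */
int store_shared(char *lineptr[], const char *src[], int n)
{
    static char line[MAXLEN];   /* static only so the pointers can be inspected later */
    for (int i = 0; i < n; i++) {
        strncpy(line, src[i], MAXLEN - 1);
        line[MAXLEN - 1] = '\0';
        lineptr[i] = line;      /* same address on every iteration */
    }
    return n;
}

/* Right: each slot receives its own freshly allocated copy. */
int store_copies(char *lineptr[], const char *src[], int n)
{
    for (int i = 0; i < n; i++) {
        char *p = malloc(strlen(src[i]) + 1);
        if (p == NULL) {
            while (i > 0)       /* release the copies made so far */
                free(lineptr[--i]);
            return -1;
        }
        strcpy(p, src[i]);
        lineptr[i] = p;
    }
    return n;
}
```

After store_shared, every element compares equal and shows only the last string; after store_copies, each element has its own address and its own contents, and the caller is responsible for freeing them.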
lineptr[nlines++] = line;
would fill lineptr with pointers to memory that is local to that function, and that memory becomes invalid as soon as the function returns. The values of all the elements of the lineptr array would all be identical and equal to line.
So the allocation is necessary here. You really need to copy the contents of line into a newly allocated memory location that persists after the function has returned.
You'd need storage for each line to be allocated somewhere. You can't just capture the value of line as you suggest because that is a local variable which will be out of scope after the function returns (and in this particular example, it's overwritten on each iteration).
You could avoid having line by doing getline directly into the elements of lineptr (which you would allocate as you go), but you cannot get rid of p.
A new chunk of memory needs to be allocated for each line. The pointer p is the only handle we have to that memory. We assign lineptr[nlines++] = p so we can reference each chunk of memory (i.e. each line) as part of the array lineptr.
An assignment such as lineptr[nlines++] = p in C/C++ stores the address held in p in lineptr[nlines++]; no data is copied.
The address of line is always the same, so lineptr[nlines++] = line would make every lineptr[i] point to the same address.
Worse, once the function returns, line no longer exists, so every lineptr[i] then points to an invalid address.
Using p allocates new memory for each line and ensures that the address of that memory remains valid across function calls (until you free it).
Okay, so let's show you how to do the same thing without char *p... I am going to slightly modify the code.
/* readlines: read input lines */
#include <string.h>
/* put that include line below #include <stdio.h> if you don't already have this
string.h defines strdup() function which we use below */
int readlines(char *lineptr[], int maxlines)
{
int len, nlines;
char line[MAXLEN];
nlines = 0;
while ((len = getline(line, MAXLEN)) > 0) {
line[len-1] = '\0'; /* delete newline */
lineptr[nlines] = strdup(line); /* allocate memory and make a copy */
if (lineptr[nlines] == NULL) {
return -1;
}
nlines++;
if (nlines >= maxlines)
break;
}
return nlines;
}
This is the closest equivalent that does not use the temporary char *p.
While this is functionally correct, using the temporary variable char *p to hold and test each allocation makes the code cleaner to read and easier to follow for teaching purposes. It also shows the allocation of the string memory explicitly as a separate step, which is hidden inside strdup.
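For illustration, the two steps that strdup combines can be sketched as follows. The name my_strdup is hypothetical, and this is a rough equivalent under that assumption, not the actual library source.

```c
#include <stdlib.h>
#include <string.h>

/* my_strdup: hypothetical stand-in showing the two steps strdup combines */
char *my_strdup(const char *s)
{
    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *p = malloc(n);        /* step 1: allocate */
    if (p != NULL)
        memcpy(p, s, n);        /* step 2: copy, including the '\0' */
    return p;                   /* NULL on failure; the caller frees the result */
}
```

This is the same allocate-then-copy sequence that the K&R version spells out with p, alloc and strcpy.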
I am creating a function to load a Hash Table and I'm getting a segmentation fault if my code looks like this
bool load(const char *dictionary)
{
// initialize vars
char *line = NULL;
size_t len = 0;
unsigned int hashed;
//open file and check it
FILE *fp = fopen(dictionary, "r");
if (fp == NULL)
{
return false;
}
while (fscanf(fp, "%s", line) != EOF)
{
//create node
node *data = malloc(sizeof(node));
//clear memory if things go south
if (data == NULL)
{
fclose(fp);
unload();
return false;
}
//put data in node
//data->word = *line;
strcpy(data->word, line);
hashed = hash(line);
hashed = hashed % N;
data->next = table[hashed];
table[hashed] = data;
dictionary_size++;
}
fclose(fp);
return true;
}
However, if I replace
char *line = NULL; with char line[LENGTH + 1]; (where LENGTH is 45)
it works. What is going on? Aren't they "equivalent"?
When you do fscanf(fp, "%s", line) it'll try to read data into the memory pointed to by line - but char *line = NULL; does not allocate any memory.
When you do char line[LENGTH + 1]; you allocate an array of LENGTH + 1 chars.
Note that if a word in the file is longer than LENGTH your program will write out of bounds. Always use bounds checking operations.
Example:
while (fscanf(fp, "%45s", line) == 1) /* the width must be a literal in the format; 45 == LENGTH */
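A sketch of bounded reading with fscanf. The scanf family takes the maximum field width as a literal inside the format string, so this version builds the width from LENGTH with a stringizing macro. The helper name count_words and the STR macros are assumptions made for the example.

```c
#include <stdio.h>

#define LENGTH 45
#define STR_(x) #x
#define STR(x) STR_(x)          /* STR(LENGTH) expands to "45" */

/* count_words: hypothetical helper; counts whitespace-separated words in fp */
int count_words(FILE *fp)
{
    char line[LENGTH + 1];      /* room for LENGTH characters plus '\0' */
    int n = 0;

    /* "%45s" stores at most 45 characters and then appends the '\0' */
    while (fscanf(fp, "%" STR(LENGTH) "s", line) == 1)
        n++;
    return n;
}
```

With this format, a word longer than LENGTH is split across successive reads instead of overflowing the array.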
They are not equivalent.
In the first case char *line = NULL; you have a pointer-to-char which is initialised to NULL. When you call fscanf() it tries to write data to it and this will cause it to dereference the NULL pointer. Hence segfault.
One option to fix that would have been to allocate (malloc() and friends) the required memory first, check the pointer is not NULL (allocation failed) before using it. Then you would need to free() the resources once you no longer need the data.
In the second case char line[LENGTH +1] you have an array-of-char of size LENGTH + 1. This memory has been allocated for you on the stack (the compiler ensures this happens automatically for arrays), and the memory is only 'valid' for use during the lifetime of the function: once you return you must no longer use it. Now, when you pass the pointer to fscanf() (to the first element of the array in this case), fscanf() has a memory buffer to write to. As long as the buffer is large enough to hold the data being written this works correctly.
char *line = NULL;
Says "I want a variable named 'line' that can point to characters, but is not currently pointing to anything." The compiler will allocate memory that can hold a memory address, and will fill it with zero (or some other internal representation of "points to nothing").
char line[10];
Says "allocate memory for 10 characters, and I would like to use the name 'line' for the address of the first one". It does not allocate space to hold the memory address, because that's a constant, but it does allocate space for the characters (and does not initialize them).
Initializing a pointer to NULL doesn't allocate memory for an array. When you access memory through that pointer, you end up reading from or writing to a null pointer, which is not what you want. fscanf writes into the buffer you pass it, so the buffer must be allocated beforehand. If you want to use a pointer, then you ought to do:
char *line = malloc(LENGTH + 1);
When you declare an array, the compiler allocates the memory for it, not you. That is safer here, because memory you malloc is never freed for you: if you forget to call free, it leaks. Note that if you do use an array (which is a local variable in this case), it cannot be used by functions higher up on the call stack, because its memory is released upon return from the function.
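A minimal sketch of the heap-allocated variant, with the allocation check and the ownership rule made explicit. The helper name read_word is hypothetical; LENGTH is taken to be 45 as in the question.

```c
#include <stdio.h>
#include <stdlib.h>

#define LENGTH 45

/* read_word: hypothetical helper; returns a heap copy of the next word,
 * or NULL at end of input or on allocation failure */
char *read_word(FILE *fp)
{
    char *line = malloc(LENGTH + 1);
    if (line == NULL)
        return NULL;
    if (fscanf(fp, "%45s", line) != 1) {   /* 45 == LENGTH */
        free(line);                        /* nothing read: do not leak the buffer */
        return NULL;
    }
    return line;   /* still valid after return; the caller must free() it */
}
```

Unlike a local array, the returned buffer outlives the function, which is exactly why the caller has to free it.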
My understanding:
char * c means c points to nothing.
When I type "Hello World", c now points to the first address of "Hello World".
It should print H and e, but I got "Segmentation fault: 11" error.
Can anyone please enlighten me why and how char * c = NULL; is causing an error?
Thanks in advance!
#include <stdio.h>
int main(void)
{
char * c = NULL;
gets(c);
printf("%c, %c\n", c[0], c[1]);
return 0;
}
gets doesn't allocate memory. Your pointer is pointing to NULL, which cannot be written to, so when gets tries to write the first character there, you seg fault.
The solution is:
Use a stack or global array (char c[1000];) or a pointer to dynamically allocated memory (char *c = malloc(1000);), not a NULL pointer.
Never use gets, which is intrinsically broken/insecure (it can't limit the read to match the size of the available buffer); use fgets instead.
char *c = NULL; declares the pointer c initialized to NULL. It is a pointer to nowhere.
Recall that a pointer is just a variable that holds the address of something else as its value. Where you normally think of a variable holding an immediate value, such as int a = 5;, a pointer would simply hold the address where 5 is stored in memory, e.g. int *b = &a;. Before you can use a pointer to cause data to be stored in memory, the pointer must hold the address of (i.e. it must point to) the beginning of a valid block of memory that you have access to.
You can either provide that valid block of memory by assigning the address of an array to your pointer (where the pointer points to where the array is stored on the stack), or you can allocate a block of memory (using malloc, calloc or realloc) and assign the beginning address for that block to your pointer. (don't forget to free() what you allocate).
The simplest way is to declare a character array and then assign the address of its first element to your pointer (an array is converted to a pointer to its first element on access, so simply assigning the character array to your pointer is fine). For example, with the array buf providing the storage and the pointer p holding the address of the first character in buf, you could do:
#include <stdio.h>
#include <string.h> /* for strcspn & strlen */
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void)
{
char buf[MAXC], /* an array of MAXC chars */
*p = buf; /* a pointer to buf */
if (fgets (p, MAXC, stdin)) { /* read line from stdin */
p[strcspn (p, "\n")] = 0; /* trim \n by overwriting with 0 */
if (strlen (p) > 1) { /* validate at least 2-chars */
printf("%c, %c\n", p[0], p[1]); /* output them */
}
}
return 0;
}
(note: strcspn above simply returns the number of characters in your string up to the '\n' character, allowing you to overwrite the '\n' included by fgets() with '\0', which is numerically equivalent to 0)
Example Use/Output
$ ./bin/fgetsmin
Hello
H, e
Look things over and let me know if you have further questions.
I am writing a program where the input will be taken from stdin. The first input will be an integer which says the number of strings to be read from stdin.
I just read the string character-by-character into a dynamically allocated memory and displays it once the string ends.
But when the string is larger than the allocated size, I reallocate the memory using realloc. Even if I don't use memcpy, the program works. Is it undefined behavior to not use memcpy? The example Using Realloc in C does not use memcpy. So which one is the correct way to do it? And is my program shown below correct?
/* ss.c
* Gets number of input strings to be read from the stdin and displays them.
* Realloc dynamically allocated memory to get strings from stdin depending on
* the string length.
*/
#include <stdio.h>
#include <stdlib.h>
int display_mem_alloc_error();
enum {
CHUNK_SIZE = 31,
};
int display_mem_alloc_error() {
fprintf(stderr, "\nError allocating memory");
exit(1);
}
int main(int argc, char **argv) {
int numStr; //number of input strings
int curSize = CHUNK_SIZE; //currently allocated chunk size
int i = 0; //counter
int len = 0; //length of the current string
int c; //will contain a character
char *str = NULL; //will contain the input string
char *str_cp = NULL; //will point to str
char *str_tmp = NULL; //used for realloc
str = malloc(sizeof(*str) * CHUNK_SIZE);
if (str == NULL) {
display_mem_alloc_error();
}
str_cp = str; //store the reference to the allocated memory
scanf("%d\n", &numStr); //get the number of input strings
while (i != numStr) {
if (i >= 1) { //reset
str = str_cp;
len = 0;
}
c = getchar();
while (c != '\n' && c != '\r') {
*str = (char *) c;
printf("\nlen: %d -> *str: %c", len, *str);
str = str + 1;
len = len + 1;
*str = '\0';
c = getchar();
if (curSize/len == 1) {
curSize = curSize + CHUNK_SIZE;
str_tmp = realloc(str_cp, sizeof(*str_cp) * curSize);
if (str_tmp == NULL) {
display_mem_alloc_error();
}
memcpy(str_tmp, str_cp, curSize); // NB: seems to work without memcpy
printf("\nstr_tmp: %d", str_tmp);
printf("\nstr: %d", str);
printf("\nstr_cp: %d\n", str_cp);
}
}
i = i + 1;
printf("\nEntered string: %s\n", str_cp);
}
return 0;
}
/* -----------------
//input-output
gcc -o ss ss.c
./ss < in.txt
// in.txt
1
abcdefghijklmnopqrstuvwxyzabcdefghij
// output
// [..snip..]
Entered string:
abcdefghijklmnopqrstuvwxyzabcdefghij
-------------------- */
Thanks.
Your program is not quite correct. You need to remove the call to memcpy to avoid an occasional, hard to diagnose bug.
From the realloc man page
The realloc() function changes the size of the memory block pointed to
by ptr to size bytes. The contents will be unchanged in the range
from the start of the region up to the minimum of the old and new
sizes
So, you don't need to call memcpy after realloc. In fact, doing so is wrong because your previous heap cell may have been freed inside the realloc call. If it was freed, it now points to memory with unpredictable content.
C11 standard (PDF), section 7.22.3.4 paragraph 2:
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
So in short, the memcpy is unnecessary and indeed wrong. Wrong for two reasons:
If realloc has freed your previous memory, then you are accessing memory that is not yours.
If realloc has just enlarged your previous memory, you are giving memcpy two pointers that point to the same area. memcpy has a restrict qualifier on both its input pointers which means it is undefined behavior if they point to the same object. (Side note: memmove doesn't have this restriction)
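The points above can be sketched as a line reader that grows its buffer with realloc and never calls memcpy. The helper name read_line is hypothetical; CHUNK_SIZE mirrors the value in the question.

```c
#include <stdio.h>
#include <stdlib.h>

enum { CHUNK_SIZE = 31 };

/* read_line: hypothetical helper; reads one line from fp into a growing buffer.
 * Returns NULL on allocation failure or if EOF is hit before any character. */
char *read_line(FILE *fp)
{
    size_t cap = CHUNK_SIZE, len = 0;
    char *str = malloc(cap);
    int c;

    if (str == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {                 /* keep one byte for '\0' */
            char *tmp = realloc(str, cap + CHUNK_SIZE);
            if (tmp == NULL) {
                free(str);                    /* on failure the old block is still ours */
                return NULL;
            }
            str = tmp;                        /* contents already preserved: no memcpy */
            cap += CHUNK_SIZE;
        }
        str[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(str);
        return NULL;
    }
    str[len] = '\0';
    return str;
}
```

Writing through an index (str[len]) rather than a separately advanced pointer also avoids keeping a stale pointer into a block that realloc may have moved.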
realloc enlarges the memory reserved for your string. If it can enlarge the block without moving the data, the data stays in place. If it cannot, it allocates a larger block and copies the data from the previous block into it itself.
In short, it is normal that you don't have to call memcpy after realloc.
From the man page:
The realloc() function tries to change the size of the allocation pointed
to by ptr to size, and returns ptr. If there is not enough room to
enlarge the memory allocation pointed to by ptr, realloc() creates a new
allocation, copies as much of the old data pointed to by ptr as will fit
to the new allocation, frees the old allocation, and returns a pointer to
the allocated memory. If ptr is NULL, realloc() is identical to a call
to malloc() for size bytes. If size is zero and ptr is not NULL, a new,
minimum sized object is allocated and the original object is freed. When
extending a region allocated with calloc(3), realloc(3) does not guarantee
that the additional memory is also zero-filled.
I'm trying to reallocate 256 more bytes for the buffer on each loop iteration. In this buffer, I will store the data obtained from read().
Here is my code:
#define MAX_BUFFER_SIZE 256
//....
int sockfd = socket( ... );
char *buffer;
buffer = malloc( MAX_BUFFER_SIZE );
assert(NULL != buffer);
char *tbuf = malloc(MAX_BUFFER_SIZE);
char *p = buffer;
int size = MAX_BUFFER_SIZE;
while( read(sockfd, tbuf, MAX_BUFFER_SIZE) > 0 ) {
while(*tbuf) *p++ = *tbuf++;
size = size + MAX_BUFFER_SIZE; // it is the right size for it?
buffer = realloc(buffer, size);
assert(NULL != buffer);
}
printf("%s", buffer);
free(tbuf);
free(p);
free(buffer);
close(sockfd);
But the above code causes a segmentation fault. Where am I wrong?
These are the problems that are apparent to me:
Your realloc can modify the location to which buffer points. But you fail to modify p accordingly and it is left pointing into the previous buffer. That's clearly an error.
I see potential for another error in that the while loop need not terminate and could run off the end of the buffer. This is the most likely cause of your segmentation fault.
The way you use realloc is wrong. If the call to realloc fails then you can no longer free the original buffer. You should assign the return value of realloc to a temporary variable and check for errors before overwriting the buffer variable.
You should not call free on the pointer p. Since that is meant to point into the block owned by buffer, you call free on buffer alone.
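Putting these points together, a corrected read loop might look like the sketch below. The name read_all and the CHUNK constant are assumptions made for the example; it tracks an offset instead of a second pointer, assigns realloc's result to a temporary, and frees only the pointer that came from malloc or realloc. It does not retry on EINTR.

```c
#include <stdlib.h>
#include <unistd.h>

#define CHUNK 256

/* read_all: hypothetical helper; reads fd until EOF into a NUL-terminated
 * heap buffer. Returns NULL on error; the caller frees the result. */
char *read_all(int fd, size_t *out_len)
{
    size_t cap = CHUNK, used = 0;
    char *buf = malloc(cap + 1);              /* +1 for the final '\0' */

    if (buf == NULL)
        return NULL;
    for (;;) {
        ssize_t n = read(fd, buf + used, cap - used);
        if (n < 0) {                          /* read error */
            free(buf);
            return NULL;
        }
        if (n == 0)                           /* end of input */
            break;
        used += (size_t)n;                    /* read() reports how much it stored */
        if (used == cap) {                    /* full: grow by one chunk */
            char *tmp = realloc(buf, cap + CHUNK + 1);
            if (tmp == NULL) {
                free(buf);                    /* the original block is still valid */
                return NULL;
            }
            buf = tmp;
            cap += CHUNK;
        }
    }
    buf[used] = '\0';                         /* read() does not terminate the data */
    if (out_len != NULL)
        *out_len = used;
    return buf;
}
```

Because the position is kept as the offset used, nothing is left pointing into a block that realloc may have moved.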
Thing is read doesn't add a 0-terminator. So your inner while is undoubtedly stepping outside the allocated memory:
while(*tbuf) *p++ = *tbuf++;
Another problem is that you are freeing pointers you didn't receive from malloc. By the time you call free, you will have incremented both p and tbuf, so neither still holds the address that malloc returned.
The whole buffer allocation looks useless, as you're not actually using it anywhere.
When you use realloc on buffer, it is possible that the address of buffer is changed as a result of changing the size. Once that happens, p is no longer holding the correct address.
Also, towards the end, you are freeing both p and buffer even though they refer to the same allocation. You should only free one of them, namely buffer.
I'm trying to read a line from a file character by character and place the characters in a string; here's my code:
char *str = "";
size_t len = 1; /* I also count the terminating character */
char temp;
while ((temp = getc(file)) != EOF)
{
str = realloc(str, ++len * sizeof(char));
str[len-2] = temp;
str[len-1] = '\0';
}
The program crashes on the realloc line. If I move that line outside of the loop or comment it out, it doesn't crash. If I'm just reading the characters and then sending them to stdout, it all works fine (ie. the file is opened correctly). Where's the problem?
You can't realloc a pointer that wasn't generated with malloc in the first place.
You also have an off-by-one error that will give you some trouble.
Change your code to:
char *str = NULL; // realloc can be called with NULL
size_t len = 1; /* I also count the terminating character */
int temp; /* int rather than char, so EOF can be told apart from a valid character */
while ((temp = getc(file)) != EOF)
{
str = (char *)realloc(str, ++len * sizeof(char));
str[len-2] = temp;
str[len-1] = '\0';
}
Your issue is because you were calling realloc with a pointer to memory that was not allocated with either malloc or realloc which is not allowed.
From the realloc manpage:
realloc() changes the size of the memory block pointed to by ptr to size bytes.
The contents will be unchanged to the minimum of the old and new
sizes; newly allocated memory will be uninitialized. If ptr is NULL,
then the call is equivalent to malloc(size), for all values of size;
if size is equal to zero, and ptr is not NULL, then the call is
equivalent to free(ptr). Unless ptr is NULL, it must have been
returned by an earlier call to malloc(), calloc() or realloc(). If
the area pointed to was moved, a free(ptr) is done.
On a side note, you should really not grow the buffer one character at a time. Keep two counters, one for the buffer capacity and one for the number of characters used, and only grow the buffer when it is full. Otherwise, your algorithm will have really poor performance.
You can't realloc a string literal. Also, reallocing for every new char isn't a very efficient way of doing this. Look into getline, originally a GNU extension and now part of POSIX.1-2008.
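A sketch of the capacity/length approach described above. The helper name read_stream and the starting capacity of 16 are arbitrary choices for illustration; doubling is one common growth policy, not the only one.

```c
#include <stdio.h>
#include <stdlib.h>

/* read_stream: hypothetical helper; reads all of file into a NUL-terminated
 * heap string. Returns NULL on allocation failure; the caller frees the result. */
char *read_stream(FILE *file)
{
    size_t cap = 16, len = 0;     /* capacity vs. characters used */
    char *str = malloc(cap);
    int c;                        /* int, so EOF differs from every valid char */

    if (str == NULL)
        return NULL;
    while ((c = getc(file)) != EOF) {
        if (len + 1 == cap) {     /* full, keeping room for '\0' */
            char *tmp = realloc(str, cap * 2);
            if (tmp == NULL) {
                free(str);
                return NULL;
            }
            str = tmp;
            cap *= 2;
        }
        str[len++] = (char)c;
    }
    str[len] = '\0';
    return str;
}
```

With doubling, the number of realloc calls grows with the logarithm of the input size rather than with the input size itself.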