(C) realloc array modifies data _pointed_ by items [closed] - c

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
Hello,
A nice weird bug I feel like sharing ;-) Requires some preliminary explanations:
First, I have a type of strings PString which hold their size (and a hash value), followed by a flexible array member with the bytes. Here is the type and kind of constructor (the printfl statement at the end is debug):
typedef struct {
size_t size;
uint hash;
char bytes[];
} PString;
// offset from start of pstring struct to start of data bytes:
static const size_t PSTRING_OFFSET = sizeof(size_t) + sizeof(uint);
PString * pstring_struct (string str, size_t size, uint hash) {
// memory zone
char *mem = malloc(PSTRING_OFFSET + size * sizeof(char));
check_mem(mem);
// string data bytes:
memcpy(mem + PSTRING_OFFSET, str, size);
mem[PSTRING_OFFSET + size] = NUL;
// pstring struct:
PString * pstr = (PString *) mem;
pstr->size = size;
pstr->hash = hash;
printfl("*** str:'%s' (%u) --> pstr:'%s' (%u) 0x%X",
str, size, pstr->bytes, pstr->size, pstr); ///////////////////////
return pstr;
}
[Any comments on this construction are welcome: I'm not at all sure I'm doing things right here. It's the first time I've used flexible array members, and I could not find examples of using them in allocated structs.]
Second, those pstrings are stored in a string pool, meaning a set implemented as a hash table. As usual, "buckets" for collisions (after hash & modulo) are plain linked lists of cells, each holding a pstring pointer and a pointer to the next cell. The only special detail is that the cells themselves are stored in an array, instead of being allocated anywhere on the heap [1]. Hope the picture is clear. Here is the definition of Cell:
typedef struct SCell {
PString * pstr;
struct SCell * next;
} Cell;
All seemed to work fine, including a battery of tests of the pool itself. Now, when testing a pstring routine (search), I noticed that a string had changed. After some research, I guessed the problem was related to the pool growing, and finally narrowed the issue down to the growing of the array of cells (so, well before redistributing cells into lists). Here are the lines of debug prints around this growing, with a copy of the show_pool routine producing the output (it just shows the strings), and the output itself:
static void pool_grow (StringPool * pool, uint n_new) {
...
// Grow arrays:
show_pool(pool); /////////////////////
pool->cells = realloc(pool->cells, pool->n_cells * sizeof(Cell));
check_mem(pool->cells);
show_pool(pool); ////////////////////
...
static void show_pool (StringPool * pool) {
if (pool->n == 0) {
printfl("{}");
return;
}
printf("pool : {\"%s\"", pool->cells[0].pstr->bytes);
PString * pstr;
uint i;
for (i = 1; i < pool->n; i++) {
pstr = pool->cells[i].pstr;
printf(", \"%s\"", pstr->bytes);
}
printl("}");
}
// output:
pool : {"", "abc", "b", "abcXXXabcXXX"}
pool : {"", "abc", "b", "abcXXXabcXXXI"}
As you can see, the last string stored has an additional byte 'I'. Since all I do in between is call realloc, I find myself a bit stuck for further debugging, and thinking hard does not help throw light on this mystery. (Note that cells just hold pstring pointers, so how can growing the array of cells alter the string bytes?) Also, I'm baffled by the fact that there seems to be a quite convenient NUL just after the mysterious 'I', since printf halts there.
Thank you.
Can you help?
[1] There is no special reason for doing that here, with a string pool. I usually do that to get for free an ordered set or map, and in addition locality of reference. (The only overhead is that the array of cells must grow in addition to the array of buckets, but one can reduce the number of growings by predimensioning.)

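Since size doesn't include the null terminator, the allocation of PSTRING_OFFSET + size bytes has no room for it, so
mem[PSTRING_OFFSET + size] = NUL;
writes one byte past the end of the block, which is undefined behavior. Every other issue stems from this.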
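A minimal sketch of a constructor that reserves the extra byte, assuming the PString layout from the question. The name pstring_new is hypothetical, and the debug print and check_mem call are left out. It also sizes the allocation from sizeof(PString) instead of a hand-computed offset, so any padding the compiler inserts before bytes is accounted for.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size;
    unsigned int hash;
    char bytes[];               /* flexible array member */
} PString;

/* Hypothetical replacement for pstring_struct: allocates the header,
   `size` data bytes, and one more byte for the terminator. */
PString *pstring_new(const char *str, size_t size, unsigned int hash) {
    PString *pstr = malloc(sizeof(PString) + size + 1);
    if (pstr == NULL) {
        return NULL;            /* caller decides how to handle failure */
    }
    pstr->size = size;
    pstr->hash = hash;
    memcpy(pstr->bytes, str, size);
    pstr->bytes[size] = '\0';   /* now inside the allocated block */
    return pstr;
}
```

Writing through pstr->bytes rather than through a raw char pointer plus offset lets the compiler compute where the data starts.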

Related

C - Append strings until end of allocated memory

Let's consider following piece of code:
int len = 100;
char *buf = (char*)malloc(sizeof(char)*len);
printf("Appended: %s\n",struct_to_string(some_struct,buf,len));
Someone allocated some amount of memory to be filled with string data. The problem is that the string data taken from some_struct could be ANY length. So what I want is for the struct_to_string function to do the following:
Not allocate any memory itself (so buf has to be allocated outside of the function and passed in)
Inside struct_to_string I want to do something like:
char* struct_to_string(const struct type* some_struct, char* buf, int len) {
//it will be more like pseudo code to show the idea :)
char var1_name[] = "int l1";
buf += var1_name + " = " + some_struct->l1;
//when l1 is a int or some non char, I need to cast it
char var2_name[] = "bool t1";
buf += var2_name + " = " + some_struct->t1;
// buf+= (I mean appending function) should check if there is a place in a buf,
//if there is not it should fill buf with
//as many characters as possible (without writting to memory) and stop
//etc.
return buf;
}
Output should be like:
Appended: int l1 = 10 bool t1 = 20 //if there was good amount of memory allocated or
ex: Appended: int l1 = 10 bo //if there was not enough memory allocated
To sum up:
I need a function (or a couple of functions) that appends given strings to the base string without overwriting the base string;
does nothing when the base string's memory is full
I cannot use C++ libraries
Other things I could ask, though they are not so important right now:
Is there a way (in C) to iterate through a structure's member list to get the names, or at least the values without the names? (For example, iterating through a structure as if it were an array.)
I do not normally use C, but for now I'm obligated to do, so I have very basic knowledge.
(sorry for my English)
Edit:
Good way to solve that problem is shown in post below: stackoverflow.com/a/2674354/2630520
I'd say all you need is the standard strncat function defined in the string.h header.
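A small sketch of how strncat could be used for this, assuming buf already holds a NUL-terminated string and cap is its total size in bytes. The wrapper name append_bounded is hypothetical. The point to watch is that strncat's third argument is the maximum number of characters to copy from the source, not the size of the destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: appends src to the string in buf without using
   more than cap bytes in total. Returns the resulting string length. */
size_t append_bounded(char *buf, size_t cap, const char *src) {
    size_t used = strlen(buf);
    if (used + 1 < cap) {
        /* Room left for at least one character plus the terminator.
           strncat copies at most (cap - used - 1) characters and then
           always writes a terminating NUL. */
        strncat(buf, src, cap - used - 1);
    }
    return strlen(buf);
}
```

When the buffer is already full the call leaves it untouched, which matches the "do nothing when base string memory is full" requirement.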
About the 'iterate through structure variable list' part, I'm not exactly sure what you mean. If you're talking about iterating over the structure's members, the short answer is: you can't introspect C structs for free.
You need to know beforehand what structure type you're using, so that the compiler knows at what offset in memory it can find each member of your struct. Otherwise it's just an array of bytes like any other.
Don't hesitate to ask if I wasn't clear enough or if you want more details.
Good luck.
So basically I did it like here: stackoverflow.com/a/2674354/2630520
int struct_to_string(const struct struct_type* struct_var, char* buf, const int len)
{
unsigned int length = 0;
unsigned int i;
length += snprintf(buf+length, len-length, "v0[%d]", struct_var->v0);
length += other_struct_to_string(struct_var->sub, buf+length, len-length);
length += snprintf(buf+length, len-length, "v2[%d]", struct_var->v2);
length += snprintf(buf+length, len-length, "v3[%d]", struct_var->v3);
....
return length;
}
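snprintf writes as much as fits and discards the rest, so it was exactly what I was looking for. One caveat: it returns the length the full output would have had, not the number of characters actually stored. Once truncation happens, length can exceed len, so len-length must not be allowed to go negative (or to wrap around, since length is unsigned here).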
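One way to keep the running length inside the buffer is to wrap the formatting call in a helper that clamps its result. This is only a sketch: buf_appendf and its parameter names are hypothetical, and it relies on vsnprintf returning the length the untruncated output would have had.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats text at buf + used without writing past
   buf[cap - 1]. Returns the new used length; on truncation the result
   is clamped to cap - 1 so later calls see a full buffer. */
size_t buf_appendf(char *buf, size_t cap, size_t used, const char *fmt, ...) {
    if (cap == 0 || used >= cap - 1) {
        return used;                    /* buffer already full */
    }
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);
    if (n < 0) {
        buf[used] = '\0';               /* formatting error: keep old text */
        return used;
    }
    if ((size_t)n >= cap - used) {
        return cap - 1;                 /* output was truncated */
    }
    return used + (size_t)n;
}
```

A caller would thread the return value through successive calls, for example used = buf_appendf(buf, len, used, "v0[%d]", v0);, and can detect a full buffer by comparing used with len - 1.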

What is the most efficient way to implement mutable strings in C?

I'm currently implementing a very simple JSON parser in C and I would like to be able to use mutable strings (I can do it without mutable strings, however I would like to learn the best way of doing them anyway). My current method is as follows:
char * str = calloc(0, sizeof(char));
//The following is executed in a loop
int length = strlen(str);
str = realloc(str, sizeof(char) * (length + 2));
//I had to reallocate with +2 in order to ensure I still had a zero value at the end
str[length] = newChar;
str[length + 1] = 0;
I am comfortable with this approach, however it strikes me as a little inefficient given that I am always only appending one character each time (and for the sake of argument, I'm not doing any lookaheads to find the final length of my string). The alternative would be to use a linked list:
struct linked_string
{
char character;
struct linked_string * next;
};
Then, once I've finished processing I can find the length, allocate a char * of the appropriate length, and iterate through the linked list to create my string.
However, this approach seems memory inefficient, because I have to allocate memory for both each character and the pointer to the following character. Therefore my question is two-fold:
Is creating a linked list and then a C-string faster than reallocing the C-string each time?
If so, is the gained speed worth the greater memory overhead?
The standard way for dynamic arrays, regardless of whether you store chars or something else, is to double the capacity when you grow it. (Technically, any multiple works, but doubling is easy and strikes a good balance between speed and memory.) You should also ditch the 0 terminator (add one at the end if you need to return a 0 terminated string) and keep track of the allocated size (also known as capacity) and the number of characters actually stored. Otherwise, your loop has quadratic time complexity by virtue of using strlen repeatedly (Shlemiel the painter's algorithm).
With these changes, time complexity is linear (amortized constant time per append operation) and practical performance is quite good for a variety of ugly low-level reasons.
The theoretical downside is that you use up to twice as much memory as strictly necessary, but the linked list needs at least five times as much memory for the same amount of characters (with 64 bit pointers, padding and typical malloc overhead, more like 24 or 32 times). It's not usually a problem in practice.
No, linked lists are most certainly not "faster" (how and wherever you measure such a thing). This is a terrible overhead.
If you really find that your current approach is a bottleneck, you could always allocate or reallocate your strings in sizes of powers of 2. Then you would only have to do realloc when you cross such a boundary for the total size of the char array.
I would suggest that it would be reasonable to read the entire set of text into one memory allocation, then go through and NUL terminate each string. Then count the number of strings, and make an array of pointers to each of the strings. That way you have one memory allocation for the text area, and one for the array of pointers.
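A rough sketch of that two-allocation idea, assuming the text is already in a writable, NUL-terminated buffer and that pieces are separated by a single known character. The function name split_in_place and the choice of separator are hypothetical, not something the answer specifies.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: replaces every sep in text with NUL and returns a
   malloc'd array of pointers to the resulting pieces. The pieces live
   inside text itself, so only the pointer array is allocated here. */
char **split_in_place(char *text, char sep, size_t *count) {
    size_t n = 1;                       /* one piece even with no separator */
    for (char *p = text; *p != '\0'; p++) {
        if (*p == sep) {
            n++;
        }
    }
    char **parts = malloc(n * sizeof *parts);
    if (parts == NULL) {
        return NULL;
    }
    size_t i = 0;
    parts[i++] = text;
    for (char *p = text; *p != '\0'; p++) {
        if (*p == sep) {
            *p = '\0';                  /* terminate the current piece */
            parts[i++] = p + 1;         /* next piece starts after it */
        }
    }
    *count = n;
    return parts;
}
```

The caller frees the pointer array and the text buffer separately; none of the individual pieces is freed on its own.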
Most implementations of variable-length arrays/strings/whatever have a size and a capacity. The capacity is the allocated size, and the size is what's actually used.
#include <stdlib.h>
typedef struct {
    char* data;
    size_t size;     /* characters stored, not counting the terminator */
    size_t capacity; /* bytes allocated for data */
} mutable_string;
Allocating a new string looks like this:
#define INITIAL_CAPACITY 10
mutable_string* mutable_string_create_empty(void) {
    mutable_string* str = malloc(sizeof(mutable_string));
    if (!str) return NULL;
    str->data = calloc(INITIAL_CAPACITY, 1);
    if (!str->data) { free(str); return NULL; }
    str->size = 0;
    str->capacity = INITIAL_CAPACITY;
    return str;
}
Now, any time you need to add a character to the string, you'd do this:
int mutable_string_concat_char(mutable_string* str, char chr) {
    /* Grow first if there is no room for chr plus the terminator */
    if (str->size + 2 > str->capacity) {
        size_t new_capacity = str->capacity * 2;
        char* new_data = realloc(str->data, new_capacity);
        if (!new_data) return 0; /* Failure; the old buffer is still valid */
        str->data = new_data;
        str->capacity = new_capacity;
    }
    str->data[str->size++] = chr;
    str->data[str->size] = '\0';
    return 1; //Success
}
Keeping size in the struct avoids calling strlen on every append, and writing the terminator each time keeps the string valid even though realloc does not zero the new memory.
The linked list approach is worse because:
You still need to do an allocation call every time you add a character;
It consumes a LOT more memory. This approach consumes at most about sizeof(mutable_string) + (string_length + 1) * 2 bytes. The linked list consumes string_length * sizeof(struct linked_string) bytes, plus allocator overhead for every node.
Generally, linked lists are less cache-friendly than arrays.

Seg Fault when Parsing a CSV Line in C [closed]

Closed 9 years ago.
I am working on a project in which I need to read CSV lines from a text file into my program. I was given a code skeleton, and asked to fill in functionality. I have a struct containing a variable for each type of value I am going to receive, but my char array is causing segmentation faults.
Here is an excerpt of my code.
None of the excerpt is part of the given code; this is all mine.
My error is a Segmentation Fault (Core Dumped), caused by the code in the "get the timestamp" section.
My test file contained only one line:
5, 10:00:10, 1, 997
/*
* YOUR CODE GOES HERE:
* (1) Read an input csv line from stdin
* (2) Parse csv line into appropriate fields
* (3) Take action based on input type:
* - Check-in or check-out a patient with a given ID
* - Add a new health data type for a given patient
* - Store health data in patient record or print if requested
* (4) Continue (1)-(3) until EOF
*/
/* A new struct to hold all of the values from the csv file */
typedef struct {
int iD;
char *time[MAXTIME + 1];
int value;
int type;
}csv_input;
/* Declare an instance of the struct, and assign pointers for its values */
csv_input aLine;
int *idptr;
char timeval[MAXTIME + 1];
int *valueptr;
int *typeptr;
/*Note: because the time char is already a pointer, I did not make another one for it but instead dereferenced the pointer I was given */
idptr = &aLine.iD;
int j; /* iterator variable */
for(j; j < MAXTIME; j++){
*aLine.time[j] = timeval[j];
}
valueptr = &aLine.value;
typeptr = &aLine.type;
/* Get the Patient ID */
*idptr = getchar();
printf("%c", aLine.iD); /* a test to see if my pointers worked and the correct value was read */
/*Skip the first comma */
int next;
next = getchar();
/* get the timestamp */
int i;
for(i = 0; i < MAXTIME; i++)
{
while ((next = getchar()) != ',')
{
timeval[i] = next;
//printf("%s", aLine.time[i]);
}
}
First:
int j; /* iterator variable */
for(j; j < MAXTIME; j++){
You need to set j to some value, j=0 makes sense. Without this you're accessing an array with an uninitialized value and you're going to get UB with that.
Second:
/*Note: because the time char is already a pointer,
No, time is an array of pointers to characters, there is a difference there.
This line:
*aLine.time[j] = timeval[j];
won't work. Your comment "but instead dereferenced the pointer I was given" rests on an incorrect assumption. Yes, you have an array of pointers, but they don't point to anything: they are uninitialized, so you can't dereference them until you set each one to a valid non-NULL address.
I think you were trying to do something like this:
aLine.time[j] = &timeval[j]; //point each element at a character of the local array
but those pointers are only valid while timeval is in scope. It'd be better to malloc memory for your array of pointers or, simpler still, to declare time as a plain char array and copy the characters into it.
char *time[MAXTIME + 1];
This is an array of pointers to char, not an array of chars.
The crash comes from this line:
*aLine.time[j] = timeval[j];
As I said, aLine.time[j] is a pointer, and you have not allocated any memory for it to point to before writing through it.
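As a sketch of the plain char array alternative, the struct below stores the timestamp inline and parses one line with sscanf. Several things here are assumptions: MAXTIME is taken to be 8 (enough for "hh:mm:ss"), the function name parse_csv_line is hypothetical, and the third and fourth fields are mapped to value and type only because that is the order of the members in the question's struct.

```c
#include <stdio.h>

#define MAXTIME 8   /* assumption: timestamps look like "hh:mm:ss" */

typedef struct {
    int id;
    char time[MAXTIME + 1];   /* the characters themselves, not pointers */
    int value;
    int type;
} csv_input;

/* Hypothetical parser for one line such as "5, 10:00:10, 1, 997".
   Returns 1 when all four fields were read, 0 otherwise. */
int parse_csv_line(const char *line, csv_input *out) {
    /* %8[^,] reads up to 8 non-comma characters; the width must match
       MAXTIME. The spaces in the format skip optional whitespace. */
    int n = sscanf(line, "%d, %8[^,], %d, %d",
                   &out->id, out->time, &out->value, &out->type);
    return n == 4;
}
```

A line would typically be read into a local buffer with fgets first and then handed to this function, which avoids the character-by-character getchar loop.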

Word Extraction C [closed]

Closed 9 years ago.
I want to extract the words from a file (and later, from console input), count their appearances and store them in my Word structure:
typedef struct cell{
char *info; /* word itself */
int nr; /* number of appearances of the word */
}*Word;
This structure will be allocated dynamically for as many words as are contained in the file. Consider this function:
void Word_Allocation (Word* a) /* The function that allocates space for one structure */
My questions are:
How do I correctly open a file and read it line by line?
How do I correctly store words and number of appearances in my structure?
As for file io, this is the basics.
As for the algorithm: since you are not using C++, std::map, which would make this problem trivial, is not available. A straightforward solution in C might be:
Allocate an array of cell and read in the words.
Sort the array on char *info.
Count the runs of equal words.
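The sort-then-count steps could be sketched as follows, assuming the words have already been read into an array of strings. The helper names cmp_str and count_words are hypothetical, and the struct is declared without the pointer typedef from the question to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct cell {
    char *info;   /* word itself */
    int nr;       /* number of appearances of the word */
};

/* qsort comparator for an array of char pointers */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Hypothetical helper: sorts words in place, then writes one cell per
   distinct word into out (which must have room for n cells).
   Returns the number of distinct words. */
size_t count_words(char **words, size_t n, struct cell *out) {
    size_t k = 0;
    qsort(words, n, sizeof *words, cmp_str);
    for (size_t i = 0; i < n; i++) {
        if (k > 0 && strcmp(out[k - 1].info, words[i]) == 0) {
            out[k - 1].nr++;          /* same word as the previous one */
        } else {
            out[k].info = words[i];   /* first occurrence of a new word */
            out[k].nr = 1;
            k++;
        }
    }
    return k;
}
```

After sorting, equal words sit next to each other, so a single pass comparing each word with the last distinct one is enough to count them.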
Your allocator function should return a Word (which is already a pointer to struct cell) and receive a size to allocate for the word itself. Something like this, perhaps:
Word Word_Allocation (size_t size) {
    Word w = malloc(sizeof(*w));
    if (!w) return NULL; /* don't touch w->info if the first malloc failed */
    w->info = malloc(size);
    if (!w->info)
    {
        free(w);
        return NULL;
    }
    w->nr = 0;
    return w;
}
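You can read a word at a time with:
#define MAX_BUF 100 /* longest word read; must be a macro (not an enum) to be stringified */
#define STR_(x) #x
#define STR(x) STR_(x) /* two levels, so MAX_BUF expands before # is applied */
char buf[MAX_BUF + 1]; /* +1 for the terminator that %s appends */
fscanf(infile, "%" STR(MAX_BUF) "s", buf);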
And then strlen(buf)+1 is the size to pass to Word_Allocation. Or you can pass buf and have Word_Allocation call strlen and copy the data over.

Seeding Arrays With Randomly Generated Numbers and Sorting [closed]

Closed 10 years ago.
I'm trying to practice my C by creating arrays of equal size (determined by the user's input) and populating them with random numbers. I'm running into some issues. First, my creation and allocation of the arrays:
srandom((int)time(NULL)); /* initialize random seed for later use*/
unsigned long arraySize = strtoul(argv[1], NULL, 0);
int *originalArray = malloc(arraySize * sizeof(int));
int *ascendingOrderArray = malloc(arraySize * sizeof(int));
int *descendingOrderArray = malloc(arraySize * sizeof(int));
Next, the loop wherein I populate the arrays with random numbers:
int i = 0;
int randInt;
/* Seed each array with identical randomly generated numbers */
while(i++ < arraySize){
randInt = (int)random()%100;
*originalArray++ = randInt;
printf("original array assign: %d\n", originalArray);
*ascendingOrderArray++ = originalArray;
printf("ascending array assign: %d\n", ascendingOrderArray);
*descendingOrderArray++ = originalArray;
printf("descending array assign: %d\n", descendingOrderArray);
}
Here, things get weird. I'm expecting each array to have the same number as that assigned to originalArray for each index. Yet the output I get from the printf statement does not reflect that. Moreover, the numbers are gigantic but they all have a similar base, starting with a 2 and extending for many digits. Should that be happening?
Finally, when I'm done with everything and do a little inconsequential printing and formatting, I try to free my arrays:
free(originalArray);
originalArray = NULL;
free(ascendingOrderArray);
ascendingOrderArray = NULL;
free(descendingOrderArray);
descendingOrderArray = NULL;
return 0;
But I'm getting a pointer being freed was not allocated error.
I'm sure I'm screwing up on multiple levels here, but the question is: where?
You have the same fundamental problem with all 3 arrays, but let's look at just one. Here you allocate it:
int *originalArray = malloc(arraySize * sizeof(int));
So the variable originalArray points to the block of allocated memory. Then you start using it to assign to the array:
*originalArray++ = randInt;
This is bad. You just incremented originalArray, and now have no pointer to the beginning of the memory block. What you should do is
originalArray[i] = randInt;
printf("original array assign: %d\n", originalArray[i]);
This way you're using the index i to say where in the array the number should go, and originalArray itself is still the pointer to the block of allocated memory, i.e. the beginning of your array.
Since originalArray is unchanged through all of this, when you
free(originalArray);
then originalArray is still the value that was returned from malloc, which is what free wants to get back.
Summary: you should always free the exact pointer that malloc returns, and you should never discard/change the value of that pointer.
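Putting that together, the fill loop could look like the sketch below. The helper name fill_identical is hypothetical, and rand() is used here for portability where the question uses random(). It also addresses the "gigantic numbers": the original code stored and printed the pointer originalArray itself (with %d) instead of the integer randInt.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: writes the same pseudo-random value in [0, 100)
   to index i of all three arrays, for every i below n. The array
   pointers are indexed, never incremented, so they can still be
   passed to free afterwards. */
void fill_identical(int *original, int *ascending, int *descending, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int r = rand() % 100;
        original[i] = r;
        ascending[i] = r;     /* copy the value, not the pointer */
        descending[i] = r;
    }
}
```

Using a for loop with the index starting at 0 also avoids the off-by-one that while(i++ < arraySize) would cause if i were used as a subscript inside the body.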
