Coredump when parsing chapters - c

I am doing a homework assignment that reads in a book. First, a line is read in and a pointer made to point at that line. Then a paragraph function reads in lines and stores their address into a array of pointers. Now, I am on reading a chapter (a paragraph recognized by the next line being broke). It should call get_paragraph() and store the address of paragraphs until it comes to a new chapter.
A new chapter is the only time in the book where the first character in the line is not a space. I think this is were I am having problems in my code. All functions up to this point work. I hope I have provided enough information. The code compiles but core dumps when started.
I am a student and learning so please be kind. Thanks.
char*** get_chapter(FILE * infile){
int i=0;
char **chapter[10000];//an array of pointers
// Populate the array
while(chapter[i]=get_paragraph(infile)) { //get address store into array
if(!isspace(**chapter[0])){ //check to see if it is a new chapter<---problem line?
// save paragraph not used in chapter using static to put into next chapter
break;
}
i++;//increment array
}
//add the null
chapter[++i]='\0';//put a null at the end to signify end of array
//Malloc the pointer
char**(*chap) = malloc(i * sizeof(*chap));//malloc space
//Copy the array to the pointer
i=0;//reset address
while(chapter[i]){//while there are addresses in chapter
chap[i] = chapter[i++];//change addresses into chap
}
chap[i]='\0';//null to signify end of chapter
//Return the pointer
return(chap);//return pointer to array
}
For those who would rather see without comments:
char*** get_chapter(FILE * infile){
int i=0;
char **chapter[10000];
while(chapter[i]=get_paragraph(infile)) {
if(!isspace(**chapter[0])){
break;
}
i++;
}
chapter[++i]='\0';
char**(*chap) = malloc(i * sizeof(*chap));//malloc space
i=0;
while(chapter[i]){
chap[i] = chapter[i++];
}
chap[i]='\0';
return(chap);
}

Comments inline.
char*** get_chapter(FILE * infile) {
int i=0;
// This is a zero length array!
// (The comma operator returns its right-hand value.)
// Trying to modify any element here can cause havoc.
char **chapter[10,000];
while(chapter[i]=get_paragraph(infile)) {
// Do I read this right? I translate it as "if the first character of
// the first line of the first paragraph is not whitespace, we're done."
// Not the paragraph just read in -- the first paragraph. So this will exit
// immediately or else loop forever and walk off the end of the array
// of paragraphs. I think you mean **chapter[i] here.
if(!isspace(**chapter[0])){
break;
}
i++;
}
// Using pre-increment here means you leave one item in the array uninitialized
// which can also cause a fault later on. Use post-increment instead.
// Also '\0' here is the wrong sort of zero; I think you need NULL instead.
chapter[++i]='\0';
char**(*chap) = malloc(i * sizeof(*chap));
i=0;
while(chapter[i]) {
// This statement looks ambiguous to me. Referencing a variable twice and
// incrementing it in the same statement? You may end up with an off-by-one error.
chap[i] = chapter[i++];
}
// Wrong flavor of zero again.
chap[i]='\0';
return(chap);
}

Can I suggest that you use for loops instead of whiles? You need to stop if you run out of space, so you might as well use the appropriate construct.
I suspect you have a bug in this code:
while(chapter[i]=get_paragraph(infile)) {
if(!isspace(**chapter[0])){
break;
}
i++;
}
chapter[++i]='\0';
Firstly, shouldn't it be chapter[i] instead of chapter[0]? You want to know if the pointer at chapter[i] points to a space, not the first pointer in chapter. So this will probably loop indefinitely - hence the need for a for loop, so you don't just loop forever accidentally.
Secondly, you increment i at the end of the while block, and then again in the chapter[++i] assignment. i has already been incremented by the final loop execution before the while condition breaks, so it is already the correct position to use. ++i increments before yielding the value, so presumably you meant to have i++ here, so that it would increment after yielding the current value of i. Either way, it's confusing one of us as to what you mean, so maybe just put the increment on a separate line for clarity. The compiler will sort out any available optimisation.
Finally (and I might well be wrong here) why are you setting the value to '\0'? That's a null character, isn't it? But your array is of pointers. The null pointer would be 0, rather than '\0', I think. If I'm right, you might have still got away with it if '\0' yields the same set of zeroes as the null pointer.

Have you tried single stepping though it in gdb, and occasionally dumping the local variables to see the current state? It's a good way to learn. You may want to add a few extra intermediate variables that "info locals" will automatically dump as well (pointers to the current XXX, where XXX is various items in your hierarchy)
I assume a GNU environment:
% gcc -g homework.c -o hw
% gdb hw
(gdb) b 10
(gdb) r
(gdb) info locals
(gdb) n
(gdb) info locals
...
Replace "10" with a suitable line number near the beginning of the function.

Shouldn't it be:
if(!isspace(**chapter[i])){
Each chapter[i] is a pointer to a pointer to a char, this char is the first character in each chapter. So **chapter[i] represents the first character in chapter i. Using chapter[0] will only look at the first chapter.

Related

What does a while statement that only sets 2 pointers equal, do?

we are currently learning how Pointers work in C.
I have this very short code of a copymethod of Strings in C, that was given to us from a tutor. I tried to explain its function in my own words but I am unsure if I have understood it correctly and would appreciate if somebody could correct my mistakes and answer my questions about it.
void copy ( char ∗ source , char ∗ dest) {
while (∗dest++ = ∗source++);
}
"Copy is a function with 2 Paramaters source and dest, which are both pointers of type char. The function calls a while statement which sets the dereferenced dest, incremented by 1 * sizeof(char), equal to the dereferenced and incremented (by 1*sizeof(char)) source."
What exactly does the while statement do? From my understanding *dest means that I am getting the char which dest points to, is that correct? But why would a while statement only set 2 pointers equal to each other, I don't really get it.
I appreciate any help, thank you!
This is the terse way to write an implementation of strcpy.
It is equivalent to the longer, but easier to follow
while (*dest = *source) {
dest++;
source++;
}
*dest = *source copies one char from the source to the destination. The result is zero if a zero byte was copied, at which time the loop ends.
Adding the increment to the condition is possible because the pointer dereferenced is the pointer value before the increment.
First, you need to start evaluating the expressions from the inner to the outer side. An important thing is to take into consideration that the body of the loop is empty (the ; at the right parenthesis indicates that the loop is executing the null statement --do nothing--, so everything must happen in the test expression).
The expression in the while test is not a test for equity, but an assignment. The equals operator is written doubling the =, as in ==, while = is used to assign the value of the right subexpression to the variable on the left side.
The variable on the left side is *dest++. A bit complicated expression that includes a dereference operator * and an autoincrement operator ++. The ++ takes higher preference, so it is acted first: the pointer is incremented, and the value returned by this subexpression is the value of the pointer before it was incremented. This means that the value of the pointer being used is the one it had before the expression dest++ was evaluated, and the * operator states the char variable pointed to by dest. So the place where the value will be stored in the assignment is the value pointed to by dest, and the pointer will be incremented before the end of the whole expression ends --but after the value is used--. The right side of the assignment show what is going to be assigned. As the expression is the same, I will pass quickly over it. The value assigned is the one pointed to by the source variable. It is incremented once the value is taken, and the value stored in the place of dest is filled with the character of the variable pointed by src. After that, both pointers are incremented, so the next time the test expression is evaluated, the characters involved will be the next source assigned to the next destination character. So, in this point, when the loop will be finished? Well, the value of an assignment expression is precisely the value assigned to the destination target, so in this case is the character copied from source to destination. And the test will fail, when the character copied happens to be zero (or false in C terms). As you see, there's nothing left to be put in the body of the loop.
This sample is famous for being an example of how cryptic C can get to. It appears in the two editions of "The C programming language" of Kernighan & Ritchie, the inventors of the language, and is accompanied by a comment that says something more or less like this: This is a bit obscure but every programmer that is proud of being proficient at C coding must be capable of interpreting this with a bit of care. (indeed the loop variables in the source are one letter s and d, i think)

If in if condition string is given it is treated as true but what it return?

Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
//this is true. I want to know y?
Based on the above, what happens here?
#‎include‬<stdio.h>
void main() {
static int i;
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
}
if("whatiamreturning")
is equivalent to
if (1)
This is because "whatiamreturning" is a char [] that decays into a non-NULL char const* inside the if(). Any non-NULL pointer evaluates to true in the context of a boolean expression.
The line
if(i+++"The Matrix")
can be simplified to:
if( (i++) + "The Matrix")
In the first iteration of the loop, the value of i is 0. Hence, the (i++) + "The Matrix" evaluates to "The Matrix".
In the second iteration of the loop, the value of i is 1. Hence, the (i++) + "The Matrix" evaluates to "he Matrix".
However, the loop never ends and goes into the territory of undefined behavior since (i++) + "The Matrix" never evaluates to 0 and the value of i keeps on increasing.
Perhaps they meant to use:
if(i++["The Matrix"])
which will allow the expression inside if() it to be 0 after 10 iterations.
Update
If you are following somebody else's code, stay away anything else that they have written. The main function can be cleaned up to:
int main() {
char name[] = "The Matrix";
int i = 0;
for( ; name[i] != '\0'; ++i )
{
printf("Memento\n");
}
}
if(i+++"The Matrix") // what is happening here please help here to understand
This will take the value of i, add the pointer value of the location of the string "The Matrix" in memory and compare it to zero. After that it will increase the value of i by one.
It's not very useful, since the pointer value could be basically any random number (it depends on architecture, OS, etc). And thus the whole program amounts to printing Memento a random number of times (likely the same number each run though).
Perhaps you meant to write if(*(i+++"The Matrix")). That would loop 10 times until it i+"The Matrix" evaluates to the address pointing to the NUL byte at the end of the string, and *(i+"The Matrix") will thus return 0.
Btw, spaces are a nice way to make your code more readable.
It will return the address of first element of the string whatiamreturning.
Basically when you assign a string literal to a char pointer
char *p;
p = "whatiamreturning";
the assignment doesn't copy the the characters in whatiamreturning, instead it makes p point to the first character of the string and that's why string literals can be sub-scripted
char ch = "whatiamreturning"[1];
ch will will have character h now. This worked because compiler treated whatiamreturning as a char * and calculated the base address of the literal.
if(i+++"The Matrix") is equivalent to
if( i++ + "The Matrix")
or it can be rewritten as
if(&("The Matrix"[i++]))
which will be true for every i and results in an infinite loop. Ultimately, the code will suffer from undefined behavior due to integer overflow for variable i.
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
The string literal "whatiamreturning" is a constant of type char[].
In nearly all contexts, including this one, arrays decay to pointers to their first element.
In a boolean context, like the condition of an if-statement, all non-zero values are true.
As the pointer points to an object, it is not the null-pointer, and thus is true.
Based on the above, what happens here?
#‎include‬<stdio.h>
void main() {
The above is your first instance of Undefined Behavior, whatever happens, it is right.
We will now pretend the error is corrected by substituting int for void.
Now, your loop:
static int i;
Static variables are default initialized, so i starts with value 0.
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
This loop has Undefined Behavior as well.
The condition takes i and adds it to the string literal "Memento" which decayed to a pointer like in the previous example, interpreting the resultant pointer in a boolean context, and as a side-effect incrementing i.
As long as i is no more than strlen("The Matrix")+1 on entry, everything is ok, the pointer points to an element of the string literal or one past, and the standard guarantees that's not a null pointer.
The moment it is though, all hell breaks loose because calculating such a pointer is Undefined Behavior.
Well, now that we know the loop is UB, let's ignore the loop too.
The rest of the program is:
}
Which is ok, because even though main has a return type of int, there's a special rule which states that if control reaches the end of main without executing a return-statement, an implicit return 0; is added.
Side-note: If an execution of a program encounters Undefined Behavior anywhere, the whole program is undefined, not only from that point on:
Undefined behavior can result in time travel (among other things, but time travel is the funkiest)

Arithmetic operations in IF loop

What does the below code do? I'm very confused with its working. Because I thought that the if loop runs till the range of int. But I'm confused when I try to print the value of i. Please help me out with this.
#include<stdio.h>
void main()
{
static int i;
for (;;)
if (i+++”Apple”)
printf(“Banana”);
else
break;
}
It is interpreted as i++ + "Apple". Since i is static and does not have an initializer, i++ yields 0. So the whole expression is 0 + some address or equivalent to if ("Apple").
EDIT
As Jonathan Leffler correctly notes in the comments, what I said above only applies to the first iteration. After that it will keep incrementing i and will keep printing "Banana".
I think at some point, due to overflows (if it doesn't crash) "Apple" + i will yield 0 and the loop will break. Again, I don't really know what a well-meaning compiler should do when one adds a pointer and a large number.
As Eric Postpischil commented, you can only advance the pointer until it points to one-past the allocated space. In your exxample adding 7 will advance the pointer one-past the allocated space ("Apples\0"). Adding more is undefined behavior and technically strange things can happen.
Use int main(void) instead of void main().
The expression i+++"Apple" is parsed as (i++) + "Apple"; the string literal "Apple" is converted from an expression of type "6-element array of char" to "pointer to char", and its value is the address of the first element of the array. The expression i++ evaluates to the current value of i, and as a side effect, the value in i is incremented by 1.
So, we're adding the result of the integer expression i++ to the pointer value resulting from the expression "Apple"; this gives us a new pointer value that's equal or greater than the address of "Apple". So assuming the address of the string literal "Apple" is 0x80123450, then basically we're evaluating the values
0x80123450 + 0
0x80123450 + 1
0x80123450 + 2
...
all of which should evaluate to non-zero, which causes the printf statement to be executed. The question is what happens when i++ results in an integer overflow (the behavior of which is not well defined) or the value of i+++"Apple" results in an overflow for a pointer value. It's not clear that i+++"Apple" will ever result in a 0-valued expression.
This code SHOULD Have been written like this:
char *apple = "Apple";
for(i = 0; apple[i++];)
printf("Banana");
Not only is it clearer than the code posted in the original, it is also clearer to see what it does. But I guess this came from "Look how bizarre we can write things in C". There are lots of things that are possible in C that isn't a great idea.
It is also possible to learn to balance a plate of hot food on your head for the purpose of serving yourself dinner. It doesn't make it a particularly great idea - unless you don't have hands and feet, I suppose... ;)
Edit: Except this is wrong... The equivalent is:
char *apple = "Apple";
for(i = 0; apple+i++ != NULL;)
printf("Banana");
On a 64-bit machine, that will take a while. If it finishes in reasonable time (sending output to /dev/null), I will update. It takes approximitely three minutes on my machine (AMD 3.4GHz Phenom II).

two mallocs returning same pointer value

I'm filling a structure with data from a line, the line format could be 3 different forms:
1.-"LD "(Just one word)
2.-"LD A "(Just 2 words)
3.- "LD A,B "(The second word separated by a coma).
The structure called instruccion has only the 3 pointers to point each part (mnemo, op1 and op2), but when allocating memory for the second word sometimes malloc returns the same value that was given for the first word. Here is the code with the mallocs pointed:
instruccion sepInst(char *linea){
instruccion nueva;
char *et;
while(linea[strlen(linea)-1]==32||linea[strlen(linea)-1]==9)//Eliminating spaces and tabs at the end of the line
linea[strlen(linea)-1]=0;
et=nextET(linea);//Save the direction of the next space or tab
if(*et==0){//If there is not, i save all in mnemo
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
nueva.op1[0]='k';nueva.op1[1]=0;//And set a "K" for op1
nueva.op2=NULL;
return nueva;
}
nueva.mnemo=malloc(et-linea+1);<-----------------------------------
strncpy(nueva.mnemo,linea,et-linea);
nueva.mnemo[et-linea]=0;printf("\nj%xj",nueva.mnemo);
linea=et;
while(*linea==9||*linea==32)//Move pointer to the second word
linea++;
if(strchr(linea,',')==NULL){//Check if there is a coma
nueva.op1=malloc(strlen(linea)+1);//Do this if there wasn't any coma
strcpy(nueva.op1,linea);
nueva.op2=NULL;
}
else{//Do this if there was a coma
nueva.op1=malloc(strchr(linea,',')-linea+1);<----------------------------------
strncpy(nueva.op1,linea,strchr(linea,',')-linea);
nueva.op1[strchr(linea,',')-linea]=0;
linea=strchr(linea,',')+1;
nueva.op2=malloc(strlen(linea)+1);
strcpy(nueva.op2,linea);printf("\n2j%xj2",nueva.op2);
}
return nueva;
}
When I print the pointers it happens to be the same number.
note: the function char *nextET(char *line) returns the direction of the first space or tab in the line, if there is not it returns the direction of the end of the line.
sepInst() is called several times in a program and only after it has been called several times it starts failing. These mallocs across all my program are giving me such a headache.
There are two main possibilities.
Either you are freeing the memory somewhere else in your program (search for calls to free or realloc). In this case the effect that you see is completely benign.
Or, you might be suffering from memory corruption, most likely a buffer overflow. The short term cure is to use a specialized tool (a memory debugger). Pick one that is available on your platform. The tool will require recompilation (relinking) and eventually tell you where exactly is your code stepping beyond previously defined buffer limits. There may be multiple offending code locations. Treat each one as a serious defect.
Once you get tired of this kind of research, learn to use the const qualifier and use it with all variable/parameter declarations where you can do it cleanly. This cannot completely prevent buffer overflows, but it will restrict them to variables intended to be writable buffers (which, for example, those involved in your question apparently are not).
On a side note, personally, I think you should work harder to call malloc less. It's a good idea for performance, and it also causes corruption less.
nueva.mnemo=malloc(strlen(linea)+1);
strcpy(nueva.mnemo,linea);
nueva.op1=malloc(2);
should be
// strlen has to traverse your string to get the length,
// so if you need it more than once, save its value.
cbLineA = strlen(linea);
// malloc for the string, and the 2 bytes you need for op1.
nueva.mnemo=malloc(cbLineA + 3);
// strcpy checks for \0 again, so use memcpy
memcpy(nueva.mnemo, linea, cbLineA);
nueva.mnemo[cbLineA] = 0;
// here we avoid a second malloc by pointing op1 to the space we left after linea
nueva.op1 = nueva.mnemo + cbLinea + 1;
Whenever you can reduce the number of mallocs by pre-calculation....do it. You are using C! This is not some higher level language that abuses the heap or does garbage collection!

String concats onto another without an assignment, why is this?

Below is a function from a program:
//read the specified file and check for the input ssn
int readfile(FILE *fptr, PERSON **rptr){
int v=0, i, j;
char n2[MAXS+1], b[1]=" ";
for(i=0; i<MAXR; i++){
j=i;
if(fscanf(fptr, "%c\n%d\n%19s %19s\n%d\n%19s\n%d\n%19s\n%19s\n%d\n%d\n%19s\n\n",
&rptr[j]->gender, &rptr[j]->ssn, rptr[j]->name, n2, &rptr[j]->age,
rptr[j]->job, &rptr[j]->income, rptr[j]->major, rptr[j]->minor,
&rptr[j]->height, &rptr[j]->weight, rptr[j]->religion)==EOF) {
i=MAXR;
}
strcat(rptr[j]->name, b);
//strcat(rptr[j]->name, n2);
if(&rptr[MAXR]->ssn==&rptr[j]->ssn)
v=j;
}
return v;
}
the commented line is like that because for some reason the array 'b' contains the string 'n2' despite an obvious lack of an assignment. This occurs before the first strcat call, but after/during the fscanf call.
it does accomplish the desired goal, but why is n2 concatenated onto the end of b, especially when b only has reserved space for 1 array element?
Here is a snippet of variable definitions after the fscanf call:
*rptr[j]->name = "Rob"
b = " Low"
n2= "Low"
It works, because you got lucky. b and n2 happened to be next to each other in memory, in the right order. C doesn't do boundary checking on arrays and will quite happily let you overflow them. So you can declare an array like this:
char someArray[1] = "lots and lots of characters";
The C compiler (certainly old ones) is going to think this is fine, even though there clearly isn't enough space in the someArray to store that many characters. I'm not sure if it's defined what it'll do in this situation (I suspect not), but on my compiler it limits the population to the size of the array, so it doesn't overflow the boundary (someArray=={'l'}).
Your situation is the same (although less extreme). char b[1] is creating an array with enough room to store 1 byte. You're putting a space in that byte, so there's no room for the null terminator. strcat, keeps copying memory until it gets to a null terminator, consequently it'll keep going until it finds one, even if it's not until the end of the next string (which is what's happening in your case).
If you had been using a C++ compiler, it would have thrown at least a warning (or more likely an error) to tell you that you were trying to put too many items into the array.
B needs to be size 2, 1 for the space, 1 for the null.

Resources