C - reading past end of file with fgetc

I have the weirdest thing happening, and I'm not quite sure why it's happening. Basically what I need to do is use fgetc to get the contents of a simple ASCII file byte by byte. The weird part is it worked, but then I added a few more characters and all of a sudden it added a newline that wasn't there and read past the end of the file or something. Literally all I did was
do {
temp = (char*) checked_realloc (temp, n+1);
e = fgetc(get_next_byte_argument);
temp[n] = e;
if (e != EOF)
n++;
}
while (e != EOF);
And then to check I just printed each character out
temp_size = strlen(temp)-1;
for(debug_k = 0; debug_k < temp_size; debug_k++){
printf("%c", temp[debug_k]);
}
And it outputs everything correctly except it added an extra newline that wasn't in the file. Before that, I had
temp_size = strlen(temp);
But then it ended on some unknown byte (that printed gibberish). I tried strlen(temp)-2 just in case and it worked for that particular file, but then I added an extra "a" to the end and it broke again.
I'm honestly stumped. I have no idea why it's doing this.
EDIT: checked_realloc is just realloc but with a quick check to make sure I'm not out of memory. I realize this is not the most efficient way to do this, but I'm more worried about why I seem to be magically reading in extra bytes.

A safer way to write such an operation:
If you allocate memory before the realloc, memset it to zeros before use. And every time you realloc, initialize the new memory to zero.
If you access the memory as a string or use string functions on it, always make sure it is terminated with a null byte ('\0').
do{
temp = (char*) checked_realloc (temp, n+1);//I guess you are starting n with 0?
temp[n]=0;
e = fgetc(get_next_byte_argument);
temp[n] = e;
if (e != EOF)
n++;
} while (e != EOF);
temp[n]=0;
n=0;
I guess the above code change should fix your issue. You don't need strlen -1 anymore. :)
Cheers.

It sounds like you forgot to null terminate your string. Add temp[n] = 0; just after the while.
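For reference, here is a minimal sketch of the whole read loop with the terminator in place. The name read_all and the doubling growth strategy are my own choices for illustration, not something from the question; plain malloc/realloc stand in for checked_realloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads every byte from fp into a freshly allocated, null-terminated buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 * If out_len is non-NULL it receives the number of bytes read. */
char *read_all(FILE *fp, size_t *out_len)
{
    assert(fp != NULL);

    size_t len = 0, cap = 16;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                              /* int, so EOF stays distinguishable */
    while ((c = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) {           /* keep one byte spare for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;           /* EOF itself is never stored */
    }
    buf[len] = '\0';                    /* terminate so strlen/printf are safe */

    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Because the length is returned separately, the caller can print exactly len bytes and never needs strlen(temp)-1 style guesses.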

Related

Why doesn't strcpy work?

char sentence2[10];
strncpy(sentence2, second, sizeof(sentence2)); //shouldn't I specify the sizeof(source) instead of sizeof(destination)?
sentence2[10] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
//////////////////////////////////////////////////////////////
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this meaningless loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
So here's the problem. When I run the first part of this code, the program crashes.
However, when I add the for loop that just prints garbage values in memory locations, it does not crash, but strcpy still doesn't copy properly.
Second, when using strncpy, shouldn't I specify the sizeof(source) instead of sizeof(destination), since I'm moving the bytes of the source?
Third, it makes sense to me to add the null terminating character after strncpy, since I've read that it doesn't add the null character on its own, but I get a warning from my Pelles C IDE that it's a possible out-of-bounds store.
Fourth and most importantly, why doesn't the simple strcpy work?
////////////////////////////////////////////////////////////////////////////////////
UPDATE:
#include <stdio.h>
#include <string.h>
void main3(void)
{
puts("\n\n-----main3 reporting for duty!------\n");
char *first = "Metal Gear";
char *second = "Suikoden";
printf("strcmp(first, first) = %d\n", strcmp(first, first)); //returns 0 when both strings are identical.
printf("strcmp(first, second) = %d\n", strcmp(first, second)); //returns a negative when the first different char is less in first string. (M=77 S=83)
printf("strcmp(second, first) = %d\n", strcmp(second, first)); //returns a positive when the first different char is greater in first string.(M=77 S=83)
char sentence1[10];
strcpy(sentence1, first);
puts(sentence1);
char sentence2[10];
strncpy(sentence2, second, 10); //shouldn't I specify the sizeof(source) instead of sizeof(destination).
sentence2[9] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crashes without this nonsensical loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
}
This is how I teach myself to program. I write code and comment all I know about it so that
the next time I need to look up something, I just look at my own code in my files. In this one, I'm trying to learn the string library in c.
char *first = "Metal Gear";
char sentence1[10];
strcpy(sentence1, first);
This doesn't work because first has 11 characters: the ten in the string, plus the null terminator. So you would need char sentence1[11]; or more.
strncpy(sentence2, second, sizeof(sentence2));
//shouldn't I specify the sizeof(source) instead of sizeof(destination)?
No. The third argument to strncpy is supposed to be the size of the destination. The strncpy function will always write exactly that many bytes.
If you want to use strncpy you must also put a null terminator on (and there must be enough space for that terminator), unless you are sure that strlen(second) < sizeof sentence2.
Generally speaking, strncpy is almost never a good idea. If you want to put a null-terminated string into a buffer that might be too small, use snprintf.
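A minimal sketch of the snprintf approach, in case it helps. The helper name copy_string is hypothetical; the only library call it relies on is snprintf, which returns the length the full string would have needed.

```c
#include <assert.h>
#include <stdio.h>

/* Copies src into dst (capacity dst_size), always null-terminating.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size - 1 characters plus the '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With char sentence1[10], copy_string(sentence1, sizeof sentence1, first) would store "Metal Gea" and return 0, which tells you the buffer was too small instead of silently overflowing it.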
This is how I teach myself to program.
Learning C by trial and error is not good. The problem is that if you write bad code, you may never know. It might appear to work, and then fail later on. For example, whether your strcpy steps on any other variable's toes depends on what lies in memory after sentence1.
Learning from a book is by far and away the best idea. K&R 2 is a decent starting place if you don't have any other.
If you don't have a book, do look up online documentation for standard functions anyway. You could have learnt all this about strcpy and strncpy by reading their man pages, or their definitions in a C standard draft, etc.
Your problems start from here:
char sentence1[10];
strcpy(sentence1, first);
The number of characters in first, excluding the terminating null character, is 10. The space allocated for sentence1 has to be at least 11 for the program to behave in a predictable way. Since you have already used memory that you are not supposed to use, expecting anything to behave after that is not right.
You can fix this problem by changing
char sentence1[10];
to
char sentence1[N]; // where N > 10.
But then, you have to ask yourself. What are you trying to accomplish by allocating memory on the stack that's on the edge of being wrong? Are you trying to learn how things behave at the boundary of being wrong/right? If the answer to the second question is yes, hopefully you learned from it. If not, I hope you learned how to allocate adequate memory.
this is an array bounds write error. The indices are only 0-9
sentence2[10] = '\0';
it should be
sentence2[9] = '\0';
second, you're protecting the destination from buffer overflow, so specifying its size is appropriate.
EDIT:
Lastly, this amazingly bad piece of code, which really isn't worth mentioning and is relevant to neither strcpy() nor strncpy(), yet seems to have earned me the disfavor of @nonsensicke, who seems to write very verbose and thoughtful posts... has the following problems:
char *pointer = first;
for(int i =0; i < 500; i++)
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
Your use of int i=0 in the for loop requires C99 or later. Depending on your compiler and compiler arguments, it can result in a compilation error.
for(int i =0; i < 500; i++)
better
int i = 0;
...
for(i=0;i<500;i++)
You neglect to check the return code of printf or indicate that you are deliberately ignoring it. I/O can fail after all...
printf("%c", *pointer);
better
int n = 0;
...
n = printf("%c", *pointer);
if(n!=1) { /* error! */ }
or
(void) printf("%c", *pointer);
some folks will get onto you for not using {} with your if statements
if(*pointer == '\n') putchar('\n');
better
if(*pointer == '\n') {
putchar('\n');
}
but wait there's more... you didn't check the return code of putchar()... dang
better
int c = 0; // putchar returns int so that EOF can be told apart from a character
...
if(*pointer == '\n') {
c = putchar('\n');
if(c == EOF) { /* error */ }
}
and lastly, with this nasty little loop you're basically romping through memory like a Kiwi in a Tulip field and lucky if you hit a newline. Depending on the OS (if you even have an OS), you might actually encounter some type of fault, e.g. outside your process space, maybe outside addressable RAM, etc. There's just not enough info provided to say actually, but it could happen.
My recommendation, beyond the absurdity of actually performing some type of detailed analysis on the rest of that code, would be to just remove it altogether.
Cheers!

C - Array of strings (2D array) and memory allocation gets me unwanted characters

I have a tiny problem with my assignment. The whole program is about tree data structures but I do not have problems with that.
My problem is about some basic stuff: reading strings from user input and then storing them in an array list.
char str[1000];
fgets(str, 1000, stdin);
int x = 0;
int y = 0;
int z = 0;
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
while(str[z] != '\n')
{
list[x][y] = str[z];
z++;
if(str[z] == ',')
{
x++;
y = 0;
list = (char**)realloc(list, (x+1) * sizeof(char*));
list[x] = (char*)malloc((y + 1)*sizeof(char));
z++;
if(str[z] == ' ') // Skips space after the comma
{
z++;
}
}
else if(str[z] == '\n')
{
break;
}
else
{
y++;
list[x] = (char*)realloc(list[x], (y+1)*sizeof(char));
}
}
I pass this list array into another function.
As an example, inputs could be something like
Abcde, Fghijk, Lmnop, Qrstu
and I am trying to split each of these words into the array list.
Abcde
Fghijk
Lmnop
Qrstu
When I try to output the strings I sometimes get weird, excessive characters such as upside down question marks and numbers.
printf("%s ", list[some_number]);
gets me
Fghijk¿
or
Fghijk\200
All of my program works as expected except for this minor problem which I am having trouble solving. Even with the same exact inputs the bugs may or may not appear. I am guessing it has to do with memory allocation?
Thanks for your help!
You need to put '\0' at the end of your new string.
Most C library functions, such as printf and strlen, process strings on the assumption that '\0' marks the end. Without it, they keep reading memory out of bounds, and either cause a memory violation or stop when they happen to reach a byte with the value 0. All the bytes in between are interpreted as characters (often their extended ASCII equivalents), hence the strange behaviour.
So, allocate an extra byte for the '\0' character and assign it to the last byte.
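Here is a rough sketch of the splitting step with the terminator added. The function name split_csv is made up for illustration, and it uses strcspn to find each item's length rather than growing one byte at a time as the question does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits a line such as "Abcde, Fghijk" on commas into newly allocated,
 * null-terminated strings. Returns the array (or NULL on failure) and stores
 * the number of items in *count. One space after each comma is skipped. */
char **split_csv(const char *line, size_t *count)
{
    assert(line != NULL && count != NULL);

    char **list = NULL;
    size_t n = 0;
    const char *start = line;

    for (;;) {
        size_t len = strcspn(start, ",\n");      /* length of this item */

        /* The array holds pointers, so scale by sizeof(char *). */
        char **grown = realloc(list, (n + 1) * sizeof *list);
        if (grown == NULL)
            goto fail;
        list = grown;

        list[n] = malloc(len + 1);               /* +1 for the terminator */
        if (list[n] == NULL)
            goto fail;
        memcpy(list[n], start, len);
        list[n][len] = '\0';
        n++;

        if (start[len] != ',')                   /* '\n' or end of string */
            break;
        start += len + 1;                        /* step over the comma */
        if (*start == ' ')
            start++;
    }
    *count = n;
    return list;

fail:
    for (size_t i = 0; i < n; i++)
        free(list[i]);
    free(list);
    *count = 0;
    return NULL;
}
```

Each list[i] can then be passed to printf("%s ", ...) safely, and the caller frees every item and then the array itself.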
Either initialize your variables to null, or as tomato said, put a null character at the end of the new string.
C lacks many of the luxuries programmers now take for granted when it comes to memory management. You're on the right path with malloc but that function only allocates memory... it doesn't clear it out. As a result, your variables will have the correct amount of space (critical for reducing memory leaks and overflow errors), but will be filled with garbage. This garbage could be anything, and in your case, it's an upside down question mark. Appropriate, don't you think?
I could be mistaken since I can't run the code myself without more information, but after your
char **list;
list = (char**)malloc((x+1)*sizeof(char*)); // sizeof(char*), since list holds pointers
list[x] = (char*)malloc((y+1)*sizeof(char));
statements, you'll want to do something like this:
list[x][0] = '\0';
(or allocate with calloc instead of malloc, which zeroes the memory) to clear out the garbage.
Furthermore, you may care to use the strlen() function (contained in string.h) to figure out just how many blocks of memory you need to allocate.
Clearing out the spaces you use for variables is a good practice to get into with C. Good to see you learning it as well.

realloc misunderstanding

Can somebody explain to me, why this block of code does not work. I was looking through some of questions around but failed finding the answer. Probably because of (huge) lack of knowledge.
Thank you for any given help.
char** sentence = malloc(min);
char* temp = malloc(min2);
int i = 0;
while(i<5)
{
sentence = realloc(sentence, i+2);
scanf("%s", temp);
sentence[i] = malloc(strlen(temp));
strcpy(sentence[i], temp);
printf("%s\n", sentence[i]);
i++;
}
You forgot to account for the fact that strings have null terminators.
sentence[i] = malloc(strlen(temp));
Should be:
sentence[i] = malloc(strlen(temp)+1);
You need enough space both for the length of the string (strlen) AND also for its null-terminator.
sentence = realloc(sentence, (i+1) * sizeof(*sentence));
would make more sense: you're trying to store i+1 char*s, not i+2 bytes.
BTW, you can just replace the malloc/strlen/strcpy with:
sentence[i] = strdup(temp);
(that takes care of the nul terminator for you).
sentence = realloc(sentence, i+2);
is a common anti-pattern. If realloc returns NULL, you've just leaked sentence. Instead you need to write
tmp = realloc(sentence, (i+1) * sizeof(*sentence)); // tmp is a separate char** variable
if(tmp == NULL) {
// out of memory - do something here (sentence is still valid)
} else {
sentence = tmp;
}
To make life worse, you're
using scanf with an unbounded %s, which is a common cause of security errors
using strcpy, which is a common cause of security errors
not checking the result of any of your mallocs to see if it returns NULL (if it does, you'll get a write-access violation)
not adding +1 to the strlen() before calling malloc, and hence getting a 1-byte heap overflow from the strcpy
and using a while loop where a for loop would clearly be more appropriate.
Apart from those six security bugs, you're doing well though.
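Pulling the points from these answers together, one possible shape for the loop body is sketched below. The helper name append_word is hypothetical; it copies the string with room for the terminator and only replaces the array pointer once realloc has succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Appends a copy of word to the array *list, which currently holds *count
 * strings. Returns 1 on success; on failure returns 0 and leaves the array
 * untouched, so nothing is leaked. */
int append_word(char ***list, size_t *count, const char *word)
{
    assert(list != NULL && count != NULL && word != NULL);

    size_t len = strlen(word);
    char *copy = malloc(len + 1);              /* +1 for the terminator */
    if (copy == NULL)
        return 0;
    memcpy(copy, word, len + 1);               /* copies the '\0' as well */

    /* Size is element count times element size, not raw bytes. */
    char **grown = realloc(*list, (*count + 1) * sizeof **list);
    if (grown == NULL) {
        free(copy);                            /* old array is still valid */
        return 0;
    }
    grown[*count] = copy;
    *list = grown;
    (*count)++;
    return 1;
}
```

In the original loop you would start with char **sentence = NULL and size_t count = 0, read each word with a bounded format such as scanf("%99s", temp) for a char temp[100], and then call append_word(&sentence, &count, temp).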

C segmentation fault errors with feof() and fgetc()

Can anyone help me solve my dilemma? When I compile my program I get no errors or warnings. When I go to actually run the executable, though, I get a segmentation fault. If I understand correctly, this happens because a pointer is being used incorrectly. I get the error on the feof(srcIn) line and I'm not sure why. The FILE* srcIn is never assigned a new value aside from the srcIn = fopen(argv[0], "r") at the beginning of the program. I originally had this solution implemented in C++ and needed to change it to C for reasons. Anyway, in the C++ version I did essentially the exact same thing, except using srcIn.eof() as the condition and srcIn.get(something) as the reading method, and it compiled and ran without any problems.
int chara;
int line[maxLineLength+1];
void nextch(void){
const int charPerTab = 8;
if(charCounter == charLineCounter){
if(feof(srcIn)){
printf("\n");
isEOF = TRUE;
return;
}
printf("\n"); lineCounter++;
if(chara != '\0'){ printf("%c", line[charLineCounter-1]); } // first character each line after the first line will be skipped otherwise
charLineCounter = 0; charCounter = 0;
while(chara != '\n'){
chara = fgetc(srcIn);
if(chara >= ' '){
printf("%c", chara);
line[charLineCounter] = chara; charLineCounter++;
}
else if(chara == '\t'){ // add blanks to next tab
do{ printf(" "); line[charLineCounter] = ' '; charLineCounter++; }
while(charLineCounter % charPerTab != 1);
}
}
printf("\n"); line[charLineCounter] = chara; charLineCounter++; line[charLineCounter] = fgetc(srcIn); charLineCounter++;
// have to get the next character otherwise it will be skipped
}
chara = line[charCounter]; charCounter++;
}
EDIT:
I forgot to mention that I'm not even actually going into main when I get the seg fault. This leads me to believe that the executable itself has some sort of problem. gdb tells me the seg fault is happening at line:
if(feof(srcIn))
Any ideas?
I've got a haunting suspicion that your two-or-four-character indents aren't sufficient to let you see the real scope of the program; it could be as easy as @mu is too short and @Null Set point out, that you've got an argv[0] when you meant argv[1], and it could be as @Lou Franco points out and you're writing past the end of your array, but this code sure smells funny. Here's your code, run through Lindent to get larger tabs and one-statement-per-line:
int chara;
int line[maxLineLength + 1];
void nextch(void)
{
const int charPerTab = 8;
if (charCounter == charLineCounter) {
if (feof(srcIn)) {
printf("\n");
isEOF = TRUE;
return;
}
printf("\n");
lineCounter++;
if (chara != '\0') {
printf("%c", line[charLineCounter - 1]);
} // first character each line after the first line will be skipped otherwise
charLineCounter = 0;
charCounter = 0;
while (chara != '\n') {
chara = fgetc(srcIn);
if (chara >= ' ') {
printf("%c", chara);
line[charLineCounter] = chara;
charLineCounter++;
} else if (chara == '\t') { // add blanks to next tab
do {
printf(" ");
line[charLineCounter] = ' ';
charLineCounter++;
}
while (charLineCounter % charPerTab != 1);
}
}
printf("\n");
line[charLineCounter] = chara;
charLineCounter++;
line[charLineCounter] = fgetc(srcIn);
charLineCounter++;
// have to get the next character otherwise it will be skipped
}
chara = line[charCounter];
charCounter++;
}
You're checking whether or not you've read to the end of the file at the top, in an if statement, but you never check for EOF again. Never. When you read from input in your while() loop, you use '\n' as your exit condition, print the output if the character is at or above ' ', do some tab expansion if you read a '\t', and you forgot to handle the EOF return from fgetc(3). If your input file does not end directly on a '\n', fgetc() keeps returning EOF, which never equals '\n', so the loop never exits. And even when the file does end on a '\n', the extra fgetc() after the loop stores EOF (-1) into your line array, and nothing stops charLineCounter from running past maxLineLength on a long line.
Most loops that read one character from an input stream and operate on it are written like this:
int c;
FILE *f = fopen("foo", "r");
if (!f) {
/* error message if appropriate */
return;
}
while ((c=fgetc(f)) != EOF) {
if (' ' <= c) {
putchar(c);
line[counter++] = c;
} else if ('\t' == c) {
/* complex tab code */
} else if ('\n' == c) {
putchar('\n');
line[counter++] = c;
}
}
Check the input for EOF. Read input from only one spot, if you can. Use one table or if/else if/else if/else tree to decide what to do with your input character. It might not come natural to use the array[index++] = value; idiom at first, but it is common in C.
Feel free to steal my suggested loop format for your own code, and pop in the complex tab expansion code. It looked like you got that right, but I'm not positive on that, and I didn't want it to distract from the overall style of the loop. I think you'll find extending my code to solve your problem is easier than making yours work. (I fully expect you can, but I don't think it'd be fun to maintain.)
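If it helps, here is one way the loop could look with the tab expansion and a bounds check filled in. This is only a sketch: read_expanded_line and TAB_WIDTH are names I made up, it assumes tab stops at multiples of 8 (slightly different from the % charPerTab != 1 test in the question), and it uses a char buffer rather than the int array.

```c
#include <assert.h>
#include <stdio.h>

#define TAB_WIDTH 8   /* assumed tab stop spacing */

/* Reads one line from fp into line (capacity cap), expanding tabs to spaces
 * and dropping other control characters. The result is null-terminated and
 * excludes the newline. Returns the line length, or -1 if EOF was reached
 * before any character was read. Overlong lines are truncated. */
int read_expanded_line(FILE *fp, char *line, size_t cap)
{
    assert(fp != NULL && line != NULL && cap > 0);

    size_t n = 0;
    int got_any = 0;
    int c;                                   /* int, so EOF is detectable */

    while ((c = fgetc(fp)) != EOF) {         /* the only place input is read */
        got_any = 1;
        if (c == '\n')
            break;
        if (c == '\t') {
            /* Pad with blanks up to the next tab stop, staying in bounds. */
            size_t stop = (n / TAB_WIDTH + 1) * TAB_WIDTH;
            while (n < stop && n + 1 < cap)
                line[n++] = ' ';
        } else if (c >= ' ' && n + 1 < cap) {
            line[n++] = (char)c;
        }
    }
    line[n] = '\0';
    return got_any ? (int)n : -1;
}
```

The caller loops until the function returns -1, so a file without a trailing newline terminates cleanly instead of spinning on EOF.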
argv[0] is the name of your program, so your fopen(argv[0], "r") is probably failing or opening the wrong file. I'd guess that you want to open argv[1] instead. And, of course, check that the fopen succeeds before trying to use its return value.
It should probably be srcIn = fopen(argv[1], "r") instead. The 0th string parameter your main gets is normally the name of the program, and the 1st parameter is the first command line parameter you passed to the program.
It might not be in this function, but if the problem is here, I'd be most suspect of going out of bounds on line. Are you ever writing more than maxLineLength characters? You should put a check before you ever index into line.
Edit: You seem to be confused about what this error even means -- I will try to clear it up.
When you get a segmentation fault, the line that it happens on is just the line of code where it was finally detected that you have corrupted memory. It doesn't necessarily have anything to do with the real problem. What you need to do is to figure out where the corruption happened in the first place.
Very common causes:
calling free or delete on a pointer more than once
calling the wrong delete on a pointer (delete or delete[])
using an uninitialized pointer
using a pointer after free or delete was called on it
going out of bounds of an array (this is what I think you did)
casting a pointer to a wrong type
doing a reinterpret_cast where the target type cannot be reinterpreted correctly
calling functions with improper calling conventions
keeping a pointer to a temporary object
And there are many other ways.
The key to figuring this out is to
assume that your code is wrong
look for these kinds of problems by inspection in the code path (if short)
use tools that can tell you that you have these problems at the line of code where you did it
realizing that the line of code where the segmentation fault happens is not necessarily the bug.

K&R Chapter 1 - Exercise 22 solution, what do you think?

I'm learning C from K&R as a first language, and I just wanted to ask whether you think this exercise is solved the right way. I'm aware that it's probably not as complete as you'd like, but I wanted some views so I'd know I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before the n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
int line = 0,
space = 0,
newls = 0,
i = 0,
c = 0,
j = 0;
char array[MAX] = {0};
while((c = getchar()) != EOF) {
++line;
if(c == NEWLINE)
++newls;
if((FOLD - line) < DELIM) {
if(c == BLANK) {
if(newls > 0) {
c = BLANK;
newls = 0;
}
else
c = NEWLINE;
line = 0;
}
}
array[i++] = c;
}
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
return 0;
}
I'm sure you're on the right track, but some pointers for readability:
comment your stuff
name the variables properly and at least give a description if you refuse
be consistent: some single-line ifs use braces and some don't (imho, always use {} so it's more readable)
the if statement in the last for-loop can be better, like
if(array[0] != NEWLINE)
{
printf("%c", array[line]);
}
That's no good IMHO.
First, it doesn't do what you were asked for. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do it; it seems to strive for finding the first blank after (margin - 5) characters (where did the 5 come from? what if all the words had 9 letters?). However, it doesn't do that either, because of your manipulation of the newls variable. Also, this:
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, because of two reasons:
the buffer is bound to overflow on large files
even if it would never overflow, people still wouldn't like you for storing eg. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.
It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ?? — maximum line length parameter
int chrIdx = 0 — index of the current character being considered
int cand = -1 — "candidate index", Set to a potential break character
char linebuf[bufsiz]
int lineIdx = 0 — index into the output line
char buffer[bufsiz] — a character buffer
read input into buffer
for ix = 0 to bufsiz -1
do
if buffer[ix] == ' ' then
cand = ix
fi
linebuf[lineIdx] = buffer[ix]
lineIdx += 1
if lineIdx >= MAXLINE then
linebuf[cand] = NULL — end the string
print linebuf
do something to move remnants to front of line (memmove?)
fi
od
It's late and I just had a belt, so there may be flaws, but it shows the general idea — load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.
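The idea could be turned into C roughly as follows. This is a sketch under my own assumptions: the name fold_text is made up, it folds a string into a caller-supplied buffer instead of reading stdin, a tab counts as one column, and a line with no blank at all is simply split at the width.

```c
#include <assert.h>
#include <stddef.h>

/* Folds text so that no output line exceeds width columns. A line is broken
 * at its last blank when it has one, otherwise it is split mid-word.
 * out must have room for 2 * strlen(in) + 1 bytes. Returns the output length. */
size_t fold_text(const char *in, size_t width, char *out)
{
    assert(in != NULL && out != NULL && width > 0);

    size_t n = 0;            /* bytes written to out so far */
    size_t line_start = 0;   /* index in out where the current line begins */
    size_t cand = 0;         /* index in out of the last blank on this line */
    int have_cand = 0;       /* whether cand is valid */

    for (; *in != '\0'; in++) {
        if (*in == '\n') {                   /* input newline starts a new line */
            out[n++] = '\n';
            line_start = n;
            have_cand = 0;
            continue;
        }
        int blank = (*in == ' ' || *in == '\t');
        if (n - line_start == width) {       /* line is full before adding *in */
            if (blank) {                     /* the blank itself is the break */
                out[n++] = '\n';
                line_start = n;
                have_cand = 0;
                continue;
            }
            if (have_cand) {
                out[cand] = '\n';            /* break at the last blank */
                line_start = cand + 1;
            } else {
                out[n++] = '\n';             /* no blank on the line: hard break */
                line_start = n;
            }
            have_cand = 0;
        }
        if (blank) {
            cand = n;
            have_cand = 1;
        }
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Replacing the candidate blank with a newline in place avoids the memmove step from the pseudocode, because the remnant after the blank is already sitting at the start of the next output line.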

Resources