File Reading in C, Random Error - c

Thank you everybody so far for your input and advice!
Additionally:
After testing and toying further, it seems individual calls to FileReader succeed. But calling FileReader multiple times (these might be separate versions of FileReader) causes the issue to occur.
End Add
Hello,
I have a very unusual problem [please read this fully: it's important] (Code::Blocks compiler, Windows Vista Home) [no replicable code] with the C File Reading functions (fread, fgetc). Now, normally, the File Reading functions load up the data correctly to a self-allocating and self-deallocating string (and it's not the string's issue), but this is where it gets bizarre (and where Quantum Physics fits in):
An error catching statement reports that EOF occurred too early (IE inside the comments section at the start of the text file it's loading). Printing out the string [after it's loaded] reports that indeed, it's too short (24 chars) (but it has enough space to fit it [~400] and no allocation issues). The fgetc loop iterator reports it's terminating at just 24 (the file is roughly 300 chars long) with an EOF: This is where it goes whacky:
Temporarily checking Read->_base reports the entire (~300) chars are loaded - no EOF at 24. Perplexed, [given it's an fgetc loop] I added a printf to display each char [as a %d so I could spot the -1 EOF] at every step so I could see what it was doing, and modified it so it was a single char. It loops fine, reaching the ~300 mark instead of 24 - but freezes up randomly moments later. BUT, when I removed printf, it terminated at 24 again and got caught by the error-catching statement.
Summary:
So, basically: I have a bug that is affected by the 'Observer Effect' out of quantum physics: When I try to observe the chars I get from fgetc via printf, the problem (early EOF termination at 24) disappears, but when I stop viewing it, the error-catch statement reports early termination.
The more bizarre thing is, this isn't the first time it's occurred. Fread had a similar problem, and I was unable to figure out why, and replaced it with the fgetc loop.
[Code can't really be supplied as the code base is 5 headers in size].
Snippet:
int X = 0;
int C = 0;
int I = 0;
while(Copy.Array[X] != EOF)
{
//Copy.Array[X] = fgetc(Read);
C = fgetc(Read);
Copy.Array[X] = C;
printf("%d %c\n",C,C); //Remove/add this as necessary
if(C == EOF){break;}
X++;
}
Side-Note: Breaking it down into the simplest format does not reproduce the error.

This is the oldest error in the book, kind of.
You can't use a variable of type char to read characters (!), since the EOF constant doesn't fit.
You need:
int C;
Also, the while condition looks scary, you are incrementing X in the loop, then checking the (new) position, is that properly initialized? You don't show how Copy.Array is set up before starting the loop.
I would suggest removing that altogether, it's very strange code.
In fact, I don't understand why you loop reading single characters at all, why not just use fread() to read as much as you need?

Firstly, unwind's answer is a valid point although I'm not sure whether it explains the issues you are seeing.
Secondly,
printf("%d %c\n",C,C); //Remove/add this as necessary
might be a problem. The %d and %c format specifiers each expect an int argument. If C is declared as an int, as in your snippet, the call is fine, and a plain char argument is promoted to int when it is passed to printf, so on its own this is unlikely to be the cause.
This is what I think the problem is:
How are you allocating Copy.Array? Are you making sure all its elements are zeroed before you start? If you malloc it (malloc just leaves whatever garbage was in the memory it returns) and an element just happens to contain 0xFF, your loop will exit prematurely because your while condition tests Copy.Array[X] before you have placed a character in that location.
This is one of the few cases where I allow myself to put an assignment in a condition because the pattern
int c;
while ((c = fgetc(fileStream)) != EOF)
{
doSomethingWithC(c);
}
is really common
Edit
Just read your "Additionally" comment. I think it is highly likely you are overrunning your output buffer. I think you should change your code to something like:
int X = 0; int C = 0; int I = 0;
while(X < arraySize && (C = fgetc(Read)) != EOF)
{
Copy.Array[X] = C;
printf("%d %c\n", (int)C, (int)C);
X++;
}
printf("\n");
Note that I am assuming that you have a variable called arraySize that is set to the number of characters you can write to the array without overrunning it. Note also, I am not writing the EOF to your array.

You probably have some heap corruption going on. Without seeing code it's impossible to say.

Not sure if this is your error but this code:
C = fgetc(Read);
Copy.Array[X] = C;
if(C == EOF){break;}
Means you are adding the EOF value into your array - I'm pretty sure you don't want to do that, especially as your array is presumably char and EOF is int, so you'll actually end up with some other value in there (which could mess up later loops etc).
Instead I suggest you change the order so C is only put in the array once you know it is not EOF:
C = fgetc(Read);
if(C == EOF){break;}
Copy.Array[X] = C;

Whilst this isn't what I'd call a 'complete' answer (as the bug remains), this does solve the 'observer effect' element: I found, for some reason, printf was somehow 'fixing' the code, and using std::cout seemed to (well, I can't say 'fix' the problem) prevent the observer effect happening. That is to say, use std::cout instead of printf (as printf is the origin of the observer effect).
It seems to me that printf does something in memory on a lower level that seems to partially correct what does indeed seem to be a memory allocation error.

Related

Runtime error when using char as int to scanf and printf

int main()
{
char ch = 0;
int c = 0;
scanf("%d", &ch);
printf("%c", ch);
}
I can input the ASCII number and output the correct character. But the program crashes at the end of the code, the } sign.
Using Visual Studio 2019
Run-Time Check Failure #2 - Stack around the variable 'ch' was corrupted.
Since scanf has non-existent type safety, it can't know the type of parameters passed (one reason why the format string is there in the first place).
The stack corruption is caused by you lying to the compiler here: scanf("%d", &ch);. You tell it "trust me, ch is an int". But it isn't. So scanf tries to store 4 bytes in an area where only 1 byte is allocated and you get a crash.
Decent compilers can warn for incorrect format specifiers.
There are multiple problems in your code:
missing #include <stdio.h> ;
unused variable c ;
type mismatch passing a char variable address when scanf() expects an int * for the %d conversion specifier ;
missing test on scanf() return value ;
missing newline at the end of the output ;
missing return 0; (implied as of c99, but sloppy).
Here is a modified version:
#include <stdio.h>
int main() {
int ch;
if (scanf("%d", &ch) == 1) {
printf("%c\n", ch);
}
return 0;
}
I guess your question was not, "What did I do wrong?", or "Why didn't it work?".
I guess your question is, "Why did it work?", or, "Why did it print an error message after seeming to work correctly?"
There are lots of mistakes you can make which will corrupt the stack. This is one of them. Much of the time, these errors are rather inscrutable: If the memory corruption is severe enough that the memory management hardware can detect it, you may get a generic message like "Segmentation violation" at the instant the bad access occurs, but if not, if the damage isn't bad enough to cause any overt problems, your program may seem to work as you expected, despite the error.
It would be prohibitively expensive to perform explicit tests (in software) to check the stack for damage during every operation. Therefore, no attempt is made to do so, and the primary responsibility is placed on you, the programmer, not to do obviously wrong things like telling scanf to store an int in a char-sized box.
In this case, your system did make a check for stack damage, but as some kind of a one-time operation, only after the main function had returned. That's why the scanf and the printf seemed to work correctly, and the error message seemed to coincide with the closing } at the end of main.
Echoing an analogy I made in a comment, this is sort of like having a policeman write you a ticket for having done something dangerous that did not cause an accident. Why not write the ticket as you are doing the dangerous thing? Why write the ticket after the fact at all, since your dangerous behavior didn't cause an accident? Well, because sometimes that's just the way the world works.
I think you are trying to input an integer and print its char value so in my opinion this is what you are trying to do:
#include <cstdio>
using namespace std;
int main()
{
int c = 0;
scanf("%d", &c);
printf("%c", c);
return 0;
}
Note: This is a C++ code but the syntax inside main function will work for C language as well

Segmentation fault as a result of using fgets() to read lines from a file into a struct array

I am currently learning C and need a little bit of help with my code.
Say there is a file called "books.txt" that contains the names of several books each on a new line of the file. I am trying to grab the names of each book to use for the rest of my program.
To do this I have created the following struct:
struct bookData { // This is my struct to encapsulate book information
char name[50]; // Name of book
// Other struct variable
// Other struct variable
};
Now I need to get the names of each book and put them into a struct array. Below is how I've gone about doing so.
struct bookData booksList[numBooks]; // numBooks is the number of books in "books.txt"
int i;
for(i = 0; i < numBooks; i++) {
fgets(booksList[i].name, 50, books);
// Books is the "books.txt" file that was opened for reading
}
When I run this code I encounter a Segmentation fault. I believe the problem lies with the use of the for-loop. However I am not sure how to correct this problem or even why the loop is causing the Segmentation fault to occur. When I simply place the line,
fgets(booksList[0].name, 50, books);
without the for-loop, no error occurs and the code runs and prints the name of the book just fine.
I am trying to understand why an error is occurring in my code. I would be very grateful if anyone could give me advice on how to fix the error. Thank you in advance for taking the time to read/answer my question!
EDIT: numBooks is essentially the number of lines in the "books.txt" files. Which translates into the number of books for this particular problem. numBooks was calculated with the following code:
char c;
int numBooks;
while((c = fgetc(books)) != EOF) {
if(c == '\n') {
numBooks++;
}
}
EDIT#2: Thank you everyone for your help!!!
The following code is erroneous, and could be reasonably expected to cause problems:
char c;
int numBooks;
while((c = fgetc(books)) != EOF) {
if(c == '\n') {
numBooks++;
}
}
Firstly, numBooks is uninitialised, and later use of that is likely to invoke undefined behaviour.
Secondly, though much less likely to cause problems on most systems, fgetc returns int, which typically has a wider domain than unsigned char (it can represent more values). This is done for a reason. Any actual character values returned by fgetc will be as an unsigned char (i.e. positive only) value. Failure will cause EOF (negative only). This means fgetc could typically return one of 257 values, and by converting straight to char you're discarding one of those values: the error-handling one. In other words, you can no longer tell if fgetc succeeded or not. What happens when you reach EOF? You convert it to a char, treat it as a character value (when it isn't) and then try again..? Wrong answer!
In summary, fgetc returns int, so store the return value in int...
Another problem arises when numBooks reaches INT_MAX, and numBooks++; will cause an overflow. Technically this is undefined behaviour, and could theoretically cause segmentation faults... but I've never personally seen that. Nonetheless, you should probably use an unsigned type, as it wouldn't make sense to have a negative number of entries in a file, would it?
Come to think about it, if numBooks in struct bookData booksList[numBooks]; were negative (or a large enough number), you might start to see segmentation violations when you access the higher elements.
In summary: Use an unsigned type when you only expect to see positive numbers, and use dynamic allocation (e.g. malloc) for large arrays.
Note that this doesn't cover all possibilities, as you haven't provided an MCVE so it's not possible/practical to do so; there's a big possibility that your segfault is caused by other code which you've not provided. Please let me know if you update this question, so I can update this answer and keep the world spinning :)

How to handle array bounds out in C

Is there any way to handle an index-out-of-bounds error in C?
I just want to know; please explain it in the context of this example.
If I enter a string of more than 20 chars I get *** stack smashing detected ***: ./upper1 terminated
Aborted (core dumped)
main()
{
char st[20];
int i;
/* accept a string */
printf("Enter a string : ");
gets(st);
/* display it in upper case */
for ( i = 0 ; st[i] != '\0'; i++)
if ( st[i] >= 'a' && st[i] <= 'z' )
putchar( st[i] - 32);
else
putchar( st[i]);
}
I want to handle those and stop them and display a custom message as done in Java's Exception Handling. Is it possible ? If yes how
Thanks in advance
To answer the original question: there is no way to handle implicitly out-of-bound array indexes in C. You should add that check explicitly in your code or you should prove (or at least be absolutely sure) that it does not happen. Beware of buffer overflow and other undefined behavior, it can hurt a lot.
Remember that C arrays don't "know" their size at runtime. You should know and manage that size, especially when passing arrays (which become decayed into pointers). Read also about flexible array members in struct-s (like here).
BTW, your code is in poor taste. First, the char st[20]; is really too small these days: an input line can easily have a hundred characters (I often have terminal emulators wider than 80 columns). So make it e.g.
char st[128];
Then, as every one told you, gets(3) is dangerous, it is documented as "Never use this function". Take the habit of reading the documentation of every function that you dare use.
I would suggest to always clear such a string buffer with
memset (st, 0, sizeof st);
You should at the very least use fgets(3), but read the documentation first. You'll need to handle the failure case.
Also, your conversion to upper-case is specific to ASCII (and some other encodings). It won't work on an old EBCDIC machine. And it is unreadable. So use isalpha(3) to detect letters (in ASCII or another single-byte encoding); in UTF-8 it is more complex, since some letters (e.g. Cyrillic ones) are encoded on several bytes. My family name (СТАРЫНКЕВИЧ when spelt in Russian) contains an Ы, which is a single letter called yery, whose UTF-8 encoding for the capital letter is the two bytes 0xD0 0xAB. You'll need a UTF-8 library like unistring to handle these. And use toupper(3) to convert (e.g. ASCII) letters to upper-case.
Notice that your main function is wrongly defined. It should return an int and preferably be declared as int main(int argc, char**argv).
Finally, on POSIX systems, the "right" way to read a line is to use the getline(3) function. It can read a line as wide as permitted by system resources (so it might read a line of a million characters on my machine). See this answer.
Regarding exceptions, C doesn't really have these (so most programmers get into the habit of having functions return some error code). However, for non-local jumps consider setjmp(3), to be used with great caution. (In C++, you have exceptions and they are related to destructors).
Don't forget to compile with all warnings and debug info (e.g. with gcc -Wall -g if using GCC). You absolutely need to learn how to use the debugger (e.g. gdb) and you also should use a memory leak detector like valgrind.
Yes, you must use fgets() instead of gets(). In fact, gets() was removed from the C standard in C11 and should never, ever be used, because it is impossible to use safely, as you discovered.
Though it's not directly possible to detect that the user has written out of bounds, we can add some logic to report an error without crashing.
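#include <stdio.h>
#define USERINPUT_MAX 20 // assumed limit, for illustration only
int main (int argc, char **argv)
{
char user_input [USERINPUT_MAX];
int i; // declared outside the loop so it can be checked afterwards
int c;
for (i = 0; i < USERINPUT_MAX; ++i)
{
c = getchar (); // read the character
if (c == '\n' || c == EOF) // enter key (or end of input): add the null and stop
{
user_input [i] = '\0';
break;
}
user_input [i] = (char) c; // not enter: store it in the array
}
if (i == USERINPUT_MAX)
{
printf ("you have exceeded the character range\n");
}
return 0;
}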
I guess you get the idea of how to handle such situations from user input.

Flushing stdin after every input - which approach is not buggy?

After Mark Lakata pointed out that the garbage isn't properly defined in my question I came up with this. I'll keep this updated to avoid confusions.
I am trying to get a function that I can call before a prompt for user input, like printf("Enter your choice:"); followed by a scanf, and be sure that only the things entered after the prompt would be scanned in by scanf as valid input.
As far as I can understand the function that is needed is something that flushes standard input completely. That is what I want. So for the purpose of this function the "garbage" is everything in user input i.e. the whole user input before that user prompt.
While using scanf() in C there is always the problem of extra input lying in the input buffer. So I was looking for a function that I call after every scanf call to remedy this problem. I used this, this, this and this to get these answers
//First approach
scanf("%*[^\n]\n");
//2nd approach
scanf("%*[^\n]%*c");
//3rd approach
int c;
while((c = getchar()) != EOF)
if (c == '\n')
break;
All three are working as far as I could find by hit-and-trial and going by the references. But before using any of these in all of my codes I wanted to know whether any of these have any bugs?
EDIT:
Thanks to Mark Lakata for one bug in 3rd. I corrected it in the question.
EDIT2:
After Jerry Coffin answered I tested the 1st 2 approaches using this program in code:blocks IDE 12.11 using GNU GCC Compiler(Version not stated in the compiler settings).
#include<stdio.h>
int main()
{
int x = 3; //Some arbitrary value
//1st one
scanf("%*[^\n]\n");
scanf("%d", &x);
printf("%d\n", x);
x = 3;
//2nd one
scanf("%*[^\n]%*c");
scanf("%d", &x);
printf("%d", x);
}
I used the following 2 inputs
First Test Input (2 Newlines but no spaces in the middle of garbage input)
abhabdjasxd
23
bbhvdahdbkajdnalkalkd
46
For the first I got the following output by the printf statements
23
46
i.e. both codes worked properly.
Second Test input: (2 Newlines with spaces in the middle of garbage input)
hahasjbas asasadlk
23
manbdjas sadjadja a
46
For the second I got the following output by the printf statements
23
3
Hence I found that the second one won't be taking care of extra garbage input whitespaces. Hence, it isn't foolproof against garbage input.
I decided to try out a 3rd test case (garbage includes newline before and after the non-whitespace character)
hahasjbas asasadlk
23
manbdjas sadjadja a
46
The answer was
3
3
i.e. both failed in this test case.
The first two are subtly different: they both read and ignore all the characters up to a new-line. Then the first skips all consecutive white space so after it executes, the next character you read will be non-whitespace.
The second reads and ignores characters until it encounters a new-line then reads (and discards) exactly one more character.
The difference will show up if you have (for example) double-spaced text, like:
line 1
line 2
Let's assume you read to somewhere in the middle of line 1. If you then execute the first one, the next character you read in will be the 'l' on line 2. If you execute the second, the next character you read in will be the new-line between line 1 and line 2.
As for the third, if I were going to do this at all, I'd do something like:
int ch;
while ((ch=getchar()) != EOF && ch != '\n')
;
...and yes, this does work correctly -- && forces a sequence point, so its left operand is evaluated first. Then there's a sequence point. Then, if and only if the left operand evaluated to true, it evaluates its right operand.
As for performance differences: since you're dealing with I/O to start with, there's little reasonable question that all of these will always be I/O bound. Despite its apparent complexity, scanf (and company) are usually code that's been used and carefully optimized over years of use. In this case, the hand-rolled loop may be quite a bit slower (e.g., if the code for getchar doesn't get expanded inline) or it may be about the same speed. The only way it stands any chance of being significantly faster is if the person who wrote your standard library was incompetent.
As far as maintainability goes: IMO, anybody who claims to know C should know the scan set conversion for scanf. This is neither new nor rocket science. Anybody who doesn't know it really isn't a competent C programmer.
The first 2 examples use a feature of scanf that I didn't even know existed, and I'm sure a lot of other people didn't know. Being able to support a feature in the future is important. Even if it was a well known feature, it will be less efficient and harder to read the format string than your 3rd example.
The third example looks fine.
(edit history: I made a mistake saying that ANSI-C did not guarantee left-to-right evaluation of && and proposed a change. However, ANSI-C does guarantee left-to-right evaluation of &&. I'm not sure about K&R C, but I can't find any reference to it and no one uses it anyways...)
Many other solutions have the problem that they cause the program to hang and wait for input when there is nothing left to flush. Waiting for EOF is wrong because you don't get that until the user closes the input completely!
On Linux, the following will do a non-blocking flush:
// flush any data from the internal buffers
fflush (stdin);
// read any data from the kernel buffers
char buffer[100];
while (-1 != recv (0, buffer, 100, MSG_DONTWAIT))
{
}
The Linux man page says that fflush on stdin is non-standard, but "Most other implementations behave the same as Linux."
The MSG_DONTWAIT flag is also non-standard (it causes recv to return immediately if there is no data to be delivered).
You should use getline/getchar:
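#define _POSIX_C_SOURCE 200809L /* makes getline visible under strict -std=c99 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
int main()
{
ssize_t bytes_read; /* getline returns ssize_t */
size_t nbytes = 100; /* getline expects a size_t * */
char *my_string;
puts ("Please enter a line of text.");
/* These 2 lines are the heart of the program. */
my_string = malloc (nbytes + 1);
bytes_read = getline (&my_string, &nbytes, stdin);
if (bytes_read == -1)
{
puts ("ERROR!");
}
else
{
puts ("You typed:");
puts (my_string);
}
free (my_string); /* getline may have reallocated the buffer */
return 0;
}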
I think if you look carefully at the right-hand side of this page you will see many questions similar to yours. You can use fflush(stdin) on Windows.

why does this happen (see image)?

Why does the following have the effect it does - it prints a terminal full of random characters and then exits leaving a command prompt that produces garbage when you type in it. (I tried it because I thought it would produce a seg fault).
#include <stdio.h>
int main(){
char* s = "lololololololol";
while(1){
printf("%c", *s);
s++;
}
}
it was compiled with:
gcc -std=c99 hello.c
It will eventually seg fault, but before that it'll print out whatever bytes are in the same page. That's why you see random chars on the screen.
Those may well include escape sequences to change (say) the character encoding of the console. That's why you end up with gibberish when you type on the console after it's exited, too.
Because you have an infinite loop (while(1)), and you keep getting the current value of pointer (*s), and then moving the pointer one char forward (s++). This has the effect of marching well past the end of the string into "garbage" (uninitialized memory), which gets printed to the console as a result.
In addition to what everyone else said in regards to you ignoring the string terminal character and just printing willy-nilly what's in memory past the string, the reason why your command prompt is also "garbage" is that by printing a particular "unprintable" character, your terminal session was left in a strange character mode. (I don't know which character it is or what mode change it does, but maybe someone else can pipe in about it that knows better than I.)
You are just printing out what is in memory because your loop doesn't stop at the end of the string. Each random byte is interpreted as a character. It will seg fault when you reach the end of the memory page (and get into unreadable territory).
Expanding ever so slightly on the answers given here (which are all excellent) ... I ran into this more than once myself when I was just beginning with C, and it's an easy mistake to make.
A quick tweak to your while loop will fix it. Everyone else has given you the why, I'll hook you up with the how:
#include <stdio.h>
int main() {
char *s = "lolololololololol";
while (*s != '\0') {
printf("%c", *s);
s++;
}
}
Note that instead of an infinite loop (while(1)), we're doing a loop check to ensure that the pointer we're pulling isn't the null-terminator for the string, thus avoiding the overrun you're encountering.
If you're stuck absolutely needing while(1) (for example, if this is homework and the instructor wants you to use it), use the break keyword to exit the loop. The following code smells, at least to me, but it works:
#include <stdio.h>
int main() {
char *s = "lolololololololol";
while (1) {
if (*s == '\0')
break;
printf("%c", *s);
s++;
}
}
Both produce the same console output, with no line break at the end:
lolololololololol
Your loop doesn't terminate, so printf prints whatever is in memory after the text you wrote; eventually it will access memory it is not allowed to read, causing it to segfault.
You can change the loop as the others suggested, or you can take advantage of the fact that in C, zero is false and the null character (which terminates all strings) is also zero, so you can construct the loop as:
while (*s) {
Rather than:
while (*s != '\0')
The first one may be more difficult to understand, but it does have the advantage of brevity so it is often used to save a bit of typing.
Also, you can usually get back to your command prompt by using the 'reset' command, typing blindly of course. (type Enter, reset, Enter)
