I wrote the following code:
main()
{
    FILE *fp;
    fp=fopen("ftest.txt","r");
    char c, filestring[100];
    int i=0;
    while((c=getc(fp))!=EOF)
    {
        filestring[i]=c;
    }
    printf("str is %s",filestring);
    fclose(fp);
}
The file ftest.txt contains the words Hello World.
The output displayed is not correct, it is either some other font or some other encoding.
What is the reason for this? And how do I solve this problem?
At the same time, this code runs well (shows output on stdout in "English"):
main()
{
    FILE *fp;
    fp=fopen("ftest.txt","r");
    char c;
    while((c=getc(fp))!=EOF)
    {
        printf("%c",c);
    }
    fclose(fp);
}
I need the first code to work, as I've to search in the text file. How to solve this?
The question is different from Output is not displying correctly in file operation as I'm able to "display" the correct output (as in second code), but I'm not able to write the contents of the file into a string.
Things I've tried:
1) Changing the mode in which the file is opened from "r" to "rb".
2) Changing the Notepad encoding to all available options: ANSI, UTF etc.
There are two parts of the answer:
You never increment i. This means you're just overwriting the same spot (the first space in the array) in the while loop. That's why the first value of the junk is a 'd' (the last character of your input).
The junk after the 'd' is because the array is never initialized, meaning that there is random junk already there that is never overwritten.
Another note: the first approach requires manually adding a null byte \0 at the end of the array, either by initializing the whole array to \0s or by writing one just after the last character is read in. This is so the string is read correctly by printf.
... and there's also a third part that's wrong here:
getc() returns an int, but you're assigning it to a char before comparing it with EOF, which is a negative int (typically -1). If getc happens to return character 255 and char is a signed 8-bit type on your platform, the stored value becomes, in a manner of speaking, (char)-1, which is then sign-extended to -1 and compares equal to EOF.
The text contains:
..... (some characters can't be posted on SO)
xxxxxxxx=xxx xxxxxxx=xxxxx://xxx..xxx/xxxxx/xx9528994
(for full text & data please see https://github.com/ggaarder/snippets/raw/master/x.txt)
which ends in xxxxx://xxx..xxx/xxxxx/xx9528994. However, when I read it and then call puts, the output is only
..... (some characters can't be posted on SO)
xxxxxxxx=xxx xxxxxxx=xxxxx:/
The output stops at xxxxx:/, and /xxx..xxx/xxxxx/xx9528994 is missing.
Code to test:
#include <stdio.h>
int main(void)
{
    char s[30000];
    FILE *f = fopen("x.txt", "r");
    fread(s, sizeof(s), 1, f);
    puts(s);
    return 0;
}
The buffer size 30000 is adequate. x.txt is 1049 bytes.
You can download x.txt at https://github.com/ggaarder/snippets/raw/master/x.txt, for convenience I have packed everything to https://github.com/ggaarder/snippets/raw/master/foo.zip.
It would be very kind of you to download x.txt and take a look at it, since most of it can't be posted on SO because of the special characters, including some CJK.
Attempts:
The whole file is read properly. @pmg notices that fread returns zero, while @Someprogrammerdude points out that if fread's size and count arguments are swapped, fread returns 1049, and this supports the guess.
If the CJK letters are removed, the output will be totally OK. So I think there is no '\0' in the middle.
By adding
ret = puts(s);
printf("\nret: %d, %s", ret, strerror(errno));
we get ret: 0, No error. puts returns zero and errno is not set.
You may notice the leading \n in the printf call above. Yes, puts doesn't emit the newline as it usually does. Does this suggest that puts failed?
But then why does it return zero and leave errno unset?
Could it be related to the Windows NT cmd console? Maybe some special terminal control characters are being output unintentionally.
Reading with rb gives the same result. x.txt is an XML text; for convenience I removed the irrelevant parts of it, so it looks like spam.
I guess this is just yet another encoding issue, plus some magical secret Windows commandline control sequence .... I'm not taking it. I will just erase all non-ASCII characters.
The order of the "size" and "count" arguments to fread is crucial.
The first argument is the "element" size, and the second argument is the number of elements to attempt to read.
In the case of a text file, the element size is a single character, usually a single byte. The number of elements to attempt to read is the size of the destination array.
So your call should be
fread(s, 1, sizeof s, f);
instead.
What happens now when you have the opposite is that you say that the "element" size is 30000 bytes, and that fread should read one such element. Since the size of the file is less than 30000 bytes, it just can't read even a single element, and returns 0 to indicate it.
Open the file in binary mode.
Switch the arguments and check the return value of fread().
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char s[30000];
    FILE *f = fopen("x.txt", "rb"); // binary mode
    if (f == NULL) {
        perror("x.txt");
        exit(EXIT_FAILURE);
    }
    size_t len = fread(s, 1, sizeof(s) - 1, f); // switch args, leave room for the terminator
    if (len < 1) {
        perror("bad fread");
        exit(EXIT_FAILURE);
    }
    s[len] = 0; // properly terminate s
    puts(s);
    fclose(f);
    return 0;
}
It's just yet another everyday encoding issue. Call SetConsoleOutputCP(65001), compile with /utf-8, or set the execution code page with a #pragma, and everything will be fine.
The objective is to write an int function that returns the number of occurrences of numbers greater than 100 in a text file. The function should receive the file pointer as an argument.
Here is my code so far:
#include <stdio.h>
int function(FILE *infp);
int main ()
{
    FILE *infp;
    printf("\n%d\n",function(infp));
}
int function(FILE *infp)
{
    int num, counter=0;
    if ((infp = fopen ("text.txt", "r")) == NULL)
        printf ("\ncannot open the file text.txt\n");
    while ((num = getc())!=EOF)
    {
        if (num>100)
            counter++;
    }
    fclose(infp);
    return (counter);
}
It is always outputting 0. I'm thinking either getc is not the right command to use here or maybe I am formatting the text file wrong? Any help would be great
Here you are using getc() to read numbers from the file, but getc() gives you only one character at a time.
For example, if your file content is "103 9",
getc() returns '1' the first time, then '0', then '3', and so on.
This way you never read a complete number; you only ever see one character at a time. What gets compared against 100 is the character's code (the digits '0' to '9' have codes 48 to 57 in ASCII), which is never greater than 100, so the counter stays at 0.
To fix this you can use fscanf(infp, "%d", &num);
This reads one complete number in one go,
and then you can easily compare the numbers.
It's going to be a bit of work to do this in C. How about grabbing the text content, splitting it on spaces, and verifying that each token consists only of the ASCII characters that represent base-10 digits (codes in the interval [48, 57])? From there you can apply a conversion function such as atoi.
getc() reads the next character from a stream. What you probably want is to tokenize the file on some delimiter (let's say space).
The following program is intended to make a copy of one .exe application file. But just one little thing determines whether it indeed gives me a proper copy of the intended file RealPlayer.exe or gives me a corrupted file.
What I do is read from the source file in binary mode and write to the new copy in the same mode. For this I use a variable ch. But if ch is of type char, I get a corrupted file which has a size of a few bytes, while the original file is 26MB. But if I change the type of ch to int, the program works fine and gives me an exact copy of RealPlayer.exe, sized 26MB. So let me ask two questions that arise from this premise. I would appreciate it if you can answer both parts:
1) Why does using type char for ch mess things up while int type works? What is wrong with char type? After all, shouldn't it read byte by byte from the original file (as char is one byte itself) and write it byte by byte to the new copy file? After all, isn't that what the int type does, i.e., read 4 bytes from the original file and then write that to the copy file? Why the difference between the two?
2) Why is the file so small compared to the original file if we use char type for ch? Let's forget for a moment that the copied file is corrupt to begin with and focus on the size. Why is it that the size is so small if we copy character by character (or byte by byte), but is big (original size) when we copy "integer by integer" (or 4 bytes by 4 bytes)?
A friend suggested that I simply stop asking questions and use int because it works while char doesn't! But I need to understand what's going on here, as I see a serious lapse in my understanding of this matter. Your detailed answers are much appreciated. Thanks.
#include<stdio.h>
#include<stdlib.h>
int main()
{
    char ch; //This is the cause of problem
    //int ch; //This solves the problem
    FILE *fp,*tp;
    fp=fopen("D:\\RealPlayer.exe","rb");
    tp=fopen("D:\\copy.exe","wb");
    if(fp==NULL||tp==NULL)
    {
        printf("Error opening files");
        exit(-1);
    }
    while((ch=getc(fp))!=EOF)
        putc(ch,tp);
    fclose(fp);
    fclose(tp);
}
The problem is in the termination condition for the loop. In particular, the type of the variable ch, combined with rules for implicit type conversions.
while((ch=getc(fp))!=EOF)
getc() returns int: either the byte read, as an unsigned char value converted to int (0-255 when a byte is 8 bits), or EOF, a negative value that is typically -1.
You stuff the result into a char, then promote it back to int to do the comparison. Unpleasant things happen, such as sign extension.
Let's assume your compiler treats "char" as "signed char" (the standard gives it a choice).
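You read a bit pattern of 0xff (255) from your binary file - that's -1, expressed as a char. That gets promoted to int, giving you 0xffffffff, and compared with EOF (also -1, i.e. 0xffffffff). They match, and your program thinks it found the end of file, and obediently stops copying. Oops! That is also why the copy is only a few bytes long: copying stops at the first 0xff byte, which an executable is likely to contain very early on.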
One other note - you wrote:
"After all isn't what the int type does, ie, read 4 bytes from original file and then write that to the copy file?"
That's incorrect. getc(fp) behaves the same regardless of what you do with the value returned - it reads exactly one byte from the file, if there's one available, and returns that value - as an int.
int getc ( FILE * stream );
Returns the character currently pointed by the internal file position indicator of the specified stream.
On success, the character read is returned (promoted to an int value). If you have defined ch as int, all works fine, but if ch is defined as char, the value returned from getc() is converted back to char, so a 0xFF data byte can no longer be told apart from EOF.
This is what causes the corruption in the data and the loss in size.
I am new to C programming, so I am having difficulties with the problem below.
I have a text file inp.txt which contains information like the following:
400;499;FIRST;
500;599;SECOND;
670;679;THIRD;
I need to type a number and my program needs to compare it with numbers from the inp.txt file.
For example, if I type 450, it's between 400 and 499, so I need to write the word FIRST to the file out.txt
I have no idea how to convert a character array to an int.
I think you'll want these general steps in your program (but I'll leave it to you to figure out how you want to do it exactly)
Load each of the ranges and the text "FIRST", "SECOND", etc. from the file inp.txt into an array, or several arrays, or similar. As I said in the comment above, fscanf might be handy. This page describes how to use it (the page is about C++, but using it in C should be the same): http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/. Roughly speaking, the idea is that you give fscanf a format specifier for what you want to extract from a line in a file, and it puts the bits it finds into the variables you specify.
Prompt the user to enter a number.
Look through the array(s) to work out which range the number fits into, and therefore which text to output
Edit: I'll put some more detail in, as asker requested. This is still a kind of skeleton to give you some ideas.
Use the fopen function, something like this (declare a pointer FILE* input_file):
input_file = fopen("c:\\test\\inp.txt", "r"); /* "r" opens inp.txt for reading */
Then, it's good to check that the file was successfully opened, by checking if input_file == NULL.
Then use fscanf to read details from one line of the file. Loop through the lines of the file until you've read the whole thing. You give fscanf pointers to the variables you want it to put the information from each line of the file into. (It's a bit like a printf formatting specifier in reverse).
So, you could declare int range_start, range_end, and char range_name[20]. (To make things simple, let's assume that all the words are at most 19 characters long, which leaves one byte for the terminating null. This might not be a good plan in the long run though.)
while (!feof(input_file)) { /* check for end-of-file */
    /* %19[^;] reads the name up to the semicolon, at most 19 characters */
    if (fscanf(input_file, "%d;%d;%19[^;];", &range_start, &range_end, range_name) != 3) {
        break; /* Something weird happened on this line, so let's give up */
    } else {
        printf("I got the following values: %d, %d, %s\n", range_start, range_end, range_name);
    }
}
Hopefully that gives you a few ideas. I've tried running this code and it did seem to work. However, worth saying that fscanf has some drawbacks (see e.g. http://mrx.net/c/readfunctions.html), so another approach is to use fgets to get each line (the advantage of fgets is that you get to specify a maximum number of characters to read, so there's no danger of overrunning a string buffer length) and then sscanf to read from the string into your integer variables. I haven't tried this way though.
I'm trying to read in a text file line by line and process each character individually.
For example, one line in my text file might look like this:
ABC XXXX XXXXXXXX ABC
There will always be a different amount of spaces in the line. But the same number of characters (including spaces).
This is what I have so far...
char currentLine[100];
fgets(currentLine, 22, inputFile);
I'm then trying to iterate through the currentLine Array and work with each character...
for (j = 0; j<22; j++) {
    if (&currentLine[j] == 'x') {
        // character is an x... do something
    }
}
Can anyone help me with how I should be doing this?
As you can probably tell - I've just started using C.
Something like the following is the canonical way to process a file character by character:
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
int main(int argc, char **argv)
{
    FILE *fp;
    int c;
    if (argc != 2) {
        fprintf(stderr, "Usage: %s file.txt\n", argv[0]);
        exit(1);
    }
    if (!(fp = fopen(argv[1], "rt"))) {
        perror(argv[1]);
        exit(1);
    }
    while ((c = fgetc(fp)) != EOF) {
        // now do something with each character, c.
    }
    fclose(fp);
    return 0;
}
Note that c is declared int, not char because EOF has a value that is distinct from all characters that can be stored in a char.
For more complex parsing, then reading the file a line at a time is generally the right approach. You will, however, want to be much more defensive against input data that is not formatted correctly. Essentially, write the code to assume that the outside world is hostile. Never assume that the file is intact, even if it is a file that you just wrote.
For example, you are using a 100 character buffer to read lines, but limiting the amount read to 22 characters (probably because you know that 22 is the "correct" line length). The extra buffer space is fine, but you should allow for the possibility that the file might contain a line that is the wrong length. Even if that is an error, you have to decide how to handle that error and either resynchronize your process or abandon it.
Edit: I've added a skeleton of an assumed rest of the program for the canonical simple case. There are a couple of things to point out there for new users of C. First, I've assumed a simple command line interface to get the name of the file to process, and verified using argc that an argument is really present. If not, I print a brief usage message taking advantage of the content of argv[0], which by convention names the current program in some useful way, and exit with a non-zero status.
I open the file for reading in text mode. The distinction between text and binary modes is unimportant on Unix platforms, but can be important on others, especially Windows. Since the discussion is of processing the file a character at a time, I'm assuming that the file is text and not binary. If fopen() fails, then it returns NULL and sets the global variable errno to a descriptive code for why it failed. The call to perror() translates errno to something human-readable and prints it along with a provided string. Here I've provided the name of the file we attempted to open. The result will look something like "foo.txt: no such file". We also exit with non-zero status in this case. I haven't bothered, but it is often sensible to exit with distinct non-zero status codes for distinct reasons, which can help shell scripts make better sense of errors.
Finally, I close the file. In principle, I should also test the fclose() for failure. For a process that just reads a file, most error conditions will already have been detected as some kind of content error, and there will be no useful status added at the close. For file writing, however, you might not discover certain I/O errors until the call to fclose(). When writing a file it is good practice to check return codes and expect to handle I/O errors at any call that touches the file.
You don't need the address operator (&). You're trying to compare the value of the variable currentLine[j] to 'x', not its address.
ABC XXXX XXXXXXXX ABC has 21 characters. There's also the line break (22 chars) and the terminating null byte (23 chars).
You need to fgets(currentLine, 23, inputFile); to read the full line.
But you declared currentLine as an array of 100. Why not use all of it?
fgets(currentLine, sizeof currentLine, inputFile);
When using all of it, it doesn't mean that the system will put more than a line each time fgets is called. fgets always stops after reading a '\n'.
Try
while( fgets(currentLine, 100, inputFile) ) {
    for (j = 0; j<22; j++) {
        if (/*&*/currentLine[j] == 'x') { /* <--- without & */
            // character is an x... do something
        }
    }
}