Discrepancy with fgetc while reading a text file - c

I'm beginning with C and I want to understand certain behavior.
I have a text file, generated by Notepad or directly from the shell with echo, on Windows.
When I run the program below, the output shows extra characters. What am I doing wrong? How can I safely read a text file character by character?
I'm using Code::Blocks with MinGW.
file.txt:
TEST
C program
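#include <stdio.h>
int main(void)
{
int i;
FILE *fp;
fp = fopen("file.txt","r");
if (fp == NULL) /* fopen can fail; passing NULL to fgetc is undefined */
return 1;
while ((i = fgetc(fp)) != EOF)
{
printf("%c",i);
}
fclose(fp);
return 0;
}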
Output
 ■T E S T

Your code has issues, but the result is fine.
Your file is likely UTF-8 with a (confusingly enough) byte order mark in the beginning. Your program is (correctly) reading and printing the bytes of the BOM, which then appear in the output as strange characters before the proper text.
Of course, UTF-8 should never need a byte order mark (it's 8-bit bytes!), but that doesn't prevent some less clued-in programs from including one. Windows Notepad is the first program on the list of such programs.
UPDATE: I didn't consider the spacing between your letters, which of course indicates 16-bit input (UTF-16, which Notepad calls "Unicode"). That's your problem right there, then. Your C code is not reading wide characters.

Try this code
void main()
{
int c,i;
FILE *fp;
fp = fopen("file.txt","r");
while ((i = fgetc(fp)) != EOF)
{
printf("%c",i);
}
}

Related

Error with binary-writing mode in C, on Windows?

I am learning how to write a simple CGI page with C language. I tried with Apache on both Linux and Windows. I compiled my scripts on 2 different computers that run different OSes.
Firstly, I created a simple CGI page for getting a static plain-text content:
#include <stdio.h>
int main()
{
FILE *fp = fopen("plain_text.txt", "r"); // text-mode only.
if (fp)
{
int ch;
printf("content-type: text/plain\n\n");
while ((ch = fgetc(fp)) != EOF)
{
printf("%c", ch);
}
fclose(fp);
}
return 0;
}
I compiled it into an executable and put it in cgi-bin directory. When I browse it with my web-browser, it returns the plain-text content correctly (both Linux and Windows).
Then, I modified above script for getting a simple JPEG content.
(I understand that: every JPEG picture is a binary file)
#include <stdio.h>
int main()
{
FILE *fp = fopen("cat_original.jpg", "rb"); // with binary-mode.
if (fp)
{
int ch;
printf("content-type: image/jpg\n\n");
while ((ch = fgetc(fp)) != EOF) // ch is an int, so this reads the whole content of any binary file.
{
printf("%c", ch);
}
fclose(fp);
}
return 0;
}
I compiled it into an executable and put it in cgi-bin directory, too.
I get the correct image from the executable compiled on Linux, but not from the one compiled on Windows.
To understand the problem, I downloaded the image returned by the Windows executable.
(I named this image cat_downloaded_windows.jpg.)
Then I used VBinDiff to compare the two images: cat_original.jpg (68,603 bytes) and cat_downloaded_windows.jpg (68,871 bytes).
Many rows in cat_downloaded_windows.jpg (like the row I marked) contain a character that cat_original.jpg does not have.
[Image: VBinDiff screenshot]
So I guess that Windows causes the problem (Windows adds some characters automatically, and Linux does not).
(Apache and the web browsers do not cause the problem.)
So I am posting this on Stack Overflow to ask for your help. I have two questions:
Is there any problem with printf("%c", ch); (in my script) on Windows?
Is there any way to print binary content to stdout on both Linux and Windows?
I am teaching myself programming, and this is the first time I have asked on Stack Overflow.
So, if my question is not clear, please comment below this question; I will try to explain it more.
Thank you for your time!
When you use printf() to write to standard output, it is working in text mode, not binary mode, so every time your program encounters a newline \n in the JPEG file, it writes \r\n on Windows, which corrupts the JPEG file.
You'll need to know how to put standard output into binary mode and you'll need to ensure that you generate \r\n in place of \n in the headers.
The MSDN documentation says you can use _setmode(), and shows an example (setting stdin instead of stdout):
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main(void)
{
int result;
// Set "stdin" to have binary mode:
result = _setmode(_fileno(stdin), _O_BINARY);
if (result == -1)
perror("Cannot set mode");
else
printf("'stdin' successfully changed to binary mode\n");
}

I/O redirection

#include<stdio.h>
#include<stdlib.h>
int main()
{
int i;
for(i=1; i<=255; i++)
{
printf("%d %c\n",i,i);
}
}
Hey, I am working my way through I/O redirection, and I got stuck outputting the ASCII table from the command prompt. This is what I did:
C:\New folder\practice> main.exe > temp.txt
C:\New folder\practice> type temp.txt
After hitting Enter (after type temp.txt), it only outputs the first 26 numbers. My question is: why?
Also, can someone explain how to just copy the code into a text file using redirection? I know how to do it using FILE I/O.
Because you're using MS-DOS... er MS WinDOS, and there ASCII number 26/^Z is the end-of-text-file mark.
The feature exists so that the environment is compatible with the CP/M operating system of the early 1970s, in case you'd need to use some files that originate from that. As you've noticed, only type works like that, but more would display more... (no pun intended).
No kidding.
It is very dangerous to write control or non-ASCII characters to a text stream. Character 10 (0x0A) is \n, and it can be changed into the underlying system's end of line, which is \r\n on Windows.
The correct way is to open a file in binary mode:
#include<stdio.h>
#include<stdlib.h>
int main()
{
int i;
FILE *fd = fopen("temp.txt", "wb");
if (NULL == fd) {
perror("Error opening file");
return 1;
}
for(i=1; i<=255; i++)
{
fprintf(fd, "%d %c\n",i,i);
}
fclose(fd);
return 0;
}
That being said, commands expecting text files may stop when they read a SUB control character (Ctrl-Z, code 0x1A), which is your current problem...

C program to get first word of each line from a .txt file and print that word onto another .txt file: Kind of works but also prints random letters

So we have this file called dictionary1.txt, and it has words with their pronunciation right next to them. What I want to do is get the first word from each line and print it to another txt file that the program creates from scratch. My code does that, but it also prints random Chinese letters in between the English words, and I don't know why.
Here's what the output file looks like: https://imgur.com/a/pZthP
(Pronunciations are separated from the actual words in each line by a blank space in dictionary1.txt.)
My code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
char line[100];
int i = 0;
FILE* fp1 = fopen("dictionary1.txt", "r");
FILE* fp2 = fopen("dictionary2.txt", "w");
if (fp1 == NULL || fp2 == NULL){
printf("ERROR");
return -1;
}
while (fgets(line, 100, fp1) != NULL){
while (line[i] != ' '){
fputc(line[i], fp2);
i++;
}
i=0;
fputc('\0', fp2);
}
return 0;
}
I tried fputc('\n', fp2) as well, but no matter what I couldn't get onto the next line in the file I created from scratch. I also can't get rid of all the random Chinese letters.
EDIT: I figured it out. The .txt file I was working on was saved in Unicode formatting, which didn't work well with my program. I turned it into ANSI and now it works like a charm.
\n is not the right line separator on all operating systems and all editors.
If you are editing your txt files in Notepad, try fputs ("\r\n", fp2);, where \r means carriage return (the cursor returns to the first character of the line) and \n means new line. This only makes sense if fp2 was opened in binary mode ("wb"); in text mode on Windows the library already adds the \r.
Generally speaking, Windows uses '\r\n' as the line separator, and a bare '\n' is displayed as something other than a line end, at least in older versions of Notepad. Linux and macOS use '\n' alone. You may also want to try fprintf(fp2, "\n");
Check this out
\n and \r seem to work everywhere. Why is line.separator more portable?
If you don't mind using C++, you could try to create an output stream os and write os << endl
Note that a stream opened in text mode converts '\n' into the operating system's end-of-line character sequence on output (\r\n on Windows), whereas a stream opened in binary mode does not.
Another thing, change the while loop condition into line[i] != ' ' && line[i] != '\0' and close the file fp2 using fclose.
.txt file was saved using Unicode formatting. I turned it into ANSI and everything was suddenly fixed.

Comparing newline doesn't work properly

I have the following code in C to make an input file using an existing input file, but without newlines:
int main()
{
int T;
char c;
FILE *fi,*fo;
fi=fopen("Square-practice.in","r");
fo=fopen("Square-practice-a.in","w");
fscanf(fi,"%d",&T);
fprintf(fo,"%d",T);
while(fscanf(fi,"%c",&c)==1){
if(c=='\n') printf("qwert");
else fprintf(fo,"%c",c);
}
return 0;
}
There is no compiling error.
However, the output file is exactly the same as the input file, with the newline included.
"qwert" is printed 8 times (same as the number of newlines in file fi). So why doesn't the "else" work?
The compiler is MinGW.
Both the fi,fo files are here
I think you have '\r\n' instead of '\n'. So try
int main()
{
int T;
char c;
FILE *fi,*fo;
fi=fopen("Square-practice.in","r");
fo=fopen("Square-practice-a.in","w");
fscanf(fi,"%d",&T);
fprintf(fo,"%d",T);
while(fscanf(fi,"%c",&c)==1){
if(c=='\n' || c=='\r') printf("qwert");
else fprintf(fo,"%c",c);
}
return 0;
}
You can also use fgetc() and fputc(). Just skip any \r or \n before passing each char into new file:
Your code with modifications:
#include <stdio.h>
int main(void)
{
int iChr; /* int, not char, so EOF can be told apart from data */
FILE *fi,*fo;
fi=fopen("Square-practice.in","r");
fo=fopen("Square-practice-a.in","w");
if(fi==NULL || fo==NULL) return 1;
/* the leading integer is copied like any other character */
while((iChr = fgetc(fi)) != EOF)
{
if((iChr =='\n')||(iChr =='\r'))//skipping newline characters
{
printf("qwert");
}
else fputc(iChr, fo);//no \n or \r, put in new file
}
fclose(fi);
fclose(fo);
return 0;
}
I'm running this on my Linux and I'm getting just what I should be getting: the same file without new line characters and "qwert" printed to stdout. If you're getting something else, it must be an issue with CR/LF translation. Try replacing "r" and "w" with "rt" and "wt", respectively.
Two PS comments:
The given program works (with or without "rt") on my gcc 4.7.2 on Linux, provided that line terminators in the input file are converted from CRLF to LF. This is reasonable when you move a text file from Windows to Linux and can be done, e.g., with the fromdos tool.
It is true that the C standard (section 7.19.5.3, p. 271, for ISO C99, or section 7.21.5.3, p. 306, for ISO C2011) does not require "t" for text files (so, conforming implementations need not implement it), but it seems that some implementations work differently.

displaying contents of a file on monitor in C

I'm trying to recreate a program I saw in class.
The teacher made a file with 10 lines, he showed us that the file was indeed created, and then he displayed its contents.
My code doesn't work for some reason, it just prints what looks like a"=" a million times and then exits.
My code:
void main()
{
FILE* f1;
char c;
int i;
f1=fopen("Essay 4.txt","w");
for(i=0;i<10;i++)
fprintf(f1," This essay deserves a 100!\n");
do
{
c=getc(f1);
putchar(c);
}while(c!=EOF);
}
What is the problem? As far as I can see, I did exactly what was in the example given.
The flow is as such:
You create a file (reset it to an empty file if it exists). That's what the "w" mode does.
Then you write stuff. Note that the file position is always considered to be at the very end, as writing moves the file position.
Now you try to read from the end. The very first thing you read would be an EOF already. Indeed, when I try your program on my Mac, I just get a single strange character, just as one would expect from the fact that you're using a do { } while. I suggest you instead do something like: for (c = getc(f1); c != EOF; c = getc(f1)) { putchar(c); } or a similar loop, with c declared as int so that EOF can be told apart from a valid character.
But also, your reading should fail anyway because the file mode is "w" (write only) instead of "w+".
So you need to do two things:
Use file mode "w+".
Reset the file position to the beginning of the file after writing to it: fseek(f1, 0, SEEK_SET);.
