y with umlaut in file - c

I'm working on an example problem where I have to reverse the text in a text file using fseek() and ftell(). That part works, but when I write the same output to a file I get some weird results.
The text file I input was the following:
redivider
racecar
kayak
civic
level
refer
These are all palindromes
The result in the command line works great. In the text file that I create however, I get the following:
ÿsemordnilap lla era esehTT
referr
levell
civicc
kayakk
racecarr
redivide
I am aware, from the answer to this question, that ÿ is what EOF looks like when it is written to a text file in C. I'm just confused as to why the command line and text file outputs are different.
#include <stdio.h>
#include <stdlib.h>
/**********************************
This program is designed to read in a text file and then reverse the order
of the text.
The reversed text then gets output to a new file.
The new file is then opened and read.
**********************************/
int main()
{
    //Open our files and check for NULL
    FILE *fp = NULL;
    fp = fopen("mainText.txt","r");
    if (!fp)
        return -1;
    FILE *fnew = NULL;
    fnew = fopen("reversedText.txt","w+");
    if (!fnew)
        return -2;
    //Go to the end of the file so we can reverse it
    int i = 1;
    fseek(fp, 0, SEEK_END);
    int endNum = ftell(fp);
    while(i < endNum+1)
    {
        fseek(fp,-i,SEEK_END);
        printf("%c",fgetc(fp));
        fputc(fgetc(fp),fnew);
        i++;
    }
    fclose(fp);
    fclose(fnew);
    fp = NULL;
    fnew = NULL;
    return 0;
}
No errors, I just want identical outputs.

The outputs are different because your loop reads two characters from fp per iteration.
For example, in the first iteration i is 1 and so fseek sets the current file position of fp just before the last byte:
...
These are all palindromes
                        ^
Then printf("%c",fgetc(fp)); reads a byte (s) and prints it to the console. Having read the s, the file position is now
...
These are all palindromes
                         ^
i.e. we're at the end of the file.
Then fputc(fgetc(fp),fnew); attempts to read another byte from fp. This fails and fgetc returns EOF (a negative value, usually -1) instead. However, your code is not prepared for this and blindly treats -1 as a character code. Converted to a byte, -1 corresponds to 255, which is the character code for ÿ in the ISO-8859-1 encoding. This byte is written to your file.
In the next iteration of the loop we seek back to the e:
...
These are all palindromes
                       ^
Again the loop reads two characters: e is written to the console, and s is written to the file.
This continues backwards until we reach the beginning of the input file:
redivider
^
Yet again the loop reads two characters: r is written to the console, and e is written to the file.
This ends the loop. The end result is that your output file contains one character that doesn't exist (from the attempt to read past the end of the input file) and never sees the first character.
The fix is to only call fgetc once per loop:
while(i < endNum+1)
{
    fseek(fp,-i,SEEK_END);
    int c = fgetc(fp);
    if (c == EOF) {
        perror("error reading from mainText.txt");
        exit(EXIT_FAILURE);
    }
    printf("%c", c);
    fputc(c, fnew);
    i++;
}

In addition to @melpomene's correction about using only one fgetc() per loop iteration, other issues exist.
fseek(questionable_offset)
fopen("mainText.txt","r"); opens the file in text mode, not binary mode. Using fseek() with arbitrary computed offsets on a text stream is therefore prone to trouble. It is usually not a problem on *nix systems.
I do not have a simple alternative.
ftell() return type
ftell() returns long. Use long instead of int for i and endNum. (Not a concern with small files.)
Check return values
ftell() and fseek() can fail. Test for error returns.

Related

What do I need to do to read a file then pick a line and write it to another file (Using C)?

I've been trying to figure out how to read a .txt file, pick a line of that file at random, and then write the result to a different .txt file.
for example:
.txt
bark
run
car
Take lines 2 and 3, join them together, and write the result to Result.txt on a new line.
How would I go about doing this?
I've tried looking around for resources for fopen(), fgets(), fgetc(), fprintf(), puts(). Haven't found anything so far on reading a line that isn't the first line, my best guess:
-read file
-print line of file in memory I.E. an array
-pick a number from random I.E. rand()
-use random number to pick a array location
-write array cell to new file
-repeat twice
-make newline repeat task 4-6
-when done
-close read file
-close write file
I might be overthinking it, or I just don't know which operation gets a single line from anywhere in a file.
I'm just having a hard time wrapping my head around it.
I'm not going to solve the whole exercise, but I will give you a hint on how to copy a line from one file to another.
You can use fgets and increment a counter each time you find a line break. If the line number is the one you want to copy, you simply dump the buffer obtained with fgets to the target file with fputs.
#include <stdio.h>
#include <string.h>
int main(void)
{
    // I omit the fopen check for brevity
    FILE *in = fopen("demo.c", "r");
    FILE *out = fopen("out.txt", "w");
    int ln = 1, at = 4; // copy line 4
    char str[128];
    while (fgets(str, sizeof str, in))
    {
        if (ln == at)
        {
            fputs(str, out);
        }
        if (strchr(str, '\n') && (ln++ == at))
        {
            break;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}
Output:
int main(void)

fread is not reading whole file [duplicate]

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to "\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work; the data will be corrupted. You can read text fine in binary mode though. It just won't do the automatic translation between "\n" and "\r\n".
See fopen
Additionally, when you fopen a file with "rt", the input is terminated on a Ctrl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with origin of SEEK_SET).
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
    FILE *f;
    char string[] = "A\nB";
    int len;
    len = strlen(string);
    printf("As you'd expect string has %d characters... ", len); /* prints 3 */
    f = fopen("test.txt", "w"); /* Text mode */
    fwrite(string, 1, len, f); /* On Windows "A\r\nB" is written */
    printf ("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course, you can also open the file with a text editor like Notepad++ and see the characters for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen the file and seek to that position (we used fsetpos).
However, when a file had a mixture of line endings, this process failed to seek to the same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
In 'w' mode, the file is opened in write mode and the basic encoding is 'utf-8'.
In 'wb' mode, the file is opened in write-binary mode, which is responsible for writing other special characters, and the encoding may be 'utf-16le' or others.

Why is my program perceiving an EOF condition way before my file actually ends?

My code reads line by line from a text file and stores the lines in a massive array of char pointers. When I use an ordinary text file, this works with no issues. However, when I try to read from the 'dictionary.txt' file I'm supposed to be using, my program detects EOF after reading the first of MANY lines in the file.
int i = 0;
while( 1 ) {
    size_t size = 50;
    fseek( dicFile, 0L, getline( &dictionary[i++], &size, dicFile) );
    printf( "%d:\t%s", i, dictionary[i - 1] );
    if( feof( dicFile ) ) {
        fclose( dicFile );
        break;
    }
}
puts("finished loading dictionary");
Here is the start of the dictionary file I'm attempting to load:
A
A's
AA's
AB's
ABM's
AC's
ACTH's
AI's
AIDS's
AM's
AOL
AOL's
ASCII's
ASL's
ATM's
ATP's
AWOL's
AZ's
The output I get from this portion of the program is:
1: A
2: finished loading dictionary
Thanks for any help.
Your third argument to fseek() is nuts. I've seen at least one implementation that treated every out of range third argument as SEEK_END. Oops.
You should just call getline() in the loop instead. In fact, just check the return value of getline() for -1 and get rid of that feof().

After two successful read fgetc() is not working when used with fseek()

I am trying to encrypt a file (with a simple bit manipulation algorithm). For that I created three different versions:
1. A new file is created in the process of encryption or decryption; the old one is deleted and the new one renamed.
2. Encryption or decryption happens on the same file, using two different FILE *, one opening the file in rb and another opening the same file in rb+.
3. Encryption or decryption happens on the same file, and only one FILE * is used, with the file opened in rb+ mode.
The first two versions work as expected (they don't use fseek()), but I am encountering a problem in version 3.
code for v3:
FILE *inputFile= NULL,*outputFile = NULL;
char *inName = "file.txt",*outName;
/* I am using same source file for all three version controlled by #if, so the following assignment is necessary*/
outName = inName;
inputFile = fopen(inName,"rb+");
if(inputFile == NULL)
{
    perror(inName);
    exit(EXIT_FAILURE);
}
/* I am using same source file for all three version controlled by #if, so the following assignment is necessary*/
outputFile = inputFile;
long int currentLocation;
unsigned char targetChar;
int intChar;
long int offset,temp;
while(currentLocation = ftell(inputFile),(intChar = fgetc(inputFile))!= EOF)
{
    targetChar = intChar;
    /*#if encryption
    encrypt(&targetChar);
    #else
    decrypt(&targetChar);
    #endif // encryption*/
    /* going back in the file to the starting position of currently read character*/
    temp = currentLocation;
    currentLocation = ftell(inputFile);
    offset = temp - currentLocation; // the offset is always -1 throughout the program(gdb)
    if(fseek(inputFile,offset,SEEK_CUR)==-1)
    {
        perror(outName);
        exit(EXIT_FAILURE);
    }
    // writing the encrypted or decrypted character to the file
    if(fputc(targetChar,outputFile) == EOF)
    {
        perror(outName);
        exit(EXIT_FAILURE);
    }
}
fclose(inputFile);
fclose(outputFile);
For the first two characters fgetc() works properly; after that it is not reading at all, while currentLocation keeps incrementing steadily.
if the file has following content
Hello world !!
output is
Heelo world !!
or
Heeeeeeeeeeeeeeeeeeeeeeeee ...
The number of e's depends on how long the program runs; it's an infinite loop.
I am using fseek() to move backwards. Does this clear EOF (causing the infinite loop) even though I am only doing a backward fseek()? I also checked in the debugger: fgetc() is not reading more than two characters, but currentLocation moves in increments of 1. Why is fgetc() not reading more than two characters?
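Two changes are needed at the end of the loop: flush the stream after the write, before the next fgetc(), and close the stream only once, because outputFile and inputFile are the same FILE *.
    if(fputc(targetChar,outputFile) == EOF)
    {
        perror(outName);
        exit(EXIT_FAILURE);
    }
    fflush(outputFile);//need flush
}
fclose(inputFile);
//fclose(outputFile);//double close!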

Unexpected output when using fseek

Assuming we have a text file named hi.txt that contains the following string:
AbCdE12345
Let's say we run this code:
int main() {
    FILE *fp;
    fp = fopen("hi.txt","r");
    if(NULL == fp) { return 1; }
    fseek(fp,-1, SEEK_END);
    while (ftell(fp) > 0) {
        printf("%c",fgetc(fp));
        fseek(fp,-4, SEEK_CUR);
    }
    fclose(fp);
    return 0;
}
When I ran this code it printed: 3EbCd
When I tried to guess what it would print I thought that it should be 52d.
Can anyone explain what has happened here ?
It looks like there is a non-printable end-of-line character at the end of your file. That's what gets printed first. Then the position is moved in turn to 3, E, and b. At this point, after b has been read, the position is 2, so the fseek by -4 fails because the location would become -2. The file cursor stays where it was, i.e. at C, which gets printed next. The following attempt at repositioning fails too, so d gets printed. The next repositioning succeeds and lands on position 0, terminating the loop.
To detect situations when fseek is ignored, check its return value, like this:
while (ftell(fp) > 0) {
    printf("%c",fgetc(fp));
    // Successful calls of fseek return zero
    if (fseek(fp,-4, SEEK_CUR)) {
        // Exit the loop if you can't jump back by 4 positions
        break;
    }
}
For files opened in text mode the offset passed to fseek is only meaningful for values returned by ftell. So the offset may not necessarily be in bytes. Try opening the file in binary mode:
fp = fopen("hi.txt", "rb");
and see if the results are different.

Resources