A .txt file is read in binary mode and stored in a buffer (I'm writing a hex editor, so it's important that files are read in binary mode).
The following code replaces every newline with a '.' and prints the text to the console:
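#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

FILE *fileptr;
unsigned char *buffer;
long filelen;

int main(int argc, char* argv[]){
    if (argc < 2)
        return 1;                       // No file name given
    fileptr = fopen(argv[1], "rb");     // Open the file in binary mode
    if (fileptr == NULL)
        return 1;                       // Could not open the file
    fseek(fileptr, 0, SEEK_END);        // Jump to the end of the file
    filelen = ftell(fileptr);           // Get the current byte offset in the file
    rewind(fileptr);                    // Jump back to the beginning of the file
    buffer = malloc(filelen + 1);       // Enough memory for the file plus a terminator
    if (buffer == NULL)
        return 1;
    fread(buffer, filelen, 1, fileptr); // Read in the entire file
    fclose(fileptr);                    // Close the file
    for (int i = 0; i < filelen; i++){
        if (buffer[i] == '\n'){
            printf(".");
        }else{
            printf("%c", buffer[i]);
        }
    }
    free(buffer);
    return 0;
}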
(Screenshots, not reproduced here: the intended output; the actual output; and the output when a sleep() call is added, with the second line highlighted in green.)
The first line prints fine. Then the newline is reached, and this is where the error occurs: the newline seems to be removed, but the cursor jumps back to the beginning of the line. This behavior is neither expected nor wanted.
Try this
for (int i = 0; i < filelen; i++){
    if ((buffer[i] == 10) || (buffer[i] == 13)){   // line feed or carriage return
        printf(".");
    }else{
        printf("%c", buffer[i]);
    }
    fflush(stdout);
}
As you know, Unix, DOS and classic Mac .txt files use different ways of marking the start of a new line, and this could be causing your issue. In the revised code, instead of looking for \n, the program looks for ASCII codes 10 and 13: line feed and carriage return. The one undesirable consequence is that you will get two dots between lines for MS-DOS-type files, but you could work around that if you knew you would only ever have MS-DOS-type .txt files.
The other thing I have added, which may or may not be necessary, is fflush(stdout);. Output from printf is often buffered and does not appear on the screen immediately; fflush forces it to be printed.
I think the reason one line is written on top of the other is that you have a DOS-type .txt file, with a carriage return and a line feed at the end of each line. Your \n check catches the line feed but not the carriage return, which sends the cursor back to the beginning of the line, so the first part of the text is overwritten by the second part.
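If you want exactly one dot per line break whichever convention a file uses, one possible approach (an assumption about the desired output, not something the question requires) is to treat a carriage return that is immediately followed by a line feed as a single break. A minimal sketch; render_line_breaks is a hypothetical helper, and the caller must provide an output buffer of at least len + 1 bytes:

```c
#include <stddef.h>

/* Copies len bytes from buf to out, replacing each line break with a
   single '.'. "\r\n" (DOS/Windows), a lone "\n" (Unix) and a lone "\r"
   (classic Mac OS) each count as one line break. out must hold at least
   len + 1 bytes. Returns the number of characters written, not counting
   the terminating '\0'. */
size_t render_line_breaks(const unsigned char *buf, size_t len, char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        if (buf[i] == '\r' || buf[i] == '\n') {
            /* A CR immediately followed by LF is one break, not two. */
            if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;
            out[n++] = '.';
        } else {
            out[n++] = (char)buf[i];
        }
        i++;
    }
    out[n] = '\0';
    return n;
}
```

In the question's program this could be called as n = render_line_breaks(buffer, (size_t)filelen, out) with out allocated like buffer, followed by fwrite(out, 1, n, stdout). Printing with the returned count instead of printf("%s") matters for a hex editor, because binary files can contain NUL bytes.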
I'm working on an example problem where I have to reverse the text in a text file using fseek() and ftell(). I succeeded, but when writing the same output to a file I got some weird results.
The text file I input was the following:
redivider
racecar
kayak
civic
level
refer
These are all palindromes
The result in the command line works great. In the text file that I create, however, I get the following:
ÿsemordnilap lla era esehTT
referr
levell
civicc
kayakk
racecarr
redivide
I understand from the answer to this question that this character corresponds to the text-file version of EOF in C. I'm just confused as to why the command-line and text-file outputs are different.
#include <stdio.h>
#include <stdlib.h>

/**********************************
This program is designed to read in a text file and then reverse the order
of the text.
The reversed text then gets output to a new file.
The new file is then opened and read.
**********************************/

int main()
{
    //Open our files and check for NULL
    FILE *fp = NULL;
    fp = fopen("mainText.txt","r");
    if (!fp)
        return -1;
    FILE *fnew = NULL;
    fnew = fopen("reversedText.txt","w+");
    if (!fnew)
        return -2;

    //Go to the end of the file so we can reverse it
    int i = 1;
    fseek(fp, 0, SEEK_END);
    int endNum = ftell(fp);
    while(i < endNum+1)
    {
        fseek(fp,-i,SEEK_END);
        printf("%c",fgetc(fp));
        fputc(fgetc(fp),fnew);
        i++;
    }

    fclose(fp);
    fclose(fnew);
    fp = NULL;
    fnew = NULL;
    return 0;
}
No errors, I just want identical outputs.
The outputs are different because your loop reads two characters from fp per iteration.
For example, in the first iteration i is 1 and so fseek sets the current file position of fp just before the last byte:
...
These are all palindromes
                        ^
Then printf("%c",fgetc(fp)); reads a byte (s) and prints it to the console. Having read the s, the file position is now
...
These are all palindromes
                         ^
i.e. we're at the end of the file.
Then fputc(fgetc(fp),fnew); attempts to read another byte from fp. This fails and fgetc returns EOF (a negative value, usually -1) instead. However, your code is not prepared for this and blindly treats -1 as a character code. Converted to a byte, -1 corresponds to 255, which is the character code for ÿ in the ISO-8859-1 encoding. This byte is written to your file.
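This conversion is easy to check: fputc() converts its int argument to unsigned char before writing it, so passing EOF stores a single 0xFF byte. A small sketch that writes EOF to a temporary file created with tmpfile() and reads the byte back (byte_written_for_eof is a hypothetical name):

```c
#include <stdio.h>

/* Writes the value EOF with fputc() and returns the byte that actually
   ended up in the file. fputc() converts its argument to unsigned char,
   so with the usual EOF of -1 this returns 255 (0xFF, shown as 'ÿ' in
   ISO-8859-1). Returns -1 if no temporary file could be created. */
int byte_written_for_eof(void)
{
    FILE *tmp = tmpfile();
    int c;

    if (tmp == NULL)
        return -1;
    fputc(EOF, tmp);   /* the same thing fputc(fgetc(fp), fnew) does at EOF */
    rewind(tmp);       /* a positioning call before switching to reading */
    c = fgetc(tmp);
    fclose(tmp);
    return c;
}
```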
In the next iteration of the loop we seek back to the e:
...
These are all palindromes
                       ^
Again the loop reads two characters: e is written to the console, and s is written to the file.
This continues backwards until we reach the beginning of the input file:
redivider
^
Yet again the loop reads two characters: r is written to the console, and e is written to the file.
This ends the loop. The end result is that your output file contains one character that doesn't exist (from the attempt to read past the end of the input file) and never sees the first character.
The fix is to only call fgetc once per loop:
while(i < endNum+1)
{
    fseek(fp,-i,SEEK_END);
    int c = fgetc(fp);
    if (c == EOF) {
        perror("error reading from mainText.txt");
        exit(EXIT_FAILURE);
    }
    printf("%c", c);
    fputc(c, fnew);
    i++;
}
In addition to @melpomene's correction about using only one fgetc() per loop iteration, there are other issues.
fseek(questionable_offset)
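fopen("mainText.txt","r"); opens the file in text mode, not binary mode. For a text stream, the C standard only guarantees fseek() with an offset of 0, or with an offset previously returned by ftell() combined with SEEK_SET, so using fseek() with arbitrary values as offsets into the file is prone to trouble. This is usually not a problem on *nix systems.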
I do not have a simple alternative.
ftell() return type
ftell() returns long. Use long instead of int for i and endNum. (Not a concern with small files.)
Check return values
ftell() and fseek() can fail. Test for error returns.
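Putting both answers together, here is a sketch of the reversal loop that keeps offsets in long, checks fseek(), ftell(), fgetc() and fputc() for errors, and reads exactly one byte per iteration. reverse_file is a hypothetical name, and both streams are assumed to have been opened in binary mode ("rb" and "wb"), so that the absolute offsets passed to fseek() with SEEK_SET are plain byte offsets. (Strictly, the C standard does not require binary streams to support SEEK_END meaningfully, but mainstream implementations do.)

```c
#include <stdio.h>

/* Copies the bytes of `in` to `out` in reverse order. Both streams are
   assumed to be binary. Returns 0 on success, -1 if any positioning or
   I/O call fails. */
int reverse_file(FILE *in, FILE *out)
{
    long len, i;
    int c;

    if (fseek(in, 0, SEEK_END) != 0)
        return -1;
    len = ftell(in);                  /* ftell() returns long, -1L on error */
    if (len < 0)
        return -1;

    for (i = len - 1; i >= 0; i--) {
        if (fseek(in, i, SEEK_SET) != 0)
            return -1;
        c = fgetc(in);                /* the only read in this iteration */
        if (c == EOF)
            return -1;
        if (fputc(c, out) == EOF)
            return -1;
    }
    return 0;
}
```

With fp = fopen("mainText.txt", "rb") and fnew = fopen("reversedText.txt", "wb"), the call would be reverse_file(fp, fnew). Note that in a Windows-style file each "\r\n" pair comes out reversed as "\n\r".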
So we have this file called dictionary1.txt, and it has words with their pronunciation right next to them. What I want to do is get the first word from each line and print it into another .txt file that the program creates from scratch. My code does this, but it also prints random Chinese characters in between the English words, and I don't know why.
Here's what the output file looks like: https://imgur.com/a/pZthP
(In dictionary1.txt, the pronunciations are separated from the actual words by a blank space.)
My code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    char line[100];
    int i = 0;
    FILE* fp1 = fopen("dictionary1.txt", "r");
    FILE* fp2 = fopen("dictionary2.txt", "w");
    if (fp1 == NULL || fp2 == NULL){
        printf("ERROR");
        return -1;
    }
    while (fgets(line, 100, fp1) != NULL){
        while (line[i] != ' '){
            fputc(line[i], fp2);
            i++;
        }
        i=0;
        fputc('\0', fp2);
    }
    return 0;
}
I tried fputc('\n', fp2) as well, but no matter what, I couldn't get onto the next line in the file I created from scratch. I also can't get rid of the random Chinese characters.
EDIT: I figured it out. The .txt file I was working on was saved in Unicode formatting, which didn't work well with my program. I turned it into ANSI and now it works like a charm.
\n is not the right line separator on all operating systems and all editors.
If you are editing your .txt files in Notepad, try fputs("\r\n", fp2);, where \r is a carriage return (the cursor returns to the first character of the line) and \n is a new line.
Generally speaking, Windows uses "\r\n" as the line separator, and a lone '\n' is displayed as something other than a line break, at least in Notepad. Linux and macOS use '\n' instead (classic Mac OS used '\r'). You may also want to try fprintf(fp2, "\n");
Check this out: "\n and \r seem to work everywhere. Why is line.separator more portable?"
If you don't mind using C++, you could try to create an output stream os and write os << endl
Note that in text mode (such as the "w" mode used here), the C library on Windows automatically converts each '\n' written into "\r\n"; in binary mode, and on systems whose line separator is '\n', no conversion takes place.
Also, change the while loop condition to line[i] != ' ' && line[i] != '\0', and close fp2 with fclose().
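Putting these suggestions together, here is a sketch of the copying loop as a function (copy_first_words is a hypothetical name). It assumes a single-byte-compatible encoding such as ANSI or UTF-8, which, as the edit to the question found, is what the input file has to be. It writes a real '\n' instead of '\0' after each word, and additionally stops at '\r' and '\n' so that a line without a space does not produce an extra blank line:

```c
#include <stdio.h>

/* For each line of `in`, writes the characters before the first space to
   `out`, followed by '\n'. Like the original, it assumes every line fits
   in the 100-byte buffer; a longer line would be split across two fgets()
   calls and produce an extra output line. */
void copy_first_words(FILE *in, FILE *out)
{
    char line[100];

    while (fgets(line, sizeof line, in) != NULL) {
        int i = 0;

        /* stop at the separator, the end of the string or the line break */
        while (line[i] != ' ' && line[i] != '\0' &&
               line[i] != '\r' && line[i] != '\n') {
            fputc(line[i], out);
            i++;
        }
        fputc('\n', out);   /* a line break, not the '\0' from the question */
    }
}
```

The caller opens both files (dictionary1.txt with "r" and dictionary2.txt with "w"), calls copy_first_words(fp1, fp2), and then closes both with fclose().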
The .txt file was saved using Unicode formatting. I converted it to ANSI and everything was suddenly fixed.
I am doing a coding exercise and I need to open a data file that contains lots of data. It's a .raw file. Before I build my app, I open the 'card.raw' file in a text editor and in a hex editor. If you open it in TextEdit, you will see 'bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ' as the first line. (The URL points to a Rick Roll, as a joke by the professor.)
So I start building my app to open the same 'card.raw' file. As an initial check, I want the app to print to the console the same "stuff" I see when I open the file in TextEdit. Instead of printing what I see in TextEdit (see the text above), it prints text that looks like this:
\377\304 'u\204\206\226\262\302\3227\205\246\266\342GSc\224\225\245\265\305\306\325\326Wgs\244\346(w\345\362\366\207\264\304ǃ\223\227\2678H\247\250\343\344\365\377\304
Now I have no idea what the '\' and numbers are called (what do I search for to read more?), why it's printing that instead of the characters (unicode?) I see when I open in TextEdit, or if I can convert this output to hex or unicode.
My code is:
#include <stdio.h>
#include <string.h>
#include <limits.h>

int main(int argc, const char * argv[]) {
    FILE* file;
    file = fopen("/Users/jamesgoldstein/CS50/CS50Week4/CS50Recovery/CS50Recovery/CS50Recovery/card.raw", "r");
    char output[LINE_MAX];
    if (file != NULL)
    {
        for (int i = 1; fgets(output, LINE_MAX, file) != NULL; i++)
        {
            printf("%s\n", output);
        }
        fclose(file);   // only close a file that was actually opened
    }
    return 0;
}
UPDATED & SIMPLIFIED CODE USING fread()
#include <stdio.h>
#include <string.h>

int main(int argc, const char * argv[]) {
    FILE* fp = fopen("/Users/jamesgoldstein/CS50/CS50Week4/CS50Recovery/CS50Recovery/CS50Recovery/card.raw", "rb");
    char output[256];
    if (fp == NULL)
    {
        printf("Bad input\n");
        return 1;
    }
    for (int i = 1; fread(output, sizeof(output), 1, fp) != NULL; i++)
    {
        printf("%s\n", output);
    }
    fclose(fp);
    return 0;
}
Output is partially correct (here's a snippet of the beginning):
bit.ly/18gECvy
\377\330\377\340
\221\241\26145\301\321\341 "#&23DE\3616BFRTUe\202CVbdfrtv\222\242
'u\204\206\226\262\302\3227\205\246\266\342GSc\224\225\245\265\305\306\325\326Wgs\244\346(w\345\362\366\207\264\304ǃ\223\227\2678H\247\250\343\344\365\377\304
=\311\345\264\352\354 7\222\315\306\324+\342\364\273\274\205$z\262\313g-\343wl\306\375My:}\242o\210\377
3(\266l\356\307T饢"2\377
\267\212ǑP\2218 \344
Beginning of the actual card.raw file:
bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ
ˇƒÖ
!1AQa$%qÅë°±45¡—· "#&23DEÒ6BFRTUeÇCVbdfrtví¢
I think you should open the .raw file in "rb" mode and then use fread().
From the presence of the string "JFIF" in the first line of the file card.raw ("bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ") it seems like card.raw is a JPEG image format file that had the bit.ly URL inserted at its beginning.
You are going to see weird/special characters in this case because it is not a usual text file at all.
Also, as davmac pointed out, the way you are using fgets isn't appropriate even if you were dealing with an actual text file. When dealing with plain text files in C, the best way is to read the entire file at once instead of line by line, assuming sufficient memory is available:
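/* Inside main(): requires <stdio.h> and <stdlib.h>; fp is the already opened FILE*. */
long f_len;
size_t f_actualread;
char *buffer = NULL;

if (fseek(fp, 0, SEEK_END) != 0 || (f_len = ftell(fp)) < 0)
{
    puts("could not determine the file size");
    return 1;
}
rewind(fp);
buffer = malloc((size_t)f_len + 1);
if(buffer == NULL)
{
    puts("malloc failed");
    return 1;
}
f_actualread = fread(buffer, 1, (size_t)f_len, fp);
buffer[f_actualread] = 0;
printf("%s\n", buffer);
free(buffer);
buffer = NULL;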
This way, you don't need to worry about line lengths or anything like that.
You should probably use fread rather than fgets, since the latter is really designed for reading text files, and this is clearly not a text file.
Your updated code in fact does have the very problem I originally wrote about (but have since retracted), since you are now using fread rather than fgets:
for (int i = 1; fread(output, sizeof(output), 1, fp) != NULL; i++)
{
    printf("%s\n", output);
}
That is, you are printing the output buffer as if it were a null-terminated string, when in fact it is not. It is better to use fwrite to stdout.
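A sketch of that approach (copy_bytes is a hypothetical name). Besides writing with fwrite(), it swaps the fread() arguments to a size of 1 and a count of sizeof chunk: fread() returns the number of complete elements read, as a size_t, so with fread(output, sizeof(output), 1, fp) a final partial chunk is reported as 0 and its bytes are never printed.

```c
#include <stdio.h>

/* Copies everything from `in` to `out` in chunks. fread() returns how many
   bytes it actually read, and fwrite() writes exactly that many, so NUL
   bytes and a short final chunk are handled correctly. Returns the total
   number of bytes copied. */
size_t copy_bytes(FILE *in, FILE *out)
{
    unsigned char chunk[256];
    size_t n, total = 0;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        fwrite(chunk, 1, n, out);
        total += n;
    }
    return total;
}
```

In the question's program the whole loop would become copy_bytes(fp, stdout);.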
However, I think the essence of the problem here is trying to display arbitrary bytes (which don't actually represent a character string) on the terminal. The terminal may interpret some byte sequences as commands that affect what you see. Also, TextEdit may decide that the file is in some character encoding and decode characters accordingly.
Now I have no idea what the '\' and numbers are called (what do I search for to read more?)
They look like octal escape sequences to me.
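Each escape is a backslash followed by up to three octal digits that give one byte value. For example, the escapes \377\330\377\340 in the updated output are the bytes 0xFF 0xD8 0xFF 0xE0: the JPEG start-of-image marker followed by the APP0 (JFIF) marker. A small sketch that decodes one such escape (octal_escape_value is a hypothetical helper):

```c
/* Decodes one escape of the form "\ddd" (a backslash followed by one to
   three octal digits) into the byte value it stands for. Returns -1 if
   `s` does not start with such an escape. */
int octal_escape_value(const char *s)
{
    int value = 0, digits = 0;

    if (s[0] != '\\')
        return -1;
    for (s++; digits < 3 && *s >= '0' && *s <= '7'; s++, digits++)
        value = value * 8 + (*s - '0');   /* shift in one octal digit */
    return digits > 0 ? value : -1;
}
```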
why it's printing that instead of the characters (unicode?)
It's nothing to do with Unicode. Maybe your terminal emulator is deciding that those characters are unprintable and replacing them with escape sequences.
In short, I think that your method (comparing visually what you see in a text editor with what you see on the terminal) is flawed. The code you have to read from the file looks correct; I'd suggest proceeding with the exercise and checking results then, or if you really want to be sure, look at the file using a hex editor, and have your program output the byte values it reads (as numbers) - and compare those with what you see in the hex editor.
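A minimal sketch of that comparison (hex_dump is a hypothetical name): it prints an offset followed by up to 16 byte values per row in hexadecimal, roughly the layout most hex editors use, so the numbers can be compared directly instead of relying on how the terminal renders the bytes.

```c
#include <stdio.h>

/* Prints n bytes from data to out: each row starts with an 8-digit hex
   offset and a colon, followed by up to 16 two-digit hex byte values. */
void hex_dump(const unsigned char *data, size_t n, FILE *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (i % 16 == 0)   /* start a new row with its offset */
            fprintf(out, "%s%08lx:", i > 0 ? "\n" : "", (unsigned long)i);
        fprintf(out, " %02x", data[i]);
    }
    if (n > 0)
        fputc('\n', out);
}
```

For example, after reading n bytes of card.raw (opened with "rb") into an unsigned char buffer with fread(), call hex_dump(buffer, n, stdout).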
I want to replace every line that contains the # symbol in a text file with heet, using C.
I have tried it the way shown below, but it did not fully work: it only overwrites the first characters of the line instead of replacing the whole line, as I want.
Is there any other trick to remove or delete a whole line from the file, so that it can easily be replaced?
myfile.txt: (before execution)
Joy
#Smith
Lee
Sara#
Priyanka
#Addy
Code:
#include <stdio.h>
#include <string.h>

int main() {
    FILE *pFile;
    fpos_t pos1, pos2;
    int line = 0;
    char buf[68];
    char *p;
    char temp[10] = "heet";

    pFile = fopen("myfile.txt", "r+");
    printf("changes are made in this lines:\t");
    while (!feof(pFile)) {
        ++line;
        fgetpos(pFile, &pos1);
        if (fgets(buf, 68, pFile) == NULL)
            break;
        fgetpos(pFile, &pos2);
        p = strchr(buf, '#');
        if (p != NULL) {
            printf("%d, " , line);
            fsetpos(pFile, &pos1);
            fputs(temp, pFile);
        }
        fsetpos(pFile, &pos2);
    }
    fclose(pFile);
    return 0;
}
myfile.txt: (after execution)
Joy
heetth
Lee
heet#
Priyanka
heety
Output:
changes are made in this lines: 2, 4, 6,
myfile.txt: (I want to get)
Joy
heet
Lee
heet
Priyanka
heet
The best way of doing what you want is to use a utility like sed. It is faster and uses less memory than anything you (or I) would write.
That aside, let's assume you want to go ahead and write it yourself anyway.
A file is just like a long array of bytes. If you want to increase or decrease the length of one line, it affects the position of every byte in the rest of the file. The result can be shorter (or longer) than the original. As the result can be shorter, modifying the file in place is a bad idea.
The following pseudo-code illustrates a simple approach:
open original file
open output file
allocate a line buffer that is large enough
read a line from the original file
do
    return an error if the buffer is too small
    manipulate the line
    write the manipulated line to the output file
    read a line from the original file
loop until read returns nothing
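One way to turn this pseudo-code into C is sketched below; it is a sketch, not the answerer's code. replace_hash_lines is a hypothetical name, the 256-byte buffer is an arbitrary choice, and, as the question asks, a line containing # is replaced as a whole by the replacement text:

```c
#include <stdio.h>
#include <string.h>

/* Copies `in` to `out` line by line, writing `replacement` in place of
   every line that contains '#'. Returns the number of lines replaced, or
   -1 if a line does not fit in the buffer. */
int replace_hash_lines(FILE *in, FILE *out, const char *replacement)
{
    char line[256];                 /* "a line buffer that is large enough" */
    int replaced = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int has_newline = len > 0 && line[len - 1] == '\n';

        /* A full buffer without a line break means the line may be longer
           than the buffer ("return an error if the buffer is too small").
           This also rejects an unterminated last line of exactly 255
           characters, which errs on the safe side. */
        if (!has_newline && len == sizeof line - 1)
            return -1;

        if (strchr(line, '#') != NULL) {
            fputs(replacement, out);    /* manipulate the line */
            if (has_newline)
                fputc('\n', out);       /* keep the original line break */
            replaced++;
        } else {
            fputs(line, out);           /* copy other lines unchanged */
        }
    }
    return replaced;
}
```

To update myfile.txt itself, you could call it with the original file opened for reading and a new output file opened for writing, then use remove() and rename() to put the new file in place of the original.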
sed does it much more cleverly. I once saw an explanation of how sed works, but my Google karma can't seem to find it.
Edit:
How to do it using sed:
sed -e 's/.*\#.*/heet/g' myfile.txt
The s, or substitute, command of sed can replace one string, or regular expression, with another string.
The above command is interpreted as:
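replace any line that has a # somewhere in it with heet. The final g flag tells sed to replace every match within a line rather than only the first; since .*\#.* already matches the whole line, it makes no difference here. sed applies the command to every line of the file by default.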
Edit2:
By default, sed writes to standard output.
To rewrite the file you should redirect the output to a file and then rename it.
On Linux, do the following (you can run command-line programs from C with system()):
sed -e 's/.*\#.*/heet/g' myfile.txt > temp_file123.txt
rm myfile.txt
mv temp_file123.txt myfile.txt
From C:
system("sed -e 's/.*\\#.*/heet/g' myfile.txt > temp_file123.txt");
system("rm myfile.txt");
system("mv temp_file123.txt myfile.txt");
If you want to do it with just one call to system, put all the command line stuff in a shell script.
You should probably treat input/output like a UNIX utility and replace the line by reading in the whole input and writing the whole output like sed would or something. It's going to be a pain to edit the line in place as you need to shift the following text 'down' in order to make it work.
You cannot achieve your goal by overwriting the file in place like you do in the code because heet is 3 bytes longer than # and there is no standard function to insert bytes in the middle of a file.
Note also these important issues:
you do not test whether fopen() succeeds in opening the file. You have undefined behavior if the file does not exist or cannot be opened in read/update mode.
while (!feof(pFile)) does not stop exactly at the end of file because the end of file indicator returned by feof() is only set when a read operation fails, not before. You should instead write:
while (fgets(buf, 68, pFile) != NULL) {
if the file has lines longer than 66 characters, the line numbers will be computed incorrectly.
There are 2 ways to replace the text in the file:
you can create a temporary file and write the modified contents to it. Once the contents have all been converted, delete the original file with remove() and rename the temporary file to the original name with rename(). This method uses extra space on the storage device, and requires that you can create a new file and determine a file name that does not conflict with existing file names.
alternatively, you can read the complete contents of the original file and overwrite it with the modified contents from the start. This works because the modified contents are at least as long as the original contents, so no stale bytes remain at the end of the file. This method may fail if the file is too large to fit in memory, which is rather rare today for regular text files.
Here is a modified version using the second approach. Note that it replaces each # character with heet, rather than the whole line that contains it:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    FILE *pFile;
    int c, line, changes;
    unsigned char *buf, *newbuf;
    size_t pos, length, size;
    char replacement[] = "heet";

    /* open the file */
    pFile = fopen("myfile.txt", "r+");
    if (pFile == NULL) {
        printf("cannot open myfile.txt\n");
        return 1;
    }

    /* read the file */
    buf = NULL;
    length = size = 0;
    while ((c = getc(pFile)) != EOF) {
        if (length == size) {
            size = size + size / 2 + 128;
            newbuf = realloc(buf, size);   /* keep buf valid if realloc fails */
            if (newbuf == NULL) {
                printf("not enough memory to read myfile.txt\n");
                free(buf);
                fclose(pFile);
                return 1;
            }
            buf = newbuf;
        }
        buf[length++] = c;
    }

    /* write the modified contents */
    rewind(pFile);
    line = 1;
    changes = 0;
    for (pos = 0; pos < length; pos++) {
        c = buf[pos];
        if (c == '\n')
            line++;
        if (c == '#') {
            if (changes++ == 0)
                printf("changes are made in this lines:\t");
            else
                printf(", ");
            printf("%d", line);
            fputs(replacement, pFile);
        } else {
            putc(c, pFile);
        }
    }
    free(buf);
    fclose(pFile);

    if (changes == 0)
        printf("no changes were made\n");
    else
        printf("\n");
    return 0;
}
To overwrite a word in a file using fwrite() or any other output function, use fgetpos() and fsetpos(); otherwise, seeking the file pointer alone will not work. This also works when the file pointer is at the end of the file, which means appending is possible.
I wrote this C code to test whether fwrite could update some values in a text file. I tested it on Linux and it works fine. On Windows (Vista, 32-bit), however, it simply does not work: the file remains unchanged after I write a different byte using: cont = fwrite(&newfield, sizeof(char), 1, fp);
The registers are written to the file using a "#" separator, in the format:
Reg1FirstField#Reg1SecondField#Reg2FirstField#Reg2SecondField...
The final file should be: First#1#Second#9#Third#1#
I also tried putc and fprintf, all with no result. Can someone please help me with this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct test {
    char field1[20];
    char field2;
} TEST;

int main(void) {
    FILE *fp;
    TEST reg, regread;
    char regwrite[22];
    int i, cont, charwritten;

    fp=fopen("testupdate.txt","w+");

    strcpy(reg.field1,"First");
    reg.field2 = '1';
    sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
    cont = (int)strlen(regwrite);
    charwritten = fwrite(regwrite,cont,1,fp);
    fflush(fp);

    strcpy(reg.field1,"Second");
    reg.field2 = '1';
    sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
    cont = (int)strlen(regwrite);
    charwritten = fwrite(regwrite,cont,1,fp);
    fflush(fp);

    strcpy(reg.field1,"Third");
    reg.field2 = '1';
    sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
    cont = (int)strlen(regwrite);
    charwritten = fwrite(regwrite,cont,1,fp);
    fflush(fp);

    fclose(fp);

    // open file to update
    fp=fopen("testupdate.txt","r+");
    printf("\nUpdate field 2 on the second register:\n");
    char aux[22];

    // search for second register and update field 2
    for (i = 0; i < 3; i ++) {
        fscanf(fp,"%22[^#]#", aux);
        printf("%d-1: %s\n", i, aux);
        if (strcmp(aux, "Second") == 0) {
            char newfield = '9';
            cont = fwrite(&newfield, sizeof(char), 1, fp);
            printf("written: %d bytes, char: %c\n", cont, newfield);
            // goes back one byte in order to read properly
            // on the next fscanf
            fseek(fp,-1,SEEK_CUR);
        }
        fscanf(fp,"%22[^#]#", aux);
        printf("%d-2: %s\n",i, aux);
        aux[0] = '\0';
    }
    fflush(fp);
    fclose(fp);

    // open file to see if the update was made
    fp=fopen("testupdate.txt","r");
    for (i = 0; i < 3; i ++) {
        fscanf(fp,"%22[^#]#", aux);
        printf("%d-1: %s\n", i, aux);
        fscanf(fp,"%22[^#]#",aux);
        printf("%d-2: %s\n",i, aux);
        aux[0] = '\0';
    }
    fclose(fp);
    getchar();
    return 0;
}
You're missing a file positioning function between the read and write. The Standard says:
7.19.5.3/6
When a file is opened with update mode, both input and output may be performed on the associated stream. However, ... input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. ...
for (i = 0; i < 3; i ++) {
    fscanf(fp,"%22[^#]#", aux); /* read */
    printf("%d-1: %s\n", i, aux);
    if (strcmp(aux, "Second") == 0) {
        char newfield = '9';
        /* added a file positioning function */
        fseek(fp, 0, SEEK_CUR); /* don't move */
        cont = fwrite(&newfield, sizeof(char), 1, fp); /* write */
I didn't know it but here they explain it:
why fseek or fflush is always required between reading and writing in the read/write "+" modes
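Conclusion: when you use a "+" mode, you must call a file positioning function (fseek, fsetpos or rewind) before any write that directly follows a read. fflush is only enough in the other direction, before a read that directly follows a write.
fseek(fp, 0, SEEK_CUR);   // reposition without moving, before switching to writing
cont = fwrite(&newfield, sizeof(char), 1, fp);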
Fix verified on Cygwin.
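For reference, a sketch of the whole update as a function (update_record is a hypothetical name). It writes with fputc() rather than fwrite(), and uses a scan width of %21 so that the 22-byte buffer also has room for the terminating '\0'. The stream is assumed to be opened in update mode, e.g. with "r+":

```c
#include <stdio.h>
#include <string.h>

/* Scans "name#value#" records and overwrites the one-character value of
   the record called `name`. A positioning call separates every switch
   from reading to writing, as C99 7.19.5.3/6 requires. Returns 0 if the
   record was updated, -1 if it was not found or an I/O call failed. */
int update_record(FILE *fp, const char *name, char newvalue)
{
    char field[22];

    rewind(fp);
    while (fscanf(fp, "%21[^#]#", field) == 1) {
        if (strcmp(field, name) == 0) {
            /* input -> output: a file positioning call is required */
            if (fseek(fp, 0, SEEK_CUR) != 0 || fputc(newvalue, fp) == EOF)
                return -1;
            /* output -> input: flushing makes a later read safe */
            return fflush(fp) == 0 ? 0 : -1;
        }
        /* skip this record's value field */
        if (fscanf(fp, "%21[^#]#", field) != 1)
            break;
    }
    return -1;
}
```

Calling update_record(fp, "Second", '9') on the question's file should turn First#1#Second#1#Third#1# into First#1#Second#9#Third#1#.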
You're not checking any return values for errors. I'm guessing the file is read-only and is not even opening properly.
At least here on OS X, your value 9 is being appended to the end of the file, so you're not updating the actual register value for Second at its position in the file. For some reason, after the scan for the appropriate point to modify the values, your stream pointer is actually at the end of the file. For instance, compiling and running your code on OS X produced the following in the actual text file:
First#1#Second#1#Third#1#9
Your initial read-back works because the data is being written, but at the end of the file. So when you write the value, back up the stream and re-read the value, that works, but the value is not being written where you assume.
Update: I've added some calls to ftell to see what's happening to the stream pointer, and it seems that your calls to fscanf are working as you'd assume, but the call to fwrite is jumping to the end of the file. Here's the modified output:
Update field 2 on the second register:
**Stream position: 0
0-1: First
0-2: 1
**Stream position: 8
1-1: Second
**Stream position before write: 15
**Stream position after write: 26
written: 1 bytes, char: 9
1-2: 9
**Stream position after read-back: 26
Update 2: It seems that by simply saving the position of the stream pointer and then setting it again, the call to fwrite worked without skipping to the end of the file. So I added:
fpos_t position;
fgetpos(fp, &position);
fsetpos(fp, &position);
right before the call to fwrite. Again, this is on OSX, you may see something different on Windows.
With this:
fp=fopen("testupdate.txt","w+");
                            ^------ Notice the + sign
The plus sign opens the file in update mode (reading and writing), not in append mode; append mode is "a" or "a+", where every write goes to the end of the file regardless of the file position. In update mode, fwrite() writes at the current file position, but only if a file positioning call separates it from a preceding read.
"r+" is therefore the right mode for updating an existing file in place.
These and other subtleties of fopen() are why I prefer to use the POSIX-defined open().
To fix your particular case, keep the + in the fopen() modes, add an fseek() call between the read and the write, and consider specifying binary mode on Windows ("w+b" and "r+b").