So here is the code that I am attempting to get to work:
char* inFile = "input.txt";
FILE *out = fopen("output.txt", "a+");
int i = 0;
while(i < 5){
int countFound = findWord(inFile, keyWord[i]);//returns count of keywords in given file
fprintf(out, "%s: %d\n", keyWord[i], countFound);
i++;
}
fclose(out);
The output of this code is:
youshouldsee1
: 3
youshouldsee2
: 3
youshouldsee3
: 3
youshouldsee4
: 3
youshouldsee5: 1
Expected output:
youshouldsee1: 3
youshouldsee2: 3
youshouldsee3: 3
youshouldsee4: 3
youshouldsee5: 1
I don't really understand why the output looks like that. Shouldn't it print the string, then the int, then a newline? Also note that there is no newline after the last line, and there should be. While testing, I noticed that if I change the fprintf statement to fprintf(out, "%s\n", keyWord[i]); the output is:
youshouldsee1
youshouldsee2
youshouldsee3
youshouldsee4
youshouldsee5
Which is formatted much better. Again note that there is not a newline after the last line and there should be.
I noticed that while doing this with just printf statements I get the exact same problem, but the output is slightly more messed up.
Does anybody know what causes this specific issue? Much appreciated.
The array keyWord[] is a double pointer. I'm not sure whether that makes a difference, but I thought I would mention it. It is declared as char** keyWord; and each entry was created as follows:
char *tempWord = "something";
keyWord[x] = strdup(tempWord);
That could be totally irrelevant, but I thought it was best to mention it.
You likely have some carriage return characters ('\r' or 0x0D) and/or backspace characters ('\b' or 0x08) in your keyWord[i] strings.
CRs and backspaces don't ordinarily print anything on a terminal; instead they move the terminal cursor. A CR moves the cursor to the beginning of the current line, and a backspace moves it back one character. When subsequent characters are printed, they overwrite what was there before. So this code
printf("foobar\rbaz\n");
results in this output
bazbar
and this code
printf("foo\bbar\n");
results in this output
fobar
If you look at the raw bytes of the output without printing it to the terminal, you should see the CRs and backspaces plain as day. I like to use the program hexdump(1) to do that. For example:
$ echo -e 'foobar\rbaz'
bazbar
$ echo -e 'foobar\rbaz' | hexdump -C
00000000 66 6f 6f 62 61 72 0d 62 61 7a 0a |foobar.baz.|
0000000b
So I'd suggest looking at the raw data in your program's output to find out where the pesky characters are, and then figuring out how to get rid of them.
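As a minimal sketch of the "get rid of them" step: assuming the stray bytes turn out to be trailing carriage returns (for example, left over from reading a file with CRLF line endings, which the question does not confirm), each keyword could be cleaned before it is stored. The name strip_line_ending is hypothetical, not something from the original code.

```c
#include <stddef.h>

/* Hypothetical helper: removes any trailing '\r' or '\n' characters from s
   in place and returns the new length. Written without <string.h> so it
   has no dependencies beyond size_t. */
size_t strip_line_ending(char *s)
{
    size_t len = 0;

    /* Find the end of the string. */
    while (s[len] != '\0')
        len++;

    /* Walk backwards over CR and LF bytes, overwriting them with '\0'. */
    while (len > 0 && (s[len - 1] == '\r' || s[len - 1] == '\n'))
        s[--len] = '\0';

    return len;
}
```

Calling it on each string before the strdup() (or on the copy afterwards) would leave "youshouldsee1" without the CR that sends the cursor back to the start of the line.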
In the simple code below, I'm writing an int number (10) into a file and then reading it back to make sure it's done successfully and it is. However, when I open the file (tried both notepad++ and vscode) I see something like this:
???
Here's the code:
#include <stdio.h>
int main(){
int var = 10;
FILE* fp = fopen("testfile","w");
rewind(fp);
fwrite(&var,sizeof(int),1,fp);
fflush(fp);
fclose(fp);
int var2 = 0;
fp = fopen("testfile","r+");//reassign fp: the old stream was closed above
fread(&var2,sizeof(int),1,fp);
fclose(fp);
printf("num: %d\n",var2);
return 0;
}
Of course, I thought maybe it's written in a special format that vscode is unable to recognize. But I recently learned to code a simple database that saved its records to files in just the same way, and when I opened its output file with vscode, it showed both ???s and the information. Here it shows only ???s without the information. Although this seems to be a very basic problem, I can't find the answer to it: how is 10 really stored in that file? Thanks in advance.
When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the files contains:
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
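The same bytes can be inspected from inside a program. This is only a sketch: dump_int_bytes and is_little_endian are hypothetical names, and the output depends on the machine's int size and byte order.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints each byte of value in memory order, which is
   the order fwrite would put them in the file. Returns the byte count. */
size_t dump_int_bytes(int value)
{
    const unsigned char *p = (const unsigned char *)&value;

    for (size_t i = 0; i < sizeof value; i++)
        printf("%02x ", (unsigned)p[i]);
    printf("\n");

    return sizeof value;
}

/* Returns 1 if this machine stores the least significant byte first. */
int is_little_endian(void)
{
    int one = 1;
    return *(const unsigned char *)&one == 1;
}
```

On a machine like the one in the od listing (4-byte int, little-endian), dump_int_bytes(10) should show the 0a byte first, followed by zero bytes.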
Had you instead used fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush@db-centos7 ~]$ cat testfile
10
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.
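For completeness, here is a rough sketch of the text-format round trip, writing with fprintf and reading back with fscanf. The function names and the idea of passing the path as a parameter are assumptions for illustration; they are not part of the original program.

```c
#include <stdio.h>

/* Writes value as decimal text followed by a newline. Returns 0 on success. */
int write_int_as_text(const char *path, int value)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%d\n", value) > 0;
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Reads a decimal integer back from a text file. Returns 0 on success. */
int read_int_as_text(const char *path, int *out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = fscanf(fp, "%d", out) == 1;
    fclose(fp);

    return ok ? 0 : -1;
}
```

A file written this way opens as readable text in any editor, at the cost of the value no longer having a fixed size on disk.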
You are writing a binary file, so a text editor cannot display it meaningfully. The value 10 is probably stored as the bytes 00 00 00 0A or 0A 00 00 00, depending on whether the system is big-endian or little-endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as three NUL characters and one LF (line feed) character, with the LF first on a little-endian system.
I created a file like this
a v
bb
e
And I didn't press enter after typing e in the last line.
So there are four characters in the first line 'a',' ','v','\n'.
There are three characters in the second line 'b','b','\n'.
And there is one character in the last line 'e'.
So there are 8 characters in this file in total. But when I count the characters using the following C program:
#include<stdio.h>
/* count characters in input; 1st version */
int main()
{
long nc;
nc = 0;
while (getchar() != EOF) {
++nc;
}
printf("%ld\n", nc);
return 0;
}
It gave me 9. Even when I use wc command to count, it is still 9. Why?
There's reasoning in favor of having all lines terminated with a newline character:
Why should text files end with a newline?
And there are text editors that are set up to add a trailing newline character automatically (if not already there):
How to stop Gedit, Gvim, Vim, Nano from adding End-of-File newline char?
That is probably why you see the unexpected character count.
To inspect the actual content of such files, I like to use hexdump:
$ hexdump -C test
00000000 61 20 76 0a 62 62 0a 65 0a |a v.bb.e.|
00000009
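If hexdump is not at hand, a small variation on the counting loop can report the last byte as well, which shows whether the editor added a trailing newline. This is a sketch only; count_bytes is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in stream and stores the last byte
   read in *last (or EOF if the stream was empty). */
long count_bytes(FILE *stream, int *last)
{
    long nc = 0;
    int c;

    *last = EOF;
    while ((c = getc(stream)) != EOF) {
        *last = c;   /* remember the most recent byte */
        ++nc;
    }

    return nc;
}
```

Calling count_bytes(stdin, &last) and then checking last == '\n' tells you whether the ninth character is the newline the editor appended.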
Hello, I need some help. I want to read from stdin 16 bytes at a time and convert every byte into hexadecimal form. Is there a way I can use the read() function to read not from the beginning but, for example, from the second byte? Also, how can I know when I have read the whole of stdin? That way I could call this function in a loop until I have read everything.
This is a function I made:
void getHexLine()
{
int n = 16;
char buffer[n + 1];//one extra byte for the '\0' written below
read(STDIN_FILENO, buffer, n);
buffer[n]='\0';
//printf("%08x", 0); hex number of first byte on line - not working yet
putchar(' ');
putchar(' ');
//converting every byte into hexadecimal
for (int i = 0;i < 16;i++ )
{
printf("%x", buffer[i]);
putchar(' ');
if (i == 7 || i == 15)
putchar(' ');
}
printf("|%s|\n", buffer);
}
The output should be like this but with an option to start from second byte for example.
[vcurda#localhost proj1]$ echo "Hello, world! This is my program." | ./proj1
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 20 54 68 |Hello, world! Th|
69 73 20 69 73 20 6d 79 20 70 72 6f 67 72 61 6d |is is my program|
This is a school project, so I can't use malloc, scanf, or <string.h>. I would be really glad to get some help, and sorry for my not very understandable English.
stdin is not seekable in general. You can read bytes in, but you can't rewind or fast-forward. EOF (-1) means end of input in stdin as with a regular file, but it's a bit of a looser concept if you are conducting an interactive dialogue with the user.
Basically, interactive stdin is line oriented, and it's best to use this pattern: printf() a prompt, read a whole line from the user, printf() the results if applicable along with another prompt, read another whole line, and so on, at least until you get used to programming with stdin.
To start from the second byte then becomes easy. Read in the whole line, then start from i = 1 instead of i = 0 as you parse it.
Is there a way I can use read() function to read NOT from the beginning, but for example from the second byte?
Most universally, you can simply ignore the read-in bytes you're not interested in.
Sometimes you will be able to lseek, e.g. if you run your program with a regular file set to its STDIN as in:
./a.out < /etc/passwd
but lseek will fail on STDINs that are terminals, pipes, character devices, or sockets.
how can I know if I have read the whole stdin?
read will return 0 at the end of the file.
Consult the manual pages for more information.
Generally, you should check your return codes and account for short reads. Your function should probably return an int so that it has a way to communicate a possible IO error.
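To make the short-read and end-of-input handling concrete, here is one possible sketch. read_full is a hypothetical helper name; it uses only read() and errno, so it stays within the project's restrictions (no malloc, scanf, or <string.h>).

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until want bytes have arrived,
   end of input is reached, or an error occurs. Returns the number of bytes
   stored in buf (less than want only at end of input), or -1 on error. */
ssize_t read_full(int fd, unsigned char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n == 0)
            break;              /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)n;       /* short read: loop again for the rest */
    }

    return (ssize_t)got;
}
```

A caller could loop on read_full(STDIN_FILENO, buf, 16) and stop when it returns 0, printing only as many bytes as were returned on the last, possibly shorter, line. Skipping the first byte then amounts to one read_full(STDIN_FILENO, buf, 1) whose result is thrown away before the loop starts.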
I am working on reading a 2D pixel array from a PPM file; the one I am testing with has a width and height of 5. I am aware that an RGB PPM file has three color values per pixel, not just one. I had forgotten that before I wrote this code, but the problem persists in the same way even after updating the code for it. I have simplified the problem down to just the array in order to isolate it. From what I can tell, this seems to be both dropping characters and replacing some with newline characters. Any insight into why this is happening would be greatly appreciated, and if I forgot to add something I will update this as soon as I am aware.
#include <stdio.h>
int main(int args, char *argv[]) {
int w = 5, h = 5;
FILE *f = fopen(argv[1], "rb");
int c = 'a';//I am setting this so as to avoid the off chance of c being defined as EOF
for(int i = 0; i < h && c != EOF; i++) {
for(int j = 0; j < w && (c = fgetc(f)) != EOF; j++) printf("%c", c);
fgetc(f);//To remove the '\n' character I am not using fgets because it stops at '\n' character and it is possible for a rgb value to be == to '\n'
printf("\n");
}
fclose(f);
return 0;
}
Test File I am using:
12345
abcde
12345
abcde
12345
Output I am getting:
12345
abcd
123
5
ab
de
1
Thanks in advance!
Edit: This is running on the windows 10 command prompt
The problem is that '\n' on a Windows machine actually ends up producing two characters, a carriage return (ASCII code 13) and a line feed (ASCII code 10). When you open a file in binary mode, those line endings are not translated back to a single character. You're only accounting for one of these characters, so you're getting off by a character on each line you read.
To illustrate this, replace your printf("%c", c); with printf("%d ", c);. I get the following output:
49 50 51 52 53
10 97 98 99 100
13 10 49 50 51
53 13 10 97 98
100 101 13 10 49
You can see those 10s and 13s shifting through.
Now try adding a second fgetc(f); to eat the line feed and it will work much better. Keep in mind, however, that this only works on files with CRLF line endings. Port it to Linux or Mac, where lines end in a single LF, and the second fgetc(f); will eat a data character instead.
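One way to cope with both conventions is to replace the fixed number of fgetc calls with a helper that accepts either ending. This is a sketch under the assumption that the input is a text test file like the one in the question; in real binary pixel data a byte with value 13 or 10 may be legitimate data, so it would not be safe there. The name skip_line_ending is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: consumes one line ending from f, accepting "\r\n",
   "\n", or a lone "\r". Any other byte is pushed back so no data is lost.
   Returns 1 if a line ending was consumed, 0 otherwise. */
int skip_line_ending(FILE *f)
{
    int c = fgetc(f);

    if (c == '\r') {
        int next = fgetc(f);
        if (next != '\n' && next != EOF)
            ungetc(next, f);    /* lone CR: keep the byte that followed it */
        return 1;
    }
    if (c == '\n')
        return 1;

    if (c != EOF)
        ungetc(c, f);           /* not a line ending: put it back */
    return 0;
}
```

In the question's loop, the bare fgetc(f); after the inner for loop would become skip_line_ending(f);.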
I was experimenting on creating BMP files from scratch, when I found a weird bug I could not explain. I isolated the bug in this minimalist program:
#include <stdio.h>
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows the end-of-line sequence is CR LF (0D 0A), whereas on Unix it is a single LF (0A). Because the file is open in text mode, the C runtime replaces each 0A you write with 0D 0A.
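A small sketch of the binary-mode version, with the error checks the test program left out. write_binary is a hypothetical helper name, and taking the path and data as parameters is an assumption made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes n bytes to path in binary mode and returns the
   file position afterwards, or -1 on failure. With "wb" no newline
   translation happens, so the position should equal n. */
long write_binary(const char *path, const unsigned char *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    long pos = -1;
    if (fwrite(data, 1, n, fp) == n)
        pos = ftell(fp);

    if (fclose(fp) != 0)
        return -1;

    return pos;
}
```

With the bytes {10, 11, 10} from the question, this should report a position of 3 rather than 5, and a hex dump of the file should show 0A 0B 0A.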