Copy file in C doesn't seem to work completely - c

For my programming course I have to make a program that copies a file.
This program asks for the following:
an input file in the command prompt
a name for the output file
The files required to copy are .WAV audio files. I tried this with an audio sample of 3 seconds.
The thing is that I do get a file back, but it is empty. I have added the fclose and fopen statements.
while((ch = fgetc(input)) != EOF)
{
    fputc(ch, output);
}
I hope someone can point out where I probably made some beginner's mistake.

The little while loop you show should in principle work if all prerequisites are met:
The files could be opened.
If on a Microsoft operating system, the files were opened in binary mode (see below).
ch is an int.
In other words, all problems you have are outside this code.
Binary mode: The CR-LF issue
There is a post explaining possible reasons for using a carriage return/linefeed combination; in the end, it is the natural thing to do, given that with typewriters, and by association with teletypes, the two are distinct operations: You move the large lever on the carriage to rotate the platen roller or cylinder a specified number of degrees so that the next line would not print over the previous one; that's the aptly named line feed. Only then, with the same lever, you move the carriage so that the horizontal print position is at the beginning of the line. That's the aptly named carriage return. The order of events is only a technicality.
DOS C implementations tried to be smart: A C program ported from Unix might produce text with only newlines in it; the output routines would transparently add the carriage return so that it would follow the DOS conventions and print properly. Correspondingly, CR/LF combinations in an input file would be silently converted to only LF when read by the standard library implementations.
The DOS file convention also uses CTRL-Z (26) as an end-of-file marker. Again, this could be a useful hint to a printer that all data for the current job had been received.
Unfortunately, these conventions were made the default behavior, and today are typically a nuisance: Nobody sends plain text to a printer any longer (apart from the three people who will comment under this post that they still do that).
It is a nuisance because for files that are not plain text silent data changes are catastrophic and must be suppressed, with a b "flag" indicating "binary" data passed in the fopen mode argument: To faithfully read you must specify fopen(filename, "rb"), and in order to faithfully write you must specify fopen(filename, "wb").
Empty file !?
When I tried copying a wave file without the binary flags the data was changed in the described fashion, and the copy stopped before the first byte with the value 26 (CTRL-Z) in the source. In other words, while the copy was corrupt, it was not empty. By the way, all wave files start with the bytes RIFF, so that no CTRL-Z can be encountered in the first position.
There are a number of possibilities for an empty target file, the most likely of which:
You didn't emit or missed an error message regarding opening the files (does your editor keep a lock on the output?), and the program crashed silently when one of the file pointers was null. Note that error messages may fail to be printed when you make error output on standard out: That stream is buffered, and buffered output may be lost in a crash. By contrast, output to stderr is unbuffered exactly to prevent message loss.
You are looking at the wrong output file. This kind of error is surprisingly common. You could perform a sanity check by deleting the file you are looking at, or by printing something manually before you start copying.
Generally, check the return value of every operation (including your fputc!).
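As a minimal sketch of the points above (binary mode, an int for the character, and a check on every call), a copy routine could look like the following. The name copy_file and the 0 / -1 return convention are choices made for this illustration, not something from the question.

```c
#include <stdio.h>

/* Copies src to dst byte for byte. Returns 0 on success, -1 on failure.
 * The name copy_file and the -1 convention are choices made for this sketch. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");          /* "b": no CR-LF or CTRL-Z translation */
    if (in == NULL) {
        perror(src);                      /* perror writes to stderr */
        return -1;
    }

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        perror(dst);
        fclose(in);
        return -1;
    }

    int ch;                               /* int, so EOF is distinct from byte 0xFF */
    int status = 0;
    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF) {      /* write error, e.g. disk full */
            perror(dst);
            status = -1;
            break;
        }
    }
    if (ferror(in)) {                     /* EOF was a read error, not end of file */
        perror(src);
        status = -1;
    }

    fclose(in);
    if (fclose(out) == EOF) {             /* flushing the last buffer can fail too */
        perror(dst);
        status = -1;
    }
    return status;
}
```

A caller would pass the two file names taken from the command prompt, e.g. copy_file("sample.wav", "copy.wav"), and treat a nonzero result as failure.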

Related

What does a file pointer point to in C?

I am trying to understand input and output files in C. In the beginning, when we want to open a file to read, we declare a file pointer as follows:
FILE *fptr1 = fopen( "filename", "r");
I understand that FILE is a data structure in the stdio.h library and that it contains information about the file. I also know that the fopen() function returns a FILE structure. But is that the purpose of the pointer? Does it just point to a bunch of information about the file? I've been reading into this and I have heard the term "file streams" floating around a bit. I understand that it is an interface of communication with the file (I find it vague, but I'll take it). Is that what the pointer points to in simple terms - a file stream? In the above code example, would the pointer be pointing to an input file stream?
Thank you!
The FILE structure is intended to be opaque. In other words, you are not supposed to look into it if you want your programs to remain portable.
Further, FILE is always used through a pointer, so you don't even need to know its size.
In a way, you can consider it a void * for all intents and purposes.
Now, if you are really interested in what the FILE type may hold, the C standard itself explains it quite well! See C11 7.21.1p2:
(...) FILE which is an object type capable of recording all the information needed to control a stream, including its file position indicator, a pointer to its associated buffer (if any), an error indicator that records whether a read/write error has occurred, and an end-of-file indicator that records whether the end of the file has been reached; (...)
So as you see, at least it contains stuff like:
The position inside the file
A pointer to a buffer
Error flags
EOF flag
It mentions (as you do) streams. You can find some more details about it in section 7.21.2 Streams:
Input and output, whether to or from physical devices such as terminals and tape drives, or whether to or from files supported on structured storage devices, are mapped into logical data streams, whose properties are more uniform than their various inputs and outputs. Two forms of mapping are supported, for text streams and for binary streams.
(...)
A binary stream is an ordered sequence of characters that can transparently record internal data. (...)
As we can read, a stream is an ordered sequence of characters. Note that it does not say whether this sequence is finite or not! (More on that later)
So, how do they relate to files? Let's see section 7.21.3 Files:
A stream is associated with an external file (which may be a physical device) by opening a file, which may involve creating a new file. Creating an existing file causes its former contents to be discarded, if necessary. If a file can support positioning requests (such as a disk file, as opposed to a terminal), then a file position indicator associated with the stream is positioned at the start (character number zero) of the file, unless the file is opened with append mode in which case it is implementation-defined whether the file position indicator is initially positioned at the beginning or the end of the file. The file position indicator is maintained by subsequent reads, writes, and positioning requests, to facilitate an orderly progression through the file.
(...)
See, when you open a "disk file" (the typical file in your computer), you are associating a "stream" (finite, in this case) which you can open/read/write/close/... through fread() and related functions; and the data structure that holds all the required information about it is FILE.
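To make the pieces of state quoted from the standard a little more concrete, here is a small sketch that only uses the public accessors (ftell, feof, ferror) and never looks inside FILE. The function name stream_demo is made up for this example, and tmpfile() is assumed to be usable in the current environment.

```c
#include <stdio.h>

/* Writes text to a temporary stream, then reads it back, reporting what the
 * stream's indicators say along the way. Returns the number of bytes read
 * back, or -1 if the temporary file could not be created. */
long stream_demo(const char *text)
{
    FILE *fp = tmpfile();                 /* binary update stream, "wb+" */
    if (fp == NULL)
        return -1;

    fputs(text, fp);
    printf("position after writing: %ld\n", ftell(fp));

    rewind(fp);                           /* position back to 0, indicators cleared */
    printf("position after rewind:  %ld\n", ftell(fp));

    long count = 0;
    while (fgetc(fp) != EOF)
        count++;

    /* fgetc returned EOF: the indicators tell end-of-file from an error. */
    printf("feof: %d, ferror: %d\n", feof(fp) != 0, ferror(fp) != 0);

    fclose(fp);
    return count;
}
```

The program only ever holds the FILE pointer; the position and the two indicators are read through library calls, which is what keeps the structure opaque.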
However, there are other kinds of files. Imagine a pseudo-random number generator. You can conceptualize it as an infinite read-only file: every time you read it gives you a different value and it never "ends". Therefore, this file would have an infinite stream associated with it. And some operations may not make sense with it (e.g. maybe you cannot seek it, i.e. move the file position indicator).
This only serves as a quick introduction, but as you can see, the FILE structure is an abstraction over the concept of a file. If you want to learn more about this kind of thing, the best you can do is reach for a good book on Operating Systems, e.g. Modern Operating Systems from Tanenbaum. This book also refers to C, so even better.

What really is EOF for binary files? Condition? Character?

I have managed this far with the knowledge that EOF is a special character inserted automatically at the end of a text file to indicate its end. But I now feel the need for some more clarification on this. I checked on Google and the Wikipedia page for EOF but they couldn't answer the following, and there are no exact Stack Overflow links for this either. So please help me on this:
My book says that binary mode files keep track of the end of file from the number of characters present in the directory entry of the file. (In contrast to text files which have a special EOF character to mark the end). So what is the story of EOF in context of binary files? I am confused because in the following program I successfully use !=EOF comparison while reading from an .exe file in binary mode:
#include<stdio.h>
#include<stdlib.h>
int main()
{
    int ch;
    FILE *fp1,*fp2;
    fp1=fopen("source.exe","rb");
    fp2=fopen("dest.exe","wb");
    if(fp1==NULL||fp2==NULL)
    {
        printf("Error opening files");
        exit(-1);
    }
    while((ch=getc(fp1))!=EOF)
        putc(ch,fp2);
    fclose(fp1);
    fclose(fp2);
}
Is EOF a special "character" at all? Or is it a condition as Wikipedia says, a condition where the computer knows when to return a particular value like -1 (EOF on my computer)? Example of such "condition" being when a character-reading function finishes reading all characters present, or when character/string I/O functions encounter an error in reading/writing?
Interestingly, the Stack Overflow tag for EOF blended both those definitions of the EOF. The tag for EOF said "In programming realm, EOF is a sequence of byte (or a chacracter) which indicates that there are no more contents after this.", while it also said in the "about" section that "End of file (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream."
But I have a strong feeling EOF won't be a character as every other function seems to be returning it when it encounters an error during I/O.
It will be really nice of you if you can clear the matter for me.
The various EOF indicators that C provides to you do not necessarily have anything to do with how the file system marks the end of a file.
Most modern file systems know the length of a file because they record it somewhere, separately from the contents of the file. The routines that read the file keep track of where you are reading and they stop when you reach the end. The C library routines generate an EOF value to return to you; they are not returning a value that is actually in the file.
Note that the EOF returned by C library routines is not actually a character. The C library routines generally return an int, and that int is either a character value or an EOF. E.g., in one implementation, the characters might have values from 0 to 255, and EOF might have the value −1. When the library routine encountered the end of the file, it did not actually see a −1 character, because there is no such character. Instead, it was told by the underlying system routine that the end of file had been reached, and it responded by returning −1 to you.
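The following sketch illustrates why the return value has to be kept in an int. Both function names are invented for this example; the second function is deliberately wrong, and its exact behaviour depends on whether plain char is signed on the platform and on EOF being -1, both of which are assumptions.

```c
#include <stdio.h>

/* Counts the bytes remaining in fp. The result of fgetc is kept in an int so
 * that the byte value 255 (0xFF) and EOF remain different values. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                      /* not char: see the comment above */
    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* The same loop with the result narrowed to char first. Where plain char is
 * signed, the byte 0xFF becomes -1, which compares equal to a -1 EOF and
 * ends the loop early; where char is unsigned, the loop never sees EOF.
 * The limit parameter only exists to stop the loop in the second case. */
long count_bytes_narrowed(FILE *fp, long limit)
{
    long n = 0;
    char ch;
    while (n < limit && (ch = (char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```

On a file whose second byte is 0xFF, count_bytes reports the full length, while count_bytes_narrowed typically stops after one byte on platforms with a signed char.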
Old and crude file systems might have a value in the file that marks the end of file. For various reasons, this is usually undesirable. In its simplest implementation, it makes it impossible to store arbitrary data in the file, because you cannot store the end-of-file marker as data. One could, however, have an implementation in which the raw data in the file contains something that indicates the end of file, but data is transformed when reading or writing so that arbitrary data can be stored. (E.g., by “quoting” the end-of-file marker.)
In certain cases, things like end-of-file markers also appear in streams. This is common when reading from the terminal (or a pseudo-terminal or terminal-like device). On Windows, pressing control-Z is an indication that the user is done entering input, and it is treated similarly to reaching an end-of-file. This does not mean that control-Z is an EOF. The software reading from the terminal sees control-Z, treats it as end-of-file, and returns end-of-file indications, which are likely different from control-Z. On Unix, control-D is commonly a similar sentinel marking the end of input.
This should clear it up nicely for you.
Basically, EOF is just a macro with a pre-defined value representing the error code from I/O functions indicating that there is no more data to be read.
The file doesn't actually contain an EOF. EOF isn't a character of sorts - remember a byte can be between 0 and 255, so it wouldn't make sense if a file could contain a -1. The EOF is a signal from the operating system that you're using, which indicates the end of the file has been reached. Notice how getc() returns an int - that is so it can return that -1 to tell you the stream has reached the end of the file.
The EOF signal is treated the same for binary and text files - the actual definition of binary and text stream varies between the OSes (for example on *nix binary and text mode are the same thing.) Either way, as stated above, it is not part of the file itself. The OS passes it to getc() to tell the program that the end of the stream has been reached.
From the GNU C library:
This macro is an integer value that is returned by a number of narrow stream functions to indicate an end-of-file condition, or some other error situation. With the GNU C Library, EOF is -1. In other libraries, its value may be some other negative number.
EOF is not a character. In this context, it's -1, which, technically, isn't a character (if you wanted to be extremely precise, it could be argued that it could be a character, but that's irrelevant in this discussion). EOF, just to be clear is "End of File". While you're reading a file, you need to know when to stop, otherwise a number of things could happen depending on the environment if you try to read past the end of the file.
So, a macro was devised to signal that End of File has been reached in the course of reading a file, which is EOF. For getc this works because it returns an int rather than a char, so there's extra room to return something other than a char to signal EOF. Other I/O calls may signal EOF differently, such as by throwing an exception.
As a point of interest, in DOS (and maybe still on Windows?) an actual, physical character ^Z was placed at the end of a file to signal its end. So, on DOS, there actually was an EOF character. Unix never had such a thing.
Well, it is pretty much possible to find the EOF of a binary file if you study its structure.
No, you don't need the OS to know the EOF of an executable.
Almost every type of executable has a Page Zero which describes the basic information that the OS might need while loading the code into the memory and is stored as the first page of that executable.
Let's take the example of an MZ executable.
https://wiki.osdev.org/MZ
Here at offset 2, we have the number of bytes used in the last page, and right after that at offset 4 we have the total number of complete/partial pages (a page is 512 bytes). This information is generally used by the OS to safely load the code into the memory, but you can use it to calculate the EOF of your binary file.
Algorithm:
1. Start
2. Parse the parameter and instantiate the file pointer as per your requirement.
3. Load the first page (zero) in a (char) buffer of default size of page zero and print it.
4. Get the value at *((unsigned short*)(buffer+4)) and store it in a loop variable called (unsigned short) i.
5. Get the value at *((unsigned short*)(buffer+2)) and store it in a variable called (unsigned short) l.
6. i--
7. Load and print (or do whatever you wanted to do) 'size of page' characters into a buffer until i equals zero.
8. Once the loop has finished executing just load `l` bytes into that buffer and again perform whatever you wanted to do (if `l` is zero, the last page is full, so load a whole page instead).
9. Stop
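The size calculation behind these steps can be sketched as below. It assumes the MZ header layout documented on the linked wiki page: the signature "MZ" at offset 0, the number of bytes used in the last page at offset 2, the page count at offset 4, and 512-byte pages. The function name mz_image_size is made up for this illustration. Note that the result is the image size recorded in the header; a file may carry extra data after it, in which case the length tracked by the file system is larger.

```c
#include <stddef.h>

#define MZ_PAGE_SIZE 512L

/* Returns the image size recorded in an MZ header, or -1 if hdr does not
 * look like one. hdr must hold at least the first 6 bytes of the file. */
long mz_image_size(const unsigned char *hdr, size_t len)
{
    if (len < 6 || hdr[0] != 'M' || hdr[1] != 'Z')
        return -1;

    /* Header fields are little-endian; assemble them byte by byte rather
     * than casting the buffer, which avoids alignment and byte-order issues. */
    long last_page_bytes = hdr[2] | ((long)hdr[3] << 8);   /* offset 2 */
    long pages           = hdr[4] | ((long)hdr[5] << 8);   /* offset 4 */

    if (pages == 0)
        return -1;
    if (last_page_bytes == 0)             /* 0 means the last page is full */
        return pages * MZ_PAGE_SIZE;
    return (pages - 1) * MZ_PAGE_SIZE + last_page_bytes;
}
```

For a header with 3 pages and 0x90 bytes in the last page, this gives 2 * 512 + 144 = 1168 bytes.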
If you're designing your own binary file format then consider adding some sort of meta data at the start of that file or a special character or word that denotes the end of that file.
And there's a good amount of probability that the OS loads the size of the file from here with the help of simple maths and by analyzing the meta-data even though it might seem that the OS has stored it somewhere along with other information it's expected to store (Abstraction to reduce redundancy).

Text files edit C

I have a program which takes data (ints, floats and strings) given by the user and writes it to a text file. Now I have to update a part of that written data.
For example:
At line 4 in file I want to change the first 2 words (there's an int and a float). How can I do that?
With the information I found out, fseek() and fputs() can be used but I don't know exactly how to get to a specific line.
(Explained code will be appreciated as I'm a starter in C)
You can't "insert" characters in a file. You will have to write a program that reads the whole file and writes a new one: first the part before the edit, then your edited text, then the rest of the file.
You really need to read all the file, and ignore what is not needed.
fseek is not really useful: it positions the file at some byte offset (relative to the start or the end of the file) and doesn't know about line boundaries.
Actually, lines inside a file are an ill-defined concept. Often a line is a sequence of bytes (different from the newline character) ended by a newline ('\n'). Some operating systems (notably Windows) read text files in a special manner (e.g. the real file contains \r\n to end each line, but the C library gives you the illusion that you have read \n).
In practice, you probably want to use line input routines, notably getline (or perhaps fgets).
If you use getline you should take care to free the line buffer.
If your textual file has a very regular structure, you might fscanf the data (ignoring what you need to skip) without caring about line boundaries.
If you wanted to absolutely use fseek (which is a mistake), you would have to read the file twice: a first time to remember where each line starts (or ends) and a second time to fseek to the line start. Still, that does not work for updates, because you cannot insert bytes in the middle of a file.
And in practice, the most costly operation is the actual disk read. The cost of buffering (partly done by the kernel and <stdio.h> functions, and partly by you when you deal with lines) is negligible.
Of course you cannot change in place some line in a file. If you need to do that, process the file for input, produce some output file (containing the modified input) and rename that when finished.
BTW, you might perhaps be interested in indexed files like GDBM, or even in databases like SQLite, MariaDB, MongoDB, etc., and you might be interested in standard textual serialization formats like JSON or YAML (both have many libraries, even for C, to deal with them).
fseek() is used for random-access files where each record of data has the same size. Typically the data is binary, not text.
To solve your particular issue, you will need to read one line at a time to find the line you want to change. A simple solution to make the change is to write these lines to a temporary file, write the changes to the same temporary file, then skip the parts from the original file that you want to change and copy the rest to the temporary file. Finally, close the original file, copy the temporary file to it, and delete the temporary file.
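A possible sketch of that read-and-rewrite approach, using fgets, is shown below. The function name replace_line, its parameters, and the choice to replace the whole line (rather than only its first two words) are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Copies the text file src to dst, writing new_text in place of line number
 * target (counted from 1). Returns 0 on success, -1 on an I/O error or if
 * the file has fewer than target lines. */
int replace_line(const char *src, const char *dst, long target, const char *new_text)
{
    FILE *in = fopen(src, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[256];
    long line = 1;
    int at_line_start = 1;      /* is buf the first piece of a line? */
    int replaced = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        int ends_line = (len > 0 && buf[len - 1] == '\n');

        if (line == target) {
            if (at_line_start) {            /* emit the new line once */
                fprintf(out, "%s\n", new_text);
                replaced = 1;
            }
            /* the old line's pieces are skipped */
        } else {
            fputs(buf, out);
        }

        at_line_start = ends_line;          /* long lines arrive in pieces */
        if (ends_line)
            line++;
    }

    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) == EOF)
        failed = 1;
    return (failed || !replaced) ? -1 : 0;
}
```

To change the int and the float at the start of line 4, the caller would build the full new line (for example with snprintf) and pass it as new_text. Once the call succeeds, the temporary file can take the place of the original, for instance with remove() followed by rename().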
With that said, I suggest that you learn more about random-access files. These are very useful when storing records all of the same size. If you have control over creating the original file, these might be better for your current purpose.

C File Input/Output for Unknown File Types: File Copying

I'm having some issues with a networking assignment. The end goal is to have a C program that grabs a file from a given URL via HTTP and writes it to a given filename. I've got it working fine for most text files, but I'm running into some issues, which I suspect all come from the same root cause.
Here's a quick version of the code I'm using to transfer the data from the network file descriptor to the output file descriptor:
unsigned long content_length; // extracted from HTTP header
unsigned long successfully_read = 0;
while(successfully_read != content_length)
{
    char buffer[2048];
    int extracted = read(connection,buffer,2048);
    fprintf(output_file,buffer);
    successfully_read += extracted;
}
As I said, this works fine for most text files (though the % symbol confuses fprintf, so it would be nice to have a way to deal with that). The problem is that it just hangs forever when I try to get non-text files (a .png is the basic test file I'm working with, but the program needs to be able to handle anything).
I've done some debugging and I know I'm not going over content_length, getting errors during read, or hitting some network bottleneck. I looked around online but all the C file i/o code I can find for binary files seems to be based on the idea that you know how the data inside the file is structured. I don't know how it's structured, and I don't really care; I just want to copy the contents of one file descriptor into another.
Can anyone point me towards some built-in file i/o functions that I can bludgeon into use for that purpose?
Edit: Alternately, is there a standard field in the HTTP header that would tell me how to handle whatever file I'm working with?
You are using the wrong tool for the job. fprintf takes a format string and extra arguments, like this:
fprintf(output_file, "hello %s, today is the %d", cstring, dayoftheweek);
If you pass the second argument from an unknown source (like the web, which you are doing) you can accidentally have %s or %d or other format specifiers in the string. Then fprintf will try to read more arguments than it was passed, and cause undefined behaviour.
Use fwrite for this:
fwrite(buffer, 1, extracted, output_file);
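Put together, the copy loop might look like the sketch below. It assumes a POSIX system (read() and ssize_t come from <unistd.h>), and the function name copy_fd_to_stream is made up for this example. Besides switching to fwrite, it also stops when read() returns 0 or -1 and never asks for more bytes than are still expected, so the loop cannot run past content_length or spin forever on a closed connection. Retrying after an interrupted read (EINTR) is left out for brevity.

```c
#include <stdio.h>
#include <unistd.h>   /* read(): POSIX, not standard C */

/* Copies up to content_length bytes from the descriptor fd to the stream out.
 * Returns the number of bytes copied, which is less than content_length if
 * the peer closed the connection early or an error occurred. */
unsigned long copy_fd_to_stream(int fd, FILE *out, unsigned long content_length)
{
    unsigned char buffer[2048];
    unsigned long done = 0;

    while (done < content_length) {
        unsigned long want = content_length - done;
        if (want > sizeof buffer)
            want = sizeof buffer;           /* never read past the body */

        ssize_t got = read(fd, buffer, want);
        if (got <= 0)                       /* 0: connection closed, -1: error */
            break;

        /* fwrite copies exactly got bytes; '%' and '\0' are just data. */
        if (fwrite(buffer, 1, (size_t)got, out) != (size_t)got)
            break;
        done += (unsigned long)got;
    }
    return done;
}
```

The caller can compare the return value with content_length to detect a short transfer.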
A couple things with your code:
For fprintf - you are using the data as the second argument, when in fact it should be the format, and the data should be the third argument. This is why you are getting problems with the % character, and why it is struggling when presented with binary data, because it is expecting a format string.
You need to use a different function, such as fwrite, to output the file.
As a side note this is a bit of a security problem - if you fetch a specially crafted file from the server it is possible to expose random areas of your memory.
In addition to Seth's answer: unless you are using a third-party library for handling all the HTTP stuff, you need to deal with the Transfer-Encoding header and the possible compression, or at least detect them and throw an error if you don't know how to handle that case.
In general, it may (or may not) be a good idea to parse the HTTP response headers, and only if they contain exclusively stuff that you understand should you continue to interpret the data that follows the header.
I bet your program is hanging because it's expecting X bytes but receiving Y instead, with X < Y (most likely, sans compression - but PNG don't compress well with gzip). You'll get chunks [*] of data, with one of the chunks most likely spanning content_length so your condition while(successfully_read != content_length) is always true.
You could try running your program under strace or whatever its equivalent is for your OS, if you want to see how your program continues trying to read data it will never get (because you've likely made an HTTP/1.1 request that holds the connection open, and you haven't made a second request) or that has ended (if the server closes the connection, your repeated calls to read(2) will just return 0, which leaves your still-true loop condition unchanged).
If you are sending your program's output to stdout, you may find that it produces no output - this can happen if the resource you are retrieving contains no newline or other flush-forcing control characters. Other stdio buffering regimes may apply when output goes to a file. (For example, the file will remain empty until the stdio buffers have accumulated at least 4096 bytes.)
[*] Then there's also Transfer-Encoding: chunked, as #roland-illig alludes to, which will ruin the exact equivalence between content_length (presumably derived from the eponymous HTTP header) and the actual number of bytes transferred over the socket.
You are opening the file as a text file. On Windows, doing so means that the C library translates every \n byte you write into \r\n, which changes the size and content of binary data. Try opening the file as binary ("wb"), and those errors in size should go away.

C file reading incorrect number of chars

I have stumbled across a problem where I am attempting to read in a file, which is, according to Windows, '87.1 kb' in size, and using the ftell method in the program returns '89282', effectively confirming what Windows is saying.
So why is every method to read chars from the file only returning 173 or 174 characters?
The file is a .GIF file renamed to .txt (and I am trying to build a program that can load the data fully as I am working on a program to download online images and need to run comparisons on them).
So far I have tried:
fgetc - This returns 173/174 chars.
fread - Same as above, this is with a string with 1024 or more spaces available.
fgets - Doesn't work (as it doesn't return how many characters it has read - characters which include nulls).
setvbuf - Disabling this with _IONBF, or even supplying a buffer of 1024 or more only means 173/174 is still returned.
fflush - This produced a 'result', although a negative one - it returned '2' chars instead of '173'.
I am utterly stumped as to why it isn't reading anything more than 173/174 chars. Is there something I need to compensate for or expect at the lower level? Some buffer I need to expand or some weird character I need to look out for?
Here's one thing to look at. Have a look at the file in a hex viewer and see if there's a CTRL-Z somewhere around that 173/174 offset.
Then check to see if you're opening it with the "r" mode.
If so, it may be that the Windows translation between text and binary is stopping your reading there because CTRL-Z is an EOF marker in text mode. If so, you can probably fix this with "rb" mode on the fopen.
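For illustration, a binary-mode read of a whole file with fread could be sketched as follows. The function name read_all and the growth strategy (doubling a malloc'ed buffer) are choices made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a malloc'ed buffer, using binary mode.
 * On success stores the byte count in *len and returns the buffer, which the
 * caller must free. Returns NULL on failure. */
unsigned char *read_all(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");   /* "rb": CTRL-Z (0x1A) is ordinary data */
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, used = 0;
    unsigned char *data = malloc(cap);
    if (data == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t got;
    while ((got = fread(data + used, 1, cap - used, fp)) > 0) {
        used += got;
        if (used == cap) {                       /* buffer full: double it */
            unsigned char *bigger = realloc(data, cap * 2);
            if (bigger == NULL) {
                free(data);
                fclose(fp);
                return NULL;
            }
            data = bigger;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                            /* stopped on error, not EOF */
        free(data);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *len = used;
    return data;
}
```

Because fread reports how many bytes it actually stored, embedded null bytes and 0x1A bytes are counted like any other data.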
Failing that, you need to post the smallest code segment that exhibits the problem behaviour. It may be obvious to some of us here but only usually if we can see the code :-)
