InternetReadFile() corrupting downloads in C - c

I'm able to download text documents (.html, .txt, etc) but I can't download images or exe's. I'm pretty sure that this is because I'm using a char, and those files are binary. I know that in C# I would use a byte. But what data-type would I use in this case?
char buffer[1];
DWORD dwRead;
FILE * pFile;
pFile = fopen(file,"w");
while (InternetReadFile(hRequest, buffer, 1, &dwRead))
{
if(dwRead != 1) break;
fprintf(pFile,"%s",buffer);
}
fclose(pFile);

Your problem is not char, it is using fprintf with %s. char can hold all byte values. When a binary data chunk has a \0 (NULL) character in it, fprintf will stop outputting data at that time.
You want to use fwrite in this case.
In Windows, it is also important to use the b specifier when opening binary files.

Since you are reading one byte at a time into a buffer which is not null terminated (because its size is 1), you need to output one byte at a time with either '%c' as the format string or using putc(buffer[0], pFile). As it stands, you are vulnerable to buffer overflow (as in, bad things may happen!).
If you are on a Windows platform, it would be a good idea to open the file in binary mode; it would do no harm on Unix since there is no difference between binary and text mode.

Related

Using fread() to read a text based file - best practices

Consider this code to read a text based file. This sort of fread() usage was briefly touched upon in the excellent book C Programming: A Modern Approach by K.N. King.
There are other methods of reading text based files, but here I am concerned with fread() only.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
// Declare file stream pointer.
FILE *fp = fopen("Note.txt", "r");
// fopen() call successful.
if(fp != NULL)
{
// Navigate through to end of the file.
fseek(fp, 0, SEEK_END);
// Calculate the total bytes navigated.
long filesize = ftell(fp);
// Navigate to the beginning of the file so
// it can be read.
rewind(fp);
// Declare array of char with appropriate size.
char content[filesize + 1];
// Set last char of array to contain NULL char.
content[filesize] = '\0';
// Read the file content.
fread(content, filesize, 1, fp);
// Close file stream pointer.
fclose(fp);
// Print file content.
printf("%s\n", content);
}
// fopen() call unsuccessful.
else
{
printf("File could not be read.\n");
}
return 0;
}
There are some problems I have with this method. My opinion is that this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
To circumvent this issue, we may use a buffer size and keep on reading into a char array of that size. If filesize is less than buffer size, then we simply perform fread() once as described in the above code. Otherwise, We divide the total file size by the buffer size and get a result, whose int portion we will use as the total number of times to iterate a loop where we will invoke fread() each time, appending the read buffer array into a larger string. Now, for the final fread(), which we will perform after the loop, we will have to read exactly (filesize % buffersize) bytes of data into an array of that size and finally append this array into the larger string (Which we would have malloc-ed with filesize + 1 beforehand). I find that if we perform fread() for the last chunk of data using buffersize as its second parameter, then extra garbage data of size (buffersize - chunksize) will be read in and the data might become corrupted. Are my assumptions here correct? Please explain if/ how I have overlooked something.
Also, there is the issue that non-ASCII characters might not have size of 1 byte. In that case I would assume the proper amount is being read, but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
fread() does not care about strings (null character terminated arrays). It reads data as if it was in multiples of unsigned char*1 with no special concern to the data content if the stream opened in binary mode and perhaps some data processing (e.g. end-of-line, byte-order-mark) in text mode.
Are my assumptions here correct?
Failed assumptions:
Assuming ftell() return value equals the sum of fread() bytes.
The assumption can be false in text mode (as OP opened the file) and fseek() to the end is technical undefined behavior in binary mode.
Assuming not checking the return value of fread() is OK. Use the return value of fread() to know if an error occurred, end-of-file and how many multiples of bytes were read.
Assuming error checking is not required. , ftell(), fread(), fseek() instead of rewind() all deserve error checks. In particular, ftell() readily fails on streams that have no certain end.
Assuming no null characters are read. A text file is not certainly made into one string by reading all and appending a null character. Robust code detects and/or copes with embedded null characters.
Multi-byte: assuming input meets the encoding requirements. Example: robust code detects (and rejects) invalid UTF8 sequences - perhaps after reading the entire file.
Extreme: Assuming a file length <= LONG_MAX, the max value returned from ftell(). Files may be larger.
but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
fread() does not function on multi-byte boundaries, only multiples of unsigned char. A given fread() may end with a portion of a multi-byte and the next fread() will continue from mid-multi-byte.
Instead of of 2 pass approach consider 1 single pass
// Pseudo code
total_read = 0
Allocate buffer, say 4096
forever
if buffer full
double buffer_size (`realloc()`)
u = unused portion of buffer
fread u bytes into unused portion of buffer
total_read += number_just_read
if (number_just_read < u)
quit loop
Resize buffer total_read (+ 1 if appending a '\0')
Alternatively consider the need to read the entire file in before processing the data. I do not know the higher level goal, but often processing data as it arrives makes for less resource impact and faster throughput.
Advanced
Text files may be simple ASCII only, 8-bit code page defined, one of various UTF encodings (byte-order-mark, etc. The last line may or may not end with a '\n'. Robust text processing beyond simple ASCII is non-trivial.
ASCII and UTF-8 are the most common. IMO, handle 1 or both of those and error out on anything that does not meet their requirements.
*1 fread() reads in multiple of bytes as per the 3rd argument, which is 1 in OP's case.
// v --- multiple of 1 byte
fread(content, filesize, 1, fp);

C: unsigned short stored into buffer after fread from a binary file doesn't match original pattern

I have a binary file filled with 2-byte words following this pattern(in HEX): 0XY0. this is the part of the code where I execute fread and fopen.
unsigned short buffer[bufferSize];
FILE *ptr; //
ptr = fopen(fileIn,"rb"); //
if(ptr == NULL)
{
fprintf(stderr,"Unable to read from file %s because of %s",fileIn,strerror(errno));
exit(20);
}
size_t readed = fread(buffer,(size_t)sizeof(unsigned short),bufferSize,ptr);
if(readed!=bufferSize)
{
printf("readed and buffersize are not the same\n");
exit(100);
}
//---------------------------
If I look at any content of the buffer, for example buffer[0], instead of being a short with pattern 0XY0, it is a short with pattern Y00X
Where is my error? is it something regarding endianess?
of course i checked every element inside buffer. the program execute with no errors.
EDIT: If i read from file with size char instead of short, the content of buffer(obviously changed to char buffer[bufferSize*2];) match the pattern OXYO. So I have that (for example) buffer[0] is 0X and buffer[1] is Y0
Your problem seems to be exactly typical of an endianness mismatch between the program that stored the data into the file and the one that reads it. Remember that mobile phone processors tend to use big-endian representations and laptop little-endian.
Another potential explanation is your file might have been written in text mode by a windows based program that converted the 0x0A bytes into pairs of 0x0D / 0x0A, causing the contents to shift and cause a similar pattern as what you observe.
Instead of reading the file with fread, you should read it byte by byte and compute the values according the endianness specified for the file format.

fseek/ftell and stat.st_size doesn't return actual file size in text mode

platform: windows
o/s: XP sp3
compiler: gcc v4.8.1
text editor: notepad
encoding: ansi
question: how can i retreive the actual file size in text mode so that i can set a buffer size exactly?
char *filename = "functions.txt";
FILE *source = fopen(filename,"r");
struct stat properites;
stat(filename,&properties);
long size_stat = properties.st_size;
fseek(source,0,SEEK_END);
long size_ftell = ftell(source);
fseek(source,0,SEEK_SET);
char *pchar_source = malloc(sizeof(char)*size_stat);
long size_read = fread(pchar_source,sizeof(char),size_stat,source);
functions.txt
tokenize(String string, Character delimiter) String[]
{
}
output
file size-stat [70]
file size-ftell [70]
file size-fread [67]
for small files, the difference is negligible, however, for file larger files, this means unneccessay memory allocation. any suggestions?
one possible solution:
long fileSize = 0;
while (getc(source) != EOF)
{
fileSize++;
}
however, this is very wasteful and time consuming for large files.
ftell gives you the correct size in bytes. As others noted, it is because you have three line endings encoded as \r\n. When you open in text mode on Windows, they get converted to \n, thus you read three chars less.
There are two options I see:
Use ftell as an estimate for the size of the buffer, but then, after the fread, use size_read in the rest of your code for the buffer size. You will just waste number-of-lines bytes of memory, which is not a big deal.
Open the file in binary mode rb. You will get a size of 70, but also fread will return 70 bytes. Then write your code with the understanding that line endings might be \r, \n, or \r\n.
From the above two I really recommend the 2nd option: it gives a more robust and portable program, and the notion of binary mode is less confusing than the platform dependent text-mode.
If the "size" of the file is to be given in units that depend on the file's contents, then accurately determining that size requires scanning the entire file.
That is precisely the situation for any file opened in text mode on Windows (because physical "\r\n" is treated as a single logical unit). It is also the case if the file content is encoded in some way, and you want the count of decoded units. That's not as unlikely as it may sound, as it arises quite frequently with character encodings, such as (21-bit) Unicode characters encoded as a UTF-8 byte stream.
As far as creating a buffer to hold the whole file content,
If you have to worry about large files, then do everything possible to avoid creating such a buffer in the first place. Ideally, you would process the file in a streaming mode, so that you don't have to keep much of it in memory at any one time.
If you must create such a buffer, then consider a buffer consisting of a linked list of smallish blocks (say 4 - 32k), so that you can extend the buffer as needed without realloc() (e.g. as needed while reading the file).
answer:
no you cant.
the actual file size, and not an "estimate", is only available after a complete read. this is due to conversions (if any) of new lines and encoding types. for those wondering, here is the "proper" way of determining the actual file size.
char *filename = "sample.txt";
FILE *file_source = fopen(filename,"r"); // can be set to either "r" or "rb"
// use stat.st_size if you have the library <sys/stat.h>
struct stat stat_sourceFile;
stat(filename,&stat_sourceFile);
long long_fileSize_stat = stat_sourceFile.st_size; // estimate only
// use fseek,ftell,fseek if you dont have the lib <sys/stat.h>
fseek(file_source,0,SEEK_END);
long long_fileSize_ftell = ftell(file_source); // estimate only
fseek(file_source,0,SEEK_SET);
char *pchar_source = malloc(sizeof(char)*long_fileSize_stat);
long long_ACTUAL_FILE_SIZE = fread(pchar_source,sizeof(char),long_fileSize_stat,file_source);
realloc(pchar_source,long_ACTUAL_FILE_SIZE);
// now when we pass the pointer/array size to ANY function/method, you WONT
// get those funny characters not part of your file at the end of your
// printf statements. also, instead of using long_ACTUAL_FILE_SIZE as
// the bounds for iteration, you could use strlen(pchar_source)
hope this helps others new to c and file buffering.

C, format file for data of HTTP response

I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.
My goal is to send a string back to the client with a HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.
The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors.".
This is the code that is supposed to read a file into httpData:
int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
fclose(file);
printf("httpData = %s\n", httpData);
Edit: This is what I send:
char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);
The data part produces ???? for the image but correct html for an html file.
Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
Thus, we might have a function:
int readHTTPData(const char *filename, size_t size, char *httpData)
{
FILE *fp = fopen(filename, "rb");
size_t n;
if (fp == 0)
return E_FILEOPEN;
n = fread(httpData, size, 1, fp);
fclose(fp);
if (n != 1)
return E_SHORTREAD;
fputs("httpData = ", stdout);
fwrite(httpData, size, 1, stdout);
putchar('\n');
return 0;
}
The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:
Open the file; report error if that fails.
Read the file in a single operation.
Close the file.
Report error if the read did not get all the data that was expected.
Report on the data that was read (debugging only — and printing binary data to standard output raw is not the best idea in the world, but it parallels what the code in the question does).
Report on success.
In the original code, there is a loop:
int i = 0;
...
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
This loop has a lot of problems:
You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
Consequently, when the last character has been read, the feof() reports 'false', but the fscanf() tries to read the next (non-existent) character, adds it to the buffer (probably as a letter such as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS).
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.
int i = 0;
int c;
while ((c = getc(file)) != EOF && i < size)
httpData[i++] = c;
You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition — either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can reverse the order of the size arguments. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though — don't skimp on input checking.)
At some point, for plain text data, you need to think about CRLF vs NL line endings. Text files handle that automatically; binary files do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings (but I'm not an expert on this — I've not done low-level HTTP stuff recently (not in this millennium), so the rules may have changed).

Unexpected output copying file in C

In another question, the accepted answer shows a method for reading the contents of a file into memory.
I have been trying to use this method to read in the content of a text file and then copy it to a new file. When I write the contents of the buffer to the new file, however, there is always some extra garbage at the end of the file. Here is an example of my code:
inputFile = fopen("D:\\input.txt", "r");
outputFile = fopen("D:\\output.txt", "w");
if(inputFile)
{
//Get size of inputFile
fseek(inputFile, 0, SEEK_END);
inputFileLength = ftell(inputFile);
fseek(inputFile, 0, SEEK_SET);
//Allocate memory for inputBuffer
inputBuffer = malloc(inputFileLength);
if(inputBuffer)
{
fread (inputBuffer, 1, inputFileLength, inputFile);
}
fclose(inputFile);
if(inputBuffer)
{
fprintf(outputFile, "%s", inputBuffer);
}
//Cleanup
free(inputBuffer);
fclose(outputFile);
}
The output file always contains an exact copy of the input file, but then has the text "MPUTERNAM2" appended to the end. Can anyone shed some light as to why this might be happening?
You may be happier with
int numBytesRead = 0;
if(inputBuffer)
{
numBytesRead = fread (inputBuffer, 1, inputFileLength, inputFile);
}
fclose(inputFile);
if(inputBuffer)
{
fwrite( inputBuffer, 1, numBytesRead, outputFile );
}
It doesn't need a null-terminated string (and therefore will work properly on binary data containing zeroes)
Because you are writing the buffer as if it were a string. Strings end with a NULL, the file you read does not.
You could NULL terminate your string, but a better solution is to use fwrite() instead of fprintf(). This would also let you copy files that contain NULL characters.
Unless you know the input file will always be small, you might consider reading/writing in a loop so that you can copy files larger than memory.
You haven't allocated enough space for the terminating null character in your buffer (and you also forget to actually set it), so your fprintf is effectively overreading into some other memory. Your buffer is exactly the same size as the file, and is filled with its content, however, fprintf reads the parameter looking for the terminating null, which isn't there, until a couple of characters later where, coincidently, there is one.
EDIT
You're actually mixing two types of io, fread (which is paired with fwrite) and fprintf (which is paired with fscanf). You should probably be doing fwrite with the number of bytes to write; or conversely, use fscanf, which would null-terminate your string (although, this wouldn't allow nulls in your string).
Allocating memory to fit the file is actually quite a bad way of doing it, especially the way it's done here. If the malloc() fails, no data is written to the output file (and it fails silently). In other words, you can't copy files greater than a few gigabytes on a 32-bit platform due to the address space limitations.
It's actually far better to use a smaller memory chunk (allocated or on the stack) and read/write the file in chunks. The reads and writes will be buffered anyway and, as long as you make the chunks relatively large, the overhead of function calls to the C runtime libraries is minimal.
You should always copy files in binary mode as well, it's faster since there's no chance of translation.
Something like:
FILE *fin = fopen ("infile","rb"); // make sure you check these for NULL return
FILE *fout = fopen ("outfile","wb");
char buff[1000000]; // or malloc/check-null if you don't have much stack space.
while ((count = fread (buff, 1, sizeof(buff), fin)) > 0) {
// Check count == -1 and errno here.
fwrite (buff, 1, count, fout); // and check return value.
}
fclose (fout);
fclose (fin);
This is from memory but provides the general idea of how to do it. And you should always have copiuos error checking.
fprintf expects inputBuffer to be null-terminated, which it isn't. So it's reading past the end of inputBuffer and printing whatever's there (into your new file) until it finds a null character.
In this case you could malloc an extra byte and put a null as the last character in inputBuffer.
In addition to what other's have said: You should also open your files in binary-mode - otherwise, you might get unexpected results on Windows (or other non-POSIX systems).
You can use
fwrite (inputBuffer , 1 , inputFileLength , outputFile );
instead of fprintf, to avoid the zero-terminated string problem. It also "matches better" with fread :)
Try using fgets instead, it will add the null for you at the end of the string. Also as was said above you need one more space for the null terminator.
ie
The string "Davy" is represented as the array that contains D,a,v,y,\0 (without the commas). Basically your array needs to be at least sizeofstring + 1 to hold the null terminator. Also fread will not automatically add the terminator, which is why even if your file is way shorter than the maximum length you get garbage..
Note an alternative method for being lazy is just to use calloc which sets the string to 0. But still you should only fread inputFileLength-1 characters at most.

Resources