Why does the ReadFile function write garbage to the buffer?

Let's say I decided to open an existing file with the CreateFile function. The content of the file is Hello world. There is also a buffer (char array with size 11 and filled with zero bytes) that should contain the contents of the file.
And when I try to read the file with the ReadFile function, certain garbage is written to the buffer. The debugger (I've tried GDB and LLDB) says that the contents of the buffer after reading are \377?H\000e\000l\000l\000o\000\000w\000o\000r\000l\000d\000\r\000\n\000\r\000 \n, '\000', and in a human-readable form it looks like this: ■ H.
I've tried not filling the buffer with zeros. I tried to write (with WriteFile) to a file first, then read. I also tried to change the value of how many bytes to read with ReadFile. But it still doesn't change anything.
Also, GetLastError returns ERROR_SUCCESS.
Code:
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    HANDLE file = CreateFile("./test_file.txt", GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        puts("Failed to open");
        return EXIT_FAILURE;
    }
    size_t length = strlen("Hello world"); /* 11 */
    char buffer[12];
    DWORD count = 0; /* Is always 11 (length of "Hello world") after reading */
    memset(buffer, '\0', length + 1);
    if (!ReadFile(file, buffer, (DWORD) length, &count, NULL)) {
        puts("Failed to read or EOF reached.");
        CloseHandle(file);
        return EXIT_FAILURE;
    }
    printf("buffer: '%s'\n", buffer);
    printf("count: %lu\n", count);
    CloseHandle(file);
    return EXIT_SUCCESS;
}
In the console, the output of the program looks like this:
buffer: ' ■ H'
count: 11

The text file itself is not written in a 7-bit ASCII or 8-bit UTF-8 encoding, as you are expecting. It is actually written in a UTF-16 encoding, with a BOM at the front of the file (bytes 0xFF 0xFE for UTF-16LE). You are simply reading the file's raw bytes and displaying them as-is, without any regard for their encoding.
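If the goal is to read such a file and print it as narrow text, one approach is to read the raw bytes, skip the BOM, and convert with WideCharToMultiByte. A minimal sketch, assuming the file really is UTF-16LE as the BOM indicates (the buffer sizes are illustrative):

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>

int main(void) {
    HANDLE file = CreateFileA("./test_file.txt", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    wchar_t wide[64];
    DWORD bytes = 0;
    BOOL ok = ReadFile(file, wide, sizeof(wide), &bytes, NULL);
    CloseHandle(file);
    if (!ok) return 1;

    const wchar_t *text = wide;
    DWORD wchars = bytes / sizeof(wchar_t);
    if (wchars > 0 && text[0] == 0xFEFF) { /* skip the UTF-16LE BOM */
        text++;
        wchars--;
    }

    char narrow[128];
    int n = WideCharToMultiByte(CP_UTF8, 0, text, (int) wchars,
                                narrow, sizeof(narrow) - 1, NULL, NULL);
    narrow[n] = '\0'; /* n is 0 on failure, so this index is always in range */
    printf("buffer: '%s'\n", narrow); /* UTF-8 bytes; the console may need chcp 65001 */
    return 0;
}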

How to read file and store it into wchar_t in C?

I am trying to get info from a .txt file called a.txt with the ReadFile() function, which is provided by <windows.h>, and store it in a wchar_t[] variable. Here is my code:
#include <stdio.h>
#include <windows.h>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);
    _setmode(_fileno(stderr), _O_U16TEXT);
    HANDLE fh = CreateFileW(
        L"a.txt",
        GENERIC_READ,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL);
    LARGE_INTEGER size;
    GetFileSizeEx(fh, &size);
    DWORD sizeF = size.QuadPart;
    wchar_t *readBuffer = (wchar_t *)malloc(sizeF * sizeof(wchar_t));
    DWORD bytesRead;
    if (ReadFile(fh, readBuffer, sizeF, &bytesRead, NULL)) {
        readBuffer[sizeF] = L'\0';
    }
    wprintf(L"%s\n", readBuffer);
    CloseHandle(fh);
}
But the output I get is not what I expected: it is question marks in squares.
What is wrong with my code?
There is an obvious mistake in the allocation size: you should allocate space for sizeF + 1 elements so you can set a null terminator at readBuffer[sizeF].
Testing for allocation error seems safer, especially since the allocation size may be very large.
There is another problem here: wprintf(L"%s\n", readBuffer); You should use %ls to output an array of wchar_t; %s expects an array of char containing multibyte representations.
It is also unclear why you call _setmode(_fileno(stdout), _O_U16TEXT); on all 3 standard stream handles.
I am not sure about the precise semantics of the proprietary APIs you use... Using standard functions seems both simpler and more portable.
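Putting those fixes together, a corrected sketch of the read portion might look like this (assuming a.txt really is UTF-16LE text, so its raw bytes line up with a Windows wchar_t array; if the file starts with a BOM, you may also want to skip the first wchar_t):

LARGE_INTEGER size;
GetFileSizeEx(fh, &size);
DWORD sizeF = (DWORD) size.QuadPart;

/* sizeF bytes of data plus room for the L'\0' terminator */
wchar_t *readBuffer = malloc(sizeF + sizeof(wchar_t));
if (readBuffer == NULL) {
    CloseHandle(fh);
    return 1;
}

DWORD bytesRead = 0;
if (ReadFile(fh, readBuffer, sizeF, &bytesRead, NULL)) {
    readBuffer[bytesRead / sizeof(wchar_t)] = L'\0'; /* terminate after what was actually read */
    wprintf(L"%ls\n", readBuffer);                   /* %ls for a wchar_t string */
}
free(readBuffer);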

Not getting expected debug print from program that opens and writes a file

I am learning about file descriptors by using the open, write and close functions. What I expect is a printf statement outputting the file descriptor after the open function, and another printf statement outputting a confirmation of the text being written. However, I get the following result:
.\File.exe "this is a test"
[DEBUG] buffer # 0x00b815c8: 'this is a test'
[DEBUG] datafile # 0x00b81638: 'C:\Users\____\Documents\Notes'
With a blank space where the further debugging output should be. The code block for the section is:
strcpy(buffer, argv[1]); //copy first vector into the buffer
printf("[DEBUG] buffer \t # 0x%08x: \'%s\'\n", buffer, buffer); //debug buffer
printf("[DEBUG] datafile # 0x%08x: \'%s\'\n", datafile, datafile); //debug datafile
strncat(buffer, "\n", 1); //adds a newline
fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR); //opens file
if(fd == -1)
{
    fatal("in main() while opening file");
}
printf("[DEBUG] file descriptor is %d\n", fd);
if(write(fd, buffer, strlen(buffer)) == -1) //writing data
{
    fatal("in main() while writing buffer to file");
}
if(close(fd) == -1) //closing file
{
    fatal("in main() while closing file");
}
printf("Note has been saved.");
I basically copied the code word for word from the book I'm studying, so how could it not work?
The problem is that the later printf calls do not display anything, and the file descriptor is never printed.
Here is the full code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>

void usage(char *pnt, char *opnt) //usage function
{
    printf("Usage: %s <data to add to \"%s\">", pnt, opnt);
    exit(0);
}

void fatal(char*); //fatal function for errors
void *ec_malloc(unsigned int); //wrapper for malloc error checking

int main(int argc, char *argv[]) //initiates argument vector/count variables
{
    int fd; //file descriptor
    char *buffer, *datafile;
    buffer = (char*) ec_malloc(100); //buffer given 100 bytes of ec memory
    datafile = (char*) ec_malloc(20); //datafile given 20 bytes of ec memory
    strcpy(datafile, "C:\\Users\\____\\Documents\\Notes");
    if(argc < 2) //if argument count is less than 2 i.e. no arguments provided
    {
        usage(argv[0], datafile); //print usage message from usage function
    }
    strcpy(buffer, argv[1]); //copy first vector into the buffer
    printf("[DEBUG] buffer \t # %p: \'%s\'\n", buffer, buffer); //debug buffer
    printf("[DEBUG] datafile # %p: \'%s\'\n", datafile, datafile); //debug datafile
    strncat(buffer, "\n", 1); //adds a newline
    fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR); //opens file
    if(fd == -1)
    {
        fatal("in main() while opening file");
    }
    printf("[DEBUG] file descriptor is %d\n", fd);
    if(write(fd, buffer, strlen(buffer)) == -1) //writing data
    {
        fatal("in main() while writing buffer to file");
    }
    if(close(fd) == -1) //closing file
    {
        fatal("in main() while closing file");
    }
    printf("Note has been saved.");
    free(buffer);
    free(datafile);
}

void fatal(char *message)
{
    char error_message[100];
    strcpy(error_message, "[!!] Fatal Error ");
    strncat(error_message, message, 83);
    perror(error_message);
    exit(-1);
}

void *ec_malloc(unsigned int size)
{
    void *ptr;
    ptr = malloc(size);
    if(ptr == NULL)
    {
        fatal("in ec_malloc() on memory allocation");
        return ptr;
    }
}
EDIT: the issue has been fixed. The reason for this bug was that the memory allocated within the ec_malloc function was not sufficient, which meant that the text could not be saved. I changed the byte value to 100 and the code now works.
I am not sure which compiler you are using, but the one I tried the code with (GCC) says:
main.c:34:5: warning: ‘strncat’ specified bound 1 equals source length [-Wstringop-overflow=]
34 | strncat(buffer, "\n", 1); //adds a newline
| ^~~~~~~~~~~~~~~~~~~~~~~~
In other words, the call to strncat in your code is highly suspicious. You are trying to append a single line-break character, which has a length of 1, which you pass as the third argument. But strncat expects the third parameter to be the remaining space in buffer, not the length of the string to append.
A correct call would look a bit like this:
size_t bufferLength = 100;
char* buffer = malloc(bufferLength);
buffer[0] = '\0'; /* strncat needs an existing string to append to */
strncat(buffer, "\n", (bufferLength - strlen(buffer) - strlen("\n") - 1));
In this case, however, you are saved, because strncat guarantees that the result is NUL-terminated: it appends at most n characters and then always writes a terminating NUL, which can land one byte beyond the count you specified.
All of this is complicated, and a common source of bugs. It's easier to simply use snprintf to build up the entire string at one go:
size_t bufferLength = 100;
char* buffer = malloc(bufferLength);
snprintf(buffer, bufferLength, "%s\n", argv[1]);
Another bug in your code is the ec_malloc function:
void *ec_malloc(unsigned int size)
{
    void *ptr;
    ptr = malloc(size);
    if(ptr == NULL)
    {
        fatal("in ec_malloc() on memory allocation");
        return ptr;
    }
}
See if you can spot it: what happens if ptr is not NULL? Well, nothing! The function doesn't return a value in this case; execution just falls off the end.
If you're using GCC (and possibly other compilers) on x86, this code will appear to work fine, because the result of the malloc function will remain in the proper CPU register to serve as the result of the ec_malloc function. But the fact that it just happens to work by the magic of circumstance does not make it correct code. It is subject to stop working at any time, and it should be fixed. The function deserves a return value!
Unfortunately, the GCC compiler is unable to detect this mistake, but Clang does:
<source>:64:1: warning: non-void function does not return a value in all control paths [-Wreturn-type]
}
^
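A corrected version simply always returns the allocation; the return inside the NULL branch is unreachable anyway, because fatal calls exit (a minimal sketch):

void *ec_malloc(unsigned int size)
{
    void *ptr = malloc(size);
    if (ptr == NULL)
    {
        fatal("in ec_malloc() on memory allocation"); /* exits; never returns */
    }
    return ptr; /* every control path now returns a value */
}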
The major bug in your code is a buffer overrun. At the top, you allocate only 20 bytes for the datafile buffer:
datafile = (char*) ec_malloc(20); //datafile given 20 bytes of ec memory
which means it can hold at most 19 characters plus the terminating NUL. However, you proceed to copy in more than that:
strcpy(datafile, "C:\\Users\\____\\Documents\\Notes");
That string literal is 29 characters long (and longer still with a real user name in place of ____), not counting the terminating NUL, so it cannot possibly fit. With a buffer that is too small, the strcpy function call creates a classic "buffer overrun" error, which is undefined behavior that manifests itself here as corrupting your program's memory and thus premature termination.
Again, when I tried compiling and running the code, GCC reported:
malloc(): corrupted top size
because it detected that you had overrun the dynamically-allocated memory (returned by malloc). It was able to do this because, under the hood, malloc stores sentinel information after the allocated memory block, and your overwriting of the allocated space had written over its sentinel information.
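A minimal fix is to size the allocation from the string itself rather than guessing (a sketch, reusing the question's own ec_malloc wrapper):

const char *path = "C:\\Users\\____\\Documents\\Notes";
char *datafile = ec_malloc(strlen(path) + 1); /* +1 for the terminating NUL */
strcpy(datafile, path); /* now guaranteed to fit */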
The whole code is a bit suspect; it was not written by someone who knows C very well, nor was it debugged or reviewed by anyone else.
There is no real need to use dynamic memory allocation here in order to allocate fixed-size buffers. If you're going to use dynamic memory allocation, then allocate the actual amount of space that you need. Otherwise, if you're allocating fixed-size buffers, then just allocate on the stack.
Don't bother with complex string-manipulation functions when you can get away with simply using snprintf.
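For instance, both suggestions combined might look like this (a sketch; the 100-byte size is carried over from the original code):

char datafile[] = "C:\\Users\\____\\Documents\\Notes"; /* sized automatically by its initializer */
char buffer[100];
snprintf(buffer, sizeof(buffer), "%s\n", argv[1]); /* truncates safely if argv[1] is long */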
And as a bonus tip: when debugging problems, try to reduce the code down as small as you can get it. None of the file I/O stuff was related to this problem, so when I was analyzing this code, I replaced that whole section with:
printf("[DEBUG] file descriptor is %d\n", 42);
Once the rest of the code is working, I can go back and add the real code back to that section, and then test it. (Which I didn't do, because I don't have a file system handy to test this.)

Why does fgetc put the file offset at the end of the file?

I have a simple test program that uses fgetc() to read a character from a file stream and lseek() to read a file offset. It looks like this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    char buf[] = "hello world";
    FILE *f;
    int fd;
    fd = open("test.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
    write(fd, buf, sizeof(buf));
    lseek(fd, 0, SEEK_SET);
    f = fdopen(fd, "r");
    printf("%c\n", fgetc(f));
    printf("%d\n", lseek(fd, 0, SEEK_CUR));
}
When I run it, I get the following output:
h
12
The return value of fgetc(f), h, makes sense to me. But why is it repositioning the file offset to be at the end of the file? Why doesn't lseek(fd, 0, SEEK_CUR) give me 1?
If I repeat the first print statement, it works as expected and prints an e, then an l, etc.
I don't see any mention of this weird behavior in the man page.
stdio functions like fgetc are buffered. They will read() a large block into a buffer and then return characters from the buffer on successive calls.
Since the default buffer size is more than 12 (usually many KB), the first time you fgetc(), it tries to fill its buffer which means reading the entire file. Thus lseek returns a position at the end of the file.
If you want to get a file position that takes into account what's still in the buffer, use ftell() instead.
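For example, appending a couple of lines to the test program above shows the difference (a sketch; the values assume the same 12-byte file):

printf("%c\n", fgetc(f));                              /* 'h' */
printf("ftell: %ld\n", ftell(f));                      /* 1: the logical stream position */
printf("lseek: %ld\n", (long) lseek(fd, 0, SEEK_CUR)); /* 12: the kernel offset, already past the buffered block */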

fgets statement reads first line and not sure how to modify because I have to return a pointer [duplicate]

I need to copy the contents of a text file to a dynamically-allocated character array.
My problem is getting the size of the contents of the file; Google reveals that I need to use fseek and ftell, but for that the file apparently needs to be opened in binary mode, and that gives only garbage.
EDIT: I tried opening in text mode, but I get weird numbers. Here's the code (I've omitted simple error checking for clarity):
long f_size;
char* code;
size_t code_s, result;
FILE* fp = fopen(argv[0], "r");
fseek(fp, 0, SEEK_END);
f_size = ftell(fp); /* This returns 29696, but file is 85 bytes */
fseek(fp, 0, SEEK_SET);
code_s = sizeof(char) * f_size;
code = malloc(code_s);
result = fread(code, 1, f_size, fp); /* This returns 1045, it should be the same as f_size */
The root of the problem is here:
FILE* fp = fopen(argv[0], "r");
argv[0] is your executable program, NOT the parameter. It certainly won't be a text file. Try argv[1], and see what happens then.
You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.
For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.
That's not a limitation of the file APIs, it's a natural limitation of there not being a direct mapping from "size of binary data" to "number of characters."
If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code points as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is variable width due to Unicode characters above U+FFFF being encoded in multiple code points, but a lot of the time developers ignore this.)
I'm pretty sure argv[0] won't be a text file.
Give this a try (haven't compiled this, but I've done this a bazillion times, so I'm pretty sure it's at least close):
char* readFile(char* filename)
{
    FILE* file = fopen(filename, "r");
    if (file == NULL)
    {
        return NULL;
    }
    fseek(file, 0, SEEK_END);
    long int size = ftell(file);
    rewind(file);
    char* content = calloc(size + 1, 1); /* zero-filled, so the buffer is NUL-terminated */
    if (content != NULL)
    {
        fread(content, 1, size, file);
    }
    fclose(file); /* don't leak the FILE handle */
    return content;
}
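A hypothetical caller then owns the returned buffer and must free it:

char *content = readFile(argv[1]);
if (content != NULL)
{
    puts(content);
    free(content);
}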
If you're developing for Linux (or other Unix-like operating systems), you can retrieve the file-size with stat before opening the file:
#include <stdio.h>
#include <sys/stat.h>

int main() {
    struct stat file_stat;
    if (stat("main.c", &file_stat) != 0) {
        perror("could not stat");
        return (1);
    }
    printf("%d\n", (int) file_stat.st_size);
    return (0);
}
EDIT: Now that I see the code, I have to fall in line with the other posters:
The array that takes the arguments from the program-call is constructed this way:
[0] name of the program itself
[1] first argument given
[2] second argument given
[n] n-th argument given
You should also check argc before trying to use a field other than '0' of the argv-array:
if (argc < 2) {
    printf ("Usage: %s arg1", argv[0]);
    return (1);
}
argv[0] is the path to the executable, and thus argv[1] will be the first user-submitted input. Try to alter it and add some simple error checking, such as checking whether fp == 0, and we might be able to help you further.
You can open the file, move the cursor to the end of the file, store the offset, seek back to the top of the file, and take the difference.
You can use fseek for text files as well.
fseek to the end of the file
ftell the offset
fseek back to the beginning
and you have the size of the file
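Spelled out with the error checks those steps gloss over, the idiom looks roughly like this (a sketch; binary mode keeps ftell a plain byte offset):

FILE *fp = fopen(filename, "rb");
long size = -1;
if (fp != NULL
    && fseek(fp, 0, SEEK_END) == 0   /* 1. seek to the end of the file */
    && (size = ftell(fp)) >= 0       /* 2. the offset there is the size in bytes */
    && fseek(fp, 0, SEEK_SET) == 0)  /* 3. seek back to the beginning */
{
    /* size now holds the file size */
}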
Kind of hard with no sample code, but fstat (or stat) will tell you how big the file is. You allocate the memory required, and slurp the file in.
Another approach is to read the file a piece at a time and extend your dynamic buffer as needed:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGESIZE 128

int main(int argc, char **argv)
{
    char *buf = NULL, *tmp = NULL;
    size_t bufSiz = 0;
    char inputBuf[PAGESIZE];
    FILE *in;

    if (argc < 2)
    {
        printf("Usage: %s filename\n", argv[0]);
        return 0;
    }

    in = fopen(argv[1], "r");
    if (in)
    {
        /**
         * Read a page at a time until reaching the end of the file
         */
        while (fgets(inputBuf, sizeof inputBuf, in) != NULL)
        {
            /**
             * Extend the dynamic buffer by the length of the string
             * in the input buffer
             */
            tmp = realloc(buf, bufSiz + strlen(inputBuf) + 1);
            if (tmp)
            {
                /**
                 * Add to the contents of the dynamic buffer
                 */
                buf = tmp;
                buf[bufSiz] = 0;
                strcat(buf, inputBuf);
                bufSiz += strlen(inputBuf) + 1;
            }
            else
            {
                printf("Unable to extend dynamic buffer: releasing allocated memory\n");
                free(buf);
                buf = NULL;
                break;
            }
        }
        if (feof(in))
            printf("Reached the end of input file %s\n", argv[1]);
        else if (ferror(in))
            printf("Error while reading input file %s\n", argv[1]);
        if (buf)
        {
            printf("File contents:\n%s\n", buf);
            printf("Read %lu characters from %s\n",
                   (unsigned long) strlen(buf), argv[1]);
        }
        free(buf);
        fclose(in);
    }
    else
    {
        printf("Unable to open input file %s\n", argv[1]);
    }
    return 0;
}
There are drawbacks with this approach; for one thing, if there isn't enough memory to hold the file's contents, you won't know it immediately. Also, realloc() is relatively expensive to call, so you don't want to make your page sizes too small.
However, this avoids having to use fstat() or fseek()/ftell() to figure out how big the file is beforehand.
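One common mitigation, not shown above, is to grow the buffer geometrically, so the number of realloc calls stays logarithmic in the file size. A sketch of just the append step (need, used, and capacity are hypothetical bookkeeping variables):

if (used + need + 1 > capacity)
{
    while (used + need + 1 > capacity)
        capacity *= 2;                /* double instead of growing line by line */
    char *tmp = realloc(buf, capacity);
    if (tmp == NULL)
    {
        free(buf);
        buf = NULL;                   /* handle the failure as before */
    }
    else
        buf = tmp;
}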

read system call doesn't detect end of file

I'm trying to create a function that reads an entire file using a specific read size that can change at any time, but the read system call doesn't seem to store the characters properly in the buffer. So far I'm only trying to print until the end of the file, like this:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#define READ_SIZE (42)

int main(int argc, char **argv)
{
    int fd;
    int rd;
    char *buffer;

    buffer = malloc(READ_SIZE);
    fd = open(argv[1], O_RDONLY);
    while ((rd = read(fd, buffer, READ_SIZE)) > 0)
    {
        printf("%s", buffer);
    }
    return (0);
}
This is the file that I'm trying to read:
test1234
test123
test1
test2
test3
test4
test
This is the output of my program:
test123
test12
test1
test2
test3
test4
testest123
test12
test1
test2
test3
test4
tes
I can only use malloc and read to handle this (open is only for testing), and I don't understand why it does this. Usually read returns the number of bytes read from the file, and 0 when it reaches the end of the file, so it's a bit weird to see this.
The printed character array lacks a null character. This is UB with "%s".
printf("%s", buffer); // bad
To limit printing of a character array lacking a null character, use a precision modifier. This will print the character array up to that many characters or a null character, whichever comes first.
// printf("%s", buffer);
printf("%.*s", rd, buffer);
Debug tip: Print text with sentinels to clearly indicate the result of each print.
printf("<%.*s>\n", rd, buffer);
Besides the very elegant solution provided by chux's answer, you could also just terminate the buffer explicitly before printing (only this makes it a C "string"):
while ((rd = read(fd, buffer, READ_SIZE - 1)) > 0) /* read one byte less, to keep a spare
                                                      char available for the '\0' terminator */
{
    buffer[rd] = '\0';
    printf("'%s'", buffer);
}
