Buffer size in file I/O - c

I'm trying to write a small program to find the buffer size of an open file stream. After searching around a little bit, I found the __fbufsize() function. This is the code I wrote:
#include <stdio.h>
#include <stdio_ext.h>
void main() {
FILE *f;
int bufsize;
f = fopen("test.txt","wb");
if (f == NULL) {
perror("fopen failed\n");
return;
}
bufsize = __fbufsize(f);
printf("The buffer size is %d\n",bufsize);
return;
}
I get the buffer size as zero. I'm a bit confused as to why this is happening. Shouldn't the stream be buffered by default? I get a non-zero value if I use setvbuf() with _IOFBF before calling __fbufsize().

Note that the correct return type for main() is int, not void.
This code compiles on Linux (Ubuntu 14.04 derivative tested):
#include <stdio.h>
#include <stdio_ext.h>
int main(void)
{
FILE *f;
size_t bufsize;
f = fopen("test.txt", "wb");
if (f == NULL)
{
perror("fopen failed");
return -1;
}
bufsize = __fbufsize(f);
printf("The buffer size is %zu\n", bufsize);
putc('\n', f);
bufsize = __fbufsize(f);
printf("The buffer size is %zu\n", bufsize);
fclose(f);
return 0;
}
When run, it produces:
The buffer size is 0
The buffer size is 4096
As suggested in the comments, until you use the file stream, the buffer size is not set. Until then, you could change the size with setvbuf(), so the library doesn't set the buffer size until you try to use it.
The macro BUFSIZ defined in <stdio.h> is the default buffer size. There's no standard way to find the buffer size set by setvbuf(). You need to identify the platform you're working on to allow useful commentary on __fbufsize() as a function, though it appears to be a GNU libc extension declared in <stdio_ext.h>.
There are numerous small improvements that should be made in the program, but they're not immediately germane.
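To illustrate the point about setvbuf(), here is a minimal sketch that uses only standard C. The helper name use_full_buffer is made up for this example. It has to be called after fopen() and before any other operation on the stream, which is exactly the window in which the library has not yet picked a buffer for itself.

```c
#include <stdio.h>

/* Hypothetical helper: give stream f a caller-supplied, fully buffered
 * buffer of `size` bytes. Call it right after fopen() and before any
 * read or write; buf must stay valid until the stream is closed. */
int use_full_buffer(FILE *f, char *buf, size_t size)
{
    /* setvbuf returns 0 on success and nonzero on failure. */
    return setvbuf(f, buf, _IOFBF, size);
}
```

A typical call would pass a static array, for example a `static char buf[8192]` together with `sizeof buf`, so the buffer outlives the stream.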

__fbufsize man page says:
The __fbufsize() function returns the size of the buffer currently used by the given stream.
so I think this is buffer size used by the stream.

Related

fgets statement reads first line and not sure how to modify because I have to return a pointer [duplicate]

I need to copy the contents of a text file to a dynamically-allocated character array.
My problem is getting the size of the contents of the file; Google reveals that I need to use fseek and ftell, but for that the file apparently needs to be opened in binary mode, and that gives only garbage.
EDIT: I tried opening in text mode, but I get weird numbers. Here's the code (I've omitted simple error checking for clarity):
long f_size;
char* code;
size_t code_s, result;
FILE* fp = fopen(argv[0], "r");
fseek(fp, 0, SEEK_END);
f_size = ftell(fp); /* This returns 29696, but file is 85 bytes */
fseek(fp, 0, SEEK_SET);
code_s = sizeof(char) * f_size;
code = malloc(code_s);
result = fread(code, 1, f_size, fp); /* This returns 1045, it should be the same as f_size */
The root of the problem is here:
FILE* fp = fopen(argv[0], "r");
argv[0] is your executable program, NOT the parameter. It certainly won't be a text file. Try argv[1], and see what happens then.
You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.
For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.
That's not a limitation of the file APIs, it's a natural limitation of there not being a direct mapping from "size of binary data" to "number of characters."
If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code units as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is variable width because Unicode characters above U+FFFF are encoded as two code units, but a lot of the time developers ignore this.)
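As a rough illustration of why the bytes have to be read, here is a sketch that counts characters (code points) in a UTF-8 buffer. The function name utf8_count is invented for this example, and it assumes the input is already valid UTF-8. It counts every byte that is not a continuation byte of the form 10xxxxxx.

```c
#include <stddef.h>

/* Count code points in a UTF-8 byte sequence by counting every byte
 * that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8; no validation is done. */
size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Eight bytes of ASCII give a count of 8, while eight bytes holding two 4-byte sequences give a count of 2, which matches the range mentioned above.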
I'm pretty sure argv[0] won't be a text file.
Give this a try (haven't compiled this, but I've done this a bazillion times, so I'm pretty sure it's at least close):
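#include <stdio.h>
#include <stdlib.h>
char* readFile(const char* filename)
{
FILE* file = fopen(filename,"r");
if(file == NULL)
{
return NULL;
}
fseek(file, 0, SEEK_END);
long int size = ftell(file);
rewind(file);
if(size < 0)
{
fclose(file);
return NULL;
}
/* calloc zero-fills, so the result stays NUL-terminated even on a short read */
char* content = calloc((size_t)size + 1, 1);
if(content != NULL)
{
fread(content,1,(size_t)size,file);
}
fclose(file); /* the caller owns content and must free() it */
return content;
}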
If you're developing for Linux (or other Unix-like operating systems), you can retrieve the file-size with stat before opening the file:
#include <stdio.h>
#include <sys/stat.h>
int main() {
struct stat file_stat;
if(stat("main.c", &file_stat) != 0) {
perror("could not stat");
return (1);
}
printf("%lld\n", (long long) file_stat.st_size);
return (0);
}
EDIT: Now that I see the code, I have to agree with the other posters:
The array that takes the arguments from the program-call is constructed this way:
[0] name of the program itself
[1] first argument given
[2] second argument given
[n] n-th argument given
You should also check argc before trying to use a field other than '0' of the argv-array:
if (argc < 2) {
printf ("Usage: %s arg1", argv[0]);
return (1);
}
argv[0] is the path to the executable and thus argv[1] will be the first user submitted input. Try to alter your code and add some simple error checking, such as checking whether fp == NULL, and we might be able to help you further.
You can open the file, move the cursor to the end of the file, store the offset, and then go back to the start of the file; the stored offset is the file size.
You can use fseek for text files as well.
fseek to end of file
ftell the offset
fseek back to the beginning
and you have size of the file
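The steps above can be sketched as a small helper. The name stream_size is made up for this example, and the sketch assumes a POSIX-like system where seeking to the end of a regular file is meaningful. In text mode the C standard only promises that a value from ftell can be handed back to fseek, so the byte count is most reliable on streams opened in binary mode.

```c
#include <stdio.h>

/* Return the size of an open stream in bytes, or -1L on failure.
 * The current position is saved first and restored afterwards. */
long stream_size(FILE *fp)
{
    long start = ftell(fp);              /* remember where we were */
    if (start < 0)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)    /* jump to the end */
        return -1L;
    long size = ftell(fp);               /* offset of the end = size */
    if (fseek(fp, start, SEEK_SET) != 0) /* go back */
        return -1L;
    return size;
}
```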
Kind of hard with no sample code, but fstat (or stat) will tell you how big the file is. You allocate the memory required, and slurp the file in.
Another approach is to read the file a piece at a time and extend your dynamic buffer as needed:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGESIZE 128
int main(int argc, char **argv)
{
char *buf = NULL, *tmp = NULL;
size_t bufSiz = 0;
char inputBuf[PAGESIZE];
FILE *in;
if (argc < 2)
{
printf("Usage: %s filename\n", argv[0]);
return 0;
}
in = fopen(argv[1], "r");
if (in)
{
/**
* Read a page at a time until reaching the end of the file
*/
while (fgets(inputBuf, sizeof inputBuf, in) != NULL)
{
/**
* Extend the dynamic buffer by the length of the string
* in the input buffer
*/
tmp = realloc(buf, bufSiz + strlen(inputBuf) + 1);
if (tmp)
{
/**
* Add to the contents of the dynamic buffer
*/
buf = tmp;
buf[bufSiz] = 0;
strcat(buf, inputBuf);
bufSiz += strlen(inputBuf) + 1;
}
else
{
printf("Unable to extend dynamic buffer: releasing allocated memory\n");
free(buf);
buf = NULL;
break;
}
}
if (feof(in))
printf("Reached the end of input file %s\n", argv[1]);
else if (ferror(in))
printf("Error while reading input file %s\n", argv[1]);
if (buf)
{
printf("File contents:\n%s\n", buf);
printf("Read %lu characters from %s\n",
(unsigned long) strlen(buf), argv[1]);
}
free(buf);
fclose(in);
}
else
{
printf("Unable to open input file %s\n", argv[1]);
}
return 0;
}
There are drawbacks with this approach; for one thing, if there isn't enough memory to hold the file's contents, you won't know it immediately. Also, realloc() is relatively expensive to call, so you don't want to make your page sizes too small.
However, this avoids having to use fstat() or fseek()/ftell() to figure out how big the file is beforehand.
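One common way to soften the realloc() cost mentioned above is to double the capacity each time the buffer fills, so the number of realloc() calls grows with the logarithm of the file size. The following is only a sketch of that idea, not the code from the answer above: the name slurp and the 4096-byte starting capacity are arbitrary choices, and it reads with fread() rather than fgets().

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining on `in` into a NUL-terminated heap buffer.
 * The capacity doubles whenever the buffer fills up. Returns NULL on
 * allocation failure or read error; the caller frees the result.
 * If out_len is not NULL it receives the number of bytes read. */
char *slurp(FILE *in, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        /* Always keep one byte spare for the terminator. */
        len += fread(buf + len, 1, cap - len - 1, in);
        if (len < cap - 1)
            break;                  /* short read: end of file or error */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```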

Fwrite write only 4096 instead of 100,000 bytes

I use fwrite to write a buffer of 100,000 chars to file, but the return value from fwrite is only 4096.
char buffer [100000];
memset(buffer,0x00,100000);
FILE *f = fopen("<path>","ab+");
if(f ==NULL)
{
return;
}
int ret =fwrite(buffer,1,100000,f);
printf("ret = %d",ret);
ret = 4096
Why does this code write only 4096 bytes instead of 100,000?
This is an embedded Linux system.
From man pages:
RETURN VALUE
[...] If an error occurs, or the end of the file is
reached, the return value is a short item count (or zero).
In this case you should use ferror(f) to see if the file handle is in error condition. Also, you can zero the errno before the call, and print the error message with perror:
errno = 0;
size_t ret = fwrite(buffer, 1, 100000, f);
if (ret != 100000) {
printf("Stream error indication %d\n", ferror(f));
perror("Short item count");
}
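The same check can be wrapped in a helper. This is a sketch only, and write_checked is a name invented for this example. It keeps the count in size_t, which is the type fwrite actually returns, and it also flushes, because with a buffered stream a write error may only show up when the buffered data is pushed to the file.

```c
#include <stdio.h>

/* Write n bytes to f. Returns 0 on success, -1 on failure.
 * On failure a message describing errno is printed with perror. */
int write_checked(FILE *f, const void *data, size_t n)
{
    size_t written = fwrite(data, 1, n, f);
    if (written != n) {
        perror("fwrite");
        return -1;
    }
    /* Buffered data can still fail later; flush to surface it now. */
    if (fflush(f) != 0) {
        perror("fflush");
        return -1;
    }
    return 0;
}
```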
The maximum length for any write call is defined by SSIZE_MAX, which can be found in limits.h.
This holds for every POSIX-compliant system. SSIZE_MAX may differ for different implementations.
Try out the following example to determine the maximum write length on your system:
#include <stdio.h>
#include <limits.h>
int main(void) {
printf("SSIZE_MAX : %lld\n", (long long) SSIZE_MAX);
return 0;
}
It prints SSIZE_MAX : 9223372036854775807 on my machine.
EDIT: You can also try to locate your limits.h file, but compiling might be the easier option

Reading files to shared memory

I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory.
Since we read a fairly large amount of binary data each time, the preferred option is to use fread.
So if I make a very simple example it would go like this
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
FILE* infile = fopen(argv[1], "rb");
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(2073600);
int len = fread(buf, 1, 2073600, infile);
if(ferror(infile)) {
perror("ferror");
}
printf("Len is %d and first value of buf is %d\n", len, *buf);
return 0;
}
The example is very simplified from the real code but enough to illustrate the behavior.
This code would then return
ferror: Bad address
Len is 0 and first value of buf is 0
However, if we swap the fread for an fgets (not very suitable for reading binary data, especially given its return value), things work great.
That is, we switch to fgets((char *) buf, 2073600, infile); and, after dropping len from the printout, we get
first value of buf is 46
Which fits with what we need and I can run _Offload_cilk on a function with buf as an argument and do work on it.
Is there something I am missing or is fread just not supported? I've tried to find as much info on this from both intel and other sites on the internet but I have sadly been unable to.
----EDIT----
After more research, it seems that calling fread on the shared memory with a size higher than 524287 (the largest 19-bit value) produces the error above. At 524287 or lower things work, and you can call fread as many times as needed to read all the data.
I am utterly unable to find any reason written anywhere for this.
I don't have a Phi, so I'm unable to see if this would make a difference, but fread has its own buffering, and while that may be turned off for this type of reading, I don't see why you would go through the overhead of using fread rather than just using the lower-level calls open and read, like
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
int infile = open(argv[1], O_RDONLY); // should test if open ok, but skip to make code similar to OP's
int len, pos =0, size = 2073600;
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(size);
do {
buf[pos]=0; // force the address to be mapped to process memory before read
len = read(infile, &buf[pos], size);
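if(len < 0) {
perror("error");
break;
}
if(len == 0) break; // end of file reached before size bytes were read; avoids looping forever
pos += len; // move position forward in cases where we have not read the entire data in the first read.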
size -= len;
} while (size > 0);
printf("Len is %d (%d) and first value of buf is %d\n", len, pos, *buf);
return 0;
}
read & write should work with shared memory allocated without the problem you are seeing.
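For reference, the read loop can be factored into a reusable helper. This is a generic POSIX sketch, not anything specific to the Xeon Phi or to shared memory, and the name read_full is made up. It keeps calling read() until the requested number of bytes has arrived, the file ends, or a real error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `size` bytes from fd into dst, looping over short reads.
 * Returns the number of bytes read (less than size only at end of
 * file), or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *dst, size_t size)
{
    unsigned char *p = dst;
    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, p + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```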
Can you try to insert something like this before the fread calls?
memset(buf, 0, 2073600); // after including string.h
This trick worked for me, but I don't know why (lazy allocation?).
FYI, you can also post a MIC question on this forum.

reading from a binary file in C

I am currently working on a project in which I have to read from a binary file and send it through sockets and I am having a hard time trying to send the whole file.
Here is what I wrote so far:
FILE *f = fopen(line,"rt");
//size = lseek(f, 0, SEEK_END)+1;
fseek(f, 0L, SEEK_END);
int size = ftell(f);
unsigned char buffer[MSGSIZE];
FILE *file = fopen(line,"rb");
while(fgets(buffer,MSGSIZE,file)){
sprintf(r.payload,"%s",buffer);
r.len = strlen(r.payload)+1;
res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
return -1;
}
}
I think it has something to do with the size of the buffer that I read into, but I don't know what the correct formula for it is.
One more thing: is the sprintf done correctly?
If you are reading binary files, a NUL character may appear anywhere in the file.
Thus, using string functions like sprintf and strlen is a bad idea.
If you really need to use a second buffer (buffer), you could use memcpy.
You could also directly read into r.payload (if r.payload is already allocated with sufficient size).
You are looking for fread for a binary file.
The return value of fread tells you how many bytes were read into your buffer.
You may also consider calling fseek again to return to the start of the file after measuring its size.
See here How can I get a file's size in C?
Maybe your code could look like this:
#include <stdint.h>
#include <stdio.h>
#define MSGSIZE 512
struct r_t {
uint8_t payload[MSGSIZE];
int len;
};
int send_message(struct r_t *t);
int main() {
struct r_t r;
FILE *f = fopen("test.bin","rb");
if (f == NULL) {
perror("fopen");
return -1;
}
fseek(f, 0L, SEEK_END);
size_t size = ftell(f);
fseek(f, 0L, SEEK_SET);
do {
r.len = fread(r.payload, 1, sizeof(r.payload), f);
if (r.len > 0) {
int res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
fclose(f);
return -1;
}
}
} while (r.len > 0);
fclose(f);
return 0;
}
No, the sprintf is not done correctly. It is prone to buffer overflow, a very serious security problem.
I would consider sending the file as e.g. 1024-byte chunks instead of as line-by-line, so I would replace the fgets call with an fread call.
Why are you opening the file twice? Apparently to get its size, but you could open it only once and jump back to the beginning of the file. And, you're not using the size you read for anything.
Is it a binary file or a text file? fgets() assumes you are reading a text file -- it stops on a line break -- but you say it's a binary file and open it with "rb" (actually, the first time you opened it with "rt", I assume that was a typo).
IMO you should never ever use sprintf. The number of characters written to the buffer depends on the parameters that are passed in, and in this case if there is no '\0' in buffer then you cannot predict how many bytes will be copied to r.payload, and there is a very good chance you will overflow that buffer.
I think sprintf() would be the first thing to fix. Use memcpy() and you can tell it exactly how many bytes to copy.
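A sketch of the memcpy() approach follows. The struct layout is a guess modelled on the r.payload and r.len fields in the question, and the names struct message, MSGSIZE (set to 512 here) and fill_payload are assumptions made for this example. The copy is clamped to the payload capacity, and embedded NUL bytes survive because no string function is involved.

```c
#include <stddef.h>
#include <string.h>

#define MSGSIZE 512   /* assumed payload capacity */

/* Hypothetical message layout with a fixed-size payload. */
struct message {
    unsigned char payload[MSGSIZE];
    int len;
};

/* Copy at most MSGSIZE bytes from src into m->payload and record the
 * count in m->len. Returns the number of bytes actually copied. */
size_t fill_payload(struct message *m, const unsigned char *src, size_t n)
{
    if (n > sizeof m->payload)
        n = sizeof m->payload;      /* never overflow the payload */
    memcpy(m->payload, src, n);
    m->len = (int) n;
    return n;
}
```

The count passed in would normally be the return value of fread, so exactly the bytes that were read get sent.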

File get contents in C

What is the best way to get the contents of a file into a single character array?
I have read this question:
Easiest way to get file's contents in C
But from the comments, I've seen that the solution isn't great for large files. I do have access to the stat function. If the file size is over 4 gb, should I just return an error?
The contents of the file is encrypted and since it's supplied by the user it could be as large as anyone would want it to be. I want it to return an error and not crash if the file is too big. The main purpose of populating the character array with the contents of a file, is to compare it to another character array and also (if needed and configured to do so) to log both of these to a log file (or multiple log files if necessary).
You may use fstat(2) from sys/stat.h. Here is a little function that gets the size of the file, allocates memory if the file is smaller than 4 GB, and returns NULL otherwise. It reads the file into a newly allocated char array and returns a char * that points to the contents of the whole file. It should be freed after use.
#include <stdio.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
char *loadlfile(const char *path)
{
int file_descr;
FILE *fp;
struct stat buf;
char *buffer;
size_t nread;
if((file_descr = open(path, O_RDONLY)) < 0)
return NULL;
if(fstat(file_descr, &buf) != 0) {
close(file_descr);
return NULL;
}
// This check is done at preprocessing and requires no check at runtime.
// It basically means "If this machine is not of a popular 64bit architecture,
// it's probably not 128bit and possibly has limits in maximum memory size."
// This check is done for the sake of omission of malloc(3)'s unnecessary
// invocation at runtime.
// AMD64, ARM64, Itanium, then AMD64 and Itanium for Microsoft's compiler.
#if !defined(__x86_64__) && !defined(__aarch64__) && !defined(__ia64__) && !defined(_M_X64) && !defined(_M_IA64)
#define FILE_MAX_BYTES (4000000000LL)
// buf.st_size is of type off_t, so cast it before comparing.
if((long long) buf.st_size >= FILE_MAX_BYTES-1) {
close(file_descr);
return NULL;
}
#endif
if(NULL == (buffer = malloc((size_t) buf.st_size + 1))) {
close(file_descr);
return NULL;
}
if(NULL == (fp = fdopen(file_descr, "rb"))) {
free(buffer);
close(file_descr);
return NULL;
}
// fread reports how many bytes it stored, so the terminator stays in bounds.
nread = fread(buffer, 1, (size_t) buf.st_size, fp);
buffer[nread] = '\0';
// fclose also closes file_descr; calling close() again would be an error.
fclose(fp);
return buffer;
}
A very broad list of pre-defined macros for various things can be found at http://sourceforge.net/p/predef/wiki/Home/. The reason for the architecture and file size check is that malloc can be expensive at times, and it is best to skip its usage when it is not needed. On a machine that can address at most 4 GB, requesting a single 4 GB block cannot succeed, so the attempt just wastes cycles.
From that guy's code just do, if I understand your question correctly:
char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");
if (f)
{
fseek (f, 0, SEEK_END);
length = ftell (f);
if(length > MY_MAX_SIZE) {
fclose (f); /* do not leak the stream on the error path */
return -1;
}
fseek (f, 0, SEEK_SET);
buffer = malloc (length);
if (buffer)
{
fread (buffer, 1, length, f);
}
fclose (f);
}
if (buffer)
{
// start to process your data / extract strings here...
}

Resources