I really have a problem here. It seems that I can't find the best way to exit a loop when reading characters from a file. I know that every tutorial suggests that I shouldn't use while (!feof()), but they don't really suggest anything other than putting fgets() in the while, and that is not really appropriate because I want to read the whole FILE content into my variable.
while (!feof(newFile))
{
newString[i++] = fgetc(newFile);
}
newString[i] = '\0';
i = 0;
//this is the result seen with the debugger
newFile content = ABC
newString[0] = 65 (A)
newString[1] = 66 (B)
newString[2] = 67 (C)
newString[3] = 10 (\n)
newString[4] = -1
newString[5] = 0 (\0)
I am looking for a solution and some advice about how to improve my algorithm.
int c;
while ((c = fgetc(newFile)) != EOF) newString[i++] = c;
newString[i] = '\0';
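If newString is a fixed-size array, it is also worth bounding the loop so it cannot overflow. A minimal sketch, assuming a hypothetical capacity MAX_LEN (so newString must hold MAX_LEN + 1 bytes):
int c;
size_t i = 0;
while (i < MAX_LEN && (c = fgetc(newFile)) != EOF)
    newString[i++] = (char)c;
newString[i] = '\0'; /* safe: i never exceeds MAX_LEN */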
For reading whole text files into memory, I suggest using mmap. This has the benefit that all buffering and reading can be handled by your operating system, and you can focus your code on the task at hand. (Also, it's usually faster than buffering the data yourself.)
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int
main (void)
{
int fd = open("filename", O_RDONLY);
if (fd == -1)
return 0; // file open failed
struct stat sb;
int res = fstat(fd, &sb);
if (res == -1)
return 0; // stat failed
size_t length = sb.st_size;
char *data = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
if (data == MAP_FAILED)
return 0; // mmap failed
close(fd); // the mapping stays valid after the descriptor is closed
// iterate over characters
size_t i;
for (i = 0; i < length; ++i)
printf("'%c'\n", data[i]);
munmap(data, length);
return 0;
}
they don't really suggest anything other than putting fgets() in the while, and that is not really appropriate
That is absolutely, entirely appropriate. fgets() reads the file line by line, and you can append each line onto the end of a dynamically expanding buffer.
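A minimal sketch of that approach, assuming the file is already open as f and that <stdlib.h> and <string.h> are included (the doubling growth policy is just an example):
char line[256];
char *text = NULL;
size_t used = 0, cap = 0;

while (fgets(line, sizeof line, f)) {
    size_t len = strlen(line);
    if (used + len + 1 > cap) {            /* grow the buffer when the next line won't fit */
        cap = (used + len + 1) * 2;
        char *tmp = realloc(text, cap);
        if (!tmp) { free(text); text = NULL; break; }  /* handle out-of-memory */
        text = tmp;
    }
    memcpy(text + used, line, len + 1);    /* append the line, including its '\0' */
    used += len;
}
/* text now holds the whole file (or NULL on allocation failure) */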
However, if you don't want to use fgets(), and you just want to read the file at once: use fread().
FILE *f = fopen("foo.txt", "rb");
if (!f)
abort(); // "handle" error
fseek(f, 0, SEEK_END);
size_t len = ftell(f);
fseek(f, 0, SEEK_SET);
char *buf = malloc(len + 1);
if (!buf)
abort();
if (fread(buf, len, 1, f) != 1) {
// handle reading error
}
buf[len] = 0;
fclose(f);
Related
To be specific: why can I do this:
FILE *fp = fopen("/proc/self/maps", "r");
char buf[513]; buf[512] = '\0';
while(fgets(buf, 512, fp) != NULL) printf("%s", buf);
but not this:
int fd = open("/proc/self/maps", O_RDONLY);
struct stat s;
fstat(fd, &s); // st_size = 0 -> why?
char *file = mmap(0, s.st_size /*or any fixed size*/, PROT_READ, MAP_PRIVATE, fd, 0); // gives EINVAL for st_size (because 0) and ENODEV for any fixed block
write(1, file, s.st_size);
I know that /proc files are not really files, but it seems to have some defined size and content for the FILE* version. Is it secretly generating it on-the-fly for read or something? What am I missing here?
EDIT:
As I can clearly read() from them, is there any way to get the number of available bytes, or am I stuck reading until EOF?
They are created on the fly as you read them. Maybe this will help; it is a tutorial showing how a proc file can be implemented:
https://devarea.com/linux-kernel-development-creating-a-proc-file-and-interfacing-with-user-space/
tl;dr: you give it a name and read and write handlers, that's it. Proc files are meant to be very simple to implement from the kernel dev's point of view. They do not behave like full-featured files though.
As for the bonus question, there doesn't seem to be a way to indicate the size of the file, only EOF on reading.
proc "files" are not really files, they are just streams that can be read/written from, but they contain no pyhsical data in memory you can map to.
https://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
As already explained by others, /proc and /sys are pseudo-filesystems, consisting of data provided by the kernel, that does not really exist until it is read – the kernel generates the data then and there. Since the size varies, and really is unknown until the file is opened for reading, it is not provided to userspace at all.
It is not "unfortunate", however. The same situation occurs very often, for example with character devices (under /dev), pipes, FIFOs (named pipes), and sockets.
We can trivially write a helper function to read pseudofiles completely, using dynamic memory management. For example:
// SPDX-License-Identifier: CC0-1.0
//
#define _POSIX_C_SOURCE 200809L
#define _ATFILE_SOURCE
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
/* For example main() */
#include <stdio.h>
/* Return a directory handle for a specific relative directory.
For absolute paths and paths relative to current directory, use dirfd==AT_FDCWD.
*/
int at_dir(const int dirfd, const char *dirpath)
{
if (dirfd == -1 || !dirpath || !*dirpath) {
errno = EINVAL;
return -1;
}
return openat(dirfd, dirpath, O_DIRECTORY | O_PATH | O_CLOEXEC);
}
/* Read the (pseudofile) contents to a dynamically allocated buffer.
For absolute paths and paths relative to the current directory, use dirfd==AT_FDCWD.
You can safely initialize *dataptr=NULL,*sizeptr=0 for dynamic allocation,
or reuse the buffer from a previous call or e.g. getline().
Returns 0 with errno set if an error occurs. If the file is empty, errno==0.
In all cases, remember to free (*dataptr) after it is no longer needed.
*/
size_t read_pseudofile_at(const int dirfd, const char *path, char **dataptr, size_t *sizeptr)
{
char *data;
size_t size, have = 0;
ssize_t n;
int desc;
if (!path || !*path || !dataptr || !sizeptr) {
errno = EINVAL;
return 0;
}
/* Existing dynamic buffer, or a new buffer? */
size = *sizeptr;
if (!size)
*dataptr = NULL;
data = *dataptr;
/* Open pseudofile. */
desc = openat(dirfd, path, O_RDONLY | O_CLOEXEC | O_NOCTTY);
if (desc == -1) {
/* errno set by openat(). */
return 0;
}
while (1) {
/* Need to resize buffer? */
if (have >= size) {
/* For pseudofiles, linear size growth makes most sense. */
size = (have | 4095) + 4097 - 32;
data = realloc(data, size);
if (!data) {
close(desc);
errno = ENOMEM;
return 0;
}
*dataptr = data;
*sizeptr = size;
}
n = read(desc, data + have, size - have);
if (n > 0) {
have += n;
} else
if (n == 0) {
break;
} else
if (n == -1) {
const int saved_errno = errno;
close(desc);
errno = saved_errno;
return 0;
} else {
close(desc);
errno = EIO;
return 0;
}
}
if (close(desc) == -1) {
/* errno set by close(). */
return 0;
}
/* Append zeroes - we know size > have at this point. */
if (have + 32 > size)
memset(data + have, 0, 32);
else
memset(data + have, 0, size - have);
errno = 0;
return have;
}
int main(void)
{
char *data = NULL;
size_t size = 0;
size_t len;
int selfdir;
selfdir = at_dir(AT_FDCWD, "/proc/self/");
if (selfdir == -1) {
fprintf(stderr, "/proc/self/ is not available: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
len = read_pseudofile_at(selfdir, "status", &data, &size);
if (errno) {
fprintf(stderr, "/proc/self/status: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
printf("/proc/self/status: %zu bytes\n%s\n", len, data);
len = read_pseudofile_at(selfdir, "maps", &data, &size);
if (errno) {
fprintf(stderr, "/proc/self/maps: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
printf("/proc/self/maps: %zu bytes\n%s\n", len, data);
close(selfdir);
free(data); data = NULL; size = 0;
return EXIT_SUCCESS;
}
The above example program opens a directory descriptor ("atfile handle") to /proc/self. (This way you do not need to concatenate strings to construct paths.)
It then reads the contents of /proc/self/status. If successful, it displays its size (in bytes) and its contents.
Next, it reads the contents of /proc/self/maps, reusing the previous buffer. If successful, it displays its size and contents as well.
Finally, the directory descriptor is closed as it is no longer needed, and the dynamically allocated buffer released.
Note that it is perfectly safe to do free(NULL), and also to discard the dynamic buffer (free(data); data=NULL; size=0;) between the read_pseudofile_at() calls.
Because pseudofiles are typically small, read_pseudofile_at() uses a linear dynamic buffer growth policy. If there is no previous buffer, it starts with 8160 bytes and grows by 4096 bytes at a time until sufficiently large. Feel free to replace this with whatever growth policy you prefer; it is just an example, but it works quite well in practice without wasting much memory.
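For instance, with *sizeptr == 0 the first allocation is (0 | 4095) + 4097 - 32 = 8160 bytes; once have reaches 8160, the next size is (8160 | 4095) + 4097 - 32 = 12256, exactly 4096 bytes more.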
I am learning C and I have been trying to read a file and print what I just read. I open the file and need to call another function to read and return the sentence that was just read.
My function will return 1 if everything went fine or 0 otherwise.
I have been trying to make it work for a while, but I really don't get why I can't manage to give line its value. In main, it always prints (null).
The structure of the project has to stay the same, and I absolutely have to use open and read. Not fopen, or anything else...
If someone can explain it to me that would be awesome.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#define BUFF_SIZE 50
int read_buff_size(int const fd, char **line)
{
char buf[BUFF_SIZE];
int a;
a = read(fd, buf, BUFF_SIZE);
buf[a] = '\0';
*line = strdup(buf);
return (1);
}
int main(int ac, char **av)
{
char *line;
int fd;
if (ac != 2)
{
printf("error");
return (0);
}
else
{
if((fd = open(av[1], O_RDONLY)) == -1)
{
printf("error");
return (0);
}
else
{
if (read_buff_size(fd, &line))
printf("%s\n", line);
}
close(fd);
}
}
Here:
char buf[BUFF_SIZE];
int a;
a = read(fd, buf, BUFF_SIZE);
buf[a] = '\0';
if there are BUFF_SIZE or more characters available to be read, then read() will fill your array entirely, and buf[a] will be past the end of your array. You should either increase the size of buf by one character:
char buf[BUFF_SIZE + 1];
or, more logically given your macro name, read one character fewer:
a = read(fd, buf, BUFF_SIZE - 1);
You should also check the returns from strdup() and read() for errors, as they can both fail.
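Putting both fixes together, a corrected sketch of read_buff_size() might look like this (keeping the question's signature and return convention; the exact error handling is only illustrative):
int read_buff_size(int const fd, char **line)
{
    char buf[BUFF_SIZE + 1];            /* one extra byte for the terminator */
    ssize_t a = read(fd, buf, BUFF_SIZE);

    if (a == -1)
        return (0);                     /* read() failed */
    buf[a] = '\0';
    *line = strdup(buf);
    if (*line == NULL)
        return (0);                     /* strdup() failed */
    return (1);
}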
read(fd, buf, BUFF_SIZE); // writing buf[a] afterwards is undefined behaviour if the file holds BUFF_SIZE bytes or more
You need one extra byte to store the terminating '\0', so either read BUFF_SIZE - 1 bytes or allocate the array with one extra byte. You should also check all return values and return 0 if anything fails.
Keep it simple and take a look at:
https://github.com/mantovani/apue/blob/c47b4b1539d098c153edde8ff6400b8272acb709/mycat/mycat.c
(Archive form straight from the source: http://www.kohala.com/start/apue.tar.Z)
#define BUFFSIZE 8192
int main(void){
int n;
char buf[BUFFSIZE];
while ( (n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
if (write(STDOUT_FILENO, buf, n) != n)
err_sys("write error");
if (n < 0)
err_sys("read error");
exit(0);
}
No need to use the heap (strdup). Just write your buffer to STDOUT_FILENO (=1) for as long as read returns a value that's greater than 0. If you end with read returning 0, the whole file has been read.
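err_sys() is a helper from the APUE support code; a self-contained sketch of the same loop using perror() instead might look like this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFFSIZE 8192

int main(void)
{
    char buf[BUFFSIZE];
    ssize_t n;

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n) {
            perror("write error");
            exit(1);
        }
    if (n < 0) {
        perror("read error");
        exit(1);
    }
    return 0;
}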
I read an exec'd program's stdout using a pipe:
int pipes[2];
pipe(pipes);
if (fork() == 0) {
dup2(pipes[1], 1);
close(pipes[1]);
execlp("some_prog", "");
} else {
char* buf = auto_read(pipes[0]);
}
To read from stdout, I have a function auto_read which automatically allocates more memory as needed.
char* auto_read(int fp) {
int bytes = 1000;
char* buf = (char*)malloc(bytes+1);
int bytes_read = read(fp, buf, bytes);
int total_reads = 1;
while (bytes_read > 0) {
char* tmp = realloc(buf, (total_reads + 1) * bytes + 1); // keep realloc's result; the old size left no room for the next chunk
if (tmp == NULL)
break;
buf = tmp;
bytes_read = read(fp, buf + total_reads * bytes, bytes);
total_reads++;
}
if (bytes_read < 0)
bytes_read = 0; // treat a read error like end of input
buf[(total_reads - 1) * bytes + bytes_read] = 0;
return buf;
}
The reason I do it this way is I don't know how much text the program is going to spew out ahead of time, and I don't want to create an overly large buffer and be a memory hog. I'm wondering if there is:
A cleaner way to write this.
A more memory or speed-efficient way of doing this.
Use popen if you only need to read from a process and are on a *NIX platform:
FILE *programStdout = popen("command", "r");
// read from programStdout (fread(), fgets(), etc.)
char buffer[1024];
while (fgets(buffer, 1024, programStdout))
{
fputs(buffer, stdout); // fgets() keeps the newline, so avoid puts() appending a second one
}
pclose(programStdout); // streams from popen() are closed with pclose()
EDIT: You asked for a way to map a program's output to a file, so here you go:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
void *dataWithContentsOfMappedProgram(const char *command, size_t *len)
{
// read the data
char template[] = "/tmp/tmpfile_XXXXXX";
int fd = mkstemp(template);
FILE *output = fdopen(fd, "w+");
FILE *input = popen(command, "r");
#define BUF_SIZ 1024
char buffer[BUF_SIZ];
size_t readSize = 0;
while ((readSize = fread(buffer, 1, BUF_SIZ, input)))
{
fwrite(buffer, 1, readSize, output);
}
pclose(input); // streams from popen() must be closed with pclose(), not fclose()
input = NULL;
#undef BUF_SIZ
// now we map the file
fflush(output); // make sure the buffered data has actually reached the temporary file
long fileLength = ftell(output);
fseek(output, 0, SEEK_SET);
void *data = mmap(NULL, fileLength, PROT_READ | PROT_WRITE, MAP_FILE | MAP_PRIVATE, fd, 0);
fclose(output); // this also closes fd; the mapping remains valid
if (data == MAP_FAILED)
return NULL;
*len = fileLength; // report the mapped size to the caller
return data;
}
int main()
{
size_t fileLen = 0;
char *mapped = dataWithContentsOfMappedProgram("echo Hello World!", &fileLen);
puts(mapped);
munmap(mapped, fileLen);
}
Below is part of my code to read data from a text file, strip out the HTML and print out just the normal text. This all works well, but I am having a problem reading all of the text file. How would I read the entire text file? I understand that I will probably need to use malloc but am unsure of how to do so.
int i, nRead, fd;
char buf[1024];
int idx = 0;
int opened = 0;
if((fd = open("data.txt", O_RDONLY)) == -1)
{
printf("Cannot open the file");
}
else
{
nRead = read(fd, buf, 1024);
printf("Original String ");
for(i=0; i<nRead; i++)
{
printf("%c", buf[i]);
}
printf("\nReplaced String ");
for(i=0; i<nRead; i++)
{
if(buf[i]=='<') {
opened = 1;
} else if (buf[i] == '>') {
opened = 0;
} else if (!opened) {
buf[idx++] = buf[i];
}
//printf("%c", buf[i]);
}
}
buf[idx] = '\0';
printf("%s\n", buf);
close(fd);
If you want to read the complete file do the following:
Open the file
Use fstat() to get the size
malloc the buffer, i.e. buffer = malloc(fileStats.st_size);
Read the file: fread(buffer, fileStats.st_size, 1, fp); (where fp is the stream you opened)
Close the file.
Play with the buffer to your hearts content.
You may wish to add one to the buffer size to place the null character into it.
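Putting those steps together, a minimal sketch (assuming <stdio.h>, <stdlib.h> and <sys/stat.h> are included and that this sits inside main(); the fp/fileStats names and the bare return on error are only illustrative):
FILE *fp = fopen("data.txt", "rb");
if (!fp)
    return 1;                                   /* open failed */

struct stat fileStats;
if (fstat(fileno(fp), &fileStats) == -1) {      /* get the size */
    fclose(fp);
    return 1;
}

char *buffer = malloc(fileStats.st_size + 1);   /* +1 for the terminating '\0' */
if (!buffer) {
    fclose(fp);
    return 1;
}

if (fread(buffer, 1, fileStats.st_size, fp) != (size_t)fileStats.st_size) {
    /* handle short read */
}
buffer[fileStats.st_size] = '\0';
fclose(fp);
/* ... use buffer, then free(buffer); ... */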
Instead of collecting all the text in a single buffer, you could just put your original processing code in a loop and call read() repeatedly to fill the buffer. Process each chunk as you read it, and print out the part you've processed so far. When you hit end-of-file (i.e., when read() returns 0), stop.
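For the HTML-stripping code from the question, that chunked approach could look roughly like this (fd is the descriptor from the original open() call; only a sketch):
char buf[1024];
int nRead;
int opened = 0;                     /* tag state carries over between chunks */

while ((nRead = read(fd, buf, sizeof buf)) > 0) {
    for (int i = 0; i < nRead; i++) {
        if (buf[i] == '<')
            opened = 1;
        else if (buf[i] == '>')
            opened = 0;
        else if (!opened)
            putchar(buf[i]);        /* print the kept text as we go */
    }
}
close(fd);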
More efficient would be to use the mmap() call to map the file directly into memory:
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
struct stat statbuf;
stat("data.txt", &statbuf);
size_t len = statbuf.st_size;
int fd = open("data.txt",O_RDONLY);
char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
for (size_t i = 0; i < len; i++) {
// do your own thing here
}
munmap(buf,len);
close(fd);
If the file is longer than 2GB then use the mmap2() call - you will have to fiddle with page sizes as the last argument is in pages (usually 4k)
I have the following bit of code (it's "example" code, so nothing fancy):
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
char buffer[9];
int fp = open("test.txt", O_RDONLY);
if (fp != -1) // If file opened successfully
{
off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
ssize_t count = read(fp, buffer, strlen(buffer));
if (count > 0) // No errors (-1) and at least one byte (not 0) was read
{
printf("Read test.txt %d characters from start: %s\n", offset, buffer);
}
close(fp);
}
int fp2 = open("test.txt", O_WRONLY);
if (fp2 != -1)
{
off_t offset = lseek(fp2, 2, SEEK_CUR); // Seek from current position (0) - same result as above in this case
ssize_t count = write(fp2, buffer, strlen(buffer));
if (count == strlen(buffer)) // We successfully wrote all the bytes
{
printf("Wrote to test.txt %d characters from current (0): %s\n", offset, buffer);
}
close(fp2);
}
}
This code does not return the first printout (reading) as it is, and the second printout reads: "Wrote test.txt 0 characters from current (0): " indicating that it did not seek anywhere in the file and that buffer is empty.
The odd thing is, if I comment out everything from fp2 = open("test.txt", O_WRONLY);, the first printout returns what I expected. As soon as I include the second open statement (even with nothing else) it won't write it. Does it somehow re-order the open statements or something else?
The line
ssize_t count = read(fp, buffer, strlen(buffer));
is wrong, you're taking the strlen of an uninitialized buffer. You likely want the size of the buffer like so:
ssize_t count = read(fp, buffer, sizeof buffer);
You should also make sure buffer really contains a nul-terminated string when you print it as one.
if (fp != -1) // If file opened successfully
{
off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
ssize_t count = read(fp, buffer, sizeof buffer - 1);
if (count > 0) // No errors (-1) and at least one byte (not 0) was read
{
buffer[count] = 0;
Are you perfectly sure you are cleaning out the file every time you run?
As written, the first time you run this, you'll only see the second printout, and the second time you might see the first one.
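If the intent is to start from an empty file on each run, opening it for writing with O_TRUNC would clear it first, for example (a small, assumed variation on the original open call):
int fp2 = open("test.txt", O_WRONLY | O_TRUNC);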