Why does allocating a lot of memory give worse results?

So in my assignment I am testing the time it takes for different copying functions to copy. I am a bit curious about the results of one of them. One of my copy functions involves allocating memory, like so:
int copyfile3(char* infilename, char* outfilename, int size) {
    int infile; // File handles for source and destination.
    int outfile;

    infile = open(infilename, O_RDONLY); // Open the input and output files.
    if (infile < 0) {
        open_file_error(infilename);
        return 1;
    }
    outfile = open(outfilename, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (outfile < 0) {
        open_file_error(outfilename);
        return 1;
    }

    int intch; // Number of bytes read; must be signed to catch the -1 error return.
    char *ch = malloc(sizeof(char) * (size + 1));

    gettimeofday(&start, NULL);
    // Read a buffer-full from the file each time, until end of file.
    while ((intch = read(infile, ch, size)) > 0) {
        write(outfile, ch, intch); // Write out.
    }
    gettimeofday(&end, NULL);

    // All done--close the files and return success code.
    close(infile);
    close(outfile);
    free(ch);
    return 0; // Success!
}
The main program lets the user input the infile, outfile, and copyFunctionNumber. If 3 is chosen, the user can also input a specific buffer size. So I was testing copying a file (6.3 MB) with different buffer sizes. When I choose 1024 it gives a difference of 42,000 microseconds; for 2000 it gives 26,000 microseconds; but for 3000 it goes back up to 34,000 microseconds. My question is: why does it go back up? And how can you tell what the perfect buffer size is for the copy to take the least amount of time?

Why can I not mmap /proc/self/maps?

To be specific: why can I do this:
FILE *fp = fopen("/proc/self/maps", "r");
char buf[513]; buf[512] = '\0';
while (fgets(buf, 512, fp) != NULL) printf("%s", buf);
but not this:
int fd = open("/proc/self/maps", O_RDONLY);
struct stat s;
fstat(fd, &s); // st_size = 0 -> why?
char *file = mmap(0, s.st_size /*or any fixed size*/, PROT_READ, MAP_PRIVATE, fd, 0); // gives EINVAL for st_size (because 0) and ENODEV for any fixed block
write(1, file, s.st_size);
I know that /proc files are not really files, but it seems to have some defined size and content for the FILE* version. Is it secretly generating it on-the-fly for read or something? What am I missing here?
EDIT:
As I can clearly read() from them, is there any way to get the number of bytes available, or am I stuck reading until EOF?
They are created on the fly as you read them. Maybe this would help, it is a tutorial showing how a proc file can be implemented:
https://devarea.com/linux-kernel-development-creating-a-proc-file-and-interfacing-with-user-space/
tl;dr: you give it a name and read and write handlers, that's it. Proc files are meant to be very simple to implement from the kernel dev's point of view. They do not behave like full-featured files though.
As for the bonus question, there doesn't seem to be a way to indicate the size of the file, only EOF on reading.
proc "files" are not really files; they are just streams that can be read from or written to, but they contain no physical data in memory you can map.
https://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
As already explained by others, /proc and /sys are pseudo-filesystems, consisting of data provided by the kernel, that does not really exist until it is read – the kernel generates the data then and there. Since the size varies, and really is unknown until the file is opened for reading, it is not provided to userspace at all.
It is not "unfortunate", however. The same situation occurs very often, for example with character devices (under /dev), pipes, FIFOs (named pipes), and sockets.
We can trivially write a helper function to read pseudofiles completely, using dynamic memory management. For example:
// SPDX-License-Identifier: CC0-1.0
//
#define _POSIX_C_SOURCE 200809L
#define _ATFILE_SOURCE
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
/* For example main() */
#include <stdio.h>

/* Return a directory handle for a specific relative directory.
   For absolute paths and paths relative to current directory, use dirfd==AT_FDCWD.
*/
int at_dir(const int dirfd, const char *dirpath)
{
    if (dirfd == -1 || !dirpath || !*dirpath) {
        errno = EINVAL;
        return -1;
    }
    return openat(dirfd, dirpath, O_DIRECTORY | O_PATH | O_CLOEXEC);
}

/* Read the (pseudofile) contents to a dynamically allocated buffer.
   For absolute paths and paths relative to the current directory, use dirfd==AT_FDCWD.
   You can safely initialize *dataptr=NULL,*sizeptr=0 for dynamic allocation,
   or reuse the buffer from a previous call or e.g. getline().
   Returns 0 with errno set if an error occurs. If the file is empty, errno==0.
   In all cases, remember to free (*dataptr) after it is no longer needed.
*/
size_t read_pseudofile_at(const int dirfd, const char *path, char **dataptr, size_t *sizeptr)
{
    char *data;
    size_t size, have = 0;
    ssize_t n;
    int desc;

    if (!path || !*path || !dataptr || !sizeptr) {
        errno = EINVAL;
        return 0;
    }

    /* Existing dynamic buffer, or a new buffer? */
    size = *sizeptr;
    if (!size)
        *dataptr = NULL;
    data = *dataptr;

    /* Open pseudofile. */
    desc = openat(dirfd, path, O_RDONLY | O_CLOEXEC | O_NOCTTY);
    if (desc == -1) {
        /* errno set by openat(). */
        return 0;
    }

    while (1) {

        /* Need to resize buffer? */
        if (have >= size) {
            /* For pseudofiles, linear size growth makes most sense. */
            size = (have | 4095) + 4097 - 32;
            data = realloc(data, size);
            if (!data) {
                close(desc);
                errno = ENOMEM;
                return 0;
            }
            *dataptr = data;
            *sizeptr = size;
        }

        n = read(desc, data + have, size - have);
        if (n > 0) {
            have += n;
        } else
        if (n == 0) {
            break;
        } else
        if (n == -1) {
            const int saved_errno = errno;
            close(desc);
            errno = saved_errno;
            return 0;
        } else {
            close(desc);
            errno = EIO;
            return 0;
        }
    }

    if (close(desc) == -1) {
        /* errno set by close(). */
        return 0;
    }

    /* Append zeroes - we know size > have at this point. */
    if (have + 32 > size)
        memset(data + have, 0, size - have);
    else
        memset(data + have, 0, 32);

    errno = 0;
    return have;
}

int main(void)
{
    char *data = NULL;
    size_t size = 0;
    size_t len;
    int selfdir;

    selfdir = at_dir(AT_FDCWD, "/proc/self/");
    if (selfdir == -1) {
        fprintf(stderr, "/proc/self/ is not available: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    len = read_pseudofile_at(selfdir, "status", &data, &size);
    if (errno) {
        fprintf(stderr, "/proc/self/status: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    printf("/proc/self/status: %zu bytes\n%s\n", len, data);

    len = read_pseudofile_at(selfdir, "maps", &data, &size);
    if (errno) {
        fprintf(stderr, "/proc/self/maps: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    printf("/proc/self/maps: %zu bytes\n%s\n", len, data);

    close(selfdir);
    free(data); data = NULL; size = 0;
    return EXIT_SUCCESS;
}
The above example program opens a directory descriptor ("atfile handle") to /proc/self. (This way you do not need to concatenate strings to construct paths.)
It then reads the contents of /proc/self/status. If successful, it displays its size (in bytes) and its contents.
Next, it reads the contents of /proc/self/maps, reusing the previous buffer. If successful, it displays its size and contents as well.
Finally, the directory descriptor is closed as it is no longer needed, and the dynamically allocated buffer released.
Note that it is perfectly safe to do free(NULL), and also to discard the dynamic buffer (free(data); data=NULL; size=0;) between the read_pseudofile_at() calls.
Because pseudofiles are typically small, the read_pseudofile_at() uses a linear dynamic buffer growth policy. If there is no previous buffer, it starts with 8160 bytes, and grows it by 4096 bytes afterwards until sufficiently large. Feel free to replace it with whatever growth policy you prefer, this one is just an example, but works quite well in practice without wasting much memory.

Copy data from file X to file Y program in C

I tried to write a basic program in C which copies data from one file to another, with the source path, destination path and buffer size given as input.
My problem is that the destination file is filled with junk or something, because it's way larger than the source (it gets bigger depending on the buffer size) and can't be opened.
How do I read and write just the bytes in the source?
I'm working in Linux, and this is the actual copying part:
char buffer[buffer_size];
int readable = 1;
int writeable;
while (readable != 0) {
    readable = read(sourcef, buffer, buffer_size);
    if (readable == -1) {
        close(sourcef);
        close(destf);
        exit_with_usage("Could not read.");
    }
    writeable = write(destf, buffer, buffer_size);
    if (writeable == -1) {
        close(sourcef);
        close(destf);
        exit_with_usage("Could not write.");
    }
}
writeable = write(destf, buffer, buffer_size);
must be
writeable = write(destf, buffer, readable);
Currently you do not write the number of characters you read but the whole buffer, so the output file is too large.
You also handle the end of the input file wrongly.
The return value of read is :
On success, the number of bytes read is returned (zero indicates end of file)
On error, -1 is returned
A proposal:
/* you already checked that the input and output files were opened with success */
char buffer[buffer_size];

for (;;) {
    ssize_t readable = read(sourcef, buffer, buffer_size);
    if (readable <= 0) {
        close(sourcef);
        close(destf);
        if (readable != 0)
            /* not EOF */
            exit_with_usage("Could not read.");
        /* EOF */
        break;
    }
    if (write(destf, buffer, readable) != readable) {
        close(sourcef);
        close(destf);
        exit_with_usage("Could not write.");
    }
}
I suppose exit_with_usage calls exit(), so it does not return.
Note that in theory write may write fewer than the expected number of characters without it being an error, so the write would have to be done in a loop; but in this simple case it is not worth managing that.
The read function returns how many bytes it read into buffer (which has buffer_size bytes). The actual number of bytes read is not always the same as the buffer size (consider the scenario where there are not enough bytes left in the source file to fill your buffer completely). So you should pass to write not buffer_size (the third argument of the write function) but the number of bytes you actually read; that is the readable variable in your code.
You should also exit the loop when read reports end of file or an error. So
while(readable != 0){
should be
while(readable > 0){
so that the loop terminates when the input file is exhausted (read returns 0 at end of file and -1 on error).
Currently, after the whole input file has been read, read keeps returning 0 but write is still called repeatedly, because execution has no exit path for end of file. Also, write should only write the number of bytes read. So the code would look like this:
char buffer[buffer_size];
int readable = 1;
int writeable;
while (readable > 0) {
    readable = read(sourcef, buffer, buffer_size);
    if (readable == -1) {
        close(sourcef);
        close(destf);
        exit_with_usage("Could not read.");
    }
    writeable = write(destf, buffer, readable);
    if (writeable == -1) {
        close(sourcef);
        close(destf);
        exit_with_usage("Could not write.");
    }
}
Simple code
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // For the system calls read, write and close
#include <fcntl.h>

#define BUFFER_SIZE 4096

int main(int argc, char* argv[]) {
    if (argc != 3) {
        printf("Usage %s Src_file Dest_file\n", argv[0]);
        exit(1);
    }

    unsigned char buffer[BUFFER_SIZE] = {0};
    ssize_t ReadByte = 0;
    int src_fd, dst_fd;

    // open file in read mode
    if ((src_fd = open(argv[1], O_RDONLY)) == -1) {
        printf("Failed to open input file %s\n", argv[1]);
        exit(1);
    }
    // open file in write mode, truncating it if it already exists
    // (note the octal 0644 mode: a decimal 644 would set odd permissions)
    if ((dst_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        printf("Failed to create output file %s\n", argv[2]);
        exit(1);
    }

    // copy loop
    while (1) {
        // read a buffer-full
        ReadByte = read(src_fd, buffer, sizeof(buffer));
        // error while reading
        if (ReadByte == -1) {
            printf("Encountered an error\n");
            break;
        } else if (ReadByte == 0) {
            // end of file: exit the loop
            printf("File copying successful.\n");
            break;
        }
        // error while writing
        if (write(dst_fd, buffer, ReadByte) == -1) {
            printf("Failed to copy file\n");
            break;
        }
    }

    // Close files
    close(src_fd);
    close(dst_fd);
    exit(0);
}
Run
./program src_file dest_file

To count the total number of lines in a file using Unix system call in C

I am a beginner in C and Unix and do not have much experience with them. I am trying to count the total lines inside a file using a Unix system call, but I am getting no result: my lineCount always comes out as 0 and I do not know why. I would appreciate help figuring out the line count. Thanks.
int lineCount = 0;
while (*buffer != '\0') // to check the end of the file
{
    read(in_fd, buffer, BUFFERSZ);
    if (*buffer == '\n')
    {
        lineCount++;
    }
}
printf("Linecount: %i \n", lineCount);
printf("Linecount: %i \n", lineCount );
Working with open, read and write is really not much different from working with fopen, fread and fwrite (or fgets and fprintf), aside from the burden of any conversions, counting of bytes, and setting created-file permission bits being on you. When write writes a value such as 1020 to a file, it writes the number of bytes you tell it to write, and the number will exist in the file with the same endianness your hardware uses.
For example if you have unsigned v = 1020; (0x3fc in hex) and then write (fd, &v, sizeof v);, when you look at your file with hexdump or od (or the like), it will contain fc 03 00 00 (assuming your hardware is little-endian). Those are your 4-bytes of an unsigned value 1020. You can't open the file in a text editor and expect to see ASCII characters, because that isn't what was written to the file.
To find the number of lines in a file using open and read, you basically want to open the file, read the file into a buffer some reasonable number of bytes at a time and count the '\n' characters in the file.
(note: you will also want to check if the last character read from the file is something other than a '\n'. If it is you will want to add +1 to your line-count to account for the non-POSIX line end of the final line.)
The only other caveat is to pay attention to the mode (permissions) for any newly created file you open for writing. Otherwise, you will find yourself without access to the newly created file. That is why open provides mode_t mode as the third argument in the case the O_CREAT flag is provided.
If you are going to stay true to only using open, read, write for your program I/O, then you will have to provide error message output to the terminal STDERR_FILENO in the event of an error. You may want a short helper function that will write string messages for that purpose.
Putting the pieces together, you could do something like the following while staying true to your cause. The following code takes the infile and outfile names as the first two arguments to the program, reads infile 65K bytes at a time, counts the '\n's in the file and then writes the result to outfile accounting for any non-POSIX line end for the file. writeliteral is provided as a helper for the error messages:
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

enum { BUFFERSZ = 1 << 16 }; /* 65K buffer size */

void writeliteral (int fildes, const char *s);

int main (int argc, char **argv) {

    if (argc < 3) {
        writeliteral (STDERR_FILENO, "error: insufficient input.\n");
        writeliteral (STDERR_FILENO, "usage: progname infile outfile\n");
        return 1;
    }

    char buf[BUFFERSZ] = "";
    unsigned i = 0, nlines = 0;
    ssize_t n = 0;
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

    int fd = open (argv[1], O_RDONLY);
    if (fd == -1) {  /* validate file open for reading */
        writeliteral (STDERR_FILENO, "error: infile open failed.\n");
        return 1;
    }

    while ((n = read (fd, buf, sizeof buf)) > 0)  /* read 65k chars */
        for (i = 0; i < n; i++)                   /* count newlines in buf */
            if (buf[i] == '\n')
                nlines++;

    if (i && buf[i - 1] != '\n')  /* account for non-POSIX line end */
        nlines++;                 /* (the i check guards against an empty file) */

    close (fd);  /* close file */

    /* open outfile for writing, create if it doesn't exist */
    if ((fd = open (argv[2], O_WRONLY | O_CREAT, mode)) == -1) {
        writeliteral (STDERR_FILENO, "error: outfile open failed.\n");
        return 1;
    }

    write (fd, &nlines, sizeof nlines);  /* write nlines to outfile */
    close (fd);  /* close file */

    return 0;
}

/** write a string literal to 'fildes' */
void writeliteral (int fildes, const char *s)
{
    size_t count = 0;
    const char *p = s;
    for (; *p; p++) {}
    count = p - s;
    write (fildes, s, count);
}
Example Input File
$ nl -ba ../dat/captnjack.txt
1 This is a tale
2 Of Captain Jack Sparrow
3 A Pirate So Brave
4 On the Seven Seas.
Example Use/Output
$ ./bin/readwrite_lineno ../dat/captnjack.txt ../dat/jacklines.dat
$ hexdump -n 16 -C ../dat/jacklines.dat
00000000 04 00 00 00 |....|
00000004
Look it over and let me know if you have any questions. It shows you why you may appreciate the printf family of functions format specifiers even more when you are done.
Your code only checks *buffer for newlines, which is the first character of each BUFFERSZ chunk you read, i.e. your code doesn't even look at most of the input. (It also doesn't check for end-of-file correctly: you need to look at read's return value for that.)
Here's a simple solution that emulates fgetc using read:
size_t lines = 0;
char c;
while (read(in_fd, &c, 1) == 1) {
    if (c == '\n') {
        lines++;
    }
}
printf("Linecount: %zu\n", lines);
If you can't use printf either, a quick workaround is:
static void print_n(size_t n) {
    if (n / 10) {
        print_n(n / 10);
    }
    char c = '0' + n % 10;
    write(1, &c, 1);
}

...

write(1, "Linecount: ", strlen("Linecount: "));
print_n(lines);
write(1, "\n", 1);
Reference: Count number of lines using C
Use the code
FILE *fp = fopen("myfile.txt", "r");
int ch;
int count = 0;
do {
    ch = fgetc(fp);
    if (ch == '\n')
        count++;
} while (ch != EOF);
printf("Total number of lines %d\n", count);

How to keep track of how many read/write operations are performed...?

For class I was given this: "Develop a C program that copies an input file to an output file and counts the number of read/write operations." I know how to copy the input file to the output file, but I am not entirely sure how to keep track of how many read/write operations were performed. The program is supposed to repeat the copying using different buffer sizes and output a listing of the number of read/write operations performed with each buffer size. I am just not sure how to do the counting part. How could one go about doing this? Thank you in advance.
Here is my current code (updated):
#include <stdio.h>
#include "apue.h"
#include <fcntl.h>

#define BUFFSIZE 1

int main(void)
{
    int n;
    char buf[BUFFSIZE];
    int input_file;
    int output_file;
    int readCount = 0;
    int writeCount = 0;

    input_file = open("test.txt", O_RDONLY);
    if (input_file < 0)
    {
        printf("could not open file.\n");
    }
    output_file = creat("output.txt", FILE_MODE);
    if (output_file < 0)
    {
        printf("error with output file.\n");
    }

    while ((n = read(input_file, buf, BUFFSIZE)) > 0)
    {
        readCount++;
        if (write(output_file, buf, n) == n) {
            writeCount++;
        } else {
            printf("Error writing");
        }
    }
    if (n < 0)
    {
        printf("reading error");
    }

    printf("read/write count: %d\n", writeCount + readCount);
    printf("read = %d\n", readCount);
    printf("write = %d\n", writeCount);
}
And for the text file: test one two
The result is:
read/write count: 26
read = 13
write = 13
Process returned 0 (0x0) execution time : 0.003 s
Press ENTER to continue.
I was thinking that the write would be 12...but I am not sure...
You will need to increment a variable every time you call a function that does reading or writing. You may do that by making a function that wraps the standard i/o function.
For example, replace fread with something like this:
size_t fread_count(void *p, size_t size, size_t num, FILE *f) {
    iocount++;
    return fread(p, size, num, f);
}
iocount would have to be in scope (e.g. global)
If you need to count reads and writes separately, use separate variables.
One that you increment for reads and one that you increment for writes.
-edit-
Since you are using write() and read(), you can easily make equivalent wrapper functions like the one above, using write and read instead of fwrite and fread.
To help with trying different buffer sizes:
1) put the open/read/write/close calls, the char buffer[], the read/write counters, the final printf statement for the counters, etc. into a separate function
2) in main(), add a table that contains the buffer sizes to be tried
3) call the new function from main(), passing a parameter that indicates the buffer size to use

Read all characters written in FIFO using open() system call

I have a FIFO (named pipe), which is opened at both ends using open() in O_RDWR mode. At the reading end, read() is not reading all the characters, but fewer than specified in the call. Is there a way to ensure that all characters are read?
Thanks in advance
if (p != NULL) {
    printf("Inside p not null!\n");
    if ((fd = open(p, O_RDWR)) < 0) {
        perror("File could not be opened!");
        exit(EXIT_FAILURE);
    }
    //FILE *rdptr = fopen(p,"r");
    memset(buf, 0, file_len);
    rc = read(fd, buf, file_len);
    printf("Number of bytes read: %d\n", rc);
    printf("Data detected on FIFO\n");
    buf[rc] = '\0';

    char base[20] = "output.txt";
    char name[20];
    sprintf(name, "%d%s", suffix, base);
    FILE *fptr = fopen(name, "ab+");
    fd_wr = open(name, O_WRONLY);
    charnum = write(fd_wr, buf, rc);
    kill(id_A, SIGKILL);
    //printf("No. of characters written: %d\n",charnum);
    //FD_CLR(fd, &rdfs);
}
First minor comment: you should use O_RDONLY to open the file: don't use more permissions than necessary.
Second issue: if file_len is very large, it's possible that the writer has blocked trying to write the entire chunk of data (since a FIFO can only hold a certain amount of unread data). If that's the case, then read will only read the data that has been stored in the FIFO, and will immediately return with whatever it could read. This will allow the writer to write more bytes, which will then be read in the next read.
You should loop reads, adjusting an offset into the buffer, until the entire file_len bytes are read. Something like this:
size_t offset = 0;
while (offset < file_len) {
    rc = read(fd, buf + offset, file_len - offset);
    if (rc < 0) {
        /* handle I/O error or something... */
    } else {
        offset += rc;
    }
}
