Using system calls to implement the Unix cat command in C

For my OS class I have the assignment of implementing Unix's cat command with system calls (no scanf or printf). Here's what I've got so far:
(Edited thanks to responses)
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int main(void)
{
    int fd1;
    int fd2;
    char *buffer1;
    buffer1 = (char *) calloc(100, sizeof(char));
    char *buffer2;
    buffer2 = (char *) calloc(100, sizeof(char));

    fd1 = open("input.in", O_RDONLY);
    fd2 = open("input2.in", O_RDONLY);

    while (eof1) { /* <- lseek condition to add here */
        read(fd1, buffer1, /* how much to read here? */);
        write(1, buffer1, sizeof(buffer1)-1);
    }
    while (eof2) {
        read(fd2, buffer2, /* how much to read here? */);
        write(1, buffer2, sizeof(buffer2)-1);
    }
}
The examples I have seen only show read with a known number of bytes. I don't know how many bytes each of the input files will contain, so how do I specify read's last parameter?

Before you can read into a buffer, you have to allocate one. Either on the stack (easiest) or with mmap.
perror is a complicated library function, not a system call.
exit is not a system call on Linux. But _exit is.
Don't write more bytes than you have read before.
Or, in general: Read the documentation on all these system calls.
Edit: Here is my code, using only system calls. The error handling is somewhat limited, since I didn't want to re-implement perror.
#include <fcntl.h>
#include <unistd.h>

static int
cat_fd(int fd) {
    char buf[4096];
    ssize_t nread;

    while ((nread = read(fd, buf, sizeof buf)) > 0) {
        ssize_t ntotalwritten = 0;
        while (ntotalwritten < nread) {
            ssize_t nwritten = write(STDOUT_FILENO, buf + ntotalwritten, nread - ntotalwritten);
            if (nwritten < 1)
                return -1;
            ntotalwritten += nwritten;
        }
    }
    return nread == 0 ? 0 : -1;
}

static int
cat(const char *fname) {
    int fd, success;

    if ((fd = open(fname, O_RDONLY)) == -1)
        return -1;
    success = cat_fd(fd);
    if (close(fd) != 0)
        return -1;
    return success;
}

int
main(int argc, char **argv) {
    int i;

    if (argc == 1) {
        if (cat_fd(STDIN_FILENO) != 0)
            goto error;
    } else {
        for (i = 1; i < argc; i++) {
            if (cat(argv[i]) != 0)
                goto error;
        }
    }
    return 0;

error:
    write(STDOUT_FILENO, "error\n", 6);
    return 1;
}

You need to read as many bytes as will fit in the buffer. Right now you don't have a buffer; all you have is a pointer to a buffer, and it isn't initialized to anything. Chicken-and-egg: you therefore don't know how many bytes to read, either.
Create a buffer.

There is usually no need to read the entire file in one gulp. Choosing a buffer size that is the same as, or a multiple of, the host operating system's memory page size is a good way to go; one or two times the page size is probably good enough.
Using buffers that are too big can actually make your program run worse, because they put pressure on the virtual memory system and can cause paging.
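If you want to apply that advice programmatically, here is a minimal sketch (my own illustration, not part of the original answer) that queries the page size with sysconf(_SC_PAGESIZE) and sizes the buffer to two pages:

#include <stdlib.h>
#include <unistd.h>

/* Illustrative sketch: size the I/O buffer from the system page size. */
int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* commonly 4096 */
    if (page <= 0)
        page = 4096;                     /* assumption: fall back to a typical default */

    size_t bufsize = 2 * (size_t)page;   /* one or two pages is usually plenty */
    char *buf = malloc(bufsize);
    if (!buf)
        return 1;
    /* ... a read(fd, buf, bufsize) / write loop would go here ... */
    free(buf);
    return 0;
}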

You could use open, fstat, mmap, madvise and write to make a very efficient cat command.
If you can be Linux-specific, you could use open, fstat, fadvise and splice to make an even more efficient cat command.
The advise calls are there to specify the SEQUENTIAL flags, which tell the kernel to do aggressive read-ahead on the file.
If you want to be polite to the rest of the system and minimize buffer cache use, you can do your copy in chunks of 32 megabytes or so and use the advise DONTNEED flags on the parts already read.
Note:
The above will only work if the source is a file. If fstat fails to provide a size, then you must fall back to using an allocated buffer and read/write. You can use splice too.
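As a rough, hedged sketch of the portable variant (error handling trimmed; posix_madvise stands in for the madvise call mentioned above, and a real program must fall back to a read/write loop whenever the map fails or the source is not a regular file):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: copy a regular file to stdout via mmap. Returns -1 on any
   failure, in which case the caller should fall back to read/write. */
static int cat_mmap(const char *fname)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode) || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /* The SEQUENTIAL hint: tell the kernel to read ahead aggressively. */
    posix_madvise(p, st.st_size, POSIX_MADV_SEQUENTIAL);

    /* A full implementation would loop here to handle short writes. */
    ssize_t nw = write(STDOUT_FILENO, p, st.st_size);

    munmap(p, st.st_size);
    close(fd);
    return nw == (ssize_t)st.st_size ? 0 : -1;
}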

Use the stat function to find the size of your files before you read them. Alternatively, you can read chunks until you get an EOF.
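For instance, a sketch of the stat approach:

#include <sys/stat.h>

/* Sketch: return a file's size in bytes, or -1 on error. */
static long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long long)st.st_size;
}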

Related

Can not read from a pipe, and another stdin issue

So, I asked here just a while ago, but half of that question was just me being dumb. And I still have issues. I hope that this will be clearer than the question before.
I'm writing POSIX cat; I nearly have it working, but I have a couple of issues:
My cat cannot read from a pipe and I really do not know why (redirecting with < works fine).
I cannot figure out how to make it continuously read stdin without problems. I had a version that worked "fine", but would create a stack overflow. The other version wouldn't stop reading from stdin if there was only stdin, i.e.: my-cat < file would read from stdin until it got terminated, which it shouldn't; but it has to read from stdin and wait for termination if no files are supplied.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    char opt;
    while ((opt = getopt(argc, argv, "u")) != EOF) {
        switch (opt) {
        case 'u':
            /* Make the output un-buffered */
            setbuf(stdout, NULL);
            break;
        default:
            break;
        }
    }
    argc -= optind;
    argv += optind;

    int i = 0, fildes, fs = 0;
    do {
        /* Check for operands, if none or operand = "-". Read from stdin */
        if (argc == 0 || !strcmp(argv[i], "-")) {
            fildes = STDIN_FILENO;
        } else {
            fildes = open(argv[i], O_RDONLY);
        }

        /* Check for directories */
        struct stat fb;
        if (!fstat(fildes, &fb) && S_ISDIR(fb.st_mode)) {
            fprintf(stderr, "pcat: %s: Is a directory\n", argv[i]);
            i++;
            continue;
        }

        /* Get file size */
        fs = fb.st_size;

        /* If bytes are read, write them to stdout */
        char *buf = malloc(fs * sizeof(char));
        while ((read(fildes, buf, fs)) > 0)
            write(STDOUT_FILENO, buf, fs);
        free(buf);

        /* Close file if it's not stdin */
        if (fildes != STDIN_FILENO)
            close(fildes);
        i++;
    } while (i < argc);

    return 0;
}
Pipes don't have a size, and neither do terminals. The contents of the st_size field are undefined for such files. (On my system it seems to always contain 0, but I don't think there is any cross-platform guarantee of that.)
So your plan of reading the entire file at one go and writing it all out again is not workable for non-regular files, and is risky even for them (the read is not guaranteed to return the full number of bytes requested). It's also an unnecessary memory hog if the file is large.
A better strategy is to read into a fixed-size buffer, and write out only the number of bytes you successfully read. You repeat this until end-of-file is reached, which is indicated by read() returning 0. This is how you solve your second problem.
On a similar note, write() is not guaranteed to write out the full number of bytes you asked it to, so you need to check its return value, and if it was short, try again to write out the remaining bytes.
Here's an example:
#define BUFSIZE 65536   // arbitrary choice, can be tuned for performance

ssize_t nread;
char buf[BUFSIZE];      // or char *buf = malloc(BUFSIZE);

while ((nread = read(filedes, buf, BUFSIZE)) > 0) {
    ssize_t written = 0;
    while (written < nread) {
        ssize_t ret = write(STDOUT_FILENO, buf + written, nread - written);
        if (ret <= 0) {
            /* handle error, e.g. abort or break out */
        }
        written += ret;
    }
}
if (nread < 0) {
    /* handle error */
}
As a final comment, your program lacks error checking in general; e.g. if the file cannot be opened, it will proceed anyway with filedes == -1. It is important to check the return value of every system call you issue, and handle errors accordingly. This would be essential for a program to be used in real life, and even for toy programs created just as an exercise, it will be very helpful in debugging them. (Error checking would probably have given you some clues in figuring out what was wrong with this program, for instance.)
Your cat (you can call it my-cat, but I preferred to call it felix; just permit me the pun) should use stdio throughout to get the benefit of the buffering done by the stdio package. Below is a simplified version of cat using the stdio package exclusively, almost exactly as it appears in K&R, and you'll see that it is perfectly efficient as shown. The structure is almost exactly like yours, but I simplified the data-copying loop (as in the K&R book) and the processing of arguments (yours is a bit messy):
felix.c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>

#define ERR(_code, _fmt, ...) do {              \
        fprintf(stderr, "%s: " _fmt, progname,  \
                ##__VA_ARGS__);                 \
        if (_code) exit(_code);                 \
    } while (0)

char *progname = "cat";

void process(FILE *f);

int main(int argc, char **argv)
{
    int opt;

    while ((opt = getopt(argc, argv, "u")) != EOF) {
        switch (opt) {
        case 'u': setbuf(stdout, NULL); break;
        }
    }

    /* for the case it has been renamed, calculate the basename
     * of argv[0] (progname is used in the macro ERR above) */
    progname = strrchr(argv[0], '/');
    progname = progname
             ? progname + 1
             : argv[0];

    /* shift options */
    argc -= optind;
    argv += optind;

    if (argc) {
        int i;
        for (i = 0; i < argc; i++) {
            FILE *f = fopen(argv[i], "r");
            if (!f) {
                ERR(EXIT_FAILURE,
                    "%s: %s (errno = %d)\n",
                    argv[i], strerror(errno), errno);
            }
            process(f);
            fclose(f);
        }
    } else {
        process(stdin);
    }
    exit(EXIT_SUCCESS);
}

/* You don't need to complicate things here: fgetc and putchar use buffering
 * as you set up in main (no output buffering if you did the setbuf(NULL),
 * and input buffering all the time). The buffer size is best left for stdio
 * to calculate, as it queries the filesystem for the best input/output size
 * and creates buffers of that size, and the processing is simple with a loop
 * like the one below. You'll get no appreciable difference between this and
 * any other input/output. You can believe me, I've tested it. */
void process(FILE *f)
{
    int c;
    while ((c = fgetc(f)) != EOF) {
        putchar(c);
    }
}
As you can see, nothing special has been done to support redirection, because redirection is not done inside a program; it is done by the program that calls it (in this case, the shell). When your program starts, it receives three already-open file descriptors: the ones the shell was using, or the ones the shell put in places 0, 1, and 2 just before starting your program. So your program has nothing to do to cope with redirection; everything is done (in this case) by the shell. This is why redirection works with your program even though you did nothing to support it. You only need to set up redirection yourself if you are going to call a program with its input, output, or standard error redirected somewhere (and that somewhere is not the standard input, output, or error you received from your parent process), but that is not the case for my-cat.
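To illustrate that last point, here is a hypothetical sketch (the function name is mine) of what a caller such as the shell does for prog < file: it redirects descriptor 0 in the child before exec, so the program itself never notices:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>

/* Hypothetical sketch: run a program with stdin redirected from a file. */
static void run_with_stdin_from(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {                  /* child */
        int fd = open(path, O_RDONLY);
        if (fd == -1)
            _exit(127);
        dup2(fd, STDIN_FILENO);      /* descriptor 0 now refers to the file */
        close(fd);
        execvp(argv[0], argv);       /* the program just reads "stdin" */
        _exit(127);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);       /* parent waits for the child */
}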

Why can I not mmap /proc/self/maps?

To be specific: why can I do this:
FILE *fp = fopen("/proc/self/maps", "r");
char buf[513]; buf[512] = '\0';
while (fgets(buf, 512, fp) != NULL) printf("%s", buf);
but not this:
int fd = open("/proc/self/maps", O_RDONLY);
struct stat s;
fstat(fd, &s); // st_size = 0 -> why?
char *file = mmap(0, s.st_size /* or any fixed size */, PROT_READ, MAP_PRIVATE, fd, 0); // gives EINVAL for st_size (because 0) and ENODEV for any fixed size
write(1, file, s.st_size);
I know that /proc files are not really files, but there seems to be a well-defined size and content for the FILE* version. Is it secretly generating the content on the fly for read, or something? What am I missing here?
EDIT:
Since I can clearly read() from them, is there any way to get the number of available bytes, or am I stuck reading until EOF?
They are created on the fly as you read them. Maybe this will help; it is a tutorial showing how a proc file can be implemented:
https://devarea.com/linux-kernel-development-creating-a-proc-file-and-interfacing-with-user-space/
tl;dr: you give it a name and read and write handlers, and that's it. Proc files are meant to be very simple to implement from the kernel dev's point of view, but they do not behave like full-featured files.
As for the bonus question, there doesn't seem to be a way to indicate the size of the file, only EOF on reading.
proc "files" are not really files, they are just streams that can be read/written from, but they contain no pyhsical data in memory you can map to.
https://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
As already explained by others, /proc and /sys are pseudo-filesystems, consisting of data provided by the kernel, that does not really exist until it is read – the kernel generates the data then and there. Since the size varies, and really is unknown until the file is opened for reading, it is not provided to userspace at all.
It is not "unfortunate", however. The same situation occurs very often, for example with character devices (under /dev), pipes, FIFOs (named pipes), and sockets.
We can trivially write a helper function to read pseudofiles completely, using dynamic memory management. For example:
// SPDX-License-Identifier: CC0-1.0
//
#define _POSIX_C_SOURCE 200809L
#define _ATFILE_SOURCE
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>

/* For example main() */
#include <stdio.h>

/* Return a directory handle for a specific relative directory.
   For absolute paths and paths relative to current directory, use dirfd==AT_FDCWD.
*/
int at_dir(const int dirfd, const char *dirpath)
{
    if (dirfd == -1 || !dirpath || !*dirpath) {
        errno = EINVAL;
        return -1;
    }
    return openat(dirfd, dirpath, O_DIRECTORY | O_PATH | O_CLOEXEC);
}

/* Read the (pseudofile) contents to a dynamically allocated buffer.
   For absolute paths and paths relative to current directory, use dirfd==AT_FDCWD.
   You can safely initialize *dataptr=NULL,*sizeptr=0 for dynamic allocation,
   or reuse the buffer from a previous call or e.g. getline().
   Returns 0 with errno set if an error occurs. If the file is empty, errno==0.
   In all cases, remember to free (*dataptr) after it is no longer needed.
*/
size_t read_pseudofile_at(const int dirfd, const char *path, char **dataptr, size_t *sizeptr)
{
    char *data;
    size_t size, have = 0;
    ssize_t n;
    int desc;

    if (!path || !*path || !dataptr || !sizeptr) {
        errno = EINVAL;
        return 0;
    }

    /* Existing dynamic buffer, or a new buffer? */
    size = *sizeptr;
    if (!size)
        *dataptr = NULL;
    data = *dataptr;

    /* Open pseudofile. */
    desc = openat(dirfd, path, O_RDONLY | O_CLOEXEC | O_NOCTTY);
    if (desc == -1) {
        /* errno set by openat(). */
        return 0;
    }

    while (1) {
        /* Need to resize buffer? */
        if (have >= size) {
            /* For pseudofiles, linear size growth makes most sense. */
            size = (have | 4095) + 4097 - 32;
            data = realloc(data, size);
            if (!data) {
                close(desc);
                errno = ENOMEM;
                return 0;
            }
            *dataptr = data;
            *sizeptr = size;
        }

        n = read(desc, data + have, size - have);
        if (n > 0) {
            have += n;
        } else
        if (n == 0) {
            break;
        } else
        if (n == -1) {
            const int saved_errno = errno;
            close(desc);
            errno = saved_errno;
            return 0;
        } else {
            close(desc);
            errno = EIO;
            return 0;
        }
    }

    if (close(desc) == -1) {
        /* errno set by close(). */
        return 0;
    }

    /* Append zeroes - we know size > have at this point,
       but possibly with fewer than 32 bytes of room left. */
    if (have + 32 > size)
        memset(data + have, 0, size - have);
    else
        memset(data + have, 0, 32);

    errno = 0;
    return have;
}

int main(void)
{
    char *data = NULL;
    size_t size = 0;
    size_t len;
    int selfdir;

    selfdir = at_dir(AT_FDCWD, "/proc/self/");
    if (selfdir == -1) {
        fprintf(stderr, "/proc/self/ is not available: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    len = read_pseudofile_at(selfdir, "status", &data, &size);
    if (errno) {
        fprintf(stderr, "/proc/self/status: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    printf("/proc/self/status: %zu bytes\n%s\n", len, data);

    len = read_pseudofile_at(selfdir, "maps", &data, &size);
    if (errno) {
        fprintf(stderr, "/proc/self/maps: %s.\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    printf("/proc/self/maps: %zu bytes\n%s\n", len, data);

    close(selfdir);
    free(data); data = NULL; size = 0;
    return EXIT_SUCCESS;
}
int main(void)
{
char *data = NULL;
size_t size = 0;
size_t len;
int selfdir;
selfdir = at_dir(AT_FDCWD, "/proc/self/");
if (selfdir == -1) {
fprintf(stderr, "/proc/self/ is not available: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
len = read_pseudofile_at(selfdir, "status", &data, &size);
if (errno) {
fprintf(stderr, "/proc/self/status: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
printf("/proc/self/status: %zu bytes\n%s\n", len, data);
len = read_pseudofile_at(selfdir, "maps", &data, &size);
if (errno) {
fprintf(stderr, "/proc/self/maps: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
printf("/proc/self/maps: %zu bytes\n%s\n", len, data);
close(selfdir);
free(data); data = NULL; size = 0;
return EXIT_SUCCESS;
}
The above example program opens a directory descriptor ("atfile handle") to /proc/self. (This way you do not need to concatenate strings to construct paths.)
It then reads the contents of /proc/self/status. If successful, it displays its size (in bytes) and its contents.
Next, it reads the contents of /proc/self/maps, reusing the previous buffer. If successful, it displays its size and contents as well.
Finally, the directory descriptor is closed as it is no longer needed, and the dynamically allocated buffer released.
Note that it is perfectly safe to do free(NULL), and also to discard the dynamic buffer (free(data); data=NULL; size=0;) between the read_pseudofile_at() calls.
Because pseudofiles are typically small, read_pseudofile_at() uses a linear dynamic buffer growth policy. If there is no previous buffer, it starts with 8160 bytes, and grows the buffer by roughly 4096 bytes at a time until it is sufficiently large. Feel free to replace it with whatever growth policy you prefer; this one is just an example, but it works quite well in practice without wasting much memory.
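To see the policy in numbers: with no previous buffer (have = 0), the first allocation is (0 | 4095) + 4097 - 32 = 8160 bytes; if that fills up (have = 8160), the next size is (8160 | 4095) + 4097 - 32 = 8191 + 4065 = 12256, i.e. about 4096 bytes more each round.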

How to keep track of how many read/write operations are performed...?

For class I was given this, "Develop a C program that copies an input file to an output file and counts the number of read/write operations." I know how to do the action copying the input file to the output file, but I am not entirely sure how to keep track of how many read/write operation were performed. This program is supposed to repeat the copying using different buffer sizes and output a listing of the number of read/write operations performed with each buffer size. I am just not sure how to do the part of counting the r/w operations. How could one go about doing this? Thank you in advance.
Here is my current code (updated):
#include <stdio.h>
#include "apue.h"
#include <fcntl.h>

#define BUFFSIZE 1

int main(void)
{
    int n;
    char buf[BUFFSIZE];
    int input_file;
    int output_file;
    int readCount = 0;
    int writeCount = 0;

    input_file = open("test.txt", O_RDONLY);
    if (input_file < 0)
    {
        printf("could not open file.\n");
    }
    output_file = creat("output.txt", FILE_MODE);
    if (output_file < 0)
    {
        printf("error with output file.\n");
    }

    while ((n = read(input_file, buf, BUFFSIZE)) > 0)
    {
        readCount++;
        if (write(output_file, buf, n) == n) {
            writeCount++;
        } else {
            printf("Error writing");
        }
    }
    if (n < 0)
    {
        printf("reading error");
    }

    printf("read/write count: %d\n", writeCount + readCount);
    printf("read = %d\n", readCount);
    printf("write = %d\n", writeCount);
}
And for the text file: test one two
The result is:
read/write count: 26
read = 13
write = 13
Process returned 0 (0x0) execution time : 0.003 s
Press ENTER to continue.
I was thinking that the write would be 12...but I am not sure...
You will need to increment a variable every time you call a function that does reading or writing. You can do that by writing a function that wraps the standard I/O function.
For example, replace fread with something like this:
size_t fread_count(void *p, size_t size, size_t num, FILE *f) {
    iocount++;
    return fread(p, size, num, f);
}
iocount would have to be in scope (e.g. global)
If you need to count reads and writes separately, use separate variables: one that you increment for reads and one that you increment for writes.
-edit-
Since you are using write() and read(), you could easily make equivalent functions like the one above, but wrapping write and read instead of fwrite and fread (see the sketch below).
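For example (a sketch; the counter names are illustrative, not from the question):

#include <unistd.h>

static long readCount = 0, writeCount = 0;   /* illustrative globals */

static ssize_t read_counted(int fd, void *buf, size_t n)
{
    readCount++;
    return read(fd, buf, n);
}

static ssize_t write_counted(int fd, const void *buf, size_t n)
{
    writeCount++;
    return write(fd, buf, n);
}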
To help with trying different buffer sizes:
1) Put the open/read/write/close calls, the char buffer, the read/write counters, the final printf statements for the counters, etc. into a separate function.
2) In main(), add a table that contains the buffer sizes to be tried.
3) Call the new function from main(), passing a parameter that indicates the buffer size to use. (A sketch follows below.)
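A minimal sketch of steps 2 and 3, assuming a hypothetical copy_with_bufsize() from step 1 (the name and the sizes in the table are illustrative):

#include <stddef.h>

void copy_with_bufsize(size_t bufsize);   /* step 1: open/read/write/close,
                                             counters, and their printf */

int main(void)
{
    /* step 2: table of buffer sizes to try */
    static const size_t sizes[] = { 1, 64, 512, 4096, 65536 };
    size_t i;

    /* step 3: one copy pass per buffer size */
    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        copy_with_bufsize(sizes[i]);
    return 0;
}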

Reading bytes from /dev/random fails

I have a piece of code written in POSIX-compliant C and it doesn't seem to work correctly. The goal is to read from /dev/random, the interface to the Linux/BSD/Darwin kernel's random number generator, and write the bytes read out to a file. I'm not quite sure what I'm overlooking, as I thought I had covered all the bases. Anyway, here it is:
int incinerate(int number, const char * names[]) {
    if (number == 0) {
        // this shouldn't happen, but if it does, print an error message
        fprintf(stderr, "Incinerator: no input files\n");
        return 1;
    }

    // declare some stuff we'll be using
    long long lengthOfFile = 0, bytesRead = 0;
    int myRandomInteger;

    // open the random file block device
    int zeroPoint = open("/dev/random", O_RDONLY);

    // start looping through and nuking files
    for (int i = 1; i < number; i++) {
        int filePoint = open(names[i], O_WRONLY);

        // get the file size
        struct stat st;
        stat(names[i], &st);
        lengthOfFile = st.st_size;
        printf("The size of the file is %llu bytes.\n", lengthOfFile);

        while (lengthOfFile != bytesRead) {
            read(zeroPoint, &myRandomInteger, sizeof myRandomInteger);
            write(filePoint, (const void*) myRandomInteger, sizeof(myRandomInteger));
            bytesRead++;
        }
        close(filePoint);
    }
    return 0;
}
Any ideas? This is being developed on OS X but I see no reason why it shouldn't also work on Linux or FreeBSD.
If it helps, I've included the following headers:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
Instead of
write(filePoint, (const void*) myRandomInteger, sizeof(myRandomInteger));
you surely meant to write
write(filePoint, (const void*) &myRandomInteger, sizeof(myRandomInteger));
didn't you? If you use the random bytes read from /dev/random as a pointer, you're almost certain to encounter a segfault sooner or later.

How to Search for New Lines while Reading from a File in C/C++

I am implementing my own version of the cat command in Unix for practice. After I did that, I became interested in implementing some of its flags, like -n and -b.
My question: I am looking for a way to locate the blank lines and new lines while reading from my file. I can't remember which library or function I should use.
Here is the source code I am working on:
#include <fcntl.h>
#include <unistd.h>

static int cat_fd(int fd)
{
    char buf[4096];
    ssize_t nread;

    while ((nread = read(fd, buf, sizeof buf)) > 0)
    {
        ssize_t ntotalwritten = 0;
        while (ntotalwritten < nread)
        {
            ssize_t nwritten = write(STDOUT_FILENO, buf + ntotalwritten, nread - ntotalwritten);
            if (nwritten < 1)
            {
                return -1;
            }
            ntotalwritten += nwritten;
        }
    }
    return (nread == 0) ? 0 : -1;
}

static int cat(const char *fname)
{
    int fd, success;

    if ((fd = open(fname, O_RDONLY)) == -1)
    {
        return -1;
    }
    success = cat_fd(fd);
    if (close(fd) != 0)
    {
        return -1;
    }
    return success;
}

int main(int argc, char **argv)
{
    int i;

    if (argc == 1)
    {
        if (cat_fd(STDIN_FILENO) != 0)
            goto error;
    }
    else
    {
        for (i = 1; i < argc; i++)
        {
            if (cat(argv[i]) != 0)
            {
                goto error;
            }
        }
    }
    return 0;

error:
    write(STDOUT_FILENO, "error\n", 6);
    return 1;
}
Any ideas or suggestions concerning my question are greatly appreciated. I would be even more grateful if you could give me the complete function prototype I should use, as I am not an experienced programmer. Thanks in advance for your help.
P.S.: I am implementing the -n and -b flags, so I am looking to write the line number at the beginning of each line of the file I am reading.
While there is a function that does line-based file input in C (it's called fgets), you can't really use it for cat, because:
There's no way to know the maximum length of the line beforehand;
You'll lose portions of the input if it contains null bytes.
You'll have to look for newline symbols in your buffer after you read it, and once you find any, print the prefix of the buffer, followed by newline, line number, and the rest of the buffer (with additional processing of remaining newlines, of course).
An easier solution would be to switch to processing input one byte at a time; you can use FILE* and fgetc to take advantage of the CRT-provided buffering, so that you don't actually do a syscall for each read/write, or read the file in blocks as you do now and do the byte processing inside the loop. Then it's a matter of writing a state machine: if the previously read character was a newline, output a line number, unless this character is a newline and the -b option is used, etc.
This still results in a less efficient solution, so you may want to treat cat without arguments specially, i.e. switch to byte-per-byte processing only when you need it. In fact, this is exactly what at least one actual cat implementation does.
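As a concrete sketch of that state machine (my own illustration, not a complete cat): it prints a number before every line for -n, and skips blank lines when bflag is set, as -b does:

#include <stdio.h>

/* Sketch: copy f to stdout, numbering lines.
   bflag != 0 mimics -b (number non-blank lines only). */
static void number_lines(FILE *f, int bflag)
{
    long line = 0;
    int c;
    int at_line_start = 1;

    while ((c = fgetc(f)) != EOF) {
        if (at_line_start && !(bflag && c == '\n'))
            printf("%6ld\t", ++line);
        putchar(c);
        at_line_start = (c == '\n');
    }
}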
I recall reading that cat memory-maps files for fast execution. Use mmap(2).
http://kernel.org/doc/man-pages/online/pages/man2/munmap.2.html
I found this example: http://ladweb.net/src/map-cat.c
I know this doesn't answer your question about newlines; I guess memchr() would do the trick.
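For the block-reading approach, a memchr-based scan could look like this sketch (illustrative only):

#include <string.h>
#include <stddef.h>

/* Sketch: count newline characters in a buffer using memchr(). */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;   /* resume scanning just past this newline */
    }
    return count;
}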
