C program to use Unix system calls for I/O [closed]

My professor asked me to write a simple C program, then asked me to convert it to use Unix system calls. I have tried changing the values around, but nothing is working.
Requirement:
Write a new C program newcat, which performs exactly as oldcat, but uses the following UNIX system calls for I/O.
int read(int fd, char *buf, int n);
int write(int fd, char *buf, int n);
int open(char *name, int accessmode, int permission);
int close(int fd);
To open a file for read, you can use the symbolic constant O_RDONLY defined in fcntl.h header file to specify the accessmode. Simply pass 0 for permission. That is, the code will appear as follows:
fd = open (filename, O_RDONLY, 0);
You will need the following header files: sys/types.h, unistd.h and fcntl.h
#include <stdio.h>
/* oldcat: Concatenate files */
int main(int argc, char *argv[])
{
void filecopy(FILE *, FILE *); /* prototype for function */
int fd = open(*fp, O_RDONLy,0)
char *prog = argv[0]; /* program name for errors */
if (argc == 1) /* no args; copy standard input */
filecopy(0, 1);
else
while (--argc > 0)
if (fd == -1) {
fprintf(stderr, "%s: can't open %s\n", prog, *argv);
return(-1);
} else {
filecopy(fp, 1);
fclose(fp);
}
return(0);
}
/* filecopy: copy file ifp to ofp */
void filecopy(FILE *ifp, FILE *ofp)
{
int c;
while ((c = getc(ifp)) != EOF)
putc(c, ofp);
}
Is this the right idea? It still won't compile:
#include <stdio.h>
/* oldcat: Concatenate files */
int main(int argc, char *argv[])
{
void filecopy(int ifp, int ifo);
int fd = open(*File,O_RDONLY,0); //is this correct?
char *prog = argv[0];
if (argc == 1) /* no args; copy standard input */
filecopy(0, 1); //is this correct?
else
while (--argc > 0)
if ((fd == -1) //is this correct?{
fprintf(stderr, "%s: can't open %s\n", prog, *argv);
return(-1);
} else {
filecopy(*FILE, 1);//is this correct?
close(*FILE);//is this correct?
}
return(0);
}
/* filecopy: copy file ifp to ofp */
void filecopy(FILE *ifp, FILE *ofp)//NO CLUE HOW THIS SHOULD BE
{
int c;
while (c = read(fd ,&something,1)//What is &ch/&something?
putc(c, ofp);
}

Assuming your oldcat uses the C standard library calls (like fopen), it's a simple matter of mapping those to the UNIX calls.
At a high level:
fopen -> open
fread -> read
fwrite -> write
fclose -> close
For example, when opening your input file with:
FILE *fIn = fopen ("jargon.txt", "r");
you could instead use:
int inFd = open ("jargon.txt", O_RDONLY, 0);
The other calls are very similar, with similar functionality at the C standard library and UNIX system call levels. Details on those calls can usually be obtained from the manpages by entering something like man 2 open into your shell, or by plugging man open into your favourite search engine.
The only "tricky" mapping is if you've used getchar/putchar-style calls to do the actual reading and writing but that too becomes easy when you realise that (for example) reading a character is functionally identical to reading a block of size one:
int c = getc (fIn);
or:
char c;
int numread = read (inFd, &c, 1);
For your added question:
So to open a file: if (fd = open (fp, O_RDONLY, 0); ) == NULL)
Not quite. The fopen function returns NULL on error because it returns a pointer to a FILE structure.
The lower level calls use file descriptors rather than file handles, the former being a small integer value. So, instead of:
FILE *fp = fopen ("nosuchfile", "r");
if (fp == NULL) doSomethingIntelligent();
you would do something like:
int fd = open ("nosuchfile", O_RDONLY, 0);
if (fd == -1) doSomethingIntelligentUsing (errno);
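As a rough sketch of what "doing something intelligent" with errno could look like, the helper below reports the reason for the failure with strerror. The name open_or_report is made up for this illustration; it is not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open path read-only. On failure, explain why on
 * stderr and return -1, leaving errno as open() set it. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;              /* fprintf may change errno */
        fprintf(stderr, "can't open %s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

The caller still checks the result against -1, exactly as in the snippet above; the only difference is that the user gets a readable reason instead of a bare error number.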
In terms of what you need to change, the following comes off the top of my head (so may not be totally exhaustive but should be a very good start):
Add the required headers.
Stop using FILE* totally, using int instead.
Translate the fopen/fclose calls to open/close. This includes the function name, different parameters and different return types.
Modify filecopy to use file descriptors rather than file handles.
Use 1 (STDOUT_FILENO from unistd.h) instead of stdout when calling filecopy (the latter is a FILE *).
As an example of how to do this, the following program testprog.c will read itself and echo each character to standard output:
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h> /* read, close */
int main (void) {
    int inFd;
    ssize_t num;
    char ch; /* read() stores exactly one byte here */
    // Open as read only.
    inFd = open ("testprog.c", O_RDONLY, 0);
    if (inFd == -1) {
        printf ("\n**Error %d opening file\n", errno);
        return 1;
    }
    // Get and output each char until EOF/error.
    while ((num = read (inFd, &ch, 1)) == 1)
        putchar (ch);
    // Detect error (read returns 0 at EOF, -1 on error).
    if (num != 0)
        printf ("\n**Error %d reading file\n", errno);
    // Close file and exit.
    close (inFd);
    return 0;
}

Please note that the Linux system calls are documented in the manual (the man pages), which you can read with the man command in a shell on a Linux system. UNIX and Linux behave very similarly for the system calls you are interested in, so the Linux man pages are a good reference.
All four system calls (read, write, open and close) are explained in section 2 of the manual. You can open their pages by typing the commands below in a shell:
man 2 read
man 2 write
man 2 open
man 2 close
These should point you in the right direction.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
/* newcat: Concatenate files */
int main(int argc, char *argv[])
{
    void filecopy(int ifd, int ofd); /* prototype for function */
    int fd;
    char *prog = argv[0]; /* program name for errors */
    if (argc == 1) /* no args; copy standard input */
        filecopy(0, 1);
    else
        while (--argc > 0) { /* braces: open, check and copy for every argument */
            fd = open(*++argv, O_RDONLY, 0);
            if (fd == -1) {
                fprintf(stderr, "%s: can't open %s\n", prog, *argv);
                return(-1);
            } else {
                filecopy(fd, 1);
                close(fd);
            }
        }
    return(0);
}
/* filecopy: copy file descriptor ifd to ofd, one byte at a time */
void filecopy(int ifd, int ofd)
{
    char c;
    while (read(ifd, &c, 1) > 0) /* 0 means end of file, -1 an error */
        write(ofd, &c, 1);
}

Related

Can not read from a pipe, and another stdin issue

So, I asked here just a while ago, but half of that question was just me being dumb. And I still have issues. I hope that this will be clearer than the question before.
I'm writing POSIX cat, I nearly got it working, but I have couple of issues:
My cat can not read from a pipe and I really do not know why (redirecting (<) works fine)
I can not figure out how to make it continuously read stdin without some issues. I had a version that worked "fine", but would create a stack overflow. The other version wouldn't stop reading from stdin if there was only stdin, i.e. my-cat < file would read from stdin until it got terminated, which it shouldn't; but it has to read from stdin and wait for termination if no files are supplied.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
char opt;
while ((opt = getopt(argc, argv, "u")) != EOF) {
switch(opt) {
case 'u':
/* Make the output un-buffered */
setbuf(stdout, NULL);
break;
default:
break;
}
}
argc -= optind;
argv += optind;
int i = 0, fildes, fs = 0;
do {
/* Check for operands, if none or operand = "-". Read from stdin */
if (argc == 0 || !strcmp(argv[i], "-")) {
fildes = STDIN_FILENO;
} else {
fildes = open(argv[i], O_RDONLY);
}
/* Check for directories */
struct stat fb;
if (!fstat(fildes, &fb) && S_ISDIR(fb.st_mode)) {
fprintf(stderr, "pcat: %s: Is a directory\n", argv[i]);
i++;
continue;
}
/* Get file size */
fs = fb.st_size;
/* If bytes are read, write them to stdout */
char *buf = malloc(fs * sizeof(char));
while ((read(fildes, buf, fs)) > 0)
write(STDOUT_FILENO, buf, fs);
free(buf);
/* Close file if it's not stdin */
if (fildes != STDIN_FILENO)
close(fildes);
i++;
} while (i < argc);
return 0;
}
Pipes don't have a size, and nor do terminals. The contents of the st_size field is undefined for such files. (On my system it seems to always contain 0, but I don't think there is any cross-platform guarantee of that.)
So your plan of reading the entire file at one go and writing it all out again is not workable for non-regular files, and is risky even for them (the read is not guaranteed to return the full number of bytes requested). It's also an unnecessary memory hog if the file is large.
A better strategy is to read into a fixed-size buffer, and write out only the number of bytes you successfully read. You repeat this until end-of-file is reached, which is indicated by read() returning 0. This is how you solve your second problem.
On a similar note, write() is not guaranteed to write out the full number of bytes you asked it to, so you need to check its return value, and if it was short, try again to write out the remaining bytes.
Here's an example:
#define BUFSIZE 65536 // arbitrary choice, can be tuned for performance
ssize_t nread;
char buf[BUFSIZE]; // or char *buf = malloc(BUFSIZE);
while ((nread = read(fildes, buf, BUFSIZE)) > 0) {
    ssize_t written = 0;
    while (written < nread) {
        ssize_t ret = write(STDOUT_FILENO, buf + written, nread - written);
        if (ret <= 0) {
            // handle write error: report it and stop copying
            exit(EXIT_FAILURE); // or handle it some other way
        }
        written += ret;
    }
}
if (nread < 0) {
    // handle read error
}
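To make the strategy concrete, here is a minimal sketch of the same loop packaged as a function. The name copy_fd and the choice to retry on EINTR are my own assumptions for illustration, not something the question requires.

```c
#include <errno.h>
#include <unistd.h>

#define BUFSIZE 65536   /* arbitrary, as above */

/* Hypothetical helper: copy everything readable from infd to outfd.
 * Returns 0 once end-of-file is reached, -1 on a read or write error. */
int copy_fd(int infd, int outfd)
{
    char buf[BUFSIZE];
    ssize_t nread;

    while ((nread = read(infd, buf, sizeof buf)) != 0) {
        if (nread < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        ssize_t written = 0;
        while (written < nread) {       /* write() may be short: keep going */
            ssize_t ret = write(outfd, buf + written,
                                (size_t)(nread - written));
            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            written += ret;
        }
    }
    return 0;
}
```

Because it never looks at st_size, the same function should work for regular files, pipes and terminals alike; a cat-like main would call copy_fd(fildes, STDOUT_FILENO) once per operand.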
As a final comment, your program lacks error checking in general; e.g. if the file cannot be opened, it will proceed anyway with fildes == -1. It is important to check the return value of every system call you issue, and handle errors accordingly. This would be essential for a program to be used in real life, and even for toy programs created just as an exercise, it will be very helpful in debugging them. (Error checking would probably have given you some clues in figuring out what was wrong with this program, for instance.)
Your cat (you can call it my-cat, but I preferred to call it felix; permit me the pun) should use stdio throughout, to get the benefit of the buffering done by the stdio package. Below is a simplified version of cat that uses stdio exclusively (almost exactly as it appears in K&R), and it is perfectly efficient as shown. The structure is almost the same as yours, but I simplified the data copy (as in the K&R book) and the argument processing (yours is a bit messy):
felix.c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#define ERR(_code, _fmt, ...) do { \
fprintf(stderr,"%s: " _fmt, progname, \
##__VA_ARGS__); \
if (_code) exit(_code); \
} while (0)
char *progname = "cat";
void process(FILE *f);
int main(int argc, char **argv)
{
int opt;
while ((opt = getopt(argc, argv, "u")) != EOF) {
switch (opt) {
case 'u': setbuf(stdout, NULL); break;
}
}
/* for the case it has been renamed, calculate the basename
* of argv[0] (progname is used in the macro ERR above) */
progname = strrchr(argv[0], '/');
progname = progname
? progname + 1
: argv[0];
/* shift options */
argc -= optind;
argv += optind;
if (argc) {
int i;
for (i = 0; i < argc; i++) {
FILE *f = fopen(argv[i], "r");
if (!f) {
ERR(EXIT_FAILURE,
"%s: %s (errno = %d)\n",
argv[i], strerror(errno), errno);
}
process(f);
fclose(f);
}
} else {
process(stdin);
}
exit(EXIT_SUCCESS);
}
/* There is no need to complicate things here: fgetc and putchar use the buffering set up in main
 * (no output buffering if you called setbuf(stdout, NULL), and input buffering all the time). The
 * buffer size is best left to stdio, which queries the filesystem for the preferred input/output
 * size and creates buffers of that size. The processing is then a simple loop like the one below.
 * You'll get no appreciable difference between this and any other input/output approach.
 * You can believe me, I've tested it. */
void process(FILE *f)
{
int c;
while ((c = fgetc(f)) != EOF) {
putchar(c);
}
}
As you can see, nothing special has been done to support redirection, because redirection is not done inside the program; it is done by the program that starts it (in this case, the shell). When your program starts, it receives three already-open file descriptors: 0, 1 and 2. They are either the ones the shell itself is using or the ones the shell put in those positions before starting your program. So your program has nothing to do to cope with redirection, and that is why redirection works in your program even though you did nothing to make it work. You only have to set up redirection yourself when you are going to start another program with its standard input, output or error redirected somewhere other than the descriptors you received from your parent process, but that is not the case for my-cat.
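If you want to see for yourself that the shell has already done the work, a small sketch like the following can report what a descriptor refers to. The function name fd_kind and the labels it returns are invented for this example.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: describe what kind of object fd refers to. */
const char *fd_kind(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return "invalid";
    if (S_ISREG(st.st_mode))
        return "regular file";          /* e.g. my-cat < file */
    if (S_ISFIFO(st.st_mode))
        return "pipe";                  /* e.g. producer | my-cat */
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISCHR(st.st_mode))
        return isatty(fd) ? "terminal" : "character device";
    return "other";
}
```

Printing fd_kind(STDIN_FILENO) at startup and running the program with < file, behind a pipe, and with no redirection should show three different answers, while the reading code itself stays the same.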

C program to open binary elf files, read from them, and print them out (like objcopy)

I am trying to implement a functionality similar to objcopy where bytes of a binary file (specifically the .text section) will be printed out using open() and read(). How would I set the buffer sizes and iterate till the end of a .text section so that I don't read more bytes than I have to in order to avoid errors?
Here is how you read a file in fixed-size chunks.
P.S. I used fopen() and fread() instead of open() and read() because I am currently working on a Windows machine. The loop has the same shape with either pair.
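#include <stdio.h>
int main(void)
{
    FILE *file = fopen("input.txt", "rb"); /* "rb": binary mode matters on Windows */
    char buffer[2048];
    size_t n;
    if (file)
    {
        /* Loop until fread returns 0 bytes (end of file or error) */
        while ((n = fread(buffer, 1, sizeof buffer, file)) > 0)
        {
            /* The data is binary, so write exactly n bytes instead of using %s */
            fwrite(buffer, 1, n, stdout);
        }
        fclose(file);
    }
    return 0;
}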
Update: For interpreting ELF files specifically, I would recommend taking a look at the following resources:
Check out the following code snippet. It shows how you can interpret an ELF file.
#include <stdio.h>
#include <fcntl.h> /* open, O_RDONLY */
#include <libelf.h>
#include <stdlib.h>
#include <string.h>
static void failure(void);
int main(int argc, char **argv)
{
Elf32_Shdr *shdr;
Elf32_Ehdr *ehdr;
Elf *elf;
Elf_Scn *scn;
Elf_Data *data;
int fd;
unsigned int cnt;
/* Open the input file */
if ((fd = open(argv[1], O_RDONLY)) == -1)
exit(1);
/* Obtain the ELF descriptor */
(void)elf_version(EV_CURRENT);
if ((elf = elf_begin(fd, ELF_C_READ, NULL)) == NULL)
failure();
/* Obtain the .shstrtab data buffer */
if (((ehdr = elf32_getehdr(elf)) == NULL) ||
((scn = elf_getscn(elf, ehdr->e_shstrndx)) == NULL) ||
((data = elf_getdata(scn, NULL)) == NULL))
failure();
/* Traverse input filename, printing each section */
for (cnt = 1, scn = NULL; scn = elf_nextscn(elf, scn); cnt++)
{
if ((shdr = elf32_getshdr(scn)) == NULL)
failure();
(void)printf("[%d] %s\n", cnt,
(char *)data->d_buf + shdr->sh_name);
}
} /* end main */
static void
failure()
{
(void)fprintf(stderr, "%s\n", elf_errmsg(elf_errno()));
exit(1);
}
I would also recommend checking out the elfutils library.

Redirecting stdin and stdout?

So I'm trying to redirect the I/O to read a command from a file; then, when the user runs the output command, it will print the compiled command to an output file.
For example on the terminal:
./run 2 < test.txt // This would take file using dup and take the input
Then when you want to output the compile:
./run 1 > output.txt // and it would put into an output file
So far I know how to output to a file, but my problem is with the input. How do I get the command from the file using the dup2() function? I tried researching this but had no luck.
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
char inputForOutput[100];
void functionOutput(int argc, char **argv){
int ofd; //Init of file desc.
ofd = open(argv[1], O_CREAT|O_TRUNC|O_WRONLY);
dup2(ofd, 1);//Duplicates to stdout
system("ls");//Copies commnd given to output_file
}
//Function is called when argument number is == 1
void functionInput(int argc, char **argv){
FILE *ifd;
printf("\n %s \n ", argv[2]);
ifd = fopen(argv[2] , "r");
if (ifd == NULL){
perror("No file found");
exit(1);
}
fscanf(ifd,"%s",inputForOutput);
printf("\n**%s**\n",inputForOutput);
}
int main(int argc, char **argv)
{
int output;
int input;
output = strcmp("1", argv[1]);
input = strcmp("2" ,argv[1]);
if (output == 0 ) { //Fail safe for number of arguments
functionOutput(argc, argv);
}
else if ( input == 0){
functionInput(argc, argv);
}
else{
fprintf(stderr, "How to use: %s function output_file\n", argv[0]); // FAIL SAFE IF INPUT DOES NOT MATCH BOTH FUNCTIONS
}
return 0;
}
To redirect input and output, use this format
myprogram > out.txt < in.txt //read from in.txt, write to out.txt
myprogram < in.txt > out.txt //read from in.txt, write to out.txt
myprogram < in.txt //redirect stdin only
myprogram > out.txt //redirect stdout only
myprogram //no redirection
...
This should work with any program. Example:
int main(void)
{
char buf[1000];
if(fgets(buf, sizeof(buf), stdin))
printf("write: %s\n", buf);
return 0;
}
To redirect stdin/stdout in the program, use the standard method
freopen("output.txt", "w", stdout);
printf("Testing...");
fclose(stdout);
freopen("input.txt", "r", stdin);
char buf[100];
fgets(buf, sizeof(buf), stdin);
fclose(stdin);
Alternatively, read and write through your own FILE * variables (FILE *fin = stdin; FILE *fout = stdout;) and point them at files opened with fopen when you want redirection.
Next, to write a program using argv elements, always test argc first. The code below shows an example.
#include <stdio.h>
#include <string.h>
int redirect(int argc, char **argv, int *index)
{
//no more redirection!
if(*index >= argc)
return 1;
//not enough parameters
if(*index + 1 >= argc)
{
printf("wrong usage\n");
return 0;
}
if(strcmp(argv[*index], "<") == 0)
{
(*index)++; //next parameter is the file to redirect input from
if(!freopen(argv[*index], "r", stdin))
fprintf(stderr, "error, redirect input failed\n");
(*index)++; //move past the file name
}
else if(strcmp(argv[*index], ">") == 0)
{
(*index)++; //next parameter is the file to redirect output to
if(!freopen(argv[*index], "w", stdout))
fprintf(stderr, "error, redirect output failed\n");
(*index)++; //move past the file name
}
else
{
printf("wrong usage\n");
return 0;
}
return 1;
}
int main(int argc, char **argv)
{
int index = 1;
if(!redirect(argc, argv, &index))
return 1;
if(!redirect(argc, argv, &index))
return 1;
//read
char buf[1000];
if(fgets(buf, sizeof(buf), stdin))
{
//write
printf("write: %s\n", buf);
}
fclose(stdin);
fclose(stdout);
return 0;
}
With functionOutput() you have a good first attempt at capturing the output of a system command to a file. Actually, that is the function called when the first argument is 1, so you might want to update your comment. Also, you're creating a file with the name stored in argv[1], which we already know is 1 so it's probably not doing what you expect, and you probably want:
ofd = open(argv[2], O_CREAT|O_TRUNC|O_WRONLY, 0644); /* O_CREAT requires a mode argument */
With functionInput() you're reading the first non-whitespace entry from the file. If you're telling it to read the file which you output using the functionOutput() function, that is likely to be (some of) the name of the first file which was listed by ls.
I'm finding it unclear what you're wanting to do which isn't that. If you want to find out what the command was which you ran to generate the output, that information is not available from the file itself, because you didn't write it there. If that's what you want, you may want to consider writing the command as the first line of the file, followed by the output. Then when you read it, you can assume that the first line is the command run, followed by the output of that command.
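If the goal really is to do the input redirection with dup2() inside the program, a minimal sketch might look like this. The helper name redirect_stdin is made up for illustration; only open, dup2 and close are real calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make standard input read from the file at path.
 * Returns 0 on success, -1 on failure (errno set by open or dup2). */
int redirect_stdin(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fd != STDIN_FILENO) {
        if (dup2(fd, STDIN_FILENO) == -1) {
            close(fd);
            return -1;
        }
        close(fd);  /* descriptor 0 now refers to the file */
    }
    return 0;
}
```

After redirect_stdin(argv[2]) succeeds, anything that reads standard input (read on descriptor 0, fgets on stdin, or a child started with system) reads from the file instead, mirroring what dup2(ofd, 1) does for output in functionOutput().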
If I understand your question, and you want to run your program in essentially two different modes, (1) you want to take input if there is input to be taken on stdin; and (2) if there is no input waiting, you want to do an output, then select/pselect or poll are what you are looking for.
For example select allows you to check whether there is input ready to be read on a file descriptor (or set of descriptors) and it will return the number of descriptors with input waiting (or -1 and set errno on error). You could simply use the STDIN_FILENO (a/k/a fd 0) to check if there is input on stdin, e.g.
#include <stdio.h>
#include <unistd.h> /* for STDIN_FILENO */
#include <sys/select.h> /* for pselect */
int input (int filedes)
{
fd_set set;
/* declare/initialize zero timeout */
struct timespec timeout = { .tv_sec = 0 };
/* Initialize the file descriptor set. */
FD_ZERO (&set);
FD_SET (filedes, &set);
/* check whether input is ready on filedes */
return pselect (filedes + 1, &set, NULL, NULL, &timeout, NULL);
}
int main (void)
{
if (input (STDIN_FILENO) > 0) /* pselect returns -1 on error, 0 if nothing is ready */
puts ("doing input routine");
else
puts ("doing output routine");
return 0;
}
(note: from the man page "select() uses a timeout that is a struct timeval (with seconds and microseconds), while pselect() uses a struct timespec (with seconds and nanoseconds).")
Example Use/Output
$ ./bin/select_peekstdin < file
doing input routine
$ ./bin/select_peekstdin
doing output routine

Segmentation Fault (Core Dumped) using read() and write()

I've been out of programming in C for almost 2 years and have recently gotten an assignment in school on using write() and read().
Somewhere in the code I'm receiving the Segmentation Fault error, possibly on the filecopy function is where I'd put my money on. I was trying GDB but I haven't used that since that last time I programmed in C so I turn to here.
The code.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char *argv[]) {
void filecopy(int infd, int outfd);
int fd = -1;
char *prog = argv[0];
if(argc == 1)
filecopy(STDIN_FILENO, STDOUT_FILENO);
else
while(--argc > 0) {
if((fd = open(*++argv, O_RDONLY, "rb")) == -1) {
// we don't have fprintf... but we have sprintf =]
char tmp[30];
sprintf(tmp, "%s: can't open %s\0", prog, *argv);
write(STDOUT_FILENO, &tmp, sizeof(tmp));
exit(-1);
} else {
filecopy(fd, STDOUT_FILENO);
close(fd);
}
}
exit(0);
}
void filecopy(int infd, int outfd) {
// char *buf[1]; <-- causes unreadable characters outputted by write
char *buf;
while(read(infd, buf, 1) != -1)
write(outfd, buf, sizeof(buf));
}
The input/output
Thanks!
char *buf; is an uninitialized pointer, writing data through that pointer is
undefined behaviour.
char buf[1024];
ssize_t len;
while((len = read(infd, buf, sizeof buf)) > 0)
write(outfd, buf, len);
would be correct. (read returns 0 at end of file and -1 on error, so the loop
has to stop in both cases.)
Note that char *buf[1]; is an array (of dimension 1) of pointers, which is
different from an array of chars. Using that you would need to do
read(infd, buf[0], somelength), but here again buf[0] would be an
uninitialized pointer and you would have the same problem. That's why declaring
a char array of, say, 1024 elements (you can choose another size) is the
correct thing to do.
Also, in main use strlen(tmp) and not sizeof(tmp):
char tmp[30];
sprintf(tmp, "%s: can't open %s", prog, *argv);
write(STDOUT_FILENO, tmp, strlen(tmp));
strlen returns the length of the string, which might be smaller than 29; if
you use sizeof(tmp) you might be writing garbage past the end of the string.
The explicit \0 in the format string is redundant, because sprintf always
terminates the string. Note also that 30 may be too small for the whole
string; I'd use a larger number or construct the string using snprintf:
snprintf(tmp, sizeof tmp, "%s: can't open %s", prog, *argv);
which would be safer.
Last thing:
while(--argc > 0)
if((fd = open(*++argv, O_RDONLY, "rb")) == -1) {
...
While this is correct, I feel that this code is awkward and hard to read. It
would be so much simpler to read if you did:
for(int i = 1; i < argc; ++i)
if((fd = open(argv[i], O_RDONLY, "rb")) == -1) {
...
I've never seen open being called with "rb" as the mode. My man page says:
man 2 open
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags, mode_t mode);
[...]
The mode argument specifies the file mode bits be applied when a new file is created. This argument must be supplied when
O_CREAT or O_TMPFILE is specified in flags; if neither O_CREAT nor O_TMPFILE is specified, then mode is ignored.
The effective mode is modified by the process's umask in the usual way: in the absence of a default ACL, the mode of the created file is
(mode & ~umask). Note that this mode applies only to future accesses of the newly created file; the open() call that creates a
read-only file may well return a read/write file descriptor.
The following symbolic constants are provided for mode:
S_IRWXU 00700 user (file owner) has read, write, and execute permission
S_IRUSR 00400 user has read permission
S_IWUSR 00200 user has write permission
S_IXUSR 00100 user has execute permission
[...]
As you are using neither O_CREAT nor O_TMPFILE, this parameter will be
ignored, and you are passing a char* as a mode_t, which is an integer type.
Hence your call should be:
if((fd = open(argv[i], O_RDONLY, 0)) == -1) {
...
Two adjustments are needed for your filecopy function:
You need to allocate space for your buffer. Right now you are using an uninitialized pointer and passing it to read which is undefined behavior.
You need to save the return value of read and pass the value to write
The end result should look something like this.
void filecopy(int infd, int outfd) {
    char buf[1024];
    ssize_t bytes_read; /* signed: read returns -1 on error */
    while((bytes_read = read(infd, buf, sizeof buf)) > 0) /* 0 means end of file */
        write(outfd, buf, bytes_read);
}
Running this through a static analysis tool gives 2 warnings:
1) The uninitialized variable that @Pablo points to
2) a buffer overrun when you sprintf *argv into tmp, as *argv can be very large (as @Pablo also suggested in his comment re: snprintf)
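Putting the snprintf and length advice together, the error message could be built and written roughly like this. The helper name report_open_error and the 256-byte buffer are arbitrary choices made for this sketch.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format "prog: can't open file" into a fixed buffer
 * and write it to fd. Returns the number of bytes written, or -1. */
ssize_t report_open_error(int fd, const char *prog, const char *file)
{
    char tmp[256];
    int n = snprintf(tmp, sizeof tmp, "%s: can't open %s\n", prog, file);

    if (n < 0)
        return -1;
    size_t len = (size_t)n;
    if (len >= sizeof tmp)      /* message was truncated to fit the buffer */
        len = sizeof tmp - 1;
    return write(fd, tmp, len);
}
```

snprintf never writes past the buffer, and its return value tells you how long the full message would have been, so the length passed to write can be clamped instead of guessed. Writing to STDERR_FILENO rather than STDOUT_FILENO also keeps the message out of the copied data.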

Opening gzipped files for reading in C without creating temporary files

I have some gzipped files that I want to read in C via fopen and fscanf. Is there anyway to do this without having to gunzip the files to temporary files?
Thanks.
You can use zlib to open the gzipped files directly.
It also offers a "gzopen" function that behaves similar to fopen but operates on gzipped files. However, fscanf would probably not work on such a handle, since it expects normal FILE pointers.
If popen is fair game, you can do it with popen and fscanf:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main(int argc, char *argv[])
{
const char prefix[] = "zcat ";
const char *arg;
char *cmd;
FILE *in;
char buf[4096];
if (argc != 2) {
fprintf(stderr, "Usage: %s file\n", argv[0]);
return 1;
}
arg = argv[1];
cmd = malloc(sizeof(prefix) + strlen(arg) + 1);
if (!cmd) {
fprintf(stderr, "%s: malloc: %s\n", argv[0], strerror(errno));
return 1;
}
sprintf(cmd, "%s%s", prefix, arg);
in = popen(cmd, "r");
if (!in) {
fprintf(stderr, "%s: popen: %s\n", argv[0], strerror(errno));
return 1;
}
while (fscanf(in, "%s", buf) == 1)
printf("%s: got [%s]\n", argv[0], buf);
if (ferror(in)) {
fprintf(stderr, "%s: fread: %s\n", argv[0], strerror(errno));
return 1;
}
else if (!feof(in)) {
fprintf(stderr, "%s: %s: unconsumed input\n", argv[0], argv[1]);
return 1;
}
return 0;
}
For example:
$ zcat file.gz
Every good boy does fine.
$ ./gzread file.gz
./gzread: got [Every]
./gzread: got [good]
./gzread: got [boy]
./gzread: got [does]
./gzread: got [fine.]
Do not use
sprintf(cmd, "zcat %s", argv[1]);
popen(cmd,"r");
to open .gz files. Properly escape argv[1] instead. You may otherwise end up with a vulnerability, especially when someone injects an argument argv[1] such as
123;rm -rf /
It already helps to change the above instruction into
sprintf(cmd, "zcat \'%s\'",argv[1]);
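That is still not enough on its own: inside single quotes, the one character that must still be escaped is the single quote itself (each ' becomes '\''). A '\0' cannot occur inside an argv string, and ';' is harmless once it is inside the quotes.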
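As a sketch of what "properly escape" could mean for a POSIX shell, the function below wraps an argument in single quotes and rewrites every embedded single quote as '\''. The name shell_quote is hypothetical; nothing like it exists in the standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of arg quoted for a POSIX
 * shell, or NULL if allocation fails. The caller frees the result. */
char *shell_quote(const char *arg)
{
    size_t quotes = 0;
    for (const char *p = arg; *p; p++)
        if (*p == '\'')
            quotes++;

    /* 2 outer quotes + 3 extra bytes per embedded quote + terminating NUL */
    char *out = malloc(strlen(arg) + 3 * quotes + 3);
    if (!out)
        return NULL;

    char *q = out;
    *q++ = '\'';
    for (const char *p = arg; *p; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);  /* close quote, escaped quote, reopen */
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

The quoted string can then be appended after "zcat " before calling popen. An argument that starts with a dash would still be read by zcat as an option, so putting "--" before the file name is worth considering; avoiding the shell altogether with fork and exec sidesteps the quoting problem entirely.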
Newbie attempt at gzscanf():
#include <stdio.h>
#include <stdlib.h> /* exit */
#include <stdarg.h>
#include <zlib.h>
#define MAXLEN 256
int gzscanf(gzFile stream, const char *fmt, ...) { /* gzFile is already a pointer type */
    /* read one line from stream (up to newline) and parse with vsscanf */
    va_list args;
    int n;
    static char buf[MAXLEN];
    if (NULL == gzgets(stream, buf, MAXLEN)) {
        fprintf(stderr, "gzscanf: Failed to read line from gz file.\n");
        exit(EXIT_FAILURE);
    }
    va_start(args, fmt);
    n = vsscanf(buf, fmt, args);
    va_end(args);
    return n;
}
You can use zlib and wrap it in a regular file pointer; this way you can use fscanf, fread, etc. transparently. Note that funopen is a BSD extension (glibc offers fopencookie for the same purpose).
FILE *myfopen(const char *path, const char *mode)
{
#ifdef WITH_ZLIB
gzFile zfp; /* gzFile is already a pointer type */
/* try gzopen */
zfp = gzopen(path,mode);
if (zfp == NULL)
return fopen(path,mode);
/* open file pointer */
return funopen(zfp,
(int(*)(void*,char*,int))gzread,
(int(*)(void*,const char*,int))gzwrite,
(fpos_t(*)(void*,fpos_t,int))gzseek,
(int(*)(void*))gzclose);
#else
return fopen(path,mode);
#endif
}
You can use zlib, but it will require you to replace your I/O calls to be zlib-specific.
You have to open a pipe to do this. The basic flow in pseudocode is:
create pipe // man pipe
fork // man fork
if (parent) {
close the writing end of the pipe // man 2 close
read from the pipe // man 2 read
} else if (child) {
close the reading end of the pipe // man 2 close
overwrite the file descriptor for stdout with the writing end of the pipe // man dup2
call exec() with gzip and the relevant parameters // man 3 exec
}
You can use the man pages in the comments for more details on how to do this.
It's quite simple to use zlib to open .gz files. There's a reasonable manual over at zlib.net.
Here's a quick example to get you started:
#include <stdio.h>
#include <zlib.h>
int main( int argc, char **argv )
{
// we're reading 2 text lines, and a binary blob from the given file
char line1[1024];
char line2[1024];
int blob[64];
if (argc > 1)
{
const char *filename = argv[1];
gzFile gz_in = gzopen( filename, "rb" ); // same as fopen()
if (gz_in != NULL)
{
if ( gzgets( gz_in, line1, sizeof(line1) ) != NULL ) // same as fgets()
{
if ( gzgets( gz_in, line2, sizeof(line2) ) != NULL )
{
if ( gzfread( blob, sizeof(int), 64, gz_in ) == 64 ) // same as fread()
{
printf("Line1: %s", line1);
printf("Line2: %s", line2);
// ...etc
}
}
}
gzclose(gz_in); // same as fclose()
}
else
{
printf( "Failed to GZ-open [%s]\n", filename );
}
}
return 0;
}
Remember to link with zlib, under UNIX gcc ... -lz

Resources