Reading a text file backwards in C

What's the best way to read a file backwards in C? I know at first you may be thinking that this is no use whatsoever, but most logs etc. append the most recent data at the end of the file. I want to read in text from the file backwards, buffering it into lines - that is
abc
def
ghi
should read ghi, def, abc in lines.
So far I have tried:
#include <stdio.h>
#include <stdlib.h>
void read_file(FILE *fileptr)
{
char currentchar = '\0';
int size = 0;
while( currentchar != '\n' )
{
currentchar = fgetc(fileptr); printf("%c\n", currentchar);
fseek(fileptr, -2, SEEK_CUR);
if( currentchar == '\n') { fseek(fileptr, -2, SEEK_CUR); break; }
else size++;
}
char buffer[size]; fread(buffer, 1, size, fileptr);
printf("Length: %d chars\n", size);
printf("Buffer: %s\n", buffer);
}
int main(int argc, char *argv[])
{
if( argc < 2) { printf("Usage: backwards [filename]\n"); return 1; }
FILE *fileptr = fopen(argv[1], "rb");
if( fileptr == NULL ) { perror("Error:"); return 1; }
fseek(fileptr, -1, SEEK_END); /* Seek to END of the file just before EOF */
read_file(fileptr);
return 0;
}
In an attempt to simply read one line and buffer it. Sorry that my code is terrible, I am getting so very confused. I know that you would normally allocate memory for the whole file and then read in the data, but for large files that constantly change I thought it would be better to read directly (especially if I want to search for text in a file).
Thanks in advance
* Sorry forgot to mention this will be used on Linux, so newlines are just NL without CR. *

You could just pipe the input through the program tac, which is like cat but backwards!
http://linux.die.net/man/1/tac

I recommend a more portable (hopefully) way of file size determination since fseek(binaryStream, offset, SEEK_END) is not guaranteed to work. See the code below.
I believe that files should be at least minimally buffered at the kernel level (e.g. buffering at least one block per file by default), so seeks should not incur a significant amount of extra I/O and should only advance the file position internally. If the default buffering is not satisfactory, you may try to use setvbuf() to speed up the I/O.
#include <limits.h>
#include <string.h>
#include <stdio.h>
/* File must be open with 'b' in the mode parameter to fopen() */
long fsize(FILE* binaryStream)
{
long ofs, ofs2;
int result;
if (fseek(binaryStream, 0, SEEK_SET) != 0 ||
fgetc(binaryStream) == EOF)
return 0;
ofs = 1;
while ((result = fseek(binaryStream, ofs, SEEK_SET)) == 0 &&
(result = (fgetc(binaryStream) == EOF)) == 0 &&
ofs <= LONG_MAX / 4 + 1)
ofs *= 2;
/* If the last seek failed, back up to the last successfully seekable offset */
if (result != 0)
ofs /= 2;
for (ofs2 = ofs / 2; ofs2 != 0; ofs2 /= 2)
if (fseek(binaryStream, ofs + ofs2, SEEK_SET) == 0 &&
fgetc(binaryStream) != EOF)
ofs += ofs2;
/* Return -1 for files longer than LONG_MAX */
if (ofs == LONG_MAX)
return -1;
return ofs + 1;
}
/* File must be open with 'b' in the mode parameter to fopen() */
/* Set file position to size of file before reading last line of file */
char* fgetsr(char* buf, int n, FILE* binaryStream)
{
long fpos;
int cpos;
int first = 1;
if (n <= 1 || (fpos = ftell(binaryStream)) == -1 || fpos == 0)
return NULL;
cpos = n - 1;
buf[cpos] = '\0';
for (;;)
{
int c;
if (fseek(binaryStream, --fpos, SEEK_SET) != 0 ||
(c = fgetc(binaryStream)) == EOF)
return NULL;
if (c == '\n' && first == 0) /* accept at most one '\n' */
break;
first = 0;
if (c != '\r') /* ignore DOS/Windows '\r' */
{
unsigned char ch = c;
if (cpos == 0)
{
memmove(buf + 1, buf, n - 2);
++cpos;
}
memcpy(buf + --cpos, &ch, 1);
}
if (fpos == 0)
{
fseek(binaryStream, 0, SEEK_SET);
break;
}
}
memmove(buf, buf + cpos, n - cpos);
return buf;
}
int main(int argc, char* argv[])
{
FILE* f;
long sz;
if (argc < 2)
{
printf("filename parameter required\n");
return -1;
}
if ((f = fopen(argv[1], "rb")) == NULL)
{
printf("failed to open file \'%s\'\n", argv[1]);
return -1;
}
sz = fsize(f);
// printf("file size: %ld\n", sz);
if (sz > 0)
{
char buf[256];
fseek(f, sz, SEEK_SET);
while (fgetsr(buf, sizeof(buf), f) != NULL)
printf("%s", buf);
}
fclose(f);
return 0;
}
I've only tested this on Windows with 2 different compilers.

There are quite a few ways you could do this, but reading a byte at a time is definitely one of the poorer choices.
Reading the last, say, 4KB and then walking back up from the last character to the previous newline would be my choice.
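Another option is to mmap the file, and just pretend that the file is a lump of memory, and scan backwards in that. [You can also use madvise to hint at your access pattern so that the kernel prefetches data for you, though the standard hints describe forward or random access rather than backward reads.]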
If the file is VERY large (several gigabytes), you may want to only use a small portion of the file in mmap.
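The block-at-a-time idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned implementation: the names line_start_before and print_lines_backwards and the 4 KB BLOCK size are made up for the example, and it assumes a seekable file opened in binary mode on a platform (such as Linux) where fseek(..., SEEK_END) behaves as expected.

```c
#include <stdio.h>

#define BLOCK 4096  /* hypothetical block size; any reasonable value works */

/* Return the offset at which the line ending just before `end` starts,
 * scanning backwards one block at a time. Returns -1 on I/O error. */
static long line_start_before(FILE *f, long end)
{
    char block[BLOCK];
    long pos = end;

    while (pos > 0) {
        long n = pos < BLOCK ? pos : BLOCK;
        if (fseek(f, pos - n, SEEK_SET) != 0 ||
            fread(block, 1, (size_t)n, f) != (size_t)n)
            return -1;
        for (long i = n; i > 0; i--)
            if (block[i - 1] == '\n')
                return pos - n + i;    /* first byte after that newline */
        pos -= n;
    }
    return 0;                          /* no newline found: line starts the file */
}

/* Write the lines of `f` to `out`, last line first. Returns 0 on success. */
static int print_lines_backwards(FILE *f, FILE *out)
{
    long end;

    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0)
        return -1;
    if (end == 0)
        return 0;                      /* empty file */

    /* Do not treat the final newline as an extra empty line. */
    if (fseek(f, end - 1, SEEK_SET) != 0)
        return -1;
    if (fgetc(f) == '\n')
        end--;

    for (;;) {
        long start = line_start_before(f, end);
        if (start < 0 || fseek(f, start, SEEK_SET) != 0)
            return -1;
        for (long i = start; i < end; i++) {
            int c = fgetc(f);
            if (c == EOF)
                return -1;
            fputc(c, out);
        }
        fputc('\n', out);
        if (start == 0)
            break;
        end = start - 1;               /* offset of the newline before this line */
    }
    return 0;
}
```

A caller would open the log with fopen(path, "rb") and pass stdout as the second argument. The sketch re-reads each line after locating it, which keeps the code short; a more careful version would keep the block it has already read.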

If you want to learn how to do it, here's a Debian/Ubuntu example (for other distros, such as RPM-based ones, adapt as needed):
~$ which tac
/usr/bin/tac
~$ dpkg -S /usr/bin/tac
coreutils: /usr/bin/tac
~$ mkdir srcs
~$ cd srcs
~/srcs$ apt-get source coreutils
(clip apt-get output)
~/srcs$ ls
coreutils-8.13 coreutils_8.13-3.2ubuntu2.1.diff.gz coreutils_8.13-3.2ubuntu2.1.dsc coreutils_8.13.orig.tar.gz
~/srcs$ cd coreutils-8.13/
~/srcs/coreutils-8.13$ find . -name tac.c
./src/tac.c
~/srcs/coreutils-8.13$ less src/tac.c
That's not too long, a bit over 600 lines, and while it packs some advanced features, and uses functions from other sources, the reverse line buffering implementation seems to be in that tac.c source file.

FSEEKing for every byte sounds PAINFULLY slow.
If you've got the memory, just read the entire file into memory and either reverse it or scan it backwards.
Another option would be Windows memory mapped files.
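As a rough sketch of the "read it all, then scan backwards" approach in standard C: the helper names read_all and write_lines_reversed are invented for this example, and it assumes the whole file fits comfortably in memory and uses plain '\n' line endings.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything that remains in `f` into a malloc'd buffer.
 * Stores the byte count in *len; returns NULL on error. */
static char *read_all(FILE *f, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        used += fread(buf + used, 1, cap - used, f);
        if (used < cap)
            break;                     /* short read: end of file or error */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}

/* Write the lines held in buf[0..len) to `out`, last line first. */
static void write_lines_reversed(const char *buf, size_t len, FILE *out)
{
    if (len == 0)
        return;
    size_t end = len;
    if (buf[end - 1] == '\n')
        end--;                         /* ignore the final newline */
    for (;;) {
        size_t start = end;
        while (start > 0 && buf[start - 1] != '\n')
            start--;
        fwrite(buf + start, 1, end - start, out);
        fputc('\n', out);
        if (start == 0)
            break;
        end = start - 1;               /* index of the newline before this line */
    }
}
```

The caller owns the buffer returned by read_all and should free() it once the lines have been written.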

Related

How do I run C code on linux with input file from command line?

I'm trying to do some simple tasks in C and run them from the command line in Linux.
I'm having some problems with both C and running the code from the command line with a filename given as a parameter. I've never written code in C before.
Remove the even numbers from a file. The file name is transferred to
the program as a parameter in the command line. The program changes
this file.
How do I do these?
read from a file and write the results over the same file
read numbers and not digits from the file (ex: I need to be able to read "22" as a single input, not two separate chars containing "2")
give the filename through a parameter in Linux. (ex: ./main.c file.txt)
my attempt at writing the c code:
#include <stdio.h>
int main ()
{
FILE *f = fopen ("arr.txt", "r");
char c = getc (f);
int count = 0;
int arr[20];
while (c != EOF)
{
if(c % 2 != 0){
arr[count] = c;
count = count + 1;
}
c = getc (f);
}
for (int i=0; i<count; i++){
putchar(arr[i]);
}
fclose (f);
getchar ();
return 0;
}
Here's a complete program which meets your requirements:
write the results over the same file - It keeps a read and write position in the file and copies characters towards the file beginning in case numbers have been removed; at the end, the now shorter file has to be truncated. (Note that with large files, it will be more efficient to write to a second file.)
read numbers and not digits from the file - It is not necessary to read whole numbers, it suffices to store the write start position of a number (this can be done at every non-digit) and the parity of the last digit.
give the filename through a parameter - If you define int main(int argc, char *argv[]), the first parameter is in argv[1] if argc is at least 2.
#include <stdio.h>
#include <ctype.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
if (argc < 2) return 1; // no argument given
FILE *f = fopen(argv[1], "rb+");
if (!f) return 1; // if fopen failed
// read, write and number position
long rpos = 0, wpos = 0, npos = 0;
int even = 0, c; // int to hold EOF
while (c = getc(f), c != EOF)
{
if (isdigit(c)) even = c%2 == 0;
else
{
if (even) wpos = npos, even = 0;
npos = wpos+1; // next may be number
}
fseek(f, wpos++, SEEK_SET);
putc(c, f);
fseek(f, ++rpos, SEEK_SET);
}
ftruncate(fileno(f), wpos); // shorten the file
}
I'd do that like this (removing extra declarations => micro optimizations)
/**
* Check if file is available.
*/
if (f == NULL)
{
printf("File is not available \n");
}
else
{
/**
* Populate array with the odd values (the even ones are dropped).
*/
while ((ch = fgetc(f)) != EOF)
{
if (ch % 2 != 0)
push(ch, &arr);
}
/**
* Write those values to the file (n is the number of stored elements).
*/
for (int i = 0; i < (int)n; i++)
fprintf(f, "%c", arr[i]);
}
Push implementation:
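/* Assumption: n is a file-scope size_t holding the number of elements in *arr (initially 0). */
void push(int el, int **arr)
{
int *arr_temp = *arr;
*arr = malloc(sizeof(int) * (n + 1)); /* room for the old elements plus the new one */
if (*arr == NULL)
{
*arr = arr_temp; /* allocation failed: keep the old array */
return;
}
(*arr)[0] = el; /* the new element goes to the front */
for (size_t i = 0; i < n; i++)
{
(*arr)[i + 1] = arr_temp[i];
}
free(arr_temp);
n++;
}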
To read and write the same file without closing and reopening it, open it in an update mode. Note that "w+" (write and read) clears the file's content as soon as it is opened, so the numbers would be gone before you could read them; "r+" (read and write) keeps the existing content.
So, change the line where you open the file to this:
FILE *f = fopen ("arr.txt", "r+");
You should look for ways of implementing dynamic arrays (pointers and memory management).
With this example you could simply go ahead and write yourself, inside the main loop, a temporary variable that stores a sequence of numbers, and stack those values
Something like this (pseudocode, have fun :)):
DELIMITER one of (',' | '|' | '.' | etc);
char[] temp;
if(ch not DELIMITER)
push ch on temp;
else
push temp to arr and clear its content;
Hope this was useful.
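One way the dynamic-array idea might look in C is sketched below. The names IntVec, vec_push and read_odd_numbers are hypothetical, and the sketch assumes the file holds whitespace-separated integers, so that fscanf with "%d" reads "22" as one number rather than two digits.

```c
#include <stdio.h>
#include <stdlib.h>

/* A minimal growable array of int (illustrative type, not a library one). */
typedef struct {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} IntVec;

/* Append x, doubling the capacity when full. Returns 0 on success. */
static int vec_push(IntVec *v, int x)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 16;
        int *p = realloc(v->data, ncap * sizeof *p);
        if (p == NULL)
            return -1;                 /* old array is still valid */
        v->data = p;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Read integers from `in` and keep only the odd ones, in file order. */
static int read_odd_numbers(FILE *in, IntVec *v)
{
    int x;

    while (fscanf(in, "%d", &x) == 1)
        if (x % 2 != 0 && vec_push(v, x) != 0)
            return -1;
    return 0;
}
```

After reading, the file could be closed, reopened with mode "w", and the kept numbers written back with fprintf(f, "%d\n", v.data[i]); free(v.data) when finished.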

C parse comments from textfile

So I am trying to implement a very trivial parser for reading a file and executing some commands. I guess very similar to bash scripts, but much simpler. I am having trouble figuring out how to tokenise the contents of a file given you are able to have comments denoted by #. To give you an example of how a source file might look
# Script to move my files across
# Author: Person Name
# First delete files
"rm -rf ~/code/bin/debug";
"rm -rf ~/.bin/tmp"; #deleting temp to prevent data corruption
# Dump file contents
"cat ~/code/rel.php > log.txt";
So far here is my code. Note that I am essentially using this little project as a means of becoming more comfortable and familiar with C, so pardon any obvious flaws in the code. I would appreciate the feedback.
// New line.
#define NL '\n'
// Quotes.
#define QT '"'
// Ignore comment.
#define IGN '#'
int main(int argc, char *argv[]) {
if (argc != 2) {
show_help();
return 0;
}
FILE *fptr = fopen(argv[1], "r");
char *buff;
size_t n = 0;
int readlock = 0;
int qread = 0;
int c; // int rather than char, so EOF can be told apart from valid characters
if (fptr == NULL){
printf("Error: invalid file provided %s for reading", argv[1]);
exit(1);
}
fseek(fptr, 0, SEEK_END);
long f_size = ftell(fptr);
fseek(fptr, 0, SEEK_SET);
buff = calloc(1, f_size + 1); // +1 keeps the buffer NUL-terminated for printf
// Read file contents.
// Stripping naked whitespace and comments.
// qread is when in quotation mode. Everything is stored even '#' until EOL or EOF.
while ((c = fgetc(fptr)) != EOF) {
switch(c) {
case IGN :
if (qread == 0) {
readlock = 1;
}
else {
buff[n++] = c;
}
break;
case NL :
readlock = 0;
qread = 0;
break;
case QT :
if ((readlock == 0 && qread == 0) || (readlock == 0 && qread == 1)) {
// Activate quote mode.
qread = 1;
buff[n++] = c;
}
else {
qread = 0;
}
break;
default :
if ((qread == 1 && readlock == 0) || (readlock == 0 && !isspace(c))) {
buff[n++] = c;
}
break;
}
}
fclose(fptr);
printf("Buffer contains %s \n", buff);
free(buff);
return 0;
}
So the above solution works, but my question is: is there a better way to achieve the desired outcome? At the moment I don't actually "tokenize" anything. Does the current implementation lack the logic needed to create tokens based on the characters?
It is way easier to read your file by whole lines:
char line[1024];
// fgets returns NULL at end of file or on error, so test it directly
// instead of looping on feof()
while (fgets(line, sizeof line, fptr) != NULL)
{
    if (line[0] == '#') // comment line
        continue;       // skip it
    //... handle command in line here
}
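Handling each line could then look something like the sketch below, which pulls the first double-quoted command out of a line and treats '#' as a comment only when it appears outside the quotes. The function name extract_command and its return codes are assumptions made for this example; it does not handle escaped quotes.

```c
#include <stddef.h>

/* Copy the first double-quoted command on `line` into `out`.
 * Returns 1 if a command was found, 0 for blank or comment-only lines,
 * and -1 if the quote is unterminated or the command does not fit. */
static int extract_command(const char *line, char *out, size_t outsz)
{
    const char *p = line;
    size_t n = 0;

    if (outsz == 0)
        return -1;

    /* Skip to the opening quote; a '#' seen first starts a comment. */
    while (*p != '\0' && *p != '"') {
        if (*p == '#')
            return 0;
        p++;
    }
    if (*p != '"')
        return 0;                      /* no command on this line */
    p++;

    /* Inside quotes everything is kept, including '#'. */
    while (*p != '\0' && *p != '"' && *p != '\n') {
        if (n + 1 >= outsz)
            return -1;                 /* would overflow `out` */
        out[n++] = *p++;
    }
    if (*p != '"')
        return -1;                     /* closing quote missing */
    out[n] = '\0';
    return 1;
}
```

Inside the fgets loop, a return value of 1 would mean `out` holds one command token ready to execute or store; anything after the closing quote, such as the ';' and a trailing comment, is ignored.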

Reading and writing to a file at the same time in C

I'm supposed to swap every two lines in a file until just one line remains or all lines are exhausted. I don't want to use another file in doing so.
Here's my code:
#include <stdio.h>
int main() {
FILE *fp = fopen("this.txt", "r+");
int i = 0;
char line1[100], line2[100];
fpos_t pos;
fgetpos(fp, &pos);
//to get the total line count
while (!feof(fp)) {
fgets(line1, 100, fp);
i++;
}
i /= 2; //no. of times to run the loop
rewind(fp);
while (i-- > 0) { //trying to use !feof(fp) condition to break the loop results in an infinite loop
fgets(line1, 100, fp);
fgets(line2, 100, fp);
fsetpos(fp, &pos);
fputs(line2, fp);
fputs(line1, fp);
fgetpos(fp, &pos);
}
fclose(fp);
return 0;
}
content in this.txt:
aaa
b
cc
ddd
ee
ffff
gg
hhhh
i
jj
content after running the program
b
aaa
ddd
cc
ddd
c
c
c
i
jj
I've even tried using fseek in place of fgetpos just to get the same wrong result.
From what I figured, after the second while loop has run two times (i.e the first four lines have been processed), the cursor is rightfully at 17th byte where it is supposed to be (as returned by the call to ftell(fp)) and even the file contents after the 4th line are unchanged and somehow for some reason when fgets is called when the loop is running for the third time, the contents read into arrays line1 and line2 are "c\n" and "ddd\n" respectively.
AGAIN, I don't want to use another file to accomplish this, I just need to figure out what exactly is going wrong behind the screen
Any leads would be appreciated. Thank you.
There are multiple problems in your code:
You do not check if fopen() succeeds, risking undefined behavior.
The loop to determine the total number of lines is incorrect. Learn why here: Why is “while ( !feof (file) )” always wrong?
You do not actually need to compute the total number of lines.
You should call fflush() to write the contents back to the file before changing from writing back to reading.
The C Standard specifies this restriction for files open in update mode:
7.21.5.3 The fopen function
[...] output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
This explains why just reading the file position after writing the lines in reverse order causes problems. Calling fflush() should solve this issue.
Here is a corrected version:
#include <stdio.h>
int main(void) {
FILE *fp;
char line1[100], line2[100];
fpos_t pos;
fp = fopen("this.txt", "r+");
if (fp == NULL) {
fprintf(stderr, "cannot open this.txt\n");
return 1;
}
while (fgetpos(fp, &pos) == 0 &&
fgets(line1, sizeof line1, fp) != NULL &&
fgets(line2, sizeof line2, fp) != NULL) {
fsetpos(fp, &pos);
fputs(line2, fp);
fputs(line1, fp);
fflush(fp);
}
fclose(fp);
return 0;
}
The buffer may not necessarily be flushed when changing the current position of the file. So it must be explicitly flushed.
E.g., use fflush(fp);
Change
fputs(line2,fp);
fputs(line1,fp);
to
fputs(line2,fp);
fputs(line1,fp);
fflush(fp);
Why not use two file pointers, both pointing to the same file, one to read and one to write? No need to keep track of the file position, no need to seek around, no need to flush then.
This approach spares you a lot of complicated stuff. Those unnecessary efforts are better invested in some sophisticated error checking/logging like below ;-):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
int result = EXIT_SUCCESS;
size_t blocks = 0;
int l1_done = 0;
int l2_done = 0;
FILE *fpin = fopen("this.txt", "r");
FILE *fpout = fopen("this.txt", "r+");
if (NULL == fpin)
{
result = EXIT_FAILURE;
perror("fopen() for reading failed");
}
if (NULL == fpout)
{
result = EXIT_FAILURE;
perror("fopen() for writing failed");
}
while (EXIT_SUCCESS == result && !l1_done && !l2_done)
{
result = EXIT_FAILURE;
char line1[100];
char line2[100];
if ((l1_done = (NULL == fgets(line1, sizeof line1, fpin))))
{
if (ferror(fpin))
{
fprintf(stderr, "Reading line %zu failed.\n", 2*blocks);
break;
}
}
if ((l2_done = (NULL == fgets(line2, sizeof line2, fpin))))
{
if (ferror(fpin))
{
fprintf(stderr, "Reading line %zu failed.\n", 2*blocks + 1);
break;
}
}
if (!l1_done) /* line1 only holds valid data if the read succeeded */
{
size_t len = strlen(line1);
if (((sizeof line1 - 1) == len) && ('\n' != line1[len - 1]))
{
fprintf(stderr, "Line %zu too long or new-line missing.\n", 2*blocks);
break;
}
}
if (!l2_done) /* same check for the second line */
{
size_t len = strlen(line2);
if (((sizeof line2 - 1) == len) && ('\n' != line2[len - 1]))
{
fprintf(stderr, "Line %zu too long or new-line missing.\n", 2*blocks + 1);
break;
}
}
if (!l2_done)
{
if (EOF == fputs(line2, fpout))
{
fprintf(stderr, "Writing line %zu as line %zu failed.\n", 2*blocks + 1, 2*blocks);
break;
}
}
if (!l1_done)
{
if (EOF == fputs(line1, fpout))
{
fprintf(stderr, "Writing line %zu as line %zu failed.\n", 2*blocks, 2*blocks + 1);
break;
}
}
++blocks;
result = EXIT_SUCCESS;
}
if (EXIT_SUCCESS == result && !l1_done && l2_done)
{
fprintf(stderr, "Odd number of lines.\n");
}
fclose(fpin); /* Perhaps add error checking here as well ... */
fclose(fpout); /* Perhaps add error checking here as well ... */
return result;
}

Get the length of each line in file with C and write in output file

I am a biology student and I am trying to learn perl, python and C and also use the scripts in my work. So, I have a file as follows:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this, that is the name of each sequence and the count of characters in each line and printing the total number of sequences in the end of the file.
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
I could make the perl and python scripts work, this is the python script as an example:
#!/usr/bin/python
import sys
my_file = open(sys.argv[1]) #open the file
my_output = open(sys.argv[2], "w") #open output file
total_sequence_counts = 0
for line in my_file:
if line.startswith(">"):
sequence_name = line.rstrip('\n').replace(">","")
total_sequence_counts += 1
continue
dna_length = len(line.rstrip('\n'))
my_output.write(sequence_name + " " + str(dna_length) + '\n')
my_output.write("Total number of sequences = " + str(total_sequence_counts) + '\n')
Now, I want to write the same script in C, this is what I have achieved so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
input = FILE *fopen(const char *filename, "r");
output = FILE *fopen(const char *filename, "w");
double total_sequence_counts = 0;
char sequence_name[];
char line [4095]; // set a temporary line length
char buffer = (char *) malloc (sizeof(line) +1); // allocate some memory
while (fgets(line, sizeof(line), filename) != NULL) { // read until new line character is not found in line
buffer = realloc(*buffer, strlen(line) + strlen(buffer) + 1); // realloc buffer to adjust buffer size
if (buffer == NULL) { // print error message if memory allocation fails
printf("\n Memory error");
return 0;
}
if (line[0] == ">") {
sequence_name = strcpy(sequence_name, &line[1]);
total_sequence_counts += 1
}
else {
double length = strlen(line);
fprintf(output, "%s \t %ld", sequence_name, length);
}
fprintf(output, "%s \t %ld", "Total number of sequences = ", total_sequence_counts);
}
int fclose(FILE *input); // when you are done working with a file, you should close it using this function.
return 0;
int fclose(FILE *output);
return 0;
}
But this code, of course, is full of mistakes. My problem is that despite studying a lot, I still can't properly understand and use memory allocation and pointers, so I know I have mistakes especially in that part. It would be great if you could comment on my code and show how it can be turned into a program that actually works. By the way, in my actual data, the length of each line is not defined, so I need to use malloc and realloc for that purpose.
For a simple program like this, where you look at short lines one at a time, you shouldn't worry about dynamic memory allocation. It is probably good enough to use local buffers of a reasonable size.
Another thing is that C isn't particularly suited for quick-and-dirty string processing. For example, there isn't a strstrip function in the standard library. You usually end up implementing such behaviour yourself.
An example implementation looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
char line[MAXLEN]; /* Current line buffer */
char ref[MAXLEN] = ""; /* Sequence reference buffer */
int nseq = 0; /* Sequence counter */
if (argc != 3) {
fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
exit(1);
}
in = fopen(argv[1], "r");
if (in == NULL) {
fprintf(stderr, "Couldn't open %s.\n", argv[1]);
exit(1);
}
out = fopen(argv[2], "w");
if (out == NULL) {
fprintf(stderr, "Couldn't open %s for writing.\n", argv[2]);
exit(1);
}
while (fgets(line, sizeof(line), in)) {
int len = strlen(line);
/* Strip whitespace from end */
while (len > 0 && isspace(line[len - 1])) len--;
line[len] = '\0';
if (line[0] == '>') {
/* First char is '>': copy from second char in line */
strcpy(ref, line + 1);
} else {
/* Other lines are sequences */
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
A lot of code is about enforcing arguments and opening and closing files. (You could cut out a lot of code if you used stdin and stdout with file redirections.)
The core is the big while loop. Things to note:
fgets returns NULL on error or when the end of file is reached.
The first lines determine the length of the line and then remove white-space from the end.
It is not enough to decrement length, at the end the stripped string must be terminated with the null character '\0'
When you check the first character in the line, you should check against a char, not a string. In C, single and double quotes are not interchangeable. ">" is a string literal of two characters, '>' and the terminating '\0'.
When dealing with countable entities like chars in a string, use integer types, not floating-point numbers. (I've used (signed) int here, but because there can't be a negative number of chars in a line, it might have been better to have used an unsigned type.)
The notation line + 1 is equivalent to &line[1].
The code I've shown doesn't check that there is always one reference per sequence. I'll leave this as an exercise to the reader.
For a beginner, this can be quite a lot to keep track of. For small text-processing tasks like yours, Python and Perl are definitely better suited.
Edit: The solution above won't work for long sequences; it is restricted to MAXLEN characters. But you don't need dynamic allocation if you only need the length, not the contents of the sequences.
Here's an updated version that doesn't read lines, but reads characters instead. In '>' context, it stores the reference. Otherwise it just keeps a count:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h> /* for isspace() */
#define MAXLEN 80 /* Maximum reference name length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
int nseq = 0; /* Sequence counter */
char ref[MAXLEN]; /* Reference name */
in = fopen(argv[1], "r");
out = fopen(argv[2], "w");
/* Snip: Argument and file checking as above */
while (1) {
int c = getc(in);
if (c == EOF) break;
if (c == '>') {
int n = 0;
c = fgetc(in);
while (c != EOF && c != '\n') {
if (n < sizeof(ref) - 1) ref[n++] = c;
c = fgetc(in);
}
ref[n] = '\0';
} else {
int len = 0;
int n = 0;
while (c != EOF && c != '\n') {
n++;
if (!isspace(c)) len = n;
c = fgetc(in);
}
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
Notes:
fgetc reads a single byte from a file and returns this byte or EOF when the file has ended. In this implementation, that's the only reading function used.
Storing a reference string is implemented via fgetc here too. You could probably use fgets after skipping the initial angle bracket, too.
The counting just reads bytes without storing them. n is the total count, len is the count up to the last non-space. (Your lines probably consist only of ACGT without any trailing space, so you could skip the test for space and use n instead of len.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
FILE *my_file = fopen(argv[1], "r");
FILE *my_output = fopen(argv[2], "w");
int total_sequence_coutns = 0;
char *sequence_name;
int dna_length;
char *line = NULL;
size_t size = 0;
while(-1 != getline(&line, &size, my_file)){
if(line[0] == '>'){
sequence_name = strdup(strtok(line, ">\n"));
total_sequence_coutns +=1;
continue;
}
dna_length = strlen(strtok(line, "\n"));
fprintf(my_output, "%s %d\n", sequence_name, dna_length);
free(sequence_name);
}
fprintf(my_output, "Total number of sequences = %d\n", total_sequence_coutns);
fclose(my_file);
fclose(my_output);
free(line);
return (0);
}
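getline(), used above, is a POSIX function rather than standard C. Where it is unavailable, the malloc/realloc approach the question asks about could be sketched as below; read_line is a hypothetical helper name, and the starting capacity of 128 is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from `f` into a malloc'd string.
 * The trailing newline is removed. Returns NULL at end of file
 * or on allocation failure; the caller frees the result. */
static char *read_line(FILE *f)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), f) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';         /* complete line: strip the newline */
            return buf;
        }
        /* No newline yet: grow the buffer and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (len > 0)
        return buf;                    /* last line had no newline */
    free(buf);
    return NULL;
}
```

With this helper, the main loop would call read_line until it returns NULL, test line[0] == '>' for a header, and use strlen(line) for the sequence length before freeing the line.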

Piping log output through a C program for easy log rotation

I'm trying to make it really easy to logrotate some of my apps that log via bash redirection. Basically, I have a C program that reads STDIN into a buffer. It reads this buffer, and whenever it encounters a newline, it will write the output it has gathered to a file.
The difference in this program is that it does not leave the file open. It opens it for appending each time a new line is encountered. This works great with the logrotate utility, but I'm wondering if there's some sort of horrible unforeseen issue I'm not accounting for that I'll run into later on.
Is it better just to implement signal handling in this utility and have logrotate send it a SIGHUP? Are there horrible performance penalties to what I'm doing?
So normally where you'd do:
./app >> output.log
With the logger util you do:
./app | ./mylogger output.log
Although I'm not too bad at C, I'm not very well versed in its best practices. Any guidance would be greatly appreciated.
Here's the source:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#define BUFSIZE 1024
#define MAX_WRITE_FAILS 3
/**
* outputs the given content to the specified file.
*/
int file_output(char *filename, char *content, size_t content_length)
{
FILE *fp;
fp = fopen(filename, "a");
content[content_length + 1] = '\0';
if(fp == NULL) return errno;
fwrite(content, sizeof(char), content_length, fp);
fclose(fp);
return 0;
}
/**
* Loops over STDIN and whenever it finds a newline, sends the current content
* buffer to the file specified on the command line.
*/
int main(int argc, char *argv[])
{
int i;
char buffer[BUFSIZE];
char *content = malloc(sizeof(char) * BUFSIZE);
size_t content_size = 0;
int content_buf_size = BUFSIZE;
int write_failures = 0;
char *file;
if(argc < 2)
{
fprintf(stderr, "Usage: logger <file>");
exit(1);
}
file = argv[1];
// loop over STDIN
while(fgets(buffer, BUFSIZE, stdin))
{
int output_err;
int buflength = strlen(buffer);
// loop over character for character, searching for newlines and
// appending our buffer to the output content as we go along
for(i = 0; i < buflength; i++)
{
char *old = content;
// check if we have a newline or end of string
if(buffer[i] == '\n' || buffer[i] == '\0' || (i != (buflength - 1) && buffer[i] == '\r' && buffer[i+1] == '\n'))
{
content[content_size] = '\n';
output_err = file_output(file, content, content_size + 1);
if(output_err == 0)
{
// success! reset the content size (ie more or less resets
// the output content string)
content_size = 0;
write_failures = 0;
}
else
{
// write failed, try to keep going. this will preserve our
// newline so that the next newline we encounter will write
// both lines (this AND and the next).
content_size++;
write_failures++;
}
}
if(write_failures >= MAX_WRITE_FAILS)
{
fprintf(stderr, "Failed to write output to file %d times (errno: %d). Quitting.\n", write_failures, output_err);
exit(3);
}
if(buffer[i] != '\n' && buffer[i] != '\r' && buffer[i] != '\0')
{
// copy buffer into content (if it's not a newline/null)
content[content_size] = buffer[i];
content_size++;
}
// check if we're pushing the limits of our content buffer
if(content_size >= content_buf_size - 1)
{
// we need to up the size of our output buffer
content_buf_size += BUFSIZE;
content = (char *)realloc(content, sizeof(char) * content_buf_size);
if(content == NULL)
{
fprintf(stderr, "Failed to reallocate buffer memory.\n");
free(old);
exit(2);
}
}
}
}
return 0;
}
Thanks!
Since my suggestion in the comments turned out to be what you needed, I am adding it as an answer, with more of an explanation.
When you have a logging application which can not be told to close its logfile (usually via SIGHUP), you can use the 'copytruncate' option in your logrotate.conf.
Here is the description from the man page:
Truncate the original log file in place after creating a copy,
instead of moving the old log file and optionally creating a new
one, It can be used when some program can not be told to close
its logfile and thus might continue writing (appending) to the
previous log file forever. Note that there is a very small time
slice between copying the file and truncating it, so some log-
ging data might be lost. When this option is used, the create
option will have no effect, as the old log file stays in place.
Source: http://linuxcommand.org/man_pages/logrotate8.html
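For comparison, the SIGHUP alternative raised in the question could look roughly like this on a POSIX system. It is a sketch only: install_sighup_handler and log_line are made-up names, and the idea is simply to keep the file open between lines and reopen it when a flag set by the signal handler is seen.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, checked by the writer. */
static volatile sig_atomic_t reopen_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    reopen_requested = 1;   /* only set a flag; the real work happens later */
}

/* Install the SIGHUP handler. Returns 0 on success. */
static int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted reads on stdin */
    return sigaction(SIGHUP, &sa, NULL);
}

/* Append one line to the log, (re)opening `path` first if the file is
 * not open yet or a SIGHUP has arrived since the last call. */
static int log_line(FILE **fp, const char *path, const char *line)
{
    if (*fp == NULL || reopen_requested) {
        reopen_requested = 0;
        if (*fp != NULL)
            fclose(*fp);
        *fp = fopen(path, "a");
        if (*fp == NULL)
            return -1;
    }
    if (fputs(line, *fp) == EOF)
        return -1;
    return fflush(*fp) == 0 ? 0 : -1;
}
```

With this design the file is opened once per rotation instead of once per line, and logrotate would send the signal from a postrotate script. The cost is the extra signal-handling code, which copytruncate avoids.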

Resources