I am trying to implement a very simple parser that reads a file and executes some commands, similar to a bash script but much simpler. I am having trouble figuring out how to tokenise the contents of a file, given that it can contain comments denoted by #. Here is an example of how a source file might look:
# Script to move my files across
# Author: Person Name
# First delete files
"rm -rf ~/code/bin/debug";
"rm -rf ~/.bin/tmp"; #deleting temp to prevent data corruption
# Dump file contents
"cat ~/code/rel.php > log.txt";
So far here is my code. Note that I am essentially using this little project as a means of becoming more comfortable and familiar with C, so pardon any obvious flaws in the code. I would appreciate the feedback.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
// New line.
#define NL '\n'
// Quotes.
#define QT '"'
// Ignore comment.
#define IGN '#'
int main(int argc, char *argv[]) {
if (argc != 2) {
show_help();
return 0;
}
FILE *fptr = fopen(argv[1], "r");
char *buff;
size_t n = 0;
int readlock = 0;
int qread = 0;
int c; // int, not char, so that EOF can be told apart from real characters
if (fptr == NULL){
printf("Error: invalid file provided %s for reading", argv[1]);
exit(1);
}
fseek(fptr, 0, SEEK_END);
long f_size = ftell(fptr);
fseek(fptr, 0, SEEK_SET);
buff = calloc(1, f_size + 1); // +1 keeps a terminating '\0' for printf
// Read file contents.
// Stripping naked whitespace and comments.
// qread is when in quotation mode. Everything is stored even '#' until EOL or EOF.
while ((c = fgetc(fptr)) != EOF) {
switch(c) {
case IGN :
if (qread == 0) {
readlock = 1;
}
else {
buff[n++] = c;
}
break;
case NL :
readlock = 0;
qread = 0;
break;
case QT :
if ((readlock == 0 && qread == 0) || (readlock == 0 && qread == 1)) {
// Activate quote mode.
qread = 1;
buff[n++] = c;
}
else {
qread = 0;
}
break;
default :
if ((qread == 1 && readlock == 0) || (readlock == 0 && !isspace(c))) {
buff[n++] = c;
}
break;
}
}
fclose(fptr);
printf("Buffer contains %s \n", buff);
free(buff);
return 0;
}
So the above solution works, but my question is: is there a better way to achieve the desired outcome? At the moment I don't actually "tokenize" anything. Does the current implementation lack the logic needed to create tokens from the characters?
It is way easier to read your file by whole lines:
char line[1024];
// fgets returns NULL at end of file or on error, so test it directly
// instead of looping on feof()
while (fgets(line, sizeof line, fptr))
{
if (line[0] == '#') // comment
continue; // skip it
//... handle command in line here
}
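To go one step further than skipping comment lines, each line can be reduced to the command between its quotes. The following is only a sketch under the assumption that a line holds at most one quoted command, as in the sample script; the name extract_command is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the text between the first pair of double quotes on `line`
 * into `out` (at most outsz - 1 bytes). A '#' seen before the opening
 * quote starts a comment, so the rest of the line is ignored; a '#'
 * inside the quotes is kept. Returns 1 if a command was found, else 0.
 * (Hypothetical helper, assuming one quoted command per line.) */
int extract_command(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);

    const char *p = line;
    while (*p != '\0' && *p != '"') {
        if (*p == '#')
            return 0;            /* comment before any command */
        p++;
    }
    if (*p != '"')
        return 0;                /* blank line or no quoted text */
    p++;                         /* step over the opening quote */

    const char *end = strchr(p, '"');
    if (end == NULL)
        return 0;                /* unterminated quote: treat as no command */

    size_t len = (size_t)(end - p);
    if (len >= outsz)
        len = outsz - 1;         /* truncate rather than overflow */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Inside the fgets loop this could be called as extract_command(line, cmd, sizeof cmd), with cmd being a local char array; each non-zero return then yields one token (the command string).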
I am trying to read a file in C and then store values after a certain word is read. For example, in my input.txt file, the following are the contents:
GREETINGS
Hello 13
Namaste 24
Hola 36
FLAVORS
Vanilla 23
Chocolate 78
I want to read past GREETINGS and then store Hello and its value 13 then Namaste and its value 24, etc. And then read past FLAVORS and store Vanilla and its value 23, etc. And so on...
I know how to open a file but not sure how to proceed from there.
void readInput() {
char input_file[100];
FILE *fp;
printf("Please enter the name of the file you would like to read: \n");
scanf("%s", input_file);
printf("You entered: %s\n", input_file);
fp = fopen(input_file, "r");
if (fp == NULL) {
printf("File does not exist.\n");
exit(1);
}
else
printf("This file exists!\n");
}
From your sample data it is not clear whether the data lines always have the same format. So I
suggest that you store the current mode (that is, GREETINGS, FLAVORS, etc.) in a variable and call
a specific parsing function for each mode, depending on the value of the current
mode.
I'd create this enum first
typedef enum {
MODE_GREETINGS = 0, // <-- important that this value is 0
MODE_FLAVORS,
...
MODE_INVALID
} Modes;
and a variable (this can be a global variable) with all modes:
const char *modes_lookup[] = {
"GREETINGS",
"FLAVORS",
...
};
Then I'd write a function that returns the mode:
Modes get_mode(const char *src)
{
if(src == NULL)
return MODE_INVALID;
for(size_t i = 0; i < sizeof modes_lookup / sizeof modes_lookup[0]; ++i)
{
if(strcmp(src, modes_lookup[i]) == 0)
return i;
}
return MODE_INVALID;
}
Now you can parse the file like this:
char line[1024];
Modes mode = MODE_INVALID;
size_t lineno = 0;
while(fgets(line, sizeof line, fp))
{
lineno++;
// removing newline
line[strcspn(line, "\n")] = 0;
Modes newmode = get_mode(line);
// check if line is a GREETINGS, FLAVOR, etc line
if(newmode != MODE_INVALID)
{
// sets the new mode and continue with the next line
mode = newmode;
continue;
}
// parsing depending on the current mode
if(mode == MODE_GREETINGS)
{
if(parse_greetings(line) == 0)
{
fprintf(stderr, "Cannot parse data on line %zu\n", lineno);
break;
}
continue;
}
if(mode == MODE_FLAVORS) {
if(parse_flavor(line) == 0)
{
fprintf(stderr, "Cannot parse data on line %zu\n", lineno);
break;
}
continue;
}
// more (if mode == MODE_xxx) cases as needed
if(mode == MODE_INVALID)
{
// mode == MODE_INVALID
fprintf(stderr, "File does not start with a mode line.\n");
break;
}
}
Then you've got to write the parse_greetings, parse_flavor, parse_xxx functions. Like I said
before, it's not clear from your sample whether the format will always be the
same, so whether you use strtok or sscanf depends on the format. Also,
your description of how and where you want to store the values is very vague, so
I don't know what would be best for you.
The easiest would be sscanf; if the data lines get more complex, you can then
use strtok.
For example with sscanf
int parse_greetings(const char *line)
{
if(line == NULL)
return 0;
char name[20];
int val;
if(sscanf(line, "%19s %d", name, &val) != 2) // %d needs the address of val
return 0;
// do whatever you need with 'name' and 'val'
return 1;
}
And with strtok:
int parse_greetings(char *line)
{
if(line == NULL)
return 0;
char name[20];
int val;
const char *delim = " \t\r\n";
char *token = strtok(line, delim);
// line has only delimiters
if(token == NULL)
return 0;
strncpy(name, token, sizeof name / sizeof name[0]);
name[(sizeof name / sizeof name[0]) - 1] = 0;
token = strtok(NULL, delim);
if(token == NULL)
return 0;
char *end;
val = strtol(token, &end, 0);
// check if token is a number
if(*end != '\0')
return 0;
// do whatever you need with 'name' and 'val'
return 1;
}
I'm not sure what exact criteria you're using to choose which lines to skip, but the next step for reading lines is to use fscanf or fgets. This tutorial should help you get started with reading files in C.
Edit, to make this more relevant after your comment:
fgets may be a better choice. You'll need to read each line, then loop through each character, to check if they are all uppercase.
If they are not all uppercase, the line is a data line; since it has already been read with fgets, you can scan it with sscanf using the format %s %d.
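A possible shape for that check is sketched below. It assumes header lines consist only of uppercase letters and that data lines look like "Hello 13"; it uses sscanf on the line already read by fgets rather than fscanf on the stream. The names is_header_line and parse_data_line are made up for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` is non-empty and made only of uppercase letters
 * (a trailing line ending is ignored), otherwise 0. */
int is_header_line(const char *line)
{
    size_t len = strcspn(line, "\r\n");   /* length without the line ending */
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (!isupper((unsigned char)line[i]))
            return 0;
    }
    return 1;
}

/* Parses a data line such as "Hello 13" into name and value.
 * Returns 1 if both parts were read, otherwise 0. */
int parse_data_line(const char *line, char name[32], int *value)
{
    return sscanf(line, "%31s %d", name, value) == 2;
}
```

In the reading loop, a line for which is_header_line returns 1 would switch the current section, and every other line would be handed to parse_data_line.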
Alright, so basically what I have to do is change all the numbers of a text file to dollar sign, I know how to scan for the specific character but I am stuck on how to replace that specific character with dollar sign. I don't want to use fseek or any library commands, how do I proceed and why isn't my code working?
#include<stdio.h>
main()
{
FILE* fptr;
char filename[50];
char string[100];
int i;
printf("Enter the name of the file to be opened: ");
scanf("%s",filename);
fptr=fopen(filename,"w");
if(fptr==NULL)
{
printf("Error occurred, try again.");
return 0;
}
fgets(string,"%s",fptr);
do
{
if(string[i]>='1' && string[i]<='9')
{
string[i]='$';
}
}
while(i!=100);
fclose(fptr);
}
At first glance there are basically two approaches: the first is to use fseek() to overwrite characters in place, and the second is to read the file in its entirety, replace the characters that match your criteria, and finally write the result back in one shot. You can choose either approach depending on your needs: for large files prefer the former, and for small files the latter is fine.
Here's example code for the former:
#include <stdio.h>
int main() {
// Open the file
FILE *fptr = fopen("input.txt", "r+");
if (!fptr) {
printf("Error occurred, try again.");
return -1;
}
int c;
// Iterate through all characters in a file
while ((c = getc(fptr)) != EOF) {
// Check if this current character is a digit?
if (c >= '0' && c <= '9') {
// Go one character back
if (fseek(fptr, -1, SEEK_CUR) != 0) {
fprintf(stderr, "Error while going one char back\n");
return -1;
}
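// Replace the character with a '$'
if (fputc('$', fptr) == EOF) {
fprintf(stderr, "Error while trying to replace\n");
return -1;
}
// On an update stream a read may not directly follow a write;
// a positioning call (or fflush) is required in between
if (fseek(fptr, 0, SEEK_CUR) != 0) {
fprintf(stderr, "Error while switching back to reading\n");
return -1;
}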
}
}
// Flush the changes to the disk
if (fflush(fptr) != 0) {
fprintf(stderr, "Error while flushing to disk\n");
return -1;
}
// Close the file
fclose(fptr);
return 0;
}
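For comparison, the second approach mentioned above (read everything, replace in memory, write once) might look roughly like this. It is a sketch only: the function names are hypothetical, the file is assumed to fit in memory, and the file is opened in binary mode so that the bytes written match the bytes read.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Replaces every ASCII digit in buf[0..len) with '$'.
 * Returns the number of characters replaced. */
size_t replace_digits(char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= '0' && buf[i] <= '9') {
            buf[i] = '$';
            count++;
        }
    }
    return count;
}

/* Reads all of `path` into memory, replaces the digits, and writes the
 * result back in one shot. Returns 0 on success, -1 on any failure. */
int replace_digits_in_file(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *buf = NULL;
    size_t len = 0, cap = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (len == cap) {                 /* grow the buffer geometrically */
            size_t ncap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, ncap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return -1;
            }
            buf = tmp;
            cap = ncap;
        }
        buf[len++] = (char)c;
    }
    int read_failed = ferror(fp);
    fclose(fp);
    if (read_failed) {
        free(buf);
        return -1;
    }

    replace_digits(buf, len);

    fp = fopen(path, "wb");               /* truncates, then rewrites the file */
    if (fp == NULL) {
        free(buf);
        return -1;
    }
    size_t written = (len > 0) ? fwrite(buf, 1, len, fp) : 0;
    int ok = (written == len);
    if (fclose(fp) != 0)
        ok = 0;
    free(buf);
    return ok ? 0 : -1;
}
```

The trade-off is the one stated above: this version needs memory proportional to the file size, while the fseek() version works in place.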
I searched around and didn't find a working answer, although I know there may be plenty of answers out there. Honestly, I couldn't find one, as I am a beginner in C/C++.
My problem is that I have a text file, actually a log file, whose data is separated by pipes ('|'). Within each entry the fields are separated by pipes ('|'), and entries are separated by newlines ('\n'); the file is really lengthy. What I want is that when the user gives a sequence, e.g. sequence=[2,5,7], the function reads that array and outputs only the fields at those pipe positions, here the 2nd, 5th, and 7th fields, into a text file. Below is the code I used. It doesn't work for some reason I can't find: the resulting text file contains only '\n' characters and nothing more, and there are more of them than there are entries in the file.
minorSeparatorChar is the character given as '|'
majorSeparatorChar is the character given as '\n'
inFile Input text file
outFile output text file
minSepCount minor separator count
majSepCount major separator count
sequence is a global const int array
void getFormattedOutput(char * inFile, char * outFile, char minorSeparatorChar,char majorSeparatorChar){
FILE *readFile,*writeFile;
int charactor=0, minSepCount=0, i=0,majSepCount = 0;
int flagMin = 0;
char charactorBefore = NULL;
readFile = fopen(inFile,"r"); // opens the file for reading
writeFile = fopen(outFile,"w"); // opens the file for writing
if (readFile==NULL || writeFile == NULL){
printf("\nFile creation is not a sucess, Exiting program..\n");
exit(0);
}
while(charactor!=EOF){
charactorBefore = charactor;
if (charactor==minorSeparatorChar)
flagMin=1;
charactor = fgetc(readFile);
if(charactorBefore == minorSeparatorChar){
flagMin = 0;
if (minSepCount==sequence[i]){
fputc(charactor,writeFile);
continue;
}
i++;
minSepCount++;
}
else if (charactorBefore == majorSeparatorChar){
minSepCount=0;
i=0;
majSepCount++;
fputc('\n',writeFile);
}
else{
if(flagMin==1)
fputc(charactor,writeFile);
continue;
}
}
fclose(readFile);
fclose(writeFile);
}
for example if the input file has
33|333|67|787|7889|9876554|56
20151001|0|0|0|0||94|71
1|94|71|1|94|71|1
and if I give sequence [2,5,6]
It should print to out file as
67 9876554 56
0 94 71
71 71 1
I ultimately concluded that there were too many flags and controls and variables in your code and that I couldn't make head or tail of what they were up to, and decided to rewrite the code. I couldn't see in your code how you knew how many fields were in the sequence, for example.
I write in C11 (C99), but in this program, that simply means that I declare variables when they're needed, not at the top of the function. If it's a problem (C89/C90), move the declarations to the top of the function.
I also find that the names used were so long that they obscured the purpose of the variables. You may think I've gone too far in the other direction; more significantly, your professor (teacher) may think that. So be it; names are fungible and global search and replace works well.
I also don't see how your code is supposed to interpolate semi-arbitrary numbers of blanks between the fields, so I've actually ducked the issue. This code outputs the field separator (minor_sep — a length reduction of minorSeparatorChar) and the record separator (major_sep — reduced from majorSeparatorChar) at the appropriate points.
I note that field numbers start with field 0 in your code. I'm not convinced your code would ever output data from field 0, but that is somewhat tangential given the rewrite.
I ended up with:
#include <stdio.h>
#include <stdlib.h>
static const int sequence[] = { 2, 5, 7 };
static const int seqlen = 3;
static
void getFormattedOutput(char *inFile, char *outFile, char minor_sep, char major_sep)
{
FILE *ifp = fopen(inFile, "r"); // opens the file for reading
FILE *ofp = fopen(outFile, "w"); // opens the file for writing
if (ifp == NULL || ofp == NULL)
{
printf("\nFile creation is not a success, Exiting program..\n");
exit(0);
}
int c;
int seqnum = 0;
int fieldnum = 0;
while ((c = getc(ifp)) != EOF)
{
if (c == major_sep)
{
putc(major_sep, ofp);
fieldnum = 0;
seqnum = 0;
}
else if (c == minor_sep)
{
if (seqnum < seqlen && fieldnum == sequence[seqnum])
{
putc(minor_sep, ofp);
seqnum++;
}
fieldnum++;
}
else if (seqnum < seqlen && fieldnum == sequence[seqnum]) // guard against reading past the end of sequence
fputc(c, ofp);
}
fclose(ifp);
fclose(ofp);
}
int main(void)
{
getFormattedOutput("/dev/stdin", "/dev/stdout", '|', '\n');
return 0;
}
When I run it (I called it split, though it isn't a good choice since there is also a standard command split), I get:
$ echo "fld0|fld1|fld2|fld3|fld4|fld5|fld6|fld7|fld8|fld9" | ./split
fld2|fld5|fld7|
$ echo "fld0|fld1|fld2|fld3|fld4|fld5|fld6" | ./split
fld2|fld5|
$
The only possible objection is that there is a field terminator rather than a field separator. As you can see, a terminator is not hard to implement; making it into a separator (so there isn't a pipe after the last field on the line, even when the line doesn't have as many fields as there are elements in the sequence — see the second sample output) is trickier. The code needs to output a separator when it reads the first character of a field that should be printed after the first such field. This code achieves that:
#include <stdio.h>
#include <stdlib.h>
static const int sequence[] = { 2, 5, 7 };
static const int seqlen = 3;
static
void getFormattedOutput(char *inFile, char *outFile, char minor_sep, char major_sep)
{
FILE *ifp = fopen(inFile, "r"); // opens the file for reading
FILE *ofp = fopen(outFile, "w"); // opens the file for writing
if (ifp == NULL || ofp == NULL)
{
printf("\nFile creation is not a success, Exiting program..\n");
exit(0);
}
int c;
int seqnum = 0;
int fieldnum = 0;
int sep = 0;
while ((c = getc(ifp)) != EOF)
{
if (c == major_sep)
{
putc(major_sep, ofp);
fieldnum = 0;
seqnum = 0;
sep = 0;
}
else if (c == minor_sep)
{
if (seqnum < seqlen && fieldnum == sequence[seqnum])
seqnum++;
fieldnum++;
sep = minor_sep;
}
else if (seqnum < seqlen && fieldnum == sequence[seqnum]) // guard against reading past the end of sequence
{
if (sep != 0)
{
putc(sep, ofp);
sep = 0;
}
putc(c, ofp);
}
}
fclose(ifp);
fclose(ofp);
}
int main(void)
{
getFormattedOutput("/dev/stdin", "/dev/stdout", '|', '\n');
return 0;
}
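Example run (each output line starts with a separator because sep is set at every minor separator, including those that precede the first selected field):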
$ {
> echo "Afld0|Afld1|Afld2|Afld3|Afld4|Afld5|Afld6|Afld7|Afld8|Afld9"
> echo "Bfld0|Bfld1|Bfld2|Bfld3|Bfld4|Bfld5|Bfld6|Bfld7|Bfld8|Bfld9"
> echo "Cfld0|Cfld1|Cfld2|Cfld3|Cfld4|Cfld5|Cfld6|Cfld7|Cfld8|Cfld9"
> echo "Dfld0|Dfld1|Dfld2|Dfld3|Dfld4|Dfld5|Dfld6|Dfld7|Dfld8|Dfld9"
> echo "Efld0|Efld1|Efld2|Efld3|Efld4|Efld5|Efld6|Efld7|Efld8|Efld9"
> } | ./split
|Afld2|Afld5|Afld7
|Bfld2|Bfld5|Bfld7
|Cfld2|Cfld5|Cfld7
|Dfld2|Dfld5|Dfld7
|Efld2|Efld5|Efld7
$
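The leading separator in the run above can be avoided by emitting a separator only once a field has already been written for the current record. The sketch below shows one way to do that; it is not the code from the answer. To keep it easy to test, it assumes the input is a string in memory instead of a FILE, and the name select_fields and the printed/started counters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the fields of `in` whose zero-based indexes appear in the
 * ascending array `seq` to `out`, joined by `minor_sep`. Records end at
 * `major_sep`. `out` must have room for strlen(in) + 1 bytes. */
void select_fields(const char *in, char *out,
                   const int *seq, int seqlen,
                   char minor_sep, char major_sep)
{
    assert(in != NULL && out != NULL && seq != NULL);

    int fieldnum = 0;   /* index of the field being read */
    int seqnum = 0;     /* index of the next wanted entry in seq */
    int printed = 0;    /* wanted fields already written for this record */
    int started = 0;    /* has the current wanted field been started? */
    size_t n = 0;

    for (; *in != '\0'; in++) {
        char c = *in;
        int wanted = seqnum < seqlen && fieldnum == seq[seqnum];
        int at_sep = (c == minor_sep || c == major_sep);

        if (wanted && !started) {
            /* first touch of a wanted field (even an empty one):
             * separate it from the previously written field */
            if (printed > 0)
                out[n++] = minor_sep;
            started = 1;
        }
        if (!at_sep) {
            if (wanted)
                out[n++] = c;
            continue;
        }
        /* c ends the current field */
        if (wanted) {
            seqnum++;
            printed++;
        }
        started = 0;
        fieldnum++;
        if (c == major_sep) {
            out[n++] = major_sep;
            fieldnum = seqnum = printed = 0;
        }
    }
    out[n] = '\0';
}
```

With the sequence { 2, 5, 7 } and the single-line input used earlier, this writes fld2|fld5|fld7 followed by the record separator, with no pipe at either end of the line.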
What's the best way to read a file backwards in C? I know at first you may be thinking that this is no use whatsoever, but most logs etc. append the most recent data at the end of the file. I want to read in text from the file backwards, buffering it into lines - that is
abc
def
ghi
should read ghi, def, abc in lines.
So far I have tried:
#include <stdio.h>
#include <stdlib.h>
void read_file(FILE *fileptr)
{
char currentchar = '\0';
int size = 0;
while( currentchar != '\n' )
{
currentchar = fgetc(fileptr); printf("%c\n", currentchar);
fseek(fileptr, -2, SEEK_CUR);
if( currentchar == '\n') { fseek(fileptr, -2, SEEK_CUR); break; }
else size++;
}
char buffer[size]; fread(buffer, 1, size, fileptr);
printf("Length: %d chars\n", size);
printf("Buffer: %s\n", buffer);
}
int main(int argc, char *argv[])
{
if( argc < 2) { printf("Usage: backwards [filename]\n"); return 1; }
FILE *fileptr = fopen(argv[1], "rb");
if( fileptr == NULL ) { perror("Error:"); return 1; }
fseek(fileptr, -1, SEEK_END); /* Seek to END of the file just before EOF */
read_file(fileptr);
return 0;
}
In an attempt to simply read one line and buffer it. Sorry that my code is terrible, I am getting so very confused. I know that you would normally allocate memory for the whole file and then read in the data, but for large files that constantly change I thought it would be better to read directly (especially if I want to search for text in a file).
Thanks in advance
* Sorry forgot to mention this will be used on Linux, so newlines are just NL without CR. *
You could just pipe the input through the program tac, which is like cat but backwards!
http://linux.die.net/man/1/tac
I recommend a more portable (hopefully) way of file size determination since fseek(binaryStream, offset, SEEK_END) is not guaranteed to work. See the code below.
I believe that files should be at least minimally buffered at the kernel level (e.g. buffering at least one block per file by default), so seeks should not incur significant amount of extra I/O and should only advance the file position internally. If the default buffering is not satisfactory, you may try to use setvbuf() to speed up the I/O.
#include <limits.h>
#include <string.h>
#include <stdio.h>
/* File must be open with 'b' in the mode parameter to fopen() */
long fsize(FILE* binaryStream)
{
long ofs, ofs2;
int result;
if (fseek(binaryStream, 0, SEEK_SET) != 0 ||
fgetc(binaryStream) == EOF)
return 0;
ofs = 1;
while ((result = fseek(binaryStream, ofs, SEEK_SET)) == 0 &&
(result = (fgetc(binaryStream) == EOF)) == 0 &&
ofs <= LONG_MAX / 4 + 1)
ofs *= 2;
/* If the last seek failed, back up to the last successfully seekable offset */
if (result != 0)
ofs /= 2;
for (ofs2 = ofs / 2; ofs2 != 0; ofs2 /= 2)
if (fseek(binaryStream, ofs + ofs2, SEEK_SET) == 0 &&
fgetc(binaryStream) != EOF)
ofs += ofs2;
/* Return -1 for files longer than LONG_MAX */
if (ofs == LONG_MAX)
return -1;
return ofs + 1;
}
/* File must be open with 'b' in the mode parameter to fopen() */
/* Set file position to size of file before reading last line of file */
char* fgetsr(char* buf, int n, FILE* binaryStream)
{
long fpos;
int cpos;
int first = 1;
if (n <= 1 || (fpos = ftell(binaryStream)) == -1 || fpos == 0)
return NULL;
cpos = n - 1;
buf[cpos] = '\0';
for (;;)
{
int c;
if (fseek(binaryStream, --fpos, SEEK_SET) != 0 ||
(c = fgetc(binaryStream)) == EOF)
return NULL;
if (c == '\n' && first == 0) /* accept at most one '\n' */
break;
first = 0;
if (c != '\r') /* ignore DOS/Windows '\r' */
{
unsigned char ch = c;
if (cpos == 0)
{
memmove(buf + 1, buf, n - 2);
++cpos;
}
memcpy(buf + --cpos, &ch, 1);
}
if (fpos == 0)
{
fseek(binaryStream, 0, SEEK_SET);
break;
}
}
memmove(buf, buf + cpos, n - cpos);
return buf;
}
int main(int argc, char* argv[])
{
FILE* f;
long sz;
if (argc < 2)
{
printf("filename parameter required\n");
return -1;
}
if ((f = fopen(argv[1], "rb")) == NULL)
{
printf("failed to open file \'%s\'\n", argv[1]);
return -1;
}
sz = fsize(f);
// printf("file size: %ld\n", sz);
if (sz > 0)
{
char buf[256];
fseek(f, sz, SEEK_SET);
while (fgetsr(buf, sizeof(buf), f) != NULL)
printf("%s", buf);
}
fclose(f);
return 0;
}
I've only tested this on Windows with 2 different compilers.
There are quite a few ways you could do this, but reading a byte at a time is definitely one of the poorer choices.
Reading the last, say, 4KB and then walking back up from the last character to the previous newline would be my choice.
Another option is to mmap the file, and just pretend that the file is a lump of memory, and scan backwards in that. [You can tell mmap you are reading backwards too, to make it prefetch data for you].
If the file is VERY large (several gigabytes), you may want to only use a small portion of the file in mmap.
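Whether the bytes come from a 4KB read of the file's tail or from an mmap'd region, the backward walk over the buffer is the same. A minimal sketch of that walk follows, assuming the chunk is already in memory and that lines end in '\n' only; the name reverse_lines is hypothetical, and the result is collected in an output buffer purely to make it easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the lines of buf[0..len) to `out` in reverse order, each
 * terminated by '\n'. `out` needs room for len + 2 bytes. */
void reverse_lines(const char *buf, size_t len, char *out)
{
    assert(buf != NULL && out != NULL);

    size_t n = 0;
    size_t end = len;                 /* one past the end of the current line */

    if (len == 0) {
        out[0] = '\0';
        return;
    }
    if (buf[end - 1] == '\n')
        end--;                        /* the final newline ends the last line */

    for (;;) {
        size_t start = end;
        while (start > 0 && buf[start - 1] != '\n')
            start--;                  /* walk back to the previous newline */

        memcpy(out + n, buf + start, end - start);
        n += end - start;
        out[n++] = '\n';

        if (start == 0)
            break;                    /* reached the beginning of the buffer */
        end = start - 1;              /* step over the newline itself */
    }
    out[n] = '\0';
}
```

When only the tail of a file is loaded, the first line found this way may be a partial line, so a real implementation would have to carry it over to the next (earlier) chunk.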
If you want to learn how to do it, here's a Debian/Ubuntu example (for other distros, such as RPM-based ones, adapt as needed):
~$ which tac
/usr/bin/tac
~$ dpkg -S /usr/bin/tac
coreutils: /usr/bin/tac
~$ mkdir srcs
~$ cd srcs
~/srcs$ apt-get source coreutils
(clip apt-get output)
~/srcs$ ls
coreutils-8.13 coreutils_8.13-3.2ubuntu2.1.diff.gz coreutils_8.13-3.2ubuntu2.1.dsc coreutils_8.13.orig.tar.gz
~/srcs$ cd coreutils-8.13/
~/srcs/coreutils-8.13$ find . -name tac.c
./src/tac.c
~/srcs/coreutils-8.13$ less src/tac.c
That's not too long, a bit over 600 lines, and while it packs some advanced features, and uses functions from other sources, the reverse line buffering implementation seems to be in that tac.c source file.
FSEEKing for every byte sounds PAINFULLY slow.
If you've got the memory, just read the entire file into memory and either reverse it or scan it backwards.
Another option would be Windows memory mapped files.
Hey guys, I'm currently trying to implement a function using C that takes in two file names as command line arguments and compare them lexicographically.
The function will return -1 if the contents of the first file are less than the contents of the second file, 1 if the contents of the second file are less than the contents of the first file, and 0 if the files are identical.
Please give me some advice on how I should start with this.
[EDIT]
Hey guys, sorry if any part of the question is unclear, so I'll just post the link to the question here: Original question. It's a uni assignment, so we're expected to do it using only basic C features, probably including only stdio.h, stdlib.h, and string.h. Sorry for the trouble caused. Also, here's the code I already have. My main problem now is that the function doesn't recognise that file1.txt (refer to the link) has its first line longer than file2.txt's but is actually lexicographically less:
int filecmp(char firstFile[], char secondFile[])
{
int similarity = 0;
FILE *file1 = fopen(firstFile, "r");
FILE *file2 = fopen(secondFile, "r");
char line1[BUFSIZ];
char line2[BUFSIZ];
while (similarity == 0)
{
if (fgets(line1, sizeof line1, file1) != NULL)
{
if (fgets(line2, sizeof line2, file2) != NULL)
{
int length;
if (strlen(line1) > strlen(line2))
{
length = strlen(line1);
}
else
{
length = strlen(line2);
}
for (int i = 0; i < length; i++)
{
if (line1[i] < line2[i]) similarity = -1;
if (line1[i] > line2[i]) similarity = 1;
}
}
else
{
similarity = 1; //As file2 is empty
}
}
else
{
if (fgets(line2, sizeof line2, file2) != NULL)
{
similarity = -1; // As file1 is empty
}
else break;
}
}
fclose(file1);
fclose(file2);
return similarity;
}
[END EDIT]
Many thanks,
Jonathan Chua
Take a look at the source code of the UNIX cmp utility, e.g. here. The relevant file is regular.c. If you can't use mmap, the principle of implementation through fgetc() is the same: keep reading a single character from each of the two files as long as they compare the same. When (if!) you find a difference, return the result of the comparison. The borderline case of one file being a proper prefix of the other (e.g. "ABC" and "ABCCC") can be resolved by treating EOF as a value smaller than any character. This is already neatly solved in C, as fgetc() returns a negative value ONLY on EOF (or a read error); proper characters are >= 0.
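A minimal sketch of that fgetc() loop, assuming both streams are already open for reading; the name stream_cmp is hypothetical, and a read error is treated the same as end of file here:

```c
#include <assert.h>
#include <stdio.h>

/* Compares two open streams byte by byte.
 * Returns -1 if `a` sorts first, 1 if `b` sorts first, 0 if identical. */
int stream_cmp(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);

    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);

        if (ca != cb) {
            /* EOF is negative while every real byte is >= 0, so the
             * stream that ends first compares as smaller. */
            return (ca < cb) ? -1 : 1;
        }
        if (ca == EOF)
            return 0;   /* both streams ended at the same point */
    }
}
```

A filecmp(firstFile, secondFile) wrapper would then only need to fopen both files, call stream_cmp, and fclose them.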
Are you allowed to use strcmp?
If so (untested):
int ret = 0;
while (ret == 0)
{
char line1 [ MAX_LINE_LEN ];
char line2 [ MAX_LINE_LEN ];
if (fgets(line1, MAX_LINE_LEN, file1) != NULL )
{
if (fgets(line2, MAX_LINE_LEN, file2) != NULL )
{
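// strcmp only guarantees the sign of its result, so normalise to -1/0/1
ret = strcmp(line1, line2);
ret = (ret > 0) - (ret < 0);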
}
else
{
ret = 1;
}
}
else
{
if (fgets(line2, MAX_LINE_LEN, file2) != NULL )
{
ret = -1;
}
else
{
break;
}
}
}
return ret;