Strange output from custom string object - c

I'm working on the second half of a program for class. The objective is simple, but I can't figure out what's causing this output. We have to read a file using an input function we wrote for our string header, then print all the four-letter words in that file, ignoring punctuation and whitespace. I've got the logic for that down, but I can't figure out why, even though I check that the length of the string is 4 before printing it, I sometimes get output that's clearly longer than 4. Here is the input text from the file I'm using.
This is a test of the program which will only print out the four letter words in this file. Let's see if it works!
And this is the output I'm getting...
This
test
willham
onlyham
fourtam
thissrm
filesrm
Here is the main program: http://pastebin.com/xviETPFm
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "mystring.h"
int fTerminate(char ch, int * pbDiscardChar);
int main(int argc, char ** argv) {
MYSTRING str;
FILE * in;
if((str = mystring_init_default()) == MYSTRING_STATUS_ERROR) {
printf("Error initializing MYSTRING object.\n");
return -1;
}
if((in = fopen("book.txt", "r")) == NULL) {
printf("Error opening file \"book.txt\". Does the file exist?\n");
return -1;
}
while(mystring_input(str, in, 1, fTerminate) != MYSTRING_STATUS_ERROR) {
if(mystring_size(str) == 4) {
mystring_output(str, stdout);
printf("\n");
}
}
mystring_destroy(&str);
return 0;
}
int fTerminate(char ch, int * pbDiscardChar) {
// Terminate on whitespace characters or non-alpha characters.
return (*pbDiscardChar = ((isspace(ch) || (isalpha(ch) == 0))?1:0));
}
And just in case you need it, here is the input function: http://pastebin.com/vD71hGEt
MyString_Status mystring_input(MYSTRING hString,
FILE * hFile,
int bIgnoreLeadingWhiteSpace,
int (*fTerminate)(char ch, int * pbDiscardChar)) {
char ch = '\0';
int eofCheck = 0;
int t, discard;
mystring_truncate(hString, 0);
if(hFile == NULL) return MYSTRING_STATUS_ERROR;
eofCheck = fscanf(hFile, "%c", &ch);
// If bIgnoreWhiteSpace is true, gobble leading whitespace.
if(bIgnoreLeadingWhiteSpace) {
while(isspace(ch)) {
eofCheck = fscanf(hFile, "%c", &ch);
if(eofCheck == EOF) return MYSTRING_STATUS_ERROR;
}
}
// Add all valid characters to the string, overwriting the old string.
while(eofCheck != EOF) {
t = fTerminate(ch, &discard);
if(discard == 0) mystring_push(hString, ch);
if(t) return MYSTRING_STATUS_SUCCESS;
eofCheck = fscanf(hFile, "%c", &ch);
}
if(eofCheck == EOF) return MYSTRING_STATUS_ERROR;
return MYSTRING_STATUS_SUCCESS;
}
It clearly works for the first two strings, so what happened with the rest of them? Does my computer just like ham?

Related

Reading only letters in C using fscanf

Is it possible to use fscanf to read words without symbols from a text file
This function prints one word per line, but if a word has a comma or brackets attached, it prints those too. Is there any way to print only letters?
void load(const char *file)
{
FILE *inFile = fopen(file , "r");
if (inFile == NULL )
{
return false;
}
char word[LENGTH];
while (fscanf(inFile, "%s", word) != EOF)
{
printf("%s\n", word);
}
}
Your function will not compile as it is: it returns a value, but it is declared as void.
Read char by char and print only what you want.
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
bool load(const char *file)
{
FILE *inFile = fopen(file , "r");
if (inFile == NULL )
{
return false;
}
char word;
int retval;
while ((retval = fscanf(inFile, "%c", &word)) != EOF && retval == 1)
{
if(isalnum((unsigned char)word) || isspace((unsigned char)word))
printf("%c", word);
}
fclose(inFile);
return true;
}
Is it possible to use fscanf to read words without symbols from a text file
Well, yes...
You can use a scanset to tell which characters to match. Like:
%[a-zA-Z]
This will match/accept all upper and lower case letters and reject all other characters.
But... that will give you another problem. When the next character in the file doesn't match, the file pointer won't be advanced, and you are kind of stuck. To handle that, you need a way to skip non-matching characters, e.g. fgetc to read a single character.
Something like:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // At most 31 characters and only letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
else
{
// Didn't match, skip a character
if (fgetc(inFile) == EOF) break;
}
}
With a file like:
Hallo World, having() fun ....
Oh,yes...
the output is
Hallo
World
having
fun
Oh
yes
An alternative that only uses fscanf could be:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // Read letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
res = fscanf(inFile, "%31[^a-zA-Z]", word); // Skip non-letters
if (res == EOF) break;
}
Notice the ^ in the second scanset. It changes the meaning of the scanset to "don't match ...". So the code alternates between "read a word consisting of letters" and "skip everything that is not a letter", which is likely a better way of doing this than the fgetc method above.
That said, I normally prefer reading the file using fgets and then parsing the buffer afterwards (e.g. using sscanf), but that's another story.

How to make a C program that can read data and copy some of it into a variable?

I'm a student, and I'm wondering...
How can I make a program that gets some data from my text file into variables in my program and prints them?
Example:
My Text File
I,Ate,Cookies
She,Drink,Coffee
Tom,Wears,Pyjamas
My code
main()
{
FILE *fp=fileopen("c:\\textfile.txt","r");
char name[20],action[20],item[20];
prinf("Enter name: \n");
scanf("%s",&name);
/* I dont Know what to do next */
}
I thought about some checking code:
if (name==nametxt) /*nametxt is the first line on the text file */
{
printf("%s\n %s\n %s\n",name,action,item);
}
If the name is "I", the output would look like this:
Enter name:
I
I
Ate
Cookies
Any help will satisfy my curiosity. Thanks in advance.
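Read characters from the file until you reach a newline character (\n) or fill the array, then return the characters stored in the array passed by the caller.
From the returned array you can get the separate values with strtok.
Repeat until the line-reading function returns 0 (it hit EOF).
Here is a simple example with your own getline-style function, which you may modify. It is named get_line here because POSIX systems already declare a different getline() in <stdio.h>, and reusing that name can fail to compile.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Reads one line (including '\n' if it fits) into s; returns its length, 0 at EOF. */
int get_line(char s[], int lim, FILE *fp)
{
    int c = 0, i;
    for (i = 0; i < lim - 1 && (c = fgetc(fp)) != EOF && c != '\n'; ++i)
    {
        s[i] = c;
    }
    if (c == '\n')
    {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
int main(void)
{
    FILE *fp = fopen("c:\\textfile.txt", "r");
    char line[100];
    char *ptr;
    if (fp == NULL)
    {
        perror("c:\\textfile.txt");
        return EXIT_FAILURE;
    }
    while (get_line(line, sizeof line, fp))
    {
        line[strcspn(line, "\n")] = '\0';   /* drop the newline so it is not part of the last field */
        ptr = strtok(line, ",");
        while (ptr != NULL)
        {
            printf(" %s\n", ptr);
            ptr = strtok(NULL, ",");
        }
    }
    fclose(fp);
    return 0;
}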
Output
I
Ate
Cookies
She
Drink
Coffee
Tom
Wears
Pyjamas
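Storing a string in a variable isn't hard; here is an example:
strcpy(name, ptr);
But be careful: writing outside the bounds of name has undefined behavior.
strncpy(name, ptr, sizeof name - 1); name[sizeof name - 1] = '\0';
You can limit the number of copied characters with strncpy, but be careful: this function is error-prone, because it does not write a terminating '\0' when the source is too long, so you must add it yourself as shown.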
You can do it like this:
Keep reading characters from the file, and after each character is read, compare it with ','.
If the character read is ',', you have finished reading the name; otherwise, store it in a character array and continue reading the file.
Once you hit a ',' character, terminate the character array with a null character (now you have a complete name).
Compare this character array with the string you received as input using strcmp (the string compare function). If it matches, decide what you want to do.
I hope this is clear.
There are different ways to read data from a FILE * in C:
You read only one character: int fgetc(FILE *fp);
You read a whole line: char *fgets(char *buf, int n, FILE *fp); (take care with buf: it must point to allocated memory of at least n bytes).
You read formatted input, which is your case here: int fscanf(FILE *stream, const char *format, ...); it works like scanf(), but reads from stream:
This way:
char name[20], action[20], item[20];
FILE *f = fopen("myfile.txt", "r");
if (!f)
    return;
if (3 == fscanf(f, "%19[^,\n],%19[^,\n],%19[^,\n]\n", name, action, item))
    printf("%s %s %s\n", name, action, item);
fclose(f);
%19[^,\n] reads at most 19 characters that are neither , nor \n, so each of the three conversions reads one comma-separated field of the line.
Start with something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DATA_FILE "data.txt"
#define LEN 19
#define SIZE (LEN+1)
//Stringification
#define S_(n) #n
#define S(n) S_(n)
enum { NOT_FOUND, FIND };
int pull_data(const char name[SIZE], char action[SIZE], char item[SIZE]){
int ret = NOT_FOUND;
FILE *fp = fopen(DATA_FILE, "r");//fileopen --> fopen
if(fp == NULL){
perror("fopen:");
exit(EXIT_FAILURE);
} else {
char nametxt[SIZE];
*action = *item = 0;
while(fscanf(fp, "%" S(LEN) "[^,],%" S(LEN) "[^,],%" S(LEN) "[^\n]%*c", //"%19[^,],%19[^,],%19[^\n]%*c"
nametxt, action, item) == 3){
if(strcmp(name, nametxt) == 0){//Use strcmp for comparison of strings
ret = FIND;
break;
}
}
}
fclose(fp);
return ret;
}
int main(void){
char name[SIZE], action[SIZE], item[SIZE];
printf("Enter name: \n");//prinf --> printf
if(scanf("%" S(LEN) "s", name) == 1){
if(pull_data(name, action, item) == FIND){
printf("%s\n%s\n%s\n", name, action, item);
} else {
printf("%s not found.\n", name);
}
}
}

proper use of scanf in a while loop to validate input

I made this code:
/*here is the main function*/
int x , y=0, returned_value;
int *p = &x;
while (y<5){
printf("Please Insert X value\n");
returned_value = scanf ("%d" , p);
validate_input(returned_value, p);
y++;
}
the function:
void validate_input(int returned_value, int *p){
getchar();
while (returned_value!=1){
printf("invalid input, Insert Integers Only\n");
getchar();
returned_value = scanf("%d", p);
}
}
It generally works well, but when I enter, for example, "1f1", it accepts the "1" and does not report any error. When I enter "f1f1f", it reads it twice and ruins the second read/scan, and so on: the first read prints "invalid input, Insert Integers Only", and instead of waiting for the user to re-enter the first value, it continues to the second read and prints "invalid input, Insert Integers Only" again...
It needs a final touch; I have read many answers but could not find it.
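If you don't want to accept 1f1 as valid input, then scanf is the wrong function to use, because %d stops at the first character that cannot be part of the number and still reports a successful match.
Instead, read the whole line and then check that it contains only digits (optionally preceded by a minus sign). After that you can call sscanf on the line.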
Something like:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
int validateLine(const char *line)
{
    int ret = 0;
    // Allow negative numbers
    if (*line == '-') line++;
    // Check that remaining chars are digits
    while (*line && *line != '\n')
    {
        if (!isdigit((unsigned char)*line)) return 0; // Illegal char found
        ret = 1; // Remember that at least one legal digit was found
        ++line;
    }
    return ret;
}
int main(void) {
    char line[256];
    int x, y = 0;
    while (y < 5)
    {
        printf("Please Insert X value\n");
        if (fgets(line, sizeof(line), stdin)) // Read the whole line
        {
            if (validateLine(line)) // Check that the line is a valid number
            {
                // Now it should be safe to call sscanf - it shouldn't fail
                // (though a value outside the range of int is still undefined
                // behavior with %d), but check the return value in any case
                if (1 != sscanf(line, "%d", &x))
                {
                    printf("should never happen\n");
                    exit(1);
                }
                // Legal number found - break out of the "while (y<5)" loop
                break;
            }
            else
            {
                printf("Illegal input %s", line);
            }
        }
        y++;
    }
    if (y < 5)
        printf("x=%d\n", x);
    else
        printf("no more retries\n");
    return 0;
}
Input
1f1
f1f1
-3
Output
Please Insert X value
Illegal input 1f1
Please Insert X value
Illegal input f1f1
Please Insert X value
Illegal input
Please Insert X value
x=-3
Another approach - avoid scanf
You could let your function calculate the number and thereby bypass scanf completely. It could look like:
#include <ctype.h>
#include <stdio.h>
int line2Int(const char *line, int *x)
{
    int negative = 0;
    int ret = 0;
    int temp = 0; // note: no range check, very long inputs overflow int (undefined behavior)
    if (*line == '-')
    {
        line++;
        negative = 1;
    }
    else if (*line == '+') // If a + is to be accepted
        line++;            // If a + is to be accepted
    while (*line && *line != '\n')
    {
        if (!isdigit((unsigned char)*line)) return 0; // Illegal char found
        ret = 1;
        // Update the number
        temp = 10 * temp;
        temp = temp + (*line - '0');
        ++line;
    }
    if (ret)
    {
        if (negative) temp = -temp;
        *x = temp;
    }
    return ret;
}
int main(void) {
    char line[256];
    int x, y = 0;
    while (y < 5)
    {
        printf("Please Insert X value\n");
        if (fgets(line, sizeof(line), stdin))
        {
            if (line2Int(line, &x)) break; // Legal number - break out
            printf("Illegal input %s", line);
        }
        y++;
    }
    if (y < 5)
        printf("x=%d\n", x);
    else
        printf("no more retries\n");
    return 0;
}
Generally speaking, I think you are better off reading everything from the input (within the range of your buffer size, of course), and then validating that the input is indeed in the correct format.
In your case, you are seeing errors with a string like f1f1f because you are not reading the entire STDIN buffer. As a result, when you call scanf(...) again, there is still data in STDIN, so that is read first instead of prompting the user for more input. To read all of STDIN, you should do something like the following (part of the code borrowed from Paxdiablo's answer here: https://stackoverflow.com/a/4023921/2694511):
#include <ctype.h>  // Used for isdigit
#include <stdio.h>
#include <string.h>
#include <stdlib.h> // Used for strtol
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
#define NaN 3 // Not a Number (NaN)
int strIsInt(const char *ptrStr){
// Check if the string starts with a positive or negative sign
if(*ptrStr == '+' || *ptrStr == '-'){
// First character is a sign. Advance pointer position
ptrStr++;
}
// Now make sure the string (or the character after a positive/negative sign) is not null
if(*ptrStr == '\0'){
return NaN;
}
while(*ptrStr != '\0'){
// Check if the current character is a digit
// isdigit() returns zero for non-digit characters
if(isdigit( (unsigned char)*ptrStr ) == 0){
// Not a digit
return NaN;
} // else, we'll increment the pointer and check the next character
ptrStr++;
}
// If we have made it this far, then we know that every character inside of the string is indeed a digit
// As such, we can go ahead and return a success response here
// (A success response, in this case, is any value other than NaN)
return 0;
}
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Get line with buffer overrun protection.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
// (Per Chux suggestions in the comments, the "buff[0]" condition
// has been added here.)
if (buff[0] && buff[strlen(buff)-1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[strlen(buff)-1] = '\0';
return OK;
}
void validate_input(int responseCode, char *prompt, char *buffer, size_t bufferSize){
while( responseCode != OK ||
strIsInt( buffer ) == NaN )
{
printf("Invalid input.\nPlease enter integers only!\n");
fflush(stdout); /* It might be unnecessary to flush here because we'll flush STDOUT in the
getLine function anyway, but it is good practice to flush STDOUT when printing
important information. */
responseCode = getLine(prompt, buffer, bufferSize); // Read entire STDIN
}
// Finally, we know that the input is an integer
}
int main(int argc, char **argv){
char *prompt = "Please Insert X value\n";
int iResponseCode;
char cInputBuffer[100];
int x, y=0;
int *p = &x;
while(y < 5){
iResponseCode = getLine(prompt, cInputBuffer, sizeof(cInputBuffer)); // Read entire STDIN buffer
validate_input(iResponseCode, prompt, cInputBuffer, sizeof(cInputBuffer));
// Once validate_input finishes running, we should have a proper integer in our input buffer!
// Now we'll just convert it from a string to an integer, and store it in the P variable, as you
// were doing in your question.
sscanf(cInputBuffer, "%d", p);
y++;
}
}
Just as a disclaimer/note: I have not written C for a very long time, so I apologize in advance if there are any errors in this example. I also did not have an opportunity to compile and test this code before posting because I am in a rush right now.
If you're reading an input stream that you know is a text stream, but that you are not sure only consists of integers, then read strings.
Also, once you've read a string and want to see if it is an integer, use the standard library conversion routine strtol(). By doing this, you both get a confirmation that it was an integer and you get it converted for you into a long.
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
bool convert_to_long(long *number, const char *string)
{
char *endptr;
errno = 0;
*number = strtol(string, &endptr, 10);
/* endptr will point to the first position in the string that could
* not be converted. If this position holds the string terminator
* '\0' the conversion went well. An empty input string will also
* result in *endptr == '\0', so we have to check this too, and fail
* if this happens. strtol sets errno to ERANGE if the value does not
* fit in a long.
*/
if (string[0] != '\0' && *endptr == '\0' && errno != ERANGE)
return false; /* conversion successful */
return true; /* problem in conversion */
}
int main(void)
{
char buffer[256];
const int max_tries = 5;
int tries = 0;
long number;
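while (tries++ < max_tries) {
puts("Enter input:");
if (scanf("%255s", buffer) != 1) { /* width limit protects buffer[256] */
tries = max_tries + 1; /* EOF or read error: give up */
break;
}
if (!convert_to_long(&number, buffer))
break; /* returns false on success */
printf("Invalid input. '%s' is not an integer, %d tries left\n", buffer,
max_tries - tries);
}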
if (tries > max_tries)
puts("No valid input found");
else
printf("Valid input: %ld\n", number);
return EXIT_SUCCESS;
}
ADDED NOTE: If you change the base (the last parameter to strtol()) from 10 to 0, you get the additional feature that your code converts hexadecimal and octal numbers (strings starting with 0x and 0, respectively) into integers.
I took user 4386427's idea and added code to cover what it missed (leading spaces and a + sign). I tested it many times, and it works in all the cases I tried.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
int validate_line(char *line);
int main(void) {
    char line[256];
    int y = 0;
    long x;
    while (y < 5) {
        printf("Please Insert X Value\n");
        if (fgets(line, sizeof(line), stdin)) { // NULL at end of input or on error
            int status = validate_line(line); // also removes spaces from line in place
            if (status > 0) { // the string contains only digits (after an optional sign)
                x = strtol(line, NULL, 10); // convert the cleaned-up string to long
                printf("This is x %ld\n", x);
                break;
            }
            else if (status == -1) { printf("You Have Not Inserted Any Number!.... "); }
            else { printf("Invalid Input, Insert Integers Only.... "); }
        }
        y++;
        if (y == 5) { printf("NO MORE RETRIES\n\n"); }
        else { printf("%d Retries Left\n\n", (5 - y)); }
    }
    return 0;
}
int validate_line(char *line) {
    int returned_value = -1;
    /* first remove spaces from the entire string */
    char *p_new = line;
    char *p_old = line;
    while (*p_old != '\0') { // loop until the end of the string
        *p_new = *p_old; // copy the current character
        if (*p_new != ' ') { p_new++; } // keep it only if it is not a space
        p_old++; // always advance p_old
    }
    *p_new = '\0'; // add terminator
    if (*line == '+' || *line == '-') { line++; } // skip a leading sign
    while (*line != '\n' && *line != '\0') { // '\0' check: the last line may lack '\n'
        if (!isdigit((unsigned char)*line)) { return 0; } // illegal char found
        line++;
        returned_value = 2; // at least one digit seen
    }
    return returned_value; // -1: no digits at all, >0: valid number, 0: invalid input
}

Parsing simple name/value pair settings in config file with leading and terminating spaces - C

This is the code I have so far. I apologize if my buffer sizes are overkill.
The idea is to read the entire configuration file (in this example, it's file.conf), and for now we assume it exists. I'll add error checking later.
Once the file is read into stack space, then the getcfg() function searches the configuration data for the specified name, and if it's found, returns the corresponding value. My function works when the configuration file contains leading spaces before names or values; such spaces are ignored.
Say this is my configuration file:
something=data
apples=oranges
fruit=banana
animals= cats
fried =chicken
My code works correctly with the first four entries of the config file. For example, if I use "something" as the name, then "data" is returned.
The last item won't work as of yet because of the trailing spaces after "fried" and before the =. I want to be able to have my function automatically remove those spaces, too, especially in case an option format such as
somethingelse = items
begins to be used. (Note the spaces on both sides of the = sign.)
What can I do to make a less CPU-intensive version of my program that also detects and removes trailing spaces from the name and value when processing the name and values?
Here's my current code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int getcfg(char* buf, char *name, char *val) {
int fl = 0, n = 0;
char cfg[2][10000], *p = buf; /* cfg[0] holds the name, cfg[1] the value */
memset(cfg, 0, sizeof(cfg));
while (*p) {
if (*p == '\n') {
if (strcmp(cfg[0], name) == 0) {
strcpy(val, cfg[1]);
return 1;
}
memset(cfg, 0, sizeof(cfg));
n = 0;
fl = 0;
} else {
if (*p == '=') {
n = 0;
fl = 1;
} else {
if (n != 0 || *p != ' ') {
cfg[fl][n] = *p;
n++;
}
}
}
p++;
}
return 0;
}
int main() {
char val[10000], buf[100000]; //val=value of config item, buf=buffer for entire config file ( > 100KB config file is nuts)
memset(buf, 0, sizeof(buf));
memset(val, 0, sizeof(val));
int h = open("file.conf", O_RDONLY);
if (read(h, buf, sizeof(buf) - 1) < 1) { /* keep the last byte as the '\0' terminator */
printf("Can't read\n");
}
close(h);
printf("Value stat = %d ", getcfg(buf, "Item", val));
printf("Result = '%s'\n", val);
return 0;
}
Below is a small (~15 lines) sscanf-based read_params() function that does the job. As a bonus, it understands comments and complains about erroneous lines (if any):
$ cat config_file.c
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#define ARRAY_SIZE(a) ((sizeof (a)) / (sizeof (a)[0]))
enum { MAX_LEN=128 };
struct param {
char name[MAX_LEN];
char value[MAX_LEN];
};
void strtrim(char *s)
{
char *p = s + strlen(s);
while (--p >= s && isspace(*p))
*p = '\0';
}
int read_params(FILE *in, struct param *p, int max_params)
{
int ln, n=0;
char s[MAX_LEN];
for (ln=1; max_params > 0 && fgets(s, MAX_LEN, in); ln++) {
if (sscanf(s, " %[#\n\r]", p->name)) /* empty line or comment */
continue;
if (sscanf(s, " %[a-z_A-Z0-9] = %[^#\n\r]",
p->name, p->value) < 2) {
fprintf(stderr, "error at line %d: %s\n", ln, s);
return -1;
}
strtrim(p->value);
printf("%d: name='%s' value='%s'\n", ln, p->name, p->value);
p++, max_params--, n++;
}
return n;
}
int main(int argc, char *argv[])
{
FILE *f;
struct param p[32];
f = argc == 1 ? stdin : fopen(argv[1], "r");
if (f == NULL) {
fprintf(stderr, "failed to open `%s': %s\n", argv[1],
strerror(errno));
return 1;
}
if (read_params(f, p, ARRAY_SIZE(p)) < 0)
return 1;
return 0;
}
Let's see how it works (quotes mark the beginning and the end of each line for clarity):
$ cat bb | sed -e "s/^/'/" -e "s/$/'/" | cat -n
1 'msg = Hello World! '
2 'p1=v1'
3 ' p2=v2 # comment'
4 ' '
5 'P_3 =v3'
6 'p4= v4#comment'
7 ' P5 = v5 '
8 ' # comment'
9 'p6 ='
$ ./config_file bb
1: name='msg' value='Hello World!'
2: name='p1' value='v1'
3: name='p2' value='v2'
5: name='P_3' value='v3'
6: name='p4' value='v4'
7: name='P5' value='v5'
error at line 9: p6 =
Note: as an additional bonus, the value can be anything except the #, \n and \r characters, including spaces, as the 'Hello World!' example above shows. If that is not what you need, add space and tab to the exception list in the second sscanf() for the value (or specify the accepted characters there instead) and drop the strtrim() function.
I'll provide a straightforward version, with everything done in main and no saving of the key/value pairs: the program only recognizes where they are and prints them. I used the input file you gave and added one more line at the end, something = more_data.
This version of the parser does not recognize multiple data items (items separated by spaces in the data fields); you'll have to figure that out as an exercise.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
    int fd = open("file.conf", O_RDONLY, 0);
    int i = 0;
    char kv[100];
    char c;
    if (fd == -1) {
        perror("file.conf");
        return 1;
    }
    while (read(fd, &c, 1) == 1) {
        /* ignoring spaces and tabs */
        if (c == '\t' || c == ' ') continue;
        else if (c == '=') {
            /* finished reading a key */
            kv[i] = 0x0;
            printf("key found [%s] ", kv);
            i = 0;
            continue;
        } else if (c == '\n') {
            /* finished reading a value */
            kv[i] = 0x0;
            printf(" with data [%s]\n", kv);
            i = 0;
            continue;
        }
        if (i < (int)sizeof kv - 1) /* drop characters that would overflow kv */
            kv[i++] = c;
    }
    close(fd);
    return 0;
}
And the output is:
key found [something] with data [data]
key found [apples] with data [oranges]
key found [fruit] with data [banana]
key found [animals] with data [cats]
key found [fried] with data [chicken]
key found [something] with data [more_data]
Explanation
while (read(fd,&c,1) == 1): reads one character at a time from the file.
if (c == '\t' || c == ' ') continue;: this is responsible for ignoring the white-spaces and tabs wherever they are.
else if (c == '='): If the program finds a = character, it concludes that what it just read was a key and handles it. What's inside that if should be easy to understand.
else if (c == '\n'): Then it uses a new-line character to recognize the end of a value. Again, what's inside the if is not hard to understand.
kv[i++] = c;: This is where we save the char value into the buffer kv.
So, with some minor changes, you can adapt this bit of code to become a parsing function that will suit your needs.
Edit and new code
As pointed out by John Bollinger in the comments, using read inside a while to read one character at a time is very costly. I'll post a second version of the program using the same input method OP was using (reading the whole file at once into a buffer) and then parsing it with another function.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
void parse(const char *s)
{
    char c, kv[100];
    int i = 0;   /* must start at 0: it indexes kv */
    while ((c = *s++)) {
        /* ignoring spaces and tabs */
        if (c == '\t' || c == ' ') continue;
        else if (c == '=') {
            /* finished reading a key */
            kv[i] = 0x0;
            printf("key found [%s] ", kv);
            i = 0;
            continue;
        } else if (c == '\n') {
            /* finished reading a value */
            kv[i] = 0x0;
            printf(" with data [%s]\n", kv);
            i = 0;
            continue;
        }
        if (i < (int)sizeof kv - 1) /* drop characters that would overflow kv */
            kv[i++] = c;
    }
}
int main(void)
{
    int fd = open("file.conf", O_RDONLY, 0);
    char buffer[1000];
    ssize_t n;
    if (fd == -1) {
        perror("file.conf");
        return 1;
    }
    /* use the reading method that suits you best
       (a single read() may return less than the whole file) */
    n = read(fd, buffer, sizeof buffer - 1);
    if (n < 0)
        n = 0;
    buffer[n] = '\0';   /* only thing parse() expects is a null-terminated string */
    parse(buffer);
    close(fd);
    return 0;
}
It is very unusual to read a whole config file into memory as a flat image, and especially to keep such an image as the internal representation. One would ordinarily parse the file contents into key/value pairs as you go, and store a representation of those pairs.
Also, your use of read() is incorrect, as you cannot safely assume that it will read all bytes of the file in one call. One normally must call read() in a loop, keeping track of the return value from each call to know both when the end of the file is reached and where in the buffer to put the next bytes read.
If the configuration is supposed to be completely generic, so that you don't know in advance what keywords to expect, then you might organize the configuration data in a hash table or a binary search tree, with the parameter names as the keys. If you do know what parameters to expect (or at least which to allow), then you might have a variable or a struct member for each one.
Naturally, the approach to parameter lookup must be paired correctly with the data structure in which you store the parameters. Any of the approaches I suggested will make looking up multiple configuration parameters far faster. They would also avoid wasting memory, and would adapt to extremely large configurations (or at least could do so).
How best to approach reading the file depends on details of your config file format, such as whether keys and/or values are permitted to contain internal spaces, whether more than one key/value pair may appear on the same line, and whether there is an upper bound on the allowed length of config file lines or of keys and values. Here's an approach that expects one key/value pair per line, supports keys and values that contain internal whitespace (but not newlines), but neither of which is longer than 1023 characters, and where keys are not permitted to contain the '=' character:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <assert.h>
int main() {
char key[1024];
char value[1024];
FILE *config;
int done;
config = fopen("file.conf", "r");
if (!config) {
perror("while opening file.conf");
return 1;
}
do {
char nl = '\0';
int nfields = fscanf(config, " %1023[^=\n]= %1023[^\n]%c", key, value, &nl);
int i;
done = 1;
if (nfields == EOF) {
if (ferror(config)) {
/* handle read error ... */
perror("while reading file.conf");
} else {
/* trailing empty line(s); ignore ... */
}
break;
} else if (nfields == 3) {
if (nl != '\n') {
/* handle excessive-length value ... */
} else {
done = 0;
}
} else if (nfields == 1) {
/* handle excessive-length key, or a line with no '=' ... */
break;
} else if (nfields == 0) {
/* handle missing key (the line starts with '=') ... */
break;
} else {
assert(nfields == 2);
/* last key/value pair, not followed by a newline */
}
/* successfully read a key / value pair; truncate trailing whitespace */
for (i = strlen(key); key[--i] == ' '; ) {
/* nothing */
}
key[i + 1] ='\0';
for (i = strlen(value); value[--i] == ' '; ) {
/* nothing */
}
value[i + 1] ='\0';
/* record the key / value pair somewhere (but here we just print it) ... */
printf("key: [%s] value: [%s]\n", key, value);
} while (!done);
fclose(config);
return 0;
}
Important points to note about that include:
No mechanism for storing the key / value pairs is provided. I gave you a few options, and there are others, but you must decide what's best for your own purposes. Rather, the program above addresses the problem of parsing your config data once and for all, so that you can avoid parsing it de novo every time you perform a lookup.
The code relies on fscanf() to consume any leading whitespace before the key and value, but in order to accommodate internal whitespace in the key and value, it cannot do the same for trailing whitespace.
Instead, it manually trims trailing whitespace from key and value.
The fscanf() format uses explicit field widths to avoid buffer overruns. It uses the %[ and %c field descriptors to scan data that may be or include whitespace.
Although it may look longish, do note how much of that code is dedicated to error handling.
Divide and conquer.
Getting the data and parsing it are best handled with 2 separate routines.
1) Use fgets() or other code with read() to read a line
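#include <stddef.h>
#include <stdio.h>
int Parse_KeyValue(char *line, size_t *key_offset, size_t *value_offset);
int foo(FILE *inf) {
    char buffer[1000];
    size_t key_offset, value_offset;
    while (fgets(buffer, sizeof buffer, inf)) {
        if (Parse_KeyValue(buffer, &key_offset, &value_offset)) {
            fprintf(stderr, "Bad Line '%s'\n", buffer);
            return 1;
        }
        printf("'%s'='%s'\n", &buffer[key_offset], &buffer[value_offset]);
    }
    return 0;
}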
2) Parse the line. (Sample unchecked code)
// 0: Success
// 1: failure
int Parse_KeyValue(char *line, size_t *key_offset, size_t *value_offset) {
char *p = line;
while (isspace((unsigned char) *p)) p++;
*key_offset = p - line;
char *end = p; // one past the last non-space key character; written through below, so not const
while (*p != '=') {
if (*p == '\0') return 1; // fail, no `=` found
if (!isspace((unsigned char) *p)) {
end = p+1;
}
p++;
}
*end = '\0';
p++; // consume `=`
while (isspace((unsigned char) *p)) p++;
*value_offset = p - line;
end = p;
while (*p) {
if (!isspace((unsigned char) *p)) {
end = p+1;
}
p++;
}
*end = '\0';
return 0;
}
This accepts an empty ("") key or value as valid. Adjust as needed.

Prints new line after '\0' character in C

I'm currently doing an assignment where we have to recreate three switches of the cat command: -n, -T and -E. We compile the program and pass it two arguments, the switch and the file name. I store the text file's contents in a buffer.
int main(int argc, char *argv[]){
int index = 0;
int number = 1;
int fd, n, e, t;
n = e = t = 0;
char command[5];
char buffer[BUFFERSIZE];
strcpy(command, argv[1]);
fd = open(argv[2], O_RDONLY);
if( fd == -1)
{
perror(argv[2]);
exit(1);
}
read(fd, buffer,BUFFERSIZE);
if( !strcmp("cat", command)){
printf("%s\n", buffer);
}
else if( !strcmp("-n", command)){
n = 1;
}
else if( !strcmp("-E", command)){
e = 1;
}
else if( !strcmp("-T", command)){
t = 1;
}
else if( !strcmp("-nE", command) || !strcmp("-En", command)){
n = e = 1;
}
else if( !strcmp("-nT", command) || !strcmp("-Tn", command)){
n = t = 1;
}
else if( !strcmp("-ET", command) || !strcmp("-TE", command)){
t = e = 1;
}
else if( !strcmp("-nET", command) || !strcmp("-nTE", command) ||
!strcmp("-TnE", command) || !strcmp("-EnT", command) ||
!strcmp("-ETn", command) || !strcmp("-TEn", command)){
n = e = t = 1;
}
else{
printf("Invalid Switch Entry");
}
if(n){
printf("%d ", number++);
}
while(buffer[index++] != '\0' && ( n || e || t)){
if(buffer[index] == '\n' && e && n){
printf("$\n%d ", number++);
}
else if(buffer[index] == '\n' && e){
printf("$\n");
}
else if(buffer[index] == '\t' && t){
printf("^I");
}
else if(buffer[index] == '\n' && n){
printf("\n%d ", number++);
}
else {
printf("%c", buffer[index]);
}
}
printf("\n");
close(fd);
return 0;
}
Everything works except when I use the -n switch: it adds an extra line. I use a text file that contains
hello
hello
hello world!
Instead of the expected output
1 hello
2 hello
3 hello world!
it prints this:
1 hello
2 hello
3 hello world!
4
For some reason it adds an extra, numbered line after "hello world!".
Am I missing something simple?
This might not fix your problem, but I don't see any code that puts a terminating null character in buffer. Try:
// Reserve one character for the null terminator.
// (Named nread because n is already your -n flag.)
ssize_t nread = read(fd, buffer, BUFFERSIZE-1);
if ( nread == -1 )
{
// Deal with the error.
perror(argv[2]);
exit(1);
}
buffer[nread] = '\0';
The three cat options that you implement have different "modes":
-T replaces a character (no tab is written);
-E prepends a character with additional output (the new-line character is still written);
-n prepends each line with additional output.
You can handle the first two modes directly. The third mode requires information about the previous character: a new line starts at the beginning of the file and after every new-line character. So you need a flag to keep track of that.
(Your code prints a line number after each new-line character. That means that you have to treat the first line explicitly, and that you get one line number too many at the end: a file with n lines, the last of which ends in a new-line, has n new-line characters, so you print n + 1 line numbers.)
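To make the off-by-one concrete, here is a small sketch that only counts how many line numbers each strategy would print for a given text. Both function names are hypothetical. For the three-line sample file from the question, the question's strategy yields 4 numbers and the start-of-line flag yields 3.

```c
/* Question's strategy: one number before the loop, then one after every '\n'. */
int numbers_after_newline(const char *s)
{
    int count = 1;                 /* the number printed before the loop */
    for (; *s != '\0'; s++) {
        if (*s == '\n') count++;   /* a number follows every new-line */
    }
    return count;
}

/* Flag strategy: a number is due only when a character actually starts a line. */
int numbers_at_line_start(const char *s)
{
    int count = 0;
    int at_line_start = 1;         /* the file begins at the start of a line */
    for (; *s != '\0'; s++) {
        if (at_line_start) {
            count++;               /* this character begins a new line */
            at_line_start = 0;
        }
        if (*s == '\n') at_line_start = 1;  /* the next character starts a line */
    }
    return count;
}
```

With the flag, a trailing new-line only sets `at_line_start`. No number is printed unless another character follows.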
Other issues:
As R Sahu has pointed out, your input isn't null-terminated. You don't really need a null terminator here, though: read returns the number of bytes read, or -1 on error. You can use that number as the limit for index.
You increment index in the while condition, which means that inside the loop you look at the character after the one you checked, which might well be the null character. You will also miss the first character in the file.
In fact, you don't need a buffer here. When the file is larger than your buffer, you truncate it. You could call read in a loop until it returns fewer bytes than BUFFERSIZE, but the simplest way in this case is to read one byte after the other and process it.
You use too many compound conditions. This isn't wrong per se, but it makes for complicated code. Your main loop reads like a big switch when there are in fact only a few special cases to treat.
The way you determine the flags is both too complicated and too restricted. You check all combinations of flags, which means 6 strings when all three flags are given. What if you add another flag? Are you going to write 24 more strcmps? Instead, look for the minus sign as the first character and then at the letters one by one, setting flags and printing error messages as you go.
You don't need to copy argv[1] to command; you are only inspecting it. And the copy introduces a source of error: if argv[1] is longer than 4 characters, strcpy overflows command[5] and you get undefined behaviour, very likely a crash.
If you don't give any options, the file name should be argv[1] instead of argv[2].
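The index problem can be seen in isolation with a sketch that records what the loop body would process. The function names are hypothetical. With the increment in the condition, for "abc" the body sees 'b', 'c' and then the terminating '\0'. The first character is lost and the null character is handled as data, which is the stray character the question's code prints. The fixed loop sees exactly 'a', 'b', 'c'.

```c
#include <stddef.h>

/* Loop shape from the question: index++ in the condition means the body reads
   buf[index] one position after the character that was just tested.
   out needs room for strlen(buf) characters; it is not null-terminated here. */
size_t copy_buggy(const char *buf, char *out)
{
    size_t index = 0, k = 0;
    while (buf[index++] != '\0') {
        out[k++] = buf[index];   /* skips buf[0], ends by copying the '\0' */
    }
    return k;                    /* how many characters the body handled */
}

/* Fixed loop: test and use the same character, then advance. */
size_t copy_fixed(const char *buf, char *out)
{
    size_t index = 0, k = 0;
    while (buf[index] != '\0') {
        out[k++] = buf[index];
        index++;
    }
    out[k] = '\0';
    return k;
}
```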
Putting this (sans the flag parsing) into practice:
FILE *f = fopen(argv[2], "r");
int newline = 1; // marker for line numbers
if (f == NULL) { // error checking
perror(argv[2]);
exit(1);
}
for (;;)
{
int c = fgetc(f); // read one character
if (c == EOF) break; // terminate loop on end of file
if (newline) {
if (n) printf("%5d ", number++);
newline = 0;
}
if (c == '\n') {
newline = 1;
if (e) putchar('$');
}
if (c == '\t' && t) {
putchar('^');
putchar('I');
} else {
putchar(c);
}
}
fclose(f);
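The snippet above leaves out flag parsing. The letter-by-letter approach suggested earlier can be sketched as a small function. The name parse_switches and its 0 / -1 return convention are assumptions. Because it inspects the argument in place, it also avoids copying argv[1] into a fixed-size array.

```c
/* Hypothetical helper: parse one switch argument such as "-n", "-nE" or "-TnE".
   Sets *n, *e, *t for each letter seen. Returns 0 on success, or -1 if arg is
   not a switch or contains an unknown letter. */
int parse_switches(const char *arg, int *n, int *e, int *t)
{
    if (arg[0] != '-' || arg[1] == '\0') return -1;  /* not a switch */
    for (const char *p = arg + 1; *p != '\0'; p++) {
        switch (*p) {
        case 'n': *n = 1; break;
        case 'E': *e = 1; break;
        case 'T': *t = 1; break;
        default:  return -1;                         /* unknown letter */
        }
    }
    return 0;
}
```

Adding a fourth switch then costs one `case` line rather than another set of strcmp calls for every ordering.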
Edit: If you are restricted to the Unix calls open, close and read, you can still use the approach above. You need an additional loop that reads blocks of a certain size with read. The read function returns the number of bytes read; if that is less than the number of bytes asked for, stop the loop.
The example below adds yet another loop so that several files can be concatenated.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFFERSIZE 0x400
int main(int argc, char *argv[])
{
int n = 0;
int e = 0;
int t = 0;
int number = 1; // cat -n numbers lines starting at 1
int first = 1;
while (first < argc && *argv[first] == '-') {
char *str = argv[first] + 1;
while (*str) {
switch (*str) {
case 'n': n = 1; break;
case 'E': e = 1; break;
case 'T': t = 1; break;
default: fprintf(stderr, "Unknown switch -%c.\n", *str);
exit(1); // an unknown switch is an error, so don't report success
}
str++;
}
first++;
}
while (first < argc) {
int fd = open(argv[first], O_RDONLY);
int newline = 1;
int bytes;
if (fd == -1) {
fprintf(stderr, "Could not open %s.\n", argv[first]);
exit(1);
}
do {
char buffer[BUFFERSIZE];
int i;
bytes = read(fd, buffer,BUFFERSIZE);
for (i = 0; i < bytes; i++) {
int c = buffer[i];
if (newline) {
if (n) printf("%5d ", number++);
newline = 0;
}
if (c == '\n') {
newline = 1;
if (e) putchar('$');
}
if (c == '\t' && t) {
putchar('^');
putchar('I');
} else {
putchar(c);
}
}
} while (bytes == BUFFERSIZE);
close(fd);
first++;
}
return 0;
}
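The program above stops reading a file as soon as read returns fewer bytes than requested. For regular files that only happens at end of file. For pipes and terminals, read may legitimately return a short count earlier. The sketch below performs the same per-character processing for one file descriptor, but stops only when read returns 0 (end of file) or -1 (error). The name cat_fd and its parameters are hypothetical. It writes to a `FILE *` so the output can be checked, and it takes the line counter by pointer so numbering continues across files as in the program above.

```c
#include <stdio.h>
#include <unistd.h>

#define CHUNK 0x400

/* Copy fd to out, applying -n / -E / -T like the program above.
   Returns 0 at end of file, or -1 if read() fails. */
int cat_fd(int fd, FILE *out, int n, int e, int t, int *number)
{
    char buffer[CHUNK];
    int newline = 1;                          /* next character starts a line */
    ssize_t bytes;

    /* Keep going until read() reports end of file (0) or an error (-1). */
    while ((bytes = read(fd, buffer, sizeof buffer)) > 0) {
        for (ssize_t i = 0; i < bytes; i++) {
            int c = (unsigned char) buffer[i];
            if (newline) {
                if (n) fprintf(out, "%5d ", (*number)++);
                newline = 0;
            }
            if (c == '\n') {
                newline = 1;
                if (e) putc('$', out);        /* -E: mark the end of the line */
            }
            if (c == '\t' && t) {
                fputs("^I", out);             /* -T: show tabs as ^I */
            } else {
                putc(c, out);
            }
        }
    }
    return bytes < 0 ? -1 : 0;
}
```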
