How to stop my program from skipping characters before saving them in C

I am making a simple program that reads from a file character by character, puts each character into tmp and then copies tmp into input[i]. However, the program saves the first character in tmp and then saves the next character in input[0]. How do I make it not skip that first character?
I've tried reading into input[i] directly, but then I wasn't able to check for EOF.
FILE * file = fopen("input.txt", "r");
char tmp;
char input[5];
tmp= getc(file);
input[0]= tmp;
int i=0;
while((tmp != ' ') && (tmp != '\n') && (tmp != EOF)){
tmp= getc(file);
input[i]=tmp;
length++;
i++;
}
printf("%s",input);
It's supposed to print "ADD $02", but instead it prints "DD 02".

You are doing things in the wrong order. The way your code is structured, the first char is read and stored before the loop. Inside the loop, the next char is read and stored in input[0], overwriting the first one. If you want to keep that structure, start with i = 1.
Perhaps you want to store the first character unconditionally, but I guess you want to read everything up to the first space, which might be the very first character. Then do this:
#include <stdio.h>
int main(void)
{
char input[80];
int i = 0;
int c = getchar();
while (c != ' ' && c != '\n' && c != EOF) {
if (i + 1 < sizeof(input)) { // store char if there is room
input[i++] = c;
}
c = getchar();
}
input[i] = '\0'; // null-terminate input
puts(input);
return 0;
}
Things to note:
The first character is read before the loop. The loop condition and the code that stores the char then use that char. Just before the end of the loop body, the next char is read, which is then processed in the next iteration.
Your code doesn't prevent writes past the end of the char buffer input. This is dangerous, especially since your buffer is tiny.
When you construct strings char by char, you should null-terminate them by placing an explicit '\0' at the end, and you have to make sure that there is room for that terminator. Nearly all library functions, like puts or printf("%s", ...), expect strings to be null-terminated.
Store the result of getchar in an int, so that you can distinguish all valid character codes from the special value EOF.
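Applied to the file in the question, the same read-before-the-loop pattern might look like the minimal sketch below. The helper name read_word and its size parameter are hypothetical, chosen for illustration; it uses getc on a FILE * instead of getchar on stdin.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reads characters from fp up to the first space,
   newline or EOF and stores them, null-terminated, in buf (at most
   size - 1 of them). Characters that do not fit are read and discarded.
   Returns the number of characters stored. */
size_t read_word(FILE *fp, char *buf, size_t size)
{
    size_t i = 0;
    int c = getc(fp);               /* read the first char before the loop */
    while (c != ' ' && c != '\n' && c != EOF) {
        if (i + 1 < size) {         /* keep one slot for the '\0' */
            buf[i++] = (char)c;
        }
        c = getc(fp);               /* read the next char at the end of the body */
    }
    if (size > 0) {
        buf[i] = '\0';
    }
    return i;
}
```

On a stream containing ADD $02, the first call would store ADD and the second $02, because each call consumes the space or newline that ends the word.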
The read-before-the-loop pattern above is useful when the first call that fetches an item differs from the subsequent calls, for example when tokenizing a string with strtok. Here, you can also choose another approach:
while (1) { // "infinite loop"
int c = getchar(); // read a char first thing in a loop
if (c == ' ' || c == '\n' || c == EOF) break;
// explicit break when done
if (i + 1 < sizeof(input)) {
input[i++] = c;
}
}
This approach keeps all the character processing in the loop body, but you must wrap it in an infinite loop and leave it with an explicit break.
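To illustrate the strtok case mentioned above, where the first call (passing the string) differs from the later calls (passing NULL), here is a minimal sketch. The function name print_tokens and the delimiter set " \n" are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example: prints each token of line on its own line and
   returns the number of tokens. strtok modifies line in place. */
size_t print_tokens(char *line)
{
    size_t count = 0;
    char *tok = strtok(line, " \n");    /* first call: pass the string */
    while (tok != NULL) {
        printf("%s\n", tok);
        count++;
        tok = strtok(NULL, " \n");      /* later calls: pass NULL */
    }
    return count;
}
```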

Related

puts() output is appended "time" string

I get very unexpected output from quite simple code
char ch = getchar(), word[100], *p = word;
while (ch != '\n') {
*(p++) = ch;
ch = getchar();
}
puts(word);
The output for any 17-character input has "time" appended, like
12345678901234567time
If the input is longer, "time" is partly overwritten, like
1234567890123456789me
Am I doing something wrong?
puts expects a pointer to a string, and a string needs a terminating null character, \0, to signify where it ends.
But in your case, you did not write the \0 at the end of word.
You need to do:
char ch = getchar(), word[100], *p = word;
/* Also check that you are not writing more than 100 chars */
int i = 1;
while(ch != '\n' && i++ < 100){
*(p++) = ch;
ch = getchar();
}
*p = '\0'; /* write the terminating null character */
puts(word);
Without the terminating null character, you could not expect any particular output: it could just as well have been 12345678901234567AnyOtherWord or something else.
There are multiple issues in your code:
You do not null terminate the string you pass to puts(), invoking undefined behavior... in your case, whatever characters happen to be present in word after the last one read from stdin are printed after these and until (hopefully) a '\0' byte is finally found in memory.
You read a byte from stdin into a char variable: this does not allow you to check for EOF, and indeed you do not.
If you read a long line, you will write bytes beyond the end of the word array, invoking undefined behavior. If the end of file is encountered before a '\n' is read from stdin, you will definitely write beyond the end of the buffer, because getchar() keeps returning EOF, which never compares equal to '\n'. Try, for example, giving an empty file as input to your program.
Here is a corrected version:
char word[100];
char *p = word;
int ch;
while ((ch = getchar()) != EOF && ch != '\n') {
/* check for long line: in this case, we truncate the line */
if (p < word + sizeof(word) - 1) {
*p++ = ch;
}
}
*p = '\0';
puts(word);
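For comparison, the standard fgets function performs the same bounded read and always null-terminates the result. The sketch below swaps it in; fgets keeps the newline, so it is stripped with strcspn. The helper name read_line is hypothetical. One difference from the loop above: the excess characters of an over-long line stay in the stream for the next call instead of being discarded.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into word (at most size - 1
   chars), removes the trailing newline if present and returns word, or
   NULL if EOF or an error occurred before anything was read. */
char *read_line(FILE *fp, char *word, size_t size)
{
    if (fgets(word, (int)size, fp) == NULL)
        return NULL;
    word[strcspn(word, "\n")] = '\0';   /* strip the newline, if any */
    return word;
}
```

With char word[100];, a call such as read_line(stdin, word, sizeof word) replaces the whole getchar loop.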

Is this a valid use of fgetc?

My input stream is from a text file with a list of words separated by the \n character.
The function stringcompare compares two strings for equality, case-insensitively.
I have two string arrays, word[50] and dict[50]. word is a string that would be given by the user.
Basically what I want to do is pass word[] and each word in the text file as arguments of the stringcompare function.
I've compiled and run this code, but it is wrong. Very wrong. What am I doing wrong? Can I even use fgetc() like this? Would dict[] even be a string array after the inner loop is done?
char c, r;
while((c = fgetc(in)) != EOF){
while((r = fgetc(in)) != '\n'){
dict[n] = r;
n++;
}
dict[n+1] = '\0'; //is this necessary?
stringcompare(word, dict);
}
It is wrong.
The return value of fgetc() should be stored to int, not char, especially when it will be compared with EOF.
You may have forgotten to initialize n.
You will miss the first character of each line, which is stored to c.
Use dict[n] = '\0'; instead of dict[n+1] = '\0'; because n is already incremented in the loop.
Possible fix:
int c, r;
while((c = fgetc(in)) != EOF){
ungetc(c, in); // push the read character back to the stream for reading by fgetc later
n = 0;
// add check for EOF and buffer overrun for safety
while((r = fgetc(in)) != '\n' && r != EOF && n + 1 < sizeof(dict) / sizeof(dict[0])){
dict[n] = r;
n++;
}
dict[n] = '\0'; //this is necessary
stringcompare(word, dict);
}
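An alternative to the ungetc approach is a single loop that reads exactly one character per iteration and finishes a word whenever it sees '\n'. This is a sketch under assumptions: equal_ignore_case is a hypothetical stand-in for the question's stringcompare (whose code is not shown), dict_contains is a hypothetical name, and over-long lines are silently truncated to the 50-char buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for stringcompare(): 1 if a and b are equal
   ignoring case, 0 otherwise. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                /* both strings must end together */
}

/* Reads one word per line from in; returns 1 if any line matches word. */
int dict_contains(FILE *in, const char *word)
{
    char dict[50];
    size_t n = 0;
    int c;                          /* int, so EOF can be detected */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {            /* end of a line: compare the word */
            dict[n] = '\0';
            if (equal_ignore_case(word, dict))
                return 1;
            n = 0;
        } else if (n + 1 < sizeof dict) {
            dict[n++] = (char)c;    /* keep room for the '\0' */
        }
    }
    if (n > 0) {                    /* last line may lack a trailing '\n' */
        dict[n] = '\0';
        return equal_ignore_case(word, dict);
    }
    return 0;
}
```

Because the comparison happens when '\n' is seen rather than at the top of the loop, no character has to be pushed back with ungetc.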

Reading data from a file, only alpha characters

I'm working on a program for school right now in C and I'm having trouble reading text from a file. I've only ever worked in Java before, so I'm not completely familiar with C yet, and this has got me thoroughly stumped even though I'm sure it's pretty simple.
Here's an example of how the text can be formatted in the file we have to read:
boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN)
I have to take in each word and store it in a data structure, and a word is only alpha characters, so no numbers or special characters. I already have the data structure working properly so I just need to get each word into a char array and then add it to my structure. It has to keep reading each char until it gets to a non-alpha char value. I've tried looking into the different ways to scan in from a file and I'm not sure what would be best for my scenario.
Here's the code I have right now for my input:
char str[MAX_WORD_SIZE];
char c;
int index = 0;
while (fscanf(dictionaryInputFile, "%c", c) != EOF) //while not at end of file
{
if (isalpha(c)) //if current character is a letter
{
tolower(c); //ignores case in word
str[index] = c; //add char to string
index++;
}
else if (str[0] != '\0') //If a word
{
str[index] = '\0'; //Make sure no left over characters in String
dictionaryRoot = insertNode(str, dictionaryRoot); //insert word to dictionary
index = 0; //reset index
str[index] = '\0'; //Set first character to null since word has been added
}
}
My thinking was that if it doesn't hit that first if statement then I have to check if str is a word or not, that's why it checks if the 0 index of str is null or not. I'm guessing the else if statement I have is not right though, but I can't figure out a way to end the current word I'm building and then reset str to null when it's added to my data structure. Right now when I run this I get a segmentation fault if I pass the txt file as an argument.
I'd just like to know if I'm on the right track and if not maybe some help on how I should be reading this data.
This is my first time posting here so I hope I included everything you'll need to help me, if not just let me know and I'd be happy to add more information.
Biggest problem: incorrect use of fscanf(); the %c conversion needs the address of c (credit: BLUEPIXY).
// while (fscanf(dictionaryInputFile, "%c", c) != EOF)
while (fscanf(dictionaryInputFile, "%c", &c) != EOF)
No protection against overflow.
// str[index] = c; //add char to string
if (index >= MAX_WORD_SIZE - 1) Handle_TooManySomehow();
Not sure why the code tests str[0] against '\0', since '\0' is also a non-alpha character; testing index > 0 is simpler.
Pedantically, isalpha() is problematic when a negative char value is passed. Better to pass the unsigned char value, is...((unsigned char) c), when the code knows it is not EOF. Alternatively, save the input using int ch = fgetc(stream) and use is...(ch).
Minor: Better to use size_t for array indexes than int, but be careful as size_t is unsigned. size_t is important should the array become large, unlike in this case.
Also, when EOF is received, any data in str is ignored, even if it contains a word (credit: BLUEPIXY).
For the most part, OP is on the right track.
Following is a sample, untested approach to illustrate not overflowing the buffer.
Test for a full buffer, then read in a char if needed. If a non-alpha char is found, add the word to the dictionary if a non-zero-length word was accumulated.
char str[MAX_WORD_SIZE];
int ch;
size_t index = 0;
for (;;) {
if ((index >= sizeof str - 1) ||
((ch = fgetc(dictionaryInputFile)) == EOF) ||
(!isalpha(ch))) {
if (index > 0) {
str[index] = '\0';
dictionaryRoot = insertNode(str, dictionaryRoot);
index = 0;
}
if (ch == EOF) break;
}
else {
str[index++] = tolower(ch);
}
}
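To try the loop above without the question's data structure, it can be wrapped in a function that stores the words in an array instead of calling insertNode. This is a sketch under assumptions: MAX_WORD_SIZE is set to 32 (the question does not give its value), and split_alpha_words and its parameters are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_SIZE 32   /* assumed value; not given in the question */

/* Hypothetical wrapper around the loop above: splits in into runs of
   letters, lowercases them and copies up to max_words of them into
   words[] (stand-in for insertNode). Returns the number stored. */
size_t split_alpha_words(FILE *in, char words[][MAX_WORD_SIZE], size_t max_words)
{
    char str[MAX_WORD_SIZE];
    size_t index = 0, count = 0;
    int ch = 0;

    for (;;) {
        if (index >= sizeof str - 1 ||          /* buffer full */
            (ch = fgetc(in)) == EOF ||          /* end of input */
            !isalpha(ch)) {                     /* delimiter */
            if (index > 0 && count < max_words) {
                str[index] = '\0';
                strcpy(words[count++], str);
            }
            index = 0;
            if (ch == EOF) break;
        } else {
            str[index++] = (char)tolower(ch);
        }
    }
    return count;
}
```

On the sample input from the question, boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN), this would yield boo, book, booking, boo, tex, text and john.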

Dynamically created C string

I'm trying to get an expression from the user and put it in a dynamically created string. Here's the code:
char *get_exp() {
char *exp, *tmp = NULL;
size_t size = 0;
char c;
scanf("%c", &c);
while (c != EOF && c != '\n') {
tmp = realloc(exp, ++size * sizeof char);
if (tmp == NULL)
return NULL;
exp = tmp;
exp[size-1] = c;
scanf("%c", &c);
}
tmp = realloc(exp, size+1 * sizeof char);
size++;
exp = tmp;
exp[size] = '\0';
return exp;
}
However, the first character read is a newline char every time for some reason, so the while loop exits. I'm using XCode; could that be the cause of the problem?
No, XCode is not part of your problem (it is a poor workman who blames his tools).
You haven't initialized exp, so the first realloc() receives an indeterminate pointer, which is undefined behavior.
Your code to detect EOF is completely broken; you must test the return value of scanf() to detect EOF. You'd do better using getchar() with int c:
int c;
while ((c = getchar()) != EOF && c != '\n')
{
...
}
If you feel you must use scanf(), then you need to test each call to scanf():
char c;
while (scanf("%c", &c) == 1 && c != '\n')
{
...
}
You do check the result of realloc() in the loop; that's good. You don't check the result of realloc() after the loop (and you aren't shrinking your allocation); please check every time.
You should consider using a mechanism that allocates many bytes at a time, rather than one realloc() per character read; that is expensive.
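One way to allocate many bytes at a time is to double the capacity whenever the buffer fills up, so the number of realloc() calls grows only logarithmically with the line length. The sketch below assumes that doubling policy and an initial capacity of 16, both arbitrary choices; read_line_dynamic is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp into a malloc'd string
   without the newline. Returns NULL at end of input (nothing read) or if
   memory runs out. The caller must free the result. */
char *read_line_dynamic(FILE *fp)
{
    size_t cap = 16, len = 0;       /* initial capacity is arbitrary */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {       /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {      /* don't leak the old buffer */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {     /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

An empty line returns an empty string rather than NULL, so the caller can tell a blank line from the end of input.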
Of course, if the goal is simply to read a line, it would be simplest to use POSIX getline(), which handles all the allocation for you. Alternatively, you can use fgets() to read the line: collect the data in a fixed buffer, and then copy it into an appropriately sized dynamically allocated buffer. You would also need to allow for the possibility that the line is very long, so you'd check whether you actually got the newline.
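A minimal sketch of the getline() route, assuming a POSIX system (getline is not part of ISO C; the feature-test macro is there for strict compiler modes). The wrapper name read_line_getline is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() in strict ISO modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical wrapper: returns a malloc'd copy of the next line of fp
   without its newline, or NULL at end of input or on error. getline()
   allocates and grows the buffer itself. The caller frees the result. */
char *read_line_getline(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, fp);

    if (n == -1) {                /* EOF or error; line may still be allocated */
        free(line);
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';
    return line;
}
```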
Here on Windows XP/cc, like Michael said, it works if exp is initialized to NULL.
Here's a fixed code, with comments explaining what is different from your code in the question:
char *get_exp()
{
// keep variables with narrowest scope possible
char *exp = NULL;
size_t size = 0;
// use a "forever" loop with break in the middle, to avoid code duplication
for(;;) {
// removed sizeof char, because it is defined to be 1 by the C standard
char *tmp = realloc(exp, ++size);
if (tmp == NULL) {
// in your code, you did not free already reserved memory here
free(exp); // free(NULL) is allowed (does nothing)
return NULL;
}
exp = tmp;
// Using getchar instead of scanf to get EOF,
// type int required to have both all byte values, and EOF value.
// If you do use scanf, you should also check its return value (read doc).
int ch = getchar();
if (ch == EOF) break; // eof (or error, use feof(stdin)/ferror(stdin) to check)
if (ch == '\n') break; // end of line
exp[size - 1] = ch; // implicit conversion to char
}
if (exp) {
// If we got here, for loop above did break after reallocing buffer,
// but before storing anything to the new byte.
// Your code put the terminating '\0' to 1 byte beyond end of allocation.
exp[size-1] = '\0';
}
// else exp = strdup(""); // uncomment if you want to return empty string for empty line
return exp;
}

Reading a file in C

I have an input file I need to extract words from. The words can only contain letters and numbers, so anything else will be treated as a delimiter. I tried fscanf, fgets + sscanf and strtok, but nothing seems to work.
while(!feof(file))
{
fscanf(file,"%s",string);
printf("%s\n",string);
}
The above clearly doesn't work because it doesn't use any delimiters, so I replaced the line with this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
So I used fgets to read the first line and then sscanf:
sscanf(line,"%[A-z]%n,word,len);
line+=len;
This one doesn't work either, because whatever I try I can't move the pointer to the right place. I tried strtok, but I can't find how to set delimiters:
while(p != NULL) {
printf("%s\n", p);
p = strtok(NULL, " ");
}
This one obviously takes the space character as a delimiter, but I have literally hundreds of delimiters.
Am I missing something here? Extracting words from a file seemed a simple concept at first, but nothing I try really works.
Consider building a minimal lexer. In state word, it would stay in that state as long as it sees letters and numbers, and switch to state delimiter when it encounters anything else. In state delimiter, it would do the exact opposite.
Here's an example of a simple state machine which might be helpful. For the sake of brevity, it works only with digits: echo "2341,452(42 555" | ./main will print each number on a separate line. It's not a lexer, but the idea of switching between states is quite similar.
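#include <stdio.h>
#include <string.h>

enum { WORD = 1, DELIM = 2, BUFLEN = 1024 };

int main(void) {
    int state = DELIM;              /* start outside a word */
    size_t ptr = 0;
    char buffer[BUFLEN];
    const char *digits = "1234567890";
    int c;                          /* int, so EOF can be detected */
    while ((c = getchar()) != EOF) {
        /* strchr also finds the terminating '\0', so exclude it */
        if (c != '\0' && strchr(digits, c)) {
            if (WORD != state) {
                ptr = 0;            /* a new word starts */
            }
            if (ptr < (size_t)BUFLEN - 1) {  /* leave room for '\0' */
                buffer[ptr++] = c;
            }
            state = WORD;
        } else {
            if (WORD == state) {
                buffer[ptr] = '\0';
                printf("%s\n", buffer);
            }
            state = DELIM;
        }
    }
    if (WORD == state) {            /* flush a word that ends at EOF */
        buffer[ptr] = '\0';
        printf("%s\n", buffer);
    }
    return 0;
}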
If the number of states increases, consider replacing the if statements that check the current state with a switch block. Performance can be improved by reading a whole block of the input into a temporary buffer and iterating through it, instead of calling getchar for each character.
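Here is a sketch of that switch-based variant, adapted to the question's definition of a word (letters and digits, via isalnum). To keep it short and testable it scans a string rather than a file; the function name lex_words and the state names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

enum state { IN_DELIM, IN_WORD };

/* Hypothetical example: prints each run of letters/digits in s on its own
   line and returns how many runs were found, using a two-state machine. */
size_t lex_words(const char *s)
{
    enum state st = IN_DELIM;
    size_t count = 0;

    for (; *s != '\0'; s++) {
        int alnum = isalnum((unsigned char)*s);
        switch (st) {
        case IN_DELIM:
            if (alnum) {            /* delimiter -> word: a word starts */
                st = IN_WORD;
                count++;
                putchar(*s);
            }
            break;
        case IN_WORD:
            if (alnum) {
                putchar(*s);        /* still inside the word */
            } else {                /* word -> delimiter: end the line */
                st = IN_DELIM;
                putchar('\n');
            }
            break;
        }
    }
    if (st == IN_WORD)              /* input ended inside a word */
        putchar('\n');
    return count;
}
```

Adding a state then means adding a case label rather than another layer of nested if statements.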
For a more complex input format, you can use a lexical analyzer generator such as flex. It can take care of defining the state transitions and other parts of lexer generation for you.
Several points:
First of all, do not use feof(file) as your loop condition; feof won't return true until after you attempt to read past the end of the file, so your loop will execute once too often.
Second, you mentioned this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
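That's not quite what's happening: if the next character in the stream doesn't match the format specifier, fscanf returns without having read anything, and string is unmodified. The offending character stays in the stream, so every subsequent call fails at the same place and you keep printing the first word.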
Here's a simple, if inelegant, method: it reads one character at a time from the input file, checks to see if it's either an alpha or a digit, and if it is, adds it to a string.
#include <stdio.h>
#include <ctype.h>
int get_next_word(FILE *file, char *word, size_t wordSize)
{
    size_t i = 0;
    int c;
    /**
     * Skip over any non-alphanumeric characters
     */
    while ((c = fgetc(file)) != EOF && !isalnum(c))
        ; // empty loop
    if (c != EOF && wordSize > 1)
        word[i++] = c;
    /**
     * Read up to the next non-alphanumeric character and
     * store it to word; the size is checked first so that
     * no character is read and then dropped when word is full
     */
    while (i < wordSize - 1 && (c = fgetc(file)) != EOF && isalnum(c))
    {
        word[i++] = c;
    }
    word[i] = 0;
    return i > 0; // true if a word was read, even one that ends at EOF
}
int main(void)
{
char word[SIZE]; // where SIZE is large enough to handle expected inputs
FILE *file;
...
while (get_next_word(file, word, sizeof word))
// do something with word
...
}
I would use:
FILE *file;
char string[200];
while(fscanf(file, "%*[^A-Za-z]"), fscanf(file, "%199[a-zA-Z]", string) > 0) {
/* do something with string... */
}
This skips over non-letters and then reads a string of up to 199 letters. The only oddness is that any 'words' longer than 199 letters will be split into multiple words, but you need the limit to avoid a buffer overflow.
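Since the question counts digits as part of a word, a variant might add 0-9 to both scansets, as in the sketch below. Note that ranges such as A-Z inside %[...] are implementation-defined in ISO C, although common implementations such as glibc support them. The function name count_scanf_words is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example: prints and counts the runs of letters and digits
   in fp. As above, words longer than 199 characters are split. */
size_t count_scanf_words(FILE *fp)
{
    char string[200];
    size_t count = 0;

    /* first fscanf skips delimiters (its result is ignored);
       the second reads a word and must convert exactly one item */
    while (fscanf(fp, "%*[^A-Za-z0-9]"),
           fscanf(fp, "%199[A-Za-z0-9]", string) == 1) {
        printf("%s\n", string);
        count++;
    }
    return count;
}
```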
What are your delimiters? The second argument to strtok should be a string containing your delimiters, and the first should be a pointer to your string the first time round then NULL afterwards:
char *p = strtok(line, ","); // assuming a , delimiter
while (p != NULL)
{
    printf("%s\n", p);
    p = strtok(NULL, ",");
}
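With hundreds of delimiters, the delimiter string for strtok does not have to be typed by hand. Assuming every byte that is not a letter or digit counts as a delimiter, it can be generated with isalnum, as in this sketch; build_delims and count_strtok_words are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fills delims with every byte value 1..255 that is not a letter or digit
   in the current locale. Byte 0 is skipped because it would end the
   string. delims must have room for at least 256 chars. */
static void build_delims(char *delims)
{
    size_t n = 0;
    for (int b = 1; b <= 255; b++) {
        if (!isalnum(b))
            delims[n++] = (char)b;
    }
    delims[n] = '\0';
}

/* Hypothetical example: splits line in place, prints each alphanumeric
   word and returns how many there were. */
size_t count_strtok_words(char *line)
{
    char delims[256];
    size_t count = 0;

    build_delims(delims);
    for (char *p = strtok(line, delims); p != NULL; p = strtok(NULL, delims)) {
        printf("%s\n", p);
        count++;
    }
    return count;
}
```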
