I'm very new to C and I'm still learning the basics. I'm creating an application that reads in a text file and breaks it down into individual words. My intention is to count the number of times each word occurs.
Anyway, the last do-while loop in the code below executes fine, and then crashes. This loop prints the memory address of each word (a pointer) and then prints the word. It does this correctly and then crashes on the last iteration. My intention is to push this memory address onto a singly linked list once it has stopped crashing.
Also, a quick note about the array sizes below: I haven't yet figured out how to set the correct size needed to hold the word character array etc., because you must define the size before the array is filled, and I don't know how to do this. That is why I've set them to 1024.
#include<stdio.h>
#include<string.h>
int main (int argc, char **argv) {
FILE * pFile;
int c;
int n = 0;
char *wp;
char wordArray[1024];
char delims[] = " "; // delims spaces in the word array.
char *result = NULL;
result = strtok(wordArray, delims);
char holder[1024];
pFile=fopen (argv[1],"r");
if (pFile == NULL) perror ("Error opening file");
else {
do {
c = fgetc (pFile);
wordArray[n] = c;
n++;
} while (c != EOF);
n = 0;
fclose (pFile);
do {
result = strtok(NULL, delims);
holder[n] = *result; // holder stores the value of 'result', which should be a word.
wp = &holder[n]; // wp points to the address of 'holder' which holds the 'result'.
n++;
printf("Pointer value = %d\n", wp); // Prints the address of holder.
printf("Result is \"%s\"\n", result); // Prints the 'result' which is a word from the array.
//sl_push_front(&wp); // Push address onto stack.
} while (result != NULL);
}
return 0;
}
Please ignore the bad program structure, as I mentioned, I'm new to this!
Thanks
As others have pointed out, your second loop attempts to dereference result before you check for it being NULL. Restructure your code as follows:
result = strtok( wordArray, delims ); // do this *after* you have read data into
// wordArray
while( result != NULL )
{
holder[n] = *result;
...
result = strtok( NULL, delims );
}
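To see the reordered loop in isolation, here is a minimal sketch. The helper name count_words is made up for illustration and is not part of the question's code; it only prints and counts the tokens.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each space-delimited token in text and
   returns how many were found. strtok() modifies text in place. */
int count_words(char *text)
{
    const char delims[] = " ";
    int count = 0;

    /* Fetch the first token before the loop, test it at the top, and fetch
       the next one at the bottom, so a NULL result is never dereferenced. */
    for (char *result = strtok(text, delims); result != NULL;
         result = strtok(NULL, delims)) {
        printf("Result is \"%s\"\n", result);
        count++;
    }
    return count;
}
```

Because strtok() writes into the string it scans, pass it a char array (for example char line[] = "a b c";) rather than a string literal.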
Although...
You're attempting to read the entire contents of the file into memory before breaking it up into words; that's not going to work for files bigger than the size of your buffer (currently 1K). If I may make a suggestion, change your code such that you're reading individual words as you go. Here's an example that breaks the input stream up into words delimited by whitespace (blanks, newlines, tabs, etc.) and punctuation (period, comma, etc.):
#include <stdio.h>
#include <ctype.h>
int main(int argc, char **argv)
{
char buffer[1024];
int c;
size_t n = 0;
FILE *input = stdin;
if( argc > 1 )
{
input = fopen( argv[1], "r");
if (!input)
input = stdin;
}
while(( c = fgetc(input)) != EOF )
{
if (isspace(c) || ispunct(c))
{
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
n = 0;
}
}
else
{
if (n < sizeof buffer - 1) buffer[n++] = c; // drop characters that would overflow the buffer
}
}
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
}
fclose(input);
return 0;
}
No warranties express or implied (having pounded this out before 7:00 a.m.). But it should give you a flavor of how to parse a file as you go. If nothing else, it avoids using strtok, which is not the greatest of tools for parsing input. You should be able to adapt this general structure to your code. For best results, you should abstract that out into its own function:
int getNextWord(FILE *stream, char *buf, size_t bufsize)
{
    int c;
    size_t n = 0;
    if (bufsize == 0)
        return 0;
    while ((c = fgetc(stream)) != EOF)
    {
        if (isspace(c) || ispunct(c))
        {
            if (n > 0)
                break; // a delimiter after at least one character ends the word
        }
        else if (n < bufsize - 1) // leave room for the terminator; excess characters are dropped
        {
            buf[n++] = c;
        }
    }
    buf[n] = 0;
    return n > 0; // 1 if a word was read, 0 at end of input
}
and you would call it like
void foo(void)
{
char word[SOME_SIZE];
...
while (getNextWord(inFile, word, sizeof word))
{
do_something_with(word);
}
...
}
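Since the stated goal is to count how often each word occurs and to keep the words in a singly linked list, one possible shape for that step is sketched below. The struct and function names (word_node, count_word, lookup_count, free_words) are invented for this illustration, and a linear search is assumed to be fast enough for small inputs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one entry per distinct word. */
struct word_node {
    char *word;
    int count;
    struct word_node *next;
};

/* Increments the count for word, or pushes a new node onto the front of the
   list if the word has not been seen. Returns 1 on success, 0 if out of memory. */
int count_word(struct word_node **head, const char *word)
{
    for (struct word_node *p = *head; p != NULL; p = p->next) {
        if (strcmp(p->word, word) == 0) {
            p->count++;
            return 1;
        }
    }

    struct word_node *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->word = malloc(strlen(word) + 1);   /* private copy of the word */
    if (node->word == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->word, word);
    node->count = 1;
    node->next = *head;
    *head = node;
    return 1;
}

/* Returns the stored count for word, or 0 if it is not in the list. */
int lookup_count(const struct word_node *head, const char *word)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->word, word) == 0)
            return head->count;
    return 0;
}

/* Frees every node and its word copy. */
void free_words(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

In the calling loop above, do_something_with(word) could then become count_word(&head, word), with head starting out as NULL. Each node owns a copy of its word, so the same read buffer can be reused for every word.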
Your do...while loop is written so that result can be NULL (that is the loop's exit condition). If so, how do you expect this line to work?
holder[n] = *result;
Dereferencing a NULL result there is, it seems to me, the reason your program crashes.
Change the do-while loop to a while loop. Use
while (condition)
{
}
instead of
do {
}while(condition)
It is crashing because you are trying to dereference a NULL pointer (result) in the do-while loop.
I work mostly with Objective-C and was just looking at your question for fun, but I may have a solution.
After your first do-while loop, before setting n=0;, create another variable called totalWords and set it equal to n. totalWords can be declared anywhere in the function before this point (but not inside one of the do-while loops); the top of the else block is a reasonable place, since its lifetime is short:
totalWords = n;
then you can set n back to zero:
n = 0;
Your conditional for the final do-while loop should then say:
...
} while (n <= ++totalWords);
The logic behind the application will thus say, count the words in the file (there are n words, which is the totalWords in the file). When program prints the results to the console, it will run the second do-while loop, which will run until n is one result past the value of totalWords (this ensures that you print the final word).
Alternately, it is better practice and clearer for other programmers to use a loop and a half:
do {
result = strtok(NULL, delims);
holder[n] = *result;
wp = &holder[n];
printf("Pointer value = %d\n", wp);
printf("Result is \"%s\"\n", result);
//sl_push_front(&wp); // Push address onto stack.
if (n == totalWords) break; // This forces the program to exit the do-while after we have printed the last word
n++; // We only need to increment if we have not reached the last word
// if our logic is bad, we will enter an infinite loop, which will tell us while testing that our logic is bad.
} while (1); // plain C has no 'true' unless <stdbool.h> is included
I have a file that contains words and their synonyms each on a separate line.
I am writing this code that should read the file line by line then display it starting from the second word which is the synonym.
I used the variable count in the first loop to count the number of synonyms of each word, because the number of synonyms differs from one word to another. Moreover, I used the condition synonyms[i]==',' because the synonyms are separated by commas.
The purpose of me writing such code is to put them in a binary search tree in order to have a full dictionary.
The code compiles without errors, yet it does not work.
I have tried changing the loops, but that didn't work either.
Sample input from the file:
abruptly - dead, short, suddenly
acquittance - release
adder - common, vipera
Sample expected output:
dead short suddenly
release
common vipera
Here is the code:
void LoadFile(FILE *fp){
int count;
int i;
char synonyms[50];
char word[50];
while(fgets(synonyms,50,fp)!=NULL){
for (i=0;i<strlen(synonyms);i++)
if (synonyms[i]==',' || synonyms[i]=='\n')
count++;
}
while(fscanf(fp,"%s",word)==1){
for(i=1;i<strlen(synonyms);i++){
( fscanf(fp,"%s",synonyms)==1);
printf("%s",synonyms);
}
}
}
int main(){
char fn[]="C:/Users/CLICK ONCE/Desktop/Semester 4/i2206/Project/Synonyms.txt";
FILE *fp;
fp=fopen(fn,"rt");
if (fp==NULL){
printf("Cannot open this file");
}
else{
LoadFile(fp);
}
return 0;
}
Here is my solution. I have split the work into functions for readability. The actual parsing is done in the parse function. That function takes into account hyphenated compound words such as seventy-two. The word and its synonyms must be separated by a hyphen preceded by at least one space.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Trim leading and trailing space characters.
// Warning: string is modified
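char* trim(char* s) {
char* p = s;
size_t l = strlen(p);
// Check l > 0 first so an empty or all-space string never reads p[-1]
while (l > 0 && isspace((unsigned char)p[l - 1])) p[--l] = 0;
while (*p && isspace((unsigned char)*p)) ++p, --l;
memmove(s, p, l + 1);
return s;
}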
// Warning: string is modified
int parse(char* line)
{
char* token;
char* p;
char* word;
if (line == NULL) {
printf("Missing input line\n");
return 0;
}
// first find the word delimiter: a hyphen preceded by a space
p = line;
while (1) {
p = strchr(p, '-');
if (p == NULL) {
printf("Missing hyphen\n");
return 0;
}
if ((p > line) && (p[-1] == ' ')) {
// We found a hyphen preceded by a space
*p = 0; // Replace by nul character (end of string)
break;
}
p++; // Skip hyphen inside hyphenated word
}
word = trim(line);
printf("%s ", word);
// Next find synonyms delimited by a comma
char delim[] = ", ";
token = strtok(p + 1, delim);
while (token != NULL) {
printf("%s ", token);
token = strtok(NULL, delim);
}
printf("\n");
return 1;
}
int LoadFile(FILE* fp)
{
if (fp == NULL) {
printf("File not open\n");
return 0;
}
int ret = 1;
char str[1024]; // Longest allowed line
while (fgets(str, sizeof(str), fp) != NULL) {
str[strcspn(str, "\r\n")] = 0; // Remove ending \n
ret &= parse(str);
}
return ret;
}
int main(int argc, char *argv[])
{
FILE* fp;
char* fn = "Synonyms.txt";
fp = fopen(fn, "rt");
if (fp == NULL) {
perror(fn);
return 1;
}
int ret = LoadFile(fp);
fclose(fp);
return ret ? EXIT_SUCCESS : EXIT_FAILURE; // LoadFile returns 1 on success
}
I think the biggest conceptual misunderstanding demonstrated in the code is a failure to understand how fgets and fscanf work.
Consider the following lines of code:
while(fgets(synonyms,50,fp)!=NULL){
...
while(fscanf(fp,"%49s",word)==1){
for(i=1;i<strlen(synonyms);i++){
fscanf(fp,"%49s",synonyms);
printf("%s",synonyms);
}
}
}
The fgets reads one line of the input. (If an input line is longer than 49 characters, counting the newline, fgets reads only the first 49 characters; the code should check for that condition and handle it.) The first fscanf then reads a word from the next line of input, so the line that fgets read is effectively discarded! If the input is formatted as expected, the second fscanf reads a single - into synonyms. This makes strlen(synonyms) evaluate to 1, so the for loop terminates. The while/fscanf loop then reads another word, and since synonyms still contains a string of length 1, the for loop is never entered. The while/fscanf loop then proceeds to read the rest of the file. The next call to fgets returns NULL (since the fscanf loop has read to the end of the file), so the while/fgets loop terminates after 1 iteration.
I believe the intention was for the fscanf calls inside the while/fgets loop to operate on the line read by fgets. To do that, all the fscanf calls should be replaced by sscanf.
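As a rough illustration of scanning the line itself, the sketch below applies sscanf to a line that fgets has already read, using %n to track how far into the line each call got. The helper name parse_synonyms, the MAX_SYN_LEN limit and the array-of-buffers output are assumptions made for this example, not part of the question's code.

```c
#include <stdio.h>

#define MAX_SYN_LEN 50   /* assumed maximum synonym length, including the terminator */

/* Hypothetical helper: extracts the synonyms from one line of the form
   "word - syn1, syn2, ..." by calling sscanf() on the line itself.
   Returns the number of synonyms stored in syns (at most max). */
int parse_synonyms(const char *line, char syns[][MAX_SYN_LEN], int max)
{
    char word[MAX_SYN_LEN];
    int used = 0;
    int count = 0;

    /* Read the headword and the separating hyphen. %n records how many
       characters were consumed; it stays 0 if the hyphen is missing. */
    if (sscanf(line, "%49s - %n", word, &used) != 1 || used == 0)
        return 0;

    const char *p = line + used;
    while (count < max &&
           sscanf(p, " %49[^,\n]%n", syns[count], &used) == 1) {
        count++;
        p += used;           /* move past the synonym just read */
        if (*p == ',')
            p++;             /* step over the separator */
    }
    return count;
}
```

Called once per line inside the while/fgets loop, this never touches the FILE, so the next fgets call still sees the next line.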
When running the following C file, copying the character from fgetc to my tmp pointer results in unknown characters being copied over for some reason. The characters received from fgetc() are the expected characters. However, when I assign each character to my tmp pointer, unknown characters get copied over.
I've tried looking for the reason online, but haven't had any luck. From what I have read it could be something to do with UTF-8 and ASCII issues. However, I'm not sure about the fix. I'm a relatively new C programmer and still new to memory management.
Output:
TMP: Hello, DATA!�
TEXT: Hello, DATA!�
game.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <allegro5/allegro5.h>
#include <allegro5/allegro_font.h>
const int WIN_WIDTH = 1366;
const int WIN_HEIGHT = 768;
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
int main(int argc, char **argv) {
al_init();
al_install_keyboard();
ALLEGRO_TIMER* timer = al_create_timer (1.0 / 30.0);
ALLEGRO_EVENT_QUEUE *queue = al_create_event_queue();
ALLEGRO_DISPLAY* display = al_create_display(WIN_WIDTH, WIN_HEIGHT);
ALLEGRO_FONT* font = al_create_builtin_font();
al_register_event_source(queue, al_get_keyboard_event_source());
al_register_event_source(queue, al_get_display_event_source(display));
al_register_event_source(queue, al_get_timer_event_source(timer));
int redraw = 1;
ALLEGRO_EVENT event;
al_start_timer(timer);
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
while (1) {
al_wait_for_event(queue, &event);
if (event.type == ALLEGRO_EVENT_TIMER)
redraw = 1;
else if ((event.type == ALLEGRO_EVENT_KEY_DOWN) || (event.type == ALLEGRO_EVENT_DISPLAY_CLOSE))
break;
if (redraw && al_is_event_queue_empty(queue)) {
al_clear_to_color(al_map_rgb(0, 0, 0));
al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, text);
al_flip_display();
redraw = false;
}
}
free(text);
al_destroy_font(font);
al_destroy_display(display);
al_destroy_timer(timer);
al_destroy_event_queue(queue);
return 0;
}
game.DATA file:
Hello, DATA!
What I use to run the program:
gcc game.c -o game $(pkg-config allegro-5 allegro_font-5 --libs --cflags)
--EDIT--
I tried taking the file reading code and running it in a new c file, for some reason it works there, but not when in the game.c file with allegro code.
test.c:
#include <stdlib.h>
#include <stdio.h>
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
void main() {
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
free(text);
return 0;
}
Produces the correct output always:
TMP: Hello, DATA!
TEXT: Hello, DATA!
When you write a loop that updates various things each time through, like you do with tmpSize in your loop here, it's important to have a handle on what the theoretical computer science types call your "loop invariants". That is, what is it that's true each time through the loop? It's important not only to maintain your loop invariants properly, but also to pick your loop invariants so that they're easy to maintain, and easy for a later reader to understand and to verify.
Since tmpSize starts out as 1, I'm guessing your loop invariant is trying to be, "tmpSize is always one more than the size of the string I've read so far". A reason for picking that slightly-strange loop invariant is, of course, that you'll need that extra byte for the terminating \0. The other clue is that you're setting tmp[tmpSize-1] = c;.
But here's the first problem. When we exit the loop, and if tmpSize is still one more than the size of the string you've read so far, let's see what happens. Suppose we read three characters. So tmpSize should be 4. So we'll set tmp[4] = 0;. But wait! Remember, arrays in C are 0-based. So the three characters we read are in tmp[0], tmp[1], and tmp[2], and we want the terminating \0 character to go into tmp[3], not tmp[4]. Something is wrong.
But actually, it's worse than that. I wasn't at all sure I understood the loop invariant, so I cheated, and inserted a few debugging printouts. Right before the realloc call, I added
printf("realloc %zu\n", tmpSize);
and at the end, right before the tmp[tmpSize] = 0; line, I added
printf("final %zu\n", tmpSize);
The last few lines it printed (while reading a game.DATA file containing "Hello, DATA!" just like yours) were:
...
realloc 10
realloc 11
realloc 12
final 13
But this is off by two! If the last reallocation gave the array a size of 12, the valid indices are from 0 to 11. But somehow we end up writing the \0 into cell 13.
It took me a while to figure it out, but the second problem is that you do the reallocation at the top of the loop, before you've incremented tmpSize.
To me, the loop invariant of "one more than the size of the string read so far" is just too hard to think about. I very much prefer to use a loop invariant where the "size" variable keeps track of the number of characters I have read, not +1 or -1 off of that. Let's see how that loop might look. (I've also cleaned up a few other things.)
size_t tmpSize = 0;
char *tmp = malloc(tmpSize+1);
if (tmp == NULL) {
printf("malloc() failed.\n");
exit(1);
}
for (int c = getc(file); c != EOF; c = getc(file)) {
printf("realloc %zu\n", tmpSize+1+1);
tmp = realloc(tmp, tmpSize+1+1); /* +1 for c, +1 for \0 */
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
tmp[tmpSize] = c;
tmpSize++;
}
printf("final %zu\n", tmpSize);
tmp[tmpSize] = '\0';
There's still something fishy here -- I said I didn't like "fudge factors" like +1, and here I've got two -- but at least now the debugging printouts go
...
realloc 11
realloc 12
realloc 13
final 12
so it looks like I'm not overrunning the allocated memory any more.
To make this even better, I want to take a slightly different approach. You're not supposed to worry about efficiency at first, but I can tell you that a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient. So let's make a few more changes:
size_t nchAllocated = 0;
size_t nchRead = 0;
char *tmp = NULL;
for (int c = getc(file); c != EOF; c = getc(file)) {
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
Now there are two separate variables: nchAllocated keeps track of exactly how many characters I've allocated, and nchRead keeps track of exactly how many characters I've read. And although I've doubled the number of "counter" variables, in doing so I've simplified a lot of other things, so I think it's a net improvement.
First of all, notice that there are no +1 fudge factors any more, at all.
Second, this loop doesn't call realloc every time -- instead it allocates characters 10 at a time. And because there are separate variables for the number of characters allocated versus read, it can keep track of the fact that it may have allocated more characters than it has read so far. For this code, the debugging printouts are:
realloc 10
realloc 20
final 12
Another little improvement is that we don't have to "preallocate" the array -- there's no initial malloc call. One of our loop invariants is that nchAllocated is the number of characters allocated, and we start this out as 0, and if there are no characters allocated, then it's okay that tmp starts out as NULL. This relies on the fact that when you call realloc for the first time, with tmp equal to NULL, realloc is fine with that, and essentially acts like malloc.
But there's one question you might be asking: If I got rid of all my fudge factors, where do we arrange to allocate one extra byte to hold the terminating \0 character? It's there, but it's subtle: it's lurking in the test
if(nchAllocated <= nchRead)
The very first time through the loop, nchAllocated will be 0, and nchRead will be 0, but this test will be true, so we'll allocate our first chunk of 10 characters, and we're off and running. (If we didn't care about the \0 character, the test nchAllocated < nchRead would have sufficed.)
...But, actually, I've made a mistake! There's a subtle bug here!
What if the file being read is empty? tmp will start out NULL, and we'll never make any trips through the loop, so tmp will remain NULL, so when we assign tmp[nchRead] = 0 it'll blow up.
And actually, it's worse than that. If you trace through the logic very carefully, you'll find that any time the file size is an exact multiple of 10, not enough space gets allocated for the \0, after all.
And this indicates a significant drawback of the "allocate characters 10 at a time" scheme. The code is now harder to test, because the control flow is different for files whose size is a multiple of 10. If you never happen to test that case, you won't realize that this program has a bug in it.
The way I usually fix this is to notice that the \0 byte I have to add to terminate the string is sort of balanced by the EOF character I read that indicated the end of the file. Maybe, when I read the EOF, I can use it to remind me to allocate space for the \0. That's actually easy enough to do, and it looks like this:
int c;
while(1) {
c = getc(file);
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
if(c == EOF)
break;
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
The trick here is that we don't test for EOF until after we've checked that there's enough space in the buffer, and called realloc if necessary. It's as if we allocate space in the buffer for the EOF -- except then we use that space for the \0 instead. This is what I meant by "use it to remind me to allocate space for the \0".
Now, I have to admit that there's still a drawback here, in that the loop is now somewhat unconventional. A loop that has while(1) at the top looks like an infinite loop. This one has
if(c == EOF) break;
down in the middle of it, so it is literally a "break in the middle" loop. (This is by contrast to conventional for and while loops, which are "break at the top", or a do/while loop, which is "break at the bottom".) Personally, I find this to be a useful idiom, and I use it all the time. But some programmers, and perhaps your instructor, would frown on it, because it's "weird", it's "different", it's "unconventional". And to some extent they're right: unconventional programming is somewhat dangerous programming, and is bad if later maintenance programmers can't understand it because they don't recognize or don't understand the idioms in it. (It's sort of the programming equivalent of the English word "ain't", or a split infinitive.)
Finally, if you're still with me, I have one more point to make. (And if you are still with me, thank you. I realize this answer has gotten very long, but I hope you're learning something.)
Earlier I said that "a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient." It turns out that a loop that makes the buffer bigger by 10 isn't much better, and can still be significantly inefficient. You can do a little better by incrementing it by 50 or 100, but if you're dealing with input that might be really big (thousands of characters or more), you're usually better off increasing the buffer size by leaps and bounds, perhaps by multiplying it by some factor, rather than adding. So here's the final version of that part of the loop:
if(nchAllocated <= nchRead) {
if(nchAllocated == 0) nchAllocated = 10;
else nchAllocated *= 2;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
And even this improvement -- multiplying by 2, rather than adding something -- comes with a cost: we need an extra test, to special-case the first trip through the loop, because nchAllocated started out as 0, and 0 × 2 = 0.
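For reference, here is one way the pieces of this answer might fit together in a single function: the break-in-the-middle loop, the two counters, and the doubling growth. The function name read_all is made up for this sketch. It drops the debugging printouts and, as a small change from the fragments above, assigns the result of realloc to a temporary pointer so the old block can be freed if realloc fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of file into a freshly allocated,
   NUL-terminated string. Returns NULL if memory runs out; the caller frees
   the result. Bytes with value 0 in the file would cut the string short. */
char *read_all(FILE *file)
{
    size_t nchAllocated = 0;   /* bytes allocated for tmp */
    size_t nchRead = 0;        /* characters stored in tmp so far */
    char *tmp = NULL;

    while (1) {
        int c = getc(file);

        /* Grow before testing for EOF, so the slot reserved here is
           available for the terminating '\0' once EOF arrives. */
        if (nchAllocated <= nchRead) {
            size_t newSize = (nchAllocated == 0) ? 10 : nchAllocated * 2;
            char *newTmp = realloc(tmp, newSize);
            if (newTmp == NULL) {
                free(tmp);
                return NULL;
            }
            tmp = newTmp;
            nchAllocated = newSize;
        }

        if (c == EOF)
            break;
        tmp[nchRead++] = (char)c;
    }

    tmp[nchRead] = '\0';
    return tmp;
}
```

The two awkward cases discussed earlier are worth tracing by hand: an empty file allocates 10 bytes and stores only the '\0', and a 10-character file grows to 20 bytes on the iteration that reads EOF.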
Your reallocation scheme is incorrect: the array is always too short by one byte and the null terminator is written one position past the end of the string, instead of at the end of the string. This causes an extra byte to be printed, with whatever value happens to be in memory in the block returned by realloc(), which is uninitialized.
It is less confusing to use tmpLen as the length of the string read so far and allocate 2 extra bytes for the newly read character and the null terminator.
Furthermore the test c != NULL makes no sense: c is an int holding a byte value and NULL is a pointer. Similarly, tmp[tmpSize - 1] = (char *)c; is incorrect: you should just write
tmp[tmpSize - 1] = c;
Here is a corrected version:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
char *tmp = (char *)malloc(tmpLen + 1);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
char *new_tmp = (char *)realloc(tmp, tmpLen + 2);
if (new_tmp == NULL) {
printf("realloc() failure for %zu bytes.\n", tmpLen + 2);
free(tmp);
fclose(file);
return NULL;
}
tmp = new_tmp;
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
It is usually better to reallocate in chunks or with a geometric size increment. Here is a simple implementation:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
size_t tmpSize = 16;
char *tmp = (char *)malloc(tmpSize);
char *newTmp;
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
if (tmpSize - tmpLen < 2) {
size_t newSize = tmpSize + tmpSize / 2;
newTmp = (char *)realloc(tmp, newSize);
if (newTmp == NULL) {
printf("realloc() failure for %zu bytes.\n", newSize);
free(tmp);
fclose(file);
return NULL;
}
tmpSize = newSize;
tmp = newTmp;
}
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
// try to shrink allocated block to the minimum size
// if realloc() fails, return the current block
// it seems impossible for this reallocation to fail
// but the C Standard allows it.
newTmp = (char *)realloc(tmp, tmpLen + 1);
return newTmp ? newTmp : tmp;
}
I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
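As a self-contained sketch of that loop, here it is wrapped in a function, using the variant of the format string with a blank before the comma. The name read_numbers and the max parameter are invented for the example.

```c
#include <stdio.h>

/* Hypothetical helper: reads comma- or newline-separated integers from fp
   into list, storing at most max values. Returns the number stored. */
size_t read_numbers(FILE *fp, int *list, size_t max)
{
    size_t i = 0;
    int num;

    /* "%d" skips leading white space (including newlines); the blank before
       the comma skips white space after the number; the comma itself is
       optional because a failed literal match does not change the count of
       converted items. */
    while (i < max && fscanf(fp, "%d ,", &num) == 1)
        list[i++] = num;
    return i;
}
```

Testing i < max before calling fscanf means no number is read and then thrown away once the array is full.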
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, like a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 is for security reasons. If you really want to check that the input contains nothing other than commas and newlines (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
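For completeness, a version with the return values checked might look roughly like this. Note that it swaps fseek for ungetc to step back one byte, which is my substitution rather than what the code above does, and that the function name read_ints_any_sep is made up. As above, a minus sign counts as a separator, so negative numbers are not handled.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: treats every non-digit byte as a separator and reads
   each run of digits with fscanf(). Returns the number of values stored,
   or -1 if a library call reports a failure. */
int read_ints_any_sep(FILE *f, int *list, int max)
{
    int i = 0;
    int c;

    while (i < max && (c = fgetc(f)) != EOF) {
        if (!isdigit(c))
            continue;                    /* separator: skip it */
        if (ungetc(c, f) == EOF)         /* push the digit back */
            return -1;
        if (fscanf(f, "%d", &list[i]) != 1)
            return -1;
        i++;                             /* advance only after a successful read */
    }
    return ferror(f) ? -1 : i;
}
```

Incrementing i only after fscanf succeeds keeps a failed conversion from leaving a hole in the array.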
My solution is to read the whole line first and then parse it with strtok_r, using a comma as the delimiter. strtok_r is a POSIX function; if you need code that is portable to non-POSIX systems, use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
if (line == NULL) {
return NULL;
}
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
if (argc < 2) {
return 1;
}
FILE *file = fopen(argv[1], "r");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
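while (tok != NULL && numc < 300) {
// Convert the token to a long if possible
char *end;
errno = 0; // strtol only sets errno on failure, so clear it first (needs <errno.h>)
long num = strtol(tok, &end, 10);
if (errno != 0 || end == tok) {
// Handle out-of-range value or no conversion
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}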
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
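If you would rather not tokenize at all, a variant is to let strtol report where each number ends and step over the commas yourself. This is a different technique from the strtok_r loop above, sketched here under the assumption that fields are plain decimal integers; the name parse_int_list is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses comma-separated decimal integers from line
   into out (at most max values). Returns the number stored, or -1 if a
   field is empty, malformed or out of range for int. */
int parse_int_list(const char *line, int *out, int max)
{
    int count = 0;
    const char *p = line;

    while (count < max) {
        char *end;
        errno = 0;                       /* strtol sets errno only on failure */
        long num = strtol(p, &end, 10);
        if (end == p)                    /* no digits were converted */
            return -1;
        if (errno == ERANGE || num < INT_MIN || num > INT_MAX)
            return -1;
        out[count++] = (int)num;
        if (*end != ',')                 /* end of the list (or unexpected text) */
            break;
        p = end + 1;                     /* continue after the comma */
    }
    return count;
}
```

Unlike strtok_r, this leaves the line unmodified, so it can take a const char * and works the same on non-POSIX systems.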
I need to parse a string in C by removing all non-alphabetic characters from it. To do this I am checking the ascii value of every char and making sure its within the correct bounds. It works just the way I want it to, so that's not the problem. What I am having trouble with, however, is storing the resulting strings after the parse is completed. (I am 3 weeks into C by the way) Also if you notice that I used weird sizes for the arrays, that's because I purposely made them bigger than they needed to be.
char * carry[2]; // This is to simulate argv
carry[1] = "hello1whats2up1"; // 0 is title so I placed at 1
char array[strlen(carry[1])]; // char array of string length
strcpy(array, carry[1]); // copied string to char array
char temp[strlen(carry[1]) + 1]; // Reusable char array
char * finalAnswer[10];
int m = 0, x = 0; // Indexes
if ((sizeof(carry))/8 > 1) { // We were given arguments
printf("Array: %lu\n\n", sizeof(array));
for (int i = 0; i < sizeof(array); i++)
{
if(isalpha(array[i])) { // A-Z & a-z
//printf("%s\n", temp);
temp[x] = array[i]; // Placing chars in temp array
x++;
}
else {
printf("String Length: %lu \nString Name: %s \nWord Index: %d \n\n",
strlen(temp), temp, m); // Testing Purposes
strcpy(finalAnswer[m], temp); // Copies temp into the final answer *** Source of Error
for(int w = 0; w < sizeof(temp); w++) { temp[w] = '\0'; } // Clears temp
x = 0;
m++;
}
}
printf("String Length: %lu \nString Name: %s \nWord Index: %d \n",
strlen(temp), temp, m); // Testing Purposes
strcpy(finalAnswer[m], temp);
for(int w = 0; w < sizeof(temp); w++) { temp[w] = '\0'; } // Clears temp
x = 0;
}
else { printf("No Arguments Given\n"); }
printf("\n");
** Edit
The error I keep getting is when I try copying temp to finalAnswer
** Edit 2
I solved the problem I was having with char * finalAnswer[10]
When I was trying to use strcpy on finalAnswer, I never specified the size that was needed to store the particular string. Works fine after I did it.
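For anyone hitting the same error, here is a rough sketch of what "specifying the size" can look like: each slot of the pointer array gets its own malloc'd buffer, sized for the word it will hold. The helper name split_alpha is made up for this example and is not my original code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits s into runs of alphabetic characters and
   stores a freshly allocated copy of each run in words[].
   Returns the number of words stored, or -1 if malloc() fails.
   The caller frees words[0] .. words[n-1]. */
int split_alpha(const char *s, char *words[], int max)
{
    int count = 0;

    while (*s != '\0' && count < max) {
        while (*s != '\0' && !isalpha((unsigned char)*s))
            s++;                                   /* skip non-letters */
        const char *start = s;
        while (isalpha((unsigned char)*s))
            s++;                                   /* scan one word */
        size_t len = (size_t)(s - start);
        if (len == 0)
            break;                                 /* only separators were left */

        words[count] = malloc(len + 1);            /* storage sized for this word */
        if (words[count] == NULL) {
            while (count > 0)
                free(words[--count]);
            return -1;
        }
        memcpy(words[count], start, len);
        words[count][len] = '\0';
        count++;
    }
    return count;
}
```

With char *finalAnswer[10]; the call split_alpha("hello1whats2up1", finalAnswer, 10) fills the first three slots, and each one must later be passed to free().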
Since you have solved the actual string parsing, I shall take your last comment as the actual requirement.
"... I want to create a list of words with varying length that can be accessed by index ..."
That is certainly not a task to be solved easily if one is "three weeks into C". The data structure that represents it is the type of main()'s second argument:
// array (of unknown size)
// of pointers to char
char * argv[] ;
This can be written as a pointer to a pointer:
// same data structure as char * []
char ** list_of_words ;
And this pushes you straight into the deep waters of C: a non-trivial C data structure. As such, it might require a bit more than four weeks of C.
But we can be creative. There is one non-trivial data structure built into C that we might use: a file.
We can write the words into the file, one word per line. And that is our output: a list of words, separated by newline characters, stored in a file.
We can even imagine and write a function that reads a word from that result "by index", as you (it seems) need.
// hint: there is a FILE * behind
int words_count = result_size () ;
const char * word = result_get_word(3) ;
Now, I have boldly gone ahead and written "all" of it, besides that last "crucial" part. After all, I am sure you would like to contribute too.
So the working code (minus result_size() and result_get_word()) is alive and kicking here: https://wandbox.org/permlink/uLpAplNl6A3fgVGw
To avoid the "Wrath of Khan" I have also pasted it here:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
/*
task: remove all non alpha chars from a given string, store the result
*/
int process_and_save (FILE *, const char *) ;
int dump_result(FILE *) ;
int main( const int argc, const char * argv[] )
{
const char * filename = "words.txt";
const char * to_parse = "0abra123ka456dabra789" ;
(void)(&argc) ; (void)argv ; // pacify the compiler warnings
printf("\nInput: %s", to_parse ) ;
int retval = process_and_save(fopen(filename, "w"), to_parse ) ;
if ( EXIT_FAILURE != retval )
{
printf("\n\nOutput:\n") ;
retval = dump_result(fopen(filename, "r"));
}
return retval ;
}
int process_and_save (FILE * fp, const char * input )
{
if(!fp) {
perror("File opening failed");
return EXIT_FAILURE;
}
//
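// Walk the input one char at a time, starting at the very first char
const char * walker = input ;
for ( ; *walker ; ++walker )
{
// isalpha() expects a value representable as unsigned char
if ( isalpha((unsigned char)*walker) ) {
fprintf( fp, "%c", *walker ) ;
// I am alpha but next one is not
// so write word end, next
if ( ! isalpha((unsigned char)*(walker +1) ) )
fprintf( fp, "\n" ) ;
}
}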
fclose(fp);
return EXIT_SUCCESS;
}
int dump_result(FILE* fp )
{
if(!fp) {
perror("\nFile opening failed");
return EXIT_FAILURE;
}
int c; while ((c = fgetc(fp)) != EOF) { putchar(c); }
if (ferror(fp))
puts("\nI/O error when reading");
fclose(fp);
return EXIT_SUCCESS;
}
I think this is functional and does the job of parsing and storing the result, not in a complex data structure but in a simple file. The rest should be easy. If you need help, please do let me know.
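For readers who want a starting point for that missing part, here is one possible sketch, not the answer author's implementation. It assumes the one-word-per-line file format described above. Unlike the hinted signatures, it takes the file name as a parameter and copies the word into a caller-supplied buffer instead of returning a const char *. Both choices are assumptions made to keep the example free of hidden state.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the words (lines) in the result file.
   Returns -1 if the file cannot be opened. */
int result_size(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (!fp)
        return -1;

    int count = 0;
    int c, last = '\n';
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            count++;
        last = c;
    }
    if (last != '\n')
        count++;                /* final line without a trailing newline */
    fclose(fp);
    return count;
}

/* Hypothetical helper: copies the word at zero-based `index` into `buf`.
   Returns 0 on success, -1 if the file cannot be opened or the index is
   out of range (buf may have been overwritten in that case).
   Assumes every line, including its newline, fits in `buflen` - 1 chars. */
int result_get_word(const char *filename, int index, char *buf, size_t buflen)
{
    FILE *fp = fopen(filename, "r");
    if (!fp)
        return -1;

    int found = -1;
    for (int line = 0; fgets(buf, (int)buflen, fp) != NULL; line++) {
        if (line == index) {
            buf[strcspn(buf, "\n")] = '\0';   /* drop the newline */
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Reopening and rescanning the file on every call is slow for large results, but it keeps the sketch short. Caching the line offsets with ftell() would be a natural next step.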
Let's say I've got the file
5f2
3f6
2f1
And the code (the printf should print the second numbers, i.e. 2, 6 and 1, but it doesn't):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
int main (int argc, char * argv[])
{
FILE *ptr;
char str[100];
char * token;
int a, b, i;
int arr[4];
if(argc > 1)
{
ptr = fopen(argv[1],"r");
if(ptr == NULL)
{
exit(1);
}
}
else
{
exit(1);
}
//And I'm looking to parse the numbers between the "f" so..
while(fgets(str,100,ptr) != NULL)
{
token = strstr(str,"f");
if(token != NULL)
{
a = atol(str); // first number
b = atol(token+1); // second number
arr[i] = b; // store each b value (3 of em) into this array
}
i++;
printf("Values are %d\n",arr[i]); //should print 2,6 and 1
}
}
I've tried moving the printf outside the loop, but that seems to print an even weirder result. I've seen posts about storing integers from a file into an array before; however, since this involves using strstr, I'm not sure the procedure is the same.
int i,j=0;
while(fgets(str,sizeof(str),file) != NULL)
{
size_t n = strlen(str);
if(n>0 && str[n-1] == '\n')
str[n-1] = '\0';
i = str[strlen(str)-1] - '0'; /* Convert the character to int */
printf("%d\n",i);// Or save it to your int array arr[j++] = i;
}
Just move to the last character as shown and print it out as an integer. Note that this only works while the second number is a single digit.
PS: fgets() keeps the trailing newline character, so you need to strip it as shown.
You never initialize i. You then write to arr[i] (which just happens not to crash right there), increment i (to "undefined value + 1"), and print arr[i], i.e., you are writing to and reading from uninitialized memory.
Besides, your FILE * is ptr, not file. And you should get into the habit of using strtol() instead of atol(), because the former allows you to properly check for success (and recover from error).
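To illustrate the strtol() point, here is a minimal sketch of parsing one "<a>f<b>" line with error checking. The helper name parse_pair and its return convention are made up for this example. It assumes both numbers must fit in an int and that nothing but a newline may follow the second number.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a line of the form "<a>f<b>" (e.g. "5f2").
   Returns 0 and fills *a and *b on success, -1 on malformed input. */
int parse_pair(const char *line, int *a, int *b)
{
    char *end;

    errno = 0;
    long first = strtol(line, &end, 10);
    /* end == line means no digits were consumed. */
    if (end == line || errno == ERANGE || first < INT_MIN || first > INT_MAX)
        return -1;
    if (*end != 'f')
        return -1;

    const char *rest = end + 1;
    errno = 0;
    long second = strtol(rest, &end, 10);
    if (end == rest || errno == ERANGE || second < INT_MIN || second > INT_MAX)
        return -1;
    /* Allow only a trailing newline after the second number. */
    if (*end != '\0' && *end != '\n')
        return -1;

    *a = (int)first;
    *b = (int)second;
    return 0;
}
```

In the question's loop, this would replace the strstr()/atol() pair. Initialize i to 0 before the loop. For each line where parse_pair(str, &a, &b) returns 0 and i is still below the array size, store b in arr[i], print arr[i], and only then increment i.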