I have a program that reads in a file, goes through it line by line, and breaks up each line word by word. I am about to store each word in an array, but I need to use the strcmp function to verify that the word doesn't already exist. My code is below. My question is: why is my program printing out 1 so many times? I was expecting it to print 1 only twice, because "this" occurs twice in my text file.
while (fgets(line, sizeof(line), fi) != NULL) { // looping through each line
line_count += 1;
for (j = 0; j < sizeof(line); j++) { // convert all words to lowercase
line[j] = tolower(line[j]);
}
result = strtok(line, delimiters);
while (result != NULL) {
word_count += 1;
if (strcmp(result, "this")) {
printf("1\n");
}
result = strtok(NULL, delimiters); // get the next token
}
}
Below is my text file:
This is the first test.
This is the second test.
strcmp() returns 0 if the string matches. You're checking for a truthy value. You really want strcmp(result, "this") == 0.
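To make the fix concrete, here is a minimal sketch of the tokenizing loop with the comparison corrected. The helper name count_word and the delimiter set passed to it are assumptions for illustration, not taken from the question. It also lowercases only the characters actually present in the line rather than the whole buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count how many tokens in line equal word, ignoring case in line.
 * line is modified in place (lowercased, then split by strtok).
 * word is expected to be lowercase already.
 * count_word and its parameters are hypothetical names. */
int count_word(char *line, const char *delimiters, const char *word)
{
    int matches = 0;

    /* Lowercase up to the terminating NUL only. */
    for (size_t j = 0; line[j] != '\0'; j++)
        line[j] = (char)tolower((unsigned char)line[j]);

    for (char *tok = strtok(line, delimiters); tok != NULL;
         tok = strtok(NULL, delimiters)) {
        if (strcmp(tok, word) == 0)   /* 0 means the strings are equal */
            matches++;
    }
    return matches;
}
```

Calling count_word on each line read by fgets and summing the results should give 2 for the two-line test file shown in the question.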
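If you do not lowercase the line first, you will also need a case-insensitive comparison. That is not part of standard C: it is called stricmp() on Windows and strcasecmp() on POSIX systems.
Did you try again after changing "strcmp(result, "this")" to "strcmp(result, "This")"?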
I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (fscanf(fp1, "%d,", &num) == 1 && i < 300)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
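As a sketch of the white-space variant mentioned above, the loop can be wrapped in a small function that uses "%d ," as the format. The function name read_numbers and the cap parameter are assumptions for illustration. The bound is tested before fscanf is called so that no number is read and then thrown away when the array is full.

```c
#include <stdio.h>

/* Read comma-separated integers from fp into list (at most cap entries).
 * The blank before the comma in "%d ," skips optional white space there.
 * Returns the number of integers stored.
 * read_numbers is a hypothetical name. */
size_t read_numbers(FILE *fp, int *list, size_t cap)
{
    size_t i = 0;
    int num;

    while (i < cap && fscanf(fp, "%d ,", &num) == 1)
        list[i++] = num;
    return i;
}
```

With the asker's declarations, a call would look like `size_t n = read_numbers(fp1, list, 300);`.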
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 guards against writing past the end of list. If you really want to check that nothing other than commas and newlines appears between the numbers (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
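Here is a hedged sketch of the same idea with the return values checked and the separator test added. It swaps fseek for ungetc, which is a different technique from the code above: ungetc pushes one character back and also works on streams that cannot seek. The name scan_ints and the choice to accept any white space next to commas are assumptions.

```c
#include <ctype.h>
#include <stdio.h>

/* Scan integers from f, accepting only ',' and white space between them.
 * Returns the number of integers stored, or -1 on an unexpected character
 * or a failed conversion. Like the original, it ignores minus signs.
 * scan_ints is a hypothetical name. */
long scan_ints(FILE *f, int *list, size_t cap)
{
    size_t i = 0;
    int c;

    while (i < cap && (c = fgetc(f)) != EOF) {
        if (isdigit(c)) {
            /* Push the digit back so fscanf sees the whole number. */
            if (ungetc(c, f) == EOF)
                return -1;
            if (fscanf(f, "%d", &list[i]) != 1)
                return -1;
            i++;
        } else if (c != ',' && !isspace(c)) {
            return -1;   /* anything else is treated as an error */
        }
    }
    return (long)i;
}
```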
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
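while (tok != NULL && numc < 300) { // stop before list overflows
// Convert the token to a long if possible
errno = 0; // strtol reports range errors through errno, so clear it first
long num = strtol(tok, NULL, 0);
if (errno != 0) {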
// Handle no value conversion
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
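The errno test alone does not catch tokens that contain no digits at all, because strtol simply returns 0 for those. A stricter conversion can be sketched with the end pointer that strtol fills in. The helper name parse_int_token is hypothetical, and base 10 is an assumption (the answer above passes base 0, which also accepts octal and hex prefixes).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert tok to an int. Returns 1 on success and stores the value in *out;
 * returns 0 if tok has no digits, has trailing characters, or is out of
 * range for int. parse_int_token is a hypothetical name. */
int parse_int_token(const char *tok, int *out)
{
    char *end;
    long val;

    errno = 0;                    /* strtol sets errno only on failure */
    val = strtol(tok, &end, 10);
    if (end == tok)               /* no digits were consumed */
        return 0;
    if (*end != '\0')             /* trailing characters after the number */
        return 0;
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;
    *out = (int)val;
    return 1;
}
```

This expects tokens without a trailing newline, which holds for lines produced by the readline function above since it drops the '\n'.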
I am trying to write a C program that can filter through lines. It is supposed to print only one line when there are consecutive duplicate lines. I have to use arrays of chars to compare the lines. The size of the arrays is inconsequential (set at 79 chars for the project). I have initialized the arrays as such:
char newArray [MAXCHARS];
char oldArray [MAXCHARS];
and have filled the array by using this for loop, to check for newlines and the end of file:
for(i = 0; i<MAXCHARS;i++){
if((newChar = getc(ifp)) != EOF){
if(newChar != '/n'){
oldArray[i] = newChar;
oldCount++;
}
else if(newChar == '/n'){
oldArray[i] = newChar;
oldCount++;
break;
}
}
else{
endOf = true;
break;
}
}
To cycle through the next line(s) and search for duplicates, I am using a while loop that is initially set to true. It fills the next array up to the newline and tests for EOF as well. Then, I use two for loops to test the arrays. If they are the same at each position in the arrays, duplicate remains unchanged and nothing is printed. If they are not the same, duplicate is set to false and a function (testArrays) is called to print the contents of each array.
while(duplicate){
newCount = 0;
/* fill second array, test for newlines and EOF*/
for(i =0; i< MAXCHARS; i++){
if((newChar = getc(ifp)) != EOF){
if(newChar != '/n'){
newArray[i] = newChar;
newCount++;
}
else if(newChar == '/n'){
newArray[i] = newChar;
newCount++;
break;
}
}
else{
endOf = true;
break;
}
}
/* test arrays against each other to spot duplicate lines*
if they are duplicates, continue the while loop getting new
arrays of characters in newArray until these tests fail*/
for(i =0; i< oldCount; i++){
if(oldArray[i] == newArray[i]){
continue;
}
else{
duplicate = false;
break;
}
}
for(i =0; i <newCount; i++){
if(oldArray[i] == newArray[i]){
continue;
}
else{
duplicate = false;
break;
}
}
if(endOf && duplicate){
testArray(oldArray);
break;
}
}
if((endOf && !duplicate) || (!endOf && !duplicate)){
testArray(oldArray);
testArray(newArray);
}
I find that this does not work and consecutive identical lines are being printed anyways. I cannot figure out how this could be happening. I know this is a lot of code to wade through but it is pretty straight forward and I think that another set of eyes on this will spot the problem easily. Thanks for the help.
Is there a reason why you read one character at a time instead of calling fgets() to read a line?
char instr[MAXCHARS];
for( iline = 0; fgets( instr, sizeof instr, ifp ) != NULL; iline++ ) {
. . .<strcmp() current line to previous line here>. . .
}
EDIT:
You might want to declare two character arrays and three char pointers: one pointer to the current line, one to the previous line, and a third used as a temporary to swap the other two after each read.
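A minimal sketch of that pointer-swap idea follows. The function name dedupe_swap, the LINE_LEN buffer size, and the use of FILE parameters for input and output are all assumptions for illustration. Lines longer than the buffer would be split across several fgets calls, which this sketch does not handle.

```c
#include <stdio.h>
#include <string.h>

#define LINE_LEN 256   /* assumed buffer size, not from the question */

/* Copy in to out, dropping consecutive duplicate lines.
 * The two buffers are reused by swapping pointers instead of copying text.
 * dedupe_swap is a hypothetical name. */
void dedupe_swap(FILE *in, FILE *out)
{
    char buf_a[LINE_LEN], buf_b[LINE_LEN];
    char *cur = buf_a;     /* line being read */
    char *prev = buf_b;    /* line read on the previous iteration */
    char *tmp;
    int have_prev = 0;     /* becomes 1 once prev holds a real line */

    while (fgets(cur, LINE_LEN, in) != NULL) {
        if (!have_prev || strcmp(cur, prev) != 0)
            fputs(cur, out);
        /* Swap: the line just read becomes the previous line. */
        tmp = prev;
        prev = cur;
        cur = tmp;
        have_prev = 1;
    }
}
```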
You need to use a function to read lines — either fgets() or one you write (or POSIX getline() if you are familiar with dynamic memory allocation).
You then need to use an algorithm equivalent to:
1. Read first line into old.
2. If there is no line (EOF), stop.
3. Print the first line.
4. For every extra line read into new.
5. If there is no line (EOF), stop.
6. If new is the same as old, go to step 4.
7. Print new.
8. Copy new to old.
9. Go to step 4.
Those 'go to' steps would be part of normal loop controls, not actual goto statements.
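The steps above can be sketched with fgets() and strcpy() as follows. The function name dedupe_lines, the MAXLINE limit, and the FILE parameters are assumptions, and lines longer than the buffer are not handled specially.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 256   /* assumed line limit, not from the question */

/* Print the first line, then print each later line only when it differs
 * from the line before it. dedupe_lines is a hypothetical name. */
void dedupe_lines(FILE *in, FILE *out)
{
    char old[MAXLINE];
    char new_line[MAXLINE];

    if (fgets(old, sizeof old, in) == NULL)   /* no first line: stop */
        return;
    fputs(old, out);                          /* print the first line */

    /* The while condition is the "go to" step: read the next line or stop. */
    while (fgets(new_line, sizeof new_line, in) != NULL) {
        if (strcmp(new_line, old) == 0)       /* same as old: skip it */
            continue;
        fputs(new_line, out);                 /* print new */
        strcpy(old, new_line);                /* copy new to old */
    }
}
```

Both buffers have the same size, so the strcpy cannot overflow old.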
I would compare whole strings instead of going char by char. Read each full input line with fgets(str, MAX_CHARS, stdin) and strcmp it to the previous string. Avoid gets(): it cannot limit how much it writes and was removed from the language in C11. strcmp assumes your strings are NUL-terminated and you may need special EOF handling, but something like what's below should work:
#include <stdio.h>
#include <string.h>
#define MAX_CHARS 256 // assumed line limit
int main(void){
char newStr[MAX_CHARS] = {0}; //string for new input
char oldStr[MAX_CHARS] = {0};
// Loop over input as long as there is something to read
while(fgets(newStr, sizeof(newStr), stdin) != NULL){
if(strcmp(newStr,oldStr) != 0){
printf("%s", newStr); // fgets keeps the trailing newline
}
else{
//This is the case when you have duplicate strings. Don't print
}
memset(oldStr, 0, sizeof(oldStr)); //clear out old string in case it was longer
strcpy(oldStr, newStr); //copy new string into old string for future compare
}
return 0;
}
In the part where you test for duplicates, maybe you could test whether oldCount == newCount first? My reasoning is that if it is a duplicate line, oldCount will be equal to newCount. Only if that is true do you need to compare the two arrays.
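A small sketch of that length-first comparison is shown below. The function name same_line is hypothetical, and it assumes the counts are the numbers of characters stored in each array, as oldCount and newCount are in the question.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the two character arrays hold the same line, else 0.
 * Lengths are compared first, so a line that is only a prefix of the
 * other is never reported as a duplicate.
 * same_line is a hypothetical name. */
int same_line(const char *old_arr, size_t old_count,
              const char *new_arr, size_t new_count)
{
    if (old_count != new_count)
        return 0;
    return memcmp(old_arr, new_arr, old_count) == 0;
}
```

Because the lengths are checked up front, a single comparison pass replaces the two for loops in the question.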
In the code, when I print var_value, it shows its content but when I need to assign it later on the if else statements, it's empty only IN THE LAST IF and I have no idea why it is. If I delete the last statement, the other three pass without problems.
while ((read = getline(&line, &len, f)) != -1){
printf("%s\n", line);
char *token;
token = strtok(line, "=");
var_name = token;
/* Separate every line by the '=' character */
while( token != NULL ) {
var_value = token;
token = strtok(NULL, "=");
}
printf("%s\n", var_name);
printf("%s\n", var_value);
// Obtain the parameters
if (strcmp(var_name, "puerto") == 0) {
puerto = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "tamano_tabla") == 0) {
tamano_tabla = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "periodo_archivo") == 0) {
periodo_archivo = atoi(var_value);
parameters_count += 1;
} else if (strcmp(var_name, "archivo_tabla") == 0) {
printf("%s var val\n", var_value);
strcpy(archivo_tabla, strtok(var_value, "\n")); //Remove \n and copy to destination variable
parameters_count += 1;
printf("%s filetabla\n", archivo_tabla);
}
}
Edit: Results in console and after the final one, segmentation fault
puerto=1212
puerto
1212
archivo_tabla=tabla.xml
archivo_tabla
tabla.xml
tabla.xml
var val
Your output does not support your contention that
In the code, when I print var_value, it shows its content but when I
need to assign it later on the if else statements, it's empty only IN
THE LAST IF [...].
In fact, your output shows the opposite: var_value is printed just fine. I have to assume that you are being confused by the fact that its value ends with a newline, which is printed, too. Thus, " var val" appears at the beginning of a line. Here's the expected value of var_value, including newline, followed by " var val":
tabla.xml
var val
The presence of a newline in the string provided by getline() is the whole point of the strtok(var_value, "\n") that happens next, after all. Or so I assume.
Note also that although the output you present appears to be truncated relative to the code you present, in my tests, the contents of var_value are successfully copied to variable archivo_tabla, too, less that pesky newline.
This line is suspect: strcpy(archivo_tabla, strtok(var_value, "\n")); //Remove \n and copy to destination variable
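strtok mutates var_value: it overwrites the first "\n" with a NUL and returns a pointer to the text before it. It returns NULL when var_value contains nothing but delimiters, and passing NULL to strcpy is undefined behavior. Also make sure archivo_tabla refers to a buffer large enough to hold the value.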
https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
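One way to avoid calling strtok a second time on the value is to cut the newline off the line before splitting it. The sketch below does that with strcspn and strchr. The helper name split_setting is hypothetical, and it assumes each line has the form name=value with at most one '=' that matters.

```c
#include <string.h>

/* Split "name=value\n" in place. On success returns 1 and sets *name and
 * *value to point inside line; returns 0 if there is no '='.
 * split_setting is a hypothetical name. */
int split_setting(char *line, char **name, char **value)
{
    char *eq;

    line[strcspn(line, "\r\n")] = '\0';   /* cut at the first newline, if any */
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';                           /* terminate the name */
    *name = line;
    *value = eq + 1;                      /* may be an empty string */
    return 1;
}
```

If archivo_tabla is a char array, the value could then be copied with a bounded call such as `snprintf(archivo_tabla, sizeof archivo_tabla, "%s", value);`. If it is an uninitialized pointer instead, it needs storage before anything is copied into it.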
I am new to C and pointers, so it is still confusing as hell! Below is the code of a function with the main purpose of finding how many times a word appears on a text file. Any help will be appreciated!
void count_occurrences (int n, FILE *file, Entry *entries) {
file = fopen("test/flicka.txt", "r");
if (file != NULL) {
char buff[LINE_MAX_CHARS];
int i = 0;
char * haystack = fgets(buff, 1000, file);
char * needle = NULL;
char * p = NULL;
while (haystack != NULL) {
for (i; i < n; i++) {
needle = entries[i].string;
while ( (p = strstr(haystack, needle)) != NULL) {
entries[i].count++;
p++;
}
}
haystack = fgets(buff, 1000, file);
i = 0;
}
fclose(file);
}
else {
printf("File not found!\n");
}
}
The problem with an exercise like this is that the best way of solving the specific problem - a character-based state machine attached to the stream - doesn't scale up to larger problems.
To do it the first way, you maintain a "parse position", which is initially zero. You then call fgetc() in a loop until the data runs out and you get EOF. If the character matches the character at the parse position, increment the parse position; if the parse position reaches the end of the string, you have a match, so increment the count. If the character doesn't match, reset the parse position to zero or one, depending on whether it matches the first character of the string.
The first way is fast and easy, but inflexible.
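A minimal sketch of the character-based matcher described above follows. The function name count_in_stream is hypothetical. Note that this simple reset rule can miss matches for words whose beginning repeats inside the word: searching for "aab" in "aaab" finds nothing, for example. A table-driven matcher would be needed to handle those cases.

```c
#include <stdio.h>
#include <string.h>

/* Count occurrences of word in the stream, one character at a time.
 * Matches do not overlap. count_in_stream is a hypothetical name. */
long count_in_stream(FILE *fp, const char *word)
{
    size_t len = strlen(word);
    size_t pos = 0;      /* parse position inside word */
    long count = 0;
    int c;

    if (len == 0)
        return 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == (unsigned char)word[pos]) {
            if (++pos == len) {   /* reached the end of word: a match */
                count++;
                pos = 0;
            }
        } else {
            /* Mismatch: restart, keeping this character if it could
             * begin a new match. */
            pos = (c == (unsigned char)word[0]) ? 1 : 0;
        }
    }
    return count;
}
```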
A more scalable way is based on line input. Call fgets with a big buffer if you know lines must be short, or build a "getline" if lines are unbounded. Then call strstr on the line to see if you have a match. If you have a match, you need to advance the pointer past it and check for another.
The scalable way separates the parsing from the I/O and allows you to search for multiple patterns. Pseudo-code:
while(line = getline() )
{
N += countwords(line, "myword");
}
int countwords(const char *line, const char *word) // word must not be empty
{
const char *ptr = line;
int answer = 0;
while ((ptr = strstr(ptr, word)) != NULL)
{
ptr += strlen(word); // replace strlen(word) with 1 to allow overlaps
answer++;
}
return answer;
}
Obviously you now need to modify the main loop to search for several words, keeping an array of Ns and calling the function repeatedly, once per word. But it scales up to any sort of pattern matching.
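The several-words version might look like the sketch below. The names count_in_line and count_all are hypothetical, and plain arrays of words and counts stand in for the Entry array from the question. Like strstr itself, this counts substrings, so "the" is also found inside "other".

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of word in line.
 * An empty word is counted as zero matches. */
int count_in_line(const char *line, const char *word)
{
    size_t len = strlen(word);
    int count = 0;

    if (len == 0)
        return 0;
    for (const char *p = strstr(line, word); p != NULL;
         p = strstr(p + len, word))
        count++;
    return count;
}

/* Add the matches found in line to counts[i] for each words[i]. */
void count_all(const char *line, const char *const words[],
               int counts[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        counts[i] += count_in_line(line, words[i]);
}
```

The main loop would call count_all once per line read, with the counts array zeroed beforehand.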
In my current code, I am trying to pass in a file path, split it on "/", and then check each component against every element of an array named categories using a for loop. If any element of the array occurs in a component as a substring (tested with strstr()), the function should return that element; if not, it should return "Others".
I managed to produce the output that I want; however, I realise that whenever there is an even number of elements in the categories array, the output is one of the components of the file path instead. For example:
If the file path is
"/a/b/Lib/Contact/c"
If the categories are
char *categories[] = {"Library","Applications","Contact","dsdfd"};
the output will be b.
Here is my code for your reference:
const char *categorize(char*path)
{
int i = 0;
char str[1024];
char *token;
char *delim = "/";
char *categories[] = {"Library","Applications","Contact","dsdfd"};
bool check = false;
strcpy(str,path);
token = strtok(str,delim);
while (token !=NULL)
{
if(check == true)
break;
//printf("%s\n",token);
for (i = 0; i <(sizeof(categories)/sizeof(char*)); i++)
{
if(categories[i] == NULL)
break;
if(strstr(token, categories[i]))
{
check = true;
break;
}
}
token = strtok (NULL, delim);
}
if (check == true)
return "Others";
else
return categories[i];
}
Edit: I have encountered another problem. I tried using a test case if the file path does not contain any substring. It should return "Others", however it is returning me "/".
Edit2: I have sort of solved the problem by changing the if statement to check == true. However, I realise that my code here is not good, as #LưuVĩnhPhúc and #fasked mentioned in the comments, and I hope to fix it.
Change your code to
for (i = 0; i <(sizeof(categories)/sizeof(char*)); i++)
sizeof(categories) gives the size of the whole array in bytes (16 for four char* on a platform with 4-byte pointers, 32 with 8-byte pointers). You have to divide that by the size of one char* to get the number of elements in categories.
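Putting the element count together with the intended return values, the function could be sketched as below. This is one possible rewrite under stated assumptions, not the asker's final code: it returns as soon as a category matches, takes a const path, and uses snprintf for a bounded copy where the original used strcpy.

```c
#include <stdio.h>
#include <string.h>

/* Return the first category found as a substring of any path component,
 * or "Others" when nothing matches. */
const char *categorize(const char *path)
{
    static const char *const categories[] = {
        "Library", "Applications", "Contact", "dsdfd"
    };
    const size_t count = sizeof categories / sizeof categories[0];
    char str[1024];

    /* strtok modifies its input, so work on a bounded private copy. */
    snprintf(str, sizeof str, "%s", path);

    for (char *token = strtok(str, "/"); token != NULL;
         token = strtok(NULL, "/")) {
        for (size_t i = 0; i < count; i++) {
            if (strstr(token, categories[i]) != NULL)
                return categories[i];   /* string literal, safe to return */
        }
    }
    return "Others";
}
```

Returning from inside the loop removes the need for the check flag and avoids reading categories[i] after the loop has run off the end of the array.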