C strstr not working correctly - c

I am trying to use strstr to search for any matches using a substring and comparing it to a line of text but haven't been successful in getting a match so far. I am opening and reading a file using popen while trying to do a search with only the ipv4 address looking for matches. What is wrong with my code? Any help will be much appreciated, thanks.
char *buffe = malloc(sizeof(char));
FILE *rib_file = popen("bgpdump -Mv rib.20160101.0000.bz2", "r");
while(fgets(buffe, sizeof(buffe), rib_file)!=NULL) {
if(buffe[strlen(buffe)-1] == '\n')
buffe[strlen(buffe)-1] = '\0';
for (int i = 0; i < argc; i++) {
if((strstr(buffe, argv[i])) != NULL) {
printf("A match found on line: %d\n", line_num);
printf("\n%s\n", buffe);
find_result++;
}
line_num++;
}
}
if(find_result == 0) {
printf("\nSorry, couldn't find a match.\n");
}
if (rib_file) {
pclose(rib_file);
}
free(buffe);
}
an example of what buffe is like:
TABLE_DUMP2|01/01/16 00:00:38|B|202.232.0.3|2497|223.255.254.0/24|2497 7473 3758 55415|IGP
I am trying to print the exact line as above when my code finds a match.

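sizeof is an operator, not a function: it yields the size of its operand's type. Since buffe is a pointer, sizeof(buffe) gives you the size of a pointer on your platform (typically 4 or 8 bytes), which is almost certainly not what you want. Worse, malloc(sizeof(char)) allocates exactly one byte, so buffe has no room for a line anyway. Did you mean to pass fgets the size of the thing buffe points to?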
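As a rough illustration of the point about sizeof, here is a minimal sketch that reads into a real array, so that sizeof reports the buffer's capacity. The function name print_matching_lines and the 1024-byte limit are assumptions made for this example, not part of the original code.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 1024  /* assumed upper bound on line length */

/* Prints every line of `in` that contains `needle` and returns how many
   lines matched. Lines longer than the buffer are read in several pieces. */
int print_matching_lines(FILE *in, const char *needle)
{
    char line[LINE_MAX_LEN];   /* an array: sizeof line is 1024, not a pointer size */
    int line_num = 0;
    int matches = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line_num++;
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline, if any */
        if (strstr(line, needle) != NULL) {
            printf("A match found on line: %d\n%s\n", line_num, line);
            matches++;
        }
    }
    return matches;
}
```

With the code from the question, this could be called once per search term, for example print_matching_lines(rib_file, argv[1]). Note that the original loop starts at i = 0, so it also searches for argv[0], the program name.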

parsing a file while reading in c

I am trying to read each line of a file and store binary values into appropriate variables.
I can see that there are many many other examples of people doing similar things and I have spent two days testing out different approaches that I found but still having difficulties getting my version to work as needed.
I have a txt file with the following format:
in = 00000000000, out = 0000000000000000
in = 00000000001, out = 0000000000001111
in = 00000000010, out = 0000000000110011
......
I'm attempting to use fscanf to consume the unwanted characters "in = ", "," and "out = "
and keep only the characters that represent binary values.
My goal is to store the first column of binary values, the "in" values into one variable
and the second column of binary values, the "out" value into another buffer variable.
I have managed to get fscanf to consume the "in" and "out" characters but I have not been
able to figure out how to get it to consume the "," "=" characters. Additionally, I thought that fscanf should consume the white space but it doesn't appear to be doing that either.
I can't seem to find any comprehensive list of available directives for scanners, other than the generic "%d, %s, %c....." and it seems that I need a more complex combination of directives to filter out the characters that I'm trying to ignore than I know how to format.
I could use some help with figuring this out. I would appreciate any guidance you could
provide to help me understand how to properly filter out "in = " and ", out = " and how to store
the two columns of binary characters into two separate variables.
Here is the code I am working with at the moment. I have tried other iterations of this code using fgetc() in combination with fscanf() without success.
int main()
{
FILE * f = fopen("hamming_demo.txt","r");
char buffer[100];
rewind(f);
while((fscanf(f, "%s", buffer)) != EOF) {
fscanf(f,"%[^a-z]""[^,]", buffer);
printf("%s\n", buffer);
}
printf("\n");
return 0;
}
The outputs from my code appear as follows:
= 00000000000,
= 0000000000000000
= 00000000001,
= 0000000000001111
= 00000000010,
= 0000000000110011
Thank you for your time.
The scanf family of functions is said to be a poor man's parser because it is not very tolerant of input errors. But if you are sure of the format of the input data, it allows for simple code. The only magic here is that a space in the format string matches any run of whitespace characters, including newlines, or none at all. Your code could become:
int main()
{
FILE * f = fopen("hamming_demo.txt", "r");
if (NULL == f) { // always test open
perror("Unable to open input file");
return 1;
}
char in[50], out[50]; // directly get in and out
// BEWARE: fscanf returns the number of converted elements (EOF only if input
// ends before the first conversion), so test against the expected count
while (fscanf(f, " in = %[01], out = %[01]", in, out) == 2) {
printf("%s - %s\n", in, out);
}
printf("\n");
return 0;
}
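If the input is less trustworthy, one common variation is to read each line with fgets and then parse it with sscanf, adding field widths so that a long run of digits cannot overflow the buffers. The sketch below assumes 50-byte buffers as in the answer above; the helper name parse_in_out is made up for this example.

```c
#include <stdio.h>

/* Parses one line of the form "in = <bits>, out = <bits>".
   Returns 1 on success, 0 if the line does not match the format. */
int parse_in_out(const char *line, char in[50], char out[50])
{
    /* %49[01] stores at most 49 characters of '0'/'1', leaving room for '\0'. */
    return sscanf(line, " in = %49[01], out = %49[01]", in, out) == 2;
}
```

A caller would loop over fgets(line, sizeof line, f) and report or skip any line for which parse_in_out returns 0, instead of silently stopping at the first malformed line.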
So basically you want to filter '0' and '1'? In this case fgets and a simple loop will be enough: just count the number of 0's and 1's and null-terminate the string at the end:
#include <stdio.h>
int main(void)
{
char str[50];
char *ptr;
// Replace stdin with your file
while ((ptr = fgets(str, sizeof str, stdin)))
{
int count = 0;
while (*ptr != '\0')
{
if ((*ptr >= '0') && (*ptr <= '1'))
{
str[count++] = *ptr;
}
ptr++;
}
str[count] = '\0';
puts(str);
}
}

I have a 'Segmentation Problem' while printing parsed parts of a String

I am writing a simple shell for a school assignment and am stuck with a segmentation fault. Initially, my shell parses the user input to remove whitespace and the end-of-line character, and separates the words in the input line to store them in a char **args array. I can separate the words and print them without any problem, but when storing the words into the char **args array, if the number of arguments is greater than 1 and odd, I get a segmentation fault.
I know the problem is absurd, but I am stuck with it. Please help me.
This is my parser code and the problem occurs in it:
char **parseInput(char *input){
int idx = 0;
char **parsed = NULL;
int parsed_idx = 0;
while(input[idx]){
if(input[idx] == '\n'){
break;
}
else if(input[idx] == ' '){
idx++;
}
else{
char *word = (char*) malloc(sizeof(char*));
int widx = 0; // Word index
word[widx] = input[idx];
idx++;
widx++;
while(input[idx] && input[idx] != '\n' && input[idx] != ' '){
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = input[idx];
idx++;
widx++;
}
word = (char*)realloc(word, (widx+1)*sizeof(char*));
word[widx] = '\0';
printf("Word[%d] --> %s\n", parsed_idx, word);
if(parsed == NULL){
parsed = (char**) malloc(sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}else{
parsed = (char**) realloc(parsed, (parsed_idx+1)*sizeof(char**));
parsed[parsed_idx] = word;
parsed_idx++;
}
}
}
int i = 0;
while(parsed[i] != NULL){
printf("Parsed[%d] --> %s\n", i, parsed[i]);
i++;
}
return parsed;
}
In your code you have the loop
while(parsed[i] != NULL) { ... }
The problem is that the code never sets any elements of parsed to be a NULL pointer.
That means the loop will go out of bounds, and you will have undefined behavior.
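You need to explicitly set the last element of parsed to be a NULL pointer after you have parsed the input. Note that the array must have room for that extra element (one more slot than the number of words), and that parsed is still NULL at that point if the input contained no words: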
while(input[idx]){
// ...
}
parsed[parsed_idx] = NULL;
On another couple of notes:
Don't assign back to the same pointer you pass to realloc. If realloc fails it will return a NULL pointer, but not free the old memory. If you assign back to the pointer you will lose the old pointer and have a memory leak. You also need to be able to handle the case where realloc fails.
A loop like
int i = 0;
while (parsed[i] != NULL)
{
// ...
i++;
}
is almost exactly the same as
for (int i = 0; parsed[i] != NULL; i++)
{
// ...
}
Please use a for loop instead; it's usually easier to read and follow. Also, with a for loop the "index" variable (i in your code) will be in a separate scope, and not available outside of the loop. Tighter scope for variables leads to fewer possible problems.
In C you shouldn't really cast the result of malloc (or realloc) (or really any function returning void *). If you forget to #include <stdlib.h>, the cast can hide the resulting warning about the missing declaration and lead to hard-to-diagnose problems.
Also, a beginner might find the -pedantic switch helpful on your call to the compiler. That switch would have pointed up most of the other suggestions made here. I personally am also a fan of -Wall, though many find it annoying instead of helpful.
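The realloc and NULL-terminator advice above can be combined in one small helper. This is only a sketch under assumptions: the name append_word and its interface are invented for illustration, and the caller is assumed to own the word strings.

```c
#include <stdlib.h>

/* Appends `word` to a NULL-terminated array holding `*count` strings.
   Returns 0 on success; on failure returns -1 and leaves *list untouched. */
int append_word(char ***list, size_t *count, char *word)
{
    /* Room for the existing words, the new one, and the NULL terminator.
       realloc(NULL, n) behaves like malloc(n), so an empty list works too. */
    char **tmp = realloc(*list, (*count + 2) * sizeof *tmp);
    if (tmp == NULL) {
        return -1;              /* *list still points at the old block */
    }
    tmp[*count] = word;
    tmp[*count + 1] = NULL;     /* keep the array terminated after every append */
    *list = tmp;
    (*count)++;
    return 0;
}
```

Because the result of realloc goes into tmp first, a failed call does not overwrite the only pointer to the old block, and because the terminator is rewritten on every append, a loop such as for (int i = 0; list[i] != NULL; i++) stays in bounds.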

Read in one word from a file and comparing with a regular expression

I am building a program that is supposed to look for words in a file that have 2 vowels in a row and end with either ly or ing. I'm currently having some issues with how I am supposed to deal with reading words from the file. My current code looks a little like this
fgets(string, BUFF_SIZE, file);
char *ptr = strtok(string, delim);
reti = regcomp(&regex, "[aoueiyAOUEIY]+[aoueiyAOUEIY].{0,}(ly|ing|LY|ING)$", REG_EXTENDED);
if (reti){
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
/* Execute regular expression */
reti = regexec(&regex, ptr , 0, NULL, 0);
if (!reti) {
puts("Match");
printf(" %s\n", string);
}
else if (reti == REG_NOMATCH) {
puts("No match");
printf(" %s\n", string);
}
else {
regerror(reti, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(1);
}
I'm aware that I need some sort of loop so that I can check more than one word. I wanted to try how strtok would work but realised that I still face the same problem. If I, for example, have the line fairly standing. jumping? hoping! there are just too many characters that a word can end on. How do I make my delim understand that it's at the end of a word? I'm thinking of doing a second regex that only has letters in it and comparing until I get a REG_NOMATCH. But the issue with that is that the buffer will get full very quickly.
For a task like this it's important to define "what is a word".
For instance, consider "bad!idea this!is": is that the 4 words "bad", "idea", "this", "is", or is it the 4 words "bad!", "idea", "this!", "is", or is it just the two words "bad!idea", "this!is"?
And what if the input is "bad3idea this9is"?
Sometimes the standard functions (e.g. strtok, fscanf) will fit your needs, and in such cases you should use them.
In case the standard functions do not fit, you can use fgetc to implement something that fits your needs.
The example below will consider anything that is not a letter (i.e. not a-z or A-Z) as word delimiters.
int end_of_file = 0;
while(!end_of_file)
{
int index = 0;
int c = fgetc(file);
if (c == EOF) break; // Done with the file
while (isalpha(c))
{
string[index] = c;
++index;
if (index == BUFF_SIZE)
{
// oh dear, the buffer is too small
//
// Just end the program...
exit(1);
}
c = fgetc(file);
if (c == EOF)
{
end_of_file = 1;
break;
}
}
string[index] = '\0';
if (index >= 4) // We need at least 4 chars for a match
{
// do the regex stuff
}
}
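For the "do the regex stuff" step, a minimal sketch with POSIX regex.h might look like the following. The function name word_matches and the simplified pattern are assumptions for illustration; the pattern treats y as a vowel, as the question's pattern does, and relies on REG_ICASE instead of listing upper-case letters.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `word` contains two consecutive vowels and ends in "ly" or
   "ing", 0 if it does not, and -1 if the pattern fails to compile. */
int word_matches(const char *word)
{
    regex_t regex;
    /* REG_ICASE covers upper case; REG_NOSUB skips sub-match bookkeeping. */
    if (regcomp(&regex, "[aeiouy]{2}.*(ly|ing)$",
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
        return -1;
    }
    int rc = regexec(&regex, word, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```

Since the loop above already strips punctuation, each string handed to the regex is a bare word and the $ anchor works as intended. In a real program the pattern would normally be compiled once before the loop rather than once per word.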

c- sysmalloc assertion problems

I am hoping that someone can help me understand where I have gone wrong here. I am implementing a program to check for spelling correctness. In the process I use a trie data structure to load into memory a dictionary text file to check words against.
Overall it seems to operate as expected but I get a lot of problems when loading in the longest possible word, namely pneumonoultramicroscopicsilicovolcanoconiosis. I do not understand why but first let me present some code -
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char *dictionary)
{
FILE *dict = fopen(dictionary, "r");
if (dict == NULL)
{
fprintf(stderr, "Could not open %s dictionary file.\n", dictionary);
return false;
}
// Initialise the root t_node
root = (t_node *) malloc(sizeof(t_node));
if (root == NULL)
{
fprintf(stderr, "Could not allocate memory to trie structure.\n");
return false;
}
// Set all current values in root to NULL and is_word to false
for (int i = 0; i < ALPHA_SIZE; i++)
{
root->branch[i] = NULL;
}
root->is_word = false;
while (1)
{
// Create char aray to hold words from .txt dictionary file once read
char *word = (char *) malloc((LENGTH + 1) * sizeof(char));
if (fscanf(dict, "%s", word) == EOF)
{
free(word);
break;
}
t_node *cursor = root;
int len = strlen(word) + 1;
for (int i = 0; i < len; i++)
{
if (word[i] == '\0')
{
cursor->is_word = true;
cursor = root;
word_count++;
}
else
{
int index = (word[i] == '\'') ? ALPHA_SIZE - 1 : tolower(word[i]) - 'a';
if (cursor->branch[index] == NULL)
{
cursor->branch[index] = (t_node *) malloc(sizeof(t_node));
for (int j = 0; j < ALPHA_SIZE; j++)
{
cursor->branch[index]->branch[i] = NULL;
}
cursor->branch[index]->is_word = false;
}
cursor = cursor->branch[index];
}
}
free(word);
}
fclose(dict);
return true;
}
This is my entire function to load in a dictionary into memory. For reference I defined the trie structure and created root prior to this function. LENGTH is defined as 45 to account for the longest possible word. And ALPHA_SIZE is 27 to include lower case letters plus apostrophes.
As I said already with all other shorter words this function works well. But with the longest word the function works through about half of the word, getting up to index 29 of the word variable before suffering a sysmalloc assertion issue where it then aborts.
I've tried to find what is happening here but the most I can see is that it suffers the fault at -
cursor->branch[index] = (t_node *) malloc(sizeof(t_node));
once it gets to the 29th index of word, but no other indexes prior. And all other posts I can find relate to functions giving this error that do not work at all rather than most of the time with an exception.
Can anyone see what I cannot and what may be the error I made in this code? I'd appreciate any help and thank you all for your time taken to consider my problem.
* UPDATE *
First of all I want to thank everyone for all of their help. I was so pleasantly surprised to see how many people responded to my issue and how quickly they did! I cannot express my gratitude to all of you for your help. Especially Basile Starynkevitch who gave me a great amount of information and offered a lot of help.
I am extremely embarrassed to say that I have found my issue and it's something that I should have caught a LONG time before turning to SO. So I must apologise for using up everyone's time on something so silly. My problem lay here -
else
{
int index = (word[i] == '\'') ? ALPHA_SIZE - 1 : tolower(word[i]) - 'a';
if (cursor->branch[index] == NULL)
{
cursor->branch[index] = (t_node *) malloc(sizeof(t_node));
for (int j = 0; j < ALPHA_SIZE; j++)
{
cursor->branch[index]->branch[j] = NULL; // <<< PROBLEM WAS HERE
}
cursor->branch[index]->is_word = false;
}
cursor = cursor->branch[index];
}
In my code originally I had 'cursor->branch[index]->branch[i] = NULL' where I was iterating through 'int j' in that loop, not i ....
Sooooo once again thank you all for your help! I am sorry for my poorly formatted question and I will do better to abide by the SO guidelines in the future.
Your
char *word = (char *) malloc((LENGTH + 1) * sizeof(char));
is not followed by a test on failure of malloc; you need to add:
if (!word) { perror("malloc word"); exit(EXIT_FAILURE); }
before
if (fscanf(dict, "%s", word) == EOF)
since using fscanf with %s on a NULL pointer is undefined behavior.
BTW, fscanf implementations that follow POSIX.1-2008 (glibc, for example), or that implement the dynamic memory TR, accept the %ms specifier, which allocates the string while reading it (you must free it afterwards). On those systems you could write:
char*word = NULL;
if (fscanf(dict, "%ms", &word) == EOF)
break;
and some systems (POSIX ones) have getline, which reads a whole line into a buffer that it allocates for you.
Finally, compile with all warnings and debug info (gcc -Wall -Wextra -g with GCC), improve your code until you get no warnings, and use the gdb debugger and valgrind.
BTW pneumonoultramicroscopicsilicovolcanoconiosis has 45 letters. You need one additional byte for the terminating NUL (otherwise you have a buffer overflow). Your malloc of LENGTH + 1 bytes provides that when LENGTH is 45, but a bare %s places no limit on how much fscanf writes, so give it a maximum field width (for example %45s) or choose a more generous buffer, perhaps 64 bytes. In fact I recommend systematically using C dynamic memory allocation, avoiding hard-coded limits like this, and coding in a more robust style, following the GNU coding standards.
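The index mix-up described in the update is easier to avoid when node initialisation lives in one helper with its own loop variable. The sketch below is an illustration under assumptions: the t_node layout is guessed from how the question uses it, and new_node and trie_insert are hypothetical names. Freeing the trie is omitted for brevity.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHA_SIZE 27   /* 26 letters plus the apostrophe, as in the question */

/* Assumed node layout; the question does not show its definition. */
typedef struct t_node {
    struct t_node *branch[ALPHA_SIZE];
    bool is_word;
} t_node;

/* Allocates a node with every branch set to NULL. Returns NULL on failure. */
t_node *new_node(void)
{
    t_node *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    for (int j = 0; j < ALPHA_SIZE; j++) {
        node->branch[j] = NULL;
    }
    node->is_word = false;
    return node;
}

/* Inserts `word` (letters and apostrophes only) below `root`.
   Returns false on an unsupported character or a failed allocation. */
bool trie_insert(t_node *root, const char *word)
{
    t_node *cursor = root;
    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)word[i];
        int index;
        if (ch == '\'') {
            index = ALPHA_SIZE - 1;
        } else {
            index = tolower(ch) - 'a';   /* assumes contiguous a-z, as in ASCII */
            if (index < 0 || index > 25) {
                return false;            /* would index outside branch[] */
            }
        }
        if (cursor->branch[index] == NULL) {
            cursor->branch[index] = new_node();
            if (cursor->branch[index] == NULL) {
                return false;
            }
        }
        cursor = cursor->branch[index];
    }
    cursor->is_word = true;
    return true;
}
```

Checking the computed index before using it also guards against dictionary entries that contain digits or other characters, which would otherwise write outside the branch array.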

Can printf change its parameters?

EDIT:
complete code with main is here http://codepad.org/79aLzj2H
and once again this is where the weird behaviour is happening
for (i = 0; i<tab_size; i++)
{
//CORRECT OUTPUT
printf("%s\n", tableau[i].capitale);
printf("%s\n", tableau[i].pays);
printf("%s\n", tableau[i].commentaire);
//WRONG OUTPUT
//printf("%s --- %s --- %s |\n", tableau[i].capitale, tableau[i].pays, tableau[i].commentaire);
}
I have an array of the following structure
struct T_info
{
char capitale[255];
char pays[255];
char commentaire[255];
};
struct T_info *tableau;
This is how the array is populated
int advance(FILE *f)
{
char c;
c = getc(f);
if(c == '\n')
return 0;
while(c != EOF && (c == ' ' || c == '\t'))
{
c = getc(f);
}
return fseek(f, -1, SEEK_CUR);
}
int get_word(FILE *f, char * buffer)
{
char c;
int count = 0;
int space = 0;
while((c = getc(f)) != EOF)
{
if (c == '\n')
{
buffer[count] = '\0';
return -2;
}
if ((c == ' ' || c == '\t') && space < 1)
{
buffer[count] = c;
count ++;
space++;
}
else
{
if (c != ' ' && c != '\t')
{
buffer[count] = c;
count ++;
space = 0;
}
else /* more than one space*/
{
advance(f);
break;
}
}
}
buffer[count] = '\0';
if(c == EOF)
return -1;
return count;
}
void fill_table(FILE *f,struct T_info *tab)
{
int line = 0, column = 0;
fseek(f, 0, SEEK_SET);
char buffer[MAX_LINE];
char c;
int res;
int i = 0;
while((res = get_word(f, buffer)) != -999)
{
switch(column)
{
case 0:
strcpy(tab[line].capitale, buffer);
column++;
break;
case 1:
strcpy(tab[line].pays, buffer);
column++;
break;
default:
strcpy(tab[line].commentaire, buffer);
column++;
break;
}
/*if I printf each one alone here, everything works ok*/
//last word in line
if (res == -2)
{
if (column == 2)
{
strcpy(tab[line].commentaire, " ");
}
//wrong output here
printf("%s -- %s -- %s\n", tab[line].capitale, tab[line].pays, tab[line].commentaire);
column = 0;
line++;
continue;
}
column = column % 3;
if (column == 0)
{
line++;
}
/*EOF reached*/
if(res == -1)
return;
}
return ;
}
Edit :
trying this
printf("%s -- ", tab[line].capitale);
printf("%s --", tab[line].pays);
printf("%s --\n", tab[line].commentaire);
gives me as result
-- --abi -- Emirats arabes unis
I expect to get
Abu Dhabi -- Emirats arabes unis --
Am I missing something?
Does printf have side effects?
Well, it prints to the screen. That's a side effect. Other than that: no.
is printf changing its parameters
No
I get wrong results [...] what is going on?
If by wrong results you mean that the output does not appear when it should, this is probably just a line buffering issue (your second version does not print newline which may cause the output to not be flushed).
It's highly unlikely that printf is your problem. What is far, far more likely is that you're corrupting memory and your strange results from printf are just a symptom.
There are several places I see in your code which might result in reading or writing past the end of an array. It's hard to say which of them might be causing you problems without seeing your input, but here are a few that I noticed:
get_lines_count won't count the last line if it doesn't end in a newline, but your other methods will process that line
advance will skip over a newline if it is preceded by spaces, which will cause your column-based processing to get off, and could result in some of your strings being uninitialized
get_word doesn't do any bounds checks on buffer
There may be others, those were just the ones that popped out at me.
I tested your code, adding the missing parts (MAX_LINE constant, main function and a sample datafile with three columns separated by 2+ whitespace), and the code works as expected.
Perhaps the code you posted is still not complete (fill_table() looks for a -999 magic number from get_word(), but get_word() never returns that), your main function is missing, so we don't know if you are properly allocating memory, etc.
Unrelated but important: it is not recommended (and also not portable) to do relative movements with fseek in text files. You probably want to use ungetc instead in this case. If you really want to move the file pointer while reading a text stream, you should use fgetpos and fsetpos.
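The ungetc suggestion can be sketched as a replacement for the question's advance function. The name skip_blanks is hypothetical, and unlike advance this version never consumes a newline; it only illustrates the push-back technique. It also stores the result of getc in an int, since a char cannot reliably be compared with EOF.

```c
#include <stdio.h>

/* Skips spaces and tabs, leaving the stream positioned at the first other
   character. Returns that character (without consuming it) or EOF. */
int skip_blanks(FILE *f)
{
    int c;                      /* int, not char, so EOF stays distinguishable */
    while ((c = getc(f)) == ' ' || c == '\t') {
        /* keep reading */
    }
    if (c != EOF) {
        ungetc(c, f);           /* push back the one character we over-read */
    }
    return c;
}
```

One character of push-back is guaranteed by the C standard, which is all this pattern needs, so no relative fseek on a text stream is required.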
Your approach for getting help is very wrong. You assumed that printf had side effects without even understanding your code. The problem is clearly not in printf, but you withheld information unnecessarily. Your code is not complete. You should create a reduced testcase that compiles and displays your problem clearly, and include it in full in your question. Don't blame random library functions if you don't understand what is really wrong with your program. The problem can be anywhere.
From your comments, I am assuming that if you use these printf statements,
printf("%s\n", tableau[i].capitale);
printf("%s", tableau[i].pays);
printf("%s\n", tableau[i].commentaire);
then everything works fine...
So try replacing your single printf statement with this. (Line no. 173 in http://codepad.org/79aLzj2H)
printf("%s\n %s %s \n", tableau[i].capitale, tableau[i].pays, tableau[i].commentaire);
