Finding and counting strings in C using fgets and strstr - c

I'm trying to work on an assignment. The idea is to get an array of strings, and a file stream. I need to look for those strings in the file, and count the occurrence of these strings.
I think I've got the basic loop down. The only problem is that, when I find a string in the line, I want to search for that string again (in case of more than one occurrence) by starting from 1 + the position where the found string starts.
#define LINE_MAX_CHARS 1000
// n = number of strings to be found
// **strings = array of strings to look for in file
void count_occurrences (int n, FILE *file, char **strings) {
char str[LINE_MAX_CHARS]; // buffer
int count = 0;
while (fgets(str, LINE_MAX_CHARS, file) != NULL){ // for each line
for (int i = 0; i < n; i++){ // for each word in line
char *found;
found = (strstr(str, (*(strings + i)))); // search line
if (found != NULL){ // if word found in line
// here, I want str (the buffer) to have its 0-th element to be the element at (found + 1),
// and the 1-st element to be (found + 2) and so on...
i--; // to look for same word in the rest of the line
count = count + 1;
}
}
}
}
Another problem is that I have no way of testing my code. I'm just given a test program that runs and tells me whether or not my code is producing the correct output.
I am REQUIRED to use fgets and strstr.
Suggestions?

strstr(str, strings[i]) returns a pointer to the position of the match inside the string. You should be able to increment that pointer (not str itself, which is an array and cannot be incremented) and pass it straight back into strstr() in a loop, incrementing count each time and finishing the loop when strstr() returns NULL.
It should look something like this. I've not tested this; but since this is your homework, if it doesn't work/compile quite right, I'm going to leave it to you to debug. That means I won't have done all the work for you...
;-)
void count_occurrences (int n, FILE *file, char **strings) {
char str[LINE_MAX_CHARS];
int count = 0;
while (fgets(str, LINE_MAX_CHARS, file) != NULL){
for (int i = 0; i < n; i++){
char *pos = str;
while ((pos = strstr(pos, strings[i])) != NULL && *pos != '\n') {
count++;
pos++;
}
}
}
}

To count every occurrence of strings[i] in the current line, you have to use a loop and you have to let strstr start at least one position after the last occurrence. See the following code:
#define LINE_MAX_CHARS 1000
// n = number of strings to be found
// **strings = array of strings to look for in file
void count_occurrences (int n, FILE *file, char **strings) {
char str[LINE_MAX_CHARS]; // buffer
int count = 0;
while (fgets(str, LINE_MAX_CHARS, file) != NULL){ // for each line
for (int i = 0; i < n; i++){ // for each word in line
char *found = str;
do {
found = strstr(found, strings[i]); // search line
if (found != NULL){ // if word found in line
count = count + 1;
found++;
}
}
while (found != NULL);
}
}
}
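Since the asker has no way of testing, here is a minimal sketch of the inner search pulled out into its own function so it can be checked in isolation. The name count_in_line is an assumption (it is not in the original code), and this version counts overlapping matches because it restarts one character after each hit:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many times needle occurs in line.
   Overlapping matches are counted, because the search resumes one
   character after the start of the previous match. */
size_t count_in_line(const char *line, const char *needle)
{
    size_t count = 0;

    /* An empty needle would match at every position; treat it as no match. */
    if (*needle == '\0')
        return 0;

    for (const char *pos = strstr(line, needle);
         pos != NULL;
         pos = strstr(pos + 1, needle))
        count++;

    return count;
}
```

Inside the fgets loop you would then add count_in_line(str, strings[i]) to count for each i. If overlapping matches should not be counted, advancing by strlen(needle) instead of 1 would be the change to try.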

Related

Find text inside the beginning and ending () parentheses in a text file and read/print it into a buffer, in C

I am new to C and am getting very frustrated with learning this language. Currently I'm trying to write a program that reads in a program text file, then reads and prints all the string literals and tokens, each on a separate line. I have most of it except for one snag. Within the text file there is a line such as: (..text..). I need to be able to search for, read, and print all the text that is inside the parentheses on its own line. Here is an idea I have so far:
#define KEY 32
#define BUFFER_SIZE 500
FILE *fp, *fp2;
int main()
{
char ch, buffer[BUFFER_SIZE], operators[] = "+-*%=", separators[] = "(){}[]<>,";
char *pus;
char source[200 + 1];
int i, j = 0, k = 0;
char *words = NULL, *word = NULL, c;
fp = fopen("main.txt", "r");
fp2 = fopen ("mynewfile.txt","w") ;
while ((ch = fgetc(fp)) != EOF)
{
// pus[k++] = ch;
if( ch == '(')
{
for ( k = 0;, k < 20, K++){
buffer[k] = ch;
buffer[k] = '\0';
}
printf("%s\n", buffer)
}
....
The textfile is this:
#include <stdio.h>
int main(int argc, char **argv)
{
for (int i = 0; i < argc; ++i)
{
printf("argv[%d]: %s\n", i, argv[i]);
}
}
So far I've been able to read char by char and place it into a buffer. But this idea just isn't working, and I'm stumped. I've tried dabbling with strcpy() and strtok(), but they all take char arrays. Any ideas would be appreciated, thank you.
Most likely the best way would be to use fgets() with a file to read in each line as a string (char array) and then delimit that string. See the short example below:
char buffer[BUFFER_SIZE];
int current_line = 0;
//Continually read in lines until nothing is left...
while(fgets(buffer, BUFFER_SIZE - 1, fp) != NULL)
{
//Line from file is now in buffer. We can delimit it.
char copy[BUFFER_SIZE];
//Copy as strtok will overwrite a string.
strcpy(copy, buffer);
printf("Line: %d - %s", current_line, buffer); //Print the line.
char * found = strtok(copy, separators); //Will delimit based on the separators.
while(found != NULL)
{
printf("%s", found);
found = strtok(NULL, separators);
}
current_line++;
}
strtok will return a char pointer to the start of the next token. It replaces the delimiter that follows the token with the null terminator, thereby making a "new" string. We can pass NULL to strtok to tell it to continue where it left off. Using this, we can parse line by line from a file based on multiple delimiters. You could save these individual strings or evaluate them further.
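If only the text between the parentheses is wanted, strchr and strrchr may be simpler than tokenizing. The following is a rough sketch, not the answerer's method: the function name extract_parens is made up for illustration, and it assumes "inside the parentheses" means everything between the first '(' and the last ')' on the line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '(' and the
   last ')' of line into out. Returns 1 on success, 0 if there is no
   such pair or if out (outsz bytes) is too small. */
int extract_parens(const char *line, char *out, size_t outsz)
{
    const char *open = strchr(line, '(');
    if (open == NULL)
        return 0;

    /* Search for the last ')' that comes after the opening one. */
    const char *close = strrchr(open + 1, ')');
    if (close == NULL)
        return 0;

    size_t len = (size_t)(close - (open + 1));
    if (len + 1 > outsz)
        return 0;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 1;
}
```

Called on each line that fgets() returns, this would give "int i = 0; i < argc; ++i" for the for line of the sample file. Lines with several separate pairs of parentheses would need a different rule.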

C loop to read lines of input

I want to create a program in C that takes an arbitrary number of lines of arbitrary length as input and then prints to console the last line that was inputted. For example:
input:
hi
my name is
david
output: david
I figured the best way to do this would be to have a loop that takes each line as input and stores it in a char array, so at the end of the loop the last line ends up being what is stored in the char array and we can just print that.
I have only had one lecture in C so far so I think I just keep setting things up wrong with my Java/C++ mindset since I have more experience in those languages.
Here is what I have so far but I know that it's nowhere near correct:
#include <stdio.h>
int main()
{
printf("Enter some lines of strings: \n");
char line[50];
for(int i = 0; i < 10; i++){
line = getline(); //I know this is inproper syntax but I want to do something like this
}
printf("%s",line);
}
I also have i < 10 in the loop because I don't know how to find the total number of lines in the input, which would be the proper number of times to loop. Also, the input is being put in all at once from the
./program < test.txt
command in Unix shell, where test.txt has the input.
Use fgets():
while (fgets(line, sizeof line, stdin)) {
// don't need to do anything here
}
printf("%s", line);
You don't need a limit on the number of iterations. At the end of the file, fgets() returns NULL and doesn't modify the buffer, so line will still hold the last line that was read.
I'm assuming you know the maximum length of the input line.
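One case the loop above does not cover is empty input: fgets() then never writes to line, so printing it would read an uninitialized buffer. A small sketch that guards against this follows; the function name read_last_line and the idea of returning the number of successful reads are my own assumptions, not part of the answer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads fp to the end and leaves the last line,
   without its trailing '\n', in line (a buffer of size bytes).
   Returns the number of successful fgets() calls; 0 means no input.
   A line longer than the buffer is read in several pieces, so the
   return value is only a line count when every line fits. */
size_t read_last_line(FILE *fp, char *line, size_t size)
{
    size_t reads = 0;

    if (size == 0)
        return 0;
    line[0] = '\0';                      /* defined content for empty input */

    while (fgets(line, (int)size, fp) != NULL)
        reads++;

    line[strcspn(line, "\n")] = '\0';    /* strip the newline, if any */
    return reads;
}
```

With ./program < test.txt you would call read_last_line(stdin, line, sizeof line) and print line only when the result is nonzero.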
This one here should do the job for you:
static char *getLine( char * const b , size_t bsz ) {
return fgets(b, bsz, stdin);
}
But remember that fgets also stores the '\n' character at the end of the buffer, so perhaps something like this:
static char *getLine( char * const b , size_t bsz ) {
if( fgets(b, bsz, stdin) ){
/* Optional code to strip NextLine */
size_t size = strlen(b);
if( size > 0 && b[size-1] == '\n' ) {
b[--size] = '\0';
}
/* End of Optional Code */
return b;
}
return NULL;
}
and your code needs to be altered a bit where it calls getLine:
#define BUF_SIZE 256
char line[BUF_SIZE];
for(int i = 0; i < 10; i++){
if( getLine(line, BUF_SIZE ) ) {
fprintf(stdout, "line : '%s'\n", line);
}
}
Now it is, however, quite possible to create a function like
char *getLine();
but then one needs to define the behavior of that function. For instance, if getLine() allocates memory dynamically, then the caller needs to free() the pointer that getLine() returns.
In that case the function may look like
char *getLine( size_t bsz ) {
char *b = malloc( bsz );
if( b && fgets(b, bsz, stdin) ){
return b;
}
free(b); /* nothing was read: release the buffer (free(NULL) is a no-op) */
return NULL;
}
Depending on how small your function is, you could consider making it inline, but perhaps that's a little off topic for now.
In order to accept a dynamic number of input lines of dynamic length, you have to keep reallocating your buffer whenever the input outgrows it. In order to store the last line, you have to keep another pointer to track it, and to stop input from the terminal you have to send EOF (Ctrl+D on Unix-like systems, Ctrl+Z followed by Enter on Windows). This should do the job.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *get_last_line(FILE* fp, size_t size){
//Returns a malloc()ed copy of the last line, or NULL if there was no input or allocation failed
char *str, *last_str = NULL, *tmp;
int ch;
size_t len = 0;
if(size == 0)size = 1;
str = malloc(size);//size is start size
if(!str)return NULL;
while((ch=fgetc(fp)) != EOF){
if(ch == '\n'){
str[len]='\0';
//storing the last line, with room for the terminator
tmp = realloc(last_str, len+1);
if(!tmp){free(str);free(last_str);return NULL;}
last_str = tmp;
memcpy(last_str,str,len+1);
len = 0;//reuse the same buffer for the next line
}
else {
str[len++]=ch;
if(len==size){
tmp = realloc(str, size+=16);
if(!tmp){free(str);free(last_str);return NULL;}
str = tmp;
}
}
}
if(len > 0){//the final line had no trailing newline
str[len]='\0';
free(last_str);
return str;
}
free(str);
return last_str;
}
int main(void){
char *m;
printf("input strings : ");
m = get_last_line(stdin, 10);
if(m){
printf("last string :");
printf("%s\n", m);
}
free(m);
return 0;
}

How many elements in string array?

I want to find how many names are in the names array. I know sizeof(names)/sizeof(names[0]) gives the right answer. But the problem is that I can't just declare char *names[];, because the compiler gives me an error like "Storage of names is unknown". To avoid this error, I must declare it like this: char *names[] = {"somename", "somename2"};. But I cannot assign the strings right after the declaration. I assign the strings after some conditions, and my problem is how many strings I have after those conditions.
My example.
char *names[];
char word[10];
int i = 0;
while (fscanf(word, sizeof(word), fp)>0) {
// Think hello increase every time loop returns.
// such as "hello1", and the 2nd time "hello2"
if(strcmp(word, "hello1") == 0)
names[i] = word;
}
printf("size: %d\n", sizeof(names)/sizeof(names[0]));
An array size is fixed once the array is created. It cannot change.
If fp can be read twice, read the file once for the word count.
size_t word_count = 0;
int word_length_max = 0;
long pos = ftell(fp); // remember file location
int n = 0;
while (fscanf(fp, "%*s%n", &n) != EOF && n > 0) { // Use %n to record character count
word_count++;
if (n > word_length_max) {
word_length_max = n;
}
n = 0;
}
Now the code knows the size needed for the words[] array and the maximum word length.
char *words[word_count];
char word[word_length_max + 1u]; // buffer size needed to read in the words
fseek(fp, pos, SEEK_SET); // go back
for (size_t i=0; i<word_count; i++) {
if (fscanf(fp, "%s", word) != 1) {
Handle_UnexpectedError(); // 2nd pass should have same acceptable results
}
words[i] = strdup(word); // allocate a duplicate
}
When done with words[], be sure to free the allocated memory.
....
for (size_t i=0; i<word_count; i++) {
free(words[i]);
}
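Better code would also check the return values of ftell(), fseek(), and strdup() (which allocates with malloc()) for errors, and would limit the field width of fscanf(fp, "%s", word) so that a long word cannot overflow the buffer.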
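To make the first pass easier to try on its own, here is a sketch that wraps it in a function and adds the ftell()/fseek() checks mentioned above. The name count_words and its return convention are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words from the
   current position of fp to the end, stores an upper bound on the
   longest word length in *max_len, and restores the file position.
   Returns the word count, or -1 if the position cannot be saved or
   restored (for example on a pipe). */
long count_words(FILE *fp, int *max_len)
{
    long start = ftell(fp);
    if (start == -1L)
        return -1;

    long count = 0;
    int n = 0;
    *max_len = 0;

    /* %*s skips one word without storing it; %n records how many
       characters this call consumed, leading whitespace included,
       which is why *max_len is an upper bound rather than exact. */
    while (fscanf(fp, "%*s%n", &n) != EOF && n > 0) {
        count++;
        if (n > *max_len)
            *max_len = n;
        n = 0;
    }

    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return count;
}
```

The result can then size the words[] array and the word buffer for the second pass, as in the answer above.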

sscanf parse formatted string

I would like to read a string containing an undefined number of suffixes, all separated by ;
example 1: « .txt;.jpg;.png »
example 2: « .txt;.ods;_music.mp3;.mjpeg;.ext1;.ext2 »
I browsed the web and wrote that piece of code that doesn't work:
char *suffix[MAX]; /* will contain pointers to the different suffixes */
for (i = 0; i < MAX ; i++)
{
suffix[i] = NULL;
if (suffix_str && sscanf(suffix_str,"%[^;];%[^\n]",suffix[i],suffix_str) < 1)
suffix_str = NULL;
}
After the first iteration, the result of sscanf is 0. Why didn't it read the content of the string?
How should a string containing an undefined number of elements be parsed? Is sscanf a good choice?
First, as covered in the general comments, you're invoking undefined behavior by using the same buffer as both the source input and a destination target for sscanf. Per the C standard, that isn't allowed. In addition, suffix[i] is NULL when it is passed to sscanf, so there is no storage for the first conversion to write into.
The correct function to use for this would likely be strtok. A very simple example appears below.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char line[] = ".txt;.ods;_music.mp3;.mjpeg;.ext1;.ext2";
size_t slen = strlen(line); // worst case
char *suffix[slen/2+1], *ext;
size_t count=0;
for (ext = strtok(line, ";"); ext; ext = strtok(NULL, ";"))
suffix[count++] = ext;
// show suffix array entries we pulled
for (size_t i=0; i<count; ++i)
printf("%s ", suffix[i]);
fputc('\n', stdout);
}
Output
.txt .ods _music.mp3 .mjpeg .ext1 .ext2
Notes
This code assumes a worst-case suffix count to be half the string length, thereby a list of single character suffixes split on the delimiter.
The suffix array contains pointers into the now-sliced-up original line buffer. The lifetime of usability for those pointers is therefore only as long as that of the line buffer itself.
Hope it helps.
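If the suffix array has a fixed capacity, like MAX in the question, the same strtok loop can be bounded so it never writes past the end. This is only a sketch; split_suffixes is a name I made up, and like the example above it modifies the string it is given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits str in place on ';' and stores up to max
   token pointers in out. Returns the number of tokens stored. The
   pointers refer to memory inside str, so str must outlive them. */
size_t split_suffixes(char *str, char **out, size_t max)
{
    size_t count = 0;

    for (char *tok = strtok(str, ";");
         tok != NULL && count < max;
         tok = strtok(NULL, ";"))
        out[count++] = tok;

    return count;
}
```

Note that strtok skips empty fields, so ".txt;;.png" yields two suffixes, not three.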
There are several ways to tokenize from a C string. In addition to using strtok and sscanf you could also do something like this:
char *temp = suffix_str;
char *suffix[MAX];
int count = 0;
while (count < MAX && *temp != '\0' && *temp != '\n')
{
size_t j = 0;
char buf[32];
//copy one suffix; anything beyond 31 characters is dropped
while (*temp != '\0' && *temp != '\n' && *temp != ';')
{
if (j < sizeof buf - 1) buf[j++] = *temp;
temp++;
}
buf[j] = 0;
if (*temp == ';') temp++;
suffix[count] = malloc(strlen(buf) + 1);
if (suffix[count] == NULL) break; //handle memory allocation error
strcpy(suffix[count], buf);
count++;
}

"Pointer being freed was not allocated" happen on mac but not on window7

I am doing an exercise from a book, changing the words in a sentence into pig latin. The code works fine on Windows 7, but when I compile it on a Mac, the error comes out.
After some testing, I found that the error comes from the loop below. I don't understand the reason for this problem. I am using dynamic memory for all the pointers and I have also added a check for null pointers.
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
Full source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define inputSize 81
void getSentence(char sentence [], int size);
int countWord(char sentence[]);
char ***parseSentence(char sentence[], int *count);
char *translate(char *world);
char *translateSentence(char ***words, int count);
int main(void){
/* Local definition*/
char sentence[inputSize];
int wordsCnt;
char ***head;
char *result;
getSentence(sentence, inputSize);
head = parseSentence(sentence, &wordsCnt);
result = translateSentence(head, wordsCnt);
printf("\nFinish the translation: \n");
printf("%s", result);
return 0;
}
void getSentence(char sentence [81], int size){
char *input = (char *)malloc(size);
int length;
printf("Input the sentence to big latin : ");
fflush(stdout);
fgets(input, size, stdin);
// do not copy the return character at inedx of length - 1
// add back delimater
length = strlen(input);
strncpy(sentence, input, length-1);
sentence[length-1]='\0';
free(input);
}
int countWord(char sentence[]){
int count=0;
/*Copy string for counting */
int length = strlen(sentence);
char *temp = (char *)malloc(length+1);
strcpy(temp, sentence);
/* Counting */
char *pToken = strtok(temp, " ");
char *last = NULL;
assert(pToken == temp);
while (pToken){
count++;
pToken = strtok(NULL, " ");
}
free(temp);
return count;
}
char ***parseSentence(char sentence[], int *count){
// parse the sentence into string tokens
// save string tokens as a array
// and assign the first one element to the head
char *pToken;
char ***words;
char *pW;
int noWords = countWord(sentence);
*count = noWords;
/* Initiaze array */
int i;
words = (char ***)calloc(noWords+1, sizeof(char **));
for (i = 0; i< noWords; i++){
words[i] = (char **)malloc(sizeof(char *));
}
/* Parse string */
// first element
pToken = strtok(sentence, " ");
if (pToken){
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**words = pW;
/***words = pToken;*/
// other elements
for (i=1; i<noWords; i++){
pToken = strtok(NULL, " ");
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**(words + i) = pW;
/***(words + i) = pToken;*/
}
}
/* Loop control */
words[noWords] = NULL;
return words;
}
/* Translate a world into big latin */
char *translate(char *word){
int length = strlen(word);
char *bigLatin = (char *)malloc(length+3);
/* translate the word into pig latin */
static char *vowel = "AEIOUaeiou";
char *matchLetter;
matchLetter = strchr(vowel, *word);
// consonant
if (matchLetter == NULL){
// copy the letter except the head
// length = lenght of string without delimiter
// cat the head and add ay
// this will copy the delimater,
strncpy(bigLatin, word+1, length);
strncat(bigLatin, word, 1);
strcat(bigLatin, "ay");
}
// vowel
else {
// just append "ay"
strcpy(bigLatin, word);
strcat(bigLatin, "ay");
}
return bigLatin;
}
char *translateSentence(char ***words, int count){
char *bigLatinSentence;
int length = 0;
char *bigLatinWord;
/* calculate the sum of the length of the words */
char ***walker = words;
while (*walker){
length += strlen(**walker);
walker++;
}
/* allocate space for return string */
// one space between 2 words
// numbers of space required =
// length of words
// + (no. of words * of a spaces (1) -1 )
// + delimater
// + (no. of words * ay (2) )
int lengthOfResult = length + count + (count * 2);
bigLatinSentence = (char *)malloc(lengthOfResult);
// trick to initialize the first memory
strcpy(bigLatinSentence, "");
/* Translate each word */
int i;
char *w;
for (i=0; i<count; i++){
w = translate(**(words + i));
strcat(bigLatinSentence, w);
strcat(bigLatinSentence, " ");
assert(w != **(words + i));
free(w);
}
/* free memory of big latin words */
walker = words;
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
return bigLatinSentence;
}
Your code is unnecessarily complicated, because you have set things up such that:
n: the number of words
words: points to allocated memory that can hold n+1 char ** values in sequence
words[i] (0 <= i && i < n): points to allocated memory that can hold one char * in sequence
words[n]: NULL
words[i][0]: points to allocated memory for a word (as before, 0 <= i < n)
Since each words[i] points to stuff-in-sequence, there is a words[i][j] for some valid integer j ... but the allowed value for j is always 0, as there is only one char * malloc()ed there. So you could eliminate this level of indirection entirely, and just have char **words.
That's not the problem, though. The freeing loop starts with walker identical to words, so it first attempts to free words[0][0] (which is fine and works), then attempts to free words[0] (which is fine and works), then attempts to free words (which is fine and works but means you can no longer access any other words[i] for any value of i—i.e., a "storage leak"). Then it increments walker, making it more or less equivalent to &words[1]; but words has already been free()d.
Instead of using walker here, I'd use a loop with some integer i:
for (i = 0; words[i] != NULL; i++) {
free(words[i][0]);
free(words[i]);
}
free(words);
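To show what dropping the extra level of indirection could look like, here is a sketch of a char ** version of the parse and free steps. The names split_words and free_words are mine, not from the question, and the splitting uses strspn/strcspn instead of strtok so the input sentence is left untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees a NULL-terminated array of malloc()ed words. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);   /* the array itself is freed once, after its elements */
}

/* Hypothetical helper: splits sentence on spaces into a NULL-terminated
   array of copies. Stores the word count in *count. Returns NULL on
   allocation failure. */
char **split_words(const char *sentence, int *count)
{
    /* First pass: count the words. */
    int n = 0;
    const char *p = sentence;
    for (;;) {
        p += strspn(p, " ");      /* skip spaces */
        if (*p == '\0')
            break;
        n++;
        p += strcspn(p, " ");     /* skip the word */
    }

    char **words = malloc(((size_t)n + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Second pass: copy each word. */
    p = sentence;
    for (int i = 0; i < n; i++) {
        p += strspn(p, " ");
        size_t len = strcspn(p, " ");
        words[i] = malloc(len + 1);
        if (words[i] == NULL) {   /* words[i] == NULL ends the partial array */
            free_words(words);
            return NULL;
        }
        memcpy(words[i], p, len);
        words[i][len] = '\0';
        p += len;
    }
    words[n] = NULL;              /* loop control, as in the question */
    *count = n;
    return words;
}
```

With this layout, translateSentence() would read words[i] instead of **(words + i), and a single free_words(words) call replaces the walker loop.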
I'd also recommend removing all the casts on malloc() and calloc() return values. If you get compiler warnings after doing this, they usually mean one of two things:
you've forgotten to #include <stdlib.h>, or
you're invoking a C++ compiler on your C code.
The latter sometimes works but is a recipe for misery: good C code is bad C++ code and good C++ code is not C code. :-)
Edit: PS: I missed the off-by-one lengthOfResult that @David RF caught.
int lengthOfResult = length + count + (count * 2);
must be
int lengthOfResult = length + count + (count * 2) + 1; /* + 1 for final '\0' */
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
/* free(walker); Don't do this, you still need walker */
walker++;
}
free(words); /* Now */
And you have a leak:
int main(void)
{
...
free(result); /* You have to free the return of translateSentence() */
return 0;
}
In this code:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
You need to check that **walker is not NULL before freeing it.
Also - when you compute the length of memory you need to return the string, you are one byte short because you copy each word PLUS A SPACE (including a space after the last word) PLUS THE TERMINATING \0. In other words, when you copy your result into the bigLatinSentence, you will overwrite some memory that isn't yours. Sometimes you get away with that, and sometimes you don't...
Wow, so I was intrigued by this, and it took me a while to figure out.
Now that I figured it out, I feel dumb.
What I noticed from running under gdb is that the thing failed on the second run through the loop on the line
free(walker);
Now why would that be so? This is where I feel dumb for not seeing it right away. When you run that line the first time, the whole array of char ** pointers at words (aka walker on the first run through) is freed. On the second run through, when you run that line, you're trying to free already freed memory.
So it should be:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
walker++;
}
free(words);
Edit:
I also want to note that you don't have to cast from void * in C.
So when you call malloc, you don't need the (char *) in there.

Resources