Get the text before and after strstr in C

I need to be able to extract the characters before and after a substring. Currently I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
char *text = (char *) malloc (10000000);
char *word = argv[1];
int rep;
FILE *f;
if(argc < 2)
{
printf("Usage: GET <website> | ./word_counter <word>\n");
exit(1);
}
fread(text, 100, 10000000, stdin);
const char *tmp = text;
f = fopen("output.txt", "w");
fprintf(f, "%s\n", "REPS");
while(tmp = strstr(tmp, word)){
printf("%.50s\n", tmp);
rep++;
tmp++;
}
printf("Word count: %d\n", rep);
fclose(f);
system("gedit output.txt");
return 0;
}
I made a copy of the original input so I could leave it untouched and get the "before" characters from it.
Using strstr() on tmp (the original input copy) I can find the instances of the word I'm looking for and print the first 50 characters. But knowing this, how can I access the 50 characters BEFORE this instance?
Any help will be appreciated. Thanks!

Apart from the printing question itself, there are a couple of errors in your code. I have corrected most of them; a short list is:
Always test if malloc succeeded.
fread(text, 100, 10000000, ..) asks for far too much text: 100 * 10000000 = 1000000000 bytes, almost a full gigabyte. You only allocated enough memory for 10 MB.
You read from a text file and treat this data as a string. Therefore, you must make sure the data ends with a 0, else functions such as printf and strstr will try to continue reading after the end.
Your rep variable starts out uninitialized, so the count you print is unpredictable.
Always free memory you allocated.
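The points about fread and the terminating 0 can be combined in a small helper. This is only a sketch: read_text is a hypothetical name, not part of the code in this answer, and it assumes the caller passes the real size of the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: read at most size-1 bytes from `in` into `buf`
   and always NUL-terminate. Returns the number of bytes actually read. */
size_t read_text(FILE *in, char *buf, size_t size)
{
    size_t n;

    if (size == 0)
        return 0;
    /* element size 1, so the return value is a byte count */
    n = fread(buf, 1, size - 1, in);
    buf[n] = '\0';   /* n <= size-1, so this stays inside the buffer */
    return n;
}
```

With text allocated as in the question, a call such as read_text(stdin, text, 10000000) leaves text safe to pass to strstr and printf.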
That said, it is tidier to use a dedicated function to print out the text, if only to avoid putting too much in your main. And since it's a function, you can give it as many useful parameters as you want; I added before and after variables, so you can vary the number of characters shown.
For added niceness, this function prints a correct number of spaces when the phrase is found before the minimum number of before characters, so the results line up nicely. Also, since printing out characters such as tab and newlines will mess up your output, I replaced them with ?.
There is, admittedly, some repetition in print_range but in this case I went for clarity, rather than brevity.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LENGTH 10000000
void print_range (char *source_text, int startindex, int before, int after, int phrase_length)
{
int i;
if (before > startindex)
{
for (i=0; i<before-startindex; i++)
printf (" ");
before = startindex; /* fewer characters available than requested */
}
for (i=0; i<before; i++)
{
if (strchr ("\t\r\n", source_text[startindex-before+i]))
printf ("?");
else
printf ("%c", source_text[startindex-before+i]);
}
for (i=0; i<phrase_length; i++)
{
if (strchr ("\t\r\n", source_text[startindex+i]))
printf ("?");
else
printf ("%c", source_text[startindex+i]);
}
for (i=0; i<after; i++)
{
if (!source_text[startindex+phrase_length+i])
break;
if (strchr ("\t\r\n", source_text[startindex+phrase_length+i]))
printf ("?");
else
printf ("%c", source_text[startindex+phrase_length+i]);
}
printf ("\n");
}
int main (int argc, char *argv[]){
char *text = (char *) malloc (MAX_LENGTH);
char *word = argv[1];
int rep = 0;
if (!text)
return -1;
if(argc < 2)
{
printf("Usage: GET <website> | ./word_counter <word>\n");
exit(1);
}
size_t n = fread(text, 1, MAX_LENGTH - 1, stdin); /* leave room for the terminator */
text[n] = 0; /* terminate right after the data actually read */
const char *tmp = text;
do
{
tmp = strstr(tmp, word);
if (!tmp)
break;
print_range (text, tmp-text, 16,16, strlen(word));
rep++;
tmp++;
} while (1);
free (text);
printf ("Word count: %d\n", rep);
return 0;
}
Result of running this on its own source code:
~/Documents $ ./wordcounter printf < wordcounter.c
tindex; i++)????printf (" ");???starti
-before+i]))????printf ("?");???else??
"?");???else????printf ("%c", source_t
before+i]);??}??printf ("{");??for (i=
rtindex+i]))????printf ("?");???else??
"?");???else????printf ("%c", source_t
tindex+i]);??}??printf ("}");??for (i=
_length+i]))????printf ("?");???else??
"?");???else????printf ("%c", source_t
length+i]);??}??printf ("\n");?}??int
argc < 2)??{??? printf("Usage: GET <we
?free (text);???printf ("Word count: %
Word count: 12
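To answer the "before" part directly: strstr returns a pointer into the original buffer, so the characters before the match are reachable by stepping that pointer backwards, as long as you never step past the start of the buffer. A minimal sketch of that idea follows. copy_context is a hypothetical helper name, and the caller is assumed to pass a match pointer that lies inside text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to `before` characters preceding the match,
   the match itself, and up to `after` characters following it into `out`
   (always NUL-terminated). Returns the number of characters copied. */
size_t copy_context(const char *text, const char *match, size_t match_len,
                    size_t before, size_t after, char *out, size_t out_size)
{
    size_t offset = (size_t)(match - text);          /* characters available before the match */
    size_t lead = before < offset ? before : offset; /* clamp at the start of the buffer */
    size_t tail = 0;
    size_t total;

    if (out_size == 0)
        return 0;
    /* clamp at the end of the string */
    while (tail < after && match[match_len + tail] != '\0')
        tail++;
    total = lead + match_len + tail;
    if (total > out_size - 1)
        total = out_size - 1;                        /* truncate to fit `out` */
    memcpy(out, match - lead, total);
    out[total] = '\0';
    return total;
}
```

In the search loop this could be called as copy_context(text, tmp, strlen(word), 50, 50, line, sizeof line), followed by printing line.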

Related

counting words with arguments, fgets(), strncmp()

I would like to write a program that counts how often the argument occurs in the input.
These are the requirements:
It may be assumed that the lines in the input do not exceed 1024 characters. The string #EOF at the beginning of a line indicates the end of the input. It is not necessary to consider word boundaries, and overlapping words must be counted as well: with an input of baaaab, the word aa shall be counted three times. Also, the program must be case sensitive.
I already wrote some code, but I seem to have made some mistakes. Does anyone have an idea?
int main(int argc, char *argv[])
{
char buf[1026]="start";
int count=0;
while (strncmp(buf,"#EOF",4)!=0)
{
fgets(buf, 1025, stdin);
if (strncmp(buf, argv[1], strlen(argv[1]))==0)
{
count++;
}
}
if(argc==1)
printf("Please specify a program argument.");
if(argc>=2)
printf("%d", count);
return 0;
}
this is the program input with the argument let:
Let it be, let it be, let it be, let it be.
Whisper words of wisdom, let it be.
#EOF
and there is no output while it should be 4
this is the program input with argument aa:
aa aaaaa aa
aa aaa
#EOF
and the output is 2 while it should be 9
this is the program input with argument EOF:
De volgende EOF behoort ook tot de invoer: EOF
# Net als deze #EOF. Maar hieronder niet meer.
#EOF
and there is no output while it should be 3
thanks in advance
strncmp() tests for exact equality of the first n characters of each string provided. However, what you want is to count each occurrence, not just if the start of the line matches. For example, if you're looking for "let" in "Let it be, let it be, let it be, let it be.", you're only ever testing "Let" against "let". No match, no count. You never test further down the string.
So what you want to do is to loop over the result of fgets(), like so:
fgets(buf, 1025, stdin);
for (char *p = buf; *p; ++p) {
if (strncmp(p, argv[1], strlen(argv[1])) == 0)
{
count++;
}
}
This will test "let" against "Let", then "et ", then "t i", etc. until you've checked the whole line and counted the matches.
If you were to use strstr() instead of strncmp(), the loop would look like this:
for (char *p = buf; (p = strstr(p, argv[1])); ++p)
{
count++;
}
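The strstr loop can be wrapped in a function and checked against the sample inputs from the question. This is a sketch only; count_occurrences is a hypothetical name, and treating an empty search word as "no matches" is an assumption made here.

```c
#include <string.h>

/* Count occurrences of needle in haystack, overlapping matches included. */
int count_occurrences(const char *haystack, const char *needle)
{
    int count = 0;
    const char *p;

    if (*needle == '\0')
        return 0;   /* assumption: an empty search word counts as no match */
    /* advance by one character after each hit so overlaps are found */
    for (p = haystack; (p = strstr(p, needle)) != NULL; ++p)
        count++;
    return count;
}
```

Called once per line read by fgets, the results are added up: for the argument aa, the lines "aa aaaaa aa" and "aa aaa" give 6 and 3, which is the expected total of 9.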
Your code only tests the start of each input line, so it counts at most one occurrence per line. You need to iterate through each input string to find ALL occurrences. Try something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
    char buf[1026];
    int len, matches = 0;
    if (argc < 2) {
        printf("Please specify a program argument.\n");
        exit(1);
    }
    len = strlen(argv[1]);
    /* stop at end of input, or at a line starting with #EOF, before counting it */
    while (fgets(buf,sizeof buf,stdin) != NULL && strncmp(buf,"#EOF",4) != 0) {
        int buflen = strlen(buf);
        /* try a match at every position where the word still fits */
        for (int i = 0; i <= buflen - len; ++i) {
            if (strncmp(&buf[i],argv[1],len) == 0)
                ++matches;
        }
    }
    printf("'%s' found %d times\n",argv[1],matches);
    return 0;
}
This is working code based on the answer given by Fred Larson.
Big thanks to him.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
char buf[1026]="start";
int N;
int count=0;
char *p;
if(argc==1)
{
printf("Please specify a program argument.\n");
return(1);
}
N=strlen(argv[1]);
while (strncmp(buf,"#EOF",4)!=0)
{
fgets(buf, 1025, stdin);
for (p = buf;*p;p++)
{
if (strncmp(p, argv[1], N)==0)
{
if (strncmp(buf,"#EOF",4)!=0)
count++;
}
}
}
if(argc>=2)
printf("%d\n", count);
return 0;
}

fscanf() to read in only characters with no punctuation marks

I would like to read in some words (in this example first 20) from a text file (name specified as an argument in the command line). As the below code runs, I found it takes punctuation marks with characters too.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char * argv[]){
int wordCap = 20;
int wordc = 0;
char** ptr = (char **) calloc (wordCap, sizeof(char*));
FILE *myFile = fopen (argv[1], "r");
if (!myFile) return 1;
rewind(myFile);
for (wordc = 0; wordc < wordCap; wordc++){
ptr[wordc] = (char *)malloc(30 * sizeof( char ) );
fscanf(myFile, "%s", ptr[wordc]);
int length = strlen(ptr[wordc]);
ptr[wordc][length] = '\0';
printf("word[%d] is %s\n", wordc, ptr[wordc]);
}
return 0;
}
When I feed it the sentence "Once when a Lion was asleep a little Mouse began running up and down upon him;", "him" is followed by a semicolon.
When I changed the fscanf() call to fscanf(myFile, "[a-z | A-Z]", ptr[wordc]);, it took the whole sentence as a word.
How can I change it to make the correct output?
You could accept the semicolon and then remove it later, like so,
after you've stored the word in ptr[wordc]:
i = 0;
while (i < strlen(ptr[wordc]))
{
if (strchr(".;,!?", ptr[wordc][i])) //add any char you wanna delete to that string
memmove(&ptr[wordc][i], &ptr[wordc][i + 1], strlen(ptr[wordc]) - i);
else
i++;
}
if (strlen(ptr[wordc]) > 0) // to not print any word that was just punctuations beforehand
printf("word[%d] is %s\n", wordc, ptr[wordc]);
I haven't tested this code, so there might be a typo or something in it.
Alternatively you could switch
fscanf(myFile, "%s", ptr[wordc]);
for
fscanf(myFile, "%29[a-zA-Z]%*[^a-zA-Z]", ptr[wordc]);
to capture only letters. The 29 limits the word size so you don't overflow the buffer, since you're allocating only 30 chars per word.
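If you go the "remove it later" route, a variation that avoids repeated memmove calls is to copy the characters you keep over the ones you drop in a single pass. This is a sketch, not the code from the answer: strip_punct is a hypothetical name, and it uses ispunct from <ctype.h>, which removes more than the ".;,!?" set (apostrophes and hyphens too, for example).

```c
#include <ctype.h>

/* Remove every punctuation character from s in place, in a single pass. */
void strip_punct(char *s)
{
    char *dst = s;

    for (; *s; s++) {
        /* cast: ispunct requires a value representable as unsigned char */
        if (!ispunct((unsigned char)*s))
            *dst++ = *s;
    }
    *dst = '\0';
}
```

After fscanf has stored a word, strip_punct(ptr[wordc]) would turn "him;" into "him"; a word that consisted only of punctuation becomes the empty string.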

C string alteration

I have a string containing words and punctuation marks (a full sentence, to be exact), and I want to change one word in that sentence. The new word may not have the same number of letters as the old one. I also have a 2-D array holding the words of the sentence, already updated so that it contains the new word instead of the old one. I managed to replace every word in my original string with a * while keeping the punctuation marks, so that I can substitute the words from the updated 2-D array for each * and keep the punctuation. So my real question is: how can I replace each * in the string with a whole word, keeping the punctuation marks where they belong?
Example:
Original string: HELLO PEOPLE. HELLO WORLD. HOW ARE YOU TODAY?
Word for change: WORLD --> MAN
String with '*': * *. * *.* * * *?
Result I want: HELLO PEOPLE. HELLO MAN. HOW ARE YOU TODAY?
I tried this (text3 is the string with '*' and text4 should hold the result I want):
l1=0;sum3=0;
for (k=0;k<sum2;k++){
if (text3[k]=='*'){
strcpy(&text4[sum3],textb1[l1]);
l1++;
sum3=sum3+strlen(textb1[l1]);
}
else {
text4[sum3]=text3[k];
sum3++;
}
}
printf("%s\n",text4);
But I only manage to get the first HELLO printed.
Here is a full program, based on my "stack" idea, to demonstrate:
split then store words on spaces and punctuation;
store into a basic stack;
print out with spaces added where needed;
output new word whenever old word was in input.
The latter replaces every occurrence of old with new, as it was not specified the replacement should only occur once. Overflow handling of stack and memory cleanup omitted for clarity.
Pro
Spaces are always concatenated into one single space.
Easily adjustable for other scenarios.
Con
Spaces are always concatenated into one single space.
It only checks for alphanumerics and a basic punctuation set.
Handling of punctuation is very basic. For instance, initially I added () to punct as well, but these need further adjustments in the output routine that insert spaces. If necessary, you could, for instance, make stack a struct and add a spaceAfter member and save where the original spaces occurred.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *stack[256];
int stack_depth = 0;
const char *punct = ".,:;!?";
int main (int argc, char **argv)
{
char *ptr, *find, *change;
int i;
if (argc != 3)
{
ptr = strrchr (argv[0], '/');
if (!ptr)
ptr = strrchr (argv[0], '\\');
if (ptr)
ptr++;
else
ptr = argv[0];
printf ("usage: %s \"original string\" \"original --> new\"\n", ptr);
return 0;
}
ptr = argv[1];
while (*ptr)
{
while (*ptr == ' ')
ptr++;
i = 0;
while (ptr[i])
{
/* do not store spaces */
if (ptr[i] == ' ')
break;
/* stop on punctuation */
if (strchr (punct, ptr[i]))
break;
i++;
}
if (i)
{
stack[stack_depth] = malloc(i+1);
if (stack[stack_depth] == NULL)
{
printf ("oh wow, out of memory\n");
return -1;
}
strncpy (stack[stack_depth], ptr, i);
stack[stack_depth][i] = 0;
stack_depth++;
ptr += i;
}
if (*ptr && strchr (punct, *ptr))
{
i = 0;
while (ptr[i] && strchr (punct, ptr[i]))
i++;
stack[stack_depth] = malloc(i+1);
if (stack[stack_depth] == NULL)
{
printf ("oh wow, out of memory\n");
return -1;
}
strncpy (stack[stack_depth], ptr, i);
stack[stack_depth][i] = 0;
stack_depth++;
ptr += i;
}
}
printf ("Original string: ");
for (i=0; i<stack_depth; i++)
{
if (i > 0 && !strchr (punct, stack[i][0]))
printf (" ");
printf ("%s", stack[i]);
}
printf ("\n");
/* fetch change words */
ptr = strstr (argv[2], " --> ");
if (!ptr)
{
printf ("bad syntax!\n");
return -1;
}
/* fetch the length of 'find' */
i = ptr-argv[2];
find = malloc (i+1);
strncpy (find, argv[2], i);
find[i] = 0;
/* fetch the length of 'change' */
/* this is the 5 characters ' --> ' after start */
ptr += 5;
i = strlen(ptr);
change = malloc (i+1);
strncpy (change, ptr, i);
change[i] = 0;
printf ("Word for change: %s --> %s\n", find, change);
printf ("Result: ");
for (i=0; i<stack_depth; i++)
{
if (i > 0 && !strchr (punct, stack[i][0]))
printf (" ");
if (strcmp (stack[i], find))
printf ("%s", stack[i]);
else
printf ("%s", change);
}
printf ("\n");
}
Test run:
replace "HELLO PEOPLE. HELLO WORLD. HOW ARE YOU TODAY?" "WORLD --> MAN"
Original string: HELLO PEOPLE. HELLO WORLD. HOW ARE YOU TODAY?
Word for change: WORLD --> MAN
Result: HELLO PEOPLE. HELLO MAN. HOW ARE YOU TODAY?
A few bugs:
l1=0;sum3=0;
for (k=0;k<sum2;k++){
if (text3[k]=='*'){
strcpy(&text4[sum3],textb1[l1]);
/*l1++; */ /* You are incrementing the wrong length I think.... */
sum3=sum3+strlen(textb1[l1]);
l1++;
}
else {
text4[sum3]=text3[k];
sum3++;
}
}
text4[sum3] = 0; /* Null terminate */
printf("%s\n",text4);
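The corrected loop can also be written as a self-contained function with bounds checks. This is a sketch under assumptions: fill_placeholders is a hypothetical name, the words are passed as an array of strings rather than the asker's 2-D array, and the error convention (returning -1) is invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Replace each '*' in tmpl with the next entry of words[]; copy all other
   characters unchanged. Returns 0 on success, -1 if out is too small or
   there are more '*' than words. */
int fill_placeholders(const char *tmpl, const char *const *words, size_t nwords,
                      char *out, size_t out_size)
{
    size_t pos = 0, next = 0;

    for (; *tmpl; tmpl++) {
        if (*tmpl == '*') {
            size_t len;
            if (next >= nwords)
                return -1;
            len = strlen(words[next]);
            if (pos + len >= out_size)
                return -1;
            memcpy(out + pos, words[next], len);
            pos += len;          /* advance by the word just copied */
            next++;              /* only then move on to the next word */
        } else {
            if (pos + 1 >= out_size)
                return -1;
            out[pos++] = *tmpl;
        }
    }
    if (pos >= out_size)
        return -1;
    out[pos] = '\0';             /* null terminate */
    return 0;
}
```

The order of the two updates inside the '*' branch is the point: the output position advances by the length of the word that was copied, and the word index is incremented afterwards.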

Processing outputs of multiple inputs in C

It's not something trivial but I would like to know the best way to process multiple outputs, for example:
Input
First line of input will contain a number T = number of test cases. Following lines will contain a string each.
Output
For each string, print on a single line, "UNIQUE" - if the characters are all unique, else print "NOT UNIQUE"
Sample Input
3
DELHI
london
#include<iostream>
Sample Output
UNIQUE
NOT UNIQUE
NOT UNIQUE
So how can I accomplish outputs like that? My code so far is:
int main(int argc, char *argv[])
{
int inputs, count=0;
char str[100];
char *ptr;
scanf("%d",&inputs);
while(inputs-- >0)
{
scanf("%s",str);
for(ptr=str; *ptr!='\0';ptr++)
{
if( *ptr== *(ptr+1))
{
count++;
}
}
if(count>0)
{
printf("NOT UNIQUE");
}
else
{
printf("UNIQUE");
}
}
}
The above will obviously print the output after each input, but I want the output only after all the inputs have been entered: if the user enters 3, the user has to give 3 strings, and only then should the program report whether each string is unique. So I want to know how I can achieve the result given in the problem. I am also using an array of 100 chars, which can hold a string of up to 99 characters plus the terminator; what do I have to do if I want to handle strings with no length limit? Just declaring char *str is no good, so what should I do?
Hope this helps:
#include <stdio.h>
int main(int argc, char *argv[])
{
int inputs,count=0;
char str[20];
scanf("%d",&inputs);
char *ptr;
char *dummy;
while(inputs-- >0)
{
count=0; /* reset the flag for each test case */
scanf("%19s",str); /* width limit matches char str[20] */
for(ptr=str; *ptr!='\0';ptr++)
{
for(dummy=ptr+1; *dummy != '\0';dummy++)
{
if( *ptr== *dummy)
{
count=1;
}
}
if(count == 1)
break;
}
if(count>0)
{
printf("NOT UNIQUE\n");
}
else
{
printf("UNIQUE\n");
}
}
}
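The nested loops compare every pair of characters. An alternative, shown here only as a sketch, is to remember which characters have already been seen in a small table. all_unique is a hypothetical name and is not taken from the answer above.

```c
#include <limits.h>

/* Return 1 if no character occurs twice in s, 0 otherwise. */
int all_unique(const char *s)
{
    /* one flag per possible unsigned char value */
    unsigned char seen[UCHAR_MAX + 1] = {0};

    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (seen[c])
            return 0;   /* second occurrence of c */
        seen[c] = 1;
    }
    return 1;
}
```

For the sample input, all_unique("DELHI") is 1, while all_unique("london") and all_unique("#include<iostream>") are both 0.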
If you want to save stuff for later use, you must store it somewhere. The example below stores up to 10 lines in buf and then points str to the current line:
#include <stdlib.h>
#include <stdio.h>
#include <string.h> /* for strlen */
#include <ctype.h> /* for isspace */
int main(int argc, char *argv[])
{
int ninput = 0;
char buf[10][100]; /* storage for 10 strings */
char *str; /* pointer to current string */
int i;
printf("Enter up to 10 strings, blank to end input:\n");
for (i = 0; i < 10; i++) {
int l;
str = buf[i];
/* read line and break on end-of-file (^D) */
if (fgets(str, 100, stdin) == NULL) break;
/* delete trailing newline & spaces */
l = strlen(str);
while (l > 0 && isspace(str[l - 1])) l--;
str[l] = '\0';
/* break loop on empty input */
if (l == 0) break;
ninput++;
}
printf("Your input:\n");
for (i = 0; i < ninput; i++) {
str = buf[i];
printf("[%d] '%s'\n", i + 1, str);
}
return 0;
}
Note the two separate loops for input and output.
I've also reworked your input handling. I'm not very fond of fscanf; I prefer to read input line-wise with fgets and then analyse the line with strtok or sscanf. The advantage over fscanf is that your strings may contain white-space. The drawback is that you have a newline at the end which you usually don't want and have to "chomp".
If you want to allow for longer strings, you should use dynamic allocation with malloc, although I'm not sure if it is useful when reading user input from the console. Tackle that when you have understood the basics of fixed-size allocation on the stack.
Other people have already pointed you to the error in your check for uniqueness.
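For the "no limit" part of the question, dynamic allocation could look roughly like this: start with a small buffer and grow it with realloc as characters arrive. This is a sketch under assumptions; read_line is a hypothetical name, and the starting capacity of 16 and the doubling strategy are arbitrary choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from `in`. Returns a malloc'ed string without
   the trailing newline, or NULL on end-of-file with nothing read or on
   allocation failure. The caller must free() the result. */
char *read_line(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= cap) {            /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Each returned string can be stored in an array of char * until all inputs have been read, processed in a second loop, and then released with free.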

Detecting single character in string

So, I'm trying to detect a single character in a string. There must be no other characters besides whitespace and a null character. This is my first issue, as my code detects the character in a string with other characters (besides the whitespace).
My second issue is that I can't figure out how best to read matrices from a file. I'm supposed to read the first line to get the ROWS x COLUMNS, then read the data into a matrix array that is stored globally, then read the second matrix into a second matrix array (also stored globally).
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#define MAXLINE 100
typedef struct matrixStruct{
int rows;
int columns;
}matrixStruct;
typedef int bool;
enum{
false,
true
};
/*
*
*/
int aMatrix1[10][10];
int aMatrix2[10][10];
int multiMatrix[10][10];
int main(int argc, char** argv){
FILE *inputFile;
char tempLine[MAXLINE], *tempChar, *tempString;
char *endChar;
endChar = (char *)malloc(sizeof(char));
(*endChar) = '*';
bool readFile = true;
inputFile = fopen(argv[1], "r");
if(inputFile == NULL){
printf("File %s not found.\n", argv[1]);
perror("Error");
exit(EXIT_FAILURE);
}else{
printf("File opened!\n");
}
int numRow, numColumn, i, j, tempNum, count = 0;
do{
fgets(tempLine, MAXLINE, inputFile);
tempChar = strchr(tempLine, '*');
if(tempChar != NULL){
printf("True # %s\ncount=%d\n",tempChar,count);
readFile = false;
}else{
sscanf(tempLine, "%d %d", &numRow, &numColumn);
count++;
for(i=0;i<numRow;i++){
fgets(tempLine, MAXLINE, inputFile);
for(j=0;j<numColumn;j++){
aMatrix1[i][j] = atoi(tempNum);
}
}
}
}
while(readFile);
printf("aMatrix1[%d][%d]= \n", numRow, numColumn);
for(i=0; i < numRow;i++){
for(j=0; j < numColumn; j++){
printf("aMatrix[%d][%d] = %d\t", i, j, aMatrix1[i][j]);
}
printf("\n");
}
return (EXIT_SUCCESS);
}
For the first issue you could do what you suggested in your comment (regular expressions are overkill here): loop through the string, break on any non-whitespace character that is not the one you expect, and count the ones that do match. You don't want 0 matches, and I guess no more than 1 either.
However, I suggest you read the man page for strtok. I normally wouldn't suggest it, as it's not thread-safe and has some strange behaviors, but in this simple case it could work fine: provide the whitespace characters as delimiters, and it returns the first non-whitespace token. If that token doesn't compare equal to "*" with strcmp, or if the next call to strtok doesn't return NULL, then it's not a match.
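The strtok check described above could be sketched as follows. is_lone_star is a hypothetical name, and the 100-byte copy assumes lines no longer than MAXLINE in the question; longer lines would be truncated.

```c
#include <string.h>

/* Return 1 if line holds exactly one whitespace-delimited token equal to "*". */
int is_lone_star(const char *line)
{
    char copy[100];   /* assumption: same size as MAXLINE in the question */
    char *tok;

    /* strtok modifies its argument, so work on a copy */
    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    tok = strtok(copy, " \t\r\n");
    if (tok == NULL || strcmp(tok, "*") != 0)
        return 0;
    /* any further token means other characters are present */
    return strtok(NULL, " \t\r\n") == NULL;
}
```

A line such as "  *  " is accepted, while "* 3", "**" and an empty line are rejected.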
By the way, what do you plan to do with lines that are neither " .. * .. " nor " ROWS x COLUMNS "? You're not handling them right now.
As for the second issue, strtok could again come to the rescue: repeated calls give you the whitespace-delimited numbers (as strings), and you'll be able to populate tempNum on each iteration.
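Reading one matrix row that way might look like the sketch below. parse_row is a hypothetical name, and strtol is used for the conversion as an assumption; atoi on each token would also work for well-formed input.

```c
#include <stdlib.h>
#include <string.h>

/* Split line on whitespace and convert up to max tokens to int.
   Returns the number of values stored. Modifies line (strtok). */
int parse_row(char *line, int *out, int max)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, " \t\r\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\r\n")) {
        out[n++] = (int)strtol(tok, NULL, 10);
    }
    return n;
}
```

After each fgets(tempLine, MAXLINE, inputFile) inside the row loop, a call like parse_row(tempLine, aMatrix1[i], numColumn) would fill row i; comparing the return value with numColumn detects short rows.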
