Displaying a string one word per line using IN/OUT flag - c

I have written a program that first stores an arbitrary number of lines of text from the user. After that, it detects where each new word begins and prints each word on its own line.
Below is my code:
#include<stdio.h>
#define IN 1 //inside a word
#define OUT 0 //outside a word
int main()
{
char s[100]; //storing the string entered by user
int c; //getchar reading variable
int i=0; //iterating variable
while((c=getchar())!=EOF)
{
s[i++]=c;
}
s[i]='\0'; //end of string
i=0;
int p=0; //stores the start of the word
int current=OUT; //flag to indicate if program is inside or outside a word
while(s[i]!='\0')
{
if(current==OUT) //when program is outside a word
{
if(s[i]!=' ' || s[i]!='\n' || s[i]!='\t') //word found
{
p=i; //store starting position of word
current=IN;
}
}
else if(current==IN) //program is inside a word
{
if(s[i]==' ' || s[i]=='\n' || s[i]=='\t') //end of word found
{
current=OUT; //flag now outside the word
for(int j=p;j<i;j++) //print from starting position of word
{
printf("%c",s[j]);
}
printf("\n"); //go next line
}
}
++i; //increment the iterator variable
}
return 0;
}
My program works well if I just enter the string in a proper manner, i.e. without any extra spaces or new lines.
But if I enter text like the following (notice the extra spaces and newlines):
I am a boy
I went to Japan
then it prints those extra newlines and spaces along with the words, which I believe should not happen because of the IN and OUT flags.
The output is like this: [screenshot of the output, not included]
Could you please help me out?
I know I can do this easily with putchar(), checking one character at a time, but I am curious about what I am doing wrong in this implementation.

First bug that jumps out at me:
if(s[i]!=' ' || s[i]!='\n' || s[i]!='\t')
is always true, because every character differs from at least two of the three. You want &&, or else wrap the == condition you use in the other place in !(), for symmetry.
Or better yet, factor that out into a function, or use isspace from <ctype.h>.
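Putting these suggestions together, here is a minimal sketch that keeps the question's IN/OUT flag and start index p but replaces the always-true || test with isspace() from <ctype.h>. Two assumptions of mine: the function name words_one_per_line is hypothetical, and it writes the words into a caller-supplied buffer instead of printing them, so the result is easy to check. It also treats the terminating '\0' as whitespace, so a last word with no trailing space is still emitted.

```c
#include <ctype.h>
#include <stddef.h>

#define IN  1   /* inside a word */
#define OUT 0   /* outside a word */

/* Copies each whitespace-separated word of s into out, one word per
   line (each followed by '\n'). out needs at least strlen(s) + 2 bytes.
   Returns the number of words found. */
int words_one_per_line(const char *s, char *out)
{
    int current = OUT;  /* same IN/OUT flag as in the question */
    size_t p = 0;       /* start of the current word */
    size_t k = 0;       /* next free position in out */
    int count = 0;

    for (size_t i = 0; ; i++) {
        /* Treat the terminator as whitespace so the last word is flushed.
           isspace() needs a value representable as unsigned char. */
        int blank = (s[i] == '\0') || isspace((unsigned char)s[i]);

        if (current == OUT && !blank) {
            p = i;              /* a word starts here */
            current = IN;
        } else if (current == IN && blank) {
            for (size_t j = p; j < i; j++)  /* copy the finished word */
                out[k++] = s[j];
            out[k++] = '\n';
            count++;
            current = OUT;
        }
        if (s[i] == '\0')
            break;
    }
    out[k] = '\0';
    return count;
}
```

To print the words, pass the stored string and then call fputs(out, stdout); runs of spaces, tabs and newlines between words no longer produce empty lines.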

Your filtering condition for determining whether a character is white space is not correct. The || operator means OR, and since no character can equal ' ', '\n' and '\t' at the same time, chaining the != tests with OR makes the expression true for every character. You need the AND operator &&, which evaluates to false (0 in C) as soon as one operand is false.
Besides that, there are better ways to check for white space. One is the isspace function from <ctype.h>, which takes an int whose value must be representable as an unsigned char (or be EOF) and, in the "C" locale, returns nonzero for ' ', '\t', '\n', '\v', '\f' and '\r'. You can also check characters with a switch statement:
switch(ch) {
case ' ':
// do something
break;
case '\n':
//do something
break;
}
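Filling out that switch idea, a small helper can list all six characters isspace() recognises in the "C" locale and let the cases fall through to a single return. The name is_whitespace is hypothetical, not a standard library function:

```c
/* Returns 1 for the six characters that isspace() treats as white
   space in the "C" locale, 0 for anything else. */
int is_whitespace(int ch)
{
    switch (ch) {
    case ' ':
    case '\t':
    case '\n':
    case '\v':
    case '\f':
    case '\r':
        return 1;   /* every case above falls through to here */
    default:
        return 0;
    }
}
```

The word-found test in the question then becomes !is_whitespace(s[i]), the exact negation of the end-of-word test.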

Related

Scan and swap string values with regex in ANSI C

I want to transform a given input in my C program, for example:
foo_bar_something-like_this
into this:
thissomethingbarfoolike
Explanation:
Every time I get a _, the following text up to, but not including, the next _ or - (or the end of the line) needs to go to the beginning (and the preceding _ needs to be removed). Every time I get a -, the following text up to, but not including, the next _ or - (or the end of the line) needs to be appended to the end (with the - removed).
If possible, I would like to use regular expressions in order to achieve this. If there is a way to do this directly from stdin, it would be optimal.
Note that it is not necessary to do it in a single regular expression. I can do some kind of loop to do this. In this case I believe I would have to capture the data in a variable first and then do my algorithm.
I have to do this operation for every line in my input, each of which ends with \n.
EDIT: I had already written code for this without using anything related to regex; I should have posted it in the first place, my apologies. I know scanf can lead to buffer overflows, but the strings are already validated before being used in the program. The code is the following:
#include <stdio.h>
#include <stdlib.h>
#define MAX_LENGTH 100001 //A fixed maximum amount of characters per line
int main(){
char c=0;
/*
*home: 1 (append to the start), 0 (append to the end)
*str: array of words appended to the beginning
*strlen: length of str
*line: string of words appended to the end
*linelen: length of line
*word: word between a combination of symbols - and _
*wordlen: length of the actual word
*/
int home,strlen,linelen,wordlen;
char **str,*line,*word;
str=(char**)malloc(MAX_LENGTH*sizeof(char*));
while(c!=EOF && scanf("%c",&c)!=EOF){
line=(char*)malloc(MAX_LENGTH);
word=(char*)malloc(MAX_LENGTH);
line[0]=word[0]='\0';
home=strlen=linelen=wordlen=0;
while(c!='\n'){
if(c=='-'){ //put word in str and restart word to '\0'
home=1;
str[strlen++]=word;
word=(char*)malloc(MAX_LENGTH);
wordlen=0;
word[0]='\0';
}else if(c=='_'){ //put word in str and restart word to '\0'
home=0;
str[strlen++]=word;
word=(char*)malloc(MAX_LENGTH);
wordlen=0;
word[0]='\0';
}else if(home){ //append the c to word
word[wordlen++]=c;
word[wordlen]='\0';
}else{ //append c to line
line[linelen++]=c;
line[linelen]='\0';
}
scanf("%c",&c); //scan the next character
}
printf("%s",word); //print the last word
free(word);
while(strlen--){ //print each word stored in the array
printf("%s",str[strlen]);
free(str[strlen]);
}
printf("%s\n",line); //print the text appended to the end
free(line);
}
return 0;
}
I do not think regex can do what you are asking for, so I wrote a simple state machine solution in C.
//
//Description: This program takes a string of character input and parses it,
//using underscore and hyphen as cues to send data to either
//the beginning or the end of the output.
//
//Date: 11/18/2017
//
//Author: Elizabeth Harasymiw
//
#include <stdio.h>
#include <string.h>
#define MAX_SIZE 100
typedef enum{ AppendEnd, AppendBegin } State; //Tracks whether we are writing to the beginning or the end of the output
int main(int argc,char**argv){
int ch; //Holds the character currently being examined (an int, so it can also hold EOF)
State state=AppendEnd; //creates the State
char Buffer[MAX_SIZE]={0}; //Current Output
char Word[MAX_SIZE]={0}; //Pending data for the Buffer
char *c; //Used to index and clear Word
while((ch = getc(stdin)) != EOF){
if(ch=='\n')continue;
switch(state){
case AppendEnd:
if( ch == '-' )
break;
if( ch == '_'){
state = AppendBegin; //Change State
strcat(Buffer, Word); //Add Word to end of Output
for(c=Word;*c;c++)*c=0; //Clear Word
break;
}
{
int position = -1;
while(Word[++position]); //Find end of Word
Word[position] = ch; //Add Character to end of Word
}
break;
case AppendBegin:
if( ch == '-' ){
state = AppendEnd; //Change State
strcat(Word, Buffer); //Add Output to end of Word
strcpy(Buffer, Word); //Move Output from Word back to Output
for(c=Word;*c;c++)*c=0; //Clear Word
break;
}
if( ch == '_'){
strcat(Word, Buffer); //Add Output to end of Word
strcpy(Buffer, Word); //Move Output from Word back to Output
for(c=Word;*c;c++)*c=0; //Clear Word
break;
}
{
int position = -1;
while(Word[++position]); //Find end of Word
Word[position] = ch; //Add Character to end of Word
}
break;
}
}
switch(state){ //Finish adding the Last Word Buffer to Output
case AppendEnd:
strcat(Buffer, Word); //Add Word to end of Output
break;
case AppendBegin:
strcat(Word, Buffer); //Add Output to end of Word
strcpy(Buffer, Word); //Move Output from Word back to Output
break;
}
printf("%s\n", Buffer);
}
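For comparison, a non-regex sketch that applies the rules to one line at a time (the question says every line ends with \n). Instead of repeated strcat/strcpy, it grows the result outward from the middle of a work buffer: segments after '_' are copied in front of it, segments after '-' behind it. The function name rearrange and the MAX_LINE limit are assumptions for illustration; the first segment of a line simply starts the result, which matches the foo_bar_something-like_this example.

```c
#include <string.h>

#define MAX_LINE 256   /* assumed maximum line length for this sketch */

/* Applies the question's rules to one line (without its '\n'):
   text after '_' moves to the front, text after '-' to the end,
   and both separators are dropped. out needs strlen(in) + 1 bytes.
   Returns 0 on success, -1 if the line is too long. */
int rearrange(const char *in, char *out)
{
    /* work[head..tail) always holds the text built so far; it can grow
       in both directions from the middle of the buffer. */
    char work[2 * MAX_LINE];
    size_t head = MAX_LINE, tail = MAX_LINE;
    char mode = '-';   /* the first segment is placed as if appended */

    if (strlen(in) >= MAX_LINE)
        return -1;

    for (;;) {
        size_t n = strcspn(in, "_-");  /* length up to the next separator */
        if (mode == '_') {
            head -= n;                   /* prepend the segment */
            memcpy(work + head, in, n);
        } else {
            memcpy(work + tail, in, n);  /* append the segment */
            tail += n;
        }
        in += n;
        if (*in == '\0')
            break;
        mode = *in++;   /* '_' or '-' decides where the next segment goes */
    }
    memcpy(out, work + head, tail - head);
    out[tail - head] = '\0';
    return 0;
}
```

A line read with fgets can be passed in after stripping its newline (for example with line[strcspn(line, "\n")] = '\0'), and the result printed with puts(out).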
This can be done with regexes using loops, assuming you aren't strictly restricted to ANSI. The following uses PCRE.
(Note that this answer deliberately does not show the C code. It is only meant to guide the OP by showing a possible technique for using regexes, as it is not obvious how to do so.)
Method A
Uses two different regexes.
Part 1/2 (Demo)
Regex: ([^_\n]*)_([^_\n]*)(_.*)? Substitution: $2--$1$3
This moves the text following the next underscore to the beginning, appending -- to it. It also removes the underscore. You need to repeat this substitution in a loop until no more matches are found.
For your example, this leads to the following string:
this--something-like--bar--foo
Part 2/2 (Demo):
Regex: (.*)(?<!-)-(?!-)(\w+)(.*) Substitution: $1$3--$2
This moves the text following the next single hyphen to the end, prepending -- to it. It also removes the hyphen. You need to repeat this substitution in a loop until no more matches are found.
For your example, this leads to the following string:
this--something--bar--foo--like
Remove the hyphens from the string to get your result.
Note that the first regex can be simplified to the following and will still work:
([^_]*)_([^_]*)(_.*)?
The \ns were only required to show the intermediate loop results in the demos.
The following are the reasons for using -- as a new separator:
A separator is required so that the regex in part 2 can find the correct end of hyphen prefixed text;
An underscore can't be used as it would interfere with the regex in part 1, causing an infinite loop;
A hyphen can't be used as it would cause the regex in part 2 to find extraneous text;
Although any single character delimiter which can never exist in the input would work and lead to a simpler part 2 regexp, -- is one of the delimiters which allows any and every character* in the input.
\n is actually the perfect delimiter, but can't be used in this answer as it would not allow the demo to show the intermediate results. (Hint: it should be the actual delimiter used by you.)
Method B
Combines the two regexes.
(Demo)
Regex: ([^_\n]*)_([^_\n]*)(_.*)?|(.*)(?<!-)-(?!-)(\w+)(.*) Substitution: $2--$1$3$4$6--$5
For your example, this leads to the following string:
----this------something--bar--foo----like
As before, remove all the hyphens from the string to get your result.
Also as before, the regex can be simplified to the following and will still work:
([^_]*)_([^_]*)(_.*)?|(.*)(?<!-)-(?!-)(\w+)(.*)
This combined regex works because capturing groups 1, 2 & 3 are mutually exclusive with groups 4, 5 & 6. There is a side effect of extra hyphens, however.
Caveat:
* Using -- as a delimiter fails if the input contains consecutive hyphens. All the other "good" delimiters have a similar failure edge case. Only \n is guaranteed not to exist in the input and thus is failsafe.

Making a more efficient function - Word counter

I'm going through some C programming questions and I want to make sure I got the fundamentals down. Currently I'm on a word counter question:
Q: Write a function which will determine how many words are in a given string. You can assume that one or more consecutive white spaces is a delimiter between words, and that the string you pass to your function is null terminated.
I got it working, but efficiency is important, and I'm wondering how it can be improved. I have to use pointers and no library other than #include <stdio.h>. Thanks!
#include <stdio.h>
int word_counter(char string[])
{
//We start with the first word, unless we have an empty string; then we have no words
int count;
if(*string!='\0'){
count=1;
}
else{
count=0;
return 0;
}
//while we don't reach the end of the string
while(*string!='\0'){
//if we detect a whitespace
if(*string==' '){
//get previous character
string--;
// If previous character is not a space we increase the count
// Otherwise we don't, since we already counted a word
if(*string!=' '){
count++;
}
//return pointer to current character
string++;
}
// set pointer to next character
string++;
}
return count;
}
//just to test if it works
int main(void)
{
char str[] = "Hello World!";
printf("How many words? = %i\n", word_counter(str));
return 0;
}
Looking at your code, I see there's a special case for the initial condition of an empty string. Sometimes getting the initial condition out of the way early simplifies the rest of the algorithm, sometimes you can eliminate it by changing how you look at the problem. This time it's the second one.
If you think about this as counting the boundaries between words, the algorithm becomes simpler. There are two ways to define a word boundary: from the front and from the back.
" Prestidigitation \n"
  ^               ^
  front           back
Are we looking for a non-whitespace character after a whitespace character? Or are we looking for a whitespace character after a non-whitespace character?
You also have code that looks backwards in the string (string--). That is not safe: if the string starts with whitespace, you walk backwards off the start of the string and read memory that isn't part of it, so moving backwards should be avoided.
Then there's the question of whether or not there's any whitespace at the end of the string; counting back boundaries, we'd have to special-case the end of the string.
So looking for the front word boundary is the way to go: a non-whitespace character after a whitespace character. Instead of looking backwards, we'll track the state of the previous character (last_was_space below).
That's a non-whitespace character after a whitespace character. What if the string doesn't start with whitespace?
"Basset hounds got long ears."
 ^
What about this?
Since we have last_was_space, we can initialize it to true and pretend the string starts with whitespace. This also handles leading whitespace, like " this is four words".
Finally, there are more kinds of whitespace than just the space character, such as tab, newline and other exotic ones. Instead of writing if( *string == ' ' || *string == '\n' || ... ) we can use a switch to keep things tidy and efficient. This is one of those rare cases where you want to take advantage of its "fall through" mechanic to do the same thing for multiple cases.
#include <stdio.h>
// Note that it's `const` since we don't touch the string memory.
int word_counter(const char string[]) {
// Start with no words.
int count = 0;
// Pretend every word starts with space.
short last_was_space = 1;
// Using a for loop to make the movement of the pointer more apparent
for( ; *string!='\0'; string++ ) {
// A switch can be faster than an if/else if.
switch( *string ) {
// There's more than one type of whitespace.
// These are from isspace().
// It takes advantage of switch's fall through.
case ' ':
case '\t':
case '\n':
case '\r':
case '\v':
case '\f':
// Remember we saw a space.
last_was_space = 1;
break;
default:
if( last_was_space ) {
// Non-whitespace after space, count it
count++;
// Remember we didn't see a space.
last_was_space = 0;
}
break;
}
}
return count;
}
Normally I'd use bool from stdbool.h and isspace from ctype.h, but your exercise can only use stdio.h.
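If the stdio.h-only restriction did not apply, the same front-boundary logic with bool and isspace might look like the following sketch (outside the exercise's constraints; the cast to unsigned char keeps isspace's argument in its valid range):

```c
#include <ctype.h>
#include <stdbool.h>

int word_counter(const char *string)
{
    int count = 0;
    bool last_was_space = true;   /* pretend the string starts after whitespace */

    for (; *string != '\0'; string++) {
        if (isspace((unsigned char)*string)) {
            last_was_space = true;
        } else if (last_was_space) {
            count++;              /* front boundary: non-space after space */
            last_was_space = false;
        }
    }
    return count;
}
```

The behaviour should match the switch version; the only difference is that isspace decides what counts as whitespace instead of a hand-written case list.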

Why am I getting a debug assertion failed error on running the code

When I enter a password in my program below and press Enter, I get a debug assertion failure, specifically in isctype.c, line 56:
Expression:(unsigned)(c+1) <= 256
Could someone help me get around this please?
Code:
int main()
{
int j=0;
char pass[100];
int upper=0, lower=0, digit=0, sc=0;
printf("Enter your password:\n");
scanf("%s",&pass);
while(j!=' '){
if(isalpha(pass[j])){
if(isupper(pass[j])){
upper++;
}
else{
lower++;
}
}
else if(isdigit(pass[j])){
digit++;
}
else{
sc++;
}
j++;
}
if(upper==0||lower==0||digit==0||sc==0){
printf("Password must contain atleast one upper case, one lower case, one digit and a special character");
}
else{
printf("Good to go");
}
return 0;
_getch();
}
Replace
while (j!=' ')
by
while (pass[j] != 0)
You want to loop as long as pass[j] is different from zero. Remember, strings are terminated by a zero.
It looks like the problem in your code is
while(j!=' ')
which compares j with the space character ' ', whose ASCII value is 32 (decimal).
So you unconditionally access the elements of pass with indices 0 to 31.
pass is an automatic local variable and you did not initialize it, so it contains indeterminate values.
If your input is shorter than 31 characters, the remaining elements of pass stay uninitialized, and passing them to the is...() family of functions may lead to undefined behaviour. In particular, a negative char value (other than EOF) fails exactly the (unsigned)(c+1) <= 256 check from your assertion.
Solution: you don't need to check for a space (%s stops reading at whitespace, so pass never contains one). Instead, check for the null terminator '\0'. Change your code:
scanf("%s",&pass); to scanf("%99s",pass); to avoid a possible buffer overflow (passing pass instead of &pass also gives %s the char * it expects).
while(j!=' ') to while(pass[j]) to loop until the null terminator.
That said,
Calling _getch() after an unconditional return statement makes no sense, because it can never be reached. You can simply remove that _getch().
The recommended signature of main() is int main(void).
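Applying both answers' fixes, here is a minimal sketch of the checking loop as a separate function (the name password_ok is hypothetical). It stops at the null terminator and casts each char to unsigned char before calling the <ctype.h> functions:

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if pass has at least one upper-case letter, one lower-case
   letter, one digit and one other ("special") character, else 0. */
int password_ok(const char *pass)
{
    int upper = 0, lower = 0, digit = 0, sc = 0;

    /* Loop until the null terminator, not until j equals ' ' (32). */
    for (size_t j = 0; pass[j] != '\0'; j++) {
        /* The is...() functions need a value representable as
           unsigned char; a negative char triggers the debug assertion. */
        unsigned char ch = (unsigned char)pass[j];

        if (isalpha(ch)) {
            if (isupper(ch))
                upper++;
            else
                lower++;
        } else if (isdigit(ch)) {
            digit++;
        } else {
            sc++;
        }
    }
    return upper > 0 && lower > 0 && digit > 0 && sc > 0;
}
```

In main, read the password with scanf("%99s", pass) as suggested above and then test password_ok(pass) to choose which message to print.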

How to replace characters using 'getchar()' and if else?

My task is to check the user input and replace each period with exclamation mark, and each exclamation mark with 2 exclamation marks, then count the number of substitutions made.
This is my code:
int main(void)
{
int userInput, substitutionsNum = 0;
printf("please enter your input:\n");
while ((userInput = getchar()) != '#')
{
if (userInput == '.')
{
userInput = '!';
++substitutionsNum;
}
else if (userInput == '!')
{
userInput = '!!';
++substitutionsNum;
}
}
printf("%c, the number of substitutions are: %d", userInput, substitutionsNum);
return 0;
}
If I enter "nir." and then "#" to exit the program, the output is "#, the number of substitutions are: 1"
You never print the input back out except once at the end, so the "replacement" won't work.
Also, you can't represent a pair of exclamation points as '!!': that is a multi-character constant, an int with an implementation-defined value, not two characters. No I/O function will do what you expect with it if you try to print it, for instance.
!! is two characters, but you treat it as a single character.
And you overwrite the input in the same variable, userInput.
You could use an extra char buffer and adjust its index as needed, for example advancing it by two when you store "!!".
You are doing it wrong. You need to store the accumulated, changed input in a character array (e.g. char buffer[1024]) and place the substitutions there. With your algorithm, the only thing you will print is the last value of the userInput variable.
Since this is probably homework, I suggest reading more about string manipulation in C.
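A minimal sketch of the buffer approach described above. To keep it testable it takes a string rather than reading with getchar(), and it stops at '#' like the original loop; the function name substitute is hypothetical. Because each '!' becomes two characters, the output index advances by two for it, so the output buffer needs room for 2 * strlen(in) + 1 bytes in the worst case:

```c
#include <stddef.h>

/* Copies in to out, replacing '.' with '!' and '!' with "!!",
   stopping at '#' or at the end of the string.
   Returns the number of substitutions made. */
int substitute(const char *in, char *out)
{
    int subs = 0;
    size_t k = 0;   /* next free index in out */

    for (; *in != '\0' && *in != '#'; in++) {
        if (*in == '.') {
            out[k++] = '!';
            subs++;
        } else if (*in == '!') {
            out[k++] = '!';   /* "!!" takes two slots */
            out[k++] = '!';
            subs++;
        } else {
            out[k++] = *in;   /* everything else is copied unchanged */
        }
    }
    out[k] = '\0';
    return subs;
}
```

To keep the getchar() loop instead, print each character as it is read (calling putchar('!') twice for an exclamation mark) rather than storing it; the substitution count works the same way.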

Counting words in a string?

Hello, for this program I am supposed to count the number of words in a string. So far I have figured out how to find the number of characters in a string, but I can't figure out how to treat the letters that make up a word as one word and count it once.
My function is:
int wordcount( char word[MAX] ){
int i, num, counter, j;
num = strlen( word );
counter = 0;
for (i = 0; i < num; i++)
{
if (word[i] != ' ' || word[i] != '\t' || word[i] != '\v' || word[i] != '\f')
{
}
}
return counter;
}
I tried some variations, but the middle part of the if statement is where I am confused. How can I count the number of words in a string? The tests also check strings with multiple spaces between words, like "Hello this is a string".
Hints only since this is probably homework.
What you're looking to count is the number of transitions between 'word' characters and whitespace. That will require remembering the last character and comparing it to the current one.
If one is whitespace and the other is not, you have a transition.
With more detail, initialise the lastchar to whitespace, then loop over every character in your input. Where the lastchar was whitespace and the current character is not, increase the word count.
Don't forget to copy the current character to lastchar at the end of each loop iteration. And it should hopefully go without saying that the word count should be initialised to 0.
There is a Unix utility, wc, that counts words.
Have a look (it includes some explanation and a sample):
http://en.literateprograms.org/Word_count_(C)
and a link to the source:
http://en.literateprograms.org/index.php?title=Special:DownloadCode/Word_count_(C)&oldid=15634
When you're in the if branch (with its condition fixed to use &&), you're inside a word. So you can keep an inword flag and watch for the change from out-of-word (your else branch) to in-word and back.
This is a quick suggestion; there could be better ways, but I like this one.
First, be sure to "know" what a word is made of. Let us suppose it's made of letters only. All the rest, being punctuation or "blanks", can be considered as a separator.
Then, your "system" has two states: 1) completing a word, 2) skipping separator(s).
You begin by running the skip-separator(s) code. Then you enter the "completing a word" state, which you keep until the next separator or the end of the whole string (in which case you exit). When that happens, you have completed a word, so you increment your word counter by 1 and go back to the "skipping separators" state. And the loop continues.
Pseudo C-like code:
char *str;
/* someone will assign str correctly */
word_count = 0;
state = SKIPPING;
for( ; *str != '\0'; str++)
{
c = *str; /* read the current character on every iteration */
if (state == SKIPPING && can_be_part_of_a_word(c)) {
state = CONSUMING;
/* if you need to accumulate the letters,
here you have to push c somewhere */
}
else if (state == SKIPPING) continue; // unneeded - just to show the logic
else if (state == CONSUMING && can_be_part_of_a_word(c)) {
/* continue accumulating pushing c somewhere
or, if you don't need, ... else if kept as placeholder */
}
else if (state == CONSUMING) {
/* separator found while consuming a word:
the word ended. If you accumulated chars, you can ship
them out as "the word" */
word_count++;
state = SKIPPING;
}
}
// if the state on exit is CONSUMING you need to increment word_count:
// you can rearrange things to avoid this when the loop ends,
// if you don't like it
if (state == CONSUMING) { word_count++; /* plus ship out last word */ }
The function can_be_part_of_a_word returns true if the character read is, for example, in [A-Za-z_], and false otherwise.
(It should work, unless tiredness has led me into some gross error.)
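Turning the pseudo code above into compilable C, as a sketch: the enum, the function name count_words, and this definition of can_be_part_of_a_word (letters via isalpha plus '_', matching the [A-Za-z_] example in the "C" locale) are my assumptions. The branches that did nothing in the pseudo code are folded away:

```c
#include <ctype.h>

enum scan_state { SKIPPING, CONSUMING };

/* Assumed definition: letters and '_' can be part of a word. */
static int can_be_part_of_a_word(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

int count_words(const char *str)
{
    int word_count = 0;
    enum scan_state state = SKIPPING;

    for (; *str != '\0'; str++) {
        char c = *str;   /* read the current character each iteration */

        if (state == SKIPPING && can_be_part_of_a_word(c)) {
            state = CONSUMING;   /* a word starts */
        } else if (state == CONSUMING && !can_be_part_of_a_word(c)) {
            word_count++;        /* a separator ends the word */
            state = SKIPPING;
        }
    }
    if (state == CONSUMING)
        word_count++;            /* the string ended inside a word */
    return word_count;
}
```

With this definition "don't stop" counts as three words, because the apostrophe is a separator; widen can_be_part_of_a_word if that is not what you want.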

Resources