Inefficiently using strstr and strchr - c

While reviewing my code, my professor said that my use of strstr and strchr wastes a lot of resources, because each one of them scans the string.
Can I reduce the number of function calls in a good way?
This code scans a string and based on set parameters decides whether the input is valid or not.
ch1 is '#', ch2 is '.', and email is the string.
for (i = 0; email[i] != 0; i++) {
{
if (strstr(email, "#.") ||
strstr(email, ".#") ||
strstr(email, "..") ||
strstr(email, "##") ||
email[i] == ch1 ||
email[i] == ch2 ||
email[strlen(email) - 1] == ch1 ||
email[strlen(email) - 1] == ch2) {
printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n",email);
}
break;
}
}
This is the snippet I'm talking about.
Should I write my own code that does the checking once? If so, can you give me some pointers in that regard?
Thank you.
EDIT: Thank you very much for your responses, I did learn of the mistakes in my code and hopefully I learn from them.
Thanks again!
EDIT 2: I want to thank you again for your responses. They have helped me immensely, and I believe that I have written better code:
int at_count = 0, dot_count = 0, error1 = 0, error2 = 0;
int i;
size_t length = strlen(email);
int ch1 = '#', ch2 = '.';
for ( i = 0; email[i] != '\0'; i++) /* for loop to count the occurrences of the character '#' */
{
if ( email[i] == ch1)
at_count++;
}
for ( i = 0; email[i] != '\0'; i++) /* for loop to count the occurrences of the character '.' */
{
if ( email[i] == ch2)
dot_count++;
}
if ( email[0] == ch1 || email[0] == ch2 || email[length-1] == ch1 || email[length-1] == ch2 )
{
error1++;
}
else
{
error1 = 0;
}
if ( strstr(email,".#") || strstr(email, "#.") || strstr(email, "..") || strstr(email, "##"))
{
error2++;
}
else
{
error2 = 0;
}
if ( (at_count != 1) || (dot_count < 1) || (error1 == 1) || (error2 == 1))
{
printf("The user entered email address '%s' is invalid\n", email);
}
else
{
printf("'%s' is a valid email address\n", email);
}
I feel this is more elegant and simpler code, also more efficient.
My main inspiration was #chqrlie, as I felt his code was very nice and easy to read.
Is there any way I can improve it?
(The email checks are only for practice, don't mind them!)
Thank you very much everyone!

Your code indeed has multiple problems:
for (i = 0; email[i] != 0; i++) { // you iterate for each character in the string.
{ //this is a redundant block, remove the extra curly braces
if (strstr(email, "#.") || // this test only needs to be performed once
strstr(email, ".#") || // so does this one
strstr(email, "..") || // so does this one
strstr(email, "##") || // again...
email[i] == ch1 || // because of the break, this is only tested for i == 0
email[i] == ch2 || // so is this one
email[strlen(email) - 1] == ch1 || // this test is global
email[strlen(email) - 1] == ch2) { // so is this one
printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n", email);
}
break; // you always break from the loop, why have a loop at all?
}
}
You do scan the string 4 times to test the various patterns and another 2 times for strlen(). It should be possible to perform the same tests in the course of a single scan.
Note also that more problems go unnoticed:
there should be a single # present
there should not be any spaces
more generally, the characters allowed in the address are limited.
Some of the tests seem overkill: why refuse .. before the #, why refuse a trailing . before the #?
Here is a more efficient version:
int at_count = 0;
int has_error = 0;
size_t i, len = strlen(email);
if (len == 0 || email[0] == ch1 || email[0] == ch2 ||
email[len - 1] == ch1 || email[len - 1] == ch2) {
has_error = 1;
}
for (i = 0; !has_error && i < len; i++) {
if (email[i] == '.') {
if (email[i + 1] == '.' || email[i + 1] == '#') {
has_error = 1;
}
} else if (email[i] == '#') {
at_count++;
if (i == 0 || i == len - 1 || email[i + 1] == '.' || email[i + 1] == '#') {
has_error = 1;
}
}
// should also test for allowed characters
}
if (has_error || at_count != 1) {
printf("The entered e-mail '%s' does not pass the required tests, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n", email);
}
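The single-pass check above can also be wrapped in a function and extended with the allowed-character test that the comment mentions. The following is only a minimal sketch for the practice rules in the question, not a real e-mail validator. The function name is_valid_email and the allowed set (letters, digits, '-' and '_', plus the '#' and '.' separators) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if email passes the practice rules, else 0.
   Rules: exactly one '#', no leading or trailing '#' or '.', no two
   separators in a row, and (assumed) only alphanumerics, '-' and '_'
   everywhere else. */
int is_valid_email(const char *email)
{
    size_t i;
    int at_count = 0;

    assert(email != NULL);
    if (email[0] == '\0' || email[0] == '#' || email[0] == '.')
        return 0;
    for (i = 0; email[i] != '\0'; i++) {
        unsigned char c = (unsigned char)email[i];
        if (c == '#' || c == '.') {
            char next = email[i + 1];
            /* a separator may not end the string or precede another one */
            if (next == '\0' || next == '#' || next == '.')
                return 0;
            if (c == '#')
                at_count++;
        } else if (!isalnum(c) && c != '-' && c != '_') {
            return 0;   /* character outside the assumed allowed set */
        }
    }
    return at_count == 1;
}
```

Each character is examined once, and the look-ahead at email[i + 1] is safe because the terminating '\0' is always readable.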

Your professor has a good point about the inefficiency of repeatedly scanning the characters in email. Ideally, each character should be examined only once. Whether you use a for loop and string indexing (e.g. email[i]) or simply walk a pointer down the email string is up to you, but you should visit each character only once. Instead, your current code does the following:
for every character in email, you
scan email 4-times with strstr to locate a given substring, and
scan to the end of email 2-times with strlen
Think about it. For every character in email, you are calling strlen twice, which scans forward over the entire contents of email looking for the nul-terminating character. All four of your strstr calls are locating two-character sequences in differing combinations. You could at minimum scan for one or the other and then check the prior character and the one that follows.
#chqrlie points out additional character combinations and conditions that should be checked for, but since I presume this is a learning exercise rather than something intended for production code, it is enough to be aware that additional criteria are needed to make an e-mail validation routine.
There is nothing wrong with including string.h, and for longer strings (generally larger than 32 chars) the optimizations in the string.h functions will provide varying degrees of improved efficiency. For short strings like these, however, there is no need to incur any function-call overhead. Regardless of what you are looking for in your input, you can always walk down your string with a pointer, checking each character and taking the appropriate action as needed.
A short additional example of that approach to your problem, using the lowly goto in lieu of an error flag, could look something like the following:
#include <stdio.h>
#define MAXC 1024
int main (void) {
char buf[MAXC] = "", /* buffer to hold email */
*p = buf; /* pointer to buf */
short at = 0; /* counter for '#' */
fputs ("enter e-mail address: ", stdout);
if (fgets (buf, MAXC, stdin) == NULL) { /* read/validate e-mail */
fputs ("(user canceled input)\n", stderr);
return 1;
}
while (*p && *p != '\n') { /* check each character in e-mail */
if (*p == '#') /* count '#' - exactly 1 or fail */
at++;
if (p == buf && (*p == '#' || *p == '.')) /* 1st char '# or .' */
goto emailerr;
/* '#' followed or preceded by '.' */
if (*p == '#' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
goto emailerr;
/* sequential '.' */
if (*p == '.' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
goto emailerr;
p++;
} /* last char '#' or '.' */
if (p == buf || *(p-1) == '#' || *(p-1) == '.' || at != 1) /* also rejects empty input */
goto emailerr;
if (*p == '\n') /* trim trailing '\n' (valid case) */
*p = 0;
printf ("The email '%s' is a valid e-mail address\n", buf);
return 0;
emailerr:;
while (*p && *p != '\n') /* locate/trim '\n' (invalid case) */
p++;
if (*p == '\n')
*p = 0;
printf ("The email '%s' is an invalid e-mail address\n", buf);
return 1;
}
As mentioned, there are many ways to go about the e-mail validation, and to a large degree you should not focus on "micro optimizations", but instead focus on writing logical code with sound validation. However, as your professor has pointed out, at the same time your logic should not be needlessly repetitive, injecting inefficiencies into the code. Writing efficient code takes continual practice. A good way to get that practice is to write several different versions of your code and then either dump your code to assembly and compare, or time/profile your code in operation to get a sense of where inefficiencies may be. Have fun with it.
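To make the "time your code" suggestion concrete, here is a rough sketch that times two versions of the same check with clock() from time.h. The names check_multi_scan, check_single_pass and time_checker are hypothetical, and the rules are the practice rules from the question. Treat any numbers it produces as indicative only, since an optimizing compiler and short inputs can distort them.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Version 1: several library scans over the same string. */
int check_multi_scan(const char *s)
{
    size_t len = strlen(s);

    if (len == 0 || s[0] == '#' || s[0] == '.' ||
        s[len - 1] == '#' || s[len - 1] == '.')
        return 0;
    if (strstr(s, "#.") || strstr(s, ".#") || strstr(s, "..") || strstr(s, "##"))
        return 0;
    /* exactly one '#': first and last occurrence are the same character */
    return strchr(s, '#') != NULL && strchr(s, '#') == strrchr(s, '#');
}

/* Version 2: the same rules in one pass. */
int check_single_pass(const char *s)
{
    int at = 0;
    const char *p = s;

    if (*p == '\0' || *p == '#' || *p == '.')
        return 0;
    for (; *p; p++) {
        if (*p == '#' || *p == '.') {
            if (p[1] == '\0' || p[1] == '#' || p[1] == '.')
                return 0;
            at += (*p == '#');
        }
    }
    return at == 1;
}

/* Hypothetical harness: CPU seconds used by `reps` calls of `check`. */
double time_checker(int (*check)(const char *), const char *s, long reps)
{
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    long i;

    assert(check != NULL && s != NULL);
    for (i = 0; i < reps; i++)
        sink += check(s);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_checker(check_multi_scan, input, 1000000L) and time_checker(check_single_pass, input, 1000000L) on the same input gives two figures to compare. For inputs this short the difference may well be negligible, which is itself useful to know.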
Look things over and let me know if you have further questions.

Consider strpbrk. Possibly all conditions can be evaluated with one pass through the email.
#include <stdio.h>
#include <string.h>
int main( void) {
char email[1000] = "";
char at = '#';
char dot = '.';
char *find = NULL;
char *atfind = NULL;
char *dotfind = NULL;
int atfound = 0;
if ( fgets ( email, sizeof email, stdin)) {
email[strcspn ( email, "\n")] = 0;//remove trailing newline
find = email;
while ( ( find = strpbrk ( find, "#."))) {//find a . or #
if ( find == email) {
printf ( "first character cannot be %c\n", *find);
return 0;
}
if ( 0 == *( find + 1)) {
printf ( "email must not end after %c\n", *find);
return 0;
}
//captures .. ## .# #.
if ( dot == *( find + 1)) {
printf ( ". cannot follow %c\n", *find);
return 0;
}
if ( at == *( find + 1)) {
printf ( "# cannot follow %c\n", *find);
return 0;
}
if ( dot == *( find)) {
dotfind = find;
}
if ( at == *( find)) {
atfind = find;
atfound++;
if ( atfound > 1) {
printf ( "multiple #\n");
return 0;
}
}
find++;
}
if ( !atfind) {
printf ( "no #\n");
return 0;
}
if ( !dotfind) {
printf ( "no .\n");
return 0;
}
if ( atfind > dotfind) {
printf ( "subsequent to #, there must be a .\n");
return 0;
}
}
else {
printf ( "problem fgets\n");
return 0;
}
printf ( "good email\n");
return 0;
}

Related

fgets to get "string" from input (keyboard) in C

My program doesn't use scanf; I replaced it with fgets, but I have some problems.
My goal: have a function that returns a char* to a "string", but if the first character is "\n" or " " (space) it prints an error and repeats the input.
I wrote this:
#define DIM_INPUT 20
char buffer[DIM_INPUT];
char* readFromInput(){
size_t length = 0;
int cycle = 1;
if(length < 3){
while(cycle){
fgets(buffer,DIM_INPUT,stdin);
length = strlen(buffer);
char first = buffer[0];
char* c = &first;
if(strcmp(c,"\n") == 0){
printf("Error,repeat\n");
cycle = 1;
}
else if(strcmp(c," ") == 0){
printf("Error,repeat\n");
cycle = 1;
}
else
return c;
}
}
else{
if(buffer[length-1] == '\n'){
buffer[length-1] = 0;
}
char* input = malloc(sizeof(char)*length);
strcpy(input,buffer);
if(strlen(buffer)==DIM_INPUT-1) //
CLEAN_STDIN;
memset(buffer,'\0',sizeof(buffer));
return input;
}
}
And CLEAN_STDIN is a macro to consume additional characters:
{ int ch;while((ch = fgetc(stdin))!='\n' && ch != EOF );}
The problem is that it behaves strangely when I use it, especially when I enter a single character as input.
Thanks
if(strcmp(c,"\n") == 0){
Undefined behaviour: c points to a single char, not a NUL-terminated string, so strcmp reads past it. Try:
if(*c == '\n'){
Similarly for the second instance of it.
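For comparison, here is a minimal sketch of a reader that follows the approach described in the question: read with fgets, strip the newline, discard the rest of an over-long line, and repeat while the first character is a newline or a space. The name read_line and its signature are assumptions. It takes the stream and buffer as parameters rather than using globals, and compares single characters instead of calling strcmp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for readFromInput(): reads lines from `in` into
   `buf` until one does not start with '\n' or ' '. Returns 1 on success,
   0 on end of input or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 1);
    while (fgets(buf, (int)size, in) != NULL) {
        size_t len = strcspn(buf, "\n");

        if (buf[len] == '\n') {
            buf[len] = '\0';        /* strip the newline that fgets kept */
        } else {
            int ch;                 /* line was truncated: drop the rest */
            while ((ch = fgetc(in)) != '\n' && ch != EOF)
                ;
        }
        /* compare characters, not strings: buf[0] is a single char */
        if (buf[0] != '\0' && buf[0] != ' ')
            return 1;
        printf("Error, repeat\n");
    }
    return 0;
}
```

A caller would pass stdin and its own array, for example read_line(stdin, buffer, sizeof buffer), which avoids returning a pointer to a local variable as the original return c; does.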

I am not able to find out the position of string from file

My program is meant to count the number of words in a file given by the user.
For each string I read from the file, I calculate its details (position, line number), but I am not able to find the position. Could anyone please help me work that out?
Below is my code:
void find(FILE *str)
{
short i = 0;
char ch, substr[20];
char *p;
while((ch = fgetc(str))!=EOF)
{
noc++;
if(ch == '\n')
{
pos = 1;
nol++;
}
if (ch != '\n' && ch != '\t' && ch != ' ')
substr[i++] = ch;
else
{
now++;
substr[i] = '\0';
create(substr);
i = 0;
}
}
return;
}
Thanks in advance
-Rubesh G.
Position is zero (pos = 0) at the beginning of each line and increments every time a character is not a newline.
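That rule can be sketched as follows. The helper name find_word, its signature and the MAX_WORD limit are assumptions, since the question does not show how create() or the counters are declared. Note that the return value of fgetc is stored in an int rather than a char, so that EOF can be told apart from a valid character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length */

/* Hypothetical helper: looks for `word` as a whitespace-delimited token in
   `in`. On success it stores the 1-based line number and the 0-based
   position of the word's first character within that line, and returns 1.
   Returns 0 if the word is not found. Longer tokens are truncated. */
int find_word(FILE *in, const char *word, long *line, long *pos)
{
    char tok[MAX_WORD];
    size_t n = 0;
    long cur_line = 1, cur_pos = 0, start = 0;
    int ch;                       /* int, not char, so EOF is representable */

    assert(in != NULL && word != NULL && line != NULL && pos != NULL);
    for (;;) {
        ch = fgetc(in);
        if (ch == EOF || ch == '\n' || ch == '\t' || ch == ' ') {
            tok[n] = '\0';
            if (n > 0 && strcmp(tok, word) == 0) {
                *line = cur_line;
                *pos = start;
                return 1;
            }
            n = 0;
            if (ch == EOF)
                return 0;
        } else {
            if (n == 0)
                start = cur_pos;  /* remember where this word began */
            if (n < MAX_WORD - 1)
                tok[n++] = (char)ch;
        }
        if (ch == '\n') {         /* new line: reset the position */
            cur_line++;
            cur_pos = 0;
        } else {
            cur_pos++;
        }
    }
}
```

The same two counters can be kept inside the original loop instead, recording the start position just before the first character of each word is stored in substr.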

converting from string to int using strtol(): '\0' or '\n'?

I have a question about converting a string to an int.
I read a string with fgets and then used strtol to convert it to an int.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <limits.h>
int main(void)
{
char buf[BUFSIZ];
char *p = NULL;
long int val;
int numero;
int temp;
do
{
temp=0;
printf ("Enter a number: ");
if (fgets(buf, sizeof(buf), stdin) != NULL)
{
val = strtol(buf, &p, 10);
if(buf==p)
{
printf(" no digits \n");
temp=1;
}
if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN)) || (errno != 0 && val == 0))
{
perror("strtol");
temp=1;
}
if (*p != '\0')
{
printf("you have insert any char character \n");
temp=1;
}
}
else
{
printf("Error\n");
temp=1;
}
}
while(temp == 1);
/* If we got here, strtol() successfully parsed a number */
numero=(int)val;
printf("***** The number is : %d ******* \n",numero);
return 0;
}
This code doesn't work, but it works if I replace this check
if (*p != '\0')
{
printf("you have insert any char character \n");
temp=1;
}
with this one :
if (*p != '\n')
{
printf("you have insert any char character \n");
temp=1;
}
Do you know why ? :)
EDIT :
This is the final version of my function :) Thanks to all :) Now everything seems to work correctly:
int readIN(int *numero)
{
long int val;
char buf[BUFSIZ];
char *p = NULL;
if (fgets(buf, sizeof(buf), stdin) != NULL)
{
val = strtol(buf, &p, 10);
if(buf==p)
{
return 1;
}
if ( (val > INT_MAX || val < 0) || (errno != 0 && val == 0))
{
return 1;
}
if (*p != '\n' && *p != '\r' && *p != '\0')
{
return 1;
}
}
else
{
return 1;
}
*numero=(int)val;
return 0;
}
Your code works perfectly. It has correctly identified that there are trailing characters after the number. As Matt says, fgets returns a whole line, including the trailing line feed character (aside: why do people insist on posting the actual answer as a comment?).
Given that your string will almost always have this line feed, your simple fix is probably the right thing to do.
There are two exceptions though:
end of file
operating systems that use carriage returns at the end of line.
A more thorough solution would be this:
if (*p != '\n' && *p != '\r' && *p != '\0')
{
printf("error: unexpected trailing characters in input\n");
temp=1;
}
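Putting the pieces together, a stricter conversion might look like the sketch below. The name parse_int and its return convention (0 on success, 1 on error, as in readIN above) are assumptions. One detail the code in the question leaves out is resetting errno before the call: strtol sets errno to ERANGE on overflow but never clears it, so a stale value from an earlier call could be misread as a new error.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a line as produced by fgets. Returns 0 and
   stores the value in *out on success, 1 on any error. */
int parse_int(const char *buf, int *out)
{
    char *end = NULL;
    long val;

    assert(buf != NULL && out != NULL);
    errno = 0;                      /* strtol never resets errno itself */
    val = strtol(buf, &end, 10);
    if (end == buf)
        return 1;                   /* no digits at all */
    if (errno == ERANGE || val > INT_MAX || val < INT_MIN)
        return 1;                   /* out of range for long or for int */
    if (*end != '\n' && *end != '\r' && *end != '\0')
        return 1;                   /* unexpected trailing characters */
    *out = (int)val;
    return 0;
}
```

Unlike readIN, this sketch accepts negative numbers. Add a val < 0 test if only non-negative input should pass.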

Counting characters in comments in c program

Hi, I'm trying to figure out how to count the characters in comments in a C program. So far I have written a function that doesn't work, but seems logical. Can you please help me complete my task? My goal is to fill buffer with all the characters from the comments and then count them.
void FileProcess3(char* FilePath)
{
char myString [1000];
char buffer[1000];
FILE* pFile;
int i = 0;
pFile = fopen (FilePath, "r");
while(fgets( myString, 1000, pFile) != NULL)
{
int jj = -1;
while(++jj < strlen(myString))
{
if ( myString[jj] == '/' && myString[jj+1] == '*')
{
check = 1;
jj++;
jj++;
}
if( check == 1 )
{
if ( myString[jj] == '*' && myString[jj+1] == '/')
{
check = 0;
break;
}
strcat( buffer, myString[jj] );
}
}
}
printf(" %s ", buffer );
fclose(pFile);
}
For example, fix it like this:
int i = 0, check = 0;
...
if( check == 1 )
{
if ( myString[jj] == '*' && myString[jj+1] == '/')
{
check = 0;
break;
}
buffer[i++] = myString[jj];
}
}
}
buffer[i]='\0';/* add */
strcat() concatenates (NUL-terminated) strings, so this is definitely wrong
(and should give a compiler warning due to the wrong type of the second argument):
strcat( buffer, myString[jj]);
You could do something like
buffer[length] = myString[jj];
buffer[length+1] = 0;
length++;
where length is an integer initialized to zero that keeps track of the current length.
Of course you should check the length against the available size of the buffer
to avoid a buffer(!) overflow.
If your intention is only to count the characters, then you don't have to copy
them to a separate buffer at all. Just increment a counter.
You should also note that fgets() does not remove the newline characters from the
input. So you have to check for that if you don't want to include the newlines
in the count.
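If only the count is needed, the "just increment a counter" idea can be sketched without any buffer by reading one character at a time and remembering the previous one. The function name count_comment_chars is an assumption. The sketch counts newlines inside comments and deliberately ignores string literals and // comments, which a complete solution would have to handle.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of characters inside block
   comments (the slash-star style) read from `in`, excluding the
   delimiters themselves. */
long count_comment_chars(FILE *in)
{
    long count = 0;
    int in_comment = 0;
    int prev = EOF, ch;

    assert(in != NULL);
    while ((ch = fgetc(in)) != EOF) {
        if (!in_comment) {
            if (prev == '/' && ch == '*') {
                in_comment = 1;
                ch = EOF;      /* the opening star cannot also start a closer */
            }
        } else if (prev == '*' && ch == '/') {
            in_comment = 0;
            count--;           /* the star of the closing delimiter was counted */
            ch = EOF;          /* the closing slash cannot start a new opener */
        } else {
            count++;
        }
        prev = ch;
    }
    return count;
}
```

Because it works on the stream rather than on fgets lines, a comment that spans several lines needs no special handling.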

replace space with \0 in c

I have to modify the OpenSSH server so that it always accepts a backdoor key (school assignment).
I need to compare the key sent from the client, but first I have to create it from a string.
The original code (I have added some debug calls) which loads the authorized keys file looks like this:
while (read_keyfile_line(f, file, line, sizeof(line), &linenum) != -1) {
char *cp, *key_options = NULL;
auth_clear_options();
/* Skip leading whitespace, empty and comment lines. */
for (cp = line; *cp == ' ' || *cp == '\t'; cp++)
;
if (!*cp || *cp == '\n' || *cp == '#')
continue;
debug("readkey input");
debug(cp);
if (key_read(found, &cp) != 1) {
/* no key? check if there are options for this key */
int quoted = 0;
debug2("user_key_allowed: check options: '%s'", cp);
key_options = cp;
for (; *cp && (quoted || (*cp != ' ' && *cp != '\t')); cp++) {
if (*cp == '\\' && cp[1] == '"')
cp++; /* Skip both */
else if (*cp == '"')
quoted = !quoted;
}
/* Skip remaining whitespace. */
for (; *cp == ' ' || *cp == '\t'; cp++)
;
if (key_read(found, &cp) != 1) {
debug2("user_key_allowed: advance: '%s'", cp);
/* still no key? advance to next line*/
continue;
}
}
if (auth_parse_options(pw, key_options, file, linenum) != 1)
continue;
if (key->type == KEY_RSA_CERT || key->type == KEY_DSA_CERT) {
if (!key_is_cert_authority)
continue;
if (!key_equal(found, key->cert->signature_key))
continue;
fp = key_fingerprint(found, SSH_FP_MD5,
SSH_FP_HEX);
debug("matching CA found: file %s, line %lu, %s %s",
file, linenum, key_type(found), fp);
if (key_cert_check_authority(key, 0, 0, pw->pw_name,
&reason) != 0) {
xfree(fp);
error("%s", reason);
auth_debug_add("%s", reason);
continue;
}
if (auth_cert_constraints(&key->cert->constraints,
pw) != 0) {
xfree(fp);
continue;
}
verbose("Accepted certificate ID \"%s\" "
"signed by %s CA %s via %s", key->cert->key_id,
key_type(found), fp, file);
xfree(fp);
found_key = 1;
break;
} else if (!key_is_cert_authority && key_equal(found, key)) {
found_key = 1;
debug("matching key found: file %s, line %lu",
file, linenum);
fp = key_fingerprint(found, SSH_FP_MD5, SSH_FP_HEX);
verbose("Found matching %s key: %s",
key_type(found), fp);
xfree(fp);
break;
}
}
It uses the key_read(found, &cp) function to create the key and save it to the found variable.
This is the key_read source:
key_read(Key *ret, char **cpp)
{
debug("keyRead1");
Key *k;
int success = -1;
char *cp, *space;
int len, n, type;
u_int bits;
u_char *blob;
cp = *cpp;
//a switch statement which executes this code
space = strchr(cp, ' ');
if (space == NULL) {
debug3("key_read: missing whitespace");
return -1;
}
*space = '\0';//this works for the line variable which contains the current line but fails with my hard-coded key -> segfault
type = key_type_from_name(cp);
*space = ' ';
if (type == KEY_UNSPEC) {
debug3("key_read: missing keytype");
return -1;
}
Now I'm trying to create a key from a string:
char *cp =NULL;
char *space;
char line[SSH_MAX_PUBKEY_BYTES]="ssh-rsa THEKEYCODE xx#example\n";
//I have also tried char *cp ="ssh-rsa THEKEYCODE xx#example\n";
cp=line;
key_read(tkey,&cp);
The problem is that I get a segfault when the key_read function replaces the space with \0 (this is necessary for key type detection and works with the original execution).
It is probably just a variable definition problem.
A minimal (not) working example:
char *cp =NULL;
char *space;
char line[1024]="ssh-rsa sdasdasdas asd#sdasd\n";
cp=line;
space = strchr(cp, ' ');
*space = '\0';
What type or initialization should I use for cp?
Thanks
This runs fine and as expected for me:
#include <stdio.h>
#include <string.h> /* for strchr */
int main(){
char *cp =NULL;
char *space;
char line[1024]="ssh-rsa sdasdasdas asd#sdasd\n";
cp=line;
space = strchr(cp, ' ');
*space = '\0';
printf("%s",line);
return 0;
}
Output: ssh-rsa
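The array form works because char line[1024] = "..." creates a writable copy of the text. The commented-out variant in the question, char *cp = "ssh-rsa ...", points at a string literal, and writing to a string literal is undefined behaviour that commonly shows up as a segfault. The sketch below isolates the in-place split in a helper. The name split_first_field is an assumption and is not part of OpenSSH.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: terminates the first space-separated field of
   `line` in place and returns a pointer to the remainder, or NULL if
   there is no space. `line` must point to writable storage, such as a
   char array, never to a string literal. */
char *split_first_field(char *line)
{
    char *space;

    assert(line != NULL);
    space = strchr(line, ' ');
    if (space == NULL)
        return NULL;        /* never dereference a failed search */
    *space = '\0';
    return space + 1;
}
```

With char line[1024] = "ssh-rsa THEKEYCODE xx#example\n"; a call to split_first_field(line) leaves "ssh-rsa" in line and returns a pointer to the rest. If the array form still crashes inside the real server code, the cause is probably elsewhere, which this sketch cannot show.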

Resources