Implementing strnstr - c

I am trying to implement a strnstr function in C (like strstr, but with a length limit). For some reason it doesn't work (the output is always "no"):
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen, strcpy, strncpy, strstr */
char *searchingFor = "stackdummy";
char *in = "la da\ndoo a da\nnow here comes the stack\nok there it was.\n";
char *strnstr(char *s1, char *s2, int length) {
    if(s1 == NULL || s2 == NULL) return NULL;
    printf("searching \n\n\"%s\"\n for %.*s\n", s1, length, s2);
    char *ss1 = malloc(strlen(s1) + 1);
    strcpy(ss1, s1);
    char *ss2 = malloc(length + 1);
    strncpy(ss2, s2, length);
    char *result = strstr(ss1, ss2);
    free(ss1);
    free(ss2);
    return result;
}
int main(void) {
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    return 0;
}

The implementation provided by Chris Dodd has the following disadvantages:
It defeats the purpose of strnstr in that the while condition uses the unbounded string function strchr
It depends on haystack being NULL terminated, which is a deviation from the usual implementation of strnstr, for example as provided by GNU-Darwin
The call to strchr is an unnecessary function call when strchr is not inlined
Returns haystack instead of NULL when len is zero, a deviation from the accepted strstr semantics
Returns an empty string instead of haystack when needle has length of zero
The following implementation remedies the above problems without becoming as difficult to read as the GNU-Darwin implementation, and is Creative Commons licensed:
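#include <string.h>
char *strnstr(const char *haystack, const char *needle, size_t len)
{
    size_t i;
    size_t needle_len = strnlen(needle, len);
    /* needle does not fit in len bytes (this also covers len == 0 with a
       non-empty needle), so it cannot occur in the searched range */
    if (needle[needle_len] != '\0')
        return NULL;
    if (0 == needle_len)
        return (char *)haystack;   /* empty needle matches at haystack */
    for (i = 0; i <= len - needle_len; i++)
    {
        if ((haystack[0] == needle[0]) &&
            (0 == strncmp(haystack, needle, needle_len)))
            return (char *)haystack;
        haystack++;
    }
    return NULL;
}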

How about:
char *strnstr(char *haystack, char *needle, size_t len) {
    if (len == 0) return haystack; /* degenerate edge case */
    while ((haystack = strchr(haystack, needle[0]))) {
        if (!strncmp(haystack, needle, len)) return haystack;
        haystack++;
    }
    return 0;
}
If haystack is not null-terminated, you'll need two length arguments:
char *memmem(char *haystack, size_t hlen, char *needle, size_t nlen) {
    if (nlen == 0) return haystack; /* degenerate edge case */
    if (hlen < nlen) return 0;      /* another degenerate edge case */
    char *hlimit = haystack + hlen - nlen + 1;
    while ((haystack = memchr(haystack, needle[0], hlimit - haystack))) {
        if (!memcmp(haystack, needle, nlen)) return haystack;
        haystack++;
    }
    return 0;
}
memmem is available in GNU libc, though older versions are broken.
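A minimal sketch of the same memchr/memcmp approach, using the hypothetical name my_memmem (renamed only so it cannot clash with the memmem that glibc's <string.h> may declare) and const void * parameters so it accepts any byte buffer. The test exercises it on a 5-byte array with no terminating '\0', which is exactly the case the two length arguments exist for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: same algorithm as the memmem above, renamed so it
   cannot collide with a memmem declared by the C library. */
void *my_memmem(const void *haystack, size_t hlen,
                const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    const unsigned char *hlimit;

    if (nlen == 0)
        return (void *)h;          /* empty needle matches at the start */
    if (hlen < nlen)
        return NULL;               /* needle cannot fit in the buffer */

    hlimit = h + hlen - nlen + 1;  /* one past the last possible start */
    while ((h = memchr(h, n[0], (size_t)(hlimit - h))) != NULL) {
        if (memcmp(h, n, nlen) == 0)
            return (void *)h;      /* full match at h */
        h++;                       /* try the next candidate */
    }
    return NULL;
}
```

Because every read is bounded by hlen, the buffer never needs a terminator; an empty needle returns the start of the buffer, matching the degenerate case above.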

The strnstr function is not defined in the C Standard; it is available on BSD and some other systems as an extension.
Here is the man page on OS X:
NAME
strstr, strcasestr, strnstr -- locate a substring in a string
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <string.h>
[...]
char *strnstr(const char *haystack, const char *needle, size_t len);
[...]
DESCRIPTION
[...]
The strnstr() function locates the first occurrence of the null-terminated string needle in the string haystack, where not more
than len characters are searched. Characters that appear after a '\0' character are not searched. Since the strnstr() function
is a FreeBSD specific API, it should only be used when portability is not a concern.
RETURN VALUES
If needle is an empty string, haystack is returned; if needle occurs nowhere in haystack, NULL is returned; otherwise a pointer
to the first character of the first occurrence of needle is returned.
EXAMPLES
The following sets the pointer ptr to the "Bar Baz" portion of largestring:
const char *largestring = "Foo Bar Baz";
const char *smallstring = "Bar";
char *ptr;
ptr = strstr(largestring, smallstring);
The following sets the pointer ptr to NULL, because only the first 4 characters of largestring are searched:
const char *largestring = "Foo Bar Baz";
const char *smallstring = "Bar";
char *ptr;
ptr = strnstr(largestring, smallstring, 4);
This specification is not precise enough (the man page for the Linux kernel version is even less precise), yet the example on BSD systems (shown above) is clear: len is the maximum number of bytes to consider in haystack, not in needle, which is just a regular null-terminated C string.
Your function does not work for multiple reasons:
the semantics are incorrect as you consider length to limit s2 instead of s1
in your approach, duplicating s1 is useless and counter-productive: result, if non-NULL, points into the allocated copy, which is freed before the function returns, so accessing the string pointed to by the return value has undefined behavior.
strncpy does not null-terminate the destination array if the source string has at least length characters before its own null terminator. You must set ss2[length] = '\0'; for your approach to work, but again, the real strnstr() function operates differently.
using malloc() and free() is probably not what you are expected to do, and failing to test for potential allocation failure is a mistake.
Here is a corrected version:
#include <string.h>
char *strnstr(const char *s1, const char *s2, size_t n) {
    // simplistic algorithm with O(n*m) worst case (m = strlen(s2))
    size_t i, len;
    char c = *s2;
    if (c == '\0')
        return (char *)s1;
    // also stop at the end of s1: characters after its '\0' are not searched
    for (len = strlen(s2); len <= n && *s1 != '\0'; n--, s1++) {
        if (*s1 == c) {
            for (i = 1;; i++) {
                if (i == len)
                    return (char *)s1;
                if (s1[i] != s2[i])
                    break;
            }
        }
    }
    return NULL;
}

Related

Check if string is reverse substring

I need to write a function that checks whether string s2 is a reverse substring of string s1 (that is, s2 reversed occurs in s1) and returns 1 if it is. The function should use pointer arithmetic.
For example:
char s1[] = "abcdef";
char s2[] = "edc";
The function should return 1, because s1 contains s2 reversed ("cde").
#include <stdio.h>
int reverseSubstring(const char *s1, const char *s2) {
    while (*s1 != '\0') {
        const char *p = s1;
        const char *q = s2;
        while (*p++ == *q++)
            if (*q == '\0')
                return 1;
        s1++;
    }
    return 0;
}
int main() {
    char s1[] = "abcdef";
    char s2[] = "edc";
    printf("%d", reverseSubstring(s1, s2));
    return 0;
}
This function does the opposite: it checks whether s2 is an ordinary substring, so in my case it returns 0. It should return 1. How can I modify it to work?
Note: it is not allowed to use functions from the string.h and stdlib.h libraries, nor the sprintf and sscanf functions from stdio.h. It is also not allowed to create auxiliary strings, either inside the function or globally.
Slight modification to parts of your code:
const char *q = s2;
while (*q) q++;          // Make q point to end of string
while (*p++ == *--q)     // Decrement instead of incrementing
    if (q == s2)         // If we have traversed complete substring
        return 1;
This is probably good enough for a school task, which this most likely is. But it is worth knowing that --q invokes undefined behavior for an empty s2, because it makes q point to the element before the string. That is easy to fix: just add if(*s2 == '\0') return 1; at the beginning of the function, because an empty string is a substring of every string.
For completeness, here is a full version with some small fixes and optimizations. I also took the liberty of replacing a while loop with the library function strlen, even though it was forbidden in the task. After all, the while loop is shown above, and strlen is easy to implement yourself.
#include <string.h>   // strlen, NULL
const char *
reverseSubstring(const char *s1, const char *s2) {
    if(*s2 == '\0') return s1;
    const char *end = s2 + strlen(s2);
    while (*s1 != '\0') {
        const char *p = s1;
        const char *q = end;
        while (*p++ == *--q) {
            if (q == s2) return s1;
        }
        s1++;
    }
    return NULL;
}
Note that I changed the return type. It returns NULL if no match is found, but if a match is found, it returns a pointer to the first match. That means it contains more information for free.
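For the original assignment, here is a sketch that keeps to its stated rules (no <string.h> or <stdlib.h>, no auxiliary strings, pointer arithmetic only) while applying the empty-string fix discussed above. It returns int as the task requires, and a hand-written loop takes the place of strlen.

```c
/* Returns 1 if s2 reversed occurs in s1, else 0.
   Uses no library functions and no helper strings. */
int reverseSubstring(const char *s1, const char *s2)
{
    const char *end = s2;

    if (*s2 == '\0')
        return 1;              /* empty string is a substring of anything */
    while (*end != '\0')
        end++;                 /* end now points at s2's terminator */

    for (; *s1 != '\0'; s1++) {
        const char *p = s1;
        const char *q = end;
        while (*p++ == *--q)   /* walk s1 forward and s2 backward */
            if (q == s2)
                return 1;      /* every character of s2 matched */
    }
    return 0;
}
```

Since q never steps below s2 (the loop returns as soon as q reaches it), the undefined behavior mentioned above cannot occur.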

string length in c without using string library

char* str =
"\
a-a-a-a\
differing the text, because that was the lecture thing\
the text has been changed\
I know!\
the text has been changed\
";
I have been thinking about this for hours but can't figure it out.
I may use only stdio.h;
string.h is not allowed, only basic things.
How can I get the string length? Someone please help me.
The goal is to find the frequency of an input pattern in a given string,
e.g. ha => 2, di => 1.
Please help.
As for the length of the string, the implementation of strlen isn't very complicated.
All you need to do is loop over the string until you find a \0 (end of string) and count the number of iterations.
unsigned int mystrlen(const char* str)
{
    unsigned int length = 0;
    while (*str != 0)
    {
        str++;
        length++;
    }
    return length;
}
This could be shortened into
unsigned int len = 0;
for (; str[len]; len++);
A string in C is just a pointer to memory.
If the last element is 0, you can use strlen or anything else that checks for it.
If that is not the case, you need to keep the length in a separate variable.
So if the string is 0-terminated, just loop to the first element that is 0 (not '0'); that is the end. If you count the elements along the way, you have the string length.
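To illustrate the two cases, here is a sketch with a hypothetical struct counted_str that stores the length next to the characters, so no terminator is needed, alongside a counting loop for 0-terminated strings. printf's %.*s precision bounds how many characters are printed, which makes it safe to use on the unterminated array.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type: characters plus an explicit length, no '\0' needed. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Length of a 0-terminated string: count until the first 0 byte. */
size_t terminated_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Print a counted string; %.*s writes at most len characters. */
void print_counted(struct counted_str s)
{
    printf("%.*s\n", (int)s.len, s.data);
}
```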
This works for some test input strings, but I highly recommend checking it with more cases.
Suppose we have implemented strstr().
strstr()
is a C library function from the string.h library:
char *strstr(const char *haystack, const char *needle)
This function finds the first occurrence of the substring needle in the source string haystack.
The terminating \0 characters are not compared.
source: TutorialsPoint
(with some edits)
Code
#include <stdio.h>
#include <string.h>
unsigned int Neostrlen(const char* str)
{
    unsigned int length = 0;
    while (*str != 0)
    {
        str++;
        length++;
    }
    return length;
}
int BluePIXYstrlen(char* str)
{
    int len = 0;
    // "%*[^0]" skips characters other than the digit '0' (not '\0'),
    // so this only works for strings that contain no '0' character
    sscanf(str, "%*[^0]%n", &len);
    return len;
}
int Jeanfransvastrlen(char* str)
{
    int i;
    for (i=0;str[i];i++);
    return i;
}
int main(int argc, char **argv){
    // str points to a string literal in static storage, so no malloc is needed
    char* str =
"\
P-P-A-P\
I have a pen, I have a apple\
Uh! Apple-Pen!\
I have a pen, I have pineapple\
Uh! Pineapple-Pen!\
Apple-Pen, Pineapple-Pen\
Uh! Pen-Pineapple-Apple-Pen\
Pen-Pineapple-Apple-Pen\
";
    printf("len: %d\n", Jeanfransvastrlen(str));
    printf("len: %u\n", Neostrlen(str));
    printf("len: %d\n", BluePIXYstrlen(str));
    printf("sss:%s\n\n\n", str);
    char * search = "have";//search for this substring
    int lenSr= Neostrlen(search);
    printf("lenSr: %d\n", lenSr);
    char * ret;
    ret = strstr(str, search);
    int count = 0;
    while (ret){
        //printf("The substring is: %s\n\n\n\n", ret);
        count++;
        for (int i=0;i<lenSr;i++){
            printf("%c", ret[i]);
        }
        printf("\nEnd sub\n");
        for (int i=0;i<lenSr;i++){
            ret++;
        }
        ret = strstr(ret, search);
    }
    printf("count: %d\n", count);
    return 0;
}
Edited
To use only stdio.h, you can replace every strstr() call with this version of mystrstr(), adapted from LeetCode:
mystrstr()
char* mystrstr(char *str, const char *target) {
    if (!*target) {
        return str;
    }
    char *p1 = (char*)str;
    while (*p1) {
        char *p1Begin = p1, *p2 = (char*)target;
        while (*p1 && *p2 && *p1 == *p2) {
            p1++;
            p2++;
        }
        if (!*p2){
            return p1Begin;
        }
        p1 = p1Begin + 1;
    }
    return NULL;
}
Hint
I removed const from the first argument of mystrstr() because I want to change it later; this is the only change I made to the original code.
This version is case-sensitive:
for example, Apple differs from apple.
As chux pointed out in the comments, when searching for "aba" in "ababa", my code finds only one occurrence, not the overlapping second one.
This is because the last for loop inside the while advances the string pointer past the whole match.
Suggestion
Try to implement your own versions of strstr() and strlen().
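As a sketch of the question's actual goal (counting how often a pattern occurs using no <string.h> at all), the hypothetical helpers below combine a hand-written strlen with a prefix test. The overlapping flag is an assumption added to show the difference chux pointed out: with it set, "aba" is found twice in "ababa"; without it, the search resumes after each match, as the loop above does.

```c
/* Hypothetical helper: 1 if pattern occurs at the start of s, else 0. */
static int starts_with(const char *s, const char *pattern)
{
    while (*pattern != '\0') {
        if (*s != *pattern)
            return 0;          /* mismatch, or s ended first */
        s++;
        pattern++;
    }
    return 1;
}

/* Counts occurrences of pattern in s without any library function.
   overlapping != 0: advance by one character after a match;
   overlapping == 0: skip the whole match before searching again. */
unsigned int count_pattern(const char *s, const char *pattern, int overlapping)
{
    unsigned int count = 0;
    unsigned int plen = 0;

    while (pattern[plen] != '\0')
        plen++;                /* hand-written strlen */
    if (plen == 0)
        return 0;              /* treat an empty pattern as no match */

    while (*s != '\0') {
        if (starts_with(s, pattern)) {
            count++;
            s += overlapping ? 1 : plen;   /* match guarantees plen chars */
        } else {
            s++;
        }
    }
    return count;
}
```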

Standard way to do strcmp with alternative terminating characters?

So I have a string like so:
char* line="foo bar\tbaz\nbiz";
I want to parse this string one word at a time, doing different things depending on the word. Something like this:
void process_word(const char* str) //str will point to the beginning of a word
{
    if(!strcmp(str, "foo")){
        foo();
    }else if(!strcmp(str, "bar")){
        bar();
    }else{
        others();
    }
}
How can I use something like strcmp that stops comparing at a space or other whitespace character instead of at \0?
Bonus points for not creating a new buffer and for avoiding overkill solutions like regular expressions.
This just checks whether the line starts with the pattern, regardless of the word termination character (so a word like "foobar" also matches "foo"):
#include <string.h>
int mystrncmp(const char *line, const char *pattern)
{
    size_t len = strlen(pattern);
    return strncmp(line, pattern, len);
}
#include <stdio.h>
#include <string.h>
#define WHITE_SPACE " \t\n"
int strcmp2ws(const char *s1, const char *s2){
    size_t len1, len2;
    len1 = strcspn(s1, WHITE_SPACE);   // length of the word in s1
    len2 = strcspn(s2, WHITE_SPACE);   // length of the word in s2
    // compare up to the longer word: if lengths differ, the shorter word
    // has whitespace or '\0' where the longer one has a letter
    return strncmp(s1, s2, (len1 <= len2) ? len2 : len1);
}
int main(void){
    char* line="foo bar\tbaz\nbiz";
    //test
    if(!strcmp2ws(line, "foo"))
        printf("foo\n");
    if(!strcmp2ws(strchr(line, ' ') + 1, "bar"))
        printf("bar\n");
    if(!strcmp2ws(strchr(line, '\t') + 1, "baz"))
        printf("baz\n");
    if(!strcmp2ws(strchr(line, '\n') + 1, "biz"))
        printf("biz\n");
    return 0;
}
The function strcspn returns the length of the initial segment of a string that contains none of a given set of "reject" characters. Combining that with strncmp should give you what you want.
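#include <string.h>
void foo(void);      /* handlers from the question */
void bar(void);
void others(void);
static const char delimiters[] = " \t\n";
void process_word(const char* str, size_t len) //str will point to the beginning of a word
{
    /* also compare the length, so that e.g. "fo" does not match "foo" */
    if(len == strlen("foo") && !strncmp(str, "foo", len)){
        foo();
    }else if(len == strlen("bar") && !strncmp(str, "bar", len)){
        bar();
    }else{
        others();
    }
}
void parse(const char *s)
{
    size_t i, len;
    i = 0;
    while (s[i] != '\0') {
        i += strspn(&s[i], delimiters);    /* skip whitespace between words */
        if (s[i] == '\0')
            break;
        len = strcspn(&s[i], delimiters);  /* length of the current word */
        process_word(&s[i], len);
        i += len;
    }
}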
Here's a possible solution:
#include <ctype.h>
#include <string.h>
const char* next_word_is(const char* haystack, const char* needle) {
    size_t len = strlen(needle);
    if (strncmp(haystack, needle, len) == 0) {
        const char* end = &haystack[len];   /* safe: haystack has len chars */
        if (!*end || isspace((unsigned char)*end))
            return end;
    }
    return NULL;
}
If you call that with a string literal as the second argument, it will probably get inlined, and the strlen() call will be eliminated. Returning end as a success indicator may not be ideal; depending on your needs, you might want to advance to the beginning of the next word instead.
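A self-contained sketch of the strspn/strcspn approach applied to the question's line. The handlers foo(), bar() and others() are replaced here by a hypothetical classify() that returns a code, so the behavior can be checked; the length comparison in word_equals() is what stops a word like "fo" from matching "foo", and no new buffer is created.

```c
#include <stddef.h>
#include <string.h>

static const char WS[] = " \t\n";

/* 1 if the word [word, word + len) is exactly keyword, else 0. */
static int word_equals(const char *word, size_t len, const char *keyword)
{
    return strlen(keyword) == len && strncmp(word, keyword, len) == 0;
}

/* Hypothetical stand-in for foo()/bar()/others():
   1 for "foo", 2 for "bar", 0 for anything else. */
static int classify(const char *word, size_t len)
{
    if (word_equals(word, len, "foo"))
        return 1;
    if (word_equals(word, len, "bar"))
        return 2;
    return 0;
}

/* Stores the code of each word in out[] (at most max entries)
   and returns the number of words found in s. */
size_t classify_words(const char *s, int *out, size_t max)
{
    size_t n = 0;

    for (;;) {
        size_t len;
        s += strspn(s, WS);        /* skip whitespace before a word */
        if (*s == '\0')
            break;
        len = strcspn(s, WS);      /* length of this word */
        if (n < max)
            out[n] = classify(s, len);
        n++;
        s += len;                  /* move past the word */
    }
    return n;
}
```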

A proper way of comparing unequal char arrays in c

Is there a proper way of comparing two char arrays if they have different lengths?
How can I find which character differs?
strcmp seems to give me only a greater or lesser number, not the position of the differing character.
For example, strings:
/home/jjjj/ and
/home/jjjj/kkkk/asdasd
Should return 12
Using strlen() and strstr(), you can achieve this in a two-step approach:
#include <string.h>
#include <stdio.h>
/* ... */
int main(void)
{
    char str1[] = "this is a long string";
    char str2[] = "long";
    char * ss = NULL;
    char * sg = NULL;
    size_t size1 = strlen(str1);
    size_t size2 = strlen(str2);
    size_t size_ss = 0;
    /* step 1: determine which of the two strings to be compared is the smaller/greater one. */
    if (size1 > size2)
    {
        size_ss = size2;
        ss = str2;
        sg = str1;
    }
    else
    {
        size_ss = size1;
        ss = str1;
        sg = str2;
    }
    /* step 2: find out where the smaller string is located in the greater one, if at all... */
    {
        char * p = strstr(sg, ss);
        if (p)
        {
            printf("'%s' is the same as '%s' from character %zu to character %zu.\n",
                   sg, ss, (size_t)(p - sg), (size_t)(p - sg) + size_ss);
        }
        else
        {
            /* printf("The strings are 100%% differently!\n"); */ /* changed as per Jonathan's comment. */
            printf("'%s' does not appear in '%s'.\n", ss, sg);
        }
    }
    return 0;
}
This solution does not take into account that the shorter string could appear more than once in the longer string. It always notifies about the first occurrence.
There isn't a standard C function that returns the first point of discrepancy between two strings.
It wouldn't be hard to create one: take a version of strcmp() from a textbook and modify it so that it returns the offset of the strings at the point where the result is 'interesting'. If the strings are equal, that will be the offset of the null terminator ('\0'); otherwise, it will be the offset where the two strings differ.
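A minimal sketch of that idea, using the hypothetical name first_diff_offset. It returns the 0-based offset of the first differing character, or the offset of the shared '\0' when the strings are equal; a caller who also wants strcmp's ordering can compare (unsigned char)a[i] with (unsigned char)b[i] at the returned offset.

```c
#include <stddef.h>

/* Offset at which a and b stop being equal: the first differing
   position, or the offset of the common '\0' if they are identical. */
size_t first_diff_offset(const char *a, const char *b)
{
    size_t i = 0;

    /* a[i] == b[i] fails as soon as b ends before a, so b is never
       read past its terminator; the a[i] check stops at a's end */
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    return i;
}
```

For the question's example, the result is 11, the 0-based form of the requested 12.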
Maybe something like this:
#include <stddef.h>   // NULL
const char* strcmp_plusplus (const char* str1, const char* str2)
{
    while (*str1 == *str2)
    {
        if (*str1 == '\0')
            return NULL;   // reached the end of both strings: equal
        str1++;
        str2++;
    }
    return str1;           // point at where in str1 they are different
}
Note that we don't have to check separately whether str2 has reached \0: if it has while str1 has not, *str1 != *str2 and the loop stops, so neither pointer is ever advanced past its terminator. If str1 ends before str2 (str1 is a prefix of str2, as in the question's example), the function returns a pointer to str1's null terminator.
This function attempts to do it all at once. Since a function can only return one value, one of the resulting values (the difference) has to be passed back to the caller via a pointer to it.
#include <stdio.h>
size_t lead_cmp( const char * one, const char * two, int *result);
size_t lead_cmp( const char * one, const char * two, int *result)
{
    size_t pos;
    for(pos=0; one[pos] && two[pos]; pos++) {
        if (one[pos] != two[pos]) break;
    }
    *result = one[pos] - two[pos];
    return pos;
}
int main(int argc, char **argv)
{
    size_t len;
    int diff;
    if (argc < 3) {
        fprintf(stderr, "usage: %s string1 string2\n", argv[0]);
        return 1;
    }
    len = lead_cmp (argv[1], argv[2], &diff );
    printf( "Pos=%zu, Rc=%d\n", len, diff);
    return 0;
}
Result:
$ ./a.out /home/jjjj/ /home/jjjj/kkkk/
Pos=11, Rc=-107
$
The found position is 11, not 12, since C uses 0-based indexing.
It returns the number of matching characters: the length of the common prefix.

strstr through pointers in c language

Is the strstr I wrote below the standard code for strstr?
char* fstrset(char *s,char *t)
{
    int b, i=0,j=0;
    while(*(s+i)!='\0')
    {
        if(*(t+j)=='\0')
            break;
        else if(*(s+i)==*(t+j))
        {
            i++;j++;b=1;
        }
        else
        { i++;b=0;j=0;
        }
    }
    if(b==0)
        return((char*)NULL);
    else if(b==1)
        return(s+i-j);
}
This is all the standard has to say about it:
7.21.5.7 The strstr function
Synopsis
#include <string.h>
char *strstr(const char *s1, const char *s2);
Description
The strstr function locates the first occurrence in the string pointed to by s1 of the sequence of characters (excluding the terminating null character) in the string pointed to by s2.
Returns
The strstr function returns a pointer to the located string, or a null pointer if the string is not found. If s2 points to a string with zero length, the function returns s1.
So, it looks like you're missing const qualifiers on arguments.
As for style, note that *(ptr+index) can be replaced by ptr[index], and size_t is the best type to use for indexing a pointer.
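Applying those suggestions to the question's function gives something like the following sketch: const-qualified parameters, size_t indices, s[i] instead of *(s+i), and a restart at the next position after a partial match, which also addresses the failing "fififi-trixabelle" case and the uninitialized b reported by the other answers. It is one possible rewrite, not any library's code.

```c
#include <stddef.h>

/* The question's fstrset() with the style suggestions applied.
   Returns a pointer to the first occurrence of t in s, or NULL. */
char *fstrset(const char *s, const char *t)
{
    size_t i, j;

    for (i = 0; ; i++) {
        /* extend the match starting at s[i] as far as possible */
        for (j = 0; t[j] != '\0' && s[i + j] == t[j]; j++)
            ;
        if (t[j] == '\0')
            return (char *)&s[i];   /* all of t matched (or t is empty) */
        if (s[i] == '\0')
            return NULL;            /* no candidate positions left */
    }
}
```

Restarting at i + 1 after every mismatch is what lets the "fifi-trixabelle" needle be found at offset 2, where the original version had already skipped past it.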
As for being a common way to implement it, compare with GCC's code:
char *
strstr (const char *s1, const char *s2)
{
    const char *p = s1;
    const size_t len = strlen (s2);
    for (; (p = strchr (p, *s2)) != 0; p++)
    {
        if (strncmp (p, s2, len) == 0)
            return (char *)p;
    }
    return (0);
}
Your code is buggy. Given:
char *haystack = "fififi-trixabelle";
char *needle = "fifi-trixabelle";
fstrset(haystack, needle) incorrectly returns NULL.
Besides the bug mentioned by caf, there are others:
1) b is uninitialized. If s points to '\0', the loop body never runs, b is read while uninitialized, and the closing brace may be reached without executing any return statement.
2) If the characters match up to the end of the string pointed to by s, there is no check that the string pointed to by t ends too.
What does this do? It looks like gibberish. Why add pointers and mix them with ints? Sorry, but the whole thing doesn't make sense.
And to answer your question, I don't think so. But if you compile it and it runs, then yes.
Okay, your code does make sense when you look at it more closely. Yes, it does look like it will compile, if that's what you mean by standard code.
inline char* strstr(char* __s1, const char* __s2)
{
return __builtin_strstr(const_cast<const char*>(__s1), __s2);
}
A quick read-through suggests that the code works (there are probably edge cases that don't). You tell us: does it work?
But why do it? Just call strstr.
There is no 'standard code', just the standard result.
It is unlikely that any implementation in a standard C library uses array indexing, so it is unlikely that your code matches any implementation in line-by-line detail.
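#include <stdlib.h>
#include <string.h>
/* Returns a newly allocated copy of s1 from the first occurrence of s2
   onward, or NULL if s2 is not found or allocation fails.
   The caller must free() the result. */
char* fstrstr(const char *s1, const char *s2)
{
    const char *start = s1;   // candidate match position in s1
    const char *p, *q;
    char *s3;
    size_t i;
    for (;;)
    {
        p = start;
        q = s2;               // start s2 again from its beginning
        while (*q != '\0' && *p == *q)
        {
            p++;
            q++;
        }
        if (*q == '\0')
            break;            // all of s2 matched at start
        if (*start == '\0')
            return NULL;      // end of s1 reached: no match
        start++;              // retry one character later
    }
    s3 = malloc(strlen(start) + 1);   // s3 must point to real storage
    if (s3 == NULL)
        return NULL;
    for (i = 0; start[i] != '\0'; i++)
        s3[i] = start[i];     // copy the match and the rest of s1
    s3[i] = '\0';
    return s3;
}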
As said above, there is no "standard code", only standard results. Here is a compact version that uses only pointers:
#include <stddef.h>   // NULL
char *strstr(const char *s1, const char *s2) {
    const char *a = s1, *b = s2;
    for (;;)
        if (!*b) return (char *)s1;
        else if (!*a) return NULL;
        else if (*a++ != *b++) {a = ++s1; b = s2;}
}
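One way to gain confidence in a hand-written version like this is to compare it with the library strstr on a set of inputs. In this sketch the function is renamed my_strstr (a hypothetical name, so it does not collide with the real strstr it is tested against), and the hypothetical agrees_with_library() checks that both return the same pointer for every haystack/needle pair.

```c
#include <stddef.h>
#include <string.h>

/* The compact strstr above, renamed so it can sit next to the library one. */
char *my_strstr(const char *s1, const char *s2)
{
    const char *a = s1, *b = s2;
    for (;;) {
        if (!*b)
            return (char *)s1;     /* whole needle matched at s1 */
        else if (!*a)
            return NULL;           /* haystack exhausted */
        else if (*a++ != *b++) {
            a = ++s1;              /* retry one position later */
            b = s2;
        }
    }
}

/* 1 if my_strstr and strstr agree on every (hay[i], needles[j]) pair. */
int agrees_with_library(const char *const *hay, const char *const *needles,
                        size_t nh, size_t nn)
{
    size_t i, j;
    for (i = 0; i < nh; i++)
        for (j = 0; j < nn; j++)
            if (my_strstr(hay[i], needles[j]) != strstr(hay[i], needles[j]))
                return 0;
    return 1;
}
```

Including edge cases such as empty haystacks and empty needles in the input set is what makes this kind of check useful.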
