memcmp C implementation - any logical errors with this one?

I was looking for an implementation of memcmp() and found this code snippet, but it is clearly marked as containing one logical error. Could you help me find it?
Basically, I tested this code against the string.h library implementation of memcmp() with different inputs, but the output is always the same as that of the library version of the function.
Here is the code snippet:
#include <stdio.h>
#include <string.h>
int memcmp_test(const char *cs, const char *ct, size_t n)
{
size_t i;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
else
{
return 0;
}
}
}
int main()
{
int ret_val = 20; //initialize with non-zero value
char *string1 = "china";
char *string2 = "korea";
ret_val = memcmp_test(string1,string2,5);
printf ("ret_val is = %d",ret_val);
getchar();
return 0;
}
I ran the program with the two example strings and the program would return just after the comparison of the first characters of the two strings. ret_val is -1 in the above case.
The definition of memcmp() which the above code snippet should conform to is:
The definition of the 'C' Library function memcmp is
int memcmp(const char *cs, const char *ct, size_t n)
Compare the first n characters of cs with the first n characters of ct.
Return < 0 if cs < ct.
Return > 0 if cs > ct.
Return 0 if cs == ct.
There is definitely one LOGICAL error; could you help me find it?

As it's written now, this code will only test the first byte of the inputs. The else return 0 needs to be moved out of the loop, leaving return 0 at the end:
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}

I guess since char signedness is implementation defined, you could make your comparison unsigned:
int memcmp_test(const char *cs_in, const char *ct_in, size_t n)
{
size_t i;
const unsigned char * cs = (const unsigned char*) cs_in;
const unsigned char * ct = (const unsigned char*) ct_in;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}

Look at your for loop. It's only examining one character.

Strictly speaking the signature is wrong. The correct one is:
int memcmp(const void *s1, const void *s2, size_t n);
Your code compares c and k and on finding that c is less than k dutifully returns a -1. However, if these two were equal you would get an incorrect result since you are returning early.
If you read the documentation you'd find:
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the objects being compared
Which basically means you are doing the right thing by returning something that preserves the sign of ('c' - 'k').
A simpler implementation can be found here.
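To make the two points above concrete (the const void * signature and the unsigned char interpretation of each byte), here is a minimal sketch. The name my_memcmp is made up for illustration, and it only ever returns -1, 0 or 1, which satisfies the sign requirement but is not necessarily what any particular library returns.

```c
#include <stddef.h>

/* Hypothetical name; a sketch, not the library's implementation. */
int my_memcmp(const void *s1, const void *s2, size_t n)
{
    /* Interpret both objects as arrays of unsigned char, as the
       quoted documentation requires. */
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p1[i] != p2[i])
            return (p1[i] < p2[i]) ? -1 : 1;
    }
    return 0; /* all n bytes were equal */
}
```

With this version, my_memcmp("china", "korea", 5) is negative, and a byte such as 0x80 compares greater than 0x01 regardless of whether plain char is signed.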

The return 0; occurs after only comparing the first character. It should be placed outside of the loop.
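A likely reason the tests in the question never exposed the bug is that the inputs differed in their first byte. The sketch below assumes a test input with a shared prefix instead. memcmp_buggy is a hypothetical name for the question's function (with a trailing return 0; added only so that the n == 0 case has a defined result), and agrees_with_library is an illustrative helper.

```c
#include <stddef.h>
#include <string.h>

/* The question's logic: every path returns during the first iteration. */
int memcmp_buggy(const char *cs, const char *ct, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++, cs++, ct++) {
        if (*cs < *ct)
            return -1;
        else if (*cs > *ct)
            return 1;
        else
            return 0; /* bug: gives up after one equal byte */
    }
    return 0; /* added for illustration: only reached when n == 0 */
}

/* Returns 1 if the buggy version and the library agree in sign, else 0. */
int agrees_with_library(const char *a, const char *b, size_t n)
{
    int mine = memcmp_buggy(a, b, n);
    int lib = memcmp(a, b, n);
    return (mine < 0) == (lib < 0) && (mine > 0) == (lib > 0);
}
```

agrees_with_library("china", "korea", 5) yields 1 because the first bytes already differ, while agrees_with_library("abc", "abd", 3) yields 0: the buggy version reports equality after comparing only 'a' with 'a'.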

This snippet works perfectly for me:
#include <stdio.h>
#include <string.h>
int memcmp_test(const char *cs, const char *ct, size_t n)
{
size_t i;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}
int main()
{
int ret_val = 20; //initialize with non-zero value
const char *string1 = "DWgaOtP12df0";
const char *string2 = "DWGAOTP12DF0";
ret_val = memcmp_test(string1,string2,5);
printf ("ret_val is = %d",ret_val);
getchar();
return 0;
}

Related

Is there a way if string repeats to return only repeated letters once?

I wrote code which, for the string "aabbcc", returns "abc", but when a letter occurs more than twice, as in "aaa", it returns "aa" instead of just one "a".
Here is the code I made.
void Ponavljanje(char *s, char *p) {
int i, j = 0, k = 0, br = 0, m = 0;
for (i = 0; i < strlen(s) - 1; i++) {
for (j = i + 1; j < strlen(s); j++) {
if (s[i] == s[j]) {
br++;
if (br == 1) {
p[k++] = s[i];
}
}
}
br = 0;
}
p[k] = '\0';
puts(p);
}
For "112233" output should be "123" or for "11122333" it should be also "123".
Avoid repeated calls to strlen(s). A weak compiler may not see that s is unchanged and may call strlen(s) many times, each call incurring a cost of n operations, which is quite inefficient (@arkku) [1]. Instead, simply stop iterating when the null character is detected.
Initialize a boolean list of flags, one for every char value, to false. When a character occurs, set its flag to prevent subsequent usage. Be careful when indexing that list, as char can be negative.
Using const char *s allows for wider application and helps compiler optimization.
Example:
#include <stdbool.h>
#include <limits.h>
void Ponavljanje(const char *s, char *p) {
const char *p_original = p;
bool occurred[CHAR_MAX - CHAR_MIN + 1] = { 0 }; // all values set to 0 (false)
while (*s) {
if (!occurred[*s - CHAR_MIN]) {
occurred[*s - CHAR_MIN] = true;
*p++ = *s;
}
s++;
}
*p = '\0';
puts(p_original);
}
[1] @wrongway4you comments that many compilers may assume the string did not change and optimize out the repeated strlen() call. Without restrict, though, a compliant compiler cannot do that unless it is known that s and p do not overlap in any call. Otherwise the compiler must assume that writes through p may affect s, which warrants a repeated strlen() call.
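The aliasing point in this footnote can be made explicit with restrict (C99). This is only a sketch, under the assumption that callers never pass overlapping buffers; the name dedup_restrict is made up here, and whether a given compiler actually hoists the strlen() call is not guaranteed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: restrict promises that s and p do not overlap. */
void dedup_restrict(const char *restrict s, char *restrict p)
{
    size_t k = 0; /* number of characters written to p so far */

    /* strlen(s) is deliberately left in the condition: with restrict, a
       compiler is allowed (not required) to evaluate it only once. */
    for (size_t i = 0; i < strlen(s); i++) {
        /* copy s[i] only if it is not already among the k output chars */
        if (memchr(p, s[i], k) == NULL)
            p[k++] = s[i];
    }
    p[k] = '\0';
}
```

For the inputs from the question, "aabbcc" produces "abc" and "11122333" produces "123". Calling it with overlapping s and p would break the promise made by restrict and is undefined behavior.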
This does the work in O(n) time.
I assume that "programming" should give "rmg":
void Ponavljanje(char *s,char *p)
{
char n[256] = {0};
int i = 0;
while (*s) {
switch (n[(unsigned char) *s]) {
case 0:
n[(unsigned char) *s] = 1;
break;
case 1:
p[i++] = *s;
n[(unsigned char) *s] = 2;
}
s += 1;
}
p[i] = 0;
puts(p);
}
While the inner loop checks br to only copy the output on the first repetition, the outer loop still passes over each repetition in s on future iterations. Hence each further occurrence of the same character will run a separate inner loop after br has already been reset.
With aaa as the input, both the first and the second a cause the inner loop to find a repetition, giving you aa. In fact, you always get one occurrence fewer of each character in the output than there is in the input, which means it only works for 1 or 2 occurrences in the input (resulting in 0 and 1 occurrences, respectively, in the output).
If you only want to remove the successive double letters, then this function would be sufficient, and the examples given in the question would fit:
#include <stdio.h>
void Ponavljanje(char *s,char *p)
{
char dd = '\0';
char *r;
if(s == NULL || p == NULL)
return;
r = p;
while(*s){
if(*s != dd){
*r = *s;
dd = *s;
r++;
}
s++;
}
*r = '\0';
puts(p);
}
int main(void)
{
char s[20] = "1111332222";
char p[20];
Ponavljanje(s,p);
}
Here is something that works regardless of order:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
int slen;
int sidx;
int pidx;
int plen;
int schr;
slen = strlen(s);
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx];
// look for duplicate char
int dupflg = 0;
for (pidx = 0; pidx < plen; ++pidx) {
if (p[pidx] == schr) {
dupflg = 1;
break;
}
}
// skip duplicate chars
if (dupflg)
continue;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
Note: As others have mentioned, strlen should not be placed in the loop condition clause of the for [because the length of s is invariant]. Save strlen(s) to a separate variable and loop to that limit
Here is a different/faster version that uses a histogram so that only a single loop is required:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
int slen;
int sidx;
int pidx;
int plen;
int schr;
slen = strlen(s);
sidx = 0;
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx] & 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
UPDATE #2:
I would suggest iterating until the terminating NUL byte
Okay, here's a full pointer version that is as fast as I know how to make it:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
char *pp;
int schr;
pp = p;
for (schr = *s++; schr != 0; schr = *s++) {
schr &= 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
*pp++ = schr;
}
*pp = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}

Runtime Error Message: Line 17: index -3 out of bounds for type 'int [256]'

I need help understanding an issue with my C code. I am trying to find the longest substring of a given string that has no repeated characters. When run on the leetcode platform, the code below gives me an error for the string "amqpcsrumjjufpu":
Runtime Error Message: Line 17: index -3 out of bounds for type 'int [256]'
However, the same code works fine when I run it from my computer or any online editor. Please help me to understand this behaviour difference.
#include <stdio.h>
#include <string.h>
int lengthOfLongestSubstring(char* s) {
char *h = s;
int A[256] = {0};
int length = 0;
int temp = 0;
int max = 0;
int len = strlen(s);
for(int i = 0; i < len;i ++){
int A[256] = {0};
length = 0;
h = s + i;
for(int j = i; j < len-1; j++){
if (A[h[j]] == 1) {
break;
} else {
A[h[j]] = 1;
length +=1;
}
if (max < length) {
max = length;
}
}
}
return max;
}
int main() {
char *s = "amqpcsrumjjufpu";
int ret = lengthOfLongestSubstring(s);
printf("SAURABH: %d",ret);
}
It seems you are trying to write a function that finds the length of the longest substring of unique characters.
For starters the function should be declared like
size_t lengthOfLongestSubstring( const char *s );
^^^^^^                           ^^^^^
These declarations in the outer scope of the function
int A[256] = {0};
//...
int temp = 0;
are redundant. The variables are not used in the function.
The type char can behave either as the type signed char or the type unsigned char. So in expressions like this A[h[j]] you have to cast explicitly the character used as index to the type unsigned char as for example
A[( unsigned char )h[j]]
The inner loop
for(int j=i;j<len-1;j++){
will not execute for strings that contain only one character. So it does not make sense as it is written.
This if statement
if (max < length) {
max = length ;
}
needs to be placed outside the inner loop.
The algorithm used by you can be implemented the following way
#include <stdio.h>
#include <limits.h>
size_t lengthOfLongestSubstring(const char *s)
{
size_t longest = 0;
for (; *s; ++s )
{
size_t n = 0;
unsigned char letters[UCHAR_MAX] = { 0 };
for ( const char *p = s; *p && !letters[(unsigned char)*p - 1]++; ++p) ++n;
if (longest < n) longest = n;
}
return longest;
}
int main( void )
{
char *s = "123145";
printf("The longest substring has %zu characters.\n",
lengthOfLongestSubstring(s));
return 0;
}
The program output is
The longest substring has 5 characters.
Your code crashed because you read data out of range. Suppose your input string is amqpcsrumjjufpu; its length is 15. In the outer loop, for i = 13, you do the assignment
h = s + i; // h now points to the element of s at index 13
and in the inner loop, on the first iteration, you read this element (j == i == 13)
A[h[j]]
So you try to read the element A[*(h+j)], but h already points to index 13 of s and you then add another 13 to it. You end up reading index 26 of s, which is outside the string.
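The arithmetic can be checked directly: because h already points at s + i, the expression h[j] is the same element as s[i + j]. The sketch below uses a hypothetical helper, effective_index, and is meant to be called on a buffer deliberately larger than the string so that the demonstration itself stays in bounds.

```c
#include <stddef.h>

/* Hypothetical helper: the index into s that the expression h[j] really
   refers to. h must point into the same array as s. */
size_t effective_index(const char *s, const char *h, size_t j)
{
    return (size_t)(&h[j] - s);
}
```

With char buf[64] = "amqpcsrumjjufpu" and h = buf + 13, effective_index(buf, h, 13) is 26, well past the 15 characters of the string. Indexing with s[j], or keeping h fixed at s, avoids the double offset.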
Thanks, everyone, for the responses. While Vlad's code worked for all the test cases, here is my code, which also passed all the test cases after the changes suggested by Vlad and rafix.
int lengthOfLongestSubstring(char* s) {
char *h = s;
int max = 0;
int len = strlen(s);
if (len == 1) {
return 1;
}
for(int i = 0; i < len;i ++){
int A[256] = {0};
int length = 0;
for(int j = i; j < len; j++){
if (A[(unsigned char)h[j]] == 1) {
break;
} else {
A[(unsigned char) h[j]] = 1;
length +=1;
}
}
if (max < length) {
max = length;
}
}
return max;
}

What's wrong with my Heap's Algorithm code?

My homework requires me to write a program that takes a string from the terminal (argc and argv) and prints every possible permutation. I have tried to use Heap's Algorithm, but it doesn't seem to be working out. Below is my function.
char **getPermutation(char * in)
{
//n is the size of the input string.
int n = strlen(in);
int count[n];
int counter= 0;
char copy[n];
char **permutations = malloc(sizeof(char*)*(factorial(n)));
permutations[0] = in;
strcpy(in, copy);
counter++;
for( int i = 1; i < n;)
{
if (count[i] < i){
if (i%2==0){
swap(&in[0],&in[i]);
}
else
{
swap(&in[count[i]],&in[i]);
}
permutations[counter] = in;
strcpy(in, copy);
counter++;
count[i]++;
i = 1;
}
else
{
count[i] = 0;
i++;
}
}
return permutations;
}
The function must return the pointer to the character pointer as specified by the instructions. That's also why there are so many variables (although, I'm not really sure what to do with the copy of the string. I'm fairly sure I need it). Testing shows that the program will loop, often too much and eventually hit a seg fault. It doesn't seem like the swapped strings make it into the returned array on top of that.
Below is a rework of your code with cleaned-up memory allocation; it also addresses some problems mentioned in the comments. Additionally, you have a bug in your algorithm: the statement strcpy(in, copy); keeps you from getting all the permutations (it causes repeats instead). This code works but isn't finished; it could use more error checking and other finishing touches:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
unsigned int factorial(unsigned int n)
{
/* ... */
}
void swap(char *a, char *b)
{
/* ... */
}
char **getPermutations(const char *input)
{
char *string = strdup(input);
size_t length = strlen(string);
char **permutations = calloc(factorial(length), sizeof(char *));
int *counts = calloc(length, sizeof(int)); // point to array of ints all initialized to 0
int counter = 0;
permutations[counter++] = strdup(string);
for (size_t i = 1; i < length;)
{
if (counts[i] < i)
{
if (i % 2 == 0)
{
swap(&string[0], &string[i]);
}
else
{
swap(&string[counts[i]], &string[i]);
}
permutations[counter++] = strdup(string);
counts[i]++;
i = 1;
}
else
{
counts[i++] = 0;
}
}
free(counts);
free(string);
return permutations;
}
int main(int argc, char *argv[])
{
char *string = argv[1];
char **permutations = getPermutations(string);
unsigned int total = factorial(strlen(string));
for (unsigned int i = 0; i < total; i++)
{
printf("%s\n", permutations[i]);
}
free(permutations);
return 0;
}
OUTPUT
> ./a.out abc
abc
bac
cab
acb
bca
cba
>
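One of the finishing touches mentioned above concerns cleanup: main() frees only the array of pointers, while every entry in it was allocated separately by strdup(). A minimal sketch of a cleanup helper follows; freePermutations is a hypothetical name and not part of the answer's code.

```c
#include <stdlib.h>

/* Hypothetical helper: frees each stored string, then the array itself.
   Returns how many non-NULL strings were released. */
size_t freePermutations(char **permutations, size_t total)
{
    size_t freed = 0;

    if (permutations == NULL)
        return 0;

    for (size_t i = 0; i < total; i++) {
        if (permutations[i] != NULL) {
            free(permutations[i]);
            freed++;
        }
    }
    free(permutations);
    return freed;
}
```

In the main() above, a call such as freePermutations(permutations, total); would take the place of free(permutations);.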

Character table-based searcher doesn't work

The following code is supposed to return the first character of the source string that is also present in the table set.
int find (char source[], char set[])
{
int i, l = strlen(set);
int exit = 0;
for(i = 0; source[i] != '\0';)
{
do
{
if(source[i] == set[l])
{
exit = 1;
break;
}
else l--;
} while (!l);
if(exit)
break;
else
{
i++;
l = strlen(set);
}
}
return set[l];
}
What am I doing wrong?
Use early returns when appropriate, use for loops with all three clauses most of the time, avoid calling strlen() more than once, and avoid do ... while loops.
int find(char source[], char set[])
{
int len = strlen(set);
for (int i = 0; source[i] != '\0'; i++)
{
for (int l = 0; l < len; l++)
{
if (source[i] == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
The cast returns a positive value even if plain char is a signed type.
I'm not wholly convinced that returning the character is best; the index where the character is found might be better.
If you're stuck with a C89 compiler, then you can use:
int find(char source[], char set[])
{
int len = strlen(set);
int i;
for (i = 0; source[i] != '\0'; i++)
{
int l;
for (l = 0; l < len; l++)
{
if (source[i] == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
I'm letting the compiler optimize source[i] in the inner loop. If you don't trust your compiler, you could use:
int find(char source[], char set[])
{
int len = strlen(set);
int i;
for (i = 0; source[i] != '\0'; i++)
{
char c = source[i];
int l;
for (l = 0; l < len; l++)
{
if (c == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
If you want to use a standard function, you probably want strcspn(), a much-neglected part of Standard C. This will return 0 (or '\0') if there is no other match, unlike the other functions that return -1.
int find(char source[], char set[])
{
size_t i = strcspn(source, set);
return (unsigned char)source[i];
}
If the negative return is important, then you'd use:
int find(char source[], char set[])
{
size_t i = strcspn(source, set);
return (source[i] == '\0') ? -1 : (unsigned char)source[i];
}
Or you could use strpbrk():
int find(char source[], char set[])
{
char *tgt = strpbrk(source, set);
if (tgt == 0)
return -1;
return (unsigned char)*tgt;
}
And there are probably other variants I've not thought of.
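Following up on the earlier remark that an index may be a more useful result than the character itself, here is a sketch built on strcspn(). The name find_index and the choice of -1 for "not found" are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical variant: index in source of the first character that also
   appears in set, or -1 if there is none. */
long find_index(const char *source, const char *set)
{
    size_t i = strcspn(source, set); /* length of the prefix with no set chars */
    return (source[i] == '\0') ? -1L : (long)i;
}
```

For example, find_index("hello world", "ow") gives 4, the position of the first 'o'. The caller can still obtain the character with source[index] when the result is not -1.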
If you want to keep the inner do/while loop (thus fixing up the original logic) you can write:
int find(char source[], char set[])
{
int i;
int len = strlen(set) - 1;
for (i = 0; source[i] != '\0'; i++)
{
int l = len;
do
{
if (source[i] == set[l])
return (unsigned char)set[l];
} while (--l >= 0);
}
return -1;
}
This avoids testing set[strlen(set)], which is by definition '\0', with source[i] which is known not to be '\0'. It still uses an early return which radically simplifies the code (no exit variable, which is not a good name to use since there's a standard function exit() too). Note, too, how this keeps the loop control for the variable i all in the for statement; that is one of its principal virtues. You should aim to use it whenever possible.
Note that the original code scans from the end of set to the beginning instead of from the beginning to the end. Both methods work essentially equally well, but starting at the beginning and ending at the end is the more conventional way to work. It is also less apt to create bugs. If someone changes the type of l to size_t, then it never goes negative, so the do/while variant fails. The original proposed version will work fine if every int in the body of the function is changed to size_t.
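If the reverse scan is kept but the index type is changed to size_t, the loop condition has to be restructured so that it never relies on a negative value. A sketch, with find_reverse as a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the reverse scan that is safe with size_t. */
int find_reverse(const char *source, const char *set)
{
    size_t len = strlen(set);

    for (size_t i = 0; source[i] != '\0'; i++) {
        /* The test uses the value before the decrement, so the body
           sees l = len-1 down to 0 and is skipped when len is 0. */
        for (size_t l = len; l-- > 0; ) {
            if (source[i] == set[l])
                return (unsigned char)set[l];
        }
    }
    return -1;
}
```

The l-- > 0 form is a common way to count down with an unsigned index. In this sketch it also means an empty set simply yields -1.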
I think, instead of
while (!l);
what you want is
while (l);
while (l > -1); //yeah, because array index starts from 0.
because you are decrementing the value of l and want to continue looping until l becomes less than 0, right?
Also, the return type of the function find() is int and you're returning a char. Even though it is not an error, you may want to change that.

Implementation of strspn( )

The definition of library function strspn is:
size_t strspn(const char *str, const char *chars)
/* Return number of leading characters at the beginning of the string `str`
which are all members of string `chars`. */
e.g. if str is "fecxdy" and chars is "abcdef" then the function would return 3, since f, e and c all appear somewhere in chars, giving 3 leading characters of str, and x is the first character of str which is not a member of chars.
Could someone help me write an implementation of strspn in C? The only library function I am allowed to call from the implementation is strlen.
The basic idea is to step through the string, one character at a time, and test if it's in the character set. If it's not, stop and return the answer. In pseudocode, that would look like:
count = 0
for each character c in str
    if c is not in chars
        break
    count++
return count
The if c is not in chars test can be implemented by iterating through all of the characters of chars and testing if c matches any of the characters. Note that this is not the fastest implementation, since it involves stepping through the chars string for each character in str. A faster implementation would use a lookup table to test if c is not in chars.
I found this question while going over old exams. You weren't allowed to use indexing or any standard functions. Here's my attempt at a solution:
#include <stdio.h>
size_t myStrspn(const char *str1, const char *str2){
size_t i,j;
i=0;
while(*(str1+i)){
j=0;
while(*(str2+j)){
if(*(str1+i) == *(str2+j)){
break; //Found a match.
}
j++;
}
if(!*(str2+j)){
return i; //No match found.
}
i++;
}
return i;
}
int main(void){
char s[] = "7803 Elm St.";
int n = 0;
n = myStrspn(s,"1234567890");
printf("The number length is %d. \n",n);
return 0;
}
Here's the solution from the exam:
#include<stdio.h>
size_t strspn(const char* cs, const char* ct) {
size_t n;
const char* p;
for(n=0; *cs; cs++, n++) {
for(p=ct; *p && *p != *cs; p++)
;
if (!*p)
break;
}
return n;
}
For loops made it much more compact.
I think this should be pretty fast
size_t strspn(const unsigned char *str, const unsigned char *chars){
unsigned char ta[32]={0};
size_t i;
for(i=0;chars[i];++i)
ta[chars[i]>>3]|=0x1<<(chars[i]%8);
for(i=0;((ta[str[i]>>3]>>(str[i]%8))&0x1);++i);
return i;
}
Thanks to others for sanity checks.
A naive implementation of strspn() would iterate on the first string, as long as it finds the corresponding character in the second string:
#include <string.h>
size_t strspn(const char *str, const char *chars) {
size_t i = 0;
while (str[i] && strchr(chars, str[i]))
i++;
return i;
}
Given that you are not allowed to call strchr(), here is a naive native implementation:
size_t strspn(const char *str, const char *chars) {
size_t i, j;
for (i = 0; str[i] != '\0'; i++) {
for (j = 0; chars[j] != str[i]; j++) {
if (chars[j] == '\0')
return i; // char not found, return index so far
}
}
return i; // complete string matches, return length
}
Scanning the second string repeatedly can be costly. Here is an alternative that combines different methods depending on the length of chars, assuming 8-bit bytes:
size_t strspn(const char *str, const char *chars) {
size_t i = 0;
char c = chars[0];
if (c != '\0') { // if second string is empty, return 0
if (chars[1] == '\0') {
// second string has single char, use a simple loop
while (str[i] == c)
i++;
} else {
// second string has more characters, construct a bitmap
unsigned char x, bits[256 / 8] = { 0 };
for (i = 0; (x = chars[i]) != '\0'; i++)
bits[x >> 3] |= 1 << (x & 7);
// iterate while characters are found in the bitmap
for (i = 0; (x = str[i]), (bits[x >> 3] & (1 << (x & 7))); i++)
continue;
}
}
return i;
}
int my_strspn(const char *str1,const char *str2){
int i,k,counter=0;
for(i=0;str1[i]!='\0';i++){
if(counter != i) break; // previous character was not in str2
for(k=0;str2[k]!='\0';k++){
if(str1[i]==str2[k]){
counter++;
break; // count each character of str1 at most once
}
}
}
return counter;
}
Create a lookup table (a poor man's set) for all possible ASCII chars, and just lookup each character in str. This is worst case O(max(N,M)), where N is the number of characters in str and M is the number of characters in chars.
#include <string.h>
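size_t strspn(const char *str, const char *chars) {
size_t i;
size_t nchars = strlen(chars); // compute each length once, not per iteration
size_t nstr = strlen(str);
char ch[256] = {0};
for (i = 0; i < nchars; i++) {
ch[(unsigned char)chars[i]] = 1; // cast: plain char may be negative
}
for (i = 0; i < nstr; i++) {
if (ch[(unsigned char)str[i]] == 0) {
break;
}
}
return i;
}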
This could also be solved without using strlen at all, assuming both strings are zero-terminated. The disadvantage of this solution is that one needs 256 bytes of memory for the lookup table.
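The variant without strlen() mentioned above might look like the following sketch. The name strspn_table is chosen here to avoid clashing with the library function, and indices are cast to unsigned char because plain char may be signed.

```c
#include <stddef.h>

/* Hypothetical name; lookup-table approach with no library calls. */
size_t strspn_table(const char *str, const char *chars)
{
    unsigned char member[256] = { 0 }; /* assumes 8-bit bytes */
    size_t i;

    /* Mark every character of chars; the terminating NUL is never marked. */
    for (; *chars != '\0'; chars++)
        member[(unsigned char)*chars] = 1;

    /* Stop at the first unmarked character, which includes the NUL. */
    for (i = 0; member[(unsigned char)str[i]]; i++)
        continue;

    return i;
}
```

With the example from the question, strspn_table("fecxdy", "abcdef") returns 3. Because the NUL terminator is never marked in the table, the second loop needs no separate end-of-string test.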
I haven't touched a C compiler for the last couple of years, but off the top of my head something like this should work:
size_t spn = 0;
for (; *str != '\0'; str++) // advance only after the character was tested
{
const char *hay = chars;
bool match = false;
for (; *hay != '\0'; hay++) // compare before advancing, so chars[0] is checked too
{
if(*hay == *str)
{
match = true;
break;
}
}
if(!match)
return spn;
spn++;
}
return spn;
Well, implementing a standard library for my OS, here is my solution (C++).
KCSTDLIB_API_FUNC(size_t DECL_CALL strspn(const char * str1, const char * str2))
{
size_t count = 0;
auto isin = [&](char c)
{
for (size_t x = 0; str2[x]; x++)
{
if (c == str2[x])
return true;
};
return false;
};
for (; isin(str1[count]); count++);
return count;
}

Resources