Character table-based searcher doesn't work - c

The following code is supposed to return the character that is registered in the table set and has been determined in the source string.
int find (char source[], char set[])
{
int i, l = strlen(set);
int exit = 0;
for(i = 0; source[i] != '\0';)
{
do
{
if(source[i] == set[l])
{
exit = 1;
break;
}
else l--;
} while (!l);
if(exit)
break;
else
{
i++;
l = strlen(set);
}
}
return set[l];
}
What am I doing wrong?

Use early returns when appropriate, use for loops with all three clauses most of the time, avoid calling strlen() more than once, and avoid do … while loops.
int find(char source[], char set[])
{
int len = strlen(set);
for (int i = 0; source[i] != '\0'; i++)
{
for (int l = 0; l < len; l++)
{
if (source[i] == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
The cast returns a positive value even if plain char is a signed type.
I'm not wholly convinced that returning the character is best; the index where the character is found might be better.
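As a rough sketch of the index-returning alternative mentioned above (the name find_index and the long return type are assumptions made for illustration, not part of the question), the same two loops can report where the match was found:

```c
#include <stddef.h>

/* Hypothetical helper: returns the index in source of the first character
   that also appears in set, or -1 if no character of source is in set. */
long find_index(const char *source, const char *set)
{
    for (size_t i = 0; source[i] != '\0'; i++)
    {
        for (size_t l = 0; set[l] != '\0'; l++)
        {
            if (source[i] == set[l])
                return (long)i;   /* position of the match, not the character */
        }
    }
    return -1;
}
```

A caller that needs the character itself can still read source[index] once it knows the result is not -1.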
If you're stuck with a C89 compiler, then you can use:
int find(char source[], char set[])
{
int len = strlen(set);
int i;
for (i = 0; source[i] != '\0'; i++)
{
int l;
for (l = 0; l < len; l++)
{
if (source[i] == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
I'm letting the compiler optimize source[i] in the inner loop. If you don't trust your compiler, you could use:
int find(char source[], char set[])
{
int len = strlen(set);
int i;
for (i = 0; source[i] != '\0'; i++)
{
char c = source[i];
int l;
for (l = 0; l < len; l++)
{
if (c == set[l])
return (unsigned char)set[l];
}
}
return -1;
}
If you want to use a standard function, you probably want strcspn(), a much-neglected part of Standard C. This version returns 0 (that is, '\0') if there is no match, unlike the other functions, which return -1.
int find(char source[], char set[])
{
size_t i = strcspn(source, set);
return (unsigned char)source[i];
}
If the negative return is important, then you'd use:
int find(char source[], char set[])
{
size_t i = strcspn(source, set);
return (source[i] == '\0') ? -1 : (unsigned char)source[i];
}
Or you could use strpbrk():
int find(char source[], char set[])
{
char *tgt = strpbrk(source, set);
if (tgt == 0)
return -1;
return (unsigned char)*tgt;
}
And there are probably other variants I've not thought of.
If you want to keep the inner do/while loop (thus fixing up the original logic) you can write:
int find(char source[], char set[])
{
int i;
int len = strlen(set) - 1;
for (i = 0; source[i] != '\0'; i++)
{
int l = len;
do
{
if (source[i] == set[l])
return (unsigned char)set[l];
} while (--l >= 0);
}
return -1;
}
This avoids testing set[strlen(set)], which is by definition '\0', with source[i] which is known not to be '\0'. It still uses an early return which radically simplifies the code (no exit variable, which is not a good name to use since there's a standard function exit() too). Note, too, how this keeps the loop control for the variable i all in the for statement; that is one of its principal virtues. You should aim to use it whenever possible.
Note that the original code scans from the end of set to the beginning instead of from the beginning to the end. Both methods work essentially equally well, but starting at the beginning and ending at the end is the more conventional way to work. It is also less apt to create bugs. If someone changes the type of l to size_t, then it never goes negative, so the do/while variant fails. The original proposed version will work fine if every int in the body of the function is changed to size_t.
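To illustrate the size_t point, here is a sketch of a backward scan that stays correct with unsigned indices. The name find_backward is hypothetical. The test l-- > 0 compares before decrementing, so the loop stops without relying on l ever becoming negative:

```c
#include <string.h>

/* Hypothetical variant: scans set from its last character to its first,
   using size_t throughout. */
int find_backward(const char source[], const char set[])
{
    size_t len = strlen(set);
    for (size_t i = 0; source[i] != '\0'; i++)
    {
        /* l takes the values len-1, len-2, ..., 0 inside the body */
        for (size_t l = len; l-- > 0; )
        {
            if (source[i] == set[l])
                return (unsigned char)set[l];
        }
    }
    return -1;
}
```

With an empty set the inner loop body never runs, so the function returns -1 without reading set[-1] or comparing against the terminator.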

I think, instead of
while (!l);
what you want is
while (l);
while (l > -1); //yeah, because array index starts from 0.
because you are decrementing the value of l and want to continue looping until l becomes less than 0, right?
Also, the return type of function find() is int and you're returning a char. Even though it is not an error, you may want to change that.

Related

Difference b/w using i<strlen() and str[i] != '\0'

When I use for(i=0;i<strlen(s);i++), I get a time limit exceeded error. When I use for(i=0;s[i]!='\0';i++), my code is submitted successfully. Why?
I am also providing link of question from codechef - https://www.codechef.com/problems/LCPESY
Type 1:
for (i = 0; i < strlen(s1); i++) {
f1[s1[i]]++;
}
for (i = 0; i < strlen(s2); i++) {
f2[s2[i]]++;
}
Type 2:
for (i = 0; s1[i] != '\0'; i++) {
f1[s1[i]]++;
}
for (i = 0; s2[i] != '\0'; i++) {
f2[s2[i]]++;
}
Complete code:
#include <stdio.h>
#include <string.h>
long int min(long int a, long int b) {
if (a >= b)
return b;
else
return a;
}
int main(void) {
// your code goes here
int t;
scanf("%d", &t);
while (t--) {
char s1[10001], s2[10001];
scanf("%s%s", s1, s2);
long int f1[200] = { 0 }, f2[200] = { 0 }, i, count = 0;
for (i = 0; i < strlen(s1); i++) {
f1[s1[i]]++;
}
for (i = 0; i < strlen(s2); i++) {
f2[s2[i]]++;
}
for (i = 0; i < 200; i++) {
count += min(f1[i], f2[i]);
}
printf("%ld\n", count);
}
return 0;
}
If a non-optimizing compiler is used, strlen may be re-evaluated on every iteration. Each call then needs to compare every character in the string with 0 until it finds the terminator. This results in quadratic runtime, with O(n²) checks for the terminating null instead of just the necessary O(n). In the strlen code the timeout happens because it does perhaps 2,000,000 null checks and 10,000 other operations; the other code would do 2,000 null checks and those same 10,000 other operations and not time out.
However, this need not be the case. Due to the as-if rule, a C compiler can generate exactly equivalent machine code for the cases
for (i = 0; i < strlen(s1); i++){
f1[s1[i]] ++;
}
and
for (i = 0; s1[i] != '\0'; i++) {
f1[s1[i]] ++;
}
because a compiler can easily prove that the inner loop cannot possibly change s1 and therefore both forms would behave equivalently.
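If you would rather not depend on the optimizer, the length can be computed once before the loop. A minimal sketch, assuming a helper named count_chars (not from the question) that fills a caller-supplied frequency table:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds the character counts of s to freq.
   strlen() is called exactly once, so the loop is O(n) with any compiler. */
void count_chars(const char *s, long freq[UCHAR_MAX + 1])
{
    size_t n = strlen(s);              /* single scan for the terminator */
    for (size_t i = 0; i < n; i++)
        freq[(unsigned char)s[i]]++;   /* cast keeps the index non-negative */
}
```

The caller is expected to zero the table first, for example with long freq[UCHAR_MAX + 1] = { 0 };.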
In addition to @Antti Haapala's good answer:
Difference b/w using i<strlen() and str[i] != '\0'
Code like int i; ... i < strlen(s1) readily draws a warning about mismatched signedness when such warnings are enabled. Usually inoffensive code like that discourages wide use of that warning, so I see it as the less preferred approach. str[i] != '\0' does not cause that warning.
Some other concerns
Prevent buffer overflow
char s1[10001], s2[10001];
// scanf("%s%s", s1, s2);
if (scanf("%10000s%10000s", s1, s2) == 2) {
// OK, success, lets go!
There are more than 200 characters.
// long int f1[200] = { 0 };
long int f1[256] = { 0 };
// or better
long int f1[UCHAR_MAX + 1] = { 0 };
Avoid a negative index
// f1[s1[i]]++;
f1[(unsigned char) s1[i]]++;
or use unquestionable unsigned types.
// char s1[10001];
unsigned char s1[10001];
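Putting these concerns together, the counting part of the program might look roughly like the sketch below. The function name common_count is an assumption for illustration. It sizes the tables with UCHAR_MAX, indexes them through unsigned char, and loops until the null character instead of calling strlen():

```c
#include <limits.h>

/* Hypothetical helper: for every character value, takes the smaller of its
   counts in s1 and s2 and returns the sum, as the question's program does. */
long common_count(const char *s1, const char *s2)
{
    long f1[UCHAR_MAX + 1] = { 0 };
    long f2[UCHAR_MAX + 1] = { 0 };
    long count = 0;

    for (; *s1 != '\0'; s1++)
        f1[(unsigned char)*s1]++;      /* never a negative index */
    for (; *s2 != '\0'; s2++)
        f2[(unsigned char)*s2]++;

    for (int c = 0; c <= UCHAR_MAX; c++)
        count += (f1[c] < f2[c]) ? f1[c] : f2[c];
    return count;
}
```

The strings passed in would still be read with a width-limited scanf() as shown above.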

Is there a way if string repeats to return only repeated letters once?

I wrote code which, for the string "aabbcc", returns "abc", but when a letter occurs more often, as in "aaa", it returns "aa" instead of just one "a".
Here is the code I made.
void Ponavljanje(char *s, char *p) {
int i, j = 0, k = 0, br = 0, m = 0;
for (i = 0; i < strlen(s) - 1; i++) {
for (j = i + 1; j < strlen(s); j++) {
if (s[i] == s[j]) {
br++;
if (br == 1) {
p[k++] = s[i];
}
}
}
br = 0;
}
p[k] = '\0';
puts(p);
}
For "112233" output should be "123" or for "11122333" it should be also "123".
Avoid repeated calls to strlen(s). A weak compiler may not see that s is unchanged and call strlen(s) many times, each call incurring a cost of n operations, which is quite inefficient (@arkku). [1] Instead, simply stop iterating when the null character is detected.
Initialize a boolean list of flags for all char to false. When a character occurs, set the flag to prevent subsequent usage. Be careful when indexing that list as char can be negative.
Using const char *s allows for wider application (the function then also accepts pointers to const data) and can help the compiler optimize.
Example:
#include <stdbool.h>
#include <limits.h>
void Ponavljanje(const char *s, char *p) {
const char *p_original = p;
bool occurred[CHAR_MAX - CHAR_MIN + 1] = { 0 }; // all values set to 0 (false)
while (*s) {
if (!occurred[*s - CHAR_MIN]) {
occurred[*s - CHAR_MIN] = true;
*p++ = *s;
}
s++;
}
*p = '\0';
puts(p_original);
}
[1] @wrongway4you comments that many compilers may assume the string did not change and optimize out the repeated strlen() call. A compliant compiler cannot do that, though, without restrict, unless it is known that s and p do not overlap in any call. Otherwise the compiler must assume that writes through p may affect s, which warrants a repeated strlen() call.
The following does the work with O(n) complexity. I suppose "programming" should give "rmg", that is, each letter that repeats, reported once:
void Ponavljanje(char *s,char *p)
{
char n[256] = {0};
int i = 0;
while (*s) {
switch (n[(unsigned char) *s]) {
case 0:
n[(unsigned char) *s] = 1;
break;
case 1:
p[i++] = *s;
n[(unsigned char) *s] = 2;
}
s += 1;
}
p[i] = 0;
puts(p);
}
While the inner loop checks br to only copy the output on the first repetition, the outer loop still passes over each repetition in s on future iterations. Hence each further occurrence of the same character will run a separate inner loop after br has already been reset.
With aaa as the input, both the first and the second a cause the inner loop to find a repetition, giving you aa. In fact, you always get one occurrence fewer of each character in the output than there is in the input, which means it only works for 1 or 2 occurrences in the input (resulting in 0 and 1 occurrences, respectively, in the output).
If you only want to remove the successive double letters, then this function would be sufficient, and the examples given in the question would fit:
#include <stdio.h>
void Ponavljanje(char *s,char *p)
{
char dd = '\0';
char *r;
if(s == NULL || p == NULL)
return;
r = p;
while(*s){
if(*s != dd){
*r = *s;
dd = *s;
r++;
}
s++;
}
*r = '\0';
puts(p);
}
int main(void)
{
char s[20] = "1111332222";
char p[20];
Ponavljanje(s,p);
}
Here is something that works regardless of order:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
int slen;
int sidx;
int pidx;
int plen;
int schr;
slen = strlen(s);
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx];
// look for duplicate char
int dupflg = 0;
for (pidx = 0; pidx < plen; ++pidx) {
if (p[pidx] == schr) {
dupflg = 1;
break;
}
}
// skip duplicate chars
if (dupflg)
continue;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
Note: As others have mentioned, strlen should not be placed in the loop condition clause of the for [because the length of s is invariant]. Save strlen(s) to a separate variable and loop to that limit
Here is a different/faster version that uses a histogram so that only a single loop is required:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
int slen;
int sidx;
int pidx;
int plen;
int schr;
slen = strlen(s);
sidx = 0;
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx] & 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
UPDATE #2:
I would suggest iterating until the terminating NUL byte
Okay, here's a full pointer version that is as fast as I know how to make it:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
char *pp;
int schr;
pp = p;
for (schr = *s++; schr != 0; schr = *s++) {
schr &= 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
*pp++ = schr;
}
*pp = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}

Can someone help me figure out what's wrong with this program? (C)

I am doing the easy challenge on /r/dailyprogrammer in C. I actually managed to write over a hundred lines of code, spend a couple of hours total on it (usually I end up chickening out), and figure out all the compiler errors. But now, when I run it, I immediately get a segfault. What am I doing wrong?
Yes, it's sort of homework help, but at least I tried before coming here.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define MAXLEN 50
#define LIMIT 20
#define TRUE 1
#define FALSE 0
char* reverse(char *a);
char* ltoa(long i);
long atol(char *a); /* NOTE: Handle leading zeros. */
long palindromize(long p);
int ispalindrome(long p);
/* Meat. */
int main(int argc, char *argv[])
{
long p;
int count, limr;
p = (long) argv[1];
count = 0;
limr = FALSE;
while (TRUE)
{
p = palindromize(p);
count++;
if (ispalindrome(p))
{
break;
} else if (count == LIMIT) {
limr = TRUE;
break;
}
}
if (limr)
{
printf("It might be a palindrome, but it'd take quite a while to find out.\nLast number reached: %ld\n", p);
} else {
printf("Palindrome found! After %d steps, we've found %ld.\n", count, p);
}
}
long palindromize(long p)
{
return (atol(reverse(ltoa(p)))) + p;
}
int ispalindrome(long p)
{
char *t, *r;
t = ltoa(p);
r = reverse(ltoa(p));
if (t == r)
{
return TRUE;
} else {
return FALSE;
}
}
/* Utility functions. */
/* Converts string to long integer. */
long atol(char *a)
{
int i, sign;
long r;
for (i = 0; a[i] == '0'; i++)
{
i++;
}
if (a[0] == '-' || a[-1] == '-')
{
sign = -1;
}
else
{
sign = 1;
}
for (; isdigit(a[i]); i++)
{
r = 10 * r + (a[i] - '0');
}
return r * sign;
}
/* Converts long integer to string.
This and reverse are based on the ones in K&R. */
char* ltoa(long n)
{
char *a;
int i, sign;
if ((sign = n) < 0)
{
n = -n;
}
i = 0;
do
{
a[i++] = n % 10 + '0';
} while ((n /= 10) > 0);
if (sign < 0)
{
a[i++] = '-';
}
a[i] = '\0';
return reverse(a);
}
char* reverse(char *s)
{
int i, j;
char c;
for (i = 0, j = strlen(s)-1; i<j; i++, j--) {
c = s[i];
s[i] = s[j];
s[j] = c;
}
return s;
}
In ltoa you declare char *a; but never malloc any space for it. When you access a[i] it crashes. For problems like this, remember a simple first debugging step is to add print statements everywhere so you can at least pinpoint where the error is occurring.
Looks like your ispalindrome is wrong too, but not in a seg fault way. I'll let you figure out why =D
char* ltoa(long n)
{
char *a;
int i, sign;
This is, in fact, a very common error. char *a is a pointer to a character (possibly the first of a character array). But you don't tell it where the characters are in memory, so it points to some random location.
You can either reserve characters locally by using char a[100] or so, but that will land you with problems on returning from the function, because the array no longer exists once the function returns. Or you can reserve memory with char *a = (char *)malloc(100), in which case the caller has to free it. Or you could consider palindromizing the input string itself, and leave the memory problems to the caller.
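A minimal sketch of the "leave the memory to the caller" option, under the assumption that the conversion is done with snprintf(). The names long_to_str and is_palindrome are hypothetical, not the question's functions. The caller supplies the buffer, so nothing points at unallocated or expired storage:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for the question's ltoa(): writes the decimal
   form of n into buf, which the caller owns, and returns buf. */
char *long_to_str(long n, char *buf, size_t size)
{
    snprintf(buf, size, "%ld", n);     /* always null-terminates if size > 0 */
    return buf;
}

/* Compares the digits themselves from both ends, not two pointers. */
int is_palindrome(long p)
{
    char digits[32];                   /* ample for a 64-bit long with sign */
    long_to_str(p, digits, sizeof digits);

    size_t len = strlen(digits);
    for (size_t i = 0; i < len / 2; i++)
    {
        if (digits[i] != digits[len - 1 - i])
            return 0;
    }
    return 1;
}
```

Because digits is a local array used only inside is_palindrome(), no pointer to it outlives the function.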

Implementation of strspn( )

The definition of library function strspn is:
size_t strspn(const char *str, const char *chars)
/* Return number of leading characters at the beginning of the string `str`
which are all members of string `chars`. */
e.g. if str is "fecxdy" and chars is "abcdef" then the function would return 3, since f, e and c all appear somewhere in chars, giving 3 leading characters of str, and x is the first character of str which is not a member of chars.
Could someone help me write an implementation of strspn in C? The only library function I am allowed to call from the implementation is strlen.
The basic idea is to step through the string, one character at a time, and test if it's in the character set. If it's not, stop and return the answer. In pseudocode, that would look like:
count = 0
for each character c in str
    if c is not in chars
        break
    count++
return count
The if c is not in chars test can be implemented by iterating through all of the characters of chars and testing if c matches any of the characters. Note that this is not the fastest implementation, since it involves stepping through the chars string for each character in str. A faster implementation would use a lookup table to test if c is not in chars.
I found this question while going over old exams. You weren't allowed to use indexing or any standard functions. Here's my attempt at a solution:
#include <stdio.h>
size_t myStrspn(const char *str1, const char *str2){
size_t i,j;
i=0;
while(*(str1+i)){
j=0;
while(*(str2+j)){
if(*(str1+i) == *(str2+j)){
break; //Found a match.
}
j++;
}
if(!*(str2+j)){
return i; //No match found.
}
i++;
}
return i;
}
int main(void){
char s[] = "7803 Elm St.";
size_t n = myStrspn(s,"1234567890"); // size_t matches the return type
printf("The number length is %zu.\n",n);
return 0;
}
Here's the solution from the exam:
#include<stdio.h>
size_t strspn(const char* cs, const char* ct) {
size_t n;
const char* p;
for(n=0; *cs; cs++, n++) {
for(p=ct; *p && *p != *cs; p++)
;
if (!*p)
break;
}
return n;
}
For loops made it much more compact.
I think this should be pretty fast
size_t strspn(const unsigned char *str, const unsigned char *chars){
unsigned char ta[32]={0};
size_t i;
for(i=0;chars[i];++i)
ta[chars[i]>>3]|=0x1<<(chars[i]%8);
for(i=0;((ta[str[i]>>3]>>(str[i]%8))&0x1);++i);
return i;
}
Thanks to others for sanity checks.
A naive implementation of strspn() would iterate on the first string, as long as it finds the corresponding character in the second string:
#include <string.h>
size_t strspn(const char *str, const char *chars) {
size_t i = 0;
while (str[i] && strchr(chars, str[i]))
i++;
return i;
}
Given that you are not allowed to call strchr(), here is a naive native implementation:
size_t strspn(const char *str, const char *chars) {
size_t i, j;
for (i = 0; str[i] != '\0'; i++) {
for (j = 0; chars[j] != str[i]; j++) {
if (chars[j] == '\0')
return i; // char not found, return index so far
}
}
return i; // complete string matches, return length
}
Scanning the second string repeatedly can be costly. Here is an alternative that combines different methods depending on the length of chars, assuming 8-bit bytes:
size_t strspn(const char *str, const char *chars) {
size_t i = 0;
char c = chars[0];
if (c != '\0') { // if second string is empty, return 0
if (chars[1] == '\0') {
// second string has single char, use a simple loop
while (str[i] == c)
i++;
} else {
// second string has more characters, construct a bitmap
unsigned char x, bits[256 / 8] = { 0 };
for (i = 0; (x = chars[i]) != '\0'; i++)
bits[x >> 3] |= 1 << (x & 7);
// iterate while characters are found in the bitmap
for (i = 0; (x = str[i]), (bits[x >> 3] & (1 << (x & 7))); i++)
continue;
}
}
return i;
}
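int my_strspn(const char *str1,const char *str2){
int i,k,counter=0;
for(i=0;str1[i]!='\0';i++){
if(counter != i) break; // previous character was not in str2
for(k=0;str2[k]!='\0';k++){
if(str1[i]==str2[k]){
counter++;
break; // count each character once, even if str2 repeats it
}
}
}
return counter;
}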
Create a lookup table (a poor man's set) for all possible ASCII chars, and just lookup each character in str. This is worst case O(max(N,M)), where N is the number of characters in str and M is the number of characters in chars.
#include <string.h>
size_t strspn(const char *str, const char *chars) {
size_t i;
char ch[256] = {0};
for (i = 0; i < strlen(chars); i++) {
ch[(unsigned char)chars[i]] = 1; // cast avoids a negative index
}
for (i = 0; i < strlen(str); i++) {
if (ch[(unsigned char)str[i]] == 0) {
break;
}
}
return i;
}
This could also be solved without using strlen at all, assuming both strings are zero-terminated. The disadvantage of this solution is that one needs 256 bytes of memory for the lookup table.
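A sketch of that strlen-free variant, assuming null-terminated inputs. The name my_strspn_table is made up here to avoid clashing with the library function. Since the entry for '\0' is never set, the terminator of str ends the second loop on its own:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical name: lookup-table strspn that never calls strlen(). */
size_t my_strspn_table(const char *str, const char *chars)
{
    unsigned char in_set[UCHAR_MAX + 1] = { 0 };
    size_t i;

    /* mark every character of chars; stops at its terminator */
    for (i = 0; chars[i] != '\0'; i++)
        in_set[(unsigned char)chars[i]] = 1;

    /* in_set[0] stays 0, so the terminator of str also stops this loop */
    for (i = 0; in_set[(unsigned char)str[i]]; i++)
        continue;

    return i;
}
```

Each string is traversed once, so the work is proportional to the combined length of the two strings plus the fixed cost of clearing the table.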
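I have not touched a C compiler for the last couple of years, but off the top of my head something like this should work:
// function body; str and chars are the const char * parameters, bool needs <stdbool.h>
size_t spn = 0;
while(*str != '\0')
{
const char *hay = chars;
bool match = false;
while(*hay != '\0')
{
if(*hay == *str)
{
match = true;
break;
}
hay++; // advance only after comparing, so chars[0] is tested too
}
if(!match)
return spn;
spn++;
str++; // advance only after the current character has been checked
}
return spn;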
Well, implementing a standard library for my OS, here is my solution (C++).
KCSTDLIB_API_FUNC(size_t DECL_CALL strspn(const char * str1, const char * str2))
{
size_t count = 0;
auto isin = [&](char c)
{
for (size_t x = 0; str2[x]; x++)
{
if (c == str2[x])
return true;
};
return false;
};
for (; isin(str1[count]); count++);
return count;
}

memcmp C implementation - any logical errors with this one

I was looking for an implementation of memcmp() and found this code snippet, but it is clearly marked as containing one logical error. Could you help me find the logical error?
Basically, I tested this code against the string.h library implementation of memcmp() with different inputs, but the expected output is always the same as the library version of the function.
Here is the code snippet:
#include <stdio.h>
#include <string.h>
int memcmp_test(const char *cs, const char *ct, size_t n)
{
size_t i;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
else
{
return 0;
}
}
}
int main()
{
int ret_val = 20; //initialize with non-zero value
char *string1 = "china";
char *string2 = "korea";
ret_val = memcmp_test(string1,string2,5);
printf ("ret_val is = %d",ret_val);
getchar();
return 0;
}
I ran the program with the two example strings and the program would return just after the comparison of the first characters of the two strings. ret_val is -1 in the above case.
The definition of memcmp() which the above code snippet should conform to is:
The definition of the ‘C’ Library function memcmp is
int memcmp(const char *cs, const char *ct, size_t n)
Compare the first n characters of cs with the first n characters of ct.
Return < 0 if cs < ct.
Return > 0 if cs > ct.
Return 0 if cs == ct.
There is definitely one LOGICAL error; could you help me find it?
As it's written now, this code will only test the first byte of the inputs. The else return 0 needs to be moved out of the loop, leaving return 0 at the end:
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}
I guess since char signedness is implementation defined, you could make your comparison unsigned:
int memcmp_test(const char *cs_in, const char *ct_in, size_t n)
{
size_t i;
const unsigned char * cs = (const unsigned char*) cs_in;
const unsigned char * ct = (const unsigned char*) ct_in;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}
Look at your for loop. It's only examining one character.
Strictly speaking the signature is wrong. The correct one is:
int memcmp(const void *s1, const void *s2, size_t n);
Your code compares c and k and on finding that c is less than k dutifully returns a -1. However, if these two were equal you would get an incorrect result since you are returning early.
If you read the documentation you'd find:
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the objects being compared
Which basically means you are doing the right thing by returning something that preserves the sign of ('c' - 'k').
A simpler implementation can be found here.
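For reference, a sketch that combines the points above: the const void * signature, the comparison of bytes as unsigned char, and a return of 0 only after all n bytes have matched. It is named my_memcmp here to avoid clashing with the library function, and it is one possible implementation, not the library's:

```c
#include <stddef.h>

/* Hypothetical name: byte-wise comparison with memcmp() semantics. */
int my_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    for (size_t i = 0; i < n; i++)
    {
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;   /* sign of the first difference */
    }
    return 0;   /* reached only when all n bytes are equal */
}
```

With the question's inputs, my_memcmp("china", "korea", 5) is negative because 'c' is less than 'k', while my_memcmp("abc", "abd", 2) is 0 because only the first two bytes are compared.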
The return 0; occurs after only comparing the first character. It should be placed outside of the loop.
This snippet works perfect for me!!
#include <stdio.h>
#include <string.h>
int memcmp_test(const char *cs, const char *ct, size_t n)
{
size_t i;
for (i = 0; i < n; i++, cs++, ct++)
{
if (*cs < *ct)
{
return -1;
}
else if (*cs > *ct)
{
return 1;
}
}
return 0;
}
int main()
{
int ret_val = 20; //initialize with non-zero value
const char *string1 = "DWgaOtP12df0";
const char *string2 = "DWGAOTP12DF0";
ret_val = memcmp_test(string1,string2,5);
printf ("ret_val is = %d",ret_val);
getchar();
return 0;
}

Resources