Need help creating a FindMaxOverlap function - c

I'm trying to create a function that, given two C strings, returns the length of the longest run of consecutive characters that the two strings have in common.
For example,
String 1: "Today is monday."
String 2: " is monday."
The overlap here would be " is monday.", which is 11 characters (it includes the space and '.').

If you need something more efficient, consider that a partial mismatch between Strings 1 and 2 means you can jump the length of the remainder of String 2 along String 1. This means you don't need to search the entirety of String 1.
Take a look at the Boyer-Moore algorithm. Though it is used for string searching, you could implement this algorithm for finding the maximum-length substring using String 2 as your pattern and String 1 as your target text.

There is probably a more efficient way to do this, but here's a simple approach:
#include <stdio.h>
#include <string.h>
int main() {
char s1[17] = "Today is monday.";
char s2[12] = " is monday.";
int max = 0;
int i_max = -1;
int j_max = -1;
int i = 0, j = 0, k=0;
int endl = 0, sl1, sl2;
char *ss1, *ss2;
for(i = 0; i < strlen(s1)-1; i++) {
ss1 = s1+i;
sl1 = strlen(ss1);
if(max >= sl1) {
break; // You found it.
}
for(j = 0; j < strlen(s2)-1; j++) {
ss2 = s2+j;
sl2 = strlen(ss2);
if(max >= sl2) {
break; // Can't find a bigger overlap.
}
endl = (sl1 > sl2)?sl2:sl1;
int n_char = 0;
for(k = 0; k < endl+1; k++) {
// printf("%s\t%s\n", ss1+k, ss2+k); // Uncomment if you want to see what it compares.
if(ss1[k] != ss2[k] || ss1[k] == '\0') {
n_char = k;
break;
}
}
if(n_char > max) {
max = n_char;
i_max = i;
j_max = j;
}
}
}
char nstr[max+1];
nstr[max] = '\0';
strncpy(nstr, s1+i_max, max);
printf("Maximum overlap is %d characters, substring: %s\n", max, nstr);
return 0;
}
Update: I have fixed the bugs. This definitely compiles. Here is the result: http://codepad.org/SINhmm7f
The problems were that endl was defined wrong and I wasn't checking for end-of-line conditions.
Hopefully the code speaks for itself.
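Since the question asks for a function rather than a main(), the same brute-force idea can be wrapped up as below. This is only a minimal sketch: the name find_max_overlap and its signature are assumptions, not something from the answer above, and it returns just the length of the longest common run.

```c
#include <stddef.h>

/* Hypothetical helper: length of the longest run of consecutive
   characters that appears in both a and b (brute force). */
size_t find_max_overlap(const char *a, const char *b)
{
    size_t best = 0;
    for (size_t i = 0; a[i] != '\0'; i++) {
        for (size_t j = 0; b[j] != '\0'; j++) {
            size_t k = 0;
            /* Extend the match while both strings agree.
               If b ends first, b[j + k] is '\0' and the comparison fails. */
            while (a[i + k] != '\0' && a[i + k] == b[j + k]) {
                k++;
            }
            if (k > best) {
                best = k;
            }
        }
    }
    return best;
}
```

Calling find_max_overlap("Today is monday.", " is monday.") gives 11 for the example in the question.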

Here is my solution. It returns the position where the overlap starts. It's a bit complex, but that's how it's done in C:
#include <string.h>
int FindOverlap (const char * a, const char * b)
{
// iterators
const char * u = a;
const char * v = b;
const char * c = 0; // overlap iterator
char overlapee = 'b';
if (strlen(a) < strlen(b)) overlapee = 'a';
if (overlapee == 'b')
{
while (*u != '\0')
{
v = b; // reset b iterator
c = u;
while (*v != '\0')
{
if (*c != *v) break;
c++;
v++;
}
if (*v == '\0') return (u-a); // return overlap starting point
u++; // no match here, try the next starting position in a
}
}
else if (overlapee == 'a')
{
while (*v != '\0')
{
u = a; // reset a iterator
c = v;
while (*u != '\0')
{
if (*c != *u) break;
c++;
u++;
}
if (*u == '\0') return (v-b); // all of a matched: return overlap starting point
v++; // no match here, try the next starting position in b
}
}
return (-1); // not found
}
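If the goal is only to find where the shorter string occurs inside the longer one, the standard strstr() already performs that search. The sketch below assumes that reading of the problem; the name find_overlap_pos is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: offset in the longer string at which the
   shorter string occurs, or -1 if it does not occur at all. */
long find_overlap_pos(const char *a, const char *b)
{
    const char *haystack = a;
    const char *needle = b;

    /* Always search for the shorter string inside the longer one. */
    if (strlen(a) < strlen(b)) {
        haystack = b;
        needle = a;
    }

    const char *hit = strstr(haystack, needle);
    return hit != NULL ? (long)(hit - haystack) : -1;
}
```

For "Today is monday." and " is monday." this returns 5, the index of the space after "Today". Note that strstr() treats an empty needle as matching at offset 0.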


why can't my program recognize similar words in a string?

I want to write a program that will take an input T. In the next T lines, each line will take a string as an input. The output would be how many ways the string can be reordered.
#include <stdio.h>
#include <stdlib.h>
int main() {
int T, i, l, count = 1, test = 0, word = 0, ans;
char line[200];
scanf("%d", &T);
for (i = 0; i < T; i++) {
scanf(" %[^\n]", line);
l = strlen(line);
for (int q = 0; q < l; q++) {
if (line[q] == ' ') {
word++;
}
}
ans = fact(word + 1);
word = 0;
for (int j = 0; j < l; j++) {
for (int k = j + 1; k < l; k++) {
if (line[k] == ' ' && line[k + 1] == line[j]) {
int m = j;
int n = k + 1;
for (;;) {
if (line[m] != line[n]) {
break;
} else
if (line[m] == ' ' && line[n] == ' ') {
test = 1;
break;
} else {
m++;
n++;
}
}
if (test == 1) {
count++;
ans = ans / fact(count);
count = 0;
test = 0;
}
}
}
}
printf("%d\n", ans);
}
}
int fact(int n) {
if (n == 1) {
return 1;
} else {
return n * fact(n - 1);
}
}
Now, in my program,
my output is like this:
2
no way no good
12
yes no yes yes no
120
With T = 2 and the first string no way no good, it gives the right output, 12 (4!/2!). That means it has identified that there are two identical words.
But the second input string is yes no yes yes no, which has 3 yes and 2 no. So the answer should be 5!/(3!2!) = 10. Why is the answer 120, and why can't it recognize the repeated words?
The main problem in your duplicate detector is you test the end of word with if (line[m] == ' ' && line[n] == ' ') but this test fails to identify a duplicate that occurs with the last word because line[n] is '\0', not ' '.
Note these further problems:
you do not handle words that occur more than twice correctly: you should perform ans = ans / fact(count); only after the outer loop finishes. For example, if a word is present 3 times, it will be detected as 3 pairs of duplicates, effectively causing ans to be divided by 2^3 = 8, instead of 3! = 6.
you should protect against buffer overflow and detect invalid input with:
if (scanf(" %199[^\n]", line) != 1)
break;
the range of type int for ans is too small for a moderately large number of words: 13! is 6227020800, larger than INT_MAX on most systems.
The code is difficult to follow. You should consider parsing the line into an array of words and using a more conventional way of counting duplicates.
Here is a modified version using this approach:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int cmpstr(const void *p1, const void *p2) {
char * const *pp1 = p1;
char * const *pp2 = p2;
return strcmp(*pp1, *pp2);
}
unsigned long long factorial(int n) {
unsigned long long res = 1;
while (n > 1)
res *= n--;
return res;
}
int main() {
int T, i, n, begin, count;
unsigned long long ans;
char line[200];
char *words[100];
if (!fgets(line, sizeof line, stdin) || sscanf(line, "%d", &T) != 1)
return 1;
while (T --> 0) {
if (!fgets(line, sizeof line, stdin))
break;
n = 0;
begin = 1;
for (char *p = line; *p; p++) {
if (isspace((unsigned char)*p)) {
*p = '\0';
begin = 1;
} else {
if (begin) {
words[n++] = p;
begin = 0;
}
}
}
qsort(words, n, sizeof(*words), cmpstr);
ans = factorial(n);
for (i = 0; i < n; i += count) {
for (count = 1; i + count < n && !strcmp(words[i], words[i + count]); count++)
continue;
ans /= factorial(count);
}
printf("%llu\n", ans);
}
return 0;
}
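For reference, the quantity being computed is a multinomial coefficient. With n words in total, of which the distinct words occur c_1, ..., c_k times, the number of distinct orderings is shown below, together with the two inputs from the question as a check:

```latex
\frac{n!}{c_1!\, c_2! \cdots c_k!},
\qquad
\frac{4!}{2!\,1!\,1!} = \frac{24}{2} = 12,
\qquad
\frac{5!}{3!\,2!} = \frac{120}{12} = 10 .
```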

returning string in C function [closed]

I was trying to solve the CountAndSay problem on one of the online coding sites, but I can't work out why my program is printing NULL. I am sure I am making some conceptual mistake, but I can't find it.
Here is my code :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* countAndSay(int A) {
int i,j,k,f,count;
char a;
char *c = (char*)malloc(sizeof(char)*100);
char *temp = (char*)malloc(sizeof(char)*100);
c[0] = 1;c[1] = '\0';
for(k=2; k<=A; k++)
{
for(i=0, j=0; i<strlen(c); i++)
{
a = c[i];
count = 1;
i++;
while(c[i] != '\0')
{
if(c[i]==a)
{
count++;
i++;
}
else if(c[i] != a)
{
i--;
break;
}
else
{
break;
}
}
temp[j] = count;
temp[j+1] = a;
j += 2;
}
*(temp+j) = '\0';
if(k<A)
{
for(j=0; j<strlen(temp); j++)
{
c[j] = temp[j];
}
c[j] = '\0';
}
}
return temp;
}
int main(void) {
// your code goes here
char *c = countAndSay(8);
printf("%s\n",c);
return 0;
}
The idea is not that bad; the main errors are the mix-up of numerical digits and characters, as shown in the comments.
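To make the digit/character mix-up concrete: in the question, c[0] = 1 stores the byte value 1 (a non-printing control character), not the character '1', and temp[j] = count has the same problem. C guarantees that '0' through '9' are contiguous, so the conversion is a simple offset. The helper names below are hypothetical and only illustrate the point.

```c
/* Hypothetical helpers illustrating the digit/character distinction. */

/* Convert a numeric digit 0..9 to its character, e.g. 1 -> '1'. */
char digit_to_char(int d)
{
    return (char)('0' + d);
}

/* Convert a digit character back to its numeric value, e.g. '7' -> 7. */
int char_to_digit(char c)
{
    return c - '0';
}
```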
Also: if you use dynamic memory, then use dynamic memory. If you only want to use a fixed small amount you should use the stack instead, e.g.: c[100], but that came up in the comments, too. You also need only one piece of memory. Here is a working example based on your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// ALL CHECKS OMITTED!
char *countAndSay(int A)
{
int k, count, j;
// "i" gets compared against the output of
// strlen() which is of type size_t
size_t i;
char a;
// Seed needs two bytes of memory
char *c = malloc(2);
// Another pointer, pointing to the same memory later.
// Set to NULL to avoid an extra malloc()
char *temp = NULL;
// a temporary pointer needed for realloc()-ing
char *cp;
// fill c with seed
c[0] = '1';
c[1] = '\0';
if (A == 1) {
return c;
}
// assuming 1-based input, that is: the first
// entry of the sequence is numbered 1 (one)
for (k = 2; k <= A; k++) {
// Memory needed is twice the size of
// the former entry at most.
// (Averages to Conway's constant but that
// number is not usable here, it is only a limit)
cp = realloc(temp, strlen(c) * 2 + 1);
temp = cp;
for (i = 0, j = 0; i < strlen(c); i++) {
//printf("A i = %zu, j = %zu\n",i,j);
a = c[i];
count = 1;
i++;
while (c[i] != '\0') {
if (c[i] == a) {
count++;
i++;
} else {
i--;
break;
}
}
temp[j++] = count + '0';
temp[j++] = a;
//printf("B i = %zu, j = %zu\n",i,j-1)
//printf("B i = %zu, j = %zu\n",i,j);
}
temp[j] = '\0';
if (k < A) {
// Just point "c" to the new sequence in "temp".
// Why does this work and temp doesn't overwrite c later?
// Or does it *not* always work and fails at one point?
// A mystery! Try to find it out! Some hints in the code.
c = temp;
temp = NULL;
}
// intermediate results:
//printf("%s\n\n",c);
}
return temp;
}
int main(int argc, char **argv)
{
// your code goes here
char *c = countAndSay(atoi(argv[1]));
printf("%s\n", c);
free(c);
return 0;
}
To get a way to check for sequences not in the list over at OEIS, I rummaged around in my attic and found this little "gem":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <limits.h>
char *conway(char *s)
{
char *seq;
char c;
size_t len, count, i = 0;
len = strlen(s);
/*
* Worst case is twice as large as the input, e.g.:
* 1 -> 11
* 21 -> 1211
*/
seq = malloc(len * 2 + 1);
if (seq == NULL) {
return NULL;
}
while (len) {
// counter for occurrences of ...
count = 0;
// ... this character
c = s[0];
// as long as the string "s"
while (*s != '\0' && *s == c) {
// move pointer to next character
s++;
// increment counter
count++;
// decrement the length of the string
len--;
}
// to keep it simple, fail if count > 9
// but that cannot happen with a seed of 1
// which is used here.
// For other seeds it might be necessary to
// use a map with the higher digits as characters.
// If it is not possible to fit it into a
// character, the approach with a C-string is
// obviously not reasonable anymore.
if (count > 9) {
free(seq);
return NULL;
}
// append counter as a character
seq[i++] = (char) (count + '0');
// append character "c" from above
seq[i++] = c;
}
// return a proper C-string
seq[i] = '\0';
return seq;
}
int main(int argc, char **argv)
{
long i, n;
char *seq0, *seq1;
if (argc != 2) {
fprintf(stderr, "Usage: %s n>0\n", argv[0]);
exit(EXIT_FAILURE);
}
// reset errno, just in case
errno = 0;
// get amount from commandline
n = strtol(argv[1], NULL, 0);
if ((errno == ERANGE && (n == LONG_MAX || n == LONG_MIN))
|| (errno != 0 && n == 0)) {
fprintf(stderr, "strtol failed: %s\n", strerror(errno));
exit(EXIT_FAILURE);
}
if (n <= 0) {
fprintf(stderr, "Usage: %s n>0\n", argv[0]);
exit(EXIT_FAILURE);
}
// allocate space for seed value "1" plus '\0'
// If the seed is changed the limit in the conway() function
// above might need a change.
seq0 = malloc(2);
if (seq0 == NULL) {
fprintf(stderr, "malloc() failed to allocate a measly 2 bytes!?\n");
exit(EXIT_FAILURE);
}
// put the initial value into the freshly allocated memory
strcpy(seq0, "1");
// print it, nicely formatted
/*
* putc('1', stdout);
* if (n == 1) {
* putc('\n', stdout);
* free(seq0);
* exit(EXIT_SUCCESS);
* } else {
* printf(", ");
* }
*/
if (n == 1) {
puts("1");
free(seq0);
exit(EXIT_SUCCESS);
}
// adjust count
n--;
for (i = 0; i < n; i++) {
// compute conway sequence as a recursion
seq1 = conway(seq0);
if (seq1 == NULL) {
fprintf(stderr, "conway() failed, probably because malloc() failed\n");
exit(EXIT_FAILURE);
}
// make room
free(seq0);
seq0 = NULL;
// print sequence, comma separated
// printf("%s%s", seq1, (i < n - 1) ? "," : "\n");
// or print sequence and length of sequence, line separated
// printf("%zu: %s%s", strlen(seq1), seq1, (i < n-1) ? "\n\n" : "\n");
// print the endresult only
if (i == n - 1) {
printf("%s\n", seq1);
}
// reuse seq0
seq0 = seq1;
// not necessary but deemed good style by some
// although frowned upon by others
seq1 = NULL;
}
// free the last memory
free(seq0);
exit(EXIT_SUCCESS);
}

How to avoid duplicates when finding all k-length substrings

I want to display all substrings with k letters, one per line, but avoid duplicate substrings. I managed to write to a new string all the k length words with this code:
void subSent(char str[], int k) {
int MaxLe, i, j, h, z = 0, Length, count;
char stOu[1000] = {'\0'};
Length = (int)strlen(str);
MaxLe = maxWordLength(str);
if((k >= 1) && (k <= MaxLe)) {
for(i = 0; i < Length; i++) {
if((int)str[i] == 32) {
j = i = i + 1;
} else {
j = i;
}
for(; (j < i + k) && (Length - i) >= k; j++) {
if((int)str[j] != 32) {
stOu[z] = str[j];
} else {
stOu[z] = str[j + 1];
}
z++;
}
stOu[z] = '\n';
z++;
}
}
}
But I'm struggling with the part that should keep only one occurrence of each substring.
For example, the string HAVE A NICE DAY
and k = 1 it should print:
H
A
V
E
N
I
C
D
Y
Your subSent() routine poses a couple of challenges: first, it neither returns nor prints its result -- you can only see it in the debugger; second, it calls maxWordLength(), which you didn't supply.
Although avoiding duplicates can be complicated, in the case of your algorithm, it's not hard to do. Since all your words are fixed length, we can walk the output string with the new word, k letters (plus a newline) at a time, doing strncmp(). In this case the new word is the last word added so we quit when the pointers meet.
I've reworked your code below and added a duplication elimination routine. I didn't know what maxWordLength() does so I just aliased it to strlen() to get things running:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#define maxWordLength strlen
// does the last (fixed size) word in string appear previously in string
bool isDuplicate(const char *string, const char *substring, size_t n) {
for (const char *pointer = string; pointer != substring; pointer += (n + 1)) {
if (strncmp(pointer, substring, n) == 0) {
return true;
}
}
return false;
}
void subSent(const char *string, int k, char *output) {
int z = 0;
size_t length = strlen(string);
int maxLength = maxWordLength(string);
if (k >= 1 && k <= maxLength) {
for (int i = 0; i < length - k + 1; i++) {
int start = z; // where does the newly added word begin
for (int j = i; (z - start) < k; j++) {
output[z++] = string[j];
while (string[j + 1] == ' ') {
j++; // assumes leading spaces already dealt with
}
}
output[z++] = '\n';
if (isDuplicate(output, output + start, k)) {
z -= k + 1; // last word added was a duplicate so back it out
}
while (string[i + 1] == ' ') {
i++; // assumes original string doesn't begin with a space
}
}
}
output[z] = '\0'; // properly terminate the string
}
int main() {
char result[1024];
subSent("HAVE A NICE DAY", 1, result);
printf("%s", result);
return 0;
}
I somewhat cleaned up your space avoidance logic but it can be tripped by leading spaces on the input string.
OUTPUT
subSent("HAVE A NICE DAY", 1, result);
H
A
V
E
N
I
C
D
Y
subSent("HAVE A NICE DAY", 2, result);
HA
AV
VE
EA
AN
NI
IC
CE
ED
DA
AY
subSent("HAVE A NICE DAY", 3, result);
HAV
AVE
VEA
EAN
ANI
NIC
ICE
CED
EDA
DAY

Segmentation fault in base number program?

I keep trying to test this code but I keep getting a segmentation fault in my power() function. The code is supposed to take a word made up of lowercase letters and change the word to a number of base 10. The word is supposed to take on the form of a number of base 20, where 'a' = 0, 'b' = 1,...., 't' = 19;
int power(int i){
if(i==1){
return 20;
}else{
return 20*power(i--);
}
}
int main(){
int len;
char mayan[6];
int n;
int val;
while(scanf("%s", mayan)){
val = 0;
n = 0;
for(len = 0; mayan[len] != '\0'; len++){
mayan[len] = tolower(mayan[len]);
mayan[len] = mayan[len] - 'a';
}
for(i = 0; len >= 0; len--, i++){
if(mayan[len] <= 19){
n = n + mayan[len] * power(i);
}else{
fprintf(stderr, "Error, not a base 20 input \n");
val = 1;
break;
}
}
if(val==0){
printf("%d \n", n);
}
}
return val;
}
There were three mistakes in your code.
The case i == 0 is missing from the power function; any number to the power of zero is one, i.e. x^0 = 1.
Instead of return 20*power(i--); for your recursive call, use return 20*power(i-1);. i-- is the post-decrement operator: it yields the current value of i and only then decrements it, so power() is called again with the same argument and the recursion never terminates. That unbounded recursion is what overflows the stack and causes the segmentation fault. You don't want to change i at all here; you want to pass a value one less than i to the next call, which is what i-1 does.
Add a len-- to the initialization of the for(i = 0; len >= 0; len--, i++) loop, because after the previous loop len is one past the last index of the input.
Correcting these mistakes the final code is:
#include<stdio.h>
#include<ctype.h>
int power(int i)
{
if(i==0)
{
return 1;
}
if(i==1)
{
return 20;
}
else
{
return 20*power(i-1);
}
}
int main()
{
int len,i;
char mayan[6];
int n;
int val;
// stop at EOF or on a failed read; %5s keeps the input within mayan[6]
while(scanf("%5s", mayan) == 1)
{
val = 0;
n = 0;
for(len = 0; mayan[len] != '\0'; len++)
{
mayan[len] = tolower(mayan[len]);
mayan[len] = mayan[len] - 'a';
}
for(i = 0, len--; len >= 0; len--, i++)
{
if(mayan[len] <= 19)
{
n = n + mayan[len] * power(i);
}
else
{
fprintf(stderr, "Error, not a base 20 input \n");
val = 1;
break;
}
}
if(val==0)
{
printf("%d \n", n);
}
}
return val;
}
Note that your code only works for base-20 numbers of at most five digits, because the array mayan has size 6 and one element is needed for the terminating '\0'. I recommend increasing the size of mayan unless you only want to support five-digit base-20 numbers.
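As an alternative to the recursive power() function, the conversion can also be written with Horner's rule, multiplying the running value by 20 before adding each digit. This is a different technique from the one in the answer above, shown only as a sketch: the name base20_to_long is hypothetical, overflow is not checked, and like the original code it assumes the letters 'a' to 't' are contiguous in the character set.

```c
#include <ctype.h>

/* Hypothetical helper: interpret word as a base-20 number with
   'a' = 0 ... 't' = 19. Returns -1 for an empty or invalid word. */
long base20_to_long(const char *word)
{
    long n = 0;

    if (*word == '\0') {
        return -1;
    }
    for (; *word != '\0'; word++) {
        int d = tolower((unsigned char)*word) - 'a';
        if (d < 0 || d > 19) {
            return -1;          /* not a base-20 digit */
        }
        n = n * 20 + d;         /* Horner's rule: shift left, then add */
    }
    return n;
}
```

For example, "ba" is 1 * 20 + 0 = 20 and "tt" is 19 * 20 + 19 = 399.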

using array to store big numbers

I'm a newbie in C programming.
I have written this code for adding two numbers with 100 digits, but I don't know why it does not work correctly. It is supposed to propagate the carry, but it doesn't.
The other problem is that it ignores the first digit (the most significant digit).
Can anybody help me, please?
#include <stdio.h>
#include <ctype.h>
int sum[101] = {0};
int add(int a, int b);
void main()
{
static int a[100];
static int b[100];
char ch;
int i = 0;
int t;
for (t = 0; t != 100; ++t)
{
a[t] = 0;
}
for (t = 0; t != 100; ++t)
{
b[t] = 0;
}
do
{
ch = fgetc(stdin);
if ( isdigit(ch) )
{
a[i] = ch - 48;
++i;
}
else
break;
}
while (ch != '\n' || i == 100 || i != '\0');
i = 0;
do
{
ch = fgetc(stdin);
if ( isdigit(ch) )
{
b[i] = ch - 48;
++i;
}
else
break;
}
while (ch != '\n' || i == 100 || i != '\0');
for (;i!=0; --i)
{
add(a[i], b[i]);
}
for (i==0;i != 101; ++i)
{
printf("%d", sum[i]);
}
}
int add( int a , int b)
{
static int carry = 0;
float s = 0;
static int p = 101;
if (0 <= a+b+carry <= 9)
{
sum[p] = (a + b + carry);
carry = 0;
--p;
return 0;
}
else
{
if (10 <= a+b+carry < 20)
{
s = (((a+b+carry)/10.0 ) - 1) * 10 ;
carry = ((a+b+carry)/10.0) - (s/10);
}
else
{
s = (((a+b+carry)/10 ) - 2) * 10;
carry = ((a+b+carry)/10.0) - (s/10);
}
sum[p] = s;
--p;
return 0;
}
}
Your input loops have a serious problem. Also, you use i to count the length of both a and b, but you don't store the length of a. So if the user types two numbers of unequal length, you will get strange results.
The losing of the first digit is because of the loop:
for (;i!=0; --i)
This will execute for values i, i-1, i-2, ..., 1. It never executes with i == 0. The order of operations at the end of each iteration of a for loop is:
apply the third expression --i
test the second expression (the loop condition) i != 0
if the test succeeded, enter the loop body
Here is some fixed up code:
int a_len;
for (a_len = 0; a_len != 100; ++a_len)
{
int ch = fgetc(stdin); // IMPORTANT: int, not char
if ( ch == '\n' || ch == EOF )
break;
a[a_len] = ch - '0'; // store the digit's value, not its character code
}
Similarly for b. In fact it would be a smart idea to make this code be a function, instead of copy-pasting it and changing a to b.
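One possible shape for such a function is sketched below. The name read_digits and its parameters are assumptions, not part of the original answer; this version reads up to the end of the line, stores at most max digits, and silently skips any character that is not a digit.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read one line of digits from in into digits[],
   most significant digit first. Returns the number of digits stored. */
size_t read_digits(FILE *in, int *digits, size_t max)
{
    size_t len = 0;
    int ch;                         /* int, so EOF can be told apart */

    while ((ch = fgetc(in)) != EOF && ch != '\n') {
        if (len < max && isdigit(ch)) {
            digits[len++] = ch - '0';   /* digit value, not character code */
        }
    }
    return len;
}
```

It could then be called once per number, for example a_len = read_digits(stdin, a, 100); followed by b_len = read_digits(stdin, b, 100);.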
Once the input is complete, then you could write:
if ( a_len != b_len )
{
fprintf(stderr, "My program doesn't support numbers of different length yet\n");
exit(EXIT_FAILURE);
}
for (int i = a_len - 1; i >= 0; --i)
{
add(a[i], b[i]);
}
Moving on to the add function, there are more serious problems:
It's not even possible for the sum to reach 20: the largest value is 9 + 9 + 1 = 19, so the final else branch can never be needed.
Do not use floating point, it introduces inaccuracies. Instead, doing s = a+b+carry - 10; carry = 1; achieves what you want.
You write out of bounds of sum: an array of size [101] has valid indices 0 through 100. But p starts at 101.
NB. The way that large-number code normally tackles the problems of different size input, and some other problems, is to have a[0] be the least-significant digit; then you can just expand into the unused places as far as you need to go when you are adding or multiplying.
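The points above can be combined into a short sketch that stores digits least-significant first and uses only integer % and / for the carry. The name add_digits and its signature are assumptions for illustration. Note also that in C an expression such as 0 <= x <= 9 parses as (0 <= x) <= 9, which is always true, so range tests like the ones in the question need to be written with && (they are not needed at all here).

```c
#include <stddef.h>

/* Hypothetical helper: add two numbers stored as arrays of decimal
   digits, least significant digit at index 0. out must have room for
   max(a_len, b_len) + 1 digits. Returns the number of digits written. */
size_t add_digits(const int *a, size_t a_len,
                  const int *b, size_t b_len, int *out)
{
    size_t n = a_len > b_len ? a_len : b_len;
    size_t i;
    int carry = 0;

    for (i = 0; i < n; i++) {
        int s = carry;
        if (i < a_len) s += a[i];   /* shorter number is padded with zeros */
        if (i < b_len) s += b[i];
        out[i] = s % 10;            /* digit to keep */
        carry = s / 10;             /* 0 or 1 */
    }
    if (carry != 0) {
        out[i++] = carry;           /* result grew by one digit */
    }
    return i;
}
```

With a = {9, 9} (the number 99) and b = {1}, the result is {0, 0, 1}, which is 100, and the function returns 3. Because the digits are least-significant first, numbers of different lengths need no special handling.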
