C Program to count substrings - c

Why does my program always skip the last substring count?
eg1. String: dbdbsnasdb dbdxx
Substring: db
count: 4 (no error)
eg2. String: dbdbsnasmfdb
Substring: db
count: 2 (supposed to be 3)
Note: #include <stdio.h> only.
int countSubstr(char string[], char substring[]) {
int i, j;
int count = 0;
int subcount = 0;
for (i = 0; i <= strlen(string);) {
j = 0;
count = 0;
while ((string[i] == substring[j])) {
count++;
i++;
j++;
}
if (count == strlen(substring)) {
subcount++;
count = 0;
} else
i++;
}
return subcount;
}
Also, why must I set j and count to 0 inside the for loop? Is it because j has to start again from 0 (the substring stays the same) on every iteration?

Your inner loop (while) can continue comparing well past the null terminator of either string.
You need to stop it as soon as either string reaches its terminating null character.
Your outer loop condition has an off-by-one error. But you don't need the strlen call anyway. Just iterate until the null character.
You can also move the strlen(substring) call outside the loop to avoid potentially recalculating it.
A better version might look like:
int countSubstr(char string[], char substring[])
{
int subcount = 0;
size_t sub_len = strlen(substring);
if (!sub_len) return 0;
for (size_t i = 0;string[i];) {
size_t j = 0;
size_t count = 0;
while (string[i] && substring[j] && string[i] == substring[j]) {
count++;
i++;
j++;
}
if (count == sub_len) {
subcount++;
count = 0;
}
else {
i = i - j + 1; /* no match, so reset to the next index in 'string' */
}
}
return subcount;
}

There are some issues in your code:
The loop for (i = 0; i <= strlen(string);) recomputes the length of the string once per iteration of the loop and you iterate one time too far. You should instead write: for (i = 0; string[i] != '\0';)
The second loop may run beyond the end of the string and produce a value of count that is too large: it will produce at least 3 for the second example, as the null terminator is counted in all cases. This explains why you get an incorrect count of matches. The behavior is actually undefined, as you are reading beyond the end of both strings.
Here is a corrected version:
int countSubstr(char string[], char substring[]) {
int len = strlen(string);
int sublen = strlen(substring);
int i, j, count = 0;
for (i = 0; i <= len - sublen; i++) {
for (j = 0; j < sublen && string[i + j] == substring[j]; j++)
continue;
if (j == sublen)
count++;
}
return count;
}
Note that the number of occurrences of the empty string in any given string will come out as one plus the length of the string, which does make sense.
Note also that this code returns 2 for countSubstr("bbb", "bb"), which may or may not be what you expect. The accepted answer returns 1, which is arguable.
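To make the overlapping versus non-overlapping distinction concrete, here is a minimal sketch built on the standard strstr function. The name count_occurrences and the allow_overlap flag are made up for this illustration, and returning 0 for an empty needle is an assumption (the version above counts one plus the length of the string in that case).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of needle in haystack.
 * If allow_overlap is nonzero, the search resumes one character after the
 * start of each match; otherwise it resumes just past the end of the match. */
size_t count_occurrences(const char *haystack, const char *needle, int allow_overlap)
{
    size_t needle_len = strlen(needle);
    size_t count = 0;
    const char *p = haystack;

    if (needle_len == 0)
        return 0; /* assumption: an empty needle counts as "no matches" */

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += allow_overlap ? 1 : needle_len;
    }
    return count;
}
```

With this sketch, count_occurrences("bbb", "bb", 1) gives 2 and count_occurrences("bbb", "bb", 0) gives 1, matching the two behaviours discussed above. Both modes give 3 for "dbdbsnasmfdb" and "db".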

This works for all edge-cases I tested
#include <stdio.h>
int countSubstr(char string[], char substring[])
{
int count = 0;
size_t i = 0;
while(string[i])
{
int match = 1;
size_t j = 0;
while (match && substring[j]) /* stop at the first mismatch so string[i + j] is never read past its terminator */
{
match &= substring[j] == string[i + j];
j++;
}
count += match;
i++;
}
return count;
}
Here are some test cases:
void test(char name[], int expected, char string[], char substring[]){
int actual = countSubstr(string, substring);
char* status = (actual == expected)? "PASS" : "FAIL";
printf("%s: %s\nActual: %d\nExpected: %d\n\n",name,status,actual,expected);
}
int main(void) {
test("Two empty strings", 0, "", "");
test("Empty substring", 19, "sub str sub str sub", "");
test("Empty string", 0, "", "sub");
test("Case 1", 4, "dbdbsnasdb dbdxx", "db");
test("Case 2", 3, "dbdbsnasmfdb", "db");
test("No match", 0, "dbdbsnasmfdb", "dxb");
test("Inner matching", 3, "abababa", "aba");
test("Identity test", 1, "a", "a");
return 0;
}

In your while loop, you haven't checked whether you go past the end of the string.
Edit:
Remember that in C every string has a '\0' at the end, but your while loop doesn't check for it.
On your particular example we get (starting at last db):
i = 10, j = 0, count = 1 (check for 'd')
i = 11, j = 1, count = 2 (check for 'b')
i = 12, j = 2, count = 3 (check for '\0')
i = 13, j = 3, count = 3 (exit loop)
count = 3 is different than strlen(substring) == 2
-> no increase on subcount

for (i = 0; i <= strlen(string);)
Must be
for (i = 0; i < strlen(string);)
Use two for loops instead of one very complex loop, because that is easier to debug.

Related

How to remove all repeated character in character array

I want to remove all the repeated characters from an array. Here is an example:
Input: "aabccdee"
Output: "bd"
I'm doing this in C, using only arrays, loops, and if/else (conditional statements), without pointers.
#include<stdio.h>
int main() {
char c[10];
char com[10] = {0,};
char result[10] = { 0, };
int cnt = 0;
for (int i = 0; i < 10; i++) {
scanf("%c", &c[i]);
}
for (int i = 0; i < 10; i++) {
for (int j = i+1; j < 10; j++) {
if (c[i] == c[j]) {
com[i] = c[i];
cnt++;
printf("%c", com[i]);
}
}
}
for (int i = 0; i < cnt; i++) {
for (int j = 0; j < 10; j++) {
if (com[i] != c[j]) {
result[j] = c[j];
}
}
}
printf("\n");
for (int i = 0; i < 10; i++) {
printf("%c", result[i]);
}
}
This was my plan:
Build an array of the repeated characters
Compare the original array to the repeated array
Output the result
But the loop over the repeated array doesn't cover the whole original array.
How can I remove all repeated characters?
Not good SO policy to blatantly answer homework, but I rarely do it and thought this was an interesting task. Certainly making no claims on efficiency, but it looks like it works to me. As far as I can tell, the first and last cases are corner cases, so I handle those individually, and use a loop for everything in the middle. If you're not allowed to use strlen, then you can roll your own or use some other method, that's not the primary focus of this problem (would be best to fgets the string from a command line argument).
#include <stdio.h>
#include <string.h>
int main(void)
{
char source[] = "aabccdee";
char result[sizeof(source)] = { 0 };
unsigned resultIndex = 0;
unsigned i = 0;
// do this to avoid accessing out of bounds of source.
if (strlen(source) > 1)
{
// handle the first case, compare index 0 to index 1. If they're unequal, save
// index 0.
if (source[i] != source[i+1])
{
result[resultIndex++] = source[i];
}
// source[0] has already been checked, increment i to 1.
i++;
// comparing to strlen(source) - 1 because in this loop we are comparing the
// previous and next characters to the current. Looping from 1 to second-to-the-
// last char means we stay in bounds of source
for ( ; i < strlen(source) - 1; i++)
{
if (source[i-1] != source[i] && source[i] != source[i+1])
{
// write to result if curr char != prev char AND curr char != next char
result[resultIndex++] = source[i];
}
}
// handle the end. At this point, i == the last index of the string. Compare to
// previous character. If they're not equal, save the last character.
//
if (source[i] != source[i-1])
{
result[resultIndex] = source[i];
}
}
else if (strlen(source) == 1)
{
// if source is only 1 character, then it's trivial
result[resultIndex] = source[i];
}
else
{
// source has no length
fprintf(stderr, "source has no length.\n");
return -1;
}
// print source and result
printf("source = %s\n", source);
printf("result = %s\n", result);
return 0;
}
Various outputs for source:
source = "aabccdee"
result = "bd"
source = "aaee"
result =
source = "a"
result = "a"
source = "abcde"
result = "abcde"
source = "abcdee"
result = "abcd"
source = "aabcde"
result = "bcde"
source = "aaaaaaaaaaaabdeeeeeeee"
result = "bd"
source = ""
source has no length.
First of all, you need to put a space before %c when scanning a char with scanf, so that leading whitespace (such as the newline left over from the previous input) is skipped.
So
scanf("%c", &c[i]);
becomes
scanf(" %c", &c[i]);
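A small sketch of the difference, using sscanf on a fixed string so it can be tried without typing input. The two helper names are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reads the first character of input with "%c",
 * which does NOT skip whitespace. Returns '\0' if nothing was read. */
char first_char_raw(const char *input)
{
    char c = '\0';
    if (sscanf(input, "%c", &c) != 1)
        return '\0';
    return c;
}

/* Hypothetical helper: same, but with " %c". The leading space in the
 * format makes sscanf skip any whitespace before reading the character. */
char first_char_skip_ws(const char *input)
{
    char c = '\0';
    if (sscanf(input, " %c", &c) != 1)
        return '\0';
    return c;
}
```

For the input "\nb", first_char_raw returns the newline itself, while first_char_skip_ws returns 'b'. The same thing happens with scanf when the user presses Enter between characters.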
Secondly, your idea is kind of messy: as the result shows, you only handle some cases and don't continue checking the whole array. You need to learn how to shift an array to the right or left.
Another issue is that after you shift the array (so it is no longer full), you still print every original position, beyond its new length.
So basically, your code should look something like this:
#include<stdio.h>
int main() {
char c[10];
int length=5;
for (int i = 0; i < 5; i++) {
scanf(" %c", &c[i]);
}
int j,k,i;
for(i=0; i<length; i++)
{
for(j=i+1; j<length; j++)
{
if(c[i] == c[j])
{
length--;
for(k=j; k<length; k++)
{
c[k] = c[k + 1];
}
j--;
}
}
}
printf("\n");
for (int i = 0; i < length; i++) {
printf("%c", c[i]);
}
}
You simply take one element and compare it to the rest; if it appears again, you shift the array left from the position of the second occurrence, and so on.
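If the goal is the one in the question's example ("aabccdee" becoming "bd", so a character that repeats is dropped entirely), another possible approach is to count occurrences first and then keep only the characters that occur once. This is just a sketch using arrays, loops and if. The name keep_unique_chars is made up here, and it assumes result has room for the whole source string plus the terminator.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: copies into result every character of source that
 * occurs exactly once in source, preserving the original order.
 * Assumption: result can hold at least as many bytes as source,
 * terminator included. */
void keep_unique_chars(const char source[], char result[])
{
    size_t counts[UCHAR_MAX + 1] = { 0 };
    size_t out = 0;

    /* First pass: count how often each character value occurs. */
    for (size_t i = 0; source[i] != '\0'; i++)
        counts[(unsigned char)source[i]]++;

    /* Second pass: keep only the characters seen exactly once. */
    for (size_t i = 0; source[i] != '\0'; i++) {
        if (counts[(unsigned char)source[i]] == 1)
            result[out++] = source[i];
    }
    result[out] = '\0';
}
```

Unlike the neighbour-comparison answer above, this also drops repeats that are not adjacent, so "abca" becomes "bc". Whether that is wanted depends on how the task defines "repeated".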

Program to count length of each word in string in C

I'm writing a program to count the length of each word in an array of characters. I was wondering if you could help me, because I've been struggling with it for at least two hours now and I don't know how to do it properly.
It should go like that:
(number of letters) - (number of words with this many letters)
2 - 1
3 - 4
5 - 1
etc.
char tab[1000];
int k = 0, x = 0;
printf("Enter text: ");
fgets(tab, 1000, stdin);
for (int i = 2; i < (int)strlen(tab); i++)
{
for (int j = 0; j < (int)strlen(tab); j++)
{
if (tab[j] == '\0' || tab[j]=='\n')
break;
if (tab[j] == ' ')
k = 0;
else k++;
if (k == i)
{
x++;
k = 0;
}
}
if (x != 0)
{
printf("%d - %d\n", i, x);
x = 0;
k = 0;
}
}
return 0;
By using two for loops, you're doing len**2 character scans. (e.g.) For a buffer of length 1000, instead of 1000 character comparisons, you're doing 1,000,000 comparisons.
This can be done in a single for loop if we use a word length histogram array.
The basic algorithm is the same as your inner loop.
When we have a non-space character, we increment a current length value. When we see a space, we increment the histogram cell (indexed by the length value) by 1. We then set the length value to 0.
Here's some code that works:
#include <stdio.h>
int
main(void)
{
int hist[100] = { 0 };
char buf[1000];
char *bp;
int chr;
int curlen = 0;
printf("Enter text: ");
fflush(stdout);
fgets(buf,sizeof(buf),stdin);
bp = buf;
for (chr = *bp++; chr != 0; chr = *bp++) {
if (chr == '\n')
break;
// end of word -- increment the histogram cell
if (chr == ' ') {
hist[curlen] += 1;
curlen = 0;
}
// got an alpha char -- increment the length of the word
else
curlen += 1;
}
// catch the final word on the line
hist[curlen] += 1;
for (curlen = 1; curlen < sizeof(hist) / sizeof(hist[0]); ++curlen) {
int count = hist[curlen];
if (count > 0)
printf("%d - %d\n",curlen,count);
}
return 0;
}
UPDATE:
and i don't really understand pointers. Is there any simpler method to do this?
Pointers are a very important [essential] tool in the C arsenal, so I hope you get to them soon.
However, it is easy enough to convert the for loop (Removing the char *bp; and bp = buf;):
Change:
for (chr = *bp++; chr != 0; chr = *bp++) {
Into:
for (int bufidx = 0; ; ++bufidx) {
chr = buf[bufidx];
if (chr == 0)
break;
The rest of the for loop remains the same.
Here's another loop that, without optimization by the compiler, fetches the char twice:
for (int bufidx = 0; buf[bufidx] != 0; ++bufidx) {
chr = buf[bufidx];
Here is a single line version. Note this is not recommended practice because of the embedded assignment of chr inside the loop condition clause, but is for illustration purposes:
for (int bufidx = 0; (chr = buf[bufidx]) != 0; ++bufidx) {
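For reference, here is how the histogram idea and the index-based loop might fit together in one function, as a sketch only. The name word_length_histogram and its parameters are made up for this illustration. Two details are assumptions added here and are not in the program above: runs of several spaces are not counted as words, and words too long for the array are counted in its last cell so the index stays in bounds.

```c
#include <stddef.h>

/* Hypothetical helper: sets hist[len] to the number of words of length len
 * in text. Words are separated by spaces; scanning stops at a newline or at
 * the terminator. Assumption: words of hist_size letters or more are counted
 * in the last cell, hist[hist_size - 1]. */
void word_length_histogram(const char text[], int hist[], size_t hist_size)
{
    size_t curlen = 0;

    if (hist_size == 0)
        return;
    for (size_t k = 0; k < hist_size; k++)
        hist[k] = 0;

    for (size_t i = 0; ; i++) {
        char chr = text[i];
        if (chr == ' ' || chr == '\n' || chr == '\0') {
            /* end of a word: record it unless it is empty */
            if (curlen > 0) {
                size_t cell = curlen < hist_size ? curlen : hist_size - 1;
                hist[cell]++;
            }
            curlen = 0;
            if (chr != ' ')
                break; /* newline or terminator: end of the line */
        } else {
            curlen++; /* one more letter in the current word */
        }
    }
}
```

Calling it on "ab cde fgh\n" with a 10-element array leaves hist[2] == 1 and hist[3] == 2. Printing is then the same loop over the histogram as in the full program.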

How to find first position of substring in a string in C [duplicate]

I have created a function that should find the numerical position of the first character of a substring in a larger string. I am having some problems with the output and I am not too sure why. These problems include -1 being returned every single time instead of the integer position of the substring. I have debugged and cannot trace where the function goes wrong.
This is how the function should perform: If my string is "The dog was fast" and I am searching for the substring "dog", the function should return 4. Thanks to chqrlie for help with the loop.
Here is the function:
int findSubString(char original[], char toFind[]) {
size_t i, j;
int originalLength = 0;
int toFindLength = 0;
originalLength = strlen(original) + 1;
toFindLength = strlen(toFind) + 1;
for (i = 0; i < toFindLength + 1; i++) {
for (j = 0; j < originalLength + 1; j++) {
if (toFind[j] == '\0') {
return i;
}
if (original[i + j] != toFind[j]) {
break;
}
}
if (original[i] == '\0') {
return -1;
}
}
}
The function parameters cannot be modified, this is a requirement. Any help appreciated!
These statements inside the loops
if (toFind[j] == '\0') {
return i;
}
result in undefined behavior because the string toFind can be shorter than the string original.
The same is true of this statement
if (original[i + j] != toFind[j]) {
break;
}
because i + j can be greater than the length of the string original.
And there is no need to scan all characters of the string original if you are going to find a substring inside it.
Also you should check whether the length of the string original is not less than the length of the string toFind.
If you want to find only the first character of the string toFind in the string original it is enough to use standard C function strchr. If you want to find the whole string toFind in the string original then you could use another C standard function strstr.
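As a rough sketch of the strstr route: strstr returns a pointer to the first match (or NULL), so subtracting the start of the string gives the position. The function name findSubStringWithStrstr is made up for this example, the parameters are declared const here for convenience, and the cast assumes the offset fits in an int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strstr: returns the index of the first
 * occurrence of toFind in original, or -1 if there is none. */
int findSubStringWithStrstr(const char original[], const char toFind[])
{
    const char *match = strstr(original, toFind);

    if (match == NULL)
        return -1;
    return (int)(match - original); /* assumption: the offset fits in an int */
}
```

For "The dog was fast" and "dog" this returns 4. Note that strstr treats an empty toFind as matching at position 0.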
If you want to write the function yourself to find a string in another string, it can look, for example, like the following.
I declared the function as
long long int findSubString( const char original[], const char toFind[] );
however you can write its declaration as you like for example like
int findSubString( char original[], char toFind[] );
But in this case you should declare function local variable success like
int success = -1;
and output the result using format specifier "%d" instead of "%lld".
Here you are.
#include <stdio.h>
#include <string.h>
#include <stddef.h>
long long int findSubString( const char original[], const char toFind[] )
{
size_t n = strlen( original );
size_t m = strlen( toFind );
long long int success = -1;
if ( !( n < m ) )
{
n = n - m + 1;
for ( size_t i = 0; success == -1 && i < n; i++ )
{
size_t j = 0;
while ( j < m && original[i+j] == toFind[j] ) j++;
if ( j == m ) success = i;
}
}
return success;
}
int main(void)
{
printf( "%lld\n", findSubString( "The dog was fast", "dog" ) );
return 0;
}
Its output is
4
Your loops are reversed. The outer loop should walk positions from zero to originalLength, inclusive; the nested loop should walk positions from zero to toFindLength, inclusive.
Both originalLength and toFindLength should be set to values returned by strlen, not strlen plus one, because null terminator position is not a good start.
Finally, you are returning -1 from inside the outer loop. This is too early - you should be returning -1 only after you are done with the outer loop as well.
Your loop counter tests are incorrect: wrong upper limit and the limits are off by one. Note that the tests are actually not necessary as you exit both loops when hitting the '\0' terminators.
Here is a simpler version:
int findSubString(const char *original, const char *toFind) {
for (size_t i = 0;; i++) {
for (size_t j = 0;; j++) {
if (toFind[j] == '\0') {
return i;
}
if (original[i + j] != toFind[j]) {
break;
}
}
if (original[i] == '\0') {
return -1;
}
}
}
There is a small advantage in computing the string lengths to reduce the number of comparisons in pathological cases such as findSubString("aaaaaaaaaaa", "aaaaaaaaaaaa");
int findSubString(const char *original, const char *toFind) {
size_t originalLength = strlen(original);
size_t toFindLength = strlen(toFind);
if (toFindLength <= originalLength) {
for (size_t i = 0; i <= originalLength - toFindLength; i++) {
for (size_t j = 0;; j++) {
if (toFind[j] == '\0') {
return i;
}
if (original[i + j] != toFind[j]) {
break;
}
}
}
}
return -1;
}

How to avoid duplicates when finding all k-length substrings

I want to display all substrings with k letters, one per line, but avoid duplicate substrings. I managed to write to a new string all the k length words with this code:
void subSent(char str[], int k) {
int MaxLe, i, j, h, z = 0, Length, count;
char stOu[1000] = {'\0'};
Length = (int)strlen(str);
MaxLe = maxWordLength(str);
if((k >= 1) && (k <= MaxLe)) {
for(i = 0; i < Length; i++) {
if((int)str[i] == 32) {
j = i = i + 1;
} else {
j = i;
}
for(; (j < i + k) && (Length - i) >= k; j++) {
if((int)str[j] != 32) {
stOu[z] = str[j];
} else {
stOu[z] = str[j + 1];
}
z++;
}
stOu[z] = '\n';
z++;
}
}
}
But I'm struggling with the part that should store each word only once.
For example, for the string HAVE A NICE DAY
and k = 1, it should print:
H
A
V
E
N
I
C
D
Y
Your subSent() routine poses a couple of challenges: first, it neither returns nor prints its result, so you can only see it in the debugger; second, it calls maxWordLength(), which you didn't supply.
Although avoiding duplicates can be complicated, in the case of your algorithm, it's not hard to do. Since all your words are fixed length, we can walk the output string with the new word, k letters (plus a newline) at a time, doing strncmp(). In this case the new word is the last word added so we quit when the pointers meet.
I've reworked your code below and added a duplication elimination routine. I didn't know what maxWordLength() does so I just aliased it to strlen() to get things running:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#define maxWordLength strlen
// does the last (fixed size) word in string appear previously in string
bool isDuplicate(const char *string, const char *substring, size_t n) {
for (const char *pointer = string; pointer != substring; pointer += (n + 1)) {
if (strncmp(pointer, substring, n) == 0) {
return true;
}
}
return false;
}
void subSent(const char *string, int k, char *output) {
int z = 0;
size_t length = strlen(string);
int maxLength = maxWordLength(string);
if (k >= 1 && k <= maxLength) {
for (int i = 0; i < length - k + 1; i++) {
int start = z; // where does the newly added word begin
for (int j = i; (z - start) < k; j++) {
output[z++] = string[j];
while (string[j + 1] == ' ') {
j++; // assumes leading spaces already dealt with
}
}
output[z++] = '\n';
if (isDuplicate(output, output + start, k)) {
z -= k + 1; // last word added was a duplicate so back it out
}
while (string[i + 1] == ' ') {
i++; // assumes original string doesn't begin with a space
}
}
}
output[z] = '\0'; // properly terminate the string
}
int main() {
char result[1024];
subSent("HAVE A NICE DAY", 1, result);
printf("%s", result);
return 0;
}
I somewhat cleaned up your space avoidance logic but it can be tripped by leading spaces on the input string.
OUTPUT
subSent("HAVE A NICE DAY", 1, result);
H
A
V
E
N
I
C
D
Y
subSent("HAVE A NICE DAY", 2, result);
HA
AV
VE
EA
AN
NI
IC
CE
ED
DA
AY
subSent("HAVE A NICE DAY", 3, result);
HAV
AVE
VEA
EAN
ANI
NIC
ICE
CED
EDA
DAY

Print out the longest substring in c

Suppose that we have a string "11222222345646". How can I print out the substring 222222 in C?
I have a function here, but I think something in it is incorrect. Can someone correct it for me?
int *longestsubstring(int a[], int n, int *length)
{
int location = 0;
length = 0;
int i, j;
for (i = 0, j = 0; i <= n-1, j < i; i++, j++)
{
if (a[i] != a[j])
{
if (i - j >= *length)
{
*length = i - j;
location = j;
}
j = i;
}
}
return &a[location];
}
Sorry, I don't really understand your question.
I just have a little code that can print the longest substring; I hope it helps.
/*brief : print the longest sub string*/
void printLongestSubString(const char * str,int length)
{
if(length <= 0)
return;
int i ;
int num1 = 0,num2 = 0;
int location = 0;
for(i = 0; i< length - 1; ++i)
{
if(str[i] == str[i+1])
++num2;//count the sub string ,may be not the longest,but we should try.
else
{
if(num2 >num1)//I use num1 store the sum longest of current sub string.
{ num1 = num2;location = i - num2;}
else
;//do nothing for short sub string.
num2 = 0;
}
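}
if(num2 > num1)//the longest sub string may end at the last character.
{ num1 = num2;location = length - 1 - num2;}
for(i = location;i < length && str[i] == str[location];++i)//print while the character stays the same.
printf("%c",str[i]);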
printf("\n");
}
int main()
{
char * str = "1122222234566";
printLongestSubString(str,13);
return 0;
}
From your code it appears you want to return the longest sub-sequence (sub-string). Since I'm relearning C I thought I would give it a shot.
I've used strndup to extract the substring. I'm not sure how portable it is (it comes from POSIX rather than older ISO C standards), but a replacement implementation is easy to find if needed. It allocates memory to store the new C string, so you have to remember to free that memory once you're finished with the substring. Following your argument list, the length of the substring is returned through the third argument of the extraction routine.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *extract_longest_subsequence(const char *str, size_t str_len, size_t *longest_len);
int main()
{
char str[] = "11222234555555564666666";
size_t substr_len = 0;
char *substr = extract_longest_subsequence(str, sizeof(str), &substr_len);
if (!substr)
{
printf("Error: NULL sub-string returned\n");
return 1;
}
printf("original string: %s, length: %zu\n", str, sizeof(str)-1);
printf("Longest sub-string: %s, length: %zu\n", substr, substr_len);
/* Have to remember to free the memory allocated by strndup */
free(substr);
return 0;
}
char *extract_longest_subsequence(const char *str, size_t str_len, size_t *longest_len)
{
if (str == NULL || str_len < 1 || longest_len == NULL)
return NULL;
size_t longest_start = 0;
*longest_len = 0;
size_t curr_len = 1;
size_t i = 0;
for (i = 1; i < str_len; ++i)
{
if (str[i-1] == str[i])
{
++curr_len;
}
else
{
if (curr_len > *longest_len)
{
longest_start = i - curr_len;
*longest_len = curr_len;
}
curr_len = 1;
}
}
/* strndup allocates memory for storing the substring */
return strndup(str + longest_start, *longest_len);
}
It looks like in your loop that j is supposed to be storing where the current "substring" starts, and i is the index of the character that you are currently looking at. In that case, you want to change
for (i = 0, j = 0; i <= n-1, j < i; i++, j++)
to
for (i = 0, j = 0; i <= n-1; i++)
That way, you are using i to store which character you're looking at, and the j = i line will "reset" which string of characters you are checking the length of.
Also, a few other things:
1) length = 0 should be *length = 0. You probably don't actually want to set the pointer to point to address 0x0.
2) That last line would return where your "largest substring" starts, but it doesn't truncate where the characters start to change (i.e. the resulting string isn't necessarily *length long). It can be intentional depending on use case, but figured I'd mention it in case it saves some grief.
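Putting those changes together, the function might look something like the sketch below. Two things in it go beyond the points above and are my own assumptions: the extra check after the loop (the last run has no differing element after it, so the loop alone never records it), and using > instead of >= so the first of several equally long runs is the one reported.

```c
/* Sketch of the asker's function with the fixes discussed above.
 * Returns a pointer to the start of the longest run of equal values in
 * a[0..n-1] and stores the length of that run in *length. */
int *longestsubstring(int a[], int n, int *length)
{
    int location = 0;
    int i, j;

    *length = 0;                      /* write through the pointer */
    for (i = 0, j = 0; i <= n - 1; i++) {
        if (a[i] != a[j]) {           /* the run that began at j ended at i - 1 */
            if (i - j > *length) {
                *length = i - j;
                location = j;
            }
            j = i;                    /* a new run starts here */
        }
    }
    /* Added assumption: check the final run once the loop is done,
     * because nothing after it triggers the comparison above. */
    if (n > 0 && n - j > *length) {
        *length = n - j;
        location = j;
    }
    return &a[location];
}
```

For the digits of "11222222345646" stored in an int array, this sets *length to 6 and returns a pointer to index 2. The caller still has to print exactly *length elements from there, as mentioned in point 2.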

Resources