Counting Alphabetic Characters That Are Contained in an Array with C

I am having trouble with a homework question that I've been working at for quite some time.
I don't know exactly what the question is asking; I need some clarification on that and a push in the right direction.
Here is the question:
(2) Solve this problem using one single subscripted array of counters. The program uses an array of characters defined using the C initialization feature. The program counts the number of each of the alphabetic characters a to z (only lower case characters are counted) and prints a report (in a neat table) of the number of occurrences of each lower case character found. Only print the counts for the letters that occur at least once. That is, do not print a count if it is zero. DO NOT use a switch statement in your solution. NOTE: if x is of type char, x-'a' is the difference between the ASCII codes for the character in x and the character 'a'. For example, if x holds the character 'c' then x-'a' has the value 2, while if x holds the character 'd', then x-'a' has the value 3. Provide test results using the following string:
“This is an example of text for exercise (2).”
And here is my source code so far:
#include<stdio.h>
int main() {
char c[] = "This is an example of text for exercise (2).";
char d[26];
int i;
int j = 0;
int k;
j = 0;
//char s = 97;
for(i = 0; i < sizeof(c); i++) {
for(s = 'a'; s < 'z'; s++){
if( c[i] == s){
k++;
printf("%c,%d\n", s, k);
k = 0;
}
}
}
return 0;
}
As you can see, my current solution is a little anemic.
Thanks for the help, and I know everyone on the net doesn't necessarily like helping with other people's homework. ;P

char c[] = "This is an example of text for exercise (2).";
int d[26] = {0}, i, value;
for(i=0; i < sizeof(c) - 1; i++){ // -1 to exclude the terminating null character
value = c[i]-'a';
if(value < 26 && value >= 0) d[value]++;
}
for(i=0; i < 26; i++){
if(d[i]) printf("Alphabet-%c Count-%d\n", 'a'+i, d[i]);
}
Corrected. Thanks caf and Leffler.

The intention of the question is for you to figure out how to efficiently convert a character between 'a' and 'z' into an index between 0 and 25. You are apparently allowed to assume ASCII for this (although the C standard does not guarantee any particular character set), which has the useful property that values of the characters 'a' through 'z' are sequential.
Once you've done that, you can increment the corresponding slot in your array d (note that you will need to initialise that array to all zeroes to begin with, which can be done simply with char d[26] = { 0 };). At the end, you'd scan through the array d and print the counts that are greater than zero, along with the corresponding character (which involves the reverse transformation, from an index 0 through 25 into a character 'a' through 'z').

Fortunately for you, you do not seem to be required to produce a solution that would work on an EBCDIC machine (mainframe).
Your inner loop needs to be replaced by a conditional:
if (c[i] is lower-case alphabetic)
increment the appropriate count in the d-array
After finishing the string, you then need a loop to scan through the d-array, printing out the letter corresponding to the entry and the count associated with it.
Your d-array uses 'char' for the counts; that is OK for the exercise but you would probably need to use a bigger integer type for a general purpose solution. You should also ensure that it is initialized to all zeros; it is difficult to get meaningful information out of random garbage (and the language does not guarantee that anything other than garbage will be on the stack where the d-array is stored).
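The conditional and the reporting loop described above can be sketched as follows. This is only an illustration under the same ASCII assumption the exercise makes; the names count_lowercase, print_counts and LETTERS are made up for this sketch, and unsigned long is used for the counters as one possible "bigger integer type".

```c
#include <stddef.h>
#include <stdio.h>

#define LETTERS 26  /* hypothetical name: number of lower-case letters */

/* Adds to counts[0..25] the number of occurrences of 'a'..'z' in text.
   Assumes 'a'..'z' have consecutive codes, as in ASCII (not EBCDIC).
   The caller must zero the array before the first call. */
void count_lowercase(const char *text, unsigned long counts[LETTERS])
{
    for (size_t i = 0; text[i] != '\0'; i++) {
        /* This conditional replaces the inner loop over 'a'..'z'. */
        if (text[i] >= 'a' && text[i] <= 'z')
            counts[text[i] - 'a']++;
    }
}

/* Prints one line per letter that occurred at least once. */
void print_counts(const unsigned long counts[LETTERS])
{
    for (int i = 0; i < LETTERS; i++) {
        if (counts[i] > 0)
            printf("%c  %lu\n", 'a' + i, counts[i]);  /* index back to letter */
    }
}
```

Calling count_lowercase on the exercise string with a zero-initialized array (unsigned long counts[LETTERS] = {0};) and then print_counts should report 15 distinct letters, with e the most frequent at 6 occurrences.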

char c[] = "This is an example of text for exercise (2).";
char d[26];
int i;
int j;
for(i = 0; i < 26; i++)
{
d[i] = 0; // Set the frequency of the letter to zero before we start counting.
for(j = 0; j < strlen(c); j++)
{
if(c[j] == i + 'a')
d[i]++;
}
if(d[i] > 0) // If the frequency of the letter is greater than 0, show it.
printf("%c - %d\n", (i + 'a'), d[i]);
}

for(char s = 'a'; s <= 'z'; s++){ // <= so that 'z' itself is counted
j=0;
for(i = 0; i < sizeof(c); i++) {
if( c[i] == s )
j++;
}
if (j > 0)
printf("%c,%d\n", s, j);
}

Related

Why is this C program detecting two '\0' characters in a string when an unconnected line of code gets uncommented?

I am currently learning C and I wrote this program to check whether the '\0' character really is at the end of strings, as pointed out in K&R.
I got the strangest result, though.
If I comment the statement int lista[] = {0, 1, 2, 3, 4}; out of the program (it has nothing to do with the other statements; it was part of another test I was going to make), the output comes out as expected, detecting one '\0' character ending the string.
However, if I leave the statement uncommented, the program detects two '\0' characters at the end of the string.
Why does this happen?
This is the program with the statement uncommented:
#include <stdio.h>
int main(void)
{
int lista[] = {0, 1, 2, 3, 4};
char string[] = "linhas";
for (int i = 0; i <= sizeof(string); i++)
{
if (string[i] != '\0')
{
printf("%c\n", string[i]);
}
else
{
printf("this dawmn null char\n");
}
}
}
This outputs:
l
i
n
h
a
s
this dawmn null char
this dawmn null char
This is the program with the line commented out:
#include <stdio.h>
int main(void)
{
/*int lista[] = {0, 1, 2, 3, 4};*/
char string[] = "linhas";
for (int i = 0; i <= sizeof(string); i++)
{
if (string[i] != '\0')
{
printf("%c\n", string[i]);
}
else
{
printf("this dawmn null char\n");
}
}
}
It outputs:
l
i
n
h
a
s
this dawmn null char
Your loop
for (int i = 0; i <= sizeof(string); i++)
is ever-so-slightly wrong. It should be
for (int i = 0; i < sizeof(string); i++)
By using <=, you make one too many trips through the loop, and you access memory outside of the string array. It looks like, with the lista array in place, the extra byte you mistakenly access (outside of the string array) happens to be a 0, so you get an extra, second printout of your "this dawmn null char" message.
But then, when you comment out the lista array, it must be the case that the extra byte you mistakenly access isn't 0, so it gets printed as itself, instead. It might be an invisible control character, which is why you don't see anything. I suggest changing your code to
if (string[i] != '\0')
printf("string contains %d\n", string[i]);
else printf("this damn null char\n");
to see this more clearly.
The important lesson here is that if you have a loop that's supposed to run N times, there are two ways to write it. In C, the vast majority of the time, you want to write it as
for(i = 0; i < N; i++)
That's a "0-based" loop, that runs from 0 to N-1, for a total of N trips. Once in a while, you want a 1-based loop:
for(i = 1; i <= N; i++)
This runs from 1 to N, again for a total of N trips. But if you write
for(i = 0; i <= N; i++) /* usually WRONG */
your loop runs from 0 to N, for a total of N+1 trips.
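As a small illustration of the 0-based form, here is a hedged sketch of a helper (the name count_nul_bytes is invented for this example) that scans exactly size bytes of a buffer and counts the '\0' bytes it sees. Because the bound is i < size, it never reads past the end of the array.

```c
#include <stddef.h>

/* Counts the '\0' bytes among the first size bytes of buf.
   The loop uses the 0-based "i < size" form, so it makes exactly
   size trips and never touches buf[size]. */
size_t count_nul_bytes(const char *buf, size_t size)
{
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == '\0')
            count++;
    }
    return count;
}
```

For char string[] = "linhas";, calling count_nul_bytes(string, sizeof string) visits 7 bytes and finds a single terminator, whatever other variables happen to be declared nearby.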
Do not confuse strlen(string), which in your case is 6, with sizeof(string), which is the size of the array including the '\0' byte (7 here).
For a string declared as an array with "automatic size" the difference is only one, but if you had char string[256], sizeof(string) would be 256 regardless of the contents, not strlen(string) + 1.
With char *string, sizeof(string) is the size of the pointer, likely 8 or 4.
@SteveSummit has explained everything in detail. Here is a short answer.
Accessing the element string[sizeof(string)] is undefined behavior, so it's "pointless" to discuss what value it should have. I put pointless in quotes because, for debugging purposes, it can be useful to understand how undefined behavior manifests itself. But if this code were to go into production, you should NEVER access string[sizeof(string)]. It is always out of bounds and always a bug.

Storing multiple strings In C

How can multiple strings be stored in C?
If the number of strings is supplied by the user, how can we store them in C, given that no char arrays can be declared before that number is known? Here is what I tried, but it ends up printing only the first character of each string.
#include <stdio.h>
int main()
{
int a;
scanf("%d",&a);
char array[a][1];
for(int i = 0 ; i < a ; i++)
{
for(int j = 0 ; j < 1 ; j++)
{
scanf("%s",&array[i][j]);
}
}
for(int i = 0 ; i < a ; i++)
{
for(int j = 0 ; j < 1 ; j++)
{
printf("%c",array[i][j]);
}
printf("\n");
}
}
Let's start from the beginning.
I don't think you really understand what a char is, though. A char is only ONE character. So something like char array[16][1] means that you have an array of 16 strings, each with room for a single char, which leaves space for nothing but the terminating '\0'. Likewise, scanf("%s", &array[i][j]); doesn't make sense, since it reads a whole word as input but has only a single character to write it into. A proper solution would be something like this:
char array[a][255]; // each string holds up to 254 characters plus the terminating '\0'
for(int i = 0; i < a; ++i) {
scanf("%254s", array[i]); // the field width keeps scanf from writing past the buffer
}
As you can see, you don't need the & sign here, because array[i] already converts to the address of the first character of the string. The same applies to printing. The proper way is the following:
for(int i = 0; i < a; ++i) {
printf("%s\n", array[i]);
}
Your solution only displays one character per string.
And remember, a char is just a small integer, typically ranging from 0 to 255 if the compiler treats plain char as unsigned, or from -128 to 127 if it is signed. Your code treats a char as if it were a full C++ string, which it definitely isn't.

How to reduce time complexity in traversing a string?

I was solving a problem to find number of such indexes, a, b, c, d in a string s, of size n made only of lowercase letters such that:
1 <= a < b < c < d <= n
and
s[a] == s[c] and s[b] == s[d]
The code I wrote traverses the string character by character in a basic manner:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
for(int d = c + 1; d<n; d++)
{
if(s[a] == s[c] && s[b] == s[d] && a>=0 && b>a && c>b && d>c && d<n)
{
count++;
}
}
}
}
}
printf("%d", count);
return 0;
}
a, b, c and d are the indices.
The trouble is that if the input string is big in size, the time limit is exceeded due to the 4 nested loops. Is there any way I can improve the code to decrease the complexity?
The problem statement is available here: https://www.hackerearth.com/practice/algorithms/searching/linear-search/practice-problems/algorithm/holiday-season-ab957deb/
The problem can be solved by maintaining an array that stores the cumulative frequency of each character in the input string (that is, for each character, the number of its occurrences up to and including each index). Since the string consists only of lowercase characters, the array size is [26][N+1].
For example:
index  - 1 2 3 4 5
string - a b a b a
cumulativeFrequency array:
    0 1 2 3 4 5
a   0 1 1 2 2 3
b   0 0 1 1 2 2
I have made the array by taking the index of first character of the input string as 1. Doing so will help us in solving the problem later. For now, just ignore column 0 and assume that the string starts from index 1 and not 0.
Useful facts
Using the cumulative frequency array, we can easily check whether a character is present at any index i:
if cumulativeFrequency[i] - cumulativeFrequency[i-1] > 0
Number of times a character occurs between i and j (excluding both i and j):
frequency between i and j = cumulativeFrequency[j-1] - cumulativeFrequency[i]
Algorithm
1: for each character from a-z:
2: Locate index a and c such that charAt[a] == charAt[c]
3: for each pair (a, c):
4: for character from a-z:
5: b = frequency of character between a and c
6: d = frequency of character after c
7: count += b*d
Time complexity
Line 1-2:
The outermost loop runs 26 times. We need to locate all the pairs (a, c); doing that takes O(n^2) time.
Line 3-4:
For each pair, we again run a loop 26 times to check how many times each character is present between a and c and after c.
Line 5-7:
Using cumulative frequency array, for each character we can easily calculate how many times it appears between a and c and after c in O(1).
Hence, overall complexity is O(26*n^2*26) = O(n^2).
Code
I code in Java and do not have this code in C. I have used simple loops and arrays, so it should be easy to understand.
//Input N and string
//Do not pay attention to the next two lines since they are basically taking
//input using Java input streams
int N = Integer.parseInt(bufferedReader.readLine().trim());
String str = bufferedReader.readLine().trim();
//Construct an array to store cumulative frequency of each character in the string
int[][] cumulativeFrequency = new int[26][N+1];
//Fill the cumulative frequency array
for (int i = 0;i < str.length();i++)
{
//character at index i
char ch = str.charAt(i);
//Fill the cumulative frequency array for each character
for (int j = 0;j < 26;j++)
{
cumulativeFrequency[j][i+1] += cumulativeFrequency[j][i];
if (ch-97 == j) cumulativeFrequency[j][i+1]++;
}
}
int a, b, c, d;
long count = 0;
//Follow the steps of the algorithm here
for (int i = 0;i < 26;i++)
{
for (int j = 1; j <= N - 2; j++)
{
//Check if character at i is present at index j
a = cumulativeFrequency[i][j] - cumulativeFrequency[i][j - 1];
if (a > 0)
{
//Check if character at i is present at index k
for (int k = j + 2; k <= N; k++)
{
c = cumulativeFrequency[i][k] - cumulativeFrequency[i][k - 1];
if (c > 0)
{
//For each character, find b*d
for (int l = 0; l < 26; l++)
{
//For each character calculate b and d
b = cumulativeFrequency[l][k-1] - cumulativeFrequency[l][j];
d = cumulativeFrequency[l][N] - cumulativeFrequency[l][k];
count += b * d;
}
}
}
}
}
}
System.out.println(count);
I hope this helps. The code I provided should not exceed the time limit and should work for all test cases. Do comment if anything in my explanation is unclear.
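For readers who want the same idea in C, here is a minimal sketch of the cumulative-frequency approach. It is not a line-by-line port of the Java code: the function name count_quadruples and the table name prefix are invented for this sketch, indices are 0-based, and the table is laid out as [N+1][26] rather than [26][N+1]. It assumes the input holds only 'a' to 'z' with consecutive character codes, as in ASCII.

```c
#include <stdlib.h>
#include <string.h>

/* Counts index quadruples a < b < c < d with s[a] == s[c] and s[b] == s[d].
   Returns -1 if memory cannot be allocated. */
long long count_quadruples(const char *s)
{
    size_t n = strlen(s);
    /* prefix[i][ch] = occurrences of letter ch among s[0..i-1] */
    int (*prefix)[26] = calloc(n + 1, sizeof *prefix);
    if (prefix == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        memcpy(prefix[i + 1], prefix[i], sizeof prefix[i]);
        prefix[i + 1][s[i] - 'a']++;
    }

    long long count = 0;
    for (size_t a = 0; a < n; a++) {
        for (size_t c = a + 2; c < n; c++) {
            if (s[a] != s[c])
                continue;
            for (int ch = 0; ch < 26; ch++) {
                /* candidates for b: positions a+1 .. c-1 */
                long long between = prefix[c][ch] - prefix[a + 1][ch];
                /* candidates for d: positions c+1 .. n-1 */
                long long after = prefix[n][ch] - prefix[c + 1][ch];
                count += between * after;
            }
        }
    }
    free(prefix);
    return count;
}
```

For the example string ababa used above, this returns 2: the quadruples are the two occurrences of abab, at 1-based positions (1,2,3,4) and (2,3,4,5).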
Performing the equality check in early stages can save you some time.
Also the check a>=0 && b>a && c>b && d>c && d<n seems to be unnecessary as you are already checking for this condition in the loops. An improved version can be as follows:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
if(s[a] == s[c]) {
for(int d = c + 1; d<n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
printf("%d", count);
return 0;
}
Since the string S is made of only lowercase letters, you can maintain a 26x26 table (only the 26*25 entries with i != j are used) that records the appearance of every possible pair of distinct letters (e.g. ab, ac, bc, etc).
The following code tracks the completeness of each answer candidate (abab, acac, bcbc, etc) with two functions: one checks for the AC position and the other for the BD position. Once the value reaches 4, the candidate is a valid answer.
#include <stdio.h>
int digitsAC(int a)
{
if(a % 2 == 0)
return a + 1;
return a;
}
int digitsBD(int b)
{
if(b % 2 == 1)
return b + 1;
return b;
}
int main()
{
int n, count = 0;
char s[2002];
int appearance2x2[26][26] = {0};
scanf("%d%s", &n, s);
for(int i = 0; i < n; ++i)
{
int id = s[i] - 'a';
for(int j = 0; j < 26; ++j)
{
appearance2x2[id][j] = digitsAC(appearance2x2[id][j]);
appearance2x2[j][id] = digitsBD(appearance2x2[j][id]);
}
}
//counting the results
for(int i = 0; i < 26; ++i)
{
for(int j = 0; j < 26; ++j)
{
if(i == j)continue;
if(appearance2x2[i][j] >= 4)count += ((appearance2x2[i][j] - 2) / 2);
}
}
printf("%d", count);
return 0;
}
The time complexity is O(26N), which is linear in N.
The code can be further accelerated with bitmask operations, but I left the functions simple for clarity.
I haven't tested it much, so please tell me if you find any bugs in it!
Edit: there is a problem when handling consecutive repeated letters, as in aabbaabb.
Here is an O(n) solution (counting the number of characters in the allowed character set as constant).
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/* As used in this program, "substring" means a string that can be formed by
characters from another string. The resulting characters are not
necessarily consecutive in the original string. For example, "ab" is a
substring of "xaxxxxbxx".
This program requires the lowercase letters to have consecutive codes, as
in ASCII.
*/
#define Max 2000 // Maximum string length supported.
typedef short T1; // A type that can hold Max.
typedef int T2; // A type that can hold Max**2.
typedef long T3; // A type that can hold Max**3.
typedef long long T4; // A type that can hold Max**4.
#define PRIT4 "lld" // A conversion specification that will print a T4.
#define L ('z'-'a'+1) // Number of characters in the set allowed.
/* A Positions structure records all positions of a character in the string.
N is the number of appearances, and Position[i] is the position (index into
the string) of the i-th appearance, in ascending order.
*/
typedef struct { T1 N, Position[Max]; } Positions;
/* Return the number of substrings "aaaa" that can be formed from "a"
characters in the positions indicated by A.
*/
static T4 Count1(const Positions *A)
{
T4 N = A->N;
return N * (N-1) * (N-2) * (N-3) / (4*3*2*1);
}
/* Return the number of substrings "abab" that can be formed from "a"
characters in the positions indicated by A and "b" characters in the
positions indicated by B. A and B must be different.
*/
static T4 Count2(const Positions *A, const Positions *B)
{
// Exit early for trivial cases.
if (A->N < 2 || B->N < 2)
return 0;
/* Sum[i] will record the number of "ab" substrings that can be formed
with a "b" at the position in B->Position[b] or earlier.
*/
T2 Sum[Max];
T3 RunningSum = 0;
/* Iterate b through the indices of B->Position. While doing this, a is
synchronized to index to a corresponding place in A->Position.
*/
for (T1 a = 0, b = 0; b < B->N; ++b)
{
/* Advance a to index into A->Position where A->Position[i]
first exceeds B->Position[b], or to the end if there is no such
spot.
*/
while (a < A->N && A->Position[a] < B->Position[b])
++a;
/* The number of substrings "ab" that can be formed using the "b" at
position B->Position[b] is a, the number of "a" preceding it.
Adding this to RunningSum produces the number of substrings "ab"
that can be formed using this "b" or an earlier one.
*/
RunningSum += a;
// Record that.
Sum[b] = RunningSum;
}
RunningSum = 0;
/* Iterate a through the indices of A->Position. While doing this, b is
synchronized to index to a corresponding place in B->Position.
*/
for (T1 a = 0, b = 0; a < A->N; ++a)
{
/* Advance b to index into B->Position where B->Position[i]
first exceeds A->Position[a], or to the end if there is no such
spot.
*/
while (b < B->N && B->Position[b] < A->Position[a])
++b;
/* The number of substrings "abab" that can be formed using the "a"
at A->Position[a] as the second "a" in the substring is the number
of "ab" substrings that can be formed with a "b" before this
"a" multiplied by the number of "b" after this "a".
That number of "ab" substrings is in Sum[b-1], if 0 < b. If b is
zero, there are no "b" before this "a", so the number is zero.
The number of "b" after this "a" is B->N - b.
*/
if (0 < b) RunningSum += (T3) Sum[b-1] * (B->N - b);
}
return RunningSum;
}
int main(void)
{
// Get the string length.
size_t length;
if (1 != scanf("%zu", &length))
{
fprintf(stderr, "Error, expected length in standard input.\n");
exit(EXIT_FAILURE);
}
// Skip blanks.
int c;
do
c = getchar();
while (c != EOF && isspace(c));
ungetc(c, stdin);
/* Create an array of Positions, one element for each character in the
allowed set.
*/
Positions P[L] = {{0}};
for (size_t i = 0; i < length; ++i)
{
c = getchar();
if (!islower(c))
{
fprintf(stderr,
"Error, malformed input, expected only lowercase letters in the string.\n");
exit(EXIT_FAILURE);
}
c -= 'a';
P[c].Position[P[c].N++] = i;
}
/* Count the specified substrings. i and j are iterated through the
indices of the allowed characters. For each pair different i and j, we
count the number of specified substrings that can be performed using
the character of index i as "a" and the character of index j as "b" as
described in Count2. For each pair where i and j are identical, we
count the number of specified substrings that can be formed using the
character of index i alone.
*/
T4 Sum = 0;
for (size_t i = 0; i < L; ++i)
for (size_t j = 0; j < L; ++j)
Sum += i == j
? Count1(&P[i])
: Count2(&P[i], &P[j]);
printf("%" PRIT4 "\n", Sum);
}
In the worst-case scenario, the whole string consists of the same character, and then every choice of indices with 1 <= a < b < c < d <= N satisfies s[a] == s[c] && s[b] == s[d], so the counter adds up to n*(n-1)*(n-2)*(n-3) / 4!, which is O(n^4). In other words, as long as the counting is done one by one (using counter++), there is no way to make the worst-case time complexity better than O(n^4).
That said, this algorithm can be improved. One possible and very important improvement is that if s[a] != s[c], there is no point in checking any indices b and d. user3777427 went in this direction, and it can be improved further like this:
for(int a = 0; a < n-3; a++)
{
for(int c = a + 2; c < n-1; c++)
{
if(s[a] == s[c])
{
for(int b = a + 1; b < c; b++)
{
for(int d = c + 1; d < n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
Edit:
After some more thought, I have found a way to reduce the worst-case time complexity to O(n^3), by using a histogram.
First, we go over the char array once and fill the histogram, so that index 'a' in the histogram contains the number of occurrences of 'a', index 'b' contains the number of occurrences of 'b', etc.
Then, we use the histogram to eliminate the need for the innermost loop (the d loop), like this:
int histogram1[256] = {0};
for (int i = 0; i < n; ++i)
{
++histogram1[(int) s[i]];
}
int histogram2[256];
for(int a = 0; a < n-3; a++)
{
--histogram1[(int) s[a]];
for (int i = 'a'; i <= 'z'; ++i)
{
histogram2[i] = histogram1[i];
}
--histogram2[(int) s[a+1]];
for (int c = a + 2; c < n-1; c++)
{
--histogram2[(int) s[c]];
for (int b = a + 1; b < c; b++)
{
if (s[a] == s[c])
{
count += histogram2[(int) s[b]];
}
}
}
}
Problem
It is perhaps useful for thinking about the problem to recognize that it is an exercise in counting overlapping intervals. For example, if we view each pair of the same characters in the input as marking the endpoints of a half-open interval, then the question is asking to count the number of pairs of intervals that overlap without one being a subset of the other.
Algorithm
One way to approach the problem would begin by identifying and recording all the intervals. It is straightforward to do this in a way that allows the intervals to be grouped by left endpoint and ordered by right endpoint within each group -- this falls out easily from a naive scan of the input with a two-level loop nest.
Such an organization of the intervals is convenient both for reducing the search space for overlaps and for more efficiently counting them. In particular, one can approach the counting like this:
For each interval I, consider the interval groups for left endpoints strictly between the endpoints of I.
Within each of the groups considered, perform a binary search for an interval having right endpoint one greater than the right endpoint of I, or the position where such an interval would occur.
All members of that group from that point to the end satisfy the overlap criterion, so add that number to the total count.
Complexity Analysis
The sorted interval list and group sizes / boundaries can be created at O(n^2) cost via a two-level loop nest. There may be as many as n * (n - 1) / 2 intervals altogether, occurring when all input characters are the same, so the list requires O(n^2) storage.
The intervals are grouped into exactly n - 1 groups, some of which may be empty. For each interval (O(n^2)), we consider up to n - 2 of those, and perform a binary search (O(log n)) on each one. This yields O(n^3 log n) overall operations.
That's an algorithmic improvement over the O(n^4) cost of your original algorithm, though it remains to be seen whether the improved asymptotic complexity manifests as improved performance for the specific problem sizes being tested.

C: Printing INT Values on array - How to stop printing on NULL values?

I need to print values stored in an int array, stopping when a NULL character is encountered ('\0').
So I have this code:
const int display[10] = {1,4,8,2,0,9,2};
int main(){
int i = 0;
for (i = 0; i < 10; i++){
if (display[i] == '\0'){
break;
}
printf("%d\n", display[i]);
}
exit(0);
}
I expected to print all the values of display[10] OR break when a '\0' was encountered but my code breaks on display[4] (0) instead of continuing until display[6].
Any advice on how to achieve this, avoiding printing the null characters at the end of my array?
The null character, '\0', is equal to 0. That's why your loop is only printing the first four elements. It breaks when it encounters 0.
In C, '\0'==0. If you want to print only the initialized fields, put a sentinel (say, a negative number) right after the last initialized field and break the loop when you either encounter the sentinel or count to 10.
const int display[10] = {1,4,8,2,0,9,2,-1 /* a sentinel */};
for (i = 0; i < 10 && display[i] >= 0; i++) {
printf("%d\n", display[i]);
}
You do not need to check for a null value at all. First, declare the array without an explicit size, so that it contains only the initialized elements. Then just run the loop over the number of elements in the array. How do you get the number of elements? It is easy:
const int arr[]= { 1, 2, 3};
int size = sizeof(arr) / sizeof(arr[0]);
for (int i = 0; i < size; i++) {
printf("%d", arr[i]);
}
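The sentinel approach from the earlier answer can also be wrapped in a small helper. This is a sketch only; the function name length_before_sentinel and its parameters are made up for illustration. It stops at the sentinel or at the array capacity, whichever comes first, so a legitimate 0 in the data is not mistaken for the end.

```c
#include <stddef.h>

/* Returns how many elements precede the first occurrence of sentinel,
   or capacity if the sentinel never appears in values[0..capacity-1]. */
size_t length_before_sentinel(const int *values, size_t capacity, int sentinel)
{
    size_t i = 0;
    while (i < capacity && values[i] != sentinel)
        i++;
    return i;
}
```

With const int display[10] = {1,4,8,2,0,9,2,-1}; the call length_before_sentinel(display, 10, -1) yields 7, so a printing loop over that many elements includes the 0 at index 4 and stops before the sentinel.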

Number of valid sub-string algorithm

I came across a question for which I couldn't find the algorithm. Can you help me?
Question: A valid substring is one that contains the letter a or z. You are given a string and have to calculate the number of valid substrings of that string. For example, the string 'abcd' contains 4 valid substrings, the string 'azazaz' contains 21, and 'abbzbba' contains 22.
I just want to know the algorithm.
Define D[i] - number of valid substrings ending at index i.
Assuming you have this D[i], the solution is simply D[0]+D[1]+...+D[n-1].
Calculating D is fairly simple: iterate over the string, and for each character:
If it is "valid", all substrings ending with this character are valid, so D[i] = i + 1.
Otherwise, the only valid substrings ending here are the valid substrings that ended at the previous character, extended by one, so D[i] = D[i-1].
C code:
#include <string.h>

int NumValidSubstrings(const char *s) {
    int n = strlen(s);
    if (n == 0) return 0; // nothing to count; also avoids a zero-length VLA
    int D[n]; // VLA (it cannot have an initializer); if that's an issue, just use dynamic allocation
    for (int i = 0; i < n; i++) {
        if (s[i] == 'z' || s[i] == 'a') {
            // if character is valid, each substring ending with it is also valid.
            D[i] = i + 1;
        } else if (i > 0) {
            // Else, only valid substrings from last character, that are extended by 1
            D[i] = D[i-1];
        } else {
            // an invalid first character ends no valid substring
            D[i] = 0;
        }
    }
    int count = 0;
    for (int i = 0; i < n; i++) count += D[i];
    return count;
}
Notes:
This technique is called Dynamic Programming.
This solution is O(n) time + space.
You can save some space by not storing the entire D array - but only the last value and calculate count on the fly, making this solution O(1) space and O(n) time.
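The space-saving variant mentioned in the last note might look like the sketch below. The function name count_valid_substrings and the variable ending_here (which plays the role of D[i]) are invented for this example, and long long is used for the totals as an assumption to delay overflow on long inputs.

```c
#include <stddef.h>

/* Counts substrings of s that contain at least one 'a' or 'z',
   keeping only the latest D value instead of the whole array. */
long long count_valid_substrings(const char *s)
{
    long long total = 0;
    long long ending_here = 0;  /* valid substrings ending at the current index */
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == 'a' || s[i] == 'z')
            ending_here = (long long)i + 1;  /* every start position works */
        /* otherwise ending_here carries over unchanged from the previous index */
        total += ending_here;
    }
    return total;
}
```

On the examples from the question this gives 4 for abcd, 21 for azazaz and 22 for abbzbba.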

Resources