Finding substring, but not for all inputs? - c

I wrote code to find the indices of the longest substring in a larger string.
A substring qualifies when it contains an equal number of a's and b's.
For example, giving 12 and bbbbabaababb should give 2 9, since the first appearing longest substring starts at index 2 and ends at index 9. 3 10 is also a valid substring, but since it is not the first one, it is not the answer.
The code I made is:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
void substr(char str[], int n) {
int sum = 0;
int max = -1, start;
for (int i = 0; i < n; i++) {
if (str[i]=='a') {
str[i] = 0;
} else if(str[i]=='b') {
str[i] = 1;
}
}
// starting point i
for (int i = 0; i < n - 1; i++) {
sum = (str[i] == 0) ? -1 : 1;
// all subarrays from i
for (int j = i + 1; j < n; j++) {
(str[j] == 0) ? (sum += -1) : (sum += 1);
// sum == 0
if (sum == 0 && max < j - i + 1 && n%2==0) {
max = j - i + 1;
start = i-1;
} else if (sum == 0 && max < j - i + 1 && n%2!=0) {
max = j - i + 1;
start = i;
}
}
}
// no subarray
if (max == -1) {
printf("No such subarray\n");
} else {
printf("%d %d\n", start, (start + max - 1));
}
}
/* driver code */
int main(int argc, char* v[]) {
int n; // stores the length of the input
int i = 0; // used as counter
scanf("%d", &n);
n += 1; // deals with the /0 at the end of a str
char str[n]; // stores the total
/* adding new numbers */
while(i < n) {
char new;
scanf("%c", &new);
str[i] = new;
++i;
}
substr(str, n);
return 0;
}
It works for a lot of values, but not for the second example (given below). It should output 2 9 but gives 3 10. This is a valid substring, but not the first one...
Example inputs and outputs should be:
Input        Input           Input
5            12              5
baababb      bbbbabaababb    bbbbb
Output       Output          Output
0 5          2 9             No such subarray

You have several problems, many of them to do with arrays sizes and indices.
When you read in the array, you want n characters. You then increase n in order to accommodate the null terminator. It is a good idea to null-terminate the string, but the '\0' at the end is really not part of the string data. Instead, adjust the array size when you create the array and place the null terminator explicitly:
char str[n + 1];
// scan n characters
str[n] = '\0';
In C (and other languages), ranges are defined by an inclusive lower bound, but by an exclusive upper bound: [lo, hi). The upper bound hi is not part of the range and there are hi - lo elements in the range. (Arrays with n elements are a special case, where the valid range is [0, n).) You should embrace rather than fight this convention. If your output should be different, amend the output, not the representation in your program.
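To make the half-open convention concrete, here is a minimal sketch. The helper names range_len and range_last are made up for this illustration and are not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: a range is stored as [lo, hi), with hi exclusive. */
size_t range_len(size_t lo, size_t hi)
{
    assert(lo <= hi);
    return hi - lo;              /* no "+ 1" needed */
}

/* Inclusive index of the last element; only valid for non-empty ranges. */
size_t range_last(size_t lo, size_t hi)
{
    assert(lo < hi);
    return hi - 1;
}
```

With lo = 2 and hi = 10 the range holds 8 elements and its last index is 9. The conversion to an inclusive end index happens only when printing, which is how an output such as "2 9" can be produced from the internal [2, 10) representation.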
(And note how your first example, where you are supposed to have a string of five characters, actually reads and considers the b in the 6th position. That's a clear error.)
The position of the maximum valid substring does not depend on whether the overall string length is odd or even!
The first pass, where you convert all "a"s and "b"s to 0's and 1's is unnecessary and it destroys the original string. That's not a big problem here, but keep that in mind.
The actual problem is how you try to find the substrings. Your idea to subtract 1 for an "a" and add 1 for a "b" is good, but you don't keep your sums correctly. For each possible starting point i, you scan the rest of the string and look for a zero sum. That will only work if you reset the sum to zero for each i.
void substr(char str[], int n)
{
int max = 0;
int start = -1;
for (int i = 0; i + max < n; i++) {
int sum = 0;
for (int j = i; j < n; j++) {
sum += (str[j] == 'a') ? -1 : 1;
if (sum == 0 && max < j - i) {
max = j - i;
start = i;
}
}
}
if (max == 0) {
printf("No such subarray\n");
} else {
printf("%d %d\n", start, start + max);
}
}
Why initialize max = 0 instead of -1? Because you add +1/−1 as first thing, your check can never find a substring of max == 0, but there's a possibility of optimization: If you have already found a long substring, there's no need to look at the "tail" of your string: The loop condition i + max < n will cut the search short.
(There's another reason: Usually, sizes and indices are represented by unsigned types, e.g. size_t. If you use 0 as initial value, your code will work for unsigned types.)
The algorithm isn't the most efficient for large arrays, but it should work.
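For large inputs, one possible alternative (a different technique from the nested loops above, not something the question asked for) is to remember the first position at which each running sum occurs. Two prefixes with the same running sum enclose a balanced substring, so one pass is enough. The function name longest_balanced and its interface are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the length of the longest substring of s[0..n) with equally many
   'a' and 'b' characters, or 0 if there is none (or if allocation fails).
   When the result is non-zero, *start receives the index of its first
   character; among equally long candidates the earliest one wins. */
int longest_balanced(const char *s, int n, int *start)
{
    assert(s != NULL && start != NULL && n >= 0);

    /* first[sum + n] = shortest prefix length with that running sum, or -1 */
    int *first = malloc((2 * (size_t)n + 1) * sizeof *first);
    if (first == NULL)
        return 0;
    for (int k = 0; k <= 2 * n; k++)
        first[k] = -1;

    int best = 0, sum = 0;
    first[n] = 0;                           /* the empty prefix has sum 0 */
    for (int j = 1; j <= n; j++) {
        sum += (s[j - 1] == 'a') ? -1 : 1;
        if (first[sum + n] < 0) {
            first[sum + n] = j;             /* first time this sum appears */
        } else if (j - first[sum + n] > best) {
            best = j - first[sum + n];      /* s[first, j) is balanced */
            *start = first[sum + n];
        }
    }
    free(first);
    return best;
}
```

For bbbbabaababb with n = 12 this yields length 8 starting at index 2, so printing start and start + length - 1 gives 2 9. The extra array costs O(n) memory in exchange for a single pass over the string.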

Related

function that returns the sum of the longest arithmetic sequence(without using arrays)

So I have this problem I'm trying to solve for a couple of days now, and I just feel lost.
The function basically needs to get the size(n) of a sequence.
The user inputs the size, and then the function will ask him to put the numbers of the sequence one after the other.
Once he puts all the numbers, the function needs to return the sum of the longest sequence.
For example, n=8, and the user put [1,3,5,7,11,13,15,16].
The result will be 16 because [1,3,5,7] is the longest sequence.
If n=8 and the user put [1,3,5,7,11,15,19,20], the result will be 52, because although there are 2 sequences with the length of 4, the sum of [7,11,15,19] is bigger than that of [1,3,5,7].
The sequence doesn't necessarily need to be increasing; it can be decreasing too.
The function can't be recursive, and arrays can't be used.
I hope it's clear enough what the problem is, and if not, please let me know so I'll try to explain better.
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
int i, size, num, nextNum, diff, prevDiff, currSeqLength = 0, currSum, prevSum = 0;
printf("Please enter the arithmetic list size: ");
scanf_s("%d", &size);
for (i = 1; i <= size; i++)
{
printf("Please enter num: ");
scanf_s("%d", &num);
while (i == 1)
{
prevSum = num;
nextNum = num;
currSeqLength++;
break;
}
while (i == 2)
{
currSum = prevSum + num;
diff = num - nextNum;
nextNum = num;
currSeqLength++;
break;
}
while (i >= 3)
{
prevDiff = diff;
diff = num - nextNum;
nextNum = num;
if (prevDiff == diff)
{
currSum += num;
currSeqLength++;
break;
}
else
{
prevDiff = diff;
// diff now should be the latest num - previous one
}
}
}
}
This is basically what I've managed so far. I know some things here aren't working as intended, and I know the code is only half complete, but I've tried so many things and I can't seem to put my finger on what's the problem, and would really love some guidance, I'm really lost.
A few problems I encountered.
When I enter a loop in which the difference between the new number and the old one is different than the previous loops(for instance, [4,8,11]), I can't seem to manage to save the old number(in this case 8) to calculate the next difference(which is 3). Not to mention the first 2 while loops are probably not efficient and can be merged together.
P.S I know that the code is not a function, but I wrote it this way so I can keep track on each step, and once the code works as intended I convert it into a function.
I tried out your code, but as noted in the comments, it needed to keep track, at various stages of the sequence checks, of which sequence had the longest consistent difference value. To do that, I added some additional arrays. Following is a prototype of how that might be accomplished.
#include <stdio.h>
#include <stdlib.h>
int main()
{
int len, diff, longest;
printf("Enter the size of your sequence: ");
scanf("%d", &len);
int num[len], seq[len], sum[len];
for (int i = 0; i < len; i++)
{
printf("Enter value #%d: ", i + 1);
scanf("%d", &num[i]);
seq[i] = 0; /* Initialize these arrays as the values are entered */
sum[i] = 0;
}
for (int i = 0; i < len - 1; i++)
{
seq[i] = 1; /* The sequence length will always start at "1" */
sum[i] = num[i];
diff = num[i + 1] - num[i];
for (int j = i; j < len - 1; j++)
{
if (diff == num[j + 1] - num[j])
{
sum[i] += num[j + 1]; /* Accumulate the sum for this sequence */
seq[i] += 1; /* Increment the sequence length for this sequence portion */
}
else
{
break;
}
}
}
longest = 0; /* Now, determine which point in the list of numbers has the longest sequence and sum total */
for (int i = 1; i < len; i++)
{
if ((seq[i] > seq[longest]) || ((seq[i] == seq[longest]) && (sum[i] > sum[longest])))
{
longest = i;
}
}
printf("The sequence with the longest sequence and largest sum is: [%d ", num[longest]);
diff = num[longest + 1] - num[longest];
for (int i = longest + 1; i < len; i++)
{
if ((num[i] - num[i - 1]) == diff)
{
printf("%d ", num[i]);
}
else
{
break;
}
}
printf("]\n");
return 0;
}
Some points to note.
Additional arrays are defined to track sequence length and sequence summary values.
A brute force method is utilized reading through the entered value list starting with the first value to determine its longest sequence length and continuing on to the sequence starting with the second value in the list and continuing on through the list.
Once all possible starting points are evaluated for length and total, a check is then made for the starting point that has the longest sequence or the longest sequence and largest sum value.
Following is some sample terminal output utilizing the list values in your initial query.
#Dev:~/C_Programs/Console/Longest/bin/Release$ ./Longest
Enter the size of your sequence: 8
Enter value #1: 1
Enter value #2: 3
Enter value #3: 5
Enter value #4: 7
Enter value #5: 11
Enter value #6: 13
Enter value #7: 15
Enter value #8: 16
The sequence with the longest sequence and largest sum is: [1 3 5 7 ]
#Dev:~/C_Programs/Console/Longest/bin/Release$ ./Longest
Enter the size of your sequence: 8
Enter value #1: 1
Enter value #2: 3
Enter value #3: 5
Enter value #4: 7
Enter value #5: 11
Enter value #6: 15
Enter value #7: 19
Enter value #8: 20
The sequence with the longest sequence and largest sum is: [7 11 15 19 ]
No doubt this code snippet could use some polish, but give it a try and see if it meets the spirit of your project.
I know code-only answers are frowned upon, but this is the simplest I can come up with and its logic seems easy to follow:
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
int i, size;
int currNum, currDiff;
int prevNum = 0, prevDiff = 0;
int currSum = 0, currSeqLen = 0;
int bestSum = 0, bestSeqLen = 0;
printf("Please enter the arithmetic list size: ");
scanf_s("%d", &size);
for (i = 0; i < size; i++)
{
printf("Please enter num: ");
scanf_s("%d", &currNum);
if (currSeqLen > 0)
{
currDiff = currNum - prevNum;
if (currSeqLen > 1 && currDiff != prevDiff)
{
/* New arithmetic sequence. */
currSeqLen = 1;
currSum = prevNum;
}
prevDiff = currDiff;
}
currSum += currNum;
prevNum = currNum;
currSeqLen++;
if (currSeqLen > bestSeqLen ||
currSeqLen == bestSeqLen && currSum > bestSum)
{
/* This is the best sequence so far. */
bestSeqLen = currSeqLen;
bestSum = currSum;
}
}
printf("\nbest sequence length=%d, sum=%d\n", bestSeqLen, bestSum);
return 0;
}
I have omitted error checking for the scanf_s calls. They can be changed to scanf for non-Windows platforms.
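Since the question mentions turning the code into a function later, here is one way that could look, as a sketch only. It keeps the same logic in a small state struct that is fed one number at a time, so no array is involved. The names SeqState, seq_init and seq_feed are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a single pass over the input. */
typedef struct {
    int prevNum, prevDiff;
    int currSum, currLen;
    int bestSum, bestLen;
} SeqState;

void seq_init(SeqState *st)
{
    assert(st != NULL);
    st->prevNum = st->prevDiff = 0;
    st->currSum = st->currLen = 0;
    st->bestSum = st->bestLen = 0;
}

/* Feed the next number of the list, e.g. right after each scanf call. */
void seq_feed(SeqState *st, int num)
{
    assert(st != NULL);
    if (st->currLen > 0) {
        int diff = num - st->prevNum;
        if (st->currLen > 1 && diff != st->prevDiff) {
            /* The run is broken: a new one starts at the previous number. */
            st->currLen = 1;
            st->currSum = st->prevNum;
        }
        st->prevDiff = diff;
    }
    st->currSum += num;
    st->prevNum = num;
    st->currLen++;
    if (st->currLen > st->bestLen ||
        (st->currLen == st->bestLen && st->currSum > st->bestSum)) {
        /* Longest run so far, or equally long with a larger sum. */
        st->bestLen = st->currLen;
        st->bestSum = st->currSum;
    }
}
```

After the last number has been fed, bestSum holds the value to return. For the two lists from the question it ends up as 16 and 52 respectively.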

How to reduce time complexity in traversing a string?

I was solving a problem to find number of such indexes, a, b, c, d in a string s, of size n made only of lowercase letters such that:
1 <= a < b < c < d <= n
and
s[a] == s[c] and s[b] == s[d]
The code I wrote traverses the string character by character in a basic manner:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
for(int d = c + 1; d<n; d++)
{
if(s[a] == s[c] && s[b] == s[d] && a>=0 && b>a && c>b && d>c && d<n)
{
count++;
}
}
}
}
}
printf("%d", count);
return 0;
}
a, b, c and d are the indices.
The trouble is that if the input string is big in size, the time limit is exceeded due to the 4 nested loops. Is there any way I can improve the code to decrease the complexity?
The problem statement is available here: https://www.hackerearth.com/practice/algorithms/searching/linear-search/practice-problems/algorithm/holiday-season-ab957deb/
The problem can be solved if you maintain an array which stores the cumulative frequency (the total of a frequency and all frequencies so far in a frequency distribution) of each character in the input string. Since the string consists only of lowercase characters, the array size will be [26][N+1].
For example:
index  - 1 2 3 4 5
string - a b a b a
cumulativeFrequency array:
  0 1 2 3 4 5
a 0 1 1 2 2 3
b 0 0 1 1 2 2
I have made the array by taking the index of first character of the input string as 1. Doing so will help us in solving the problem later. For now, just ignore column 0 and assume that the string starts from index 1 and not 0.
Useful facts
Using cumulative frequency array we can easily check if a character is present at any index i:
if cumulativeFrequency[i]-cumulativeFrequency[i-1] > 0
number of times a character is present from range i to j (excluding both i and j):
frequency between i and j = cumulativeFrequency[j-1] - cumulativeFrequency[i]
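In C, the two facts above might be sketched as follows. The bound MAXN and the names cum, build_cumulative, present_at and count_between are assumptions for this illustration; positions are 1-based as in the table.

```c
#include <assert.h>
#include <string.h>

#define MAXN 2000                /* assumed upper bound on the string length */

/* cum[ch][i] = number of occurrences of letter ch in the first i characters */
static int cum[26][MAXN + 1];

/* Fill cum for the lowercase string s and return its length. */
int build_cumulative(const char *s)
{
    size_t len = strlen(s);
    assert(len <= MAXN);
    int n = (int)len;
    for (int ch = 0; ch < 26; ch++) {
        cum[ch][0] = 0;
        for (int i = 1; i <= n; i++)
            cum[ch][i] = cum[ch][i - 1] + (s[i - 1] - 'a' == ch);
    }
    return n;
}

/* Is letter ch (0 = 'a') at 1-based position i? */
int present_at(int ch, int i)
{
    return cum[ch][i] - cum[ch][i - 1] > 0;
}

/* Occurrences of ch strictly between 1-based positions i and j (i < j). */
int count_between(int ch, int i, int j)
{
    return cum[ch][j - 1] - cum[ch][i];
}
```

For the string ababa, count_between(0, 1, 5) looks at positions 2 to 4 (b a b) and returns 1, while count_between(1, 1, 5) returns 2.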
Algorithm
1: for each character from a-z:
2: Locate index a and c such that charAt[a] == charAt[c]
3: for each pair (a, c):
4: for character from a-z:
5: b = frequency of character between a and c
6: d = frequency of character after c
7: count += b*d
Time complexity
Line 1-2:
The outer most loop will run for 26 times. We need to locate all the
pair(a, c), to do that we require a time complexity of O(n^2).
Line 3-4:
For each pair, we again run a loop 26 times to check how many times each character is present between a and c and after c.
Line 5-7:
Using cumulative frequency array, for each character we can easily calculate how many times it appears between a and c and after c in O(1).
Hence, overall complexity is O(26*n^2*26) = O(n^2).
Code
I code in Java and do not have code in C. I have used simple loops and an array, so it should be easy to understand.
//Input N and string
//Do not pay attention to the next two lines since they are basically taking
//input using Java input streams
int N = Integer.parseInt(bufferedReader.readLine().trim());
String str = bufferedReader.readLine().trim();
//Construct an array to store cumulative frequency of each character in the string
int[][] cumulativeFrequency = new int[26][N+1];
//Fill the cumulative frequency array
for (int i = 0;i < str.length();i++)
{
//character at index i
char ch = str.charAt(i);
//Fill the cumulative frequency array for each character
for (int j = 0;j < 26;j++)
{
cumulativeFrequency[j][i+1] += cumulativeFrequency[j][i];
if (ch-97 == j) cumulativeFrequency[j][i+1]++;
}
}
int a, b, c, d;
long count = 0;
//Follow the steps of the algorithm here
for (int i = 0;i < 26;i++)
{
for (int j = 1; j <= N - 2; j++)
{
//Check if character at i is present at index j
a = cumulativeFrequency[i][j] - cumulativeFrequency[i][j - 1];
if (a > 0)
{
//Check if character at i is present at index k
for (int k = j + 2; k <= N; k++)
{
c = cumulativeFrequency[i][k] - cumulativeFrequency[i][k - 1];
if (c > 0)
{
//For each character, find b*d
for (int l = 0; l < 26; l++)
{
//For each character calculate b and d
b = cumulativeFrequency[l][k-1] - cumulativeFrequency[l][j];
d = cumulativeFrequency[l][N] - cumulativeFrequency[l][k];
count += b * d;
}
}
}
}
}
}
System.out.println(count);
I hope I have helped you. The code I provided will not give time complexity error and it will work for all test cases. Do comment if you do not understand anything in my explanation.
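Since the question is about C, here is a rough C rendering of the same idea, offered as a sketch rather than a tested submission. It differs from the Java code in one respect: it compares s[a] and s[c] directly instead of looping over the 26 letters to locate them. The name count_quadruples and the bound MAXN are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAXN 2000   /* assumed maximum length, in line with char s[2002] in the question */

/* cf[ch][i] = number of occurrences of letter ch in the first i characters */
static int cf[26][MAXN + 1];

long long count_quadruples(const char *s)
{
    size_t len = strlen(s);
    assert(len <= MAXN);
    int n = (int)len;

    for (int ch = 0; ch < 26; ch++) {
        cf[ch][0] = 0;
        for (int i = 1; i <= n; i++)
            cf[ch][i] = cf[ch][i - 1] + (s[i - 1] - 'a' == ch);
    }

    long long count = 0;
    for (int a = 1; a <= n - 2; a++) {          /* 1-based position of a */
        for (int c = a + 2; c <= n; c++) {      /* 1-based position of c */
            if (s[a - 1] != s[c - 1])
                continue;
            for (int ch = 0; ch < 26; ch++) {
                long long b = cf[ch][c - 1] - cf[ch][a];  /* ch strictly between a and c */
                long long d = cf[ch][n] - cf[ch][c];      /* ch after c */
                count += b * d;
            }
        }
    }
    return count;
}
```

For abab the function returns 1, and for aaaaa it returns 5, which is the number of ways to choose 4 positions out of 5.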
Performing the equality check in early stages can save you some time.
Also the check a>=0 && b>a && c>b && d>c && d<n seems to be unnecessary as you are already checking for this condition in the loops. An improved version can be as follows:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
if(s[a] == s[c]) {
for(int d = c + 1; d<n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
printf("%d", count);
return 0;
}
Since the string S is made of only lowercase letters, you can maintain a 26x26 table (the diagonal, where i = j, is ignored) that holds the appearance of all possible distinct two-letter cases (e.g. ab, ac, bc, etc.).
The following code tracks the completeness of each answer candidate(abab, acac, bcbc, etc) by two functions: checking for the AC position and checking for the BD position. Once the value reaches 4, it means that the candidate is a valid answer.
#include <stdio.h>
int digitsAC(int a)
{
if(a % 2 == 0)
return a + 1;
return a;
}
int digitsBD(int b)
{
if(b % 2 == 1)
return b + 1;
return b;
}
int main()
{
int n, count = 0;
char s[2002];
int appearance2x2[26][26] = {0};
scanf("%d%s", &n, s);
for(int i = 0; i < n; ++i)
{
int id = s[i] - 'a';
for(int j = 0; j < 26; ++j)
{
appearance2x2[id][j] = digitsAC(appearance2x2[id][j]);
appearance2x2[j][id] = digitsBD(appearance2x2[j][id]);
}
}
//counting the results
for(int i = 0; i < 26; ++i)
{
for(int j = 0; j < 26; ++j)
{
if(i == j)continue;
if(appearance2x2[i][j] >= 4)count += ((appearance2x2[i][j] - 2) / 2);
}
}
printf("%d", count);
return 0;
}
The time complexity is O(26N), which is linear in N.
The code can be further accelerated by making bitwise mask operations, but I left the functions simple for clearness.
Haven't tested it a lot, please tell me if you find any bugs in it!
edit: There is a problem when handling consecutively repeated letters, as in aabbaabb.
Here is an O(n) solution (counting the number of characters in the allowed character set as constant).
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/* As used in this program, "substring" means a string that can be formed by
characters from another string. The resulting characters are not
necessarily consecutive in the original string. For example, "ab" is a
substring of "xaxxxxbxx".
This program requires the lowercase letters to have consecutive codes, as
in ASCII.
*/
#define Max 2000 // Maximum string length supported.
typedef short T1; // A type that can hold Max.
typedef int T2; // A type that can hold Max**2.
typedef long T3; // A type that can hold Max**3.
typedef long long T4; // A type that can hold Max**4.
#define PRIT4 "lld" // A conversion specification that will print a T4.
#define L ('z'-'a'+1) // Number of characters in the set allowed.
/* A Positions structure records all positions of a character in the string.
N is the number of appearances, and Position[i] is the position (index into
the string) of the i-th appearance, in ascending order.
*/
typedef struct { T1 N, Position[Max]; } Positions;
/* Return the number of substrings "aaaa" that can be formed from "a"
characters in the positions indicated by A.
*/
static T4 Count1(const Positions *A)
{
T4 N = A->N;
return N * (N-1) * (N-2) * (N-3) / (4*3*2*1);
}
/* Return the number of substrings "abab" that can be formed from "a"
characters in the positions indicated by A and "b" characters in the
positions indicated by B. A and B must be different.
*/
static T4 Count2(const Positions *A, const Positions *B)
{
// Exit early for trivial cases.
if (A->N < 2 || B->N < 2)
return 0;
/* Sum[i] will record the number of "ab" substrings that can be formed
with a "b" at the position in B->Position[b] or earlier.
*/
T2 Sum[Max];
T3 RunningSum = 0;
/* Iterate b through the indices of B->Position. While doing this, a is
synchronized to index to a corresponding place in A->Position.
*/
for (T1 a = 0, b = 0; b < B->N; ++b)
{
/* Advance a to index into A->Position where A->Position[i]
first exceeds B->Position[b], or to the end if there is no such
spot.
*/
while (a < A->N && A->Position[a] < B->Position[b])
++a;
/* The number of substrings "ab" that can be formed using the "b" at
position B->Position[b] is a, the number of "a" preceding it.
Adding this to RunningSum produces the number of substrings "ab"
that can be formed using this "b" or an earlier one.
*/
RunningSum += a;
// Record that.
Sum[b] = RunningSum;
}
RunningSum = 0;
/* Iterate a through the indices of A->Position. While doing this, b is
synchronized to index to a corresponding place in B->Position.
*/
for (T1 a = 0, b = 0; a < A->N; ++a)
{
/* Advance b to index into B->Position where B->Position[i]
first exceeds A->Position[a], or to the end if there is no such
spot.
*/
while (b < B->N && B->Position[b] < A->Position[a])
++b;
/* The number of substrings "abab" that can be formed using the "a"
at A->Position[a] as the second "a" in the substring is the number
of "ab" substrings that can be formed with a "b" before this
"a" multiplied by the number of "b" after this "a".
That number of "ab" substrings is in Sum[b-1], if 0 < b. If b is
zero, there are no "b" before this "a", so the number is zero.
The number of "b" after this "a" is B->N - b.
*/
if (0 < b) RunningSum += (T3) Sum[b-1] * (B->N - b);
}
return RunningSum;
}
int main(void)
{
// Get the string length.
size_t length;
if (1 != scanf("%zu", &length))
{
fprintf(stderr, "Error, expected length in standard input.\n");
exit(EXIT_FAILURE);
}
// Skip blanks.
int c;
do
c = getchar();
while (c != EOF && isspace(c));
ungetc(c, stdin);
/* Create an array of Positions, one element for each character in the
allowed set.
*/
Positions P[L] = {{0}};
for (size_t i = 0; i < length; ++i)
{
c = getchar();
if (!islower(c))
{
fprintf(stderr,
"Error, malformed input, expected only lowercase letters in the string.\n");
exit(EXIT_FAILURE);
}
c -= 'a';
P[c].Position[P[c].N++] = i;
}
/* Count the specified substrings. i and j are iterated through the
indices of the allowed characters. For each pair different i and j, we
count the number of specified substrings that can be performed using
the character of index i as "a" and the character of index j as "b" as
described in Count2. For each pair where i and j are identical, we
count the number of specified substrings that can be formed using the
character of index i alone.
*/
T4 Sum = 0;
for (size_t i = 0; i < L; ++i)
for (size_t j = 0; j < L; ++j)
Sum += i == j
? Count1(&P[i])
: Count2(&P[i], &P[j]);
printf("%" PRIT4 "\n", Sum);
}
In the worst-case scenario, the whole string contains the same character, and in this case all indices such that 1 <= a < b < c < d <= N will satisfy s[a] == s[c] && s[b] == s[d], hence the counter would add up to n*(n-1)*(n-2)*(n-3) / 4!, which is O(n^4). In other words, assuming the counting is done one by one (using counter++), there is no way to make the worst-case time complexity better than O(n^4).
That said, this algorithm can be improved. One possible and very important improvement is that if s[a] != s[c], there is no point in continuing to check all possible indices b and d. user3777427 went in this direction, and it can be further improved like this:
for(int a = 0; a < n-3; a++)
{
for(int c = a + 2; c < n-1; c++)
{
if(s[a] == s[c])
{
for(int b = a + 1; b < c; b++)
{
for(int d = c + 1; d < n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
Edit:
After some more thought, I have found a way to reduce the worst-case time complexity to O(n^3) by using a histogram.
First, we go over the char array once and fill up the histogram, such that index 'a' in the histogram will contain the number of occurrences of 'a', index 'b' in the histogram will contain the number of occurrences of 'b', etc.
Then, we use the Histogram to eliminate the need for the most inner loop (the d loop), like this:
int histogram1[256] = {0};
for (int i = 0; i < n; ++i)
{
++histogram1[(int) s[i]];
}
int histogram2[256];
for(int a = 0; a < n-3; a++)
{
--histogram1[(int) s[a]];
for (int i = 'a'; i <= 'z'; ++i)
{
histogram2[i] = histogram1[i];
}
--histogram2[(int) s[a+1]];
for (int c = a + 2; c < n-1; c++)
{
--histogram2[(int) s[c]];
for (int b = a + 1; b < c; b++)
{
if (s[a] == s[c])
{
count += histogram2[(int) s[b]];
}
}
}
}
Problem
It is perhaps useful for thinking about the problem to recognize that it is an exercise in counting overlapping intervals. For example, if we view each pair of the same characters in the input as marking the endpoints of a half-open interval, then the question is asking to count the number of pairs of intervals that overlap without one being a subset of the other.
Algorithm
One way to approach the problem would begin by identifying and recording all the intervals. It is straightforward to do this in a way that allows the intervals to be grouped by left endpoint and ordered by right endpoint within each group -- this falls out easily from a naive scan of the input with a two-level loop nest.
Such an organization of the intervals is convenient both for reducing the search space for overlaps and for more efficiently counting them. In particular, one can approach the counting like this:
For each interval I, consider the interval groups for left endpoints strictly between the endpoints of I.
Within each of the groups considered, perform a binary search for an interval having right endpoint one greater than the right endpoint of I, or the position where such an interval would occur.
All members of that group from that point to the end satisfy the overlap criterion, so add that number to the total count.
Complexity Analysis
The sorted interval list and group sizes / boundaries can be created at O(n^2) cost via a two-level loop nest. There may be as many as n * (n - 1) / 2 intervals altogether, occurring when all input characters are the same, so the list requires O(n^2) storage.
The intervals are grouped into exactly n - 1 groups, some of which may be empty. For each interval (there are O(n^2) of them), we consider up to n - 2 of those groups and perform a binary search (O(log n)) on each one. This yields O(n^3 log n) overall operations.
That's an algorithmic improvement over the O(n^4) cost of your original algorithm, though it remains to be seen whether the improved asymptotic complexity manifests improved performance for the specific problem sizes being tested.
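A compact sketch of this counting scheme follows. It takes one liberty: instead of materializing the interval list, it keeps a sorted list of positions per letter, which yields the same groups, because the intervals with left endpoint b are exactly the later positions of the letter s[b]. All names (pos, cnt, count_after, count_overlaps) and the bound MAXN are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAXN 2000                /* assumed maximum input length */

static int pos[26][MAXN];        /* pos[ch][0..cnt[ch]) = ascending positions of ch */
static int cnt[26];

/* Binary search: how many positions of ch are strictly greater than limit? */
static int count_after(int ch, int limit)
{
    int lo = 0, hi = cnt[ch];    /* search window is [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (pos[ch][mid] <= limit)
            lo = mid + 1;
        else
            hi = mid;
    }
    return cnt[ch] - lo;
}

long long count_overlaps(const char *s)
{
    size_t len = strlen(s);
    assert(len <= MAXN);
    int n = (int)len;

    memset(cnt, 0, sizeof cnt);
    for (int i = 0; i < n; i++) {
        int ch = s[i] - 'a';     /* assumes lowercase letters only */
        pos[ch][cnt[ch]++] = i;
    }

    long long total = 0;
    for (int a = 0; a < n; a++)
        for (int c = a + 2; c < n; c++) {
            if (s[a] != s[c])
                continue;        /* (a, c) is not an interval */
            /* Every b strictly inside (a, c) is a candidate left endpoint;
               intervals (b, d) with d > c overlap without being nested. */
            for (int b = a + 1; b < c; b++)
                total += count_after(s[b] - 'a', c);
        }
    return total;
}
```

The three nested loops with a binary search inside match the O(n^3 log n) bound from the analysis; the position lists need only O(n) storage, which is a side effect of the simplification rather than part of the approach as described.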

Non divisible subset-Hackerrank solution in C

I am new to programming and C is the only language I know. Read a few answers for the same question written in other programming languages. I have written some code for the same but I only get a few test cases correct (4 to be precise). How do I edit my code to get accepted?
I have tried comparing one element of the array with the rest and then I remove the element (which is being compared with the initial) if their sum is divisible by k and then this continues until there are two elements in the array where their sum is divisible by k. Here is the link to the question:
https://www.hackerrank.com/challenges/non-divisible-subset/problem
#include<stdio.h>
#include<stdlib.h>
void remove_element(int array[],int position,long int *n){
int i;
for(i=position;i<=(*n)-1;i++){
array[i]=array[i+1];
}
*n=*n-1;
}
int main(){
int k;
long int n;
scanf("%ld",&n);
scanf("%d",&k);
int *array=malloc(n*sizeof(int));
int i,j;
for(i=0;i<n;i++)
scanf("%d",&array[i]);
for(i=n-1;i>=0;i--){
int counter=0;
for(j=n-1;j>=0;j--){
if((i!=j)&&(array[i]+array[j])%k==0)
{
remove_element(array,j,&n);
j--;
continue;
}
else if((i!=j)&&(array[i]+array[j])%k!=0){
counter++;
}
}
if(counter==n-1){
printf("%ld",n);
break;
}
}
return 0;
}
I only get about 4 test cases right from 20 test cases.
What Gerhardh in his comment hinted at is that
for(i=position;i<=(*n)-1;i++){
array[i]=array[i+1];
}
reads from array[*n] when i = *n-1, overrunning the array. Change that to
for (i=position; i<*n-1; i++)
array[i]=array[i+1];
Additionally, you have
remove_element(array,j,&n);
j--;
- but j will be decremented when continuing the for loop, so decrementing it here is one time too many, while adjustment of i is necessary, since remove_element() shifted array[i] one position to the left, so change j-- to i--.
Furthermore, the condition
if(counter==n-1){
printf("%ld",n);
break;
}
makes just no sense; remove that block and place printf("%ld\n", n); before the return 0;.
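Putting the first fix into the whole helper, a corrected remove_element might look like this sketch. The precondition assert is an addition for illustration and was not in the original.

```c
#include <assert.h>

/* Remove array[position] by shifting the tail one slot to the left.
   *n is the current element count and is decremented afterwards. */
void remove_element(int array[], int position, long *n)
{
    assert(position >= 0 && position < *n);
    for (long i = position; i < *n - 1; i++)
        array[i] = array[i + 1];     /* highest index read is *n - 1 */
    *n = *n - 1;
}
```

Removing index 1 from {1, 2, 3, 4} leaves {1, 3, 4} with a count of 3, and no element past the end of the array is read.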
To solve this efficiently, you have to realize several things:
1. The sum of two positive integers a and b is divisible by k (also a positive integer) if ((a%k) + (b%k))%k = 0. That means that either ((a%k) + (b%k)) = 0 (1) or ((a%k) + (b%k)) = k (2).
2. Case (1), ((a%k) + (b%k)) = 0, is possible only if both a and b are multiples of k, that is, a%k = 0 and b%k = 0. For case (2), there are at most k/2 possible pairs of remainders. So our task is to pick elements such that no two of them fall into case (1) or (2).
3. To do this, map each number in your array to its remainder modulo k. For this, create a new array remainders in which the index stands for a remainder and the value is the count of numbers having that remainder.
4. Go over the new array remainders and handle 3 cases.
4.1 If remainders[0] > 0, then we can pick only one of those elements (if we pick more, the sum of their remainders is 0, so their sum is divisible by k).
4.2 If k is even and remainders[k/2] > 0, then we can also pick only one of those elements (otherwise the sum of two such remainders is k).
4.3 What about the other numbers? For any other remainder rem > 0, pick the larger group, that is, max(remainders[rem], remainders[k - rem]) elements. You can't pick from both groups, since rem + (k - rem) = k, so a pair with one number from each group has a sum divisible by k.
Now, the code:
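#include <stdbool.h>

int nonDivisibleSubset(int k, int s_count, int* s) {
    /* remainders[r] = how many numbers have remainder r; sized for k <= 100 */
    static int remainders[101];
    for (int i = 0; i < s_count; i++) {
        int rem = s[i] % k;
        remainders[rem]++;
    }
    int maxSize = 0;
    bool isKOdd = k & 1;
    int halfK = k / 2;
    for (int rem = 0; rem <= halfK; rem++) {
        if (rem == 0) {
            /* case 4.1: at most one multiple of k */
            maxSize += remainders[rem] > 0;
            continue;
        }
        if (!isKOdd && (rem == halfK)) {
            /* case 4.2: at most one element with remainder k/2, if any exists */
            maxSize += remainders[rem] > 0;
            continue;
        }
        /* case 4.3: take the larger of the two complementary groups */
        int otherRem = k - rem;
        if (remainders[rem] > remainders[otherRem]) {
            maxSize += remainders[rem];
        } else {
            maxSize += remainders[otherRem];
        }
    }
    return maxSize;
}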
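To gain some confidence in the remainder-based reasoning, a brute-force checker can be handy for small inputs. The following sketch tries every subset and is exponential, so it is only meant for cross-checking; the name bruteForceNonDivisible and the limit of 20 elements are assumptions.

```c
#include <assert.h>

/* Try every subset of s[0..n) and return the size of the largest one in
   which no two elements sum to a multiple of k. Exponential: small n only. */
int bruteForceNonDivisible(int k, int n, const int *s)
{
    assert(k > 0 && n >= 0 && n <= 20);
    int best = 0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        int size = 0, ok = 1;
        for (int i = 0; i < n && ok; i++) {
            if (!(mask & (1u << i)))
                continue;                /* element i is not in this subset */
            size++;
            for (int j = i + 1; j < n; j++)
                if ((mask & (1u << j)) && (s[i] + s[j]) % k == 0) {
                    ok = 0;              /* a forbidden pair was found */
                    break;
                }
        }
        if (ok && size > best)
            best = size;
    }
    return best;
}
```

For the set {1, 7, 2, 4} with k = 3 it returns 3 (the subset {1, 7, 4}), which can be compared with the result of nonDivisibleSubset for the same input.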

C - Counting the occurrence of same number in an array

I have an array in C where:
int buf[4];
buf[0] = 1;
buf[1] = 2;
buf[2] = 5;
buf[3] = 2;
and I want to count how many elements in the array that have the same value with a counter.
In the above example, the number of elements of similar value is 2 since there are two 2s in the array.
I tried:
#include <stdio.h>
int main() {
int buf[4];
int i = 0;
int count = 0;
buf[0] = 1;
buf[1] = 2;
buf[2] = 5;
buf[3] = 2;
int length = sizeof(buf) / sizeof(int);
for (i=0; i < length; i++) {
if (buf[i] == buf[i+1]) {
count++;
}
}
printf("count = %d", count);
return 0;
}
but I'm getting 0 as the output. Would appreciate some help on this.
Update
Apologies for not being clear.
First:
the array is limited to only of size 4 since it involves 4 directions, left, bottom, top and right.
Second:
if there is at least 2 elements in the array that have the same value, the count is accepted. Anything less will simply not register.
Example:
1,2,5,2
count = 2 since there are two '2's in the array.
1,2,2,2
count = 3 since there are three '2's in the array
1,2,3,4
count = 0 since there are no similarities in the array. Hence this is not accepted.
Anything less than the count = 2 is invalid.
You are really rather hamstrung by the order the values appear within buf. The only rudimentary way to handle this when limited to 4-values is to make a pass with nested loops to determine what the matching value is, and then make a single pass over buf again counting how many times it occurs (and since you limit to 4-values, even with a pair of matches, your count is limited to 2 -- so it doesn't make a difference which you count)
A short example would be:
#include <stdio.h>
int main (void) {
int buf[] = {1, 2, 5, 2},
length = sizeof(buf) / sizeof(int),
count = 0,
same = 0;
for (int i = 0; i < length - 1; i++) /* identify what value matches */
for (int j = i + 1; j < length; j++)
if (buf[i] == buf[j]) {
same = buf[i];
goto saved; /* jump out of both loops when same found */
}
saved:; /* the lowly, but very useful 'goto' saves the day - again */
for (int i = 0; i < length; i++) /* count matching numbers */
if (buf[i] == same)
count++;
printf ("count = %d\n", count);
return 0;
}
Example Use/Output
$ ./bin/arr_freq_count
count = 2
While making that many passes over the values, it takes little more to use an actual frequency array to fully determine how often each value occurs, e.g.
#include <stdio.h>
#include <string.h>
#include <limits.h>
int main (void) {
int buf[] = {1, 2, 3, 4, 5, 2, 5, 6},
n = sizeof buf / sizeof *buf,
max = INT_MIN,
min = INT_MAX;
for (int i = 0; i < n; i++) { /* find max/min for range */
if (buf[i] > max)
max = buf[i];
if (buf[i] < min)
min = buf[i];
}
int range = max - min + 1; /* max-min elements (inclusive) */
int freq[range]; /* declare VLA */
memset (freq, 0, range * sizeof *freq); /* initialize VLA zero */
for (int i = 0; i < n; i++) /* loop over buf setting count in freq */
freq[buf[i]-min]++;
for (int i = 0; i < range; i++) /* output frequency of values */
printf ("%d occurs %d times\n", i + min, freq[i]);
return 0;
}
(note: add a sanity check on the range to prevent being surprised by the amount of storage required if min is actually close to INT_MIN and max is close to INT_MAX. Things could come to a quick stop depending on the amount of memory available, and max - min + 1 itself can overflow an int.)
Example Use/Output
$ ./bin/freq_arr
1 occurs 1 times
2 occurs 2 times
3 occurs 1 times
4 occurs 1 times
5 occurs 2 times
6 occurs 1 times
After your edit and explanation that you are limited to 4 values, the compiler should optimize the first rudimentary approach just fine. However, for any more than 4 values, or when you need the frequency of anything (characters in a file, duplicates in an array, etc.), think frequency array.
The first thing that's wrong is that you are only comparing adjacent values in the buf array. You have to compare all the values to each other.
How to do this is an architectural question. The approach suggested by David Rankin in the comments is one option, using an array of structs holding each value and its count is a second, and using a hash table is a third. You've got some coding to do! Good luck. Ask for more help as you need it.
You are comparing values of buf[i] and buf[i+1]. i.e. You are comparing buf[0] with buf[1], buf[1] with buf[2] etc.
What you need is a nested for loop to compare all buf values with each other.
count = 0;
for (i=0; i<4; i++)
{
for (j=i+1; j<4; j++)
{
if (buf[i]==buf[j])
{
count++;
}
}
}
As pointed out by Jonathan Leffler, there is an issue with the above algorithm when the input is {1,1,1,1}: it counts matching pairs, so it gives 6 when the expected value is 4.
I am keeping it up, as the OP has mentioned that he wants to only check anything above 2. So, this method may still be useful.

How do the functions work?

Could you explain to me how the following two algorithms work?
int countSort(int arr[], int n, int exp)
{
int output[n];
int i, count[n] ;
for (int i=0; i < n; i++)
count[i] = 0;
for (i = 0; i < n; i++)
count[ (arr[i]/exp)%n ]++;
for (i = 1; i < n; i++)
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
for (i = 0; i < n; i++)
arr[i] = output[i];
}
void sort(int arr[], int n)
{
countSort(arr, n, 1);
countSort(arr, n, n);
}
I wanted to apply the algorithm at this array:
After calling the function countSort(arr, n, 1) , we get this:
When I call then the function countSort(arr, n, n) , at this for loop:
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
I get output[-1]=arr[4].
But the array doesn't have such a position...
Have I done something wrong?
EDIT:Considering the array arr[] = { 10, 6, 8, 2, 3 }, the array count will contain the following elements:
what do these numbers represent? How do we use them?
Counting sort is very easy - let's say you have an array which contains numbers from range 1..3:
[3,1,2,3,1,1,3,1,2]
You can count how many times each number occurs in the array:
count[1] = 4
count[2] = 2
count[3] = 3
Now you know that in a sorted array,
number 1 will occupy positions 0..3 (from 0 to count[1] - 1), followed by
number 2 on positions 4..5 (from count[1] to count[1] + count[2] - 1), followed by
number 3 on positions 6..8 (from count[1] + count[2] to count[1] + count[2] + count[3] - 1).
Now that you know final position of every number, you can just insert every number at its correct position. That's basically what countSort function does.
However, in real life your input array would not contain just numbers from the range 1..3, so the solution is to sort the numbers on the least significant digit (LSD) first, then on the next digit, and so on up to the most significant digit.
This way you can sort bigger numbers by sorting numbers from range 0..9 (single digit range in decimal numeral system).
This code: (arr[i]/exp)%n in countSort is used just to get those digits. n is the base of your numeral system, so for decimal you should use n = 10, and exp should start at 1 and be multiplied by the base in every iteration to get consecutive digits.
For example, if we want to get the third digit from the right, we use n = 10 and exp = 10^2:
x = 1234,
(x/exp)%n = 2.
This algorithm is called Radix sort and is explained in detail on Wikipedia: http://en.wikipedia.org/wiki/Radix_sort
It took a bit of time to pick through your countSort routine and attempt to determine just what it was you were doing compared to a normal radix sort. There are some versions that split the iteration and the actual sort routine, which appears to be what you attempted using both the countSort and sort functions. However, after going through that exercise, it was clear you had just missed including necessary parts of the sort routine. After fixing various compile/declaration issues in your original code, the following adds the pieces you overlooked.
In your countSort function, the size of your count array was wrong. It must be the size of the base, in this case 10 (you had 5). You confused the use of exp and base throughout the function. The exp variable steps through the powers of 10, allowing you to get the value and position of each element in the array when combined with a modulo base operation. You had modulo n instead. This problem also permeated your loop ranges, where you had a number of your loop indexes iterating over 0 <= i < n where the correct range was 0 <= i < base.
You missed finding the maximum value in the original array, which is then used to limit the number of passes through the array to perform the sort. In fact, all of your existing loops in countSort must fall within the outer loop iterating while (m / exp > 0). Lastly, you omitted an increment of exp within the outer loop, which is necessary for applying the sort to each element within the array. I guess you just got confused, but I commend your effort in attempting to rewrite the sort routine and not just copy/pasting from somewhere else. (you may have copied/pasted, but if that's the case, you have additional problems...)
With each of those issues addressed, the sort works. Look through the changes and understand what it is doing. The radix sort and counting sort are distribution sorts, relying on where numbers occur and manipulating indexes rather than comparing values against one another, which makes this type of sort awkward to understand at first. Let me know if you have any questions. I made attempts to preserve your naming convention throughout the function, with the addition of a couple of names that were omitted and to prevent hardcoding 10 as the base.
#include <stdio.h>
void prnarray (int *a, int sz);
void countSort (int arr[], int n, int base)
{
int exp = 1;
int m = arr[0];
int output[n];
int count[base];
int i;
for (i = 1; i < n; i++) /* find the maximum value */
m = (arr[i] > m) ? arr[i] : m;
while (m / exp > 0)
{
for (i = 0; i < base; i++)
count[i] = 0; /* zero bucket array (count) */
for (i = 0; i < n; i++)
count[ (arr[i]/exp) % base ]++; /* count keys to go in each bucket */
for (i = 1; i < base; i++) /* indexes after end of each bucket */
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--) /* map bucket indexes to keys */
{
output[count[ (arr[i]/exp) % base] - 1] = arr[i];
count[ (arr[i]/exp) % base ]--; /* same bucket index as the line above */
}
for (i = 0; i < n; i++) /* fill array with sorted output */
arr[i] = output[i];
exp *= base; /* inc exp for next group of keys */
}
}
int main (void) {
int arr[] = { 10, 6, 8, 2, 3 };
int n = 5;
int base = 10;
printf ("\n The original array is:\n\n");
prnarray (arr, n);
countSort (arr, n, base);
printf ("\n The sorted array is\n\n");
prnarray (arr, n);
printf ("\n");
return 0;
}
void prnarray (int *a, int sz)
{
register int i;
printf (" [");
for (i = 0; i < sz; i++)
printf (" %d", a[i]);
printf (" ]\n");
}
output:
$ ./bin/sort_count
The original array is:
[ 10 6 8 2 3 ]
The sorted array is
[ 2 3 6 8 10 ]

Resources