How to reduce time complexity in traversing a string? - c

I was solving a problem that asks for the number of index tuples a, b, c, d in a string s of length n, made only of lowercase letters, such that:
1 <= a < b < c < d <= n
and
s[a] == s[c] and s[b] == s[d]
The code I wrote traverses the string character by character in a basic manner:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
for(int d = c + 1; d<n; d++)
{
if(s[a] == s[c] && s[b] == s[d] && a>=0 && b>a && c>b && d>c && d<n)
{
count++;
}
}
}
}
}
printf("%d", count);
return 0;
}
a, b, c and d are the indices.
The trouble is that for large input strings the time limit is exceeded because of the 4 nested loops. Is there any way I can improve the code to reduce its complexity?
The problem statement is available here: https://www.hackerearth.com/practice/algorithms/searching/linear-search/practice-problems/algorithm/holiday-season-ab957deb/

The problem can be solved by maintaining an array that stores the cumulative frequency of each character in the input string, that is, how many times the character has occurred up to and including each index. Since the string consists only of lowercase letters, the array size is [26][N+1].
For example:
index  - 1 2 3 4 5
string - a b a b a
cumulativeFrequency array:
    0 1 2 3 4 5
a   0 1 1 2 2 3
b   0 0 1 1 2 2
I have made the array by taking the index of first character of the input string as 1. Doing so will help us in solving the problem later. For now, just ignore column 0 and assume that the string starts from index 1 and not 0.
Useful facts
Using the cumulative frequency array of a character, we can easily check whether that character is present at index i:
cumulativeFrequency[i] - cumulativeFrequency[i-1] > 0
The number of times the character occurs strictly between i and j (excluding both i and j) is:
cumulativeFrequency[j-1] - cumulativeFrequency[i]
Algorithm
1: for each character from a-z:
2:     locate indexes a and c such that charAt[a] == charAt[c]
3:     for each pair (a, c):
4:         for each character from a-z:
5:             b = frequency of the character between a and c
6:             d = frequency of the character after c
7:             count += b*d
Time complexity
Lines 1-2:
The outermost loop runs 26 times. Locating all pairs (a, c) takes O(n^2) time.
Lines 3-4:
For each pair, we run another loop 26 times to check how many times each character occurs between a and c and after c.
Lines 5-7:
Using the cumulative frequency array, each of these two counts is computed in O(1).
Hence, the overall complexity is O(26*n^2*26) = O(n^2).
Code
I code in Java and do not have a C version. I have used only simple loops and arrays, so it should be easy to understand.
//Input N and string
//Do not pay attention to the next two lines since they are basically taking
//input using Java input streams
int N = Integer.parseInt(bufferedReader.readLine().trim());
String str = bufferedReader.readLine().trim();
//Construct an array to store cumulative frequency of each character in the string
int[][] cumulativeFrequency = new int[26][N+1];
//Fill the cumulative frequency array
for (int i = 0;i < str.length();i++)
{
//character at index i
char ch = str.charAt(i);
//Fill the cumulative frequency array for each character
for (int j = 0;j < 26;j++)
{
cumulativeFrequency[j][i+1] += cumulativeFrequency[j][i];
if (ch-97 == j) cumulativeFrequency[j][i+1]++;
}
}
int a, b, c, d;
long count = 0;
//Follow the steps of the algorithm here
for (int i = 0;i < 26;i++)
{
for (int j = 1; j <= N - 2; j++)
{
//Check if character at i is present at index j
a = cumulativeFrequency[i][j] - cumulativeFrequency[i][j - 1];
if (a > 0)
{
//Check if character at i is present at index k
for (int k = j + 2; k <= N; k++)
{
c = cumulativeFrequency[i][k] - cumulativeFrequency[i][k - 1];
if (c > 0)
{
//For each character, find b*d
for (int l = 0; l < 26; l++)
{
//For each character calculate b and d
b = cumulativeFrequency[l][k-1] - cumulativeFrequency[l][j];
d = cumulativeFrequency[l][N] - cumulativeFrequency[l][k];
count += b * d;
}
}
}
}
}
}
System.out.println(count);
I hope this helps. The code above should not exceed the time limit and should work for all test cases. Do comment if anything in my explanation is unclear.
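Since the question asks for C, the algorithm above can be sketched in C roughly as follows. This is a minimal sketch, not a line-by-line port of the Java code: it loops over the pairs (a, c) directly instead of looping over the 26 letters first, it uses 0-based indices, and it assumes n <= 2000 (taken from the buffer size in the question). The names count_quadruples, MAX_LEN and ALPHABET are made up for this illustration.

```c
#define MAX_LEN 2000   /* assumed upper bound, from the question's char s[2002] */
#define ALPHABET 26

/* Count index tuples a < b < c < d with s[a] == s[c] and s[b] == s[d].
   s must hold n lowercase letters, with n <= MAX_LEN. */
long long count_quadruples(const char *s, int n)
{
    /* prefix[ch][i] = occurrences of letter ch among the first i characters */
    static int prefix[ALPHABET][MAX_LEN + 1];

    for (int ch = 0; ch < ALPHABET; ch++) {
        prefix[ch][0] = 0;
        for (int i = 0; i < n; i++)
            prefix[ch][i + 1] = prefix[ch][i] + (s[i] - 'a' == ch);
    }

    long long count = 0;
    for (int a = 0; a < n; a++) {
        for (int c = a + 2; c < n; c++) {
            if (s[a] != s[c])
                continue;
            for (int ch = 0; ch < ALPHABET; ch++) {
                /* candidates for b: positions a+1 .. c-1 holding letter ch */
                long long between = prefix[ch][c] - prefix[ch][a + 1];
                /* candidates for d: positions c+1 .. n-1 holding letter ch */
                long long after = prefix[ch][n] - prefix[ch][c + 1];
                count += between * after;
            }
        }
    }
    return count;
}
```

For the example string ababa this returns 2 (the tuples 1,2,3,4 and 2,3,4,5 in 1-based numbering). The result type is long long because the count can exceed the range of a 32-bit int: for 2000 identical letters it is about 6.6 * 10^11.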

Performing the equality check at an early stage can save you some time.
Also, the check a>=0 && b>a && c>b && d>c && d<n is unnecessary, since the loop bounds already guarantee these conditions. An improved version could look as follows:
#include<stdio.h>
int main()
{
int n, count = 0;
char s[2002];
scanf("%d%s", &n, s);
for(int a = 0; a<n-3; a++)
{
for(int b = a + 1; b<n-2; b++)
{
for(int c = b + 1; c<n-1; c++)
{
if(s[a] == s[c]) {
for(int d = c + 1; d<n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
printf("%d", count);
return 0;
}

Since the string S is made of only lowercase letters, you can maintain a 26x26 table (of which only the 26x25 entries with i != j are used) that tracks every ordered pair of distinct letters (e.g. ab, ac, bc, etc.).
The following code tracks the completeness of each answer candidate (abab, acac, bcbc, etc.) with two functions: one checks for the AC position and the other checks for the BD position. Once the value reaches 4, the candidate is a valid answer.
#include <stdio.h>
int digitsAC(int a)
{
if(a % 2 == 0)
return a + 1;
return a;
}
int digitsBD(int b)
{
if(b % 2 == 1)
return b + 1;
return b;
}
int main()
{
int n, count = 0;
char s[2002];
int appearance2x2[26][26] = {0};
scanf("%d%s", &n, s);
for(int i = 0; i < n; ++i)
{
int id = s[i] - 'a';
for(int j = 0; j < 26; ++j)
{
appearance2x2[id][j] = digitsAC(appearance2x2[id][j]);
appearance2x2[j][id] = digitsBD(appearance2x2[j][id]);
}
}
//counting the results
for(int i = 0; i < 26; ++i)
{
for(int j = 0; j < 26; ++j)
{
if(i == j)continue;
if(appearance2x2[i][j] >= 4)count += ((appearance2x2[i][j] - 2) / 2);
}
}
printf("%d", count);
return 0;
}
The time complexity is O(26N), i.e. linear in N.
The code could be sped up further with bitwise mask operations, but I kept the functions simple for clarity.
I haven't tested it much, so please tell me if you find any bugs in it!
Edit: there is a problem when handling consecutively repeated letters, as in aabbaabb.

Here is an O(n) solution (counting the number of characters in the allowed character set as constant).
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/* As used in this program, "substring" means a string that can be formed by
characters from another string. The resulting characters are not
necessarily consecutive in the original string. For example, "ab" is a
substring of "xaxxxxbxx".
This program requires the lowercase letters to have consecutive codes, as
in ASCII.
*/
#define Max 2000 // Maximum string length supported.
typedef short T1; // A type that can hold Max.
typedef int T2; // A type that can hold Max**2.
typedef long long T3; // A type that can hold Max**3 (long may be only 32 bits wide).
typedef long long T4; // A type that can hold Max**4.
#define PRIT4 "lld" // A conversion specification that will print a T4.
#define L ('z'-'a'+1) // Number of characters in the set allowed.
/* A Positions structure records all positions of a character in the string.
N is the number of appearances, and Position[i] is the position (index into
the string) of the i-th appearance, in ascending order.
*/
typedef struct { T1 N, Position[Max]; } Positions;
/* Return the number of substrings "aaaa" that can be formed from "a"
characters in the positions indicated by A.
*/
static T4 Count1(const Positions *A)
{
T4 N = A->N;
return N * (N-1) * (N-2) * (N-3) / (4*3*2*1);
}
/* Return the number of substrings "abab" that can be formed from "a"
characters in the positions indicated by A and "b" characters in the
positions indicated by B. A and B must be different.
*/
static T4 Count2(const Positions *A, const Positions *B)
{
// Exit early for trivial cases.
if (A->N < 2 || B->N < 2)
return 0;
/* Sum[b] will record the number of "ab" substrings that can be formed
with a "b" at the position in B->Position[b] or earlier.
*/
T2 Sum[Max];
T3 RunningSum = 0;
/* Iterate b through the indices of B->Position. While doing this, a is
synchronized to index to a corresponding place in A->Position.
*/
for (T1 a = 0, b = 0; b < B->N; ++b)
{
/* Advance a to index into A->Position where A->Position[i]
first exceeds B->Position[b], or to the end if there is no such
spot.
*/
while (a < A->N && A->Position[a] < B->Position[b])
++a;
/* The number of substrings "ab" that can be formed using the "b" at
position B->Position[b] is a, the number of "a" preceding it.
Adding this to RunningSum produces the number of substrings "ab"
that can be formed using this "b" or an earlier one.
*/
RunningSum += a;
// Record that.
Sum[b] = RunningSum;
}
RunningSum = 0;
/* Iterate a through the indices of A->Position. While doing this, b is
synchronized to index to a corresponding place in B->Position.
*/
for (T1 a = 0, b = 0; a < A->N; ++a)
{
/* Advance b to index into B->Position where B->Position[i]
first exceeds A->Position[a], or to the end if there is no such
spot.
*/
while (b < B->N && B->Position[b] < A->Position[a])
++b;
/* The number of substrings "abab" that can be formed using the "a"
at A->Position[a] as the second "a" in the substring is the number
of "ab" substrings that can be formed with a "b" before this
"a" multiplied by the number of "b" after this "a".
That number of "ab" substrings is in Sum[b-1], if 0 < b. If b is
zero, there are no "b" before this "a", so the number is zero.
The number of "b" after this "a" is B->N - b.
*/
if (0 < b) RunningSum += (T3) Sum[b-1] * (B->N - b);
}
return RunningSum;
}
int main(void)
{
// Get the string length.
size_t length;
if (1 != scanf("%zu", &length))
{
fprintf(stderr, "Error, expected length in standard input.\n");
exit(EXIT_FAILURE);
}
// Skip blanks.
int c;
do
c = getchar();
while (c != EOF && isspace(c));
ungetc(c, stdin);
/* Create an array of Positions, one element for each character in the
allowed set.
*/
Positions P[L] = {{0}};
for (size_t i = 0; i < length; ++i)
{
c = getchar();
if (!islower(c))
{
fprintf(stderr,
"Error, malformed input, expected only lowercase letters in the string.\n");
exit(EXIT_FAILURE);
}
c -= 'a';
P[c].Position[P[c].N++] = i;
}
/* Count the specified substrings. i and j are iterated through the
indices of the allowed characters. For each pair of different i and j, we
count the number of specified substrings that can be formed using
the character of index i as "a" and the character of index j as "b" as
described in Count2. For each pair where i and j are identical, we
count the number of specified substrings that can be formed using the
character of index i alone.
*/
T4 Sum = 0;
for (size_t i = 0; i < L; ++i)
for (size_t j = 0; j < L; ++j)
Sum += i == j
? Count1(&P[i])
: Count2(&P[i], &P[j]);
printf("%" PRIT4 "\n", Sum);
}

In the worst-case scenario, the whole string consists of a single character, and then every choice of indexes with 1 <= a < b < c < d <= N satisfies s[a] == s[c] && s[b] == s[d], so the counter adds up to n*(n-1)*(n-2)*(n-3) / 4!, which is O(n^4). In other words, as long as the counting is done one by one (using counter++), there is no way to make the worst-case time complexity better than O(n^4).
That said, this algorithm can be improved. One possible and very important improvement is that if s[a] != s[c], there is no point in checking any of the indexes b and d. user3777427 went in this direction, and it can be improved further like this:
for(int a = 0; a < n-3; a++)
{
for(int c = a + 2; c < n-1; c++)
{
if(s[a] == s[c])
{
for(int b = a + 1; b < c; b++)
{
for(int d = c + 1; d < n; d++)
{
if(s[b] == s[d])
{
count++;
}
}
}
}
}
}
Edit:
After some more thought, I have found a way to reduce the worst-case time complexity to O(n^3) by using a histogram.
First, we go over the char array once and fill up the histogram, such that index 'a' in the histogram contains the number of occurrences of 'a', index 'b' contains the number of occurrences of 'b', etc.
Then, we use the histogram to eliminate the need for the innermost loop (the d loop), like this:
int histogram1[256] = {0};
for (int i = 0; i < n; ++i)
{
++histogram1[(int) s[i]];
}
int histogram2[256];
for(int a = 0; a < n-3; a++)
{
--histogram1[(int) s[a]];
for (int i = 'a'; i <= 'z'; ++i)
{
histogram2[i] = histogram1[i];
}
--histogram2[(int) s[a+1]];
for (int c = a + 2; c < n-1; c++)
{
--histogram2[(int) s[c]];
for (int b = a + 1; b < c; b++)
{
if (s[a] == s[c])
{
count += histogram2[(int) s[b]];
}
}
}
}

Problem
It is perhaps useful for thinking about the problem to recognize that it is an exercise in counting overlapping intervals. For example, if we view each pair of the same characters in the input as marking the endpoints of a half-open interval, then the question is asking to count the number of pairs of intervals that overlap without one being a subset of the other.
Algorithm
One way to approach the problem would begin by identifying and recording all the intervals. It is straightforward to do this in a way that allows the intervals to be grouped by left endpoint and ordered by right endpoint within each group -- this falls out easily from a naive scan of the input with a two-level loop nest.
Such an organization of the intervals is convenient both for reducing the search space for overlaps and for more efficiently counting them. In particular, one can approach the counting like this:
For each interval I, consider the interval groups for left endpoints strictly between the endpoints of I.
Within each of the groups considered, perform a binary search for an interval having right endpoint one greater than the right endpoint of I, or the position where such an interval would occur.
All members of that group from that point to the end satisfy the overlap criterion, so add that number to the total count.
Complexity Analysis
The sorted interval list and group sizes / boundaries can be created at O(n^2) cost via a two-level loop nest. There may be as many as n * (n - 1) / 2 intervals altogether, occurring when all input characters are the same, so the list requires O(n^2) storage.
The intervals are grouped into exactly n - 1 groups, some of which may be empty. For each interval (O(n^2)), we consider up to n - 2 of those, and perform a binary search (O(log n)) on each one. This yields O(n^3 log n) overall operations.
That's an algorithmic improvement over the O(n^4) cost of your original algorithm, though it remains to be seen whether the improved asymptotic complexity manifests as improved performance for the specific problem sizes being tested.
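The counting step described above might look like the following in C. This is only a sketch under some assumptions of mine: instead of materializing the O(n^2) interval list, it keeps one ascending position list per letter, which serves as the sorted right-endpoint list for every group whose left endpoint holds that letter. The names count_greater, count_crossing_intervals, MAX_LEN and ALPHABET are hypothetical, and n <= 2000 is assumed from the question's buffer size.

```c
#define MAX_LEN 2000   /* assumed upper bound on the string length */
#define ALPHABET 26

/* Number of entries in the ascending array pos[0..len) that are greater than key. */
static int count_greater(const int *pos, int len, int key)
{
    int lo = 0, hi = len;   /* the first index with pos[index] > key is in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (pos[mid] > key)
            hi = mid;
        else
            lo = mid + 1;
    }
    return len - lo;
}

/* Count pairs of intervals (a, c) and (b, d) with a < b < c < d. */
long long count_crossing_intervals(const char *s, int n)
{
    static int pos[ALPHABET][MAX_LEN];   /* positions of each letter, ascending */
    int len[ALPHABET] = {0};

    for (int i = 0; i < n; i++) {
        int ch = s[i] - 'a';
        pos[ch][len[ch]++] = i;
    }

    long long total = 0;
    for (int a = 0; a < n; a++) {
        for (int c = a + 2; c < n; c++) {
            if (s[a] != s[c])
                continue;                /* (a, c) is not an interval */
            for (int b = a + 1; b < c; b++) {
                int ch = s[b] - 'a';
                /* intervals that start at b and end strictly after c */
                total += count_greater(pos[ch], len[ch], c);
            }
        }
    }
    return total;
}
```

Because every right endpoint beyond c is automatically beyond b as well, the binary search only needs to compare against c. The loop structure keeps the O(n^3 log n) bound from the analysis while using O(n) extra storage per letter.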

Related

Finding substring, but not for all inputs?

I wrote some code to find the indices of the longest valid substring of a larger string.
A substring is valid when it contains an equal number of a's and b's.
For example, given 12 and bbbbabaababb, the output should be 2 9, since the first longest valid substring starts at index 2 and ends at index 9. 3 10 is also a valid answer, but since it is not the first such substring, it is not the expected output.
The code I made is:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
void substr(char str[], int n) {
int sum = 0;
int max = -1, start;
for (int i = 0; i < n; i++) {
if (str[i]=='a') {
str[i] = 0;
} else if(str[i]=='b') {
str[i] = 1;
}
}
// starting point i
for (int i = 0; i < n - 1; i++) {
sum = (str[i] == 0) ? -1 : 1;
// all subarrays from i
for (int j = i + 1; j < n; j++) {
(str[j] == 0) ? (sum += -1) : (sum += 1);
// sum == 0
if (sum == 0 && max < j - i + 1 && n%2==0) {
max = j - i + 1;
start = i-1;
} else if (sum == 0 && max < j - i + 1 && n%2!=0) {
max = j - i + 1;
start = i;
}
}
}
// no subarray
if (max == -1) {
printf("No such subarray\n");
} else {
printf("%d %d\n", start, (start + max - 1));
}
}
/* driver code */
int main(int argc, char* v[]) {
int n; // stores the length of the input
int i = 0; // used as counter
scanf("%d", &n);
n += 1; // deals with the \0 at the end of a str
char str[n]; // stores the total
/* adding new numbers */
while(i < n) {
char new;
scanf("%c", &new);
str[i] = new;
++i;
}
substr(str, n);
return 0;
}
It works for a lot of values, but not for the second example (given below). It should output 2 9 but gives 3 10. This is a valid substring, but not the first one...
Example inputs and outputs should be:
          Example 1   Example 2      Example 3
Input     5           12             5
          baababb     bbbbabaababb   bbbbb
Output    0 5         2 9            No such subarray
You have several problems, many of them to do with arrays sizes and indices.
When you read in the array, you want n characters. You then increase n in order to accommodate the null terminator. It is a good idea to null-terminate the string, but the '\0' at the end is really not part of the string data. Instead, adjust the array size when you create the array and place the null terminator explicitly:
char str[n + 1];
// scan n characters
str[n] = '\0';
In C (and other languages), ranges are defined by an inclusive lower bound, but by an exclusive upper bound: [lo, hi). The upper bound hi is not part of the range and there are hi - lo elements in the range. (Arrays with n elements are a special case, where the valid range is [0, n).) You should embrace rather than fight this convention. If your output should be different, amend the output, not the representation in your program.
(And note how your first example, where you are supposed to have a string of five characters, actually reads and considers the b in the 6th position. That's a clear error.)
The position of the maximum valid substring does not depend on whether the overall string length is odd or even!
The first pass, where you convert all "a"s and "b"s to 0's and 1's is unnecessary and it destroys the original string. That's not a big problem here, but keep that in mind.
The actual problem is how you try to find the substrings. Your idea to subtract 1 for an "a" and add 1 for a "b" is good, but you don't keep your sums correctly. For each possible starting point i, you scan the rest of the string and look for a zero sum. That will only work if you reset the sum to zero for each i.
void substr(char str[], int n)
{
int max = 0;
int start = -1;
for (int i = 0; i + max < n; i++) {
int sum = 0;
for (int j = i; j < n; j++) {
sum += (str[j] == 'a') ? -1 : 1;
if (sum == 0 && max < j - i) {
max = j - i;
start = i;
}
}
}
if (max == 0) {
printf("No such subarray\n");
} else {
printf("%d %d\n", start, start + max);
}
}
Why initialize max = 0 instead of -1? Because the first thing the inner loop does is add +1 or −1, the check can never find a substring with max == 0, so 0 is a safe "nothing found" marker. It also enables an optimization: if you have already found a long substring, there is no need to look at the "tail" of your string, and the loop condition i + max < n cuts the search short.
(There's another reason: Usually, sizes and indices are represented by unsigned types, e.g. size_t. If you use 0 as initial value, your code will work for unsigned types.)
The algorithm isn't the most efficient for large arrays, but it should work.
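For large inputs, one common alternative is a single pass over running sums; this is a different technique from the nested loops above, shown here only as a sketch. The idea: if the running sum after j characters equals the running sum after some earlier i characters, the characters in [i, j) are balanced. The function name longest_balanced and its interface are made up for this illustration, and it assumes the string contains only 'a' and 'b'.

```c
#include <stdlib.h>

/* Returns the length of the longest substring of s[0..n) with as many 'a' as 'b'
   and stores its start index in *start. Returns 0 (leaving *start untouched)
   when no such substring exists or when allocation fails. */
int longest_balanced(const char *s, int n, int *start)
{
    /* first_seen[bal + n] = smallest prefix length with running sum bal, or -1 */
    int *first_seen = malloc((size_t)(2 * n + 1) * sizeof *first_seen);
    if (first_seen == NULL)
        return 0;
    for (int i = 0; i <= 2 * n; i++)
        first_seen[i] = -1;

    int best = 0, balance = 0;
    first_seen[n] = 0;                     /* the empty prefix has running sum 0 */
    for (int j = 1; j <= n; j++) {
        balance += (s[j - 1] == 'a') ? -1 : 1;
        int *slot = &first_seen[balance + n];
        if (*slot < 0) {
            *slot = j;                     /* first time this running sum appears */
        } else if (j - *slot > best) {     /* strict ">" keeps the earliest start */
            best = j - *slot;
            *start = *slot;
        }
    }
    free(first_seen);
    return best;
}
```

For bbbbabaababb this yields length 8 with start 2, so printing start and start + length - 1 gives 2 9. The running time is O(n) at the cost of an O(n) table.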

Non divisible subset-Hackerrank solution in C

I am new to programming and C is the only language I know. Read a few answers for the same question written in other programming languages. I have written some code for the same but I only get a few test cases correct (4 to be precise). How do I edit my code to get accepted?
I have tried comparing one element of the array with the rest and then I remove the element (which is being compared with the initial) if their sum is divisible by k and then this continues until there are two elements in the array where their sum is divisible by k. Here is the link to the question:
https://www.hackerrank.com/challenges/non-divisible-subset/problem
#include<stdio.h>
#include<stdlib.h>
void remove_element(int array[],int position,long int *n){
int i;
for(i=position;i<=(*n)-1;i++){
array[i]=array[i+1];
}
*n=*n-1;
}
int main(){
int k;
long int n;
scanf("%ld",&n);
scanf("%d",&k);
int *array=malloc(n*sizeof(int));
int i,j;
for(i=0;i<n;i++)
scanf("%d",&array[i]);
for(i=n-1;i>=0;i--){
int counter=0;
for(j=n-1;j>=0;j--){
if((i!=j)&&(array[i]+array[j])%k==0)
{
remove_element(array,j,&n);
j--;
continue;
}
else if((i!=j)&&(array[i]+array[j])%k!=0){
counter++;
}
}
if(counter==n-1){
printf("%ld",n);
break;
}
}
return 0;
}
I only get about 4 test cases right from 20 test cases.
What Gerhardh in his comment hinted at is that
for(i=position;i<=(*n)-1;i++){
array[i]=array[i+1];
}
reads from array[*n] when i = *n-1, overrunning the array. Change that to
for (i=position; i<*n-1; i++)
array[i]=array[i+1];
Additionally, you have
remove_element(array,j,&n);
j--;
- but j will be decremented when continuing the for loop, so decrementing it here is one time too many, while adjustment of i is necessary, since remove_element() shifted array[i] one position to the left, so change j-- to i--.
Furthermore, the condition
if(counter==n-1){
printf("%ld",n);
break;
}
makes just no sense; remove that block and place printf("%ld\n", n); before the return 0;.
To solve this efficiently, you have to realize several things:
1. The sum of two positive integers a and b is divisible by k (also a positive integer) if ((a%k) + (b%k))%k = 0. That means that either ((a%k) + (b%k)) = 0 (1) or ((a%k) + (b%k)) = k (2).
2. Case (1) is possible only if both a and b are multiples of k, i.e. a%k = 0 and b%k = 0. For case (2), there are at most k/2 possible pairs of remainders. So our task is to pick elements such that no two of them fall into case (1) or (2).
3. To do this, map each number in your array to its remainder modulo k. For this, create a new array remainders in which the index stands for a remainder and the value is the count of numbers having that remainder.
4. Go over the new array remainders and handle 3 cases.
4.1 If remainders[0] > 0, we can pick only one element with remainder 0 (if we pick more, the sum of their remainders is 0, so the sum of the elements is divisible by k).
4.2 If k is even and remainders[k/2] > 0, we can likewise pick only one element with remainder k/2 (otherwise the sum of their remainders is k).
4.3 What about the other numbers? For any other remainder rem > 0, pick max(remainders[rem], remainders[k - rem]) elements, all from the larger group. You can't pick from both groups since rem + (k - rem) = k, so a pair drawn from the two groups would have a sum divisible by k.
Now, the code:
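#include <stdbool.h>

int nonDivisibleSubset(int k, int s_count, int* s) {
    // remainders[r] = how many elements of s have remainder r modulo k.
    // The size 101 assumes k <= 100; static storage starts out zeroed.
    static int remainders[101];
    for (int i = 0; i < s_count; i++) {
        int rem = s[i] % k;
        remainders[rem]++;
    }
    int maxSize = 0;
    bool isKOdd = k & 1;
    int halfK = k / 2;
    for (int rem = 0; rem <= halfK; rem++) {
        if (rem == 0) {
            // At most one multiple of k can be picked.
            maxSize += remainders[rem] > 0;
            continue;
        }
        if (!isKOdd && (rem == halfK)) {
            // At most one element with remainder k/2, and only if one exists.
            maxSize += remainders[rem] > 0;
            continue;
        }
        // Take the larger of the two complementary groups.
        int otherRem = k - rem;
        if (remainders[rem] > remainders[otherRem]) {
            maxSize += remainders[rem];
        } else {
            maxSize += remainders[otherRem];
        }
    }
    return maxSize;
}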

How to find left and right subarray for each index which follows below mentioned criteria?

There is a sequence problem in which, for each index i in the array, we define two quantities.
Let r be the maximum index such that r >= i and the sub-array from i to r (inclusive) is either non-decreasing or non-increasing.
Let l be the minimum index such that l <= i and the sub-array from l to i (inclusive) is either non-decreasing or non-increasing.
Now, we define the points of an index i to be equal to max(|A[i] - A[l]|, |A[i] - A[r]|).
Note that l and r can be different for each index.
The task is to find the index of the array A that has the maximum points.
My Logic :
First scan all the elements in the array .
For every index find l and r which either follows an increasing or decreasing sequence and then calculate the maximum point for that index.
My problem is that this is taking O(N^2) time.
Can the problem be done in less time?
Two consecutive identical numbers have the same points and do not affect the points of any other index, so we can assume that this scenario does not occur.
So consider an input array a with no consecutive identical numbers. The maximal non-decreasing or non-increasing runs are then [0, I1], [I1, I2], ..., [Ix, n - 1], written as index ranges, where n is the length of the array. Each decreasing run is followed by an increasing run and vice versa.
For any I_i, the index I_i has points equal to max(|A[I_i] - A[I_(i-1)]|, |A[I_i] - A[I_(i+1)]|). Any index strictly between I_i and I_(i+1) has points no greater than those of I_i and I_(i+1), so it does not have to be considered.
So we only need to find the maximum of |A[I_i] - A[I_(i+1)]| over all i.
After a great many attempts, I finally got my program accepted (mainly because the difference between two 32-bit ints does not necessarily fit in a signed 32-bit int), and the code is as follows.
#include <stdio.h>
#define MAXN 200002
long long a[MAXN];
// Named abs_ll because the standard library already declares abs for int.
long long abs_ll(long long n)
{
if (n >= 0)
return n;
return -n;
}
long long find_score(int size)
{
int i = 0;
long long maximum_score = 0;
while (i < size - 1)
{
//Jump over consecutive identical numbers, checking the bound before reading a[i + 1]
while (i < size - 1 && a[i + 1] == a[i])
i++;
if (i >= size - 1)
break;
int j = i + 1;
int inc_or_dec = a[j] > a[i];
while (j < size - 1 && (!((a[j + 1] > a[j]) ^ inc_or_dec) || a[j + 1] == a[j]))j++;
if (abs_ll(a[j] - a[i]) > maximum_score)
maximum_score = abs_ll(a[j] - a[i]);
i = j;
}
return maximum_score;
}
int main()
{
int n;
scanf("%d", &n);
while (n--)
{
int num;
scanf("%d", &num);
for (int i = 0; i < num; i++)
{
scanf("%lld", a + i);
}
printf("%lld\n", find_score(num));
}
return 0;
}
Glad to know if there are any "implementation defined" problems in my code.

Multiplication of very large numbers using character strings

I'm trying to write a C program which performs multiplication of two numbers without directly using the multiplication operator, and it should take into account numbers which are sufficiently large so that even the usual addition of these two numbers cannot be performed by direct addition.
I was motivated to do this after I successfully wrote a C program which performs addition using character strings. I did the following:
#include<stdio.h>
#define N 100000
#include<string.h>
void pushelts(char X[], int n){
int i, j;
for (j = 0; j < n; j++){
for (i = strlen(X); i >= 0; i--){
X[i + 1] = X[i];
}
X[0] = '0';
}
}
int max(int a, int b){
if (a > b){ return a; }
return b;
}
void main(){
char E[N], F[N]; int C[N]; int i, j, a, b, c, d = 0, e;
printf("Enter the first number: ");
gets_s(E);
printf("\nEnter the second number: ");
gets_s(F);
a = strlen(E); b = strlen(F); c = max(a, b);
pushelts(E, c - a); pushelts(F, c - b);
for (i = c - 1; i >= 0; i--){
e = d + E[i] + F[i] - 2*'0';
C[i] = e % 10; d = e / 10;
}
printf("\nThe answer is: ");
for (i = 0; i < c; i++){
printf("%d", C[i]);
}
getchar();
}
It can add any two numbers with "N" digits. Now, how would I use this to perform multiplication of large numbers? First, I wrote a function which performs the multiplication of number, which is to be entered as a string of characters, by a digit n (i.e. 0 <= n <= 9). It's easy to see how such a function is written; I'll call it (*). Now the main purpose is to multiply two numbers (entered as a string of characters) with each other. We might look at the second number with k digits (assuming it's a1a2.....ak) as:
a1a2...ak = a1 x 10^(k - 1) + a2 x 10^(k - 2) + ... + ak-1 x 10 + ak
So the multiplication of the two numbers can be achieved using the solution designed for addition and the function (*).
If the first number is x1x2.....xn and the second one is y1y2....yk, then:
x1x2...xn x y1y2...yk = (x1x2...xn) x y1 x 10^(k-1) + .....
Now the function (*) can multiply (x1x2...xn) by y1, and the multiplication by 10^(k-1) just appends k-1 zeros to the number; finally we add all k terms together to obtain the result. The difficulty lies in knowing how many digits each number contains, so that the addition can be performed each time inside the loop that sums the terms. I have thought about starting with an all-zero array and adding to it, each time, the result of multiplying (x1x2...xn) by yi x 10^(k-i), but as I said, I cannot pin down the required bounds, and I don't know how many zeros I should prepend to each partial result in order to add it to the array using the above algorithm. More difficulty arises when I have to convert between char and int types several times. Maybe I'm making this more complicated than it should be; I don't know if there's an easier way to do this or if there are tools I'm unaware of. I'm a beginner at programming and only know the elementary tools.
Does anyone have a solution or an idea or an algorithm to present? Thanks.
There is an algorithm for this which I developed when doing the Small Factorials problem on SPOJ.
This algorithm is based on the elementary school multiplication method. In school we learn to multiply two numbers by multiplying each digit of the first number by the last digit of the second number, then multiplying each digit of the first number by the second-to-last digit of the second number, and so on, as follows:
     1234
x      56
---------
     7404
+   6170-    // - denotes the left shift
---------
    69104
What actually is happening:
num1 = 1234, num2 = 56, left_shift = 0;
char_array[] = all digits in num1, least significant digit first
result_array[] = all zeros
while(num2)
    n = num2%10
    num2 /= 10
    carry = 0, i = left_shift, j = 0
    while(j < number of digits in num1)
        i. partial_result = char_array[j]*n + carry
        ii. partial_result += result_array[i]
        iii. result_array[i++] = partial_result%10
        iv. carry = partial_result/10
        v. j++
    if(carry) result_array[i] = carry
    left_shift++
Print the result_array in reverse order.
You should note that the above algorithm works only if num1 and num2 do not exceed the range of their data type. If you want a more generic program, then you have to read both numbers into char arrays. The logic will be the same. Declare num1 and num2 as char arrays. See the implementation:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char num1[200], num2[200];
    char result_arr[400] = {'\0'};
    int left_shift = 0;
    if (!fgets(num1, sizeof num1, stdin) || !fgets(num2, sizeof num2, stdin))
        return 1;
    // fgets keeps the trailing newline, if any; strip it.
    num1[strcspn(num1, "\n")] = '\0';
    num2[strcspn(num2, "\n")] = '\0';
    size_t n1 = strlen(num1);
    size_t n2 = strlen(num2);
    // size_t is unsigned, so "i >= 0" is always true; count down with "i-- > 0".
    for(size_t i = n2; i-- > 0; )
    {
        int carry = 0, k = left_shift;
        for(size_t j = n1; j-- > 0; )
        {
            int partial_result = (num1[j] - '0')*(num2[i] - '0') + carry;
            // result_arr stores the digits least significant first.
            if(result_arr[k])
                partial_result += result_arr[k] - '0';
            result_arr[k++] = partial_result%10 + '0';
            carry = partial_result/10;
        }
        if(carry > 0)
            result_arr[k] = carry +'0';
        left_shift++;
    }
    // Print the digits in reverse, most significant first.
    size_t len = strlen(result_arr);
    for(size_t i = len; i-- > 0; )
        printf("%c", result_arr[i]);
    printf("\n");
}
This is not a standard algorithm but I hope this will help.
Bignum arithmetic is hard to implement efficiently. The algorithms are quite hard to understand (and efficient algorithms are better than the naive one you are trying to implement), and you could find several books on them.
I would suggest using an existing Bignum library like GMPLib or use some language providing bignums natively (e.g. Common Lisp with SBCL)
You could re-use your character-string-addition code as follows (using user300234's example of 384 x 56):
Set result="0" /* using your character-string representation */
repeat:
    Set N = ones_digit_of_multiplier /* 6 in this case */
    for (i = 0; i < N; ++i)
        result += multiplicand /* using your addition algorithm */
    Append "0" to multiplicand /* multiply it by 10 --> 3840 */
    Chop off the bottom digit of multiplier /* divide it by 10 --> 5 */
    Repeat if multiplier != 0.
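The steps above could be sketched in C along these lines. This is an illustration under my own assumptions, not the asker's code: add_into is a hypothetical stand-in for the existing string-addition routine, multiply_by_addition and MAX_DIGITS are made-up names, and both inputs are assumed to be non-empty strings of decimal digits whose combined length does not exceed MAX_DIGITS.

```c
#include <string.h>

#define MAX_DIGITS 400   /* assumed limit on strlen(multiplicand) + strlen(multiplier) */

/* acc += addend, both decimal digit strings; acc must have room for the sum. */
static void add_into(char *acc, const char *addend)
{
    char sum[MAX_DIGITS + 2];
    size_t la = strlen(acc), lb = strlen(addend);
    size_t n = la > lb ? la : lb;
    int carry = 0;

    sum[n + 1] = '\0';
    for (size_t k = 0; k < n; k++) {          /* k-th digit counted from the right */
        int da = k < la ? acc[la - 1 - k] - '0' : 0;
        int db = k < lb ? addend[lb - 1 - k] - '0' : 0;
        int t = da + db + carry;
        sum[n - k] = (char)('0' + t % 10);
        carry = t / 10;
    }
    sum[0] = (char)('0' + carry);

    /* skip leading zeros but keep at least one digit */
    const char *p = sum;
    while (*p == '0' && p[1] != '\0')
        p++;
    strcpy(acc, p);
}

/* result = multiplicand * multiplier by repeated addition.
   result needs room for MAX_DIGITS + 1 bytes. */
void multiply_by_addition(const char *multiplicand, const char *multiplier, char *result)
{
    char shifted[MAX_DIGITS + 1];
    strcpy(shifted, multiplicand);
    strcpy(result, "0");

    for (size_t i = strlen(multiplier); i-- > 0; ) {   /* ones digit first */
        int digit = multiplier[i] - '0';
        for (int rep = 0; rep < digit; rep++)
            add_into(result, shifted);
        strcat(shifted, "0");                          /* multiply by 10 */
    }
}
```

Calling multiply_by_addition("384", "56", buf) with a buffer of at least MAX_DIGITS + 1 bytes leaves "21504" in buf. Each digit of the multiplier costs at most nine additions, so the total work stays proportional to the product of the two lengths.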

How do the functions work?

Could you explain to me how the following two algorithms work?
int countSort(int arr[], int n, int exp)
{
int output[n];
int i, count[n] ;
for (int i=0; i < n; i++)
count[i] = 0;
for (i = 0; i < n; i++)
count[ (arr[i]/exp)%n ]++;
for (i = 1; i < n; i++)
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
for (i = 0; i < n; i++)
arr[i] = output[i];
}
void sort(int arr[], int n)
{
countSort(arr, n, 1);
countSort(arr, n, n);
}
I wanted to apply the algorithm to this array:
After calling the function countSort(arr, n, 1) , we get this:
When I call then the function countSort(arr, n, n) , at this for loop:
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
I get output[-1]=arr[4].
But the array doesn't have such a position...
Have I done something wrong?
EDIT:Considering the array arr[] = { 10, 6, 8, 2, 3 }, the array count will contain the following elements:
what do these numbers represent? How do we use them?
Counting sort is very easy - let's say you have an array which contains numbers from range 1..3:
[3,1,2,3,1,1,3,1,2]
You can count how many times each number occurs in the array:
count[1] = 4
count[2] = 2
count[3] = 3
Now you know that in a sorted array,
number 1 will occupy positions 0..3 (from 0 to count[1] - 1), followed by
number 2 on positions 4..5 (from count[1] to count[1] + count[2] - 1), followed by
number 3 on positions 6..8 (from count[1] + count[2] to count[1] + count[2] + count[3] - 1).
Now that you know final position of every number, you can just insert every number at its correct position. That's basically what countSort function does.
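As a minimal sketch of the idea just described, here is a counting sort for values in a small known range. The function name counting_sort_small and the constant MAX_VALUE are assumptions made for this example, and the input is assumed to contain only values from 1 to MAX_VALUE. It is a simplified variant, not the countSort from the question: it rewrites the values directly instead of placing elements through a prefix-sum array.

```c
#include <stdio.h>

#define MAX_VALUE 3   /* assumed range of the input values: 1..MAX_VALUE */

/* Sorts arr[0..n-1] in place; every element must lie in 1..MAX_VALUE. */
void counting_sort_small(int arr[], int n)
{
    int count[MAX_VALUE + 1] = {0};
    int pos = 0;

    for (int i = 0; i < n; i++)            /* count how often each value occurs */
        count[arr[i]]++;

    for (int v = 1; v <= MAX_VALUE; v++)   /* value v fills the next count[v] positions */
        for (int c = 0; c < count[v]; c++)
            arr[pos++] = v;
}
```

For the array [3,1,2,3,1,1,3,1,2] the counts are 4, 2 and 3, so the value 1 is written to positions 0..3, the value 2 to positions 4..5 and the value 3 to positions 6..8.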
However, in real life your input array would not contain just numbers from the range 1..3, so the solution is to sort the numbers on the least significant digit (LSD) first, then on the next digit, and so on up to the most significant digit.
This way you can sort bigger numbers by sorting numbers from range 0..9 (single digit range in decimal numeral system).
The expression (arr[i]/exp)%n in countSort is used just to get those digits. n is the base of your numeral system, so for decimal you should use n = 10, and exp should start at 1 and be multiplied by the base in every iteration to get consecutive digits.
For example, if we want to get the third digit from the right, we use n = 10 and exp = 10^2:
x = 1234,
(x/exp)%n = 2.
This algorithm is called Radix sort and is explained in detail on Wikipedia: http://en.wikipedia.org/wiki/Radix_sort
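To make the (x/exp)%n expression concrete, here is a small sketch of the digit extraction on its own. The helper name digit_at is an assumption for this example, and x is assumed to be non-negative.

```c
#include <stdio.h>

/* Returns the digit of x selected by exp in the given base.
   exp must be a power of base: 1 picks the last digit, base the
   second to last, base*base the third to last, and so on. */
int digit_at(int x, int exp, int base)
{
    return (x / exp) % base;
}

/* Prints the digits of x from least to most significant,
   stepping exp the same way a radix sort pass would. */
void print_digits_lsd_first(int x, int base)
{
    for (int exp = 1; x / exp > 0; exp *= base)
        printf("%d ", digit_at(x, exp, base));
    printf("\n");
}
```

With x = 1234 and base 10, digit_at(x, 100, 10) evaluates 1234 / 100 = 12 and then 12 % 10 = 2, matching the example above.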
It took a bit of time to pick through your countSort routine and work out just what you were doing compared to a normal radix sort. Some versions split the iteration and the actual sort routine, which appears to be what you attempted with the separate countSort and sort functions. However, after going through that exercise, it was clear you had just missed including necessary parts of the sort routine. After fixing various compile/declaration issues in your original code, the following adds the pieces you overlooked.
In your countSort function, the size of your count array was wrong. It must be the size of the base, in this case 10 (you had 5). You confused the use of exp and base throughout the function. The exp variable steps through the powers of 10, allowing you to get the value and position of each digit of each element when combined with a modulo base operation. You had modulo n instead. This problem also permeated your loop ranges, where a number of your loop indexes iterated over 0 <= i < n when the correct range was 0 <= i < base.
You missed finding the maximum value in the original array, which is then used to limit the number of passes through the array needed to perform the sort. In fact, all of your existing loops in countSort must fall within the outer loop iterating while (m / exp > 0). Lastly, you omitted an increment of exp within the outer loop, which is necessary to apply the sort to each successive digit. I guess you just got confused, but I commend your effort in attempting to rewrite the sort routine and not just copy/pasting from somewhere else. (You may have copied/pasted, but if that's the case, you have additional problems...)
With each of those issues addressed, the sort works. Look through the changes and understand what it is doing. The radix sort and counting sort are distribution sorts: they rely on where numbers occur and manipulate indexes rather than comparing values against one another, which makes this type of sort awkward to understand at first. Let me know if you have any questions. I tried to preserve your naming convention throughout the function, adding a couple of names that were omitted and avoiding hardcoding 10 as the base.
#include <stdio.h>
void prnarray (int *a, int sz);
void countSort (int arr[], int n, int base)
{
int exp = 1;
int m = arr[0];
int output[n];
int count[base];
int i;
for (i = 1; i < n; i++) /* find the maximum value */
m = (arr[i] > m) ? arr[i] : m;
while (m / exp > 0)
{
for (i = 0; i < base; i++)
count[i] = 0; /* zero bucket array (count) */
for (i = 0; i < n; i++)
count[ (arr[i]/exp) % base ]++; /* count keys to go in each bucket */
for (i = 1; i < base; i++) /* indexes after end of each bucket */
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--) /* map bucket indexes to keys */
{
output[count[ (arr[i]/exp) % base] - 1] = arr[i];
count[ (arr[i]/exp) % base ]--; /* next equal key goes one slot lower */
}
for (i = 0; i < n; i++) /* fill array with sorted output */
arr[i] = output[i];
exp *= base; /* inc exp for next group of keys */
}
}
int main (void) {
int arr[] = { 10, 6, 8, 2, 3 };
int n = 5;
int base = 10;
printf ("\n The original array is:\n\n");
prnarray (arr, n);
countSort (arr, n, base);
printf ("\n The sorted array is\n\n");
prnarray (arr, n);
printf ("\n");
return 0;
}
void prnarray (int *a, int sz)
{
register int i;
printf (" [");
for (i = 0; i < sz; i++)
printf (" %d", a[i]);
printf (" ]\n");
}
output:
$ ./bin/sort_count
The original array is:
[ 10 6 8 2 3 ]
The sorted array is
[ 2 3 6 8 10 ]

Resources