"Hashing" Algorithm Implementation - arrays

I am given an input consisting of 2 parts, the first line consists of 2 numbers denoting the size of a matrix, N and M, followed by the matrix itself, A. The maximum size of the matrix is 1 <= N, M <= 100, and each of the elements of the matrix is 0 <= A[i][j] <= pow(10, 9). The "hash" is calculated by adding the elements in each column, and multiplying all the sums, modulo 1000000007 (pow(10, 9) + 7).
My code is as follows:
#include <stdio.h>
int main() {
int rows, cols;
scanf("%d %d", &rows, &cols); getchar();
unsigned long long int data[rows][cols];
for (int row = 0; row < rows; row++) {
for (int col = 0; col < cols; col++) {
scanf("%llu", &data[row][col]);
}
getchar();
}
unsigned long long int coltot[cols];
for(int i=0;i<cols;i++){coltot[i]=0;}
for (int row = 0; row < rows; row++) {
for (int col = 0; col < cols; col++) {
coltot[col] += data[row][col];
}
}
unsigned long long int colmult, colmulttemp = (coltot[0] % 1000000007);
for (int i = 1; i < cols; i++) {
colmulttemp *= coltot[i];
colmulttemp %= 1000000007;
}
colmult = colmulttemp;
printf("%llu\n", colmult);
return 0;
}
However, upon submission, the results I got indicated that some test cases were failing. The question only gave the following test cases:
stdin:
2 2
1 5
1 5
stdout:
20
stdin:
3 3
1 4 7
2 5 8
3 6 9
stdout:
2160
which my code passes correctly. I did, however, go out of my way to test the maximum values: I wrote a program that prints 1000000000 100 times horizontally and 100 times vertically into a text file. However, upon redirecting the file in, I got a bus error:
$ ./main < data.txt
Bus error (core dumped)
$
Could this be the reason some test cases are failing? Or is the problem elsewhere? Regardless, how would I fix it?
Thank you for your time.
Update: I have found the problem with the bus error: I forgot to specify the matrix's size when I generated the matrix. It now works, but it produces a different result from a python3 program: the C program returns 213129341, whereas the python3 program returns 991047043.
>>> import math
>>> x = pow(10, 9)
>>> xx = x * 100
>>> m = x + 7
>>> r = x
>>> i = 1
>>> while(i < 100):
... i += 1
... r = r * xx
... r = r % m
...
>>> r
991047043
>>> r % m
991047043
>>>
This still doesn't bring me any closer to figuring out why my code is incorrect, unfortunately. Any ideas?
Thank you for your time.

Overflow is possible when multiplying the sums.
Denote the maximal element in the matrix by M (here M = 1000000000, which is approximately 2^30). M is also approximately equal to your modulus. If the matrix is NxN, the sum over a row or column is bounded by M*N. The initialization value for colmulttemp is bounded by M. When multiplying these numbers, we get M^2*N as an intermediate result. For M = 10^9 and N = 10^2, this is 10^20, which overflows 64-bit numbers.
Fortunately, there is a simple fix: apply the modulo reduction to all your sums, not only to the first one. Then every intermediate result is bounded by M^2, which is below 2^64. BTW this is the reason for this particular value of M: it's large enough to make all your numbers impressively big, and small enough to fit intermediate results into 64-bit integers.
for (int col = 0; col < cols; col++) {
coltot[col] %= 1000000007;
}
unsigned long long int colmult,
colmulttemp = coltot[0]; // % 1000000007 not necessary here

Could this be the reason some test cases are failing?
Yes.
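Putting the fix together, a minimal sketch of the whole computation might look like the following. The function name column_hash, the MOD macro and the row-major flat array layout are assumptions made for this illustration, not part of the original code. It reduces every column sum modulo 1000000007 before multiplying, so each product stays well below 2^64.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD 1000000007ULL /* pow(10, 9) + 7, as in the problem statement */

/* Hypothetical helper: data holds rows*cols values in row-major order. */
uint64_t column_hash(size_t rows, size_t cols, const uint64_t *data) {
    uint64_t hash = 1;
    for (size_t col = 0; col < cols; col++) {
        uint64_t sum = 0;
        for (size_t row = 0; row < rows; row++) {
            /* At most 100 * 10^9 per column under the stated limits. */
            sum += data[row * cols + col];
        }
        /* Both factors are below MOD (about 2^30), so the product
           is below 2^60 and cannot overflow 64 bits. */
        hash = (hash * (sum % MOD)) % MOD;
    }
    return hash;
}
```

For the two sample inputs from the question this returns 20 and 2160.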

Related

How to calculate running time of algorithm?

I am currently learning about Big O Notation running times. I try to calculate the time complexity of some code:
int i = 1;
int n = 3; //this variable is unknown
int j;
while (i<=n)
{
for (j = 1; j < i; j++)
printf_s("*");
j *= 2;
i *= 3;
}
I think that the complexity of this code is O(log n). But even if that is correct, I can't explain why.
The time complexity is not O(log n), it is O(n).
We can calculate that in a structured way. First we examine the inner loop:
for (j = 1; j < i; j++)
printf_s("*");
Here j iterates from 1 to i. So that means that for a given i, it will take i-1 steps.
Now we can look at the outer loop, and we can abstract away the inner loop:
while (i<=n)
{
// ... i-1 steps ...
j *= 2;
i *= 3;
}
So in each iteration of the while loop we perform i-1 steps. Furthermore, i triples in each iteration until it is larger than n. Writing $K = \lfloor \log_3 n \rfloor$, the number of steps of this algorithm is:
$$\sum_{k=0}^{K} \left(3^k - 1\right)$$
Here k is an extra variable that starts at 0 and increments each time, so it counts how many times we perform the body of the while loop. The loop ends when $3^k > n$, hence we iterate $K + 1$ times (roughly $\log_3 n$), and in iteration k the inner loop results in $3^k - 1$ steps.
The above sum is equivalent to:
$$-(K + 1) + \sum_{k=0}^{K} 3^k$$
The remaining sum is a geometric series, which is equal to $(1 - 3^{K+1})/(1 - 3) = (3^{K+1} - 1)/2$. Since $3^K \le n$, this is at most $(3n - 1)/2$.
The total number of steps is thus bounded by $(3n - 1)/2 - (K + 1)$, or formulated more simply, O(n).
The body of the inner loop is going to be executed 1, 3, 9, 27, ..., 3^k times, where k = ceil(log3(n)).
Here we can use the fact that Σ_{0 <= i < k} 3^i <= 3^k. One can prove it by induction.
So we can say that the inner loop executes no more than 2*3^k times, where 3^k < 3n, which is linear in n, namely O(n).
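The linear growth can also be checked directly. The sketch below (the name count_steps is made up for this illustration) adds up the i-1 inner-loop steps per outer iteration instead of actually running the inner loop. It assumes n is small enough that i *= 3 does not overflow.

```c
/* Counts how many times the inner-loop body of the question's code runs.
   Assumes n is far enough below ULLONG_MAX that i *= 3 cannot wrap. */
unsigned long long count_steps(unsigned long long n) {
    unsigned long long steps = 0;
    for (unsigned long long i = 1; i <= n; i *= 3) {
        steps += i - 1; /* the inner loop runs for j = 1 .. i-1 */
    }
    return steps;
}
```

For n = 10 the outer loop sees i = 1, 3, 9, giving 0 + 2 + 8 = 10 steps, which is on the order of n rather than log n.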
First of all, you're not really calculating the running time, but the number of time-consuming operations. Here, each call to printf_s is one.
Sometimes if you're not good at maths, you can still find the number with experimentation. The algorithm compiled with -O3 is fast enough to be tested with various n. I replaced printf_s with a simple increment of a counter that is then returned from the function, and used unsigned long long as the type. With those changes we get
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <inttypes.h>
unsigned long long alg(unsigned long long n) {
unsigned long long rv = 0;
unsigned long long i = 1;
unsigned long long j;
while (i <= n) {
for (j = 1; j < i; j++)
rv += 1;
i *= 3;
}
return rv;
}
int main(void) {
unsigned long long n = 1;
for (n = 1; n <= ULONG_MAX / 10; n *= 10) {
unsigned long long res = alg(n);
printf("%llu %llu %f\n", n, res, res/(double)n);
}
}
The program runs in 0.01 seconds because GCC is clever enough to completely eliminate the inner loop. The output is
1 0 0.000000
10 10 1.000000
100 116 1.160000
1000 1086 1.086000
10000 9832 0.983200
100000 88562 0.885620
1000000 797148 0.797148
10000000 7174438 0.717444
100000000 64570064 0.645701
1000000000 581130714 0.581131
10000000000 5230176580 0.523018
100000000000 141214768216 1.412148
1000000000000 1270932914138 1.270933
10000000000000 11438396227452 1.143840
100000000000000 102945566047294 1.029456
1000000000000000 926510094425888 0.926510
10000000000000000 8338590849833250 0.833859
100000000000000000 75047317648499524 0.750473
1000000000000000000 675425858836496006 0.675426
And from that we can see that the ratio of number of prints to n is not really converging, but it seems to be very much bounded by constants on both sides, thus O(n).

How to generate random long int in C where every digit is non-zero? Moreover the random numbers are repeating

I am making a library management in C for practice. Now, in studentEntry I need to generate a long int studentID in which every digit is non-zero. So, I am using this function:
long int generateStudentID(){
srand(time(NULL));
long int n = 0;
do
{
n = rand() % 10;
}while(n == 0);
int i;
for(i = 1; i < 10; i++)
{
n *= 10;
n += rand() % 10;
}
if(n < 0)
n = n * (-1); //StudentID will be positive
return n;
}
output
Name : khushit
phone No. : 987546321
active : 1
login : 0
StudentID : 2038393052
Wanted to add another student?(y/n)
I want to remove all zeros from it. Moreover, each time I run the program it produces the same random numbers as in past runs, e.g.:
program run 1
StudentID : 2038393052
StudentID : 3436731238
program run 2
StudentID : 2038393052
StudentID : 3436731238
What do I need to do to fix these problems?
You can either do as gchen suggested and run a small loop that continues until the result is not zero (just like you did for the first digit) or accept a small bias and use rand() % 9 + 1.
The problem with the similar sequences is caused by the coarse resolution of time(). If you run the second call of the function too fast after the first, you get the same seed. You might read this description as proposed by user3386109 in the comments.
A nine-digit student ID with no zeros in the number can be generated by:
long generateStudentID(void)
{
long n = 0;
for (int i = 0; i < 9; i++)
n = n * 10 + (rand() % 9) + 1;
return n;
}
This generates a random digit between 1 and 9 by generating a digit between 0 and 8 with (rand() % 9) and adding 1. There's no need for loops to avoid zeros.
Note that this does not call srand() — you should only call srand() once in a given program (under normal circumstances). Since a long must be at least 32 bits and a 9-digit number only requires 30 bits, there cannot be overflow to worry about.
It's possible to argue that the result is slightly biassed in favour of smaller digits. You could use a function call to eliminate that bias:
int unbiassed_random_int(int max)
{
int limit = RAND_MAX - RAND_MAX % max;
int value;
while ((value = rand()) >= limit)
;
return value % max;
}
If RAND_MAX is 32767 and max is 9, RAND_MAX % 9 is 7. If you don't ignore the values from 32760 upwards, you are more likely to get a digit in the range 0..7 than you are to get an 8 — there are 3642 ways to each of 0..7 and only 3641 ways to get 8. The difference is not large; it is smaller if RAND_MAX is bigger. For the purposes on hand, such refinement is not necessary.
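If the refinement is wanted anyway, the two pieces above can be combined as in the following sketch. The names unbiased_random_int and generate_student_id are spellings chosen for this illustration. The caller is assumed to seed the generator exactly once, for example with srand(time(NULL)) at the start of main, and never inside the ID function.

```c
#include <stdlib.h>

/* Returns a value in 0 .. max-1 without modulo bias
   (assumes 0 < max <= RAND_MAX). */
static int unbiased_random_int(int max) {
    int limit = RAND_MAX - RAND_MAX % max; /* a multiple of max */
    int value;
    while ((value = rand()) >= limit)
        ; /* reject the few values that would skew the distribution */
    return value % max;
}

/* Builds a nine-digit ID in which every digit is 1..9.
   Does not call srand(); seeding is the caller's job. */
long generate_student_id(void) {
    long id = 0;
    for (int i = 0; i < 9; i++) {
        id = id * 10 + unbiased_random_int(9) + 1;
    }
    return id;
}
```

Because the seed is set only once per run, two IDs generated in quick succession no longer repeat, and different runs start from different seeds as long as they begin at least a second apart.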
Slightly modifying the order of your original function should do the trick. Instead of removing 0s, just do not add 0s.
long int generateStudentID(){
srand(time(NULL));
long int n = 0;
for(int i = 0; i < 10; i++)
{
long int m = 0;
do
{
m = rand() % 10;
}while(m == 0);
n *= 10;
n += m;
}
//Not needed as n won't be negative
//if(n < 0)
//n = n * (-1); //StudentID will be positive
return n;
}

Find the time complexity of the function "foo"

I'm struggling to find the time complexity of this function:
void foo(int n) {
int i, m = 1;
for (i = 0; i < n; i++) {
m *= n; // (m = n^n) ??
}
while (m > 1) {
m /= 3;
}
}
Well, the first for iteration is clearly O(n^n), the explanation to it is because m started with value 1, and multiplies itself n times.
Now, we start the while loop with m = n^n and we divide it every time by 3.
which means, (I guess), log(n^n).
Assuming I got it right up till now, I'm not sure if I need to sum or multiply, but my logic says I need to sum them, because they are 'odd' to each other.
So my assumption is: O(n^n) + O(log(n^n)) = O(n^n) Because if n is quite big, we can just refrain from O(log(n^n)).
Well, I really made many assumptions here, and I hope that makes sense. I'd love to hear your opinions about the time complexity of this function.
Theoretically, time complexity is O(n log n) because:
for (i=0; i<n; i++)
m *= n;
this will be executed n times and in the end m=n^n
Then this
while (m>1)
m /= 3;
will be executed log3(n^n) times, which is n * log3(n).
P.S. This holds only if you count the number of operations. In real life it takes much more time to calculate n^n because the numbers become too big. Also, your function will overflow when multiplying such big numbers, and most probably you will be bounded by the maximum value of int (in which case the complexity will be O(n)).
With foo(int n) and 32-bit int, n cannot exceed 9, else m *= n overflows (10^10 does not fit).
Given such a small range of n that works, the O() seems moot. Even with 64-bit unsigned m, n <= 15.
So I suppose O(n lg(n)) is technically correct, but given the constraints of int, I suspect the code takes more time to do a single printf() than to iterate through foo(9). In other words, it is practically O(1).
#include <limits.h> // ULLONG_MAX
#include <stdlib.h> // exit
unsigned long long foo(int n) {
unsigned long long cnt = 0;
int i;
unsigned long long m = 1;
for (i = 0; i < n; i++) {
if (m >= ULLONG_MAX/n) exit(1);
m *= n; // (m = n^n) ??
cnt++;
}
while (m > 1) {
m /= 3;
cnt++;
}
return cnt;
}
Running this operation-counting version for n = 1 to 15, I came up with (n, then cnt):
1 1
2 3
3 6
4 9
5 12
6 16
7 19
8 23
9 27
10 31
11 35
12 39
13 43
14 47
15 52

Why is my modulo operation giving me bogus values?

When I run this code, I get outputs that don't really make any sense. I'm most likely just missing something, but I've been working on trying to find the problem with my code by working out the problems by hand, but I'm getting the values I should be getting when I do it by hand. A simplified version of this calculation is (c%10^n - c%10^(n-1)) / 10^(n-1).
The goal of this calculation is to assign the digits of a number to an array of ints. I'm not really looking for alternate solutions.
int cNumberV[nLength];
for(int n = nLength; n > 0; n--) {
cNumberV[nLength - n] = (cNumber % (long long) pow(10, n) - cNumber % (long long) pow(10, n - 1)) / (long long) pow(10, n - 1);
printf("%i\n", cNumberV[n]);
}
This is my output when cNumber = 5105105105105100 and nLength = 16:
-1981492631
232830
-1530494976
1188624
-397102900
134514540
-1081801416
1188624
0
1
5
0
1
5
0
1
The problem is that your loop sets cNumberV[nLength - n], but then prints out cNumberV[n].
So the first half of the loop prints uninitialized array entries, and the second half of the loop prints the result of the first half's calculation in reverse order (but due to an off-by-one error as pointed out by rowan.G, it never prints the first digit).
pow() is an expensive and inaccurate floating-point function. You only need simple integer division by ten to get digits. If you really want to get them left-to-right as you do above, make a lookup table with the 19 powers of 10 as integers.
#include <stdio.h>
#define nLength 20
long long cNumber = 5105105105105100;
int cNumberV[nLength];
int negative = 0;
int main(int argc, char *argv[]) {
if (cNumber < 0) {
negative = 1;
cNumber = -cNumber;
}
int n;
for (n = nLength - 1; n >= 0; n -= 1) {
cNumberV[n] = cNumber % 10;
cNumber /= 10;
if (0 == cNumber) break;
}
if (negative) printf("-");
for (int i = n; i < nLength; i += 1) {
printf("%1d", cNumberV[i]);
}
printf("\n");
}
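The lookup-table variant mentioned above is not shown in that program, so here is a rough sketch of it. The names POW10 and split_digits are invented for this illustration; the function assumes a non-negative number and a length between 1 and 19.

```c
/* The 19 powers of 10 (10^0 .. 10^18) that fit in a signed 64-bit long long. */
static const long long POW10[19] = {
    1LL,
    10LL,
    100LL,
    1000LL,
    10000LL,
    100000LL,
    1000000LL,
    10000000LL,
    100000000LL,
    1000000000LL,
    10000000000LL,
    100000000000LL,
    1000000000000LL,
    10000000000000LL,
    100000000000000LL,
    1000000000000000LL,
    10000000000000000LL,
    100000000000000000LL,
    1000000000000000000LL
};

/* Writes the `length` lowest decimal digits of `number` into `digits`,
   most significant first. Assumes number >= 0 and 1 <= length <= 19. */
void split_digits(long long number, int length, int *digits) {
    for (int n = length; n > 0; n--) {
        /* Shift the wanted digit into the ones place, then isolate it. */
        digits[length - n] = (int)(number / POW10[n - 1] % 10);
    }
}
```

This keeps the left-to-right order and the cNumberV[nLength - n] indexing of the question while using only integer arithmetic, so no floating-point rounding from pow() is involved.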

Fastest way to calculate all the even squares from 1 to n?

I did this in C:
#include<stdio.h>
int main (void)
{
int n,i;
scanf("%d", &n);
for(i=2;i<=n;i=i+2)
{
if((i*i)%2==0 && (i*i)<= n)
printf("%d \n",(i*i));
}
return 0;
}
What would be a better/faster approach to tackle this problem?
Let me illustrate not only a fast solution, but also how to derive it. Start with a fast way of listing all squares and work from there (pseudocode):
max = n*n
i = 1
d = 3
while i < max:
print i
i += d
d += 2
So, starting from 4 and listing only even squares:
max = n*n
i = 4
d = 5
while i < max:
print i
i += d
d += 2
i += d
d += 2
Now we can shorten that mess on the end of the while loop:
max = n*n
i = 4
d = 5
while i < max:
print i
i += 2 + 2*d
d += 4
Note that we are constantly using 2*d, so it's better to just keep calculating that:
max = n*n
i = 4
d = 10
while i < max:
print i
i += 2 + d
d += 8
Now note that we are constantly adding 2 + d, so we can do better by incorporating this into d:
max = n*n
i = 4
d = 12
while i < max:
print i
i += d
d += 8
Blazing fast. It only takes two additions to calculate each square.
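A C rendering of the final loop might look like this. It is a sketch under assumptions: the name even_squares, the output buffer and the choice to stop at squares <= limit (as in the question, rather than the pseudocode's i < n*n) are made up for illustration, and limit is assumed to be far below ULLONG_MAX so the additions cannot wrap.

```c
#include <stddef.h>

/* Stores the even squares 4, 16, 36, ... that are <= limit into out
   (at most cap of them) and returns how many were stored. */
size_t even_squares(unsigned long long limit, unsigned long long *out, size_t cap) {
    size_t count = 0;
    unsigned long long square = 4; /* (2*1)^2 */
    unsigned long long step = 12;  /* 16 - 4; the gap grows by 8 each time */
    while (square <= limit && count < cap) {
        out[count++] = square;
        square += step; /* first addition */
        step += 8;      /* second addition */
    }
    return count;
}
```

With limit = 100 this yields 4, 16, 36, 64 and 100, using no multiplication inside the loop.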
I like your solution. The only suggestions I would make would be:
Put the (i*i)<=n as the middle clause of your for loop, then it's checked earlier and you break out of the loop sooner.
You don't need to check whether (i*i)%2==0, since i is always even and an even number squared is always even.
With those two changes in mind you can get rid of the if statement in your for loop and just print.
The square of an even number is even, so you really do not need to check it again. Following is the code I would suggest:
for (i = 2; i*i <= n; i+=2)
printf ("%d\t", i*i);
The largest value for i in your loop should be the floor of the square root of n.
The reason is that the square of any i (integer) larger than this will be greater than n. So, if you make this change, you don't need to check that i*i <= n.
Also, as others have pointed out, there is no point in checking that i*i is even since the square of all even numbers is even.
And you are right in ignoring odd i since for any odd i, i*i is odd.
Your code with the aforementioned changes follows:
#include <stdio.h>
#include <math.h>
int main ()
{
int n,i;
scanf("%d", &n);
for( i = 2; i <= (int)floor(sqrt(n)); i = i+2 ) {
printf("%d \n",(i*i));
}
return 0;
}
