Despite my efforts, I was unable to solve the following question. It was asked in the Graduate Aptitude Test in Engineering (GATE) 2014, India.
Question) For a C program accessing X[i][j][k], the following intermediate code is generated by a compiler. Assume that the size of an integer is 32 bits and the size of a character is 8 bits.
t0 = i * 1024
t1 = j * 32
t2 = k * 4
t3 = t1 + t0
t4 = t3 + t2
t5 = X[t4]
Which one of the following statements about the source code for the C program is CORRECT?
(a) X is declared as "int X[32][32][8]".
(b) X is declared as "int X[4][1024][32]".
(c) X is declared as "int X[4][32][8]".
(d) X is declared as "int X[32][16][2]".
One of the books that provides solutions to previous papers says that the answer is option (a). How? Any explanation would be appreciated.
Thanks in advance
Let the declaration be int X[I][J][K]. Each step in i skips one whole X[i] block of J * K ints, each step in j skips one row of K ints, and each step in k skips one int.
t2 is k * sizeof(int), so sizeof(int) = 4, which matches the 32-bit int in the question.
t1 is j * (K * sizeof(int)), so K * 4 = 32.
Thus, K = 8.
t0 is i * (J * K * sizeof(int)), so J * 8 * 4 = 1024.
Thus, J = 32.
The first dimension I does not appear in the address calculation; with I = 32 this gives int X[32][32][8].
Not enough info. I'll try to prove it to you:
To make our life easier, let's divide all values by 4, since that's the size of an integer in bytes (32 bits, with an 8-bit character as the byte). That leaves us with:
multiplier of i: 256;
multiplier of j: 8;
multiplier of k: 1.
The multiplier of k must be 1, because k is the last index, which means it only has to jump 1 integer to get to the next element in the row.
j, on the other hand, has to jump 8 integers to get to the same position in the next row. That means each row has 8 integers, which gives us the last dimension. Our array X now looks like: X[i][j][8]
i has to jump through 256 integers to get to the next column. Since a row has 8 integers, and 256/8 = 32, each column has 32 rows, leaving array X as: X[i][32][8]
Finally, we need to know how many pages the array has. But there's no way to know that: we would need the full size of the array, which we could divide by 256 integers (1024 bytes) to get the number of pages. That leads us back to the beginning of this answer: there's simply not enough info.
Explanation: It is given that the size of an int is 4 B and that of a char is 1 B. The memory is byte-addressable.
Let the array be declared as Type X[A][B][C] (where Type = int/char and A, B, C are natural numbers).
From t0 = i*1024, we conclude that B*C*(size of Type) = 1024.
From t1 = j*32, we conclude that C*(size of Type) = 32.
From t2 = k*4, we conclude that size of Type = 4.
Therefore Type = int, C = 8, and B = 32 (since B * 8 * 4 = 1024).
The first dimension of the array has no effect on the address calculation. The sizeof(int) does have an effect on the address calculation. So it might help to rewrite answer a) as
X[][32][8][4]
 i  j   k
where the first three dimensions are indexed by i, j and k, and the last [4] represents sizeof(int). So the address calculation is
(k * 4) + (j * 8 * 4) + (i * 32 * 8 * 4) = i * 1024 + j * 32 + k * 4
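From this, I would conclude that both a) and c) are correct answers, since c), int X[4][32][8], has the same [32][8] inner dimensions and differs only in the first one.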
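A quick way to check this is to let a compiler do the address arithmetic. The sketch below is a minimal illustration, assuming sizeof(int) == 4 as the question states; the array names Xa and Xc and the helper functions are made up for the example. It measures the byte offset of an element from the start of the array through char pointers and compares it with the formula from the intermediate code, for both (a) and (c).

```c
#include <stddef.h>

/* Candidate declarations from options (a) and (c). */
static int Xa[32][32][8];
static int Xc[4][32][8];

/* Offset computed by the intermediate code, in bytes (assumes sizeof(int) == 4). */
static size_t gate_offset(size_t i, size_t j, size_t k)
{
    return i * 1024 + j * 32 + k * 4;
}

/* Byte offset of Xa[i][j][k] from the start of Xa, measured with char pointers. */
static size_t offset_a(size_t i, size_t j, size_t k)
{
    return (size_t)((const char *)&Xa[i][j][k] - (const char *)Xa);
}

/* Same for option (c); its first dimension is only 4, so i must stay below 4. */
static size_t offset_c(size_t i, size_t j, size_t k)
{
    return (size_t)((const char *)&Xc[i][j][k] - (const char *)Xc);
}
```

For example, offset_a(3, 5, 7) and gate_offset(3, 5, 7) should both be 3260 on a platform with 4-byte ints; offset_c agrees as well as long as i stays below 4.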
In C programming, if a 2-D array such as int a[5][3] is given, along with its base address and the address of a particular element (cell), and I have to find the index of that element (its row and column numbers), can we find that? If yes, how?
I know the formula for the address is:
int a[R][C];
address(a[i][j]) = ba + size * (C*i + j);
If ba, R, C, size and address(a[i][j]) are given, how do I find the values of i and j?
To find the values of 2 variables we need 2 equations, but I am not able to find the second equation.
The specific address minus the base address gives you the distance in bytes from the base to the specific address.
If you divide that distance by sizeof(ba[0][0]) (or sizeof(int)), you get the number of items.
items / C gives you the first index (the row) and items % C gives you the second index (the column).
Thus:
int ba[R][C];
uintptr_t address = (uintptr_t)&ba[3][2]; // some random item
size_t items = (address - (uintptr_t)ba) / sizeof(ba[0][0]);
size_t i = items / C;
size_t j = items % C;
It is important to carry out the arithmetic with a type that has well-defined behavior, hence uintptr_t.
If I had done int* address, then address - ba would be nonsense, since ba decays into an array pointer of type int(*)[C] (int(*)[3] for the question's int a[5][3]). They aren't compatible types.
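Put together, the snippet above can be turned into a small self-contained sketch. R, C, ba and index_from_address are illustrative names. The uintptr_t arithmetic assumes a flat address space in which converting a pointer to uintptr_t preserves byte distances; that holds on mainstream platforms, but the C standard leaves the mapping implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

#define R 5 /* rows, as in the question's int a[5][3] */
#define C 3 /* columns */

static int ba[R][C];

/* Recover the row and column of the element of ba stored at address addr. */
static void index_from_address(uintptr_t addr, size_t *i, size_t *j)
{
    /* bytes from the start of ba, converted to a count of ints */
    size_t items = (size_t)((addr - (uintptr_t)ba) / sizeof ba[0][0]);
    *i = items / C; /* whole rows skipped */
    *j = items % C; /* position inside the row */
}
```

Calling index_from_address((uintptr_t)&ba[3][2], &i, &j) should leave i == 3 and j == 2.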
Use integer division and remainder operators.
If you have the base and a pointer to an element, elt, then there are two things to keep in mind:
In "pure math" terms, you'll have to divide by the size of the elements in the array.
In "C" terms, when you subtract pointers this division is performed for you.
For example:
int a[2];
intptr_t a0 = (intptr_t)&a[0]; // intptr_t (<stdint.h>) is the integer type meant to hold a converted pointer
intptr_t a1 = (intptr_t)&a[1];
a1 - a0; // likely 4 or 8.
This will likely be 4 or 8 because that's the likely size of int on whatever machine you're using, and because we performed a "pure math" subtraction of two numbers.
But if you let C get involved, it tries to do the math for you:
int a[2];
int * a0 = &a[0];
int * a1 = &a[1];
a1 - a0; // 1
Because C knows the type, and because it's the law, the subtracted numbers get divided by the size of the type automatically, converting the pointer difference into an array-like index or offset.
This is important because it will affect how you do the math.
Now, if you know that the address of elt is base + SIZE * (C * i + j), where C is the number of columns (the row length), you can find the answer with integer division (which may be performed automatically for you), subtraction, more integer division, and either modulus or multiply-and-subtract:
offset or number = elt - base. This will either give you an index (C style) or a numeric (pure math) difference, depending on how you do the computation.
offset = number / SIZE. This will finish the job, if you need it.
i = offset / C. Integer division here - just throw away the remainder.
j = offset - (i*C) OR j = offset % C. Pick the operation you want to use: multiply-and-subtract, or modulus.
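The steps above can be sketched in C as follows. This version is an assumption rather than the answer's own code: it measures the distance through char pointers, so the first subtraction yields the "pure math" byte count that is then divided by SIZE = sizeof(int), and it uses multiply-and-subtract for the column. recover_index is a hypothetical helper name.

```c
#include <stddef.h>

/* Recover (i, j) for the element elt of an int array with cols columns
   that starts at base. Distances go through char pointers, which may
   walk the whole array object, so the subtraction yields bytes. */
static void recover_index(const void *base, const int *elt, size_t cols,
                          size_t *i, size_t *j)
{
    size_t number = (size_t)((const char *)elt - (const char *)base); /* bytes */
    size_t offset = number / sizeof(int);  /* elements from the start */
    *i = offset / cols;                    /* integer division: row */
    *j = offset - (*i * cols);             /* multiply & subtract: column */
    /* equivalently: *j = offset % cols; */
}
```

With int m[5][3], recover_index(m, &m[2][1], 3, &i, &j) should give i == 2 and j == 1.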
I am trying to implement big integer addition in CUDA using the following code:
__global__ void add(unsigned *A, unsigned *B, unsigned *C/*output*/, int radix){
int id = blockIdx.x * blockDim.x + threadIdx.x;
A[id ] = A[id] + B[id];
C[id ] = A[id]/radix;
__syncthreads();
A[id] = A[id]%radix + ((id>0)?C[id -1]:0);
__syncthreads();
C[id] = A[id];
}
But it does not work properly, and I also don't know how to handle the extra carry bit. Thanks.
TL;DR: build a carry-lookahead adder in which each individual adder adds modulo radix instead of modulo 2.
Additions need incoming carries
The problem in your model is that you have a rippling carry. See ripple-carry adders.
If you were on an FPGA that wouldn't be a problem, because they have dedicated logic to do that fast (carry chains, they're cool). But alas, you're on a GPU!
That is, for a given id, you only know the input carry (thus whether you are going to sum A[id]+B[id] or A[id]+B[id]+1) when all the sums with smaller id values have been computed. As a matter of fact, initially, you only know the first carry.
A[3]+B[3]+?   A[2]+B[2]+?   A[1]+B[1]+?   A[0]+B[0]+0
     |             |             |             |
     v             v             v             v
    C[3]          C[2]          C[1]          C[0]
Characterize the carry output
And each sum also has a carry output, which isn't on the drawing. So you have to think of the addition in this larger scheme as a function with 3 inputs and 2 outputs: (C, c_out) = add(A, B, c_in).
In order not to wait O(n) for the sum to complete (where n is the number of items your sum is cut into), you can precompute all the possible results at each id. That isn't a huge amount of work, since A and B don't change, only the carries. So you have 2 possible outputs: (c_out0, C) = add(A, B, 0) and (c_out1, C') = add(A, B, 1).
Now with all these results, we need to basically implement a carry lookahead unit.
For that, we need to figure out two functions of each sum's carry output, P and G:
P a.k.a. all of the following definitions
Propagate
"if a carry comes in, then a carry will go out of this sum"
c_out1 && !c_out0
A + B == radix-1
G a.k.a. all of the following definitions
Generate
"whatever carry comes in, a carry will go out of this sum"
c_out1 && c_out0
c_out0
A + B >= radix
So in other terms, c_out = G or (P and c_in). Now we have the start of an algorithm that easily tells us, for each id, the carry output as a function of its carry input:
1. At each id, compute C[id] = A[id]+B[id]+0
2. Get G[id] = C[id] > radix-1
3. Get P[id] = C[id] == radix-1
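Steps 1 to 3, together with the rule c_out = G or (P and c_in), can be sketched per limb in plain C (the same code would run once per thread in a kernel). The limb_gp type and the function names are hypothetical; the sketch assumes each limb is already reduced below radix and uses 64-bit sums so that A + B cannot overflow for radix up to 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of steps 1-3 for one limb. */
typedef struct {
    uint64_t sum; /* C[id] = A[id] + B[id] + 0 */
    bool g;       /* G[id]: a carry goes out whatever comes in */
    bool p;       /* P[id]: a carry goes out only if one comes in */
} limb_gp;

static limb_gp limb_sum(uint64_t a, uint64_t b, uint64_t radix)
{
    limb_gp r;
    r.sum = a + b;
    r.g = r.sum > radix - 1;  /* same as sum >= radix */
    r.p = r.sum == radix - 1;
    return r;
}

/* Carry out of the limb for a given carry in: c_out = G or (P and c_in). */
static bool limb_carry_out(limb_gp l, bool c_in)
{
    return l.g || (l.p && c_in);
}
```

With radix 10, limb_sum(7, 2, 10) is a pure propagate case (9 == radix-1), so its carry out equals its carry in, while limb_sum(7, 5, 10) generates a carry regardless.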
Logarithmic tree
Now we can finish in O(log(n)). Tree-shaped computations are nasty on GPUs, but this is still shorter than waiting. Indeed, from 2 additions next to each other, we can get a group G and a group P:
For id and id+1:
4. step = 2
5. if id % step == 0, do steps 6 through 8; otherwise, skip to step 9
6. group_P = P[id] and P[id+step/2]
7. group_G = (P[id+step/2] and G[id]) or G[id+step/2]
8. c_in[id+step/2] = G[id] or (P[id] and c_in[id])
9. if id % step == 0, store group_P and group_G in P[id] and G[id] for the next level; then step = step * 2
10. if step < n, go to 5
At the end (after repeating steps 5-10 for every level of your tree, with fewer ids every time), everything will be expressed in terms of the Ps and Gs you computed, and c_in[0], which is 0. The Wikipedia page on carry-lookahead adders gives formulas for grouping by 4 instead of 2, which will get you an answer in O(log_4(n)) instead of O(log_2(n)).
Hence the end of the algorithm:
11. At each id, get c_in[id]
12. return (C[id]+c_in[id]) % radix
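For a runnable reference, here is a sequential C sketch of log-depth carry computation. Note that it swaps in a Kogge-Stone-style prefix scan over (G, P) pairs instead of the pairwise tree in steps 4 to 10: every id combines its group with the one d limbs below it, d doubling each round, so after about log2(n) rounds G[id] is the carry out of limbs 0..id. Each round of the outer loop corresponds to one parallel step followed by a barrier on a GPU. Limbs are assumed to be stored least significant first, each below radix (radix up to 2^32), with at most MAX_LIMBS (a made-up limit) of them; add_lookahead is a hypothetical name, and its return value is the extra carry the question asks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LIMBS 64

static unsigned add_lookahead(const uint32_t *A, const uint32_t *B,
                              uint32_t *out, size_t n, uint64_t radix)
{
    uint64_t sum[MAX_LIMBS];
    bool g[MAX_LIMBS], p[MAX_LIMBS], ng[MAX_LIMBS], np[MAX_LIMBS];
    assert(n <= MAX_LIMBS);

    for (size_t id = 0; id < n; id++) {          /* steps 1-3 */
        sum[id] = (uint64_t)A[id] + B[id];
        g[id] = sum[id] >= radix;
        p[id] = sum[id] == radix - 1;
    }
    for (size_t d = 1; d < n; d *= 2) {          /* one round per level */
        for (size_t id = 0; id < n; id++) {      /* one "thread" per id */
            if (id >= d) {
                /* combine with the group that ends d limbs lower */
                ng[id] = g[id] || (p[id] && g[id - d]);
                np[id] = p[id] && p[id - d];
            } else {
                ng[id] = g[id];                  /* already covers 0..id */
                np[id] = p[id];
            }
        }
        memcpy(g, ng, n * sizeof g[0]);          /* stands in for a barrier */
        memcpy(p, np, n * sizeof p[0]);
    }
    /* g[id] is now the carry out of limbs 0..id, i.e. c_in of limb id+1. */
    for (size_t id = 0; id < n; id++) {
        unsigned c_in = (id == 0) ? 0 : g[id - 1];
        out[id] = (uint32_t)((sum[id] + c_in) % radix);
    }
    return n ? g[n - 1] : 0;                     /* carry out of the top limb */
}
```

For radix 10, adding 999 (limbs {9, 9, 9}) and 1 (limbs {1, 0, 0}) should give limbs {0, 0, 0} and a returned carry of 1, i.e. 1000.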
Take advantage of hardware
What we really did in this last part was mimic the circuitry of a carry-lookahead adder with logic. However, the hardware already has adders that do similar things (by definition).
Let us replace our radix-based definitions of P and G by base-2 ones, like the logic inside our hardware, mimicking a sum of 2 bits a and b at each stage: P = a ^ b (xor) and G = a & b (and). In other words, a = P or G and b = G. So if we create an integer intP and an integer intG, where each bit is respectively the P and the G we computed from each id's sum (limiting us to 64 sums with 64-bit integers), then the addition (intP | intG) + intG has exactly the same carry propagation as our elaborate logical scheme.
The reduction to form these integers will still be a logarithmic operation I guess, but that was to be expected.
The interesting part is that each bit of the sum is a function of its carry input. Indeed, every bit of the sum is ultimately a function of 3 bits: (a + b + c_in) % 2.
If at that bit P == 1, then a + b == 1, thus (a + b + c_in) % 2 == !c_in.
Otherwise, a + b is either 0 or 2, and (a + b + c_in) % 2 == c_in.
Thus we can trivially form the integer (or rather bit array) int_cin = ((P|G)+G) ^ P, with ^ being xor.
Thus we have an alternate ending to our algorithm, replacing steps 4 and later:
4. at each id, shift P and G by id (in a 64-bit unsigned type): P = P << id and G = G << id
5. do an OR-reduction to get intG and intP, which are the OR of all the shifted P and G for id 0..63
6. compute (once) int_cin = ((intP|intG)+intG) ^ intP
7. at each id, get c_in = (int_cin >> id) & 1
8. return (C[id]+c_in) % radix
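The alternate ending can likewise be sketched in C, with a single 64-bit addition doing all the carry propagation. This is an illustrative sketch, not the answer's code: add_bittrick is a hypothetical name, limbs are stored least significant first and below radix (radix up to 2^32), and n is at most 64 so each limb gets one bit. The extra carry out of the top limb is bit n of int_cin, or the unsigned overflow of the 64-bit addition when all 64 bits are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned add_bittrick(const uint32_t *A, const uint32_t *B,
                             uint32_t *out, size_t n, uint64_t radix)
{
    assert(n <= 64);
    uint64_t intP = 0, intG = 0;
    for (size_t id = 0; id < n; id++) {               /* steps 4-5 */
        uint64_t s = (uint64_t)A[id] + B[id];
        intP |= (uint64_t)(s == radix - 1) << id;
        intG |= (uint64_t)(s >= radix) << id;
    }
    uint64_t total = (intP | intG) + intG;             /* hardware carries */
    uint64_t int_cin = total ^ intP;                    /* step 6 */
    for (size_t id = 0; id < n; id++) {                /* steps 7-8 */
        unsigned c_in = (unsigned)((int_cin >> id) & 1u);
        out[id] = (uint32_t)(((uint64_t)A[id] + B[id] + c_in) % radix);
    }
    if (n == 64)
        return total < intG;  /* unsigned wrap-around means a carry out */
    return (unsigned)((int_cin >> n) & 1u);            /* carry into bit n */
}
```

As in the drawing, adding 1 to a number whose limbs are all radix-1 makes every limb 0 and returns a carry of 1, whether n is 3 or the full 64.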
PS: Also, watch out for integer overflow in your arrays if radix is big. If it isn't, then the whole thing doesn't really make sense, I guess...
PPS: in the alternate ending, if you have more than 64 items, characterize each group of 64 by its P and G as if the radix were 2^64, re-run the same steps at a higher level (reduction, get c_in), and then go back to the lower level and redo steps 6 and 7 with the carry coming in from the higher level added, i.e. int_cin = ((intP|intG)+intG+carry_in) ^ intP.
Hi, I am new to C programming. Can anyone please tell me what this line of code does:
i = (sizeof (X) / sizeof (int));
The code is used with a switch/case statement: it takes the value of bdata and compares it against different cases.
Generally, such a statement is used to calculate the number of elements in an array.
Let's consider an integer array as below:
int a[4];
Now, assuming a 4-byte int, sizeof(a) returns 4 * 4 = 16: 4 elements of 4 bytes each.
So sizeof(a) / sizeof(int) gives 4, which is the number of elements (the length) of the array.
It computes the number of elements of the array of int named X.
Returns the length of the array X.
It computes X's size in memory divided by the size of an int on your computer (commonly 2 or 4 bytes). The division itself is always an integer division of two size_t values, whatever the type of i; if i is a float, it just receives the already truncated quotient.
The size of int can vary between implementations, and so can sizeof(X), so the result is implementation-dependent.
All these means, it computes how many ints fit into X.
Besides common practice or personal experience there is no reason to think that this i = (sizeof (X) / sizeof (int)) computes the size of the array X. Most often probably this is the case but in theory X could be of any type, so the given expression would compute the ratio of the sizes of your var X and an int (how much more memory, in bytes, does your X var occupy with respect to an int)
Moreover, if X were a pointer (for example float *X, which is often used to refer to an array allocated elsewhere but is not itself an array), sizeof(X) would give the size of the pointer, not of the data it points to. On a 32-bit architecture where pointers and int are both 4 bytes, the expression would evaluate to 1: i = sizeof(X) / sizeof(int) = 4 / 4 = 1.
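The difference between an array and a pointer is easy to see in a small sketch. The names are illustrative: the first function divides the size of a real array, while the second shows that for a pointer parameter sizeof only sees the pointer. The idiomatic element count is usually written sizeof a / sizeof a[0], which keeps working if the element type changes.

```c
#include <stddef.h>

static int a[4];

/* In the scope where the array is declared, sizeof sees the whole array. */
static size_t count_here(void)
{
    return sizeof(a) / sizeof(int);  /* 4 elements; same as sizeof a / sizeof a[0] */
}

/* For a pointer, sizeof(X) is the size of the pointer itself,
   not of the array the caller passed in. */
static size_t ratio_of_param(float *X)
{
    return sizeof(X) / sizeof(int);
}
```

On a typical 64-bit Linux or macOS system (8-byte pointers, 4-byte int), ratio_of_param returns 2, whatever the length of the caller's array.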
How do you partition an array into 2 parts such that the two parts have equal average? Each partition may contain elements that are non-contiguous in the array.
The only algorithm I can think of is exponential. Can we do better?
You can reduce this problem to the subset-sum problem. Here's the idea.
Let A be the array. Compute S = A[0] + ... + A[N-1], where N is the length of A. For k from 1 to N-1, let T_k = S * k / N. If T_k is an integer, then find a subset of A of size k that sums to T_k. If you can do this, then you're done. If you cannot do this for any k, then no such partitioning exists.
Here's the math behind this approach. Suppose there is a partitioning of A such that the two parts have the same average, say X of size x and Y of size y, where x+y = N. Then you must have
sum(X)/x = sum(Y)/y = (sum(A)-sum(X)) / (N-x)
so a bit of algebra gives
sum(X) = sum(A) * x / N
Since the array contains integers, the left hand side is an integer, so the right hand side must be as well. This motivates the constraint that T_k = S * k / N must be an integer. The only remaining part is to realize T_k as the sum of a subset of size k.
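A minimal C sketch of this reduction follows, assuming non-negative integers with a small total so that a pseudo-polynomial subset-sum table fits in memory. The limits MAX_N and MAX_SUM and the function name are made up for the example. reach[k][s] records whether some k elements sum to s, which answers "is there a subset of size k that sums to T_k" for every k at once, in O(N^2 * S) time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_N 32
#define MAX_SUM 1024

/* reach[k][s]: some k of the elements seen so far sum to exactly s. */
static bool reach[MAX_N + 1][MAX_SUM + 1];

/* Returns the size k of one part of an equal-average split, or 0 if none.
   Assumes non-negative elements, n <= MAX_N and a total sum <= MAX_SUM. */
static size_t equal_average_split(const int *A, size_t n)
{
    int S = 0;
    for (size_t i = 0; i < n; i++)
        S += A[i];
    assert(n <= MAX_N && S <= MAX_SUM);

    memset(reach, 0, sizeof reach);
    reach[0][0] = true;
    for (size_t i = 0; i < n; i++)           /* 0/1 knapsack over the items */
        for (size_t k = i + 1; k >= 1; k--)  /* downwards: use A[i] at most once */
            for (int s = S; s >= A[i]; s--)
                if (reach[k - 1][s - A[i]])
                    reach[k][s] = true;

    for (size_t k = 1; k < n; k++) {
        long t = (long)S * (long)k;          /* T_k = S * k / N must be an integer */
        if (t % (long)n == 0 && reach[k][t / (long)n])
            return k;
    }
    return 0;
}
```

For {1, 7, 15, 29, 11, 9} (sum 72, N = 6) it should return 2, since 15 + 9 = 24 = 72 * 2 / 6 and both parts then average 12.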
How are addresses generated for arrays in C? For example, how does a[x][y] get to a particular value? I know it's not that big a question, but I'm just about to actually start coding.
It depends on the data type of the array's elements.
Say for an int array, each value holds 4 bytes (on typical platforms), so a row X elements long takes 4X bytes.
Thus a 2-D matrix of X*Y elements takes 4*X*Y bytes.
Any address, say Arry[X][Y], would be
calculated as: (Base Address of Arry)
+ ((X * No. of columns) + Y) * (size of one element), where X * No. of columns skips the earlier rows and Y is the offset in the current row.
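To check the formula, a small sketch can compute the address by hand and compare it with what the compiler produces for &Arry[x][y]. ROWS, COLS and manual_address are illustrative names; the uintptr_t arithmetic assumes a flat address space, as on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 4
#define COLS 6

static int Arry[ROWS][COLS];

/* Address of Arry[x][y] from the formula: base + (x * COLS + y) * sizeof(int). */
static uintptr_t manual_address(size_t x, size_t y)
{
    return (uintptr_t)Arry + (x * COLS + y) * sizeof(int);
}
```

For any valid x and y, manual_address(x, y) should equal (uintptr_t)&Arry[x][y].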
2-dimensional arrays in C are rectangular. For example:
int matrix[2][3];
allocates a single block of memory 2*3*sizeof(int) bytes in size. Addressing matrix[0][1] is just a matter of adding 0 * (3 * sizeof(int)) to sizeof(int). Then add that sum to the address at which matrix starts.
A nested array is an array of arrays.
For example, an int[][6] is an array of int[6].
Assuming a 4-byte int, each element in the outer array is 6 * 4 = 24 bytes wide.
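Therefore, arr[4] refers to the fifth array in the outer array (index 4), which starts 4 * 24 = 96 bytes past the start of arr. In C, arr + 4 already scales by those 24 bytes, so arr[4] is *(arr + 4).
arr[4] is a normal int[6]. arr[4][2] gets the third int in this inner array (index 2), which is 4 * 24 + 2 * 4 = 104 bytes past the start of arr.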
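The same byte offsets can be measured directly in C. In this sketch (arr and the helper names are illustrative), the distances are taken through char pointers so they come out in bytes, while arr + 4 on its own already advances by whole int[6] rows.

```c
#include <stddef.h>

static int arr[5][6]; /* an array of 5 int[6] */

/* arr + 4 advances by 4 whole rows: 4 * sizeof(int[6]) bytes. */
static size_t row_byte_offset(void)
{
    return (size_t)((const char *)(arr + 4) - (const char *)arr);
}

/* Byte offset of arr[4][2]: four rows plus two ints. */
static size_t elem_byte_offset(void)
{
    return (size_t)((const char *)&arr[4][2] - (const char *)arr);
}
```

With a 4-byte int these come out as 96 and 104 bytes.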
E.g.
char anArray[][13]={"Hello World!","February","John"};
You can visualize it as:
anArray:
H|e|l|l|o| |W|o|r|l|d|!|\0|F|e|b|r|u|a|r|y|\0|\0|\0|\0|\0|J|o|h|n|\0|\0|\0|\0|\0|\0|\0|\0|\0
^                          ^                              ^
0                          13                             26
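The layout in the picture can be checked byte by byte. This sketch reads anArray through a char pointer; byte_at is an illustrative helper. The array is 3 * 13 = 39 bytes, the strings start at offsets 0, 13 and 26, and every unused byte of a row is zero-filled by the initializer.

```c
#include <stddef.h>

static char anArray[][13] = {"Hello World!", "February", "John"};

/* Character at a flat byte position inside anArray (0 .. sizeof anArray - 1). */
static char byte_at(size_t pos)
{
    return ((const char *)anArray)[pos];
}
```

So byte_at(13) is 'F', byte_at(26) is 'J', and bytes 21 through 25 (the padding after "February") are all '\0'.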