I need to pass a two-dimensional array to a function as a single pointer. There are other approaches, but due to some constraints (code generation) I can pass only a single pointer. I have macros that hold the size of each dimension. I implemented it the following way, but I am not sure whether it will also work for N dimensions:
#define size_1D 3
#define size_2D 3
void fun(int *arr)
{
int i,total_size = size_1D* size_2D;
for(i = 0; i < total_size ; i++)
{
int value = arr[i];
}
}
int main()
{
int arr[size_1D][size_2D] = {{1,2,7},{8,4,9}};
fun(&arr[0][0]);
}
Is there any pitfall if I follow the above approach?
void fun(int (*arr)[3]);
or exactly equivalent, but maybe more readable:
void fun(int arr[][3]);
In main, arr is a two-dimensional array with 3 rows and 3 columns. When passed to a function, arr decays to a pointer to its first element, which has the type pointer to an array of 3 ints. So the parameter has to be a pointer to an array of 3 elements. You can then access the data normally, using arr[a][b].
#define size_1D 3
#define size_2D 3
void fun(int arr[][3])
{
for(int i = 0; i < size_1D ; i++) {
for(int j = 0; j < size_2D ; j++) {
int value = arr[i][j];
}
}
}
int main()
{
int arr[size_1D][size_2D] = {{1,2,7},{8,4,9}};
fun(arr);
}
You can pass the sizes as arguments and use a variable length array declaration inside the function parameter list. The compiler then does the index arithmetic for you.
#include <stdlib.h>
void fun(size_t xmax, size_t ymax, int arr[xmax][ymax]);
// is equivalent to
void fun(size_t xmax, size_t ymax, int arr[][ymax]);
// is equivalent to
void fun(size_t xmax, size_t ymax, int (*arr)[ymax]);
void fun(size_t xmax, size_t ymax, int arr[xmax][ymax])
{
for(size_t i = 0; i < xmax ; i++) {
for(size_t j = 0; j < ymax ; j++) {
int value = arr[i][j];
}
}
}
int main()
{
int arr[3][4] = {{1,2,7},{8,4,9}};
fun(3, 4, arr);
}
#edit
We know that the result of the array subscript operator is exactly identical to dereferencing the sum:
a[b] <=> *(a + b)
From pointer arithmetic we know that:
type *pnt;
int a;
pnt + a = (typeof(pnt))(void*)((uintptr_t)(void*)pnt + a * sizeof(*pnt))
pnt + a = (type*)(void*)((uintptr_t)(void*)pnt + a * sizeof(type))
And that an array converted to a pointer has the same value as a pointer to its first element:
type pnt[A];
assert((uintptr_t)pnt == (uintptr_t)&pnt[0]);
assert((uintptr_t)pnt == (uintptr_t)&*(pnt + 0));
assert((uintptr_t)pnt == (uintptr_t)&*pnt);
So:
int arr[A][B];
then:
arr[x][y]
is equivalent to (ignore warnings, kind-of pseudocode):
*(*(arr + x) + y)
*( *(int[A][B])( (uintptr_t)arr + x * sizeof(int[B]) ) + y )
// ---- x * sizeof(int[B]) = x * B * sizeof(int)
*( *(int[A][B])( (uintptr_t)arr + x * B * sizeof(int) ) + y )
// ---- C11 6.5.2.1p3
*( (int[B])( (uintptr_t)arr + x * B * sizeof(int) ) + y )
*(int[B])( (uintptr_t)( (uintptr_t)arr + x * B * sizeof(int) ) + y * sizeof(int) )
// ---- *(int[B])( ... ) = (int)dereference( ... ) = *(int*)( ... )
// ---- lose the parentheses - conversion from size_t to uintptr_t should be safe
*(int*)( (uintptr_t)arr + x * B * sizeof(int) + y * sizeof(int) )
*(int*)( (uintptr_t)arr + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)( &*arr ) + ( x * B + y ) * sizeof(int) )
// ---- (uintptr_t)arr = (uintptr_t)&arr[0][0]
*(int*)( (uintptr_t)( &*(*(arr + 0) + 0) ) + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)( &arr[0][0] ) + ( x * B + y ) * sizeof(int) )
*(int*)( (uintptr_t)&arr[0][0] + ( x * B + y ) * sizeof(int) )
// ---- decayed typeof(&arr[0][0]) = int*
*( &arr[0][0] + ( x * B + y ) )
(&arr[0][0])[x * B + y]
So:
arr[x][y] == (&arr[0][0])[x * B + y]
arr[x][y] == (&arr[0][0])[x * sizeof(*arr)/sizeof(**arr) + y]
On a sane architecture, where sizeof(uintptr_t) == sizeof(size_t) == sizeof(int*) == sizeof(int**) and so on, and where accessing data behind an int* pointer is no different from accessing data behind an int(*)[B] pointer, you should be safe accessing the array as one-dimensional through a pointer to its first element, as the operations should be equivalent ("safe" except for out-of-bounds accesses, which are never safe).
Note that this is technically undefined behavior according to the C standard and will not work on all architectures. Example: there could be an architecture where data of type int[A] is stored in a different memory bank than int[A][B] data (by hardware, by design). The type of the pointer then tells the compiler which data bank to choose, so accessing the same data through a pointer with the same value but a different type leads to UB, as the compiler chooses a different data bank to access the data.
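One way to sidestep the undefined behavior concern for N dimensions is to keep the storage one-dimensional from the start and compute the row-major offset yourself. The following is only a sketch for three dimensions; the names SIZE_1D, SIZE_2D, SIZE_3D, TOTAL_SIZE, flat_index, fill and sum_all are made up for illustration and are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dimension macros, in the spirit of the question's size_1D/size_2D. */
#define SIZE_1D 2
#define SIZE_2D 3
#define SIZE_3D 4
#define TOTAL_SIZE (SIZE_1D * SIZE_2D * SIZE_3D)

/* Row-major offset of logical element [x][y][z] inside the flat buffer. */
static size_t flat_index(size_t x, size_t y, size_t z)
{
    assert(x < SIZE_1D && y < SIZE_2D && z < SIZE_3D);
    return (x * SIZE_2D + y) * SIZE_3D + z;
}

/* Takes a single pointer only; stores 100*x + 10*y + z at [x][y][z]. */
static void fill(int *data)
{
    for (size_t x = 0; x < SIZE_1D; x++)
        for (size_t y = 0; y < SIZE_2D; y++)
            for (size_t z = 0; z < SIZE_3D; z++)
                data[flat_index(x, y, z)] = (int)(100 * x + 10 * y + z);
}

/* Takes a single pointer only; walks all elements linearly. */
static long sum_all(const int *data)
{
    long sum = 0;
    for (size_t i = 0; i < (size_t)TOTAL_SIZE; i++)
        sum += data[i];
    return sum;
}
```

The caller would declare the storage as int data[TOTAL_SIZE] and pass data directly, so every index stays inside a single one-dimensional array object. Each extra dimension adds one more multiply-and-add step to flat_index.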
I'm trying to use MKL to solve an equation, but it seems that the array a[] leaks all the time, like this:
44.62 -0.09 -6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00
-0.09 11.29 -0.09 -6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00
-6277438562204192487878988888393020692503707483087375482269988814848.00 -0.09 0.18 -0.09 -6277438562204192487878988888393020692503707483087375482269988814848.00
-6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00 -0.09 11.29 -0.09
-6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00 -6277438562204192487878988888393020692503707483087375482269988814848.00 -0.09 44.62
And my code is:
#include <stdlib.h>
#include <stdio.h>
#include "mkl_lapacke.h"
/* Auxiliary routines prototypes */
extern void my_print_matrix(char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda, FILE* fpWrite);
extern void print_matrix(char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda);
/* Parameters */
#define N 5//nstep
#define LDA N
#define RMIN -10.0
#define RMAX 10.0
/* Main program */
int main() {
/* Locals */
MKL_INT n = N, lda = LDA, info;
/* Local arrays */
double h = (RMAX - RMIN) / (double(N) + 1.0);;
double xi;
double *w;
double *a;
w= (double*)malloc(sizeof(double) * N);
a = (double*)malloc(sizeof(double) * N*LDA);
for (int i = 0; i < N; i++) {
xi = RMIN + double(1.0+i) * h;
a[i*(N+1)] = 2.0 / h / h+xi * xi;
if (i==0) {
a[1] = -1.0 / h / h;
}
else if (i == N - 1) {
a[LDA * N-2] =- 1.0 / h / h;
}
else {
a[i *(N + 1)+1] = -1.0/h/h;
a[i * (N + 1) - 1] = -1.0/h/h;
}
}
print_matrix("Matrix", n, n, a, lda);
/* Executable statements */
printf("LAPACKE_dsyev (row-major, high-level) Example Program Results\n");
/* Solve eigenproblem */
info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, a, lda, w);
/* Check for convergence */
if (info > 0) {
printf("The algorithm failed to compute eigenvalues.\n");
exit(1);
}
exit(0);
} /* End of LAPACKE_dsyev Example */
/* Auxiliary routine: printing a matrix */
void print_matrix(char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda) {
MKL_INT i, j;
printf("\n %s\n", desc);
for (i = 0; i < m; i++) {
for (j = 0; j < n; j++) printf(" %6.2f", a[i * lda + j]);
printf("\n");
}
}
The first two elements seem right, but the other numbers are wrong: they are all the same very large number. I think the array is leaking, but I don't know how to deal with it, so I am asking for help.
Note that a[] is printed right after it is initialized; it is not the result. So my question is: where does the initialization go wrong?
Let's take a closer look at how you initialize a:
a[i*(N+1)] = 2.0 / h / h+xi * xi;
if (i==0) {
a[1] = -1.0 / h / h;
}
else if (i == N - 1) {
a[LDA * N-2] =- 1.0 / h / h;
}
else {
a[i *(N + 1)+1] = -1.0/h/h;
a[i * (N + 1) - 1] = -1.0/h/h;
}
Let's take the two cases i == 0 and i == 1:
i == 0
Here you first do the unconditional initialization
a[i*(N+1)] = 2.0 / h / h+xi * xi;
If we calculate the index i*(N+1) it's 0*(N+1) which is 0. Therefore you will initialize a[0].
Then you have if (i==0) where you initialize a[1].
i == 1
First the unconditional initialization of index i*(N+1), which then is 1*(5+1) which equals 6. So here you initialize a[6].
Then the conditions i == 0 and i == N - 1 are both false, so we end up in the final else clause:
a[i *(N + 1)+1] = -1.0/h/h;
a[i * (N + 1) - 1] = -1.0/h/h;
The first index i *(N + 1)+1 will be 1 *(5 + 1)+1 which is 7. So you initialize a[7].
The next index i * (N + 1) - 1 will be 1 * (5 + 1) - 1 which is 5. So you initialize a[5].
This pattern repeats throughout the loop, with ever higher indexes, but never lower.
That means you will never initialize indexes 2 to 4, and the same kind of gap appears in every later row. These elements will have indeterminate values, and if you're unlucky one of those values could be a trap representation, which would lead to undefined behavior when using it.
You need to rework your algorithm to initialize all elements of the array a.
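One possible rework, given here only as a sketch, is to zero-fill the whole matrix first and then write the three diagonals. The helper name make_tridiagonal and its parameters are hypothetical; the formulas for h, xi and the matrix entries are taken from the question's loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: builds an n x n row-major tridiagonal matrix.
   Returns NULL if the allocation fails; the caller must free() the result. */
static double *make_tridiagonal(int n, double rmin, double rmax)
{
    assert(n > 0);
    double h = (rmax - rmin) / ((double)n + 1.0);

    /* calloc zero-fills the block; on the usual IEEE 754 platforms
       all-bits-zero is the double value 0.0. */
    double *a = calloc((size_t)n * (size_t)n, sizeof *a);
    if (a == NULL)
        return NULL;

    for (int i = 0; i < n; i++) {
        double xi = rmin + (1.0 + i) * h;
        a[i * n + i] = 2.0 / h / h + xi * xi;     /* main diagonal */
        if (i > 0)
            a[i * n + (i - 1)] = -1.0 / h / h;    /* sub-diagonal */
        if (i < n - 1)
            a[i * n + (i + 1)] = -1.0 / h / h;    /* super-diagonal */
    }
    return a;
}
```

With this layout every element outside the three diagonals is 0.0 instead of an indeterminate value, and the i == 0 and i == N - 1 special cases from the question collapse into the two bounds checks.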
Regarding memory leaks: don't forget to release the memory allocated for the w and a arrays by calling free(w) and free(a).
I am tracing some code, but I don't understand how the values are being calculated. My questions are in the comments beside the code.
I understand all the parts except two, which I have marked below. I would be really glad if someone could help me.
#include <stdio.h>
int function1(int *m, int n)
{
*m = *m + n;
return(*m);
}
int function2(int n, int *m)
{
n = *m + 2;
return(n);
}
int main()
{
int x = 1, y = 3;
int *xptr = &x,*yptr = &y;
x = 1; y = 3;
y = function1(xptr,x);
printf("x = %d, Y = %d\n",x,y);//x=2 but why? shouldn't it be x=1? y=2
x = 1; y = 3;
x = function1(yptr,function2(2,yptr));
printf("x = %d, y = %d\n",x,y);//x=8 y=8 but why? shouldn't y=3?
return 0;
}
So, inside function1:
int function1(int *m, int n) {
*m = *m + n;
return(*m);
} /*
... */
y = function1(xptr,x);
n holds a copy of the value of x,
m holds a copy of the value of xptr, which is the address of x.
*m refers to the contents of the address held by m
...*m = *m + n
That's 'assign to the contents of the address of x: the contents of the address of x, plus the value of x'.
The contents of the address of x (aka its value) is 1. So we assign 1 + 1 to the contents of the address holding the value of x.
In the first printf, x == 2 because in the first call to function1 m points to x, so you assign *m + n == x + x == 1 + 1 == 2 to the location pointed to by m (that is, to x).
In the second printf, y == 8 because function2 returns the value of y incremented by 2 (so 3 + 2 == 5) without updating y; function1 then adds that result to the old value of y (3 + 5 == 8) and stores it in y. function1 also returns that value, so x == 8 as well.
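To watch this happen, the nested call can be split into its two steps. The sketch below keeps function1 and function2 as in the question; trace_second_call is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Same as the question: writes through m, so the caller's variable changes. */
int function1(int *m, int n)
{
    *m = *m + n;
    return *m;
}

/* Same as the question: n is a local copy and nothing is written through m. */
int function2(int n, int *m)
{
    n = *m + 2;
    return n;
}

/* Hypothetical helper: replays x = function1(yptr, function2(2, yptr)) step by step. */
void trace_second_call(int *x, int *y)
{
    assert(x != NULL && y != NULL);
    int inner = function2(2, y);   /* reads *y and adds 2; *y itself is unchanged */
    printf("after function2: inner = %d, y = %d\n", inner, *y);
    *x = function1(y, inner);      /* adds inner to *y in place and returns the new *y */
    printf("after function1: x = %d, y = %d\n", *x, *y);
}
```

Called with x == 1 and y == 3, the first line reports inner as 5 with y still 3, and the second reports both x and y as 8.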
I wrote an OpenCL kernel that performs a box blur on an input matrix. The implementation was originally written for a GPU, and uses local memory to store the neighborhoods of work items in a work group. Then, I ran the kernel on a CPU and compared the running times to an implementation which relied on caching reads from global memory automatically instead of manually storing them in local memory first.
Under the assumption that a CPU has no "local memory" and instead uses RAM, using local memory on a CPU should do more harm than good. However, the "local memory" kernel was faster than the one that relied on caching by 10ms (~112ms vs. ~122ms on a 8192x8192 matrix with Work Item / Work Group / "number of values calculated by each work item" settings deemed optimal for both implementations since they were found by an auto-tuner for both kernels separately).
The kernels were run on an Intel Xeon E5-1620 v2 CPU using the Intel OpenCL platform available on the host.
What are reasons for this to happen?
"Local Memory" kernel: Each work item works on a "block" of values. Each block is copied to shared memory, and its neighborhood is copied to local memory depending on where the block is in the work group so no values are copied twice. Then, after the barrier, the final value is calculated.
The code below is the X-direction kernel; the Y-direction kernel is exactly the same except for the direction in which the values are inspected to calculate the output value.
__kernel void boxblur_x (__read_only __global float* image,
__local float* localmem,
__write_only __global float* output)
{
// size of input and output matrix
int MATRIX_SIZE_Y = IMAGE_HEIGHT;
int MATRIX_SIZE_X = IMAGE_WIDTH;
int MATRIX_SIZE = MATRIX_SIZE_Y * MATRIX_SIZE_X;
// mask size
int S_L = MASK_SIZE_LEFT;
int S_U = 0;
int S_R = MASK_SIZE_RIGHT;
int S_D = 0;
int SHAPE_SIZE_Y = S_U + S_D + 1;
int SHAPE_SIZE_X = S_L + S_R + 1;
int SHAPE_SIZE = SHAPE_SIZE_Y * SHAPE_SIZE_X;
// tuning parameter
// ---------------------------------------------------------------
//work items in y/x dimension per work group
int NUM_WI_Y = get_local_size(1);
int NUM_WI_X = get_local_size(0);
//size of blocks
int BLOCKHEIGHT = X_BLOCKHEIGHT;
int BLOCKWIDTH = X_BLOCKWIDTH;
//position in matrix
int GLOBAL_POS_X = get_global_id(0) * BLOCKWIDTH;
int GLOBAL_POS_Y = get_global_id(1) * BLOCKHEIGHT;
//localMemory size
int LOCALMEM_WIDTH = S_L + NUM_WI_X * BLOCKWIDTH + S_R;
//position in localmem
int LOCAL_POS_X = S_L + get_local_id(0) * BLOCKWIDTH;
int LOCAL_POS_Y = S_U + get_local_id(1) * BLOCKHEIGHT;
// copy values to shared memory
for (int i = 0; i < BLOCKHEIGHT; i++)
{
for (int j = 0; j < BLOCKWIDTH; j++)
{
localmem[(LOCAL_POS_X + j) + (LOCAL_POS_Y + i) * LOCALMEM_WIDTH] = image[GLOBAL_POS_X + j + (GLOBAL_POS_Y + i) * MATRIX_SIZE_X];
}
}
// only when all work items have arrived here,
// computation continues - otherwise, not all needed
// values might be available in local memory
barrier (CLK_LOCAL_MEM_FENCE);
for (int i = 0; i < BLOCKHEIGHT; i++)
{
for (int j = 0; j < BLOCKWIDTH; j++)
{
float sum = 0;
for (int b = 0; b <= S_L + S_R; b++)
{
sum += localmem[(get_local_id(0) * BLOCKWIDTH + j + b) + (get_local_id(1) * BLOCKHEIGHT + i) * LOCALMEM_WIDTH];
}
// divide by size of mask
float pixelValue = sum / SHAPE_SIZE;
// write new pixel value to output image
output[GLOBAL_POS_X + j + ((GLOBAL_POS_Y + i) * get_global_size(0) * BLOCKWIDTH)] = pixelValue;
}
}
}
"L1 Caching kernel": Despite the many defines, it does exactly the same, but relies on global memory caching of the blocks instead of explicitly managing local memory.
#define WG_BLOCK_SIZE_Y ( OUTPUT_SIZE_Y / NUM_WG_Y )
#define WG_BLOCK_SIZE_X ( OUTPUT_SIZE_X / NUM_WG_X )
#define WI_BLOCK_SIZE_Y ( WG_BLOCK_SIZE_Y / NUM_WI_Y )
#define WI_BLOCK_SIZE_X ( WG_BLOCK_SIZE_X / NUM_WI_X )
#define WG_BLOCK_OFFSET_Y ( WG_BLOCK_SIZE_Y * WG_ID_Y )
#define WG_BLOCK_OFFSET_X ( WG_BLOCK_SIZE_X * WG_ID_X )
#define WI_BLOCK_OFFSET_Y ( WI_BLOCK_SIZE_Y * WI_ID_Y )
#define WI_BLOCK_OFFSET_X ( WI_BLOCK_SIZE_X * WI_ID_X )
#define NUM_CACHE_BLOCKS_Y ( WI_BLOCK_SIZE_Y / CACHE_BLOCK_SIZE_Y )
#define NUM_CACHE_BLOCKS_X ( WI_BLOCK_SIZE_X / CACHE_BLOCK_SIZE_X )
#define CACHE_BLOCK_OFFSET_Y ( CACHE_BLOCK_SIZE_Y * ii )
#define CACHE_BLOCK_OFFSET_X ( CACHE_BLOCK_SIZE_X * jj )
#define reorder(j) ( ( (j) / WI_BLOCK_SIZE_X) + ( (j) % WI_BLOCK_SIZE_X) * NUM_WI_X )
#define reorder_inv(j) reorder(j)
#define view( i, j, x, y ) input[ ((i) + (x)) * INPUT_SIZE_X + ((j) + (y)) ]
#define a_wg( i, j, x, y ) view( WG_BLOCK_OFFSET_Y + (i), WG_BLOCK_OFFSET_X + reorder(j), (x), (y) )
#define a_wi( i, j, x, y ) a_wg( WI_BLOCK_OFFSET_Y + (i), WI_BLOCK_OFFSET_X + (j) , (x), (y) )
#define a_cache( i, j, x, y ) a_wi( CACHE_BLOCK_OFFSET_Y + (i), CACHE_BLOCK_OFFSET_X + (j) , (x), (y) )
#define res_wg( i, j ) output[ (WG_BLOCK_OFFSET_Y + i) * OUTPUT_SIZE_X + WG_BLOCK_OFFSET_X + reorder_inv(j) ]
#define res(i, j) output[ (i) * OUTPUT_SIZE_X + (j) ]
#define res_wg( i, j ) res( WG_BLOCK_OFFSET_Y + (i) , WG_BLOCK_OFFSET_X + reorder_inv(j) )
#define res_wi( i, j ) res_wg( WI_BLOCK_OFFSET_Y + (i) , WI_BLOCK_OFFSET_X + (j) )
#define res_cache( i, j ) res_wi( CACHE_BLOCK_OFFSET_Y + (i), CACHE_BLOCK_OFFSET_X + (j) )
float f_stencil( __global float* input, int ii, int jj, int i, int j )
{
// indices
const int WG_ID_X = get_group_id(0);
const int WG_ID_Y = get_group_id(1);
const int WI_ID_X = get_local_id(0);
const int WI_ID_Y = get_local_id(1);
// computation
float sum = 0;
for( int y = 0 ; y < SHAPE_SIZE_Y ; ++y )
for( int x = 0 ; x < SHAPE_SIZE_X ; ++x)
sum += a_cache(i, j, y, x);
return sum / SHAPE_SIZE;
}
__kernel void stencil( __global float* input,
__global float* output
)
{
//indices
const int WG_ID_X = get_group_id(0);
const int WG_ID_Y = get_group_id(1);
const int WI_ID_X = get_local_id(0);
const int WI_ID_Y = get_local_id(1);
// iteration over cache blocks
for( int ii=0 ; ii < NUM_CACHE_BLOCKS_Y ; ++ii )
for( int jj=0 ; jj < NUM_CACHE_BLOCKS_X ; ++jj )
// iteration within a cache block
for( int i=0 ; i < CACHE_BLOCK_SIZE_Y ; ++i )
for( int j=0 ; j < CACHE_BLOCK_SIZE_X ; ++j )
res_cache( i, j ) = f_stencil( input, ii, jj, i , j );
}
When you combine "L1 cache" version's loops:
for( int ii=0 ; ii < NUM_CACHE_BLOCKS_Y ; ++ii )
for( int jj=0 ; jj < NUM_CACHE_BLOCKS_X ; ++jj )
for( int i=0 ; i < CACHE_BLOCK_SIZE_Y ; ++i )
for( int j=0 ; j < CACHE_BLOCK_SIZE_X ; ++j )
for( int y = 0 ; y < SHAPE_SIZE_Y(SU+SD+1) ; ++y )
for( int x = 0 ; x < SHAPE_SIZE_X(SL+SR+1) ; ++x)
.... += a_cache(i, j, y, x);
and "local" version:
for (int i = 0; i < BLOCKHEIGHT; i++)
for (int j = 0; j < BLOCKWIDTH; j++)
for (int b = 0; b <= S_L + S_R; b++)
... +=input[...]
"a_cache" has a lot of compute
a_cache(i, j, y, x);
becomes
a_wi( CACHE_BLOCK_OFFSET_Y + (i), CACHE_BLOCK_OFFSET_X + (j), x, y )
and that becomes
view( WG_BLOCK_OFFSET_Y + (CACHE_BLOCK_OFFSET_Y + (i)), WG_BLOCK_OFFSET_X + reorder(CACHE_BLOCK_OFFSET_X + (j)), (x), (y) )
and that becomes
view( WG_BLOCK_OFFSET_Y + (CACHE_BLOCK_OFFSET_Y + (i)), WG_BLOCK_OFFSET_X + ( ( (CACHE_BLOCK_OFFSET_X + (j)) / WI_BLOCK_SIZE_X) + ( (CACHE_BLOCK_OFFSET_X + (j)) % WI_BLOCK_SIZE_X) * NUM_WI_X )
, (x), (y) )
and that becomes
input[ ((WG_BLOCK_OFFSET_Y + (CACHE_BLOCK_OFFSET_Y + (i))) + (x)) * INPUT_SIZE_X + ((WG_BLOCK_OFFSET_X + ( ( (CACHE_BLOCK_OFFSET_X + (j)) / WI_BLOCK_SIZE_X) + ( (CACHE_BLOCK_OFFSET_X + (j)) % WI_BLOCK_SIZE_X) * NUM_WI_X) + (y)) ]
This is 9 additions + 2 multiplications + 1 modulo + 1 division.
"local" version has
sum += localmem[(get_local_id(0) * BLOCKWIDTH + j + b) + (get_local_id(1) * BLOCKHEIGHT + i) * LOCALMEM_WIDTH];
which is 4 additions + 3 multiplications but no modulo and no division.
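The division and modulo come from the reorder(j) macro. As a rough illustration in plain C (not OpenCL), the same index sequence can be produced with two running counters. The values chosen for WI_BLOCK_SIZE_X and NUM_WI_X are hypothetical, the function names are made up, and the sketch starts at j == 0, whereas the kernel passes CACHE_BLOCK_OFFSET_X + j. Whether this helps in the real kernel would have to be measured, since the compiler may already simplify the arithmetic when the constants are known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define WI_BLOCK_SIZE_X 4
#define NUM_WI_X 8

/* Same formula as the reorder(j) macro: one division and one modulo per call. */
static int reorder_divmod(int j)
{
    return (j / WI_BLOCK_SIZE_X) + (j % WI_BLOCK_SIZE_X) * NUM_WI_X;
}

/* Fills out[0..count-1] with reorder(0), reorder(1), ... using running counters. */
static void reorder_incremental(int *out, int count)
{
    int q = 0;  /* tracks j / WI_BLOCK_SIZE_X */
    int r = 0;  /* tracks j % WI_BLOCK_SIZE_X */
    assert(out != NULL && count >= 0);
    for (int j = 0; j < count; j++) {
        out[j] = q + r * NUM_WI_X;
        if (++r == WI_BLOCK_SIZE_X) {   /* remainder wraps, quotient advances */
            r = 0;
            ++q;
        }
    }
}
```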
"L1 cache" version needs to keep loop counters for 6 loops and they could be using more cpu-registers or even L1 cache. Data cache size is 128 kB per core or 64 kB per thread. If you launch 1024 threads per core(each core is a work group right?) then 1024 * 6 * 4 = 24kB L1 is needed just for loop counters. This leaves 40kB to use. When you add "const int WG_ID_X" and other variables (5 of them), only 20kB is left. Now add the "f_stencil" functions temporary "stack" variables for its arguments, there may be no L1 cache left, decreasing efficiency. "local" version has about 10-12 variables used(not-used variables maybe optimized out?) and no functions so it may be better for L1.
https://software.intel.com/en-us/node/540486
says:
To reduce the overhead of maintaining a workgroup, you should create
work-groups that are as large as possible, which means 64 and more
work-items. One upper bound is the size of the accessed data set as it
is better not to exceed the size of the L1 cache in a single work
group.
and
If your kernel code contains the barrier instruction, the issue of
work-group size becomes a tradeoff. The more local and private memory
each work-item in the work-group requires, the smaller the optimal
work-group size is. The reason is that a barrier also issues copy
instructions for the total amount of private and local memory used by
all work-items in the work-group since the state of
each work-item that arrived at the barrier is saved before proceeding
with another work-item.
You have only 1 barrier in the "local" version, and before that point 8 variables are used, so not much memory needs to be copied?
Let ib be the input base and ob the output base. str is the ASCII representation of some arbitrarily large integer x. I need to define f such that:
f(str="1234567890", ib=10, ob=16) = {4, 9, 9, 6, 0, 2, 13, 2}
... where the return type of f is an int array containing the base ob digits of this integer. We assume that 2 <= ob <= INT_MAX and 2 <= ib <= 10, and str will always be a valid string (no negative numbers needed).
Something to get OP started, but enough to leave OP to enjoy the coding experience.
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
// form (*d) = (*d)*a + b
static void mult_add(int *d, size_t *width, int ob, int a, int b) {
// set b as the carry
// for *width elements,
// x = (Multiply d[] by `a` (using wider than int math) and add carry)
// d[] = x mod ob
// carry = x/ob
// while (carry <> 0)
// widen d
// x = carry
// d[] = x mod ob
// carry = x/ob
}
int *ql_f(const char *src, int ib, int ob) {
// Validate input
assert(ib >= 2 && ib <= 10);
assert(ob >= 2 && ob <= INT_MAX);
assert(src);
// Allocate space
size_t length = strlen(src);
// + 2 + 4 is overkill, OP to validate and right-size later
size_t dsize = (size_t) (log(ib)/log(ob)*length + 2 + 4);
int *d = malloc(sizeof *d * dsize);
assert(d);
// Initialize d to zero
d[0] = 0;
size_t width = 1;
while (*src) {
mult_add(d, &width, ob, ib, *src - '0');
src++;
}
// add -1 to end, TBD code
return d;
}
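For readers who want to check their own attempt, here is one way the mult_add outline could be filled in. It is only a sketch and rests on assumptions the skeleton does not state: digits are stored least significant first, and the caller has allocated enough room in d for any digits that get appended.

```c
#include <assert.h>
#include <stddef.h>

/* d[] holds base-ob digits, least significant first (an assumption of this sketch).
   Computes d = d * a + b in place and updates *width when digits are appended.
   The caller must have allocated enough room in d. */
static void mult_add(int *d, size_t *width, int ob, int a, int b)
{
    assert(d != NULL && width != NULL && ob >= 2);
    long long carry = b;                            /* b enters as the initial carry */
    for (size_t i = 0; i < *width; i++) {
        long long x = (long long)d[i] * a + carry;  /* wider than int math */
        d[i] = (int)(x % ob);
        carry = x / ob;
    }
    while (carry != 0) {                            /* widen d, one digit per pass */
        d[*width] = (int)(carry % ob);
        (*width)++;
        carry /= ob;
    }
}
```

Feeding the digits of "1234567890" through this with a == 10 and ob == 16 leaves 2, 13, 2, 0, 6, 9, 9, 4 in d, which is the question's expected result in reverse order, so the caller would still need to reverse the digits before returning them.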
I wrote this with older specifications, so it's not valid any more, but it might be useful as a starting point.
The code can handle long long magnitudes. Going to arbitrary precision numbers in C is a big leap!
Note using -1 as the ending marker instead of 0. Can accept ib from 2 to 36 and any ob.
Includes example main.
Function f is not reentrant as-is. To make it thread-safe, it could allocate the required memory then return a pointer to it. The simplest protocol would be having the caller responsible for freeing the memory afterwards.
#include <stdlib.h>
#include <limits.h>
#include <stdio.h>
int *f(const char *str, int ib, int ob) {
static int result[CHAR_BIT * sizeof(long long) + 1];
int i = sizeof(result) / sizeof(int) - 1;
long long l = strtoll(str, NULL, ib);
result[i--] = -1;
while (l) {
result[i] = l % ob;
l /= ob;
i--;
}
return result + i + 1;
}
int main()
{
int *x = f("1234567890", 16, 10);
while (*x > -1) {
printf("%d ", *x);
x++;
}
return 0;
}
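The note above about f not being reentrant suggests allocating the result instead of using a static buffer. A possible shape for that, as a sketch only: f_alloc is a hypothetical name, and rejecting negative values and output bases below 2 is a choice made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical reentrant variant of f: returns a malloc'd array of base-ob digits,
   most significant first, terminated by -1. The caller must free() it.
   Returns NULL on allocation failure, negative input, or ob < 2. */
int *f_alloc(const char *str, int ib, int ob)
{
    assert(str != NULL);
    int tmp[CHAR_BIT * sizeof(long long)];   /* digits, least significant first */
    size_t n = 0;
    long long l = strtoll(str, NULL, ib);

    if (l < 0 || ob < 2)
        return NULL;
    while (l) {
        tmp[n++] = (int)(l % ob);
        l /= ob;
    }

    int *result = malloc((n + 1) * sizeof *result);
    if (result == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)           /* reverse into most-significant-first order */
        result[i] = tmp[n - 1 - i];
    result[n] = -1;                          /* end marker, as in f */
    return result;
}
```

Like f, this is limited to values that fit in a long long; it only removes the shared static buffer.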
Following my previous question, Malloc Memory Corruption in C, I now have another problem.
I have the same code. Now I am trying to multiply the values contained in the arrays A and vc and store the result in res. Then A is set to zero, and I do a second multiplication with res and vc and store the values in A. (A and Q are square matrices, and mc and vc are matrices, or arrays, with N rows and two columns.)
Here is my code :
int jacobi_gpu(double A[], double Q[],
double tol, long int dim){
int nrot, p, q, k, tid;
double c, s;
double *mc, *vc, *res;
int i,kc;
double vc1, vc2;
mc = (double *)malloc(2 * dim * sizeof(double));
vc = (double *)malloc(2 * dim * sizeof(double));
res = (double *)malloc(dim * dim * sizeof(double));
if( mc == NULL || vc == NULL || res == NULL){
fprintf(stderr, "pb allocation matricre\n");
exit(1);
}
nrot = 0;
for(k = 0; k < dim - 1; k++){
eye(mc, dim);
eye(vc, dim);
for(tid = 0; tid < floor(dim /2); tid++){
p = (tid + k)%(dim - 1);
if(tid != 0)
q = (dim - tid + k - 1)%(dim - 1);
else
q = dim - 1;
printf("p = %d | q = %d\n", p, q);
if(fabs(A[p + q*dim]) > tol){
nrot++;
symschur2(A, dim, p, q, &c, &s);
mc[2*tid] = p; vc[2 * tid] = c;
mc[2*tid + 1] = q; vc[2*tid + 1] = -s;
mc[2*tid + 2*(dim - 2*tid) - 2] = p; vc[2*tid + 2*(dim - 2*tid) - 2 ] = s;
mc[2*tid + 2*(dim - 2*tid) - 1] = q; vc[2 * tid + 2*(dim - 2*tid) - 1 ] = c;
}
}
for( i = 0; i< dim; i++){
for(kc=0; kc < dim; kc++){
if( kc < floor(dim/2)) {
vc1 = vc[2*kc + i*dim];
vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
}else {
vc1 = vc[2*kc+1 + i*dim];
vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
}
res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
}
}
zero(A, dim);
for( i = 0; i< dim; i++){
for(kc=0; kc < dim; kc++){
if( kc < floor(dim/2)){
vc1 = vc[2*kc + i*dim];
vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
}else {
vc1 = vc[2*kc+1 + i*dim];
vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
}
A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
}
}
affiche(mc,dim,2,"Matrice creuse");
affiche(vc,dim,2,"Valeur creuse");
}
free(mc);
free(vc);
free(res);
return nrot;
}
When I try to compile, I get this error:
jacobi_gpu.c: In function ‘jacobi_gpu’:
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
make: *** [jacobi_gpu.o] Erreur 1
The corresponding lines are where I store the results in res and A :
res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
and
A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
Can someone explain what this error means and how I can correct it?
Thanks for your help. ;)
mc is a pointer to double, so its elements are of type double. An array subscript has to be of an integral type.
A[mc[2*kc + 1]
In the expression above, you are indexing A with a value taken from mc (a double array), and there are other similar cases. If you are sure of the values, cast to int.
Your declaration of mc:
mc = (double *)malloc(2 * dim * sizeof(double));
And then you use mc multiple times in your array access. For example:
A[mc[2*kc + 1] ...]
Can you change mc to be an int array instead of a double?
Looks like you're using entries in mc, which are doubles, as a part of array subscripts, thus making the entire subscript a double.
If you meant to do this, try casting back to an integer. I don't know what the context of this problem is, but I'd take a real good look at what you're doing to ensure you really want to use the contents of mc as a subscript.
The compiler is complaining because the expression you use as an array index evaluates to type double.
In other words, the expression:
mc[2*kc] + i*dim
...will give you a result which is of type double. You may want to look into the rules for usual arithmetic type conversions in C if you don't understand why this expression evaluates to a double.
The problem is that array indices must be integral types, like int or long. This is because the array subscript operator in C is basically shorthand for pointer arithmetic. In other words, saying array[N] is the same as saying *(array + N). But you can't do pointer arithmetic with non-integral types like float or double, so of course the array subscript operator won't work that way either.
To fix this, you'll need to cast the result of your array-indexing expression to an integral type.
mc is an array of doubles, and floating point values cannot be used to index arrays. I notice that nowhere in your code do you assign anything other than integers to mc. You should consider changing mc's type to an array of integers.
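Both fixes mentioned in the answers can be seen in a reduced form. This is only a sketch: pick, pick_cast and idx are hypothetical names, and the subscript expression mirrors mc[2*kc] + i*dim from the question.

```c
#include <assert.h>
#include <stddef.h>

/* idx plays the role of mc, but holds ints, so the subscript is an integer expression. */
static double pick(const double *A, const int *idx, size_t kc, size_t i, size_t dim)
{
    assert(A != NULL && idx != NULL);
    return A[(size_t)idx[2 * kc] + i * dim];
}

/* If the indices must stay in a double array, convert explicitly before indexing.
   The conversion truncates, so the stored values should be exact whole numbers. */
static double pick_cast(const double *A, const double *mc, size_t kc, size_t i, size_t dim)
{
    assert(A != NULL && mc != NULL);
    return A[(size_t)mc[2 * kc] + i * dim];
}
```

Storing the indices as int from the start avoids the cast entirely and rules out surprises from values such as 2.9999 being truncated to 2.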