Scaling a bitmap image: getting a segfault (C)

Sup guys, I'm learning C and working on a programming assignment where I have to scale a given bitmap image, and I have been stuck on this all day. This is my code so far, but I am getting a segfault and can't figure out why. I've been tracing through the code all day and am just stuck. Here is my scaling function; any help would be appreciated:
int enlarge(PIXEL* original, int rows, int cols, int scale,
PIXEL** new, int* newrows, int* newcols)
{
int ncols, nrows;
ncols = cols * scale;
nrows = rows * scale;
double xratio =(double) rows / nrows;
double yratio =(double) cols / ncols;
int px, py;
int auxw, cnt;
int i, j;
*new = (PIXEL*)malloc(nrows * ncols * sizeof(PIXEL));
for (i = 0; i < nrows; i++){
auxw = 0;
cnt = 0;
int m = i * 3;
for (j = 0; j < ncols; j++){
px = (int)floor( j * xratio);
py = (int)floor( i * yratio);
PIXEL* o1 = original + ((py*rows + px) *3);
PIXEL* n1 = (*new) + m*ncols + j + auxw;
*n1 = *o1;
PIXEL* o2 = original + ((py*rows + px) *3) + 1;
PIXEL* n2 = (*new) + m*ncols + j + 1 + auxw;
*n2 = *o2;
PIXEL* o3 = original + ((py*rows + px) *3) + 2;
PIXEL* n3 = (*new) + m*ncols + j + 2 + auxw;
*n3 = *o3;
auxw += 2;
cnt++;
}
}
return 0;
}
Using GDB, I get the following:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004013ff in enlarge (original=0x7ffff7f1e010, rows=512, cols=512, scale=2, new=0x7fffffffdeb8,
newrows=0x7fffffffdfb0, newcols=0x0) at maind.c:53
53 *n3 = *o3;
However, I can't understand what exactly the problem is.
Thanks!
EDIT:
In the code our professor provided, a PIXEL is defined as:
typedef struct {
unsigned char r;
unsigned char g;
unsigned char b;
} PIXEL;
From my understanding, I have a two-dimensional array where each element contains a three-element PIXEL array.
Also, when tracing my code on paper, I added the auxw logic in order to advance through the array. It works somewhat like multiplying by 3.

Is your array a cols x rows array of PIXEL objects, or is it actually a cols x rows x 3 array of PIXEL objects, where what you call a pixel is really one color channel of a pixel? Your code isn't clear. When accessing the original array, you multiply by 3, suggesting an array of 3 channels:
PIXEL* o1 = original + ((py*rows + px) *3);
But when accessing the (*new) array there is no multiplication by 3; instead there's some logic with auxw that I cannot follow:
PIXEL* n1 = (*new) + m*ncols + j + auxw;
auxw += 2;
Anyway, assuming that what you call a pixel is actually a channel, and that each pixel has the standard 3 RGB channels, you need to allocate 3 times as much memory for your array:
*new = (PIXEL*)malloc(nrows * ncols * 3*sizeof(PIXEL));
Some additional issues:
The output arguments int* newrows and int* newcols are never set. You probably want to assign *newrows = nrows and *newcols = ncols before returning.
If PIXEL is really a CHANNEL, rename it so the name expresses its meaning.
Rather than copying multidimensional-array pointer arithmetic all over the place, protect yourself from indexing off the end of your pixel/channel/whatever arrays by using a function:
#include <assert.h>
#include <stddef.h> /* NULL */

PIXEL *get_channel(PIXEL *pixelArray, int nRows, int nCols, int nChannels, int iRow, int iCol, int iChannel)
{
    if (iRow < 0 || iRow >= nRows)
    {
        assert(!(iRow < 0 || iRow >= nRows));
        return NULL;
    }
    if (iCol < 0 || iCol >= nCols)
    {
        assert(!(iCol < 0 || iCol >= nCols));
        return NULL;
    }
    if (iChannel < 0 || iChannel >= nChannels)
    {
        assert(!(iChannel < 0 || iChannel >= nChannels));
        return NULL;
    }
    return pixelArray + (iRow * nCols + iCol) * nChannels + iChannel;
}
Later, once your code is fully debugged, if performance is a problem you can replace the function with a macro in release mode:
#define get_channel(pixelArray, nRows, nCols, nChannels, iRow, iCol, iChannel)\
((pixelArray) + ((iRow) * (nCols) + (iCol)) * (nChannels) + (iChannel))
Another reason to use a standard get_channel() function is that your pointer arithmetic is inconsistent:
PIXEL* o1 = original + ((py*rows + px) *3);
PIXEL* n1 = (*new) + m*ncols + j + auxw;
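To access the original pixel you compute (py*rows + px)*3, i.e. row * nRows + column, which only gives the right answer because your test image is square (row-major indexing needs row * nCols + column). To access the *new array you compute m*ncols + j + auxw, a completely different formula. Make a single function to access any pixel array, debug it, and use it everywhere.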
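On the point about output arguments: a minimal sketch of the pattern, assuming the caller owns two int variables and passes their addresses. The function set_enlarged_size is hypothetical and only shows the mechanics. Note that your gdb trace shows newcols=0x0, i.e. a NULL pointer was passed for that argument, so writing through it would also crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes the enlarged size and reports it through
   output arguments. Returns 0 on success, -1 if an output pointer is NULL. */
int set_enlarged_size(int rows, int cols, int scale,
                      int *p_newRows, int *p_newCols)
{
    assert(scale > 0);
    if (p_newRows == NULL || p_newCols == NULL)
        return -1;              /* caller gave us nowhere to store results */
    *p_newRows = rows * scale;  /* write through the pointer, not to it */
    *p_newCols = cols * scale;
    return 0;
}
```

A caller would write int nrows, ncols; set_enlarged_size(512, 512, 2, &nrows, &ncols); and then read both values directly.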
Update
Given your definition of the PIXEL struct, you do not need the +1 and +2 offsets that you said "allowed me to reach the second and third element of the PIXEL struct". On a PIXEL *, + 1 moves to the next whole pixel, not to the next field. Since PIXEL is a struct, if you have a pointer to one you access its fields using the -> operator:
PIXEL *p_oldPixel = get_pixel(old, nRowsOld, nColsOld, iRowOld, iColOld);
PIXEL *p_newPixel = get_pixel(*p_new, nRowsNew, nColsNew, iRowNew, iColNew);
p_newPixel->r = p_oldPixel->r;
p_newPixel->g = p_oldPixel->g;
p_newPixel->b = p_oldPixel->b;
Or, in this case you can use the assignment operator to copy the struct:
*p_newPixel = *p_oldPixel;
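As for indexing through the PIXEL array: since your pointers are declared as PIXEL *, the C compiler's pointer arithmetic already multiplies offsets by sizeof(PIXEL). Multiplying by 3 yourself therefore moves three pixels at a time rather than one channel at a time, and runs past the end of both arrays; that is the likely cause of your segfault.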
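A small sketch that makes this scaling visible, assuming the PIXEL typedef from the question; the helper names byte_offset and pixel_index are hypothetical. Adding 1 to a PIXEL * moves sizeof(PIXEL) bytes (typically 3), so the row-major index needs no extra * 3.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char r;
    unsigned char g;
    unsigned char b;
} PIXEL;

/* Byte distance between arr and arr + offset. Pointer arithmetic on a
   PIXEL * is scaled by sizeof(PIXEL), so "+ 1" skips a whole pixel. */
ptrdiff_t byte_offset(const PIXEL *arr, size_t offset)
{
    return (const char *)(arr + offset) - (const char *)arr;
}

/* Row-major index of pixel (row, col) in an image with nCols columns.
   No "* 3": the compiler already accounts for the three channels. */
size_t pixel_index(size_t row, size_t col, size_t nCols)
{
    assert(col < nCols);
    return row * nCols + col;
}
```

With the question's (py*rows + px)*3, the computed index is about three times the intended pixel index, which is how the accesses end up beyond the nrows * ncols allocation.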
Also, I'd recommend clarifying your code by using clear and consistent naming conventions:
Use consistent and descriptive names for loop iterators and boundaries. Is i a row or a column? Why use i in one place but py in another? A consistent naming convention helps to ensure you never mix up your rows and columns.
Distinguish pointers from variables or structures by prepending "p_" or appending "_ptr". A naming convention that clearly distinguishes pointers can make instances of pass-by-reference more clear, so (e.g.) you don't forget to initialize output arguments.
Use the same suffix for all variables corresponding to the old and new bitmaps. E.g. if you have arguments named old, nRowsOld and nColsOld, you are less likely to accidentally use nColsOld with the new bitmap.
Thus your code becomes:
#include <assert.h>
#include <stddef.h> /* NULL */
#include <stdlib.h> /* malloc */

typedef struct pixel {
    unsigned char r;
    unsigned char g;
    unsigned char b;
} PIXEL;

PIXEL *get_pixel(PIXEL *pixelArray, int nRows, int nCols, int iRow, int iCol)
{
    if (iRow < 0 || iRow >= nRows)
    {
        assert(!(iRow < 0 || iRow >= nRows));
        return NULL;
    }
    if (iCol < 0 || iCol >= nCols)
    {
        assert(!(iCol < 0 || iCol >= nCols));
        return NULL;
    }
    return pixelArray + iRow * nCols + iCol;
}

int enlarge(PIXEL* old, int nRowsOld, int nColsOld, int scale,
            PIXEL **p_new, int *p_nRowsNew, int *p_nColsNew)
{
    int nColsNew = nColsOld * scale;
    int nRowsNew = nRowsOld * scale;
    int iRowNew, iColNew;

    *p_new = malloc((size_t)nRowsNew * nColsNew * sizeof(PIXEL));
    if (*p_new == NULL)
        return -1; /* out of memory */
    *p_nRowsNew = nRowsNew;
    *p_nColsNew = nColsNew;

    for (iRowNew = 0; iRowNew < nRowsNew; iRowNew++) {
        for (iColNew = 0; iColNew < nColsNew; iColNew++) {
            /* Nearest neighbour: integer division maps each new pixel back
               to its source pixel exactly, without floating-point rounding. */
            int iRowOld = iRowNew / scale;
            int iColOld = iColNew / scale;
            PIXEL *p_oldPixel = get_pixel(old, nRowsOld, nColsOld, iRowOld, iColOld);
            PIXEL *p_newPixel = get_pixel(*p_new, nRowsNew, nColsNew, iRowNew, iColNew);
            *p_newPixel = *p_oldPixel; /* struct assignment copies r, g and b */
        }
    }
    return 0;
}
I haven't tested this code yet, but by using consistent naming conventions one can clearly see what it is doing and why it should work.

Related

Accessing 2D data in a CUDA kernel [duplicate]

I'm doing an assignment for my university, and the main idea is to compare CUDA data parallelism with CUDA task parallelism. I came up with the idea of parallelizing Conway's Game of Life. The problem is that I cannot figure out how to navigate through a 2D array in CUDA in multiple directions, i.e. to the cells above, below, left and right of the cell the kernel evaluates, plus the corner cells around it.
So far I came up with the following:
The first kernel:
//determines the alive cell and save value of each cell into an array
__global__ void numAliveAround(int *oldBoard, int *newBoard, int xSize, int ySize, size_t pitchOld, size_t pitchNew)
{
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;
if(x < xSize && y < ySize)
{
//cell above
//xMod is to make sure the number wraps when it overflows the board
xMod = ((x + 1) % xSize + xSize) % xSize;
//idx calculation
idx = xMod * xSize + y;
outputNumber += board[idx];
//more of the same code, just for cell under, left, right, and corners
newBoard[x * xSize + y] = outputNumber;
}
}
The second kernel:
//sets new cell status according to the number of alive cells around
__global__ void determineNextState(int *board, int *newBoard, int xSize, int ySize, size_t pitchOld, size_t pitchNew)
{
//getting threads
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;
if (x < xSize && y < ySize)
{
int idxNew = x * xSize + y;
int idxOld = x * xSize + y;
int state = board[idxOld];
//ALIVE = 1, DEAD = 0;
int output = DEAD;
//checking if any alive condition is met
if (state == ALIVE)
{
if ((newBoard[idxNew] == 2 || newBoard[idxNew] == 3))
{
output = ALIVE;
}
}
else
{
if (newBoard[idxNew] == 3)
{
output = ALIVE;
}
}
newBoard[idxNew] = output;
}
}
Kernel-launching function:
void SendToCUDA(int oldBoard[COLUMNS][ROWS], int newBoard[COLUMNS][ROWS])
{
//CUDA pointers
int *d_oldBoard;
int *d_newBoard;
size_t pitchOld;
size_t pitchNew;
cudaMallocPitch(&d_oldBoard, &pitchOld, COLUMNS * sizeof(int), ROWS);
cudaMallocPitch(&d_newBoard, &pitchNew, COLUMNS * sizeof(int), ROWS);
cudaMemcpy2D(d_oldBoard, pitchOld, oldBoard, COLUMNS * sizeof(int), COLUMNS * sizeof(int), ROWS, cudaMemcpyHostToDevice);
dim3 grid(divideAndRound(COLUMNS, BLOCKSIZE_X), divideAndRound(ROWS, BLOCKSIZE_Y));
dim3 block(BLOCKSIZE_Y, BLOCKSIZE_X);
printf("counting \n");
numberAliveAround <<<block, grid>>> (d_oldBoard, d_newBoard, COLUMNS, ROWS, pitchOld, pitchNew);
cudaDeviceSynchronize();
printf("determining \n");
determineNextState <<<block, grid>>> (d_oldBoard, d_newBoard, COLUMNS, ROWS, pitchOld, pitchNew);
cudaDeviceSynchronize();
//using newBoard later (outside the function) to display the Board
cudaMemcpy2D(newBoard, COLUMNS * sizeof(int), d_newBoard, pitchNew, COLUMNS * sizeof(int), ROWS, cudaMemcpyDeviceToHost);
cudaFree(d_oldBoard);
cudaFree(d_newBoard);
}
I found multiple ways of accessing a flattened 2D array, some of which contradict each other, like:
//what is usually used as an explanation
idx = x * width + y;
//sometimes x and y are swapped
idx = y * width + x;
//what works with simple access
int *value = (int *)((char *)(d_matrix + y * pitch)) + x;
//or
idx = x * xDim + y + pitch;
The funny thing is that the last two work when I just access a single point in the array (for example, to increase every value by 1) but do not work at all with more complex navigation. I've been stuck on this problem for quite some time now, so any kind of insight would be extremely helpful.
I figured out the answer: the correct way of accessing a 2D array allocated with cudaMallocPitch is:
board[y * (pitch / sizeof(int)) + x]
because pitch is the length of a row in bytes, whereas the [] operator counts elements, so the pitch must first be converted into a number of elements of the data type:
pitch / sizeof(datatype)
Later I found even more issues with this code, so please don't just copy it.
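A host-side C sketch of the same indexing, so it can be tested without a GPU. Assumptions: the board is laid out the way cudaMallocPitch lays it out (each row starts pitchBytes bytes after the previous one, possibly with padding at the end of each row), x is the column and y is the row, and the helper names are hypothetical. The char * arithmetic in pitched_row should work the same way inside a __global__ kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer to the start of row y in a pitched buffer. pitchBytes is a
   length in BYTES, so the offset is applied to a char * first. */
int *pitched_row(int *base, size_t pitchBytes, int y)
{
    return (int *)((char *)base + (size_t)y * pitchBytes);
}

/* Wraps an index into [0, size), including -1 and size (toroidal board). */
int wrap(int i, int size)
{
    return ((i % size) + size) % size;
}

/* Counts live neighbours of cell (x, y) on a width x height torus stored
   with the given pitch in bytes. Padding at the end of rows is never read. */
int count_neighbours(int *board, size_t pitchBytes,
                     int width, int height, int x, int y)
{
    int count = 0;
    for (int dy = -1; dy <= 1; dy++) {
        int *row = pitched_row(board, pitchBytes, wrap(y + dy, height));
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;                  /* skip the cell itself */
            count += row[wrap(x + dx, width)];
        }
    }
    return count;
}
```

Separately, CUDA's execution configuration is kernel<<<gridDim, blockDim>>>, so the launches in SendToCUDA above pass their two arguments in the reverse order.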

Find k out of n subset with maximal area

I have n points and have to find the maximum union area of k of them (k <= n), i.e. the sum of the chosen points' areas minus the area they share.
Suppose we have n = 4 and k = 2. As illustrated in the original image, each point's area is the rectangle between the point and the origin, and the final area is the area of B plus the area of D (counting their intersection only once). No point is dominated.
I have implemented a bottom-up dynamic programming algorithm, but it has an error somewhere. Here is the code, which prints the best result:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct point {
double x, y;
} point;
struct point *point_ptr;
int n, k;
point points_array[1201];
point result_points[1201];
void qsort(void *base, size_t nitems, size_t size,
int (*compar)(const void *, const void *));
int cmpfunc(const void *a, const void *b) {
point *order_a = (point *)a;
point *order_b = (point *)b;
if (order_a->x > order_b->x) {
return 1;
}
return -1;
}
double max(double a, double b) {
if (a > b) {
return a;
}
return b;
}
double getSingleArea(point p) {
return p.x * p.y;
}
double getCommonAreaX(point biggest_x, point new_point) {
double new_x;
new_x = new_point.x - biggest_x.x;
return new_x * new_point.y;
}
double algo() {
double T[k][n], value;
int i, j, d;
for (i = 0; i < n; i++) {
T[0][i] = getSingleArea(points_array[i]);
}
for (j = 0; j < k; j++) {
T[j][0] = getSingleArea(points_array[0]);
}
for (i = 1; i < k; i++) {
for (j = 1; j < n; j++) {
for (d = 0; d < j; d++) {
value = getCommonAreaX(points_array[j - 1], points_array[j]);
T[i][j] = max(T[i - 1][j], value + T[i - 1][d]);
}
}
}
return T[k - 1][n - 1];
}
void read_input() {
int i;
fscanf(stdin, "%d %d\n", &n, &k);
for (i = 0; i < n; i++) {
fscanf(stdin, "%lf %lf\n", &points_array[i].x, &points_array[i].y);
}
}
int main() {
read_input();
qsort(points_array, n, sizeof(point), cmpfunc);
printf("%.12lf\n", algo());
return 0;
}
with the input:
5 3
0.376508963445 0.437693410334
0.948798695015 0.352125307881
0.176318878234 0.493630156084
0.029394902328 0.951299438575
0.235041868262 0.438197791997
Here the first number is n, the second is k, and each following line holds the x and y coordinates of one point. The result should be 0.381410589193,
whereas mine is 0.366431740966. Am I missing a point?
This is a neat little problem, thanks for posting! In the remainder, I'm going to assume no point is dominated, that is, there are no points c such that there exists a point d with c.x < d.x and c.y < d.y. If there are, then it is never optimal to use c (why?), so we can safely ignore any dominated points. None of your example points are dominated.
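Discarding dominated points can be done in one pass after sorting. A minimal sketch, assuming the points are already sorted by increasing x with distinct x values; the name remove_dominated is hypothetical. Scanning from the right, a point is kept only if its y exceeds every y to its right; a point with equal y is dropped too, since its rectangle is contained in the other one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { double x, y; } point_t;

/* Removes dominated points in place from an array sorted by increasing x.
   Keeps the original order of the survivors. Returns the new count. */
size_t remove_dominated(point_t *pts, size_t n)
{
    size_t w = n;        /* survivors are packed into pts[w..n-1] */
    double maxY = 0.0;   /* largest y seen so far to the right */
    for (size_t i = n; i-- > 0;) {
        if (w == n || pts[i].y > maxY) {  /* rightmost point always kept */
            maxY = pts[i].y;
            pts[--w] = pts[i];
        }
    }
    size_t kept = n - w;
    memmove(pts, pts + w, kept * sizeof *pts);  /* move survivors to front */
    return kept;
}
```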
Your problem exhibits optimal substructure: once we have decided which item is to be included in the first iteration, we have the same problem again with k - 1, and n - 1 (we remove the selected item from the set of allowed points). Of course the pay-off depends on the set we choose - we do not want to count areas twice.
I propose we pre-sort all points by their x-value, in increasing order. This ensures the value of a selection of points can be computed as a sum of piecewise areas. I'll illustrate with an example: suppose we have three points, (x1, y1), ..., (x3, y3), with values (2, 3), (3, 1), (4, .5). Then the total area covered by these points is (4 - 3) * .5 + (3 - 2) * 1 + (2 - 0) * 3 = 0.5 + 1 + 6 = 7.5: each point contributes the vertical strip between the previous point's x and its own x, with its own height.
By our assumption that there are no dominated points, we will always have such a weakly decreasing figure. Thus, pre-sorting solves the entire problem of "counting areas twice"!
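The piecewise sum above can be sketched as a short function, assuming the points are sorted by increasing x and none is dominated; union_area is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } point_t;

/* Area of the union of the rectangles [0, x] x [0, y] for points sorted by
   increasing x with no dominated points. Each point contributes the strip
   between the previous point's x (0 for the first) and its own x. */
double union_area(const point_t *pts, size_t n)
{
    double area = 0.0, prevX = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || pts[i].x >= pts[i - 1].x);  /* must be sorted */
        area += (pts[i].x - prevX) * pts[i].y;
        prevX = pts[i].x;
    }
    return area;
}
```

For (2, 3), (3, 1), (4, .5) this returns 7.5. Applied to the question's points (0.029394902328, 0.951299438575), (0.376508963445, 0.437693410334) and (0.948798695015, 0.352125307881), it gives about 0.381411, the expected result.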
Let us turn this into a dynamic programming algorithm. Consider a set of n points, labelled {p_1, p_2, ..., p_n} in order of increasing x. Let d[k][m] be the maximum area of a subset of size k + 1 whose (k + 1)-th, i.e. rightmost, point is p_m. Clearly, p_m cannot be the (k + 1)-th point if m < k + 1, since there would not be k points to its left to complete the subset. We have the following recursion,
d[k][m] = max {d[k - 1][l] + (p_m.x - p_l.x) * p_m.y, for all k <= l < m}.
The initial cases, k = 0 (subsets of a single point), are the rectangular areas of each point. The initial cases together with the updating equation suffice to solve the problem. The following code runs in O(n^2 * k) time. The factor of n^2 can probably be lowered as well: since the collection is ordered, a binary search might find the best predecessor in log n time, reducing n^2 to n log n. I leave this to you.
In the code, I have re-used my notation above where possible. It is a bit terse, but hopefully clear with the explanation given.
#include <stdio.h>
typedef struct point
{
double x;
double y;
} point_t;
double maxAreaSubset(point_t const *points, size_t numPoints, size_t subsetSize)
{
    // This should probably be heap allocated in your program.
    double d[subsetSize][numPoints];
    for (size_t m = 0; m != numPoints; ++m)
        d[0][m] = points[m].x * points[m].y;
    for (size_t k = 1; k != subsetSize; ++k)
        for (size_t m = k; m != numPoints; ++m)
        {
            d[k][m] = 0.; // VLA entries start out uninitialised
            for (size_t l = k - 1; l != m; ++l)
            {
                point_t const curr = points[m];
                point_t const prev = points[l];
                double const area = d[k - 1][l] + (curr.x - prev.x) * curr.y;
                if (area > d[k][m]) // is a better subset
                    d[k][m] = area;
            }
        }
    // The maximum area subset is now one of the subsets on the last row;
    // the first valid entry on that row is m = subsetSize - 1.
    double result = 0.;
    for (size_t m = subsetSize - 1; m != numPoints; ++m)
        if (d[subsetSize - 1][m] > result)
            result = d[subsetSize - 1][m];
    return result;
}
int main()
{
// I assume these are entered in sorted order, as explained in the answer.
point_t const points[5] = {
{0.029394902328, 0.951299438575},
{0.176318878234, 0.493630156084},
{0.235041868262, 0.438197791997},
{0.376508963445, 0.437693410334},
{0.948798695015, 0.352125307881},
};
printf("%f\n", maxAreaSubset(points, 5, 3));
}
Using the example data you've provided, I find an optimal result of 0.381411, as desired.
From what I can tell, you and I both use the same method to calculate the area, as well as the overall concept, but my code seems to be returning a correct result. Perhaps reviewing it can help you find a discrepancy.
JavaScript code:
function f(pts, k){
// Sort the points by x
pts.sort(([a1, b1], [a2, b2]) => a1 - a2);
const n = pts.length;
let best = 0;
// m[k][j] represents the optimal
// value if the jth point is chosen
// as rightmost for k points
let m = new Array(k + 1);
// Initialise m
for (let i=1; i<=k; i++)
m[i] = new Array(n);
for (let i=0; i<n; i++){
m[1][i] = pts[i][0] * pts[i][1];
best = Math.max(best, m[1][i]); // so that k = 1 is handled too
}
// Build the table
for (let i=2; i<=k; i++){
for (let j=i-1; j<n; j++){
m[i][j] = 0;
for (let jj=j-1; jj>=i-2; jj--){
const area = (pts[j][0] - pts[jj][0]) * pts[j][1];
m[i][j] = Math.max(m[i][j], area + m[i-1][jj]);
}
best = Math.max(best, m[i][j]);
}
}
return best;
}
var pts = [
[0.376508963445, 0.437693410334],
[0.948798695015, 0.352125307881],
[0.176318878234, 0.493630156084],
[0.029394902328, 0.951299438575],
[0.235041868262, 0.438197791997]
];
var k = 3;
console.log(f(pts, k));

Porting C to Go, having trouble understanding some pointer syntax

I'm currently porting some C (as part of a wider R package) to Go. Because the C in question is used as part of an R package, it has to make extensive use of pointers. The R package is changepoint.np.
As somebody who isn't experienced in C, I've managed to understand most of it. However, the following code has me a bit stumped:
double *sumstat; /* matrix in R: nquantile rows, n cols */
int *n; /* length of data */
int *minseglen; /* minimum segment length */
int *nquantiles; /* num. quantiles in empirical distribution */
...[abridged for brevity]...
int j;
int isum;
double *sumstatout;
sumstatout = (double *)calloc(*nquantiles,sizeof(double));
for (j = *minseglen; j < (2*(*minseglen)); j++) {
for (isum = 0; isum < *nquantiles; isum++) {
*(sumstatout+isum) = *(sumstat+isum+(*nquantiles*(j))) - *(sumstat+isum+(*nquantiles*(0)));
}
}
Specifically, this line (in the inner for loop):
*(sumstatout+isum) = *(sumstat+isum+(*nquantiles*(j))) - *(sumstat+isum+(*nquantiles*(0)));
I've read various pages and Stackoverflow questions/answers about C pointers and arrays, and if I understood them correctly, this line would be translated into Go as:
n := len(data)
nquantiles := int(4 * math.Log(float64(len(data))))
sumstatout[isum] = sumstat[isum*n + nquantiles*j] - sumstat[isum*n + nquantiles*0]
Where n is the number of columns (*n in the C code), and nquantiles is the number of rows (*nquantiles in the C code).
However this produces an error (index out of range, obviously) where the original code does not.
Where am I going wrong?
In the line:
sumstatout[isum] = sumstat[isum*n + nquantiles*j] - sumstat[isum*n + nquantiles*0]
I see two strange things:
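1) Where did the n in isum*n come from? The n is not part of the original expression.
2) nquantiles is a pointer in the original code, so the C expression uses *nquantiles (the value it points to), not the pointer itself.
Written with array subscripts, the original C line is equivalent to:
sumstatout[isum] = sumstat[isum + *nquantiles*j] - sumstat[isum]
In Go, where nquantiles is a plain int, you simply drop the *.
The original C code treats a (contiguous) memory area as a 2D matrix. Because R stores matrices column by column, element (isum, j) of the nquantiles x n matrix sumstat lives at offset isum + nquantiles*j. The general idea looks like this (shown here for a row-major layout, where the roles of rows and columns are swapped):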
int i, j;
int cols = ..some number..;
int rows = ..some number..;
double* matrix = malloc(cols * rows * sizeof *matrix);
for (i = 0; i < rows; ++i)
for (j = 0; j < cols; ++j)
*(matrix + i*cols + j) = ... some thing ...; /* i*cols moves to row i, + j moves to column j */
That is equivalent to:
int i, j;
int cols = ..some number..;
int rows = ..some number..;
double matrix[rows][cols];
for (i = 0; i < rows; ++i)
for (j = 0; j < cols; ++j)
matrix[i][j] = ... some thing ...;
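A small C sketch of the column-major case from the question, assuming sumstat is the nquantiles-by-n matrix passed from R; the function names colmajor_get and column_diff are hypothetical. It computes the inner loop of the question with subscripts instead of pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* R stores an nrow x ncol matrix column by column, so element (r, c)
   lives at offset r + nrow * c. */
double colmajor_get(const double *m, size_t nrow, size_t r, size_t c)
{
    assert(r < nrow);
    return m[r + nrow * c];   /* same as *(m + r + nrow * c) */
}

/* Subscript version of the question's inner loop for one column j:
   out[isum] = sumstat(isum, j) - sumstat(isum, 0). */
void column_diff(const double *sumstat, size_t nquantiles, size_t j,
                 double *out)
{
    for (size_t isum = 0; isum < nquantiles; isum++)
        out[isum] = colmajor_get(sumstat, nquantiles, isum, j)
                  - colmajor_get(sumstat, nquantiles, isum, 0);
}
```

A Go port would use the same offsets on a flat []float64, i.e. sumstat[isum+nquantiles*j] - sumstat[isum].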

Multiply by supernodal L in CHOLMOD?

How can I multiply by the cholmod_factor L in a supernodal L L^T factorisation? I'd prefer not to convert to simplicial since the supernodal representation results in faster backsolves, and I'd prefer not to make a copy of the factor since two copies might not fit in RAM.
I wound up understanding the supernodal representation from a nice comment in the supernodal-to-simplicial helper function in t_cholmod_change_factor.c. I paraphrase the comment and add some details below:
A supernodal Cholesky factorisation is represented as a collection of supernodal blocks. The entries of a supernodal block are arranged in column-major order like this 6x4 supernode:
t - - - (row s[pi[snode+0]])
t t - - (row s[pi[snode+1]])
t t t - (row s[pi[snode+2]])
t t t t (row s[pi[snode+3]])
r r r r (row s[pi[snode+4]])
r r r r (row s[pi[snode+5]])
There are unused entries (indicated by the hyphens) in order to make the matrix rectangular.
The column indices are consecutive.
The first ncols row indices are those same consecutive column indices. Later row indices can refer to any row below the t triangle.
The super member has one entry for each supernode; it refers to the first column represented by the supernode.
The pi member has one entry for each supernode; it refers to the first index in the s member where you can look up the row numbers.
The px member has one entry for each supernode; it refers to the first index in the x member where the entries are stored. Again, this is not packed storage.
The following code for multiplication by a cholmod_factor *L appears to work (I only care about int indices and double-precision real entries):
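#include <string.h> /* memset */
#include <cholmod.h>

extern cholmod_common comm; /* assumed: defined elsewhere, set up with cholmod_start */

/* Computes L * d for a supernodal factor L (int indices, real double entries). */
cholmod_dense *mul_L(cholmod_factor *L, cholmod_dense *d) {
    int rows = d->nrow, cols = d->ncol;
    int ldd = d->d; /* leading dimension of d */
    cholmod_dense *ans = cholmod_allocate_dense(rows, cols, rows,
                                                CHOLMOD_REAL, &comm);
    if (ans == NULL)
        return NULL;
    double *y = (double *)ans->x;
    const double *v = (const double *)d->x;
    int *sup = (int *)L->super;
    int *pi = (int *)L->pi;
    int *px = (int *)L->px;
    double *x = (double *)L->x;
    int *ss = (int *)L->s;
    memset(y, 0, sizeof(double) * rows * cols);
    for (size_t i = 0; i < L->nsuper; i++) {
        int r0 = pi[i], r1 = pi[i+1], nrow = r1 - r0;
        int c0 = sup[i], c1 = sup[i+1], ncol = c1 - c0;
        int px0 = px[i];
        /* TODO: Use BLAS instead. */
        for (int j = 0; j < ncol; j++) {
            for (int k = j; k < nrow; k++) { /* skip the unused upper part */
                for (int l = 0; l < cols; l++) {
                    y[l * rows + ss[r0 + k]] +=
                        x[px0 + k + j * nrow] * v[l * ldd + c0 + j];
                }
            }
        }
    }
    return ans;
}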
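To check the indexing without CHOLMOD, here is a self-contained C sketch that runs the same loop nest on hand-built arrays. Assumption: the toy_supernodal struct is hypothetical and only mimics the fields described above (super, pi, px, s, x); it is not the real cholmod_factor type. The example L is the 4x4 lower-triangular matrix with rows [2 0 0 0], [1 3 0 0], [0 0 4 0], [5 6 7 8], split into two supernodes: columns 0-1 with rows {0, 1, 3}, and columns 2-3 with rows {2, 3}.

```c
#include <assert.h>
#include <stddef.h>

/* Toy version of the arrays a supernodal factor exposes. */
typedef struct {
    int nsuper;
    const int *super;  /* first column of each supernode, size nsuper+1 */
    const int *pi;     /* start of each supernode's row list in s       */
    const int *px;     /* start of each supernode's block in x          */
    const int *s;      /* row indices                                   */
    const double *x;   /* column-major blocks, including unused slots   */
} toy_supernodal;

/* y = L * v for a single right-hand side, same loops as mul_L. */
void toy_mul_L(const toy_supernodal *L, const double *v, double *y, int n)
{
    for (int r = 0; r < n; r++)
        y[r] = 0.0;
    for (int i = 0; i < L->nsuper; i++) {
        int r0 = L->pi[i], nrow = L->pi[i + 1] - r0;
        int c0 = L->super[i], ncol = L->super[i + 1] - c0;
        int px0 = L->px[i];
        for (int j = 0; j < ncol; j++)
            for (int k = j; k < nrow; k++)  /* skip the unused upper part */
                y[L->s[r0 + k]] += L->x[px0 + k + j * nrow] * v[c0 + j];
    }
}
```

With super = {0, 2, 4}, pi = {0, 3, 5}, px = {0, 6, 10}, s = {0, 1, 3, 2, 3} and x = {2, 1, 5, 0, 3, 6, 4, 7, 0, 8} (the zeros at positions 3 and 8 are the unused slots), multiplying by v = (1, 2, 3, 4) gives (2, 7, 12, 70), the same as the dense product.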

Making pascal's triangle with mpz_t's

Hey, I'm trying to convert a function I wrote that generates an array of longs representing Pascal's triangle into a function that returns an array of mpz_t's. However, with the following code:
mpz_t* make_triangle(int rows, int* count) {
//compute triangle size using 1 + 2 + 3 + ... n = n(n + 1) / 2
*count = (rows * (rows + 1)) / 2;
mpz_t* triangle = malloc((*count) * sizeof(mpz_t));
//fill in first two rows
mpz_t one;
mpz_init(one);
mpz_set_si(one, 1);
triangle[0] = one; triangle[1] = one; triangle[2] = one;
int nums_to_fill = 1;
int position = 3;
int last_row_pos;
int r, i;
for(r = 3; r <= rows; r++) {
//left most side
triangle[position] = one;
position++;
//inner numbers
mpz_t new_num;
mpz_init(new_num);
last_row_pos = ((r - 1) * (r - 2)) / 2;
for(i = 0; i < nums_to_fill; i++) {
mpz_add(new_num, triangle[last_row_pos + i], triangle[last_row_pos + i + 1]);
triangle[position] = new_num;
mpz_clear(new_num);
position++;
}
nums_to_fill++;
//right most side
triangle[position] = one;
position++;
}
return triangle;
}
I'm getting "incompatible types in assignment" errors for every line where a position in the triangle is set (e.g. triangle[position] = one;).
Does anyone know what I might be doing wrong?
mpz_t is defined as an array of length 1 of struct __mpz_struct, which prevents assignment. This is done because normal C assignment is a shallow copy, and the various GMP numeric types store pointers to arrays of "limbs" that need to be deep copied. You need to use mpz_set or mpz_init_set (or even mpz_init_set_si) to assign MP integers, making sure you initialize the destination before using the former.
Also, you should call mpz_clear at most once for every mpz_init (they're like malloc and free in this regard, and for the same reasons). By calling mpz_init(new_num) in the outer loop and mpz_clear(new_num) in the inner loop, you're introducing a bug which will be evident when you examine the results of make_triangle. However, you don't even need new_num: initialize the next element of triangle and use it as the destination of mpz_add.
mpz_init(triangle[position]);
mpz_add(triangle[position++], triangle[last_row_pos + i], triangle[last_row_pos + i + 1]);
Small numeric optimization: you can update last_row_pos using an addition and subtraction rather than two subtractions, a multiplication and division. See if you can figure out how.
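One way to do that, sketched with the same flat layout but with unsigned long long standing in for mpz_t so it runs without GMP (flat_pascal is a hypothetical name). The start of the previous row moves from (r - 2)(r - 3)/2 to (r - 1)(r - 2)/2, a difference of r - 2, so a single addition of r - 2 replaces the multiplication and division. In the GMP version each t[...] = 1 would become mpz_init_set_ui(triangle[position], 1), and each sum would use mpz_init followed by mpz_add as above.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds rows 1..rows of Pascal's triangle in one flat array, row r
   (1-based) starting at offset r*(r-1)/2. Caller frees the result. */
unsigned long long *flat_pascal(int rows, int *count)
{
    assert(rows >= 1);
    *count = rows * (rows + 1) / 2;
    unsigned long long *t = malloc((size_t)*count * sizeof *t);
    if (t == NULL)
        return NULL;
    int last_row_pos = 0;  /* start of row r-1 */
    int position = 0;
    for (int r = 1; r <= rows; r++) {
        if (r >= 3)
            last_row_pos += r - 2;           /* (r-1)(r-2)/2, incrementally */
        t[position++] = 1;                   /* left edge */
        for (int i = 0; i < r - 2; i++)      /* inner numbers */
            t[position++] = t[last_row_pos + i] + t[last_row_pos + i + 1];
        if (r >= 2)
            t[position++] = 1;               /* right edge */
    }
    return t;
}
```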
