How to use malloc in a c function? - c

I want to write a C function for an FIR filter. It takes two input arrays and produces one output array.
Both input arrays hold constant values. I want to use them to compute the filter output, then delete them and keep only the output array of the function. This is my code, but it does not work:
#include <stdlib.h>
float * filter(float *PATIENTSIGNAL,float *FILTERCOEF, int lengthofpatient , int lengthoffilter ){
static float FIROUT[8000];
int i,j;
float temp=0;
float* SIGNAL;
float* COEF;
SIGNAL = malloc(lengthofpatient *sizeof(float));
COEF = malloc(lengthoffilter*sizeof(float));
}
for (j = 0; j <= lengthofpatient; j++){
temp = SIGNAL[j] * COEF[0];
for (i = 1; i <= lengthoffilter; i++){
if ((j - i) >= 0){
temp += SIGNAL[j - i] * COEF[i];
}
FIROUT[j] = temp;
}
}
free(SIGNAL);
free(COEF);
free(PATIENTSIGNAL);
return FIROUT;
}

There are several problems in your code:
There is an unnecessary } after the line COEF = malloc(lengthoffilter*sizeof(float));.
for (j = 0; j <= lengthofpatient; j++) loops once more than required and reads one element past the end of the array; use j < lengthofpatient. The same applies to the i loop. pmg mentioned this in the comments.
temp += SIGNAL[j - i] * COEF[i]; will not give you the desired result, because neither SIGNAL nor COEF is ever initialized.
What is the purpose of float *PATIENTSIGNAL, float *FILTERCOEF in the parameter list?
At a guess, you need these two lines (plus #include <string.h>) to initialize SIGNAL and COEF. Note that memcpy takes a size in bytes:
memcpy(SIGNAL, PATIENTSIGNAL, lengthofpatient * sizeof(float));
memcpy(COEF, FILTERCOEF, lengthoffilter * sizeof(float));
Don't free PATIENTSIGNAL inside the function. Leave that to the caller.
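Putting the points above together, here is a minimal sketch of one way the function could look. It assumes the inputs only need to be read, so they are not copied at all and only the output is allocated with malloc. The names fir_filter, n_signal and n_coef are illustrative and not from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical rewrite of filter(): returns a malloc'd array holding
 * n_signal output samples, or NULL if the allocation fails.
 * The caller owns the result and must free() it.
 */
float *fir_filter(const float *signal, size_t n_signal,
                  const float *coef, size_t n_coef)
{
    assert(signal != NULL && coef != NULL);

    float *out = malloc(n_signal * sizeof *out);
    if (out == NULL)
        return NULL;

    for (size_t j = 0; j < n_signal; j++) {
        float acc = 0.0f;
        /* i <= j keeps signal[j - i] inside the array near the start */
        for (size_t i = 0; i < n_coef && i <= j; i++)
            acc += signal[j - i] * coef[i];
        out[j] = acc;
    }
    return out;
}
```

Because the inputs are declared const and never freed here, the caller stays responsible for them, and there is no fixed 8000-element limit on the output.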

Related

How to print the sum of a passing a int array as a parameter

#include <stdio.h>
int sumofArrayNum(int numList[]);
int main(){
int result,numList[]={23,32,54,23,54,32,3,35};
result = sumofArrayNum(numList);
printf("sum= %d", result);
return 0;
}
int sumofArrayNum(int numList[]){
int sum = 0;
for(int i = 0; i < 10; ++i){
sum += numList[i];
}
return sum;
}
Output is different each time I build and run it.
E.g. output is sum = 1032918821
Expected output I would like is sum = 256
A parameter declared as int numList[] is the same as int *numList, so the function cannot know the element count unless it is passed explicitly. Even int numList[8] as a parameter is just int *numList. C does not check array bounds.
There are a few ways to pass and check the array size.
size/count parameter
int sumofArrayNum(int numList[], int listSize){
int sum = 0;
for(int i = 0; i < listSize; ++i){
sum += numList[i];
}
return sum;
}
Here listSize should be the number of elements.
You can use a macro to hide the count parameter (this only works where array is a real array, not a pointer):
#define sumofArray(array) sumofArrayNum((array), sizeof(array)/sizeof(*array))
point to the whole array
int sumofArrayNum(int (*numList)[8]){
int sum = 0;
for(int i = 0; i < sizeof(*numList)/sizeof(**numList); ++i){
sum += (*numList)[i];
}
return sum;
}
Call it by passing a pointer to the array:
result = sumofArrayNum(&numList);
A compiler such as gcc can do a weak check here: it warns if you pass a pointer whose type is not int (*)[8].
Note that you still have to make sure the array is valid, and the array size must be a compile-time constant.
Besides,
Output is different each time I build and run it.
This is because only 8 elements are defined, so the valid indices are 0 to 7. Reading numList[8] and numList[9] is undefined behavior: any value may come back, because that memory may hold unrelated data.
numList has 8 elements, so the for loop must run exactly 8 times.
Your loop should be:
for(int i = 0; i < 8; ++i)
{
sum += numList[i];
}
This loop runs for i = 0 through 7 and ends when i reaches 8.
Information on for loop

Efficiently print every x iterations in for loop

I am writing a program in which a certain for-loop gets iterated over many many times.
A single iteration doesn't take too long, but since the program iterates the loop so often, it takes quite some time to compute.
To get more information on the progress of the program without slowing it down too much, I would like to print the progress every xth step.
Is there a different way to do this, than a conditional with a modulo like so:
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
.
.
.
}
?
Thanks in advance
This code:
for(int i = 0; i < some_large_number; i++){
if(i % x == 0)
printf("%f%%\r", percent);
//some other code
.
.
.
}
can be restructured as:
/* Partition the execution into blocks of x iterations, possibly including a
final fragmentary block. The expression (some_large_number+(x-1))/x
calculates some_large_number/x with any fraction rounded up.
*/
for (int block = 0, i = 0; block < (some_large_number+(x-1))/x; ++block)
{
printf("%f%%\r", percent);
// Set limit to the lesser of the end of the current block or some_large_number.
int limit = (block+1) * x;
if (some_large_number < limit) limit = some_large_number;
// Iterate the original code.
for (; i < limit; ++i)
{
//some other code
}
}
With the following caveats and properties:
The inner loop has no more work than the original loop (it has no extra variable to count or test) and has the i % x == 0 test completely removed. This is optimal for the inner loop in the sense it reduces the nominal amount of work as much as possible, although real-world hardware sometimes has finicky behaviors that can result in more compute time for less actual work.
New identifiers block and limit are introduced but can be changed to avoid any conflicts with uses in the original code.
Other than the above, the inner loop operates identically to the original code: It sees the same values of i in the same order as the original code, so no changes are needed in that code.
some_large_number+(x-1) could overflow int.
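The overflow caveat can be sidestepped by comparing against the remaining work instead of rounding the block count up. The following is only a sketch: run_in_blocks, every and sink are hypothetical names, and adding i into *sink stands in for "some other code".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: runs `total` iterations, reporting progress once
 * per block of `every` iterations. Returns how many reports were printed.
 */
size_t run_in_blocks(size_t total, size_t every, double *sink)
{
    assert(every > 0 && sink != NULL);

    size_t reports = 0;
    size_t i = 0;
    while (i < total) {
        /* total - i is the remaining work, so nothing here can overflow */
        size_t limit = (total - i < every) ? total : i + every;

        fprintf(stderr, "%5.1f%%\r", 100.0 * (double)i / (double)total);
        reports++;

        /* The inner loop has no progress test at all */
        for (; i < limit; i++)
            *sink += (double)i;   /* stand-in for the real work */
    }
    return reports;
}
```

As in the restructured loop above, the inner loop sees the same values of i in the same order as the original code.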
I would do it like this:
int j = x;
for (int i = 0; i < some_large_number; i++){
if(--j == 0) {
printf("%f%%\r", percent);
j = x;
}
//some other code
.
.
.
}
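Divide some_large_number by x, loop x times on the outside, and nest a loop over the quotient inside it, printing the percentage after each inner loop. Note that this prints x times in total rather than once every x iterations, and it drops the remainder when some_large_number is not a multiple of x. I mean this: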
int temp = some_large_number/x;
for (int i = 0; i < x; i++){
for (int j = 0; j < temp; j++){
//some code
}
printf("%f%%\r", percent);
}
The fastest approach regarding your performance concern would be to use a nested loop:
unsigned int x = 6;
unsigned int segments = some_large_number / x;
unsigned int y;
for ( unsigned int i = 0; i < segments; i++ ) {
printf("%f%%\r", percent);
for ( unsigned int j = 0; j < x; j++ ) {
/* some code here */
}
}
// If some_large_number is not evenly divisible by `x`:
if (( y = (some_large_number % x)) != 0 )
{
for ( unsigned int i = 0; i < y; i++ ) {
/* same code as inside of the former inner loop. */
}
}
Another example would be to use a different counting variable for the check to execute the print process by comparing that to x - 1 and reset the variable to -1 if it matches:
unsigned int x = 6;
unsigned int some_large_number = 100000000;
int j = 0; /* declared outside the loop: a for-init cannot declare two different types */
for ( unsigned int i = 0; i < some_large_number; i++, j++ ) {
if(j == (x - 1))
{
printf("%f%%\r", percent);
j = -1;
}
/* some code here */
}

Fisher Yates shuffling algorithm in C

I have been asked for an assignment to use FisherYates shuffle on an array to be taken in from a file (that, I managed to do) using functions.
int FisherYates(int *player, int n) { //implementation of Fisher
int i, j, tmp; // create local variables to hold values for shuffle
for (i = n - 1; i > 0; i--) { // for loop to shuffle
j = rand(); //randomise j for shuffle with Fisher Yates
tmp = player[j];
player[j] = player[i];
player[i] = tmp;
}
return player;
}
It basically just needs to shuffle the list of players and return me the output so I can print it out in main().
I would very much appreciate it if anyone could show me how to modify the code to make it work, since with this version I get an error at compile time:
invalid conversion from 'int*' to 'int' [-fpermissive]
You already have the result in player, so returning void should work.
Reference for Fisher-Yates
void FisherYates(int *player, int n) { //implementation of Fisher
int i, j, tmp; // create local variables to hold values for shuffle
for (i = n - 1; i > 0; i--) { // for loop to shuffle
j = rand() % (i + 1); //randomise j for shuffle with Fisher Yates
tmp = player[j];
player[j] = player[i];
player[i] = tmp;
}
}
Two quick things about your function:
rand() should be seeded once with srand(...) before its first use; otherwise it produces the same sequence on every run.
...
srand((unsigned)time(NULL)); // needs <time.h>; call once, e.g. at the start of main()
for (i=n-1; i>0; i--){ // for loop to shuffle
j = rand()%(i+1); //randomise j in 0..i for shuffle with Fisher Yates
...
int FisherYates(int *player, int n) is declared to return an int, but you are returning a pointer to int. You can do as Tectrendz suggested and change the return type to void (the caller already sees the shuffled array through player), or change the function to return an int *. The latter would be redundant, because the array is modified in place through the pointer argument.
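One detail neither answer covers: rand() % (i + 1) is slightly biased whenever i + 1 does not divide the number of values rand() can return. For a short player list the effect is tiny, but it can be avoided by rejecting the few values that cause it. This is a sketch only; rand_below and shuffle_unbiased are hypothetical names, and it assumes 0 < n <= RAND_MAX.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: uniform integer in [0, n), assuming 0 < n <= RAND_MAX. */
int rand_below(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    /* limit is a multiple of n, so accepted values split evenly over 0..n-1 */
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

/* Fisher-Yates shuffle in place; the caller seeds rand() once with srand(). */
void shuffle_unbiased(int *player, int n)
{
    assert(player != NULL || n <= 0);
    for (int i = n - 1; i > 0; i--) {
        int j = rand_below(i + 1);   /* j is in 0..i inclusive */
        int tmp = player[j];
        player[j] = player[i];
        player[i] = tmp;
    }
}
```

Since the shuffle happens in place, main() can print player directly after the call.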

C programming Pointer and String operations

So I have an assignment where I need to change certain functions by substituting pointer operations for array operations, and by substituting string operations for character operations. I have a basic understanding of pointers, arrays, strings, etc., but I can't understand what I have to do or how I should go about doing it. Here is the code:
#include <stdio.h>
#pragma warning(disable: 4996)
// This program exercises the operations of pointers and arrays
#define maxrow 50
#define maxcolumn 50
char maze[maxrow][maxcolumn]; // Define a static array of arrays of characters.
int lastrow = 0;
// Forward Declarations
#define triple(x) x % 3 == 0
void initialization(int, int);
void randommaze(int, int);
void printmaze(int, int);
void initialization(int r, int c) {
int i, j;
for (i = 0; i < r; i++){
maze[i][0] = 'X'; // add border
maze[i][c - 1] = 'X'; // add border
maze[i][c] = '\0'; // add string terminator
for (j = 1; j < c - 1; j++)
{
if ((i == 0) || (i == r - 1))
maze[i][j] = 'X'; // add border
else
maze[i][j] = ' '; // initialize with space
}
}
}
// Add 'X' into the maze at random positions
void randommaze(int r, int c) {
int i, j, d;
for (i = 1; i < r - 1; i++) {
for (j = 1; j < c - 2; j++) {
d = rand();
if (triple(d))
{
maze[i][j] = 'X';
}
}
}
i = rand() % (r - 2) + 1;
j = rand() % (c - 3) + 1;
maze[i][j] = 'S'; // define Starting point
do
{
i = rand() % (r - 2) + 1;
j = rand() % (c - 3) + 1;
} while (maze[i][j] == 'S');
maze[i][j] = 'G'; // define Goal point
}
// Print the maze
void printmaze(int r, int c) {
int i, j;
for (i = 0; i < r; i++) {
for (j = 0; j < c; j++)
printf("%c", maze[i][j]);
printf("\n");
}
}
void main() {
int row, column;
printf("Please enter two integers, which must be greater than 3 and less than maxrow and maxcolomn, respectively\n");
scanf("%d\n%d", &row, &column);
while ((row <= 3) || (column <= 3) || (row >= maxrow) || (column >= maxcolumn)) {
printf("both integers must be greater than 3. Row must be less than %d, and column less than %d. Please reenter\n", maxrow, maxcolumn);
scanf("%d\n%d", &row, &column);
}
initialization(row, column);
randommaze(row, column);
printmaze(row, column);
//encryptmaze(row, column);
//printmaze(row, column);
//decryptmaze(row, column);
//printmaze(row, column);
}
Here are the questions I am struggling on:
Rewrite the function randommaze(row, column) by substituting pointer operations for all array operations. You may not use indexed operation like maze[i][j], except getting the initial value of the pointer.
Rewrite the function printmaze(row, column) by substituting string operations for all character operations.
If someone could please explain to me what I should be doing and how I should be doing it I would really appreciate it. Thanks!
Question 2.:
An array can be used as a pointer to its first member. So, for example, array[0] and *array return the same thing: the value of the first element of the array. Since arrays are contiguous blocks of memory, if you increment (or add an offset to) a pointer that points to the beginning of an array, it points to the next element of the array. That means that array[1] and *(array + 1) are the same thing.
If you have a for loop that iterates by indexing an array, you could just as well write it using pointer increments. Example:
/* Loop indexing an array */
int my_array [10];
int i = 0;
for(; i < 10; ++i) {
my_array[i] = 0;
}
/* Loop by offsetting a pointer */
int my_array [10];
int i = 0;
int *ptr = my_array; /* Make it point to the first member of the array*/
for(; i < 10; ++i) {
*(ptr + i) = 0;
}
/* Looping by incrementing the pointer */
int my_array [10];
int *ptr = my_array; /* Make it point to the first member of the array */
int *end_ptr = my_array + 10; /* Make a pointer pointing to one past the end of the array */
for(; ptr != end_ptr; ++ptr) {
*ptr = 0;
}
All these code examples do the same thing: assign 0 to all members of the array. If you have a multidimensional array, just remember that it's still just a contiguous block of memory.
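For a two-dimensional array like maze, the same idea can be applied twice: one pointer walks the rows and another walks the cells of the current row. The sketch below is not the assignment solution. fill_interior and MAXCOL are hypothetical names, with MAXCOL mirroring maxcolumn from the question.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCOL 50   /* assumption: same row width as maxcolumn in the question */

/*
 * Hypothetical example: sets every interior cell of the first r rows and
 * c columns to `fill` without using [] indexing.
 */
void fill_interior(char (*grid)[MAXCOL], int r, int c, char fill)
{
    assert(grid != NULL && r >= 2 && c >= 2 && c <= MAXCOL);

    /* row points at a whole row; row++ advances by MAXCOL chars */
    for (char (*row)[MAXCOL] = grid + 1; row < grid + r - 1; row++) {
        /* *row decays to a char * pointing at the first cell of that row */
        for (char *cell = *row + 1; cell < *row + c - 1; cell++)
            *cell = fill;
    }
}
```

Called as fill_interior(maze, row, column, ' '), only the initial pointer comes from the array name, which matches the restriction in the exercise.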
Question 3.:
This question is not so clear to me, so my interpretation of what you're expected to do may be a bit off, but since you're just using printf to print single chars, I'm guessing that you should use a function to output a single char instead. Something like putchar.
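Another possible reading of the exercise: initialization() already stores a '\0' at maze[i][c], so each row is a valid C string and can be printed in one call instead of character by character. A sketch under that assumption, where print_rows and MAZE_COLS are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

#define MAZE_COLS 50   /* assumption: same row width as maxcolumn in the question */

/*
 * Hypothetical example: prints the first r rows, one string per line.
 * Each row must be '\0'-terminated. Returns the number of characters
 * written, or -1 on an output error.
 */
int print_rows(FILE *out, char (*grid)[MAZE_COLS], int r)
{
    assert(out != NULL && grid != NULL);

    int written = 0;
    for (char (*row)[MAZE_COLS] = grid; row < grid + r; row++) {
        int n = fprintf(out, "%s\n", *row);   /* whole row as one string */
        if (n < 0)
            return -1;
        written += n;
    }
    return written;
}
```

With the maze from the question this would be called as print_rows(stdout, maze, row).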
Hopefully, this will steer you in the right direction.
It sounds as though you are engaged in a data structures course. The first challenge is to build an array mapping function. For example:
#include <stdio.h>
int main(int argc, char **argv)
{
int values[20][40];
values[0][0] = 1;
values[10][10] = 20;
/* Let's print these two ways */
printf("0,0: %d 10,10: %d\n", values[0][0], values[10][10]);
/* *values is the first row, which decays to int *; offsets count ints, not bytes */
printf("0,0: %d 10,10: %d\n", *(*values + 0 * 40 + 0), *(*values + 10 * 40 + 10));
return 0;
}
What we are doing is obtaining the address of the first element of the 2D array (*values) and then adding an offset of row * 40 + column elements to locate the value we'd like to access. Pointer arithmetic on an int * already scales by sizeof(int), so the offset must not be multiplied by it again.
One of the main points of an exercise like this is to show you how the language actually works under the hood. This is how array mapping functions work generally, and it can be used as the basis, for example, for a language or compiler design course later, in addition to fast implementations of far more complex memory structures.
As to the second piece, I'm not super clear on this since there are no actual "string" operations built into C. I'd need a bit more detail there.

Using shared memory in CUDA without reducing threads

Looking at Mark Harris's reduction example, I am trying to see if I can have threads store intermediate values without reduction operation:
For example CPU code:
for(int i = 0; i < ntr; i++)
{
for(int j = 0; j < pos* posdir; j++)
{
val = x[i] * arr[j];
if(val > 0.0)
{
out[xcount] = val*x[i];
xcount += 1;
}
}
}
Equivalent GPU code:
const int threads = 64;
num_blocks = ntr/threads;
__global__ void test_g(float *in1, float *in2, float *out1, int *ct, int posdir, int pos)
{
int tid = threadIdx.x + blockIdx.x*blockDim.x;
__shared__ float t1[threads];
__shared__ float t2[threads];
int gcount = 0;
for(int i = 0; i < posdir*pos; i += 32) {
if (threadIdx.x < 32) {
t1[threadIdx.x] = in2[i%posdir];
}
__syncthreads();
for(int i = 0; i < 32; i++)
{
t2[i] = t1[i] * in1[tid];
if(t2[i] > 0){
out1[gcount] = t2[i] * in1[tid];
gcount = gcount + 1;
}
}
}
ct[0] = gcount;
}
what I am trying to do here is the following steps:
(1)Store 32 values of in2 in shared memory variable t1,
(2)For each value of i and in1[tid], calculate t2[i],
(3)if t2[i] > 0 for that particular combination of i, write t2[i]*in1[tid] to out1[gcount]
But my output is all wrong. I am not even able to get a count of all the times t2[i] is greater than 0.
Any suggestions on how to save the value of gcount for each i and tid ?? As I debug, I find that for block (0,0,0) and thread(0,0,0) I can sequentially see the values of t2 updated. After the CUDA kernel switches focus to block(0,0,0) and thread(32,0,0), the values of out1[0] are re-written again. How can I get/store the values of out1 for each thread and write it to the output?
I tried two approaches so far: (suggested by #paseolatis on NVIDIA forums)
(1) defined offset=tid*32; and replace out1[gcount] with out1[offset+gcount],
(2) defined
__device__ int totgcount=0; // this line before main()
atomicAdd(&totgcount,1);
out1[totgcount]=t2[i] * in1[tid];
int *h_xc = (int*) malloc(sizeof(int) * 1);
cudaMemcpyFromSymbol(h_xc, totgcount, sizeof(int)*1, cudaMemcpyDeviceToHost);
printf("GPU: xcount = %d\n", h_xc[0]); // Output looks like this: GPU: xcount = 1928669800
Any suggestions? Thanks in advance !
OK let's compare your description of what the code should do with what you have posted (this is sometimes called rubber duck debugging).
Store 32 values of in2 in shared memory variable t1
Your kernel contains this:
if (threadIdx.x < 32) {
t1[threadIdx.x] = in2[i%posdir];
}
which is effectively loading the same value from in2 into every value of t1. I suspect you want something more like this:
if (threadIdx.x < 32) {
t1[threadIdx.x] = in2[i+threadIdx.x];
}
For each value of i and in1[tid], calculate t2[i],
This part is OK, but why is t2 needed in shared memory at all? It is only an intermediate result which can be discarded after the inner iteration is completed. You could easily have something like:
float inval = in1[tid];
.......
for(int i = 0; i < 32; i++)
{
float result = t1[i] * inval;
......
if t2[i] > 0 for that particular combination of i, write
t2[i]*in1[tid] to out1[gcount]
This is where the problems really start. Here you do this:
if(t2[i] > 0){
out1[gcount] = t2[i] * in1[tid];
gcount = gcount + 1;
}
This is a memory race. gcount is a thread local variable, so each thread will, at different times, overwrite any given out1[gcount] with its own value. What you must have, for this code to work correctly as written, is to have gcount as a global memory variable and use atomic memory updates to ensure that each thread uses a unique value of gcount each time it outputs a value. But be warned that atomic memory access is very expensive if it is used often (this is why I asked about how many output points there are per kernel launch in a comment).
The resulting kernel might look something like this:
__device__ int gcount; // must be set to zero before the kernel launch
__global__ void test_g(float *in1, float *in2, float *out1, int posdir, int pos)
{
int tid = threadIdx.x + blockIdx.x*blockDim.x;
__shared__ float t1[32];
float ival = in1[tid];
for(int i = 0; i < posdir*pos; i += 32) {
if (threadIdx.x < 32) {
t1[threadIdx.x] = in2[i+threadIdx.x];
}
__syncthreads();
for(int j = 0; j < 32; j++)
{
float tval = t1[j] * ival;
if(tval > 0){
int idx = atomicAdd(&gcount, 1);
out1[idx] = tval * ival;
}
}
__syncthreads(); // every thread must be done with t1 before the next pass overwrites it
}
}
Disclaimer: written in browser, never been compiled or tested, use at own risk.....
Note that your write to ct was also a memory race, but with gcount now a global value, you can read the value after the kernel without the need for ct.
EDIT: It seems that you are having some problems with zeroing gcount before running the kernel. To do this, you will need to use something like cudaMemcpyToSymbol or perhaps cudaGetSymbolAddress and cudaMemset. It might look something like:
const int zero = 0;
cudaMemcpyToSymbol(gcount, &zero, sizeof(int), 0, cudaMemcpyHostToDevice); // pass the symbol itself; string names were removed in CUDA 5.0
Again, usual disclaimer: written in browser, never been compiled or tested, use at own risk.....
A better way to do what you are doing is to give each thread its own output region, and let it increment its own count and enter values there. This way, the double for loop can run in parallel in any order, which is what the GPU does well. The output is wrong because the threads share the out1 array, so they all overwrite each other's entries.
You should also move the code to copy into shared memory into a separate loop, with a __syncthreads() after. With the __syncthreads() out of the loop, you should get better performance - this means that your shared array will have to be the size of in2 - if this is a problem, there's a better way to deal with this at the end of this answer.
You also should move the threadIdx.x < 32 check to the outside. So your code will look something like this:
if (threadIdx.x < 32) {
for(int i = threadIdx.x; i < posdir*pos; i+=32) {
t1[i] = in2[i];
}
}
__syncthreads();
for(int i = threadIdx.x; i < posdir*pos; i += 32) {
for(int j = 0; j < 32; j++)
{
...
}
}
Then put a __syncthreads(), an atomic addition of gcount += count, and a copy from the local output array to a global one - this part is sequential, and will hurt performance. If you can, I would just have a global list of pointers to the arrays for each local one, and put them together on the CPU.
Another change is that you don't need shared memory for t2 - it doesn't help you. And the way you are doing this, it seems like it works only if you are using a single block. To get good performance out of most NVIDIA GPUs, you should partition this into multiple blocks. You can tailor this to your shared memory constraint. Of course, you don't have a __syncthreads() between blocks, so the threads in each block have to go over the whole range for the inner loop, and a partition of the outer loop.
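For the "put them together on the CPU" step, a host-side merge in plain C could look like the sketch below. The layout is an assumption and not something from the question: thread t is taken to have written counts[t] values at the start of its own slot of slot_size floats, similar to the offset = tid*32 attempt. merge_thread_outputs and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical host-side merge. staged holds n_threads slots of slot_size
 * floats each, and counts[t] says how many values at the start of slot t
 * are valid. Copies the valid values back to back into dense and returns
 * how many there are. dense must have room for all of them.
 */
size_t merge_thread_outputs(const float *staged, const int *counts,
                            size_t n_threads, size_t slot_size, float *dense)
{
    assert(staged != NULL && counts != NULL && dense != NULL);

    size_t total = 0;
    for (size_t t = 0; t < n_threads; t++) {
        assert(counts[t] >= 0 && (size_t)counts[t] <= slot_size);
        size_t n = (size_t)counts[t];
        memcpy(dense + total, staged + t * slot_size, n * sizeof *dense);
        total += n;
    }
    return total;
}
```

The return value plays the role of xcount in the CPU version, and no atomic operations are needed on the device because no two threads ever write to the same slot.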
