I am writing this simple program in C, using Xcode.
#include <stdio.h>

int main()
{
    double A[200][200][2][1][2][2];
    int B[200][200][2][1][2];
    int C[200][200][2][1][2];
    double D[200][200][2][1][2];
    double E[200][200][2][1][2];
    double F[200][200][2][1][2];
    double G[200][200];
    double H[200][200][2];
    double I[200][200];
    double L[50];
    printf("%d",B);
    return 0;
}
I get the following message attached to printf("%d",B);
Thread 1: EXC_BAD_ACCESS (code=2, address= ….)
So basically it is telling me that I messed up the memory. How can that be possible?
BUT, if I comment
// int C[200][200][2][1][2];
it works perfectly.
Any clue? It should not be a problem with Xcode, since in Eclipse it does not print anything in any case.
The default stack size on Mac OS X is 8 MiB (8,192 KiB) — try ulimit -s or ulimit -a in a terminal.
You have an array of doubles running at about 2.5 MiB (200 x 200 x 2 x 1 x 2 x 2 x sizeof(double)). You have 3 other arrays of double that are half that size, and 2 arrays of int that are a quarter of that size. These add up to 7.68 MB (7.3 MiB). Even G, H and I are using a moderate amount of stack space: in aggregate they're as big as D, so they add about another 1.2 MiB, bringing the total to roughly 8.96 MB (8.5 MiB), which is more than the 8 MiB stack.
The sum of these sizes is too big for the stack. You can make them file scope arrays, or you can dynamically allocate them with malloc() et al.
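For instance, a minimal sketch of the file-scope approach, moving the largest arrays out of main() so they live in static storage rather than on the stack (a static qualifier inside main() would work equally well):

#include <stdio.h>

/* File-scope arrays live in static storage, not on the stack. */
double A[200][200][2][1][2][2];
int B[200][200][2][1][2];
/* ... move the remaining arrays out the same way ... */

int main(void)
{
    printf("%p\n", (void *)B);   /* %p for a pointer, not %d */
    return 0;
}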
Why on earth would you have a dimension of [1]? You can only ever write 0 as a valid subscript, but then why bother?
I'm not quite sure why you observe EXC_BAD_ACCESS. But your code is quite broken. For a start, you pass B to printf and ask it to format it as an integer. It is not an integer. You should use %p if you want to treat it as a pointer.
The other problem is that your local variables will be allocated on the stack. And they are so big that they will overflow the stack. The biggest is A which is sizeof(double)*200*200*2*1*2*2 which is 2,560,000 bytes. You cannot expect to allocate such large arrays on the stack. You'll need to switch to using dynamically allocated arrays.
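A minimal sketch of the dynamic approach for A (the same pattern applies to the other arrays; the pointer-to-array type keeps the familiar indexing):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Allocate 200 x 200 x 2 x 1 x 2 x 2 doubles on the heap. */
    double (*A)[200][2][1][2][2] = malloc(200 * sizeof *A);
    if (A == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    A[0][0][0][0][0][0] = 1.0;   /* indexed exactly like the stack version */
    printf("%f\n", A[0][0][0][0][0][0]);
    free(A);
    return 0;
}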
Related
#include <stdio.h>

#define max_size 100

float array[max_size];
int n, counter;

int main(){
    printf("Enter the size of the array...\n");
    scanf("%d", &n);
    for (counter = 0; counter < n; counter++){
        printf("%p\t", &array[counter]);
    }
    printf("\n\n");
    return 0;
}
I am just experimenting with this C program, trying to verify that the size of a float is 8 bytes. Upon running this code with 5 elements in the array, I get the addresses of the elements as follows:
Enter the size of the array...
5
0x555555755040 0x555555755044 0x555555755048 0x55555575504c 0x555555755050
As you can see, for the first float number my system has allocated memory space ...40, 41, 42, 43, which is 4 bits of space if I am not wrong. But the float data type is supposed to have 8 bytes of space. I am thinking that the program should have allocated memory space ...40, 41, ..., 4F for the first 2 bytes. So
...40-...4F //for first 2 bytes
...50-...5F //for second 2 bytes
...60-...6F //for third 2 bytes
...70-...7F //for last 2 bytes
So the second address would start at ...80. But this is not the result I am obtaining. What am I missing here? Thank you for the help!
The C standard does not fix the storage size of float; it has been purposely left to the implementation (only minimum range and precision are required). On your system and compiler the size is evidently 4 bytes: the printed addresses advance by 4 each time, and each address refers to one byte, not one bit. You can confirm it with sizeof(float).
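A minimal sketch of that check (the printed values are implementation-defined; 4 and 8 are merely what most current desktop implementations report):

#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, so %zu is the matching format specifier. */
    printf("sizeof(float)  = %zu\n", sizeof(float));
    printf("sizeof(double) = %zu\n", sizeof(double));
    return 0;
}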
Related
Getting a stack overflow exception when declaring a large array
I want to write a program that initializes an integer array of size 987654321 for storing values of 1 and 0 only. Here is my program:
here is my program
#include <stdio.h>
#include <stdlib.h>

int main(){
    int x, y, z;
    int limit = 987654321;
    int arr[limit];
    for (x = 0; x < limit; x++){
        printf("%d \n", arr[x]);
    }
    return 0;
}
but it gives a segmentation fault.
An array of 987654321 ints is certainly too big for a local variable.
If you need a dynamically sized array of that size, you need to use malloc, like:
int limit = 987654321;
int *arr = malloc(limit * sizeof(*arr));
if (arr == NULL)
{
    /* ... display an error message and quit ... */
}
...
free(arr); // free it once you're done with the array
BTW, are you aware that your array uses roughly 4 gigabytes of memory, assuming the size of int is 4 on your platform?
Since you want to store values of 1 and 0 only, and each of those values needs only one bit, you can use a bit array instead of an integer array.
The size of an int is usually 4 bytes (32 bits), so you can reduce the memory required by a factor of 32.
So instead of about 4 GB, you will only need about 128 MB of memory. A minimal sketch follows.
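This is an illustrative bit-array sketch, assuming 8-bit bytes; the helper names setbit/getbit are made up for the example, not a standard API:

#include <stdio.h>
#include <stdlib.h>

/* Pack the flags 8 to a byte: bit i lives in byte i/8 at position i%8. */
static void setbit(unsigned char *bits, long i, int value)
{
    if (value)
        bits[i / 8] |= (unsigned char)(1u << (i % 8));
    else
        bits[i / 8] &= (unsigned char)~(1u << (i % 8));
}

static int getbit(const unsigned char *bits, long i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}

int main(void)
{
    long limit = 987654321;
    /* (limit + 7) / 8 rounds up to whole bytes: about 118 MiB. */
    unsigned char *bits = calloc((limit + 7) / 8, 1);
    if (bits == NULL)
        return 1;
    setbit(bits, 1000, 1);
    printf("%d %d\n", getbit(bits, 1000), getbit(bits, 1001)); /* prints: 1 0 */
    free(bits);
    return 0;
}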
I wrote an OpenCL matrix multiplication kernel that multiplies two square matrices.
The kernel code is
void kernel product(global const float* A, global const float* B, global float* C, int n){
    size_t kx=get_global_id(0);
    size_t ky=get_global_id(1);
    for(int i=0; i<n; i++){
        C[n*kx+ky]=C[n*kx+ky]+A[n*kx+i]*B[n*i+ky];
    }
}
The host code that launches the kernel is
// create buffer on the context
int n=1000;
cl::Buffer buffer_A(context,CL_MEM_READ_ONLY,sizeof(float)*(n*n));
cl::Buffer buffer_B(context,CL_MEM_READ_ONLY,sizeof(float)*(n*n));
cl::Buffer buffer_C(context,CL_MEM_READ_WRITE,sizeof(float)*(n*n));
float* A=new float[n*n];
float* B=new float[n*n];
float* C=new float[n*n];
for (int i=0; i<n; i++) {
    for (int j=0; j<n; j++) {
        A[n*i+j]=2.0;
        B[n*i+j]=2.0;
    }
}
//create the kernel, and set the buffer argument
cl::Kernel kernel(program,"product");
kernel.setArg(0, buffer_A);
kernel.setArg(1, buffer_B);
kernel.setArg(2, buffer_C);
kernel.setArg(3, n);
//build the queue
cl::Device device_use=all_devices[0];
cl::CommandQueue queue(context,device_use);
// queue manipulation: step 1: write the input buffer
queue.enqueueWriteBuffer(buffer_A, CL_TRUE, 0, sizeof(float)*(n*n), A);
queue.finish();
queue.enqueueWriteBuffer(buffer_B, CL_TRUE, 0, sizeof(float)*(n*n), B);
queue.finish();
// queue manipulation: Step 2 run kernel
queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(n,n), cl::NullRange);
queue.finish();
Notice that A, B, C are square matrices of dimension n*n. I tried to run this kernel on the Intel Iris graphics card of a MacBook Pro. It works well when n is small, but when n is 2000 or larger it gives the wrong result. The maximum global work size is (512,512,512) for this GPU, so 2000*2000 certainly doesn't exceed the maximum. When I run the kernel on the CPU I always get the right result, no matter how large n is, so the kernel itself should be right. Any ideas on what happened?
It seems there are several problems here. I'll try to address all of them (some might already be addressed in my comments).
OpenCL does not guarantee proper initialization of global memory. Some devices may initialize it with zero, but some won't. However, your code relies on exactly that, because you read from global memory before a single value has been written to it: C[n*kx+ky]=C[n*kx+ky]+A[n*kx+i]*B[n*i+ky];. Additionally, you're needlessly accessing global memory: you should not keep the intermediate result in global memory, but rather in fast private memory (see the improved kernel code below, which also handles the fact that C is not initialized).
You seem to be rather unclear about how OpenCL local and global work-sizes are handled, so I'll talk about this a bit.
Work-size limitations (your work-size must fulfill all of these requirements):
CL_DEVICE_MAX_WORK_ITEM_SIZES returns the maximum local work-size per dimension. So each dimension of your local work-size must be equal to or smaller than the corresponding value. Example: CL_DEVICE_MAX_WORK_ITEM_SIZES returns [512,512,512], so a local work-size of [512,2,1] is legal, as is [2,512,1]. However [1024,1,1] would be illegal, as it violates the maximum size for the first dimension.
CL_DEVICE_MAX_WORK_GROUP_SIZE returns the maximum number of work-items per work-group your device supports, i.e. the maximum total number of work-items within your local work-size. If CL_DEVICE_MAX_WORK_GROUP_SIZE returns 1024, [512,2,1] is legal, as is [1024,1,1], but [1024,2,1] is illegal since 1024*2 > 1024.
CL_KERNEL_WORK_GROUP_SIZE returns the maximum number of work-items per work-group your device supports for this specific kernel. This is usually the same as CL_DEVICE_MAX_WORK_GROUP_SIZE, but it can be lower for kernels that use a lot of private and/or local memory.
Your global work-size must be a multiple of your local work-size. This may seem a trivial thing if the size of your matrix is [2000,2000]: you choose your global work-size to be the same, and OpenCL calculates the local work-size for you. It'll probably be [16,16], because those are the biggest divisors of 2000 that still yield a work-group size below the limit.

But consider this: your matrix is of size [905,905]. Since 905 = 5 * 181, the best OpenCL can do is a tiny local work-size such as [5,5], which is close to the worst case in regard to performance (unless your device is smart enough to compensate for such a bad work-group size). Note that I could be wrong about this, but after reading a lot about OpenCL I suspect this is how it "has to" calculate the work-sizes.

So, in order to get high performance, work-groups generally should be no smaller than 64 work-items, and on modern devices 256 is a very good value. You should therefore calculate the global work-size from these values and adjust your kernel so it can handle more work-items than there are elements to process. Example: you want a work-group of size [16,16] = 256, but your matrix has 1000 rows and columns. Your global work-size should then be [1024,1024], and your kernel should discard all work-items that are not needed (a sketch of the rounding follows; the improved kernel below shows the discard). If you still want OpenCL to choose the local work-size, just change the global work-size to a multiple of 128 or 256 to avoid degenerate local work-group sizes.
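A minimal host-side sketch of that rounding, in plain C (the local size of 16 is an assumption; query CL_KERNEL_WORK_GROUP_SIZE to pick a real value for your kernel and device):

#include <stdio.h>

/* Round a dimension up to the next multiple of the local work-size. */
static size_t round_up(size_t value, size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

int main(void)
{
    size_t n = 1000, local = 16;
    /* For n = 1000 and a 16x16 work-group, each global dimension becomes 1024. */
    printf("global per dimension: %zu\n", round_up(n, local));
    return 0;
}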
Kernel code:
void kernel product(global const float* A, global const float* B, global float* C, int n)
{
size_t kx=get_global_id(0);
size_t ky=get_global_id(1);
// Discard work-items that are not needed.
if(kx >= n || ky >= n)
return;
float result = 0.f;
int idxC = n*kx+ky;
for(int i=0; i<n; ++i)
{
int idxA = n*kx+i;
int idxB = n*i+ky;
result += A[idxA]*B[idxB];
}
C[idxC] = result;
}
Kernel code end
I've always experienced the same problems with the Intel integrated graphics on my own MacBook Pro, as have my colleagues. This could be due to the kernel execution taking too long and thus being killed by the driver in order to free up the GPU for other tasks (such as rendering to the display). Alternatively, it could just be a bug in Apple's OpenCL implementation (which has always been pretty flaky in our experience).
I am trying to get code that was working on Linux to also work on my Windows 7.
When I retried the same code, it crashed with a stack overflow. I then removed everything I could to find the line that was causing the crash, and it left me with this:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* 256k == 2^18 */
#define ARRAY_SIZE 262144
#define ARRAY_SIZE_IN_BYTES (sizeof(float) * (ARRAY_SIZE))

int main(void)
{
    float a[ARRAY_SIZE] = { };
    float result = 0;

    printf("sum was: %f (should be around 1 300 000 with even random distribution)\n", result);

    return 0;
}
If I change ARRAY_SIZE to 256, the code runs fine. However, with the current value the float a[ARRAY_SIZE] line crashes at runtime with a stack overflow. It doesn't matter if I use float a[ARRAY_SIZE]; or float a[ARRAY_SIZE] = { };, they both crash the same way.
Any ideas what could be wrong?
Using Visual Studio 2010 for compiling.
OK, the stack size seems to be the explanation: 1 MB is the default on Windows.
Apparently it can be increased in VS 2010 under Properties -> Linker -> System -> Stack Reserve Size (the equivalent linker switch is /STACK with the reserve size in bytes). I tested, and the code works after pumping the stack up to 8 MB.
In the long run I should probably go the malloc way.
Your array is too large to fit on the stack; try using the heap:
float *a = malloc(sizeof(float) * ARRAY_SIZE); /* needs <stdlib.h> */
/* ... use the array ... */
free(a);
Well, let me guess. I've heard the default stack size on Windows is 1 MB. Your ARRAY_SIZE_IN_BYTES is exactly 1 MB, by the way (assuming float is 4 bytes), so that's probably the reason.
See this link: C/C++ maximum stack size of program
I'm currently learning to program in C. In one of the tasks in my assignment, I have to make a histogram (drawn with basic console output, like this: http://img703.imageshack.us/img703/448/histogram.jpg) to measure the number of characters in a text file (the standard for this assignment is 1.3 MB). I made a function like this:
int *yAxisAverageMethod(int average, int max, int min)
{
    int *yAxis;
    int i = 0;
    for (i = 0; i < 20; i++)
    {
        *(yAxis + i) = 0;
    }
    /*
    int length = sizeof(data) / sizeof(int);
    */
    int lower_half_interval = average / 10;
    int upper_half_interval = (max - average) / 10;
    int current_y_value = min;
    for (i = 0; i < 11; i++)
    {
        if (i == 10) {
            *(yAxis + 10) = average;
            break;
        }
        *(yAxis + i) = current_y_value;
        current_y_value += lower_half_interval;
    }
    current_y_value += average + upper_half_interval;
    printf("Current y value:%d\n", current_y_value);
    printf("Current max value:%d\n", max);
    for (i = 11; i < 20; i++)
    {
        *(yAxis + i) = current_y_value;
        current_y_value += upper_half_interval;
    }
    return yAxis;
}
In this function, I intend to return an array of 20 integers to make a y axis. I find the average of all characters, then use 20 lines of the console to display the histogram. The lower 10 lines display the below-average values of the total amount of characters, and the upper 10 lines display the above-average part. Each step in the lower half of the y axis is calculated as (average - min)/10, and each step in the upper part as (max - average)/10. This is my method for drawing the histogram, because I want to display the variation between values.
In the main method, I have this function call:
int *yAxis;
yAxis=yAxisAverageMethod(average,max,min);
I got a segmentation fault when I ran the function. Compiled with GCC from within NetBeans it works fine; however, when I run it on the university machines (where I have to compile on the command line and edit in Vi), I get the error. I guess it is because NetBeans has its own memory manager? I don't understand.
Edit: I will ask about merge sort in another question.
yAxis is a wild pointer: you never allocate memory for the int array you want to use.
int *yAxis = malloc(sizeof(int) * 20);
Without that, you are returning a pointer to nothing.
Where inside the function do you tell the computer to reserve some memory for *yAxis?
yAxis is a pointer, and you did not initialize it, so it points to some unknown location that depends on the compiler. You should allocate memory for it first:
yAxis = malloc(sizeof(int) * 20);
Don't forget to free() it in the caller.
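Putting it together, a minimal sketch of the fix (the allocation and the caller's cleanup are the point; the body of the function stays as in the question):

#include <stdio.h>
#include <stdlib.h>

int *yAxisAverageMethod(int average, int max, int min)
{
    /* Reserve heap space for the 20 y-axis values before writing to them. */
    int *yAxis = malloc(sizeof(int) * 20);
    if (yAxis == NULL)
        return NULL;               /* caller must check for failure */
    yAxis[0] = min;
    /* ... fill the remaining 19 entries as in the original function ... */
    return yAxis;
}

int main(void)
{
    int *yAxis = yAxisAverageMethod(100, 200, 0);
    if (yAxis != NULL) {
        printf("%d\n", yAxis[0]);
        free(yAxis);               /* the caller owns and frees the array */
    }
    return 0;
}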