MPI - Parallel dot product calculation - c

I'm struggling to modify a program that takes two files as input (each representing a vector) and calculates the dot product between them. It's supposed to be done in parallel, but I was told that the number of points in each file might not be evenly divisible by the number of available processors and each process might read from incorrect positions within the files. What I mean is that, if there are four processors, the first 250 points might be correctly read and calculated but the second processor might read over those same 250 points and provide an incorrect result. This is what I've done so far. Any modifications I've made are noted.
#include <fstream>
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <mpi.h>
int main(int argc, char *argv[]){
MPI_Init(&argc, &argv);
//parse command line arguments
if( argc != 3 ){
std::cout << "*** syntax: " << argv[0] << " vecFile1.txt vecFile2.txt" << std::endl;
return(0);
}
//get input file names
std::string vecFile1(argv[1]);
std::string vecFile2(argv[2]);
//open file streams
std::ifstream vecStream1(vecFile1.c_str());
std::ifstream vecStream2(vecFile2.c_str());
//check that streams opened properly
if(!vecStream1.is_open() || !vecStream2.is_open()){
std::cout << "*** Could not open Files ***" << std::endl;
return(0);
}
//if files are open read their lengths and make sure they are compatible
long vecLength1 = 0; vecStream1 >> vecLength1;
long vecLength2 = 0; vecStream2 >> vecLength2;
if( vecLength1 != vecLength2){
std::cout << "*** Vectors are not the same length ***" << std::endl;
return(0);
}
int numProc; //New variable for managing number of processors
MPI_Comm_size(MPI_COMM_WORLD, &numProc); //Added line to obtain number of processors
int subDomainSize = (vecLength1+numProc-1)/numProc; //Not sure if this is correct calculation; meant to account for divisibility with remainders
//read in the vector components and perform dot product
double dotSum = 0.;
for(long i = 0; i < subDomainSize; i++){ //Original parameter used was vecLength1; subDomainSize used instead for each process
double ind1 = 0.; vecStream1 >> ind1;
double ind2 = 0.; vecStream2 >> ind2;
dotSum += ind1*ind2;
}
std::cout << "VECTOR DOT PRODUCT: " << dotSum << std::endl;
MPI_Finalize();
}
Aside from those changes, I don't know where to go from here. What can I do to make this program properly calculate a dot product of two vectors using parallel processing, with two text files as input? Each contains 100000 points, so it's impractical to modify the files manually.

I won't write the code here, as it seems to be an assignment problem, but I'll give you some tips to head in the right direction.
Each processor has an assigned rank that can be found using the MPI_Comm_rank API. So for parallel processing you can divide the elements of the files among the processors such that the processor with rank r processes elements r*subDomainSize to (r+1)*subDomainSize - 1.
You need to make sure that each processor reads its elements from the correct position in the file. Use a seek API (e.g. ifstream::seekg) to move to the right offset, then apply the stream's read (>>) operator. Note that with whitespace-separated text you can't compute a byte offset directly from an element index unless the records have fixed width; otherwise each process must read and discard the values that precede its range.
For calculating subDomainSize, I am not sure whether the equation you mentioned works. There are several approaches. The simplest is to use vecLength/numProc as subDomainSize: each processor handles subDomainSize elements, and the last processor (rank == numProc - 1) also handles the remaining elements.
After the for loop, you should use a reduction operation to collect the individual sums from the processors and sum it up globally for the final result. See MPI_Reduce.
Note that collective operations like MPI_Reduce already synchronize the participating processes, so an explicit MPI_Barrier after the for loop is not strictly required; use MPI_Barrier only if you need a synchronization point for some other reason.

Related

What's a performant and clean way to parse a binary file in C?

I'm parsing a custom binary file structure for which I know the format.
The general idea is that each file is broken up into blocks of sequential bytes, which I want to separate and decode in parallel.
I'm looking for a readable, performant alternative to decode_block()
Here's what I'm currently working with:
#include <stdio.h>
#include <stdint.h>
int decode_block(uint8_t buffer[]);
int main(){
FILE *ptr;
ptr = fopen("example_file.bin", "rb");
if (!ptr){
printf("can't open.\n");
return 1;
}
int block1_size = 2404;
uint8_t block1_buffer[block1_size];
fread(block1_buffer, sizeof(char), block1_size, ptr);
int block2_size = 3422;
uint8_t block2_buffer[block2_size];
fread(block2_buffer, sizeof(char), block2_size, ptr);
fclose(ptr);
//Do these in parallel
decode_block(block1_buffer);
decode_block(block2_buffer);
return 0;
}
int decode_block(uint8_t buffer[]){
unsigned int size_of_block = (buffer[3] << 24) + (buffer[2] << 16) + (buffer[1] << 8) + buffer[0];
unsigned int format_version = buffer[4];
unsigned int number_of_values = (buffer[8] << 24) + (buffer[7] << 16) + (buffer[6] << 8) + buffer[5];
unsigned int first_value = (buffer[10] << 8) + buffer[9];
// On and on and on
int ptr = first_value;
int values[number_of_values];
for(int i = 0; i < number_of_values; i++){
values[i] = (buffer[ptr + 3] << 24) + (buffer[ptr + 2] << 16) + (buffer[ptr + 1] << 8) + buffer[ptr];
ptr += 4;
}
// On and on and on
return 0;
}
It feels a little redundant to be reading the entire file into a byte array and then interpreting the array byte by byte. Also it makes for very bulky code.
But since I need to operate on multiple parts of the file in parallel I can't think of another way to do this. Also, is there a simpler or faster way to convert the early bytes in buffer into their respective metadata values?
I'd:
use "memory mapped files" to avoid loading the raw data (e.g. mmap() in POSIX systems). Note that this is not portable "plain C", but almost every OS supports a way to do this.
make sure that the file format specification requires that the values are aligned to a 4-byte boundary in the file and (if you actually do need to support signed integers) that the values are stored in two's complement format (and not "sign and magnitude" or anything else)
check that the file complies with the specification as much as possible (not just the alignment requirement, but including things like "data can't start in middle of header", "data start + entries * entry_size can't exceed file's size", "version not recognized", etc).
have different code for little-endian machines (e.g. where which code is used may be selected at compile time with an #ifdef), where you can cast the memory mapped file's data to int32_t (or uint32_t). Note that the code you've shown (e.g. (buffer[ptr + 3] << 24) + (buffer[ptr + 2] << 16) + (buffer[ptr + 1] << 8) + buffer[ptr]) is broken for negative numbers (even on two's complement machines); so the alternative code (for the "not little-endian" cases) will be more complicated (and slower) than yours is. Of course, if you don't need to support negative numbers you should not be using any signed integer type (e.g. int), and quite frankly you shouldn't be using a "possibly 16-bit" int for 32-bit values anyway.
determine how many threads you should use (maybe command line argument; maybe by asking OS how many CPUs the computer actually has). Start the threads and tell them which "thread number" they are (where existing thread is number 0, first spawned thread is number 1, etc).
let the threads calculate their starting and ending offset (in the memory mapped file) from their "thread number", a global "total threads", a global "total entries" and a global "offset of first entry". This is mostly just division with special care for rounding. Note that (to avoid global variables) you could pass a structure containing the details to each thread instead. No safeguards (e.g. locks, critical sections) will be needed for this data because threads only read it.
let each thread parse its section of the data in parallel; then wait for them all to finish (e.g. maybe "thread number 0" does "pthread_join()" if you don't want to keep the threads for later use).
You will probably also need to check that all values (parsed by all threads) are within an allowed range (to comply with the file format specification); and have some kind of error handling for when they don't (e.g. when the file is corrupt or has been maliciously tampered with). This could be as simple as a (global, atomically incremented) "number of dodgy values found so far" counter; which could allow you to display an "N dodgy values found" error message after all values are parsed.
Note 1: If you don't want to use a memory mapped file (or can't); you can have one "file reader thread" and multiple "file parser threads". This takes a lot more synchronization (it devolves into a FIFO queue with flow control - e.g. with provider thread doing some kind of "while queue full { wait }" and consumer threads doing some kind of "while queue empty { wait }"). This extra synchronization will increase overhead and make it slower (in addition to being more complex), compared to using memory mapped files.
Note 2: If the file's data isn't cached by the operating system's file data cache, then you'll probably be bottlenecked by file IO regardless of what you do and using multiple threads probably won't help performance for that case.

Save and restart random chain (drand48) from checkpoint in C

I'm trying to write a program that gives the same result whether it is executed in one go or stopped and restarted from some checkpoint. To do that I need to be able to reproduce exactly the same random number sequence in either scenario. So, here is a piece of code where I tried to do that, but of course I wasn't successful. Could you help me fix this code?
int main(){
int i;
long int seed;
// Initial seed
srand48(3);
// Print 5 random numbers
for(i=0;i<5;i++) printf("%d %f\n",i,drand48());
// CHECKPOINT: HOW TO PROPERLY SET seed?
seed=mrand48(); // <--- FIXME
// 5 numbers more
for(i=5;i<10;i++) printf("%d %f\n",i,drand48());
// Restart from the CHECKPOINT.
srand48(seed);
// Last 5 numbers again
for(i=5;i<10;i++) printf("%d %f\n",i,drand48());
}
If you need to be able to resume the random number sequence, you can't let the drand48() package hide the seed values from you, so you need to use different functions from the package. Specifically, you should be calling:
double erand48(unsigned short xsubi[3]);
instead of:
double drand48(void);
and you'll keep an array of 3 unsigned short values around, and at each checkpoint, you'll record their values as part of the state. If you need to resume where things left off, you'll restore the values from the saved state into your array, and then go on your merry way.
This is also how you write library code that neither interferes with other code using the random number generators nor is interfered with by other code using the random number generators.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
unsigned short seed[3] = { 0, 0, 3 };
// Print 5 random numbers
for (int i = 0; i < 5; i++)
printf("%d %f\n", i, erand48(seed));
// CHECKPOINT
unsigned short saved[3];
memmove(saved, seed, sizeof(seed));
// 5 numbers more
for (int i = 5; i < 10; i++)
printf("%d %f\n", i, erand48(seed));
// Restart from the CHECKPOINT.
memmove(seed, saved, sizeof(seed));
// Last 5 numbers again
for (int i = 5; i < 10; i++)
printf("%d %f\n", i, erand48(seed));
return 0;
}
Example run:
0 0.700302
1 0.122979
2 0.346792
3 0.290702
4 0.617395
5 0.059760
6 0.783933
7 0.352009
8 0.734377
9 0.124767
5 0.059760
6 0.783933
7 0.352009
8 0.734377
9 0.124767
Clearly, how you set the seed array initially is entirely up to you. You can easily allow the user to specify the seed value, and report the seed you're using so that they can do so. You might use some elements from the PID or the time of day and the sub-seconds component as a default seed, for example. Or you could access a random number device such as /dev/urandom and obtain 6 bytes of random value from that to use as the seed.
How can I allow the user to specify the seed value using only a long int? In this approach it seems that the user needs to define 3 numbers, but I would like to ask for only 1 number (like a safe prime) in the input file.
You can take a single number and split it up in any way you choose. I have a program that takes option -s to print the random seed, -S to set the seed from a long, and that sometimes splits the long into 3 unsigned short values when using a random Gaussian distribution generator. I mostly work on 64-bit systems, so I simply split the long into three 16-bit components; the code also compiles safely under 32-bit systems but leaves the third number in the seed as 0. Like this:
case 'q':
qflag = true;
break;
case 'r':
check_range(optarg, &min, &max);
perturber = ptb_uniform;
break;
case 's':
sflag = true;
break;
case 't':
delim = optarg;
break;
case 'S':
seed = strtol(optarg, 0, 0);
break;
case 'V':
err_version("PERTURB", &"#(#)$Revision: 1.6 $ ($Date: 2015/08/06 05:05:21 $)"[4]);
/*NOTREACHED*/
default:
err_usage(usestr);
/*NOTREACHED*/
}
}
if (sflag)
printf("Seed: %ld\n", seed);
if (gflag)
{
unsigned short g_seed[3] = { 0, 0, 0 };
g_seed[0] = (unsigned short)(seed & 0xFFFF);
g_seed[2] = (unsigned short)((seed >> 16) & 0xFFFF);
if (sizeof(seed) > 4)
{
/* Avoid 32-bit right shift on 32-bit platform */
g_seed[1] = (unsigned short)(((seed >> 31) >> 1) & 0xFFFF);
}
gaussian_init(&g_control, g_seed);
}
else
srand48(seed);
filter_anon(argc, argv, optind, perturb);
return 0;
}
For my purposes, it is OK (not ideal, but OK) to have even more restricted seeding values on 32-bit. Yes, I could use unsigned long long and strtoull() etc. instead, to get 64-bit numbers even on a 32-bit platform (though I'd have to convert that to a long to satisfy srand48() anyway). An alternative I considered is to accept an argument -S xxxx:yyyy:zzzz with the three seed components set separately; I'd then have to modify the seed printing code as well as the parsing code. I use a separate program randseed to read numbers from /dev/urandom and format the result so it can be passed to programs which need a random seed:
$ randseed -b 8
0xF45820D2895B88CE
$

Create an array of values from different text files in C

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic to accommodate up to 32 files (the maximum samples per file would not exceed 32768 though). I'll only be reading the first 19800 samples (depending on other things) though, since the entire signal is contained in those 39600 points (19800 samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples have been processed.
Is there a way in which I can dynamically open files for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer and ship that off to be processed? On the next iteration the file pointers would all be aligned to the second sample; it would then move those into an array and ship it off again, until the desired number of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(num_pulses, 2 * sizeof(*range_samples));
for (i=0; i < 2 * rx_length; i++){
x = fscanf(pulse_file, "%f", &buf);
*(range_samples) = buf;
}
All that needs to happen (in my mind) is that I need to cycle both sample# and pulse# (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is to somehow declare file pointers for all return signal files, when the number of them can vary inbetween calls (e.g. do the whole thing for 4 pulses, and on the next call it can be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.
I would make the code you posted a function that takes an array of file names as an argument:
void doPulse( const char **file_names, const int size )
{
FILE *file = 0;
// declare your other variables
for ( int i = 0; i < size; ++i )
{
file = fopen( file_names[i], "r" );
// make sure file is open
// do the work on that file
fclose( file );
file = 0;
}
}
What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function taking a custom struct (the state of the generator) as a parameter. It could be something like this (pseudo code):
struct GtorState {
char *files[];
int filesIndex;
FILE *currentFile;
};
void gtorInit(GtorState *state, char **files) {
// loads the array of file into state, set index to 0, and open first file
}
int nextValue(GtorState *state, double *real, double *imag) {
// read 2 values from currentFile and affect them to real and imag
// if eof, close currentFile and open files[++currentIndex]
// if real and imag were found returns 0, else 1 if eof on last file, 2 if error
}
Then your main program could contain:
GtorState state;
// initialize the list of files to process
gtorInit(&state, files);
double real, imag;
int cr;
while (0 == (cr = nextValue(&state, &real, &imag))) {
// process (real, imag)
}
if (cr == 2) {
// process (at least display) error
}
Alternatively, your main program could iterate the values of the different files and call a function with state analog of the above generator that processes the values, and at the end uses the state of the processing function to get the results.
Tried a slightly different approach and it's working really well.
Instead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])
for (i=0; i<rx_length; i++){
printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
for (j=0; j<num_pulses; j++){
REAL(fft_buf, j) = range_phase_data[2*i][j];
IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
}
printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
// do stuff with the data
}
This alleviates the need to dynamically allocate file pointers and instead just opens the files one at a time and writes the data to the corresponding column in the matrix.
Cheers.

Map Creation Size Error

So, I'm working on inventing my own tile map creation and I ran into a problem with size. The maximum size (which I did not set) is <700x700; anything higher makes it crash. At first I thought it was something I got wrong when making the "presentation version" which outputs the result on screen -> ScreenShot, but I just finished making it more compact, tried using 800x800, and it still has the 700 limit, and I have no idea why. Since the code isn't that big I will show it here. If you have some tips I don't mind taking them.
#include <iostream>
#include <string.h>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <SFML/Graphics.hpp>
#include <SFML/Audio.hpp>
#define _WIN32_WINNT 0x0501
#include <windows.h>
using namespace std;
int main()
{
sf::Vector2i Size;
int Points,rands,PointsCheck=1,x,y,RandX,RandY,CurrentNumber=1;
srand(time(0));
bool Done=false,Expanded,Border;
ofstream Out("txt.txt");
/***/
cout << "Size X-Y = "; cin >> Size.x >> Size.y;cout << endl;
cout << "MAX Points - " << (Size.x*Size.y)/10 << endl;
cout << "Number of POINTS = ";cin >> Points ;cout << endl;
/***/
int PixelMap[Size.x+1][Size.y+1];
/***/
for (x=1;x<=Size.x;x++) for (y=1;y<=Size.y;y++) PixelMap[x][y]=0;
/***/
while(PointsCheck<=Points)
{
rands=1+(rand()%10);
RandX=1+(rand()%(Size.x));RandY=1+(rand()%(Size.y));
if (rands==1 && PointsCheck<=Points && PixelMap[RandX][RandY]==0)
{PixelMap[RandX][RandY]=CurrentNumber;CurrentNumber+=2;PointsCheck++;}
}
/***/
while(Done==false)
{
Done=true;
for(x=1;x<=Size.x;x++)
for(y=1;y<=Size.y;y++)
if(PixelMap[x][y]%2!=0 && PixelMap[x][y]!=-1)
{
if (PixelMap[x+1][y]==0) PixelMap[x+1][y]=PixelMap[x][y]+1;
if (PixelMap[x-1][y]==0) PixelMap[x-1][y]=PixelMap[x][y]+1;
if (PixelMap[x][y+1]==0) PixelMap[x][y+1]=PixelMap[x][y]+1;
if (PixelMap[x][y-1]==0) PixelMap[x][y-1]=PixelMap[x][y]+1;
}
for(x=1;x<=Size.x;x++)
for(y=1;y<=Size.y;y++)
if(PixelMap[x][y]!=0 && PixelMap[x][y]%2==0) {PixelMap[x][y]--;Done=false;}
}
for(x=1;x<=Size.x;x++){
for(y=1;y<=Size.y;y++)
{Out << PixelMap[x][y] << " ";}Out << endl;}
//ShowWindow (GetConsoleWindow(), SW_HIDE);
}
What you have here is the concept from which this site gets its name. You have a stack overflow:
int PixelMap[Size.x+1][Size.y+1];
If you want to allocate a large amount of memory, you need to do it dynamically (on the heap).
You can do this any number of ways. Since you are using C++, I recommend using a std::vector. The only trick is making the array 2-dimensional. Usually this is done in the same way as the one you allocated on the stack, except you don't get language syntax to help you:
vector<int> PixelMap( (Size.x+1) * (Size.y+1) );
Above, you'll need to calculate the linear index from the row/column. Something like:
int someval = PixelMap[ row * (Size.y+1) + column ];
If you really want to use the [row][column] indexing syntax, you can either make a vector-of-vectors (not recommended), or you can index your rows:
vector<int> PixelMapData( (Size.x+1) * (Size.y+1) );
vector<int*> PixelMap( Size.x+1 );
PixelMap[0] = &PixelMapData[0];
for( int i = 0; i < Size.x; i++ ) {
PixelMap[i+1] = PixelMap[i] + Size.y + 1;
}
Now you can index in 2D:
int someval = PixelMap[row][col];
There's a couple of problems with your code:
First off:
int PixelMap[Size.x+1][Size.y+1];
for (x=1;x<=Size.x;x++)
for (y=1;y<=Size.y;y++)
PixelMap[x][y]=0;
In the above snippet you are never setting the values of PixelMap[0][0], PixelMap[0][1], etc. Basically those values will be undefined. Arrays in C++ are 0-indexed, so you need to be sure you address those. Also, why are you using Size.x+1 and Size.y+1? Something feels wrong about that.
A better loop would be:
int PixelMap[Size.x][Size.y];
for (x=0;x<Size.x;x++)
for (y=0;y<Size.y;y++)
PixelMap[x][y]=0;
Second, this next bit of code is illegible:
while(PointsCheck<=Points)
{
rands=1+(rand()%10);
RandX=1+(rand()%(Size.x));
RandY=1+(rand()%(Size.y));
if (rands==1 && PointsCheck<=Points && PixelMap[RandX][RandY]==0)
{
PixelMap[RandX][RandY]=CurrentNumber;
CurrentNumber+=2;
PointsCheck++;
}
}
You're only incrementing PointsCheck if
PointsCheck <= Points
Why? You test for this to be true in your while condition. PointsCheck doesn't get incremented anywhere before this test.
rands is never guaranteed to be equal to 1 by the way, so your loop could go on for eternity (though unlikely).
The next loop suffers from similar problems as above:
while(Done==false)
{
Done=true;
What's the reason for this? Note that Done is only set back to false further down, in your second pass (when an even cell is decremented), so whether this loop ever runs a second time depends entirely on that pass behaving as you expect; make sure that is really the termination condition you intend.
Your for-loops that follow should start at 0 and go while less than Size (Size.x and Size.y):
for(x=0;x<Size.x;x++)
for(y=0;y<Size.y;y++)
Fix these issues first, and then if you still have a problem we can move on. And for all our sakes, please use brackets {} to scope your for loops and if statements so that we can follow, and separate commands onto separate lines; it's a lot of work for us to follow more than one statement per line.
EDIT
Since you seem unwilling to fix these issues first:
This could be an issue with the amount of memory allocated on the stack for your program. If you're trying to create an array of 800x800 integers, then you're using 800*800*4 bytes = 2.4 MB of data. I know this is higher than visual studio's default limit of 1 MB, but since a 700x700 array uses 1.8 MB, then whatever program you're using has a higher default (or you set visual studio's higher, but not high enough).
See if you can set your limit to at least 3 MB. More is better, though. If this doesn't fix your scaling problem up to 800, then you have other issues.
EDIT2
I just noticed this:
sf::Vector2i Size;
//unimportant stuff
cin >> Size.x >> Size.y;
int PixelMap[Size.x+1][Size.y+1];
Vector2i will probably have default values for x and y. If you want to dynamically allocate more than what those are, you cannot statically say
PixelMap[Size.x][Size.y]
You need to dynamically allocate the array. I strongly suggest using something like a std::vector<std::vector<int> > for this
e.g.(untested code):
sf::Vector2i Size;
//unimportant stuff
cin >> Size.x >> Size.y;
std::vector<vector<int> > PixelMap;
//Initialize values to 0
for(size_t i=0; i < Size.x; ++i){
vector<int> nextVec;
for(size_t j=0; j < Size.y; ++j){
nextVec.push_back(0);
}
PixelMap.push_back(nextVec);
}
Not sure if this has anything to do with your crash (I would have added a comment, but I don't have the reputation), but here's a problem I noticed:
Your array indexing scheme is not consistent. Since you're using index 1 to indicate the first element, your bounds checking should look like this...
if (y!=1 && y!=Size.y && x!=1 && x!=Size.x && ...
...instead of this...
if (y!=0 && y!=Size.y && x!=0 && x!=Size.x && ...
[EDIT]
I just tried this:
...
cout << "asdf" << endl;
int PixelMap[Size.x+1][Size.y+1];
cout << "asdf" << endl;
...
and verified it's a stack overflow problem. So, as others mentioned above, allocate your pixel map on the heap and it should be fine.
BTW, this code...
int PixelMap[Size.x+1][Size.y+1];
is not standard C++. It's an extension some compilers provide, called 'variable length arrays'. Check this out for more info -> Why aren't variable-length arrays part of the C++ standard?
[/EDIT]

I need a spatial index in C

I'm working on my gEDA fork and want to get rid of the existing simple tile-based system1 in favour of a real spatial index2.
An algorithm that efficiently finds points is not enough: I need to find objects with non-zero extent. Think in terms of objects having bounding rectangles, that pretty much captures the level of detail I need in the index. Given a search rectangle, I need to be able to efficiently find all objects whose bounding rectangles are inside, or that intersect, the search rectangle.
The index can't be read-only: gschem is a schematic capture program, and the whole point of it is to move things around the schematic diagram. So things are going to be a'changing. So while I can afford insertion to be a bit more expensive than searching, it can't be too much more expensive, and deleting must also be both possible and reasonably cheap. But the most important requirement is the asymptotic behaviour: searching should be O(log n) if it can't be O(1). Insertion / deletion should preferably be O(log n), but O(n) would be okay. I definitely don't want anything > O(n) (per action; obviously O(n log n) is expected for an all-objects operation).
What are my options? I don't feel clever enough to evaluate the various options. Ideally there'd be some C library that will do all the clever stuff for me, but I'll mechanically implement an algorithm I may or may not fully understand if I have to. gEDA uses glib by the way, if that helps to make a recommendation.
Footnotes:
1 Standard gEDA divides a schematic diagram into a fixed number (currently 100) of "tiles" which serve to speed up searches for objects in a bounding rectangle. This is obviously good enough to make most schematics fast enough to search, but the way it's done causes other problems: far too many functions require a pointer to a de-facto global object. The tiles geometry is also fixed: it would be possible to defeat this tiling system completely simply by panning (and possibly zooming) to an area covered by only one tile.
2 A legitimate answer would be to keep elements of the tiling system, but to fix its weaknesses: teaching it to span the entire space, and to sub-divide when necessary. But I'd like others to add their two cents before I autocratically decide that this is the best way.
A nice data structure for a mix of points and lines would be an R-tree or one of its derivatives (e.g. R*-Tree or a Hilbert R-Tree). Given you want this index to be dynamic and serializable, I think using SQLite's R*-Tree module would be a reasonable approach.
If you can tolerate C++, libspatialindex has a mature and flexible R-tree implementation which supports dynamic inserts/deletes and serialization.
Your needs sound very similar to what is used in collision detection algorithms for games and physics simulations. There are several open source C++ libraries that handle this in 2-D (Box2D) or 3-D (Bullet physics). Although your question is for C, you may find their documentation and implementations useful.
Usually this is split into a two phases:
A fast broad phase that approximates objects by their axis-aligned bounding box (AABB), and determines pairs of AABBs that touch or overlap.
A slower narrow phase that calculates the points of geometric overlap for pairs of objects whose AABBs touch or overlap.
Physics engines also use spatial coherence to further reduce the pairs of objects that are compared, but this optimization probably won't help your application.
The broadphase is usually implemented with an O(N log N) algorithm like Sweep and prune. You may be able to accelerate this by using it in conjunction with the current tile approach (one of Nvidia's GPUGems describes this hybrid approach). The narrow phase is quite costly for each pair, and may be overkill for your needs. The GJK algorithm is often used for convex objects in this step, although faster algorithms exist for more specialized cases (e.g.: box/circle and box/sphere collisions).
This sounds like an application well-suited to a quadtree (assuming you are interested only in 2D). The quadtree is hierarchical (good for searching) and its spatial resolution is dynamic (allowing higher resolution in areas that need it).
I've always rolled my own quadtrees, but here is a library that appears reasonable: http://www.codeproject.com/Articles/30535/A-Simple-QuadTree-Implementation-in-C
It is easy to do. It's hard to do fast. This sounds like a problem I worked on, where there was a vast list of (min, max) pairs and, given a value, it had to return how many pairs overlapped that value. You just have it in two dimensions, so you do it with two trees, one for each direction, then intersect the results. This is really fast.
#include <iostream>
#include <fstream>
#include <map>
using namespace std;
typedef unsigned int UInt;
class payLoad {
public:
UInt starts;
UInt finishes;
bool isStart;
bool isFinish;
payLoad ()
{
starts = 0;
finishes = 0;
isStart = false;
isFinish = false;
}
};
typedef map<UInt,payLoad> ExtentMap;
//==============================================================================
class Extents
{
ExtentMap myExtentMap;
public:
void ReadAndInsertExtents ( const char* fileName )
{
UInt start, finish;
ExtentMap::iterator EMStart;
ExtentMap::iterator EMFinish;
ifstream efile ( fileName);
cout << fileName << " filename" << endl;
while (efile >> start >> finish) {
//cout << start << " start " << finish << " finish" << endl;
EMStart = myExtentMap.find(start);
if (EMStart==myExtentMap.end()) {
payLoad pay;
pay.isStart = true;
myExtentMap[start] = pay;
EMStart = myExtentMap.find(start);
}
EMFinish = myExtentMap.find(finish);
if (EMFinish==myExtentMap.end()) {
payLoad pay;
pay.isFinish = true;
myExtentMap[finish] = pay;
EMFinish = myExtentMap.find(finish);
}
EMStart->second.starts++;
EMFinish->second.finishes++;
EMStart->second.isStart = true;
EMFinish->second.isFinish = true;
// for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
// cout << "| key " << EMStart->first << " count " << EMStart->second.value << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish << endl;
}
efile.close();
UInt count = 0;
for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
{
count += EMStart->second.starts - EMStart->second.finishes;
EMStart->second.starts = count + EMStart->second.finishes;
}
// for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
// cout << "||| key " << EMStart->first << " count " << EMStart->second.starts << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish << endl;
}
void ReadAndCountNumbers ( const char* fileName )
{
UInt number, count;
ExtentMap::iterator EMStart;
ExtentMap::iterator EMTemp;
if (myExtentMap.empty()) return;
ifstream nfile ( fileName);
cout << fileName << " filename" << endl;
while (nfile >> number)
{
count = 0;
//cout << number << " number ";
EMStart = myExtentMap.find(number);
EMTemp = myExtentMap.end();
if (EMStart==myExtentMap.end()) { // if we don't find the number then create one so we can find the nearest number.
payLoad pay;
myExtentMap[ number ] = pay;
EMStart = EMTemp = myExtentMap.find(number);
if ((EMStart!=myExtentMap.begin()) && (!EMStart->second.isStart))
{
EMStart--;
}
}
if (EMStart->first < number) {
while (!EMStart->second.isFinish) {
//cout << "stepped through looking for end - key" << EMStart->first << endl;
EMStart++;
}
if (EMStart->first >= number) {
count = EMStart->second.starts;
//cout << "found " << count << endl;
}
}
else if (EMStart->first==number) {
count = EMStart->second.starts;
}
cout << count << endl;
//cout << "| count " << count << " key " << EMStart->first << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish<< " V " << EMStart->second.value << endl;
if (EMTemp != myExtentMap.end())
{
myExtentMap.erase(EMTemp->first);
}
}
nfile.close();
}
};
//==============================================================================
int main (int argc, char* argv[]) {
Extents exts;
exts.ReadAndInsertExtents ( "..//..//extents.txt" );
exts.ReadAndCountNumbers ( "..//../numbers.txt" );
return 0;
}
The extents test file was 1.5 MB of:
0 200000
1 199999
2 199998
3 199997
4 199996
5 199995
....
99995 100005
99996 100004
99997 100003
99998 100002
99999 100001
The numbers file was like:
102731
104279
109316
104859
102165
105762
101464
100755
101068
108442
107777
101193
104299
107080
100958
.....
Even reading the two files from disk (extents 1.5 MB, numbers 780 KB), with that really large number of values and lookups, this runs in a fraction of a second. If held in memory it would be lightning quick.
