I am measuring the latency of some operations in many different scenarios.
The latencies in each scenario are concentrated in a small interval. For each scenario I take 500,000 measurements, and at the end I want to output each latency value together with the number of times it occurred.
My initial implementation was:
#define range 1000
int rec_array[range];
for (int i = 0; i < 500000; i++) {
int latency = measure_latency();
rec_array[latency]++;
}
for (int i = 0; i < range; i++) {
printf("%d %d\n", i, rec_array[i]);
}
This approach was fine at first, but as the number of scenarios grew, it became problematic.
The latencies measured in each scenario are concentrated in a small interval, so most of the entries in rec_array are 0.
Since each scenario is different, the latencies differ as well. Some are concentrated around 500, so I need an array longer than 500; others are concentrated around 5000, so I need an array longer than 5000.
Because there are many scenarios, I end up with too many arrays. For example, with ten scenarios I need ten rec_arrays, each with a different length.
Is there an efficient and convenient strategy? Since I am using C, containers like C++'s std::vector are not available.
I considered linked lists. However, the range of the latency values is not known in advance, nor is the number of distinct values, and whenever a latency recurs its count has to be incremented. That doesn't seem very convenient either.
I'm sorry, I just went out. Thank you for your help. I read the comments carefully. Here are some of my answers.
This data is mainly used to draw plots, for example the one below.
The comments say that the data seems small. The main reason I thought about this problem is that, according to the plot, only a few array entries are used each time and the vast majority are 0. And there are many scenarios, each of which needs its own array. I have referred to an open source implementation.
According to the comments, using arrays directly seems to be a good solution, considering the fast access. Thanks very much!
A linked list is probably (and almost always) the least efficient way to store things – both slow as hell, and memory inefficient, since your values use less storage than your pointers. Linked lists are very rarely a good solution for anything that actually stores significant data. The only reason they're so prevalent is that C still has no proper containers, and they're easy wheels to reinvent for every single C program you write.
#define range 1000
int rec_array[range];
So you're (probably! This depends on your compiler and where you write int rec_array[range];) storing rec_array on the stack, and it's large. (Actually, 4000 Bytes is not "large" by any modern computer's means, but still.) You should not be doing that; instead, this should be heap allocated, once, at initialization.
The solution is to allocate it:
/* SPDX-License-Identifier: LGPL-2.1+ */
/* Copyright Marcus Müller and others */
#include <stdio.h>  /* printf, puts */
#include <stdlib.h> /* calloc, atoi, exit */
#include <string.h> /* memset */
#define N_RUNS 500000
/*
* Call as
* program maximum_latency
*/
unsigned int *run_benchmark(struct task_t task, unsigned int *latencies,
unsigned int *max_latency) {
for (unsigned int run = 0; run < N_RUNS; ++run) {
unsigned int latency = measure_latency();
if (latency >= *max_latency) {
/* clamp out-of-range values into the last counter */
latency = *max_latency - 1;
/*
* alternatively: use realloc to increase the size of the `latencies`,
* and update max_latency as well; that's basically what C++ std::vector
* does
*/
}
/* count every measurement, not only the clamped ones */
(latencies[latency])++;
}
return latencies;
}
int main(int argc, char **argv) {
// check argument
if (argc != 2) {
exit(127);
}
int maximum_latency_raw = atoi(argv[1]);
if (maximum_latency_raw <= 0) {
exit(126);
}
unsigned int maximum_latency = maximum_latency_raw;
/*
* note that the length does no longer have to be a constant
* if you're using calloc/malloc.
*/
unsigned int *latency_counters =
(unsigned int *)calloc(maximum_latency, sizeof(unsigned int));
for (; /* benchmark task in benchmark_tasks */;) {
latency_counters = run_benchmark(task, latency_counters, &maximum_latency);
print_benchmark_result(latency_counters, maximum_latency);
// clear our counters after each run!
memset(latency_counters, 0, maximum_latency * sizeof(unsigned int));
}
}
void print_benchmark_result(unsigned int *array, unsigned int length) {
for (unsigned int index = 0; index < length; ++index) {
printf("%u %u\n", index, array[index]);
}
puts("============================\n");
}
Note especially the "alternatively: realloc" comment in the middle: realloc allows you to increase the size of your array:
unsigned int *run_benchmark(struct task_t task, unsigned int *latencies,
                            unsigned int *max_latency) {
  for (unsigned int run = 0; run < N_RUNS; ++run) {
    unsigned int latency = measure_latency();
    // keep doubling the size until the measured latency fits
    while (latency >= *max_latency) {
      unsigned int *grown = (unsigned int *)realloc(
          latencies, (size_t)(*max_latency) * 2 * sizeof(unsigned int));
      if (grown == NULL) {
        // out of memory: the old block is still valid, so hand it back
        return latencies;
      }
      latencies = grown;
      // realloc doesn't zero out the extension, so we need to do that
      // ourselves.
      memset(latencies + (*max_latency), 0,
             (*max_latency) * sizeof(unsigned int));
      (*max_latency) *= 2;
    }
    // count every measurement, whether or not the array had to grow
    (latencies[latency])++;
  }
  return latencies;
}
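This way, your array grows when you need it to! Since realloc may move the array to a new address, the caller has to keep using the pointer that run_benchmark returns instead of the one it passed in.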
How about using a hash table, so that only the latencies that actually occur are stored? The keys could even be latency ranges, with the value stored under each key being the count for that range.
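A minimal sketch of what such a hash table could look like in plain C, using open addressing with linear probing. All names here (latency_table, table_init, table_add, table_get, TABLE_SLOTS, EMPTY_KEY) are made up for illustration, and the sketch assumes that the number of distinct latencies stays below TABLE_SLOTS and that no real latency equals EMPTY_KEY.

```c
/* Hypothetical sparse latency counter: open addressing, linear probing.
 * Assumptions: TABLE_SLOTS is a power of two and larger than the number
 * of distinct latencies; no measured latency equals EMPTY_KEY. */
#define TABLE_SLOTS 1024u
#define EMPTY_KEY 0xFFFFFFFFu

struct latency_table {
    unsigned int keys[TABLE_SLOTS];   /* latency stored in each slot */
    unsigned int counts[TABLE_SLOTS]; /* how often that latency occurred */
};

/* Mark every slot as unused. */
void table_init(struct latency_table *t) {
    for (unsigned int i = 0; i < TABLE_SLOTS; ++i) {
        t->keys[i] = EMPTY_KEY;
        t->counts[i] = 0;
    }
}

/* Multiplicative hash, reduced to a slot index with a bit mask. */
static unsigned int slot_for(unsigned int latency) {
    return (latency * 2654435761u) & (TABLE_SLOTS - 1);
}

/* Count one occurrence of `latency`. Returns 0 on success, -1 if full. */
int table_add(struct latency_table *t, unsigned int latency) {
    unsigned int slot = slot_for(latency);
    for (unsigned int probes = 0; probes < TABLE_SLOTS; ++probes) {
        if (t->keys[slot] == EMPTY_KEY) {
            t->keys[slot] = latency; /* first time we see this value */
        }
        if (t->keys[slot] == latency) {
            t->counts[slot]++;
            return 0;
        }
        slot = (slot + 1) & (TABLE_SLOTS - 1); /* try the next slot */
    }
    return -1;
}

/* Return how often `latency` was seen (0 if never). */
unsigned int table_get(const struct latency_table *t, unsigned int latency) {
    unsigned int slot = slot_for(latency);
    for (unsigned int probes = 0; probes < TABLE_SLOTS; ++probes) {
        if (t->keys[slot] == EMPTY_KEY) return 0;
        if (t->keys[slot] == latency) return t->counts[slot];
        slot = (slot + 1) & (TABLE_SLOTS - 1);
    }
    return 0;
}
```

Note the trade-off: every increment now costs a hash and possibly a few probes instead of one array access, and the entries come out unsorted, so they would have to be sorted by key before plotting.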
Just sacrifice some precision in your latencies like 0-15, 16-31, 32-47 ... etc. Now your array will be 16x smaller.
Allocate all latency counter arrays for all scenes in one go
unsigned int *latency_div16_counter = (unsigned int *)calloc((MAX_LATENCY >> 4) * NUM_OF_SCENES, sizeof(unsigned int));
Clamp the values to the max latency, div 16 and store
for (int scene = 0; scene < NUM_OF_SCENES; scene++) {
for (int i = 0; i < 500000; i++) {
int latency = measure_latency();
if(latency >= MAX_LATENCY) latency = MAX_LATENCY - 1;
latency = latency >> 4; // int div 16
latency_div16_counter[(scene * (MAX_LATENCY >> 4)) + latency]++; // each scene owns MAX_LATENCY/16 counters
}
}
Adjust the data (mul 16) before displaying it
for (int scene = 0; scene < NUM_OF_SCENES; scene++) {
for (int i = 0; i < (MAX_LATENCY >> 4); i++) {
printf("Scene %d Latency %d Total %u\n", scene, i * 16, latency_div16_counter[(scene * (MAX_LATENCY >> 4)) + i]);
}
}
I have a program in which I repeat a succession of methods to reproduce time evolution. One of the things I have to do is write the same value to a long contiguous subset of the elements of a very large array. Given that I know which elements they are and which value I want, is there any way other than a loop that sets the values one by one?
EDIT: To be clear, I want to avoid this:
double arr[10000000];
int i;
for (i=0; i<100000; ++i)
arr[i] = 1;
by a single call, if that is possible. Can you assign to part of an array the values from another array of the same size? Maybe I could keep a second array arr2[1000000] in memory with all elements set to 1, and then copy the memory of arr2 to the first 100,000 elements of arr?
I have a somewhat tongue-in-cheek and non-portable possibility for you to consider. If you tailored your buffer to a size that is a power of 2, you could seed the buffer with a single double, then use memcpy to copy successively larger chunks of the buffer until the buffer is full.
So first you copy the first 8 bytes over the next 8 bytes...(so now you have 2 doubles)
...then you copy the first 16 bytes over the next 16 bytes...(so now you have 4 doubles)
...then you copy the first 32 bytes over the next 32 bytes...(so now you have 8 doubles)
...and so on.
It's plain to see that we won't actually call memcpy all that many times, and if the implementation of memcpy is sufficiently faster than a simple loop we'll see a benefit.
Try building and running this and tell me how it performs on your machine. It's a very scrappy proof of concept...
#include <string.h>
#include <time.h>
#include <stdio.h>
void loop_buffer_init(double* buffer, int buflen, double val)
{
for (int i = 0; i < buflen; i++)
{
buffer[i] = val;
}
}
void memcpy_buffer_init(double* buffer, int buflen, double val)
{
buffer[0] = val;
int half_buf_size = buflen * sizeof(double) / 2;
for (int i = sizeof(double); i <= half_buf_size; i += i)
{
memcpy((unsigned char *)buffer + i, buffer, i);
}
}
void check_success(double* buffer, int buflen, double expected_val)
{
for (int i = 0; i < buflen; i++)
{
if (buffer[i] != expected_val)
{
printf("But your whacky loop failed horribly.\n");
break;
}
}
}
int main()
{
const int TEST_REPS = 500;
enum { BUFFER_SIZE = 16777216 }; // enum constant: a const int is not a valid static array size in C
static double buffer[BUFFER_SIZE]; // 2**24 doubles, 128MB
time_t start_time;
time(&start_time);
printf("Normal loop starting...\n");
for (int reps = 0; reps < TEST_REPS; reps++)
{
loop_buffer_init(buffer, BUFFER_SIZE, 1.0);
}
time_t end_time;
time(&end_time);
printf("Normal loop finishing after %.f seconds\n",
difftime(end_time, start_time));
time(&start_time);
printf("Whacky loop starting...\n");
for (int reps = 0; reps < TEST_REPS; reps++)
{
memcpy_buffer_init(buffer, BUFFER_SIZE, 2.5);
}
time(&end_time);
printf("Whacky loop finishing after %.f seconds\n",
difftime(end_time, start_time));
check_success(buffer, BUFFER_SIZE, 2.5);
}
On my machine, the results were:
Normal loop starting...
Normal loop finishing after 21 seconds
Whacky loop starting...
Whacky loop finishing after 9 seconds
To work with a buffer that was less than a perfect power of 2 in size, just go as far as you can with the increasing powers of 2 and then fill out the remainder in one final memcpy.
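A sketch of that generalization, assuming nothing beyond the standard library; the function name memcpy_fill is made up here. It doubles the filled prefix while a full copy still fits, then finishes with one shorter memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Fill buffer[0 .. count) with val by repeatedly doubling the filled prefix.
 * Works for any count, not only powers of two. */
void memcpy_fill(double *buffer, size_t count, double val) {
    if (count == 0) {
        return;
    }
    buffer[0] = val;
    size_t filled = 1; /* number of elements already set */

    /* Double the prefix while a complete copy of it still fits. */
    while (filled <= count / 2) {
        memcpy(buffer + filled, buffer, filled * sizeof(double));
        filled *= 2;
    }

    /* Fill out the remainder (shorter than the prefix) in one final copy. */
    if (filled < count) {
        memcpy(buffer + filled, buffer, (count - filled) * sizeof(double));
    }
}
```

Source and destination never overlap in any of these calls, so memcpy (rather than memmove) is sufficient.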
(Edit: before anyone mentions it, of course this is pointless with a static double (might as well initialize it at compile time) but it'll work just as well with a nice fresh stretch of memory requested at runtime.)
It looks like this solution is very sensitive to your cache size or other hardware optimizations. On my old (circa 2009) laptop the memcpy solution is as slow or slower than the simple loop, until the buffer size drops below 1MB. Below 1MB or so the memcpy solution returns to being twice as fast.
I have a program where I repeat a succession of methods to reproduce
time evolution. One of the things I have to do is to write the same
value for a long continue subset of elements of a very large array.
Knowing which elements are and which value I want, is there any other
way rather than doing a loop for setting these values each by each?
In principle, you can initialize an array however you like without using a loop. If that array has static duration then that initialization might in fact be extremely efficient, as the initial value is stored in the executable image in one way or another.
Otherwise, you have a few options:
if the array elements are of a character type then you can use memset(). Very likely this involves a loop internally, but you won't have one literally in your own code.
if the representation of the value you want to set has all bytes equal, as is the case for typical representations of 0 in any arithmetic type, then memset() is again a possibility.
as you suggested, if you have another array with suitable contents then you can copy some or all of it into the target array. For this you would use memcpy(), unless there is a chance that the source and destination could overlap, in which case you would want memmove().
more generally, you may be able to read in the data from some external source, such as a file (e.g. via fread()). Don't count on any I/O-based solution to be performant, however.
you can write an analog of memset() that is specific to the data type of the array. Such a function would likely need to use a loop of some form internally, but you could avoid such a loop in the caller.
you can write a macro that expands to the needed loop. This can be type-generic, so you don't need different versions for different data types. It uses a loop, but the loop would not appear literally in your source code at the point of use.
If you know in advance how many elements you want to set, then in principle, you could write that many assignment statements without looping. But I cannot imagine why you would want so badly to avoid looping that you would resort to this for a large number of elements.
All of those except the last actually do loop, however -- they just avoid cluttering your code with a loop construct at the point where you want to set the array elements. Some of them may also be clearer and more immediately understandable to human readers.
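To make a few of these options concrete, here is a small sketch. The names FILL_RANGE, fill_doubles and zero_doubles are invented for this example; the memset() variant relies on the assumption that all-bits-zero represents 0.0, which holds for IEEE 754 doubles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type-generic macro: sets elements [first, first + count)
 * of arr to value. The loop is hidden from the point of use. */
#define FILL_RANGE(arr, first, count, value)                            \
    do {                                                                \
        for (size_t fill_i_ = 0; fill_i_ < (size_t)(count); ++fill_i_)  \
            (arr)[(size_t)(first) + fill_i_] = (value);                 \
    } while (0)

/* Type-specific analog of memset() for double. */
void fill_doubles(double *dest, size_t count, double value) {
    for (size_t i = 0; i < count; ++i) {
        dest[i] = value;
    }
}

/* memset() works for zero because every byte of the value is equal
 * (assuming all-bits-zero is 0.0, as in IEEE 754). */
void zero_doubles(double *dest, size_t count) {
    memset(dest, 0, count * sizeof *dest);
}
```

With these, the loop from the question would shrink to a single line such as FILL_RANGE(arr, 0, 100000, 1.0) or fill_doubles(arr, 100000, 1.0).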
Hi: I have been ramping up on C and I have a couple of philosophical questions about arrays and pointers, and about how to make things simple, quick, and small, or at least balance the three, I suppose.
I imagine an MCU sampling an input every so often and storing the sample in an array, called "val", of size "NUM_TAPS". The index of 'val' gets decremented for the next sample after the current, so for instance if val[0] just got stored, the next value needs to go into val[NUM_TAPS-1].
At the end of the day I want to be able to refer to the newest sample as x[0] and the oldest sample as x[NUM_TAPS-1] (or equivalent).
It is a slightly different problem than many have solved on this and other forums describing rotating, circular, queue etc. buffers. I don't need (I think) a head and tail pointer because I always have NUM_TAPS data values. I only need to remap the indexes based on a "head pointer".
Below is the code I came up with. It seems to be working fine but it raises a few more questions I'd like to pose to the wider, much more expert community:
Is there a better way to assign indexes than a conditional assignment
(to wrap indexes < 0) with the modulus operator (to wrap indexes >
NUM_TAPS -1)? I can't think of a way that pointers to pointers would
help, but does anyone else have thoughts on this?
Instead of shifting the data itself as in a FIFO to organize the
values of x, I decided here to rotate the indexes. I would guess that
for data structures close to or smaller in size than the pointers
themselves that data moves might be the way to go but for very large
numbers (floats, etc.) perhaps the pointer assignment method is the
most efficient. Thoughts?
Is the modulus operator generally considered close in speed to
conditional statements? For example, which is generally faster?:
offset = (++offset)%N;
OR
offset++;
if (NUM_TAPS == offset) { offset = 0; }
Thank you!
#include <stdio.h>
#define NUM_TAPS 10
#define STARTING_VAL 0
#define HALF_PERIOD 3
void main (void) {
register int sample_offset = 0;
int wrap_offset = 0;
int val[NUM_TAPS];
int * pval;
int * x[NUM_TAPS];
int live_sample = 1;
//START WITH 0 IN EVERY LOCATION
pval = val; /* 1st address of val[] */
for (int i = 0; i < NUM_TAPS; i++) { *(pval + i) = STARTING_VAL ; }
//EVENT LOOP (SAMPLE A SQUARE WAVE EVERY PASS)
for (int loop = 0; loop < 30; loop++) {
if (0 == loop%HALF_PERIOD && loop > 0) {live_sample *= -1;}
*(pval + sample_offset) = live_sample; //really stupid square wave generator
//assign pointers in 'x' based on the starting offset:
for (int i = 0; i < NUM_TAPS; i++) { x[i] = pval+(sample_offset + i)%NUM_TAPS; }
//METHOD #1: dump the samples using pval:
//for (int i = 0; i < NUM_TAPS; i++) { printf("%3d ",*(pval+(sample_offset + i)%NUM_TAPS)); }
//printf("\n");
//METHOD #2: dump the samples using x:
for (int i = 0; i < NUM_TAPS; i++) { printf("%3d ",*x[i]); }
printf("\n");
sample_offset = (sample_offset - 1)%NUM_TAPS; //represents the next location of the sample to be stored, relative to pval
sample_offset = (sample_offset < 0 ? NUM_TAPS -1 : sample_offset); //wrap around if the sample_offset goes negative
}
}
The cost of the % operator is roughly 26 clock cycles when it is implemented with the DIV instruction (the exact figure depends on the CPU). An if statement is likely faster: its instructions are already in the pipeline, so the processor only has to skip a few instructions, which it can do quickly.
Note that both solutions are slow compared to doing a BITWISE AND operation which takes only 1 clock cycle. For reference, if you want gory detail, check out this chart for the various instruction costs (measured in CPU Clock ticks)
http://www.agner.org/optimize/instruction_tables.pdf
The best way to do a fast modulo on a buffer index is to use a power of 2 value for the number of buffers so then you can use the quick BITWISE AND operator instead.
#define NUM_TAPS 16
With a power of 2 value for the number of buffers, you can use a bitwise AND to implement modulo very efficiently. Recall that bitwise AND with a 1 leaves the bit unchanged, while bitwise AND with a 0 leaves the bit zero.
So by doing a bitwise AND of NUM_TAPS-1 with your incremented index, assuming that NUM_TAPS is 16, then it will cycle through the values 0,1,2,...,14,15,0,1,...
This works because NUM_TAPS-1 equals 15, which is 00001111b in binary. The bitwise AND results in a value in which only the last 4 bits are preserved, while any higher bits are zeroed.
So everywhere you use "% NUM_TAPS", you can replace it with "& (NUM_TAPS-1)". For example:
#define NUM_TAPS 16
...
//assign pointers in 'x' based on the starting offset:
for (int i = 0; i < NUM_TAPS; i++)
{ x[i] = pval + ((sample_offset + i) & (NUM_TAPS-1)); } // parentheses needed: + binds tighter than &
Here is your code modified to work with BITWISE AND, which is the fastest solution.
#include <stdio.h>
#define NUM_TAPS 16 // Use a POWER of 2 for speed, 16=2^4
#define MOD_MASK (NUM_TAPS-1) // Saves typing and makes code clearer
#define STARTING_VAL 0
#define HALF_PERIOD 3
void main (void) {
register int sample_offset = 0;
int wrap_offset = 0;
int val[NUM_TAPS];
int * pval;
int * x[NUM_TAPS];
int live_sample = 1;
//START WITH 0 IN EVERY LOCATION
pval = val; /* 1st address of val[] */
for (int i = 0; i < NUM_TAPS; i++) { *(pval + i) = STARTING_VAL ; }
//EVENT LOOP (SAMPLE A SQUARE WAVE EVERY PASS)
for (int loop = 0; loop < 30; loop++) {
if (0 == loop%HALF_PERIOD && loop > 0) {live_sample *= -1;}
*(pval + sample_offset) = live_sample; //really stupid square wave generator
//assign pointers in 'x' based on the starting offset:
for (int i = 0; i < NUM_TAPS; i++) { x[i] = pval + ((sample_offset + i) & MOD_MASK); } // mask the index, not the pointer
//METHOD #1: dump the samples using pval:
//for (int i = 0; i < NUM_TAPS; i++) { printf("%3d ",*(pval + ((sample_offset + i) & MOD_MASK))); }
//printf("\n");
//METHOD #2: dump the samples using x:
for (int i = 0; i < NUM_TAPS; i++) { printf("%3d ",*x[i]); }
printf("\n");
// sample_offset = (sample_offset - 1)%NUM_TAPS; //represents the next location of the sample to be stored, relative to pval
// sample_offset = (sample_offset < 0 ? NUM_TAPS -1 : sample_offset); //wrap around if the sample_offset goes negative
// MOD_MASK works faster than the above
sample_offset = (sample_offset - 1) & MOD_MASK;
}
}
At the end of the day I want to be able to refer to the newest sample as x[0] and the oldest sample as x[NUM_TAPS-1] (or equivalent).
Any way you implement this is very expensive, because each time you record a new sample, you have to move all the other samples (or pointers to them, or an equivalent). Pointers don't really help you here. In fact, using pointers as you do is probably a little more costly than just working directly with the buffer.
My suggestion would be to give up the idea of "remapping" indices persistently, and instead do it only virtually, as needed. I'd probably ease that and ensure it is done consistently by writing data access macros to use in place of direct access to the buffer. For example,
// expands to an expression designating the sample at the specified
// (virtual) index
#define SAMPLE(index) (val[((index) + sample_offset) % NUM_TAPS])
You would then use SAMPLE(n) instead of x[n] to read the samples.
I might consider also providing a macro for adding new samples, such as
// Updates sample_offset and records the given sample at the new offset
#define RECORD_SAMPLE(sample) do { \
sample_offset = (sample_offset + NUM_TAPS - 1) % NUM_TAPS; \
val[sample_offset] = sample; \
} while (0)
With regard to your specific questions:
Is there a better way to assign indexes than a conditional assignment (to wrap indexes < 0) with the modulus operator (to wrap
indexes > NUM_TAPS -1)? I can't think of a way that pointers to
pointers would help, but does anyone else have thoughts on this?
I would choose modulus over a conditional every time. Do, however, watch out for taking the modulus of a negative number (see above for an example of how to avoid doing so); such a computation may not mean what you think it means. For example -1 % 2 == -1, because C specifies that (a/b)*b + a%b == a for any a and b such that the quotient is representable.
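As a small illustration of that pitfall, here is one common way to wrap an index that may be negative using only arithmetic. The helper name wrap_index is made up for this sketch, and it assumes n > 0 and that index % n + n does not overflow.

```c
/* Wrap any index into the range [0, n), assuming n > 0.
 * index % n lies in (-n, n), so adding n makes it non-negative,
 * and the second % brings it back below n. */
int wrap_index(int index, int n) {
    return ((index % n) + n) % n;
}
```

For instance, wrap_index(-1, 10) gives 9, whereas the plain expression -1 % 10 gives -1.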
Instead of shifting the data itself as in a FIFO to organize the values of x, I decided here to rotate the indexes. I would guess that
for data structures close to or smaller in size than the pointers
themselves that data moves might be the way to go but for very large
numbers (floats, etc.) perhaps the pointer assignment method is the
most efficient. Thoughts?
But your implementation does not rotate the indices. Instead, it shifts pointers. Not only is this about as expensive as shifting the data themselves, but it also adds the cost of indirection for access to the data.
Additionally, you seem to have the impression that pointer representations are small compared to representations of other built-in data types. This is rarely the case. Pointers are usually among the largest of a given C implementation's built-in data types. In any event, neither shifting around the data nor shifting around pointers is efficient.
Is the modulus operator generally considered close in speed to conditional statements? For example, which is generally faster?:
On modern machines, the modulus operator is much faster on average than a conditional whose result is difficult for the CPU to predict. CPUs these days have long instruction pipelines, and they perform branch prediction and corresponding speculative computation to enable them to keep these full when a conditional instruction is encountered, but when they discover that they have predicted incorrectly, they need to flush the whole pipeline and redo several computations. When that happens, it's a lot more expensive than a small number of unconditional arithmetical operations.
Structure of the file and description of the system
The stream I want to analyze (a large binary file) is composed as follows:
40-bytes header
A stream of 10-bytes signals:
The first 8 bytes represent the time when the signal was registered
The last 2 bytes describe the channel where the signal was registered
The signal is emitted by a source which sends an impulse every SIGNAL_INTERVAL, and it may or may not be retrieved by the detector. If a detector counts, it sends the result to a counter's channel, which prints the count as shown above. The counter has 8 channels in total.
Multiplexing
In order to increase the number of detectors, a multiplexing approach is used. Two detectors send their counts to the same channel (say, detectors 1 and 9 are coupled on channel 1 of the counter). One of the signals (for example, 9) is delayed by DELAY, so that the delayed counts are shifted with respect to the non delayed ones.
Demultiplexing
The idea would be to divide the delayed data from the non delayed ones, then subtract the delay (adding 8 to the channel value so that a delayed count on channel 1 will be shown as a count on channel 9) and then rejoin the two arrays.
If SIGNAL_INTERVAL is constant, this is relatively easy: I define a "mask" [0, DELAY, SIGNAL_INTERVAL] and, taking a reference timestamp value, see where every count falls in the mask.
By trying different masks and counting which one captures the most counts, one can distinguish the delayed counts from the non-delayed ones. This last step is needed because we allow an error on the time count, so the stream will not be perfectly clustered. Moreover, it's impossible to know a priori whether the first count is a delayed one, a non-delayed one or even a spurious count.
This is done channel by channel, as the channels may have a different response time from each other.
With this approach, the code is quite simple:
uint64_t maskCheck(struct count *data, int ch_num, int elements){
const int MAX_NUM = 27; // Maximum number of masks checked
uint64_t ref = 0; // The reference timestamp used as starting point at every cycle (64-bit, like the timestamps)
uint64_t sing_count[2][MAX_NUM]; // The array containing the singles counts
uint64_t max_count; // Variable used to find the maximum in the array
int t = 0; // Time index for the following loop
uint64_t result = 0; // The final result, i.e. the index with the most singles counts
// Initializing sing_count (it has MAX_NUM as length, so it must be initialized after being declared)
for(int i = 0; i < MAX_NUM; i++)sing_count[0][i] = 0;
for(int i = 0; i < MAX_NUM; i++)sing_count[1][i] = 0;
while(getChannel(data[t])!=ch_num){
t++;
if(t == elements - 1){
printf("%s\n", "Nothing found");
return 0;
}
}
if(getChannel(data[t]) == ch_num) ref = getTimestamp(data[t]);
uint64_t ref_indexed = ref;
for(int index = 0; index < MAX_NUM; index++){
sing_count[1][index] = ref + nsToBins(index) - nsToBins(MAX_NUM/2);
ref_indexed = sing_count[1][index];
for(t = 0; t < elements; t++){
// Skips the counts not occurring at ch_num
if(getChannel(data[t]) != ch_num) {
continue;
}
if(longAbs(getTimestamp(data[t]), ref_indexed) % nsToBins(SIGNAL_INTERVAL) <= nsToBins(MASK) + nsToBins(COUNT_ERROR) &&
longAbs(getTimestamp(data[t]), ref_indexed) % nsToBins(SIGNAL_INTERVAL) >= nsToBins(MASK) - nsToBins(COUNT_ERROR)){
sing_count[0][index]++;
}
else if(longAbs(getTimestamp(data[t]), ref_indexed) % nsToBins(SIGNAL_INTERVAL) <= nsToBins(COUNT_ERROR) ||
longAbs(getTimestamp(data[t]), ref_indexed) % nsToBins(SIGNAL_INTERVAL) >= nsToBins(SIGNAL_INTERVAL) - nsToBins(COUNT_ERROR)){
sing_count[0][index]++;
}
}
}
// This last part finds the maximum of the array.
max_count = sing_count[0][0];
result = sing_count[1][0];
for(int i = 1; i < MAX_NUM; i++){
if(sing_count[0][i] > max_count)
{
max_count = sing_count[0][i];
result = sing_count[1][i];
}
}
// Return the reference timestamp of the mask with the most counts
return result;
}
where struct count is defined as a 10-byte array read by the functions getTimestamp() and getChannel(), and nsToBins() simply converts the time units.
Having the "best mask", I can divide the array through it and then perform all the other needed operations.
The problem
Now, here comes the problem. SIGNAL_INTERVAL is not constant, and it's not even well determined (to give you an idea, the frequency oscillates between 75.6 MHz and 76.3 MHz).
Under these conditions the above approach no longer works:
SIGNAL_INTERVAL has an error of about 0.3 ns
The measurement is performed over 30 seconds
Keeping in mind that the order of magnitude of SIGNAL_INTERVAL is of 10 ns, after just one second the error would be too big
This results in the timestamps being incorrectly divided, affecting all the subsequent operations.
What I had in mind was something to analyze the clusters in the data (SIGNAL_INTERVAL is not constant, but the oscillation is much smaller than DELAY so some clustering could in principle be observed) and find another way to divide the two arrays.
But so far I have nothing. Any help would be appreciated.
I have a toy cipher program which is encountering a bus error when given a very long key (I'm using 961168601842738797 to reproduce it), which perplexes me. When I commented out sections to isolate the error, I found it was being caused by this innocent-looking for loop in my Sieve of Eratosthenes.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
// CANDIDATE_PRIMES is a macro which sets the length of the array to
// two less than the upper bound of the sieve. (2 being the first prime
// and the lower bound.)
for (i=0;i<CANDIDATE_PRIMES;i++)
{
printf("i: %lu\n", i); // does not print; bus error occurs first
//candidatePrimes[i] = PRIME;
}
At times this has been a segmentation fault rather than a bus error.
Can anyone help me to understand what is happening and how I can fix it/avoid it in the future?
Thanks in advance!
PS
The full code is available here:
http://pastebin.com/GNEsg8eb
I would say your VLA is too large for your stack, leading to undefined behaviour.
Better to allocate the array dynamically:
int *candidatePrimes = malloc(CANDIDATE_PRIMES * sizeof(int));
And don't forget to free before returning.
If this is Eratosthenes Sieve, then the array is really just flags. It's wasteful to use int if it's just going to hold 0 or 1. At least use char (for speed), or condense to a bit array (for minimal storage).
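A minimal sketch combining both suggestions, heap allocation and one-byte flags. The function name count_primes is invented for illustration, and the sketch assumes limit + 1 bytes can actually be allocated (it reports failure otherwise).

```c
#include <stdlib.h>

/* Sieve of Eratosthenes with heap-allocated one-byte flags.
 * Returns the number of primes <= limit, or -1 if allocation fails. */
long count_primes(size_t limit) {
    if (limit < 2) {
        return 0;
    }
    /* calloc zeroes the flags: 0 = still a candidate, 1 = composite */
    unsigned char *composite = calloc(limit + 1, sizeof *composite);
    if (composite == NULL) {
        return -1;
    }
    long count = 0;
    for (size_t i = 2; i <= limit; ++i) {
        if (composite[i]) {
            continue;
        }
        ++count; /* i was never crossed out, so it is prime */
        if (i > limit / i) {
            continue; /* i * i would exceed limit: nothing to cross out */
        }
        for (size_t j = i * i; j <= limit; j += i) {
            composite[j] = 1;
        }
    }
    free(composite);
    return count;
}
```

Because the flags live on the heap, a sieve too large for the stack either works or fails cleanly with a NULL check instead of crashing with a bus error.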
The problem is that you're blowing the stack away.
unsigned long i;
int candidatePrimes[CANDIDATE_PRIMES];
If CANDIDATE_PRIMES is large, this moves the stack pointer by a massive amount. It doesn't touch the memory yet; it only adjusts the stack pointer.
for (i=0;i<CANDIDATE_PRIMES;i++)
{
This adjusts "i" which is way back in the good area of the stack, and sets it to zero. Checks that it's < CANDIDATE_PRIMES, which it is, and so performs the first iteration.
printf("i: %lu\n", i); // does not print; bus error occurs first
This attempts to put the parameters for "printf" onto the bottom of the stack. BOOM. Invalid memory location.
What value does CANDIDATE_PRIMES have?
And do you actually want to store all the numbers you're testing, or only those that pass? What is the purpose of storing the values 0 through CANDIDATE_PRIMES sequentially in an array?
If you just want to store the primes, you should use dynamic allocation and grow the array as needed.
#include <stdio.h>
#include <stdlib.h>

size_t g_numSlots = 0;   // allocated capacity
size_t g_numPrimes = 0;  // slots in use
unsigned long* g_primes = NULL;

void addPrime(unsigned long prime) {
    unsigned long* newPrimes;
    if (g_numPrimes >= g_numSlots) {
        // grow in chunks of 256 slots
        g_numSlots += 256;
        newPrimes = realloc(g_primes, g_numSlots * sizeof(unsigned long));
        if (newPrimes == NULL) {
            // out of memory: report it and bail out
            perror("realloc");
            exit(EXIT_FAILURE);
        }
        g_primes = newPrimes;
    }
    g_primes[g_numPrimes++] = prime;
}