I'm currently writing software in ANSI C and am struggling to get one of the basic pieces of functionality to work.
The software will receive messages over a CAN network, and when these messages arrive, I need to make sure that they are delivered before an expected time and after the previous message.
Only unsigned variables are allowed to be used, so there will be wrap-around problems when the timers reach their maximum value (255 in my case).
It is easy to verify that messages arrive before the expected time, since I know the maximum time between two messages.
This example handles wrap-around and discovers messages that are late:
UC_8 arrival = 250;
UC_8 expected = 15;
UC_8 maxInterArrTime = 30;
UC_8 result;
result = expected - arrival;
if (result <= maxInterArrTime) {
    // ON TIME!
}
else {
    // DELAYED
}
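(Checking with the values above: expected - arrival is 15 - 250, which wraps modulo 256 to 21, and 21 <= 30, so the message counts as on time.)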
This is the easy part, but I must also check that the arrived message has actually arrived after the previous message. My problem is that I do not know how to solve this with the wrap-around problem. I tried to mimic the solution that finds delayed messages, but without any luck.
UC_8 arrival = 10; // Wrapped around
UC_8 lastArrival = 250;
UC_8 expected = 15;
UC_8 maxInterArrTime = 30;
UC_8 result;
UC_8 result2;
result = expected - arrival;
result2 = lastArrival - arrival; // Problem
if (result2 >= ???) { // How should I compare and with what?
    // Message received after previous msg
    if (result <= maxInterArrTime) {
        // ON TIME!
    }
    else {
        // DELAYED
    }
}
else {
    // Message received before previous msg - ERROR
}
My problem is when the arrival time value is lower than the previous arrival time but is actually "larger", since it has wrapped around. I guess I might need to do it in several steps.
Any suggestions on how I can solve this? I need to keep the number of if-statements low, since the code will be analysed for complexity and other metrics.
If you can GUARANTEE that the delay between packets will not be 256 or more, then the following will account for wrap-around:
if (newerTime >= olderTime)
    delay = newerTime - olderTime;
else
    delay = 256 - olderTime + newerTime;
If you can't guarantee the delay is less than 256 then unwind is correct, and you can't do what you want to do.
Huh? You can't magically code your way around a case of missing information. If you only have 8-bit unsigned timestamps, then you will not be able to differentiate between something that happened 3 ticks ago, and something that happened 259 ticks ago, and so on.
Look into making larger (more bits) timestamps available.
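For example, if the hardware only provides an 8-bit tick counter but your code polls it at least once per wrap period (so at most one wrap can occur between reads), you can widen the timestamp in software. Here is a minimal sketch of that idea; now16() and its bookkeeping variables are illustrative names, not an existing API:
#include <stdint.h>

static uint8_t  last_low = 0;  /* previous raw 8-bit reading */
static uint16_t wraps    = 0;  /* completed wraps, counted in units of 256 ticks */

/* Build a 16-bit timestamp from the latest 8-bit hardware tick 'raw'. */
uint16_t now16(uint8_t raw)
{
    if (raw < last_low)        /* the 8-bit counter wrapped since the last poll */
        wraps += 256;
    last_low = raw;
    return (uint16_t)(wraps + raw);
}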
If you can ensure that the absolute value of the time delta is less than 1/2 of the maximum measurable time span then you can determine the time delta.
int8_t delta_u8(uint8_t a, uint8_t b) {
    /* the unsigned subtraction wraps modulo 256; converting the
       result to int8_t yields the signed delta */
    int8_t delta = a - b;
    return delta;
}
...
delta = delta_u8(newerTime, olderTime);
delay = abs( (int) delta ); // or you could have a byte version of abs, since
                            // I suspect that you may be doing embedded stuff
                            // and care about such things.
If you can ensure that time always move forward then you can do better. By time moving forward I mean that in your case newerTime is always greater than olderTime, regardless of how they numerically compare. In this case you can measure deltas up to the maximum measurable time span -- which really goes without saying.
uint8_t delta_i8(uint8_t a, uint8_t b) {
    return a - b;
}
If you know that two events can't happen during the same tick, you can do even better by one tick. If you know that two events can't happen closer together than a certain amount of time, then you can calculate deltas up to the maximum time span representable by your timestamp plus the minimum time between events, but then you have to use a larger variable size to do the actual math.
All of these work because the values wrap around when you do math on them. You can very easily think of this as turning one of your known time values into the new origin (0) and adjusting the other time value to match this shift.
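To make the origin-shift idea concrete with the numbers from the question: treating lastArrival as the new origin, the plain unsigned subtraction lands exactly on the elapsed tick count despite the wrap. A minimal self-contained check:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t lastArrival = 250;
    uint8_t arrival     = 10;  /* wrapped around */
    uint8_t elapsed     = (uint8_t)(arrival - lastArrival);

    /* 10 - 250 wraps modulo 256 to 16, exactly the ticks elapsed
       from 250 through the wrap to 10. */
    printf("elapsed = %u\n", elapsed);  /* prints 16 */
    return 0;
}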
Here's a piece of code from Google's word2vec (word2vec.c) that I've downloaded:
// Reduces the vocabulary by removing infrequent tokens
void ReduceVocab() {
  int a, b = 0;
  unsigned int hash;
  for (a = 0; a < vocab_size; a++) if (vocab[a].cn > min_reduce) {
    vocab[b].cn = vocab[a].cn;
    vocab[b].word = vocab[a].word;
    b++;
  } else free(vocab[a].word);
  vocab_size = b;
  for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1;
  for (a = 0; a < vocab_size; a++) {
    // Hash will be re-computed, as it is not actual
    hash = GetWordHash(vocab[a].word);
    while (vocab_hash[hash] != -1) hash = (hash + 1) % vocab_hash_size;
    vocab_hash[hash] = a;
  }
  fflush(stdout);
  min_reduce++;
}
which is called in the LearnVocabFromTrainFile function.
Assume min_reduce = 5.
So if the input file is not that good, I mean if a word, say "hello", has appeared only 4 times when ReduceVocab is called, then the vocab will remove hello.
Later, when ReduceVocab is called again, suppose hello has luckily appeared another 5 times; it seems ReduceVocab will remove hello again.
In truth, hello appeared 9 times and should be in the vocab, but the code above removed it.
It may not matter much, as the situation seems to happen seldom. I'm just wondering whether my analysis is right or whether I've missed something in the code.
Thanks for any advice.
A better URL for reviewing the relevant source is:
https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L185
As I understand it, this is not a bug – just a compromise with non-intuitive effects.
This code uses an intentionally rough/approximate method of ensuring the number of tracked vocabulary terms never exceeds 0.7 * vocab_hash_size (21 million). Whenever the number of terms hits that high-water mark, all terms with no more than min_reduce occurrences are discarded, and min_reduce is increased to take even more next time.
(And in practice, this escalating floor, along with the typical long-tail Zipfian distribution of word frequencies, can mean that at each triggered ReduceVocab operation, most terms are discarded, bringing the total vocab size to something way smaller than 0.7 * vocab_hash_size.)
An unavoidable effect of discarding known counts, in an interim running fashion, is that counts after each discard are no longer complete & exact. The relative position of terms in the corpus can thus have a big effect on which terms are ReduceVocab-pruned - with terms that "just miss" the cutoff each time potentially having far more occurrences, in total, than the final min_reduce. And further, all final counts of less-frequent words might be incomplete, if the term's early occurrence counts didn't survive earlier ReduceVocab steps.
Still, this approach works to keep the vocabulary-survey from taking an arbitrary amount of RAM, and the imprecision in the tail of rarer word counts isn't too big of a concern in typical cases.
If you have the RAM & want to prevent this behavior, you could edit the source to make vocab_hash_size arbitrarily larger, so that either ReduceVocab() is never triggered (and thus your final counts are exact), or happens rarely enough that any words it affects don't concern you.
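If I'm reading the source right, the constant to enlarge is defined near the top of word2vec.c:
const int vocab_hash_size = 30000000;  // Maximum 30 * 0.7 = 21M words in the vocabulary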
I am designing a measurement instrument that has a visible user output on a 30-LED bar. The program logic acts in this fashion (pseudo-code):
while(1)
{
    (1) Sensor_Read();
    (2) Transform_counts_into_leds();
    (3) Send_to_bar();
}
The relevant function (2) is a simple algorithm that transforms the counts from the I2C sensor into a value serially sent to the shift registers that control the individual LEDs.
The variable sent to function (3) is simply the number of LEDs that have to stay on (0 for all LEDs off, 30 for all LEDs on):
uint8_t Transform_counts_into_leds(uint16_t counts)
{
    uint8_t on_leds;
    on_leds = (uint8_t)(counts * 0.134); /* 0.134 is a dummy value */
    return on_leds;
}
Using this program logic, when the counts value is on the threshold between two LEDs, the next LED flickers.
I think this is a bad user experience for my device, and I want the LEDs, once lit, to stay stable over a small range of values.
QUESTION: How could a solution to this problem be implemented in my project?
Hysteresis is useful for a number of applications, but I would suggest it is not appropriate in this instance. The problem is that if the level genuinely falls from, say, 8 to 7, you would not see any change until at least one sample at 6; the display would then jump to 6, and there would have to be a sample of 8 before it went back to 7.
A more appropriate solution in this case is a moving average, although it is simpler and more useful to use a moving sum and exploit the higher resolution that gives. For example, a moving sum of 16 effectively adds (almost) 4 bits of resolution, making an 8-bit sensor effectively 12-bit, at the cost of bandwidth of course; you don't get something for nothing. In this case lower bandwidth (i.e. less responsiveness to higher frequencies) is exactly what you need.
Moving sum:
#define BUFFER_LEN 16
#define SUM_MAX (255 * BUFFER_LEN)
#define LED_MAX 30
uint8_t buffer[BUFFER_LEN] = {0} ;
int index = 0 ;
uint16_t sum = 0 ;
for(;;)
{
    uint8_t sample = Sensor_Read() ;

    // Maintain sum of buffered values by
    // subtracting oldest buffered value and
    // adding the new sample
    sum -= buffer[index] ;
    sum += sample ;

    // Replace oldest sample with new sample
    // and increment index to next oldest sample
    buffer[index] = sample ;
    index = (index + 1) % BUFFER_LEN ;

    // Transform to LED bar level (widened to 32 bits so
    // LED_MAX * sum cannot overflow a 16-bit int)
    int led_level = (LED_MAX * (uint32_t)sum) / SUM_MAX ;

    // Show level
    setLedBar( led_level ) ;
}
The underlying problem -- displaying sensor data in a human-friendly way -- is very interesting. Here's my approach in pseudocode:
Loop:
    Read sensor
    If sensor outside valid range:
        Enable warning LED
        Sleep in a low-power state for a while
        Restart loop
    Else:
        Disable warning LED
        Filter sensor value
        Compute display value from sensor value with extra precision:
            If new display value differs sufficiently from current value:
                Update current displayed value
        Update display with scaled-down display value
Filtering deals with noise in the measurements: it smooths out any sudden changes, removing spikes. It is like erosion, turning sharp and jagged mountains into rolling fells and hills.
Hysteresis hides small changes, but does not otherwise filter the results. Hysteresis won't affect noisy or jagged data, it only hides small changes.
Thus, the two are separate, but complementary methods, that affect the readout in different ways.
Below, I shall describe two different filters, and two variants of simple hysteresis implementation suitable for numeric and bar graph displays.
If possible, I'd recommend you write some scripts or test programs that output the input data and the variously filtered output data, and plot it in your favourite plotting program (mine is Gnuplot). Or, better yet, experiment! Nothing beats practical experiments for human interface stuff (at least if you use existing suggestions and known theory as your basis, and leap forward from there).
Moving average:
You create an array of N sensor readings, updating them in a round-robin fashion, and using their average as the current reading. This produces very nice (as in human-friendly, intuitive) results, as only the N latest sensor readings affect the average.
When the application is first started, you should copy the very first reading into all N entries in the averaging array. For example:
#define SENSOR_READINGS 32

int sensor_reading[SENSOR_READINGS];
int sensor_reading_index;

void sensor_init(const int reading)
{
    int i;
    for (i = 0; i < SENSOR_READINGS; i++)
        sensor_reading[i] = reading;
    sensor_reading_index = 0;
}

int sensor_update(const int reading)
{
    int i, sum;
    sensor_reading_index = (sensor_reading_index + 1) % SENSOR_READINGS;
    sensor_reading[sensor_reading_index] = reading;
    sum = sensor_reading[0];
    for (i = 1; i < SENSOR_READINGS; i++)
        sum += sensor_reading[i];
    return sum / SENSOR_READINGS;
}
At start-up, you call sensor_init() with the very first valid sensor reading, and sensor_update() with the following sensor readings. The sensor_update() will return the filtered result.
The above works best when the sensor is regularly polled, and SENSOR_READINGS can be chosen large enough to properly filter out any unwanted noise in the sensor readings. Of course, the array requires RAM, which may be on short supply in some microcontrollers.
Exponential smoothing:
When there is not enough RAM to use a moving average to filter data, an exponential smoothing filter is often applied.
The idea is that we keep an average value, and recalculate the average using each new sensor reading using (A * average + B * reading) / (A + B). The effect of each sensor reading on the average decays exponentially: the weight of the most current sensor reading is always B/(A+B), the weight of the previous one is A*B/(A+B)^2, the weight of the one before that is A^2*B/(A+B)^3, and so on (^ indicating exponentiation); the weight of the n'th sensor reading in the past (with current one being n=0) is A^n*B/(A+B)^(n+1).
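(To put numbers on this with the weights used below, A = 31 and B = 1: each new reading carries weight 1/32, about 3%, and an old reading's influence decays by a factor of 31/32 per step, giving a time constant of roughly 32 samples, comparable to the 32-entry moving average above.)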
The code corresponding to the previous filter is now
#define SENSOR_AVERAGE_WEIGHT 31
#define SENSOR_CURRENT_WEIGHT 1

int sensor_reading;

void sensor_init(const int reading)
{
    sensor_reading = reading;
}

int sensor_update(const int reading)
{
    return sensor_reading = (sensor_reading * SENSOR_AVERAGE_WEIGHT +
                             reading * SENSOR_CURRENT_WEIGHT) /
                            (SENSOR_AVERAGE_WEIGHT + SENSOR_CURRENT_WEIGHT);
}
Note that if you choose the weights so that their sum is a power of two, most compilers optimize the division into a simple bit shift.
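For instance, with the weights above summing to 31 + 1 = 32, a hand-written equivalent of sensor_update() looks like this (a sketch, valid only for non-negative readings, since right shift and integer division round negatives differently):
/* Equivalent update when the weight sum is a power of two:
   dividing by 32 becomes a right shift by 5. */
int sensor_update_shifted(const int reading)
{
    extern int sensor_reading;  /* the running average from the snippet above */
    sensor_reading = (sensor_reading * 31 + reading) >> 5;
    return sensor_reading;
}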
Applying hysteresis:
(This section, including example code, edited on 2016-12-22 for clarity.)
Proper hysteresis support involves keeping the displayed value in higher precision than is used for output. Otherwise, your output value with hysteresis applied will never change by a single unit, which I would consider bad design in a user interface. (I'd much prefer a value to flicker between two consecutive values every few seconds, to be honest; that's what I see in e.g. the weather stations I like best, the ones with good temperature sensors.)
There are two typical variants in how hysteresis is applied to readouts: fixed, and dynamic. Fixed hysteresis means that the displayed value is updated whenever the value differs by a fixed limit; dynamic means the limits are set dynamically. (The dynamic hysteresis is much rarer, but it may be very useful when coupled with the moving average; one can use the standard deviation (or error bars) to set the hysteresis limits, or set asymmetric limits depending on whether the new value is smaller or greater than the previous one.)
The fixed hysteresis is very simple to implement. First, because we need to apply the hysteresis to a higher-precision value than the output, we choose a suitable multiplier. That is, display_value = value / DISPLAY_MULTIPLIER, where value is the possibly filtered sensor value, and display_value is the integer value displayed (number of bars lit, for example).
Note that below, display_value and the value returned by the functions refer to the integer value displayed, for example the number of lit LED bars; value is the (possibly filtered) sensor reading, and saved_value contains the sensor reading that is currently displayed.
#define DISPLAY_HYSTERESIS 10
#define DISPLAY_MULTIPLIER 32

int saved_value;

void display_init(const int value)
{
    saved_value = value;
}

int display_update(const int value)
{
    const int delta = value - saved_value;
    if (delta < -DISPLAY_HYSTERESIS ||
        delta > DISPLAY_HYSTERESIS)
        saved_value = value;
    return saved_value / DISPLAY_MULTIPLIER;
}
The delta is just the difference between the new sensor value, and the sensor value corresponding to the currently displayed value.
The effective hysteresis, in units of displayed value, is DISPLAY_HYSTERESIS/DISPLAY_MULTIPLIER = 10/32 = 0.3125 here. It means that the saved value can be updated about three times before a visible change is seen (if e.g. slowly decreasing or increasing; more if the value is just fluctuating, of course). This eliminates rapid flickering between two visible values (when the value is in the middle of two displayed values), but ensures the error of the reading is less than half a display unit on average (half plus the effective hysteresis in the worst case).
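To see how the pieces fit together, here is a sketch of a main loop chaining the earlier exponential filter into this hysteresis stage; Sensor_Read() and setLedBar() are stand-ins for the question's I/O, and the raw reading is assumed pre-scaled so that value / DISPLAY_MULTIPLIER spans the 0..30 bar range:
/* Sketch: raw sensor -> exponential smoothing -> hysteresis -> LED bar. */
for (;;) {
    int raw      = Sensor_Read();
    int filtered = sensor_update(raw);       /* exponential filter above */
    int bars     = display_update(filtered); /* hysteresis stage above   */
    setLedBar(bars);
}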
In a real-life application, you usually use a more complete form, return (saved_value * DISPLAY_SCALE + DISPLAY_OFFSET) / DISPLAY_MULTIPLIER, which scales the filtered sensor value by DISPLAY_SCALE/DISPLAY_MULTIPLIER and moves the zero point by DISPLAY_OFFSET/DISPLAY_MULTIPLIER, both evaluated at 1.0/DISPLAY_MULTIPLIER precision, using only integer operations. However, for simplicity, I'll just assume that to derive the display value, say the number of lit LED bars, you just divide the sensor value by DISPLAY_MULTIPLIER. In either case, the hysteresis is DISPLAY_HYSTERESIS/DISPLAY_MULTIPLIER of the output unit. Ratios of about 0.1 to 0.5 work fine; the test values above, 10 and 32, yield 0.3125, which is about midway in the range of ratios that I believe work best.
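A sketch of that more complete form, with placeholder calibration values for the DISPLAY_SCALE and DISPLAY_OFFSET names used above:
#define DISPLAY_SCALE  30  /* placeholder calibration values */
#define DISPLAY_OFFSET 0

int display_value_scaled(const int saved_value)
{
    /* Scale by DISPLAY_SCALE/DISPLAY_MULTIPLIER and shift the zero
       point by DISPLAY_OFFSET/DISPLAY_MULTIPLIER, in integer math. */
    return (saved_value * DISPLAY_SCALE + DISPLAY_OFFSET) / DISPLAY_MULTIPLIER;
}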
Dynamic hysteresis is very similar to above:
#define DISPLAY_MULTIPLIER 32

int saved_value_below;
int saved_value;
int saved_value_above;

void display_init(const int value, const int below, const int above)
{
    saved_value_below = below;
    saved_value = value;
    saved_value_above = above;
}

int display_update(const int value, const int below, const int above)
{
    if (value < saved_value - saved_value_below ||
        value > saved_value + saved_value_above) {
        saved_value_below = below;
        saved_value = value;
        saved_value_above = above;
    }
    return saved_value / DISPLAY_MULTIPLIER;
}
Note that if DISPLAY_HYSTERESIS*2 <= DISPLAY_MULTIPLIER, the displayed value is always within a display unit of the actual (filtered) sensor value. In other words, hysteresis can easily deal with flickering, but it does not need to add much error to the displayed value.
In many practical cases the best amount of hysteresis to apply depends on the amount of short-term variation in the sensor samples. This includes not only noise, but also the types of signals that are to be measured. A hysteresis of just 0.3 (relative to the output unit) is sufficient to completely eliminate the flicker when sensor readings flip the filtered sensor value between two consecutive integers that map to different integer outputs, as it ensures that the filtered sensor value must change by at least 0.3 (in output display units) before it effects a change in the display.
The maximum error with hysteresis is half a display unit plus the current hysteresis. The half unit is the minimum error possible (consecutive units are one unit apart, so when the true value is in the middle, either value shown is correct to within half a unit). With dynamic hysteresis, you can start with some fixed hysteresis value whenever a reading changes enough, but while the reading stays within the hysteresis, you instead just decrease the hysteresis (if greater than zero). This approach leads to a changing sensor value being tracked correctly (maximum error being half a unit plus the initial hysteresis), while a relatively static value is displayed as accurately as possible (at half a unit maximum error). I don't show an example of this, because it adds another tunable (how the hysteresis decays towards zero), and requires that you verify (calibrate) the sensor (including any filtering) first; otherwise it's like polishing a turd: possible, but not useful.
Also note that if you have 30 bars in the display, you actually have 31 states (zero bars, one bar, .., 30 bars), and thus the proper range for the value is 0 to 31*DISPLAY_MULTIPLIER - 1, inclusive.
I am trying to implement a low pass filter in OpenCL and the theory behind all this has me confused a bit. I have attached my code at the bottom after my explanation of the scenario.
First off, let me try to explain the whole scenario in point form.
For the input, we have a cosine signal with a sample size, a frequency (the sampling frequency is obtained by multiplying the sample size by the signal frequency) and a step size.
The value at each step is stored in an array, with the frequency and step size multiplied into the function.
This array is then passed into the kernel, which then will execute the low pass filter function.
Kernel returns an output array with the new filtered values.
The cos function always returns a value in (-1,1); the only thing that modifies this value is the frequency. So it may repeat faster or slower depending on the frequency, BUT it is always between (-1,1).
This is where I am confused: I am not sure how to apply a low pass filter to these values. Let's say the cutoff is 100Hz for the filter. I can't just say:
if(array[i] > 100 ) { //delete or ignore this value. Else store in a array }
The reason this won't work is that the value of array[i] ranges from (-1,1). So how would I apply this filter? What values am I going to compare?
From a physical perspective, I can see how it works: a capacitor and a resistor set the cut-off frequency, and the input is sent through the circuit. But programmatically, I do not see how I can implement this. I have seen many implementations of this online, but the code wasn't documented enough to get a good understanding of what was going on.
Here is the code on my host side:
//Array to hold the information of signal
float *Array;
//Number of sampling points
int sampleSize = 100;
float h = 0;
//Signal Frequency in Hz
float signalFreq = 10;
//Number of points between 0 and max val (T_Sample)
float freqSample = sampleSize*signalFreq;
//Step = max value or T_Sample
float stepSize = 1.0 / freqSample;
//Allocate enough memory for the array
Array = (float*)malloc(sampleSize*sizeof(float));
//Populate the array with modified cosine
for (int i = 0; i < sampleSize; i++) {
    Array[i] = cos(2*CL_M_PI*signalFreq*h);
    h = h + stepSize;
    printf("Value of current sample for cos is: %f \n", Array[i]);
}
My kernel is currently just the following (obviously this is not the code for the filter; this is where I am confused):
__kernel void lowpass(__global float *Array, __local float *cutOffValue, __global float *Output) {
    int idx = get_global_id(0);
    Output[idx] = Array[idx];
}
I found this PDF that implements a lot of filters. Near the end of the document you can find a float implementation of the Low Pass Filter.
http://scholar.uwindsor.ca/cgi/viewcontent.cgi?article=6242&context=etd
In the filter implementation in that PDF, they compare data[j] to value. Also, I have no idea what numItems or workItems are.
If someone can provide some insight on this that would be great. I have searched a lot of other examples on low pass filters but I just can't wrap my head around the implementation. I hope I made this question clear. Again, I know how/what the low pass filter does. I just have no idea as to what values I need to compare in order for the filtering to take place.
Found this question as well:
Low Pass filter in C
I have a possible solution. What I am attempting is a moving average FIR (which I am told is the easiest form of a low pass filter one can implement).
What is required:
FIFO buffer
Coefficient values (I generated and obtained mine from MATLAB for a specific cut-off frequency)
Input and Output arrays for the program
I have not implemented this in code yet, but I do understand how to use it on a theoretical level. I'll try to explain the process below.
Essentially, values from another input array will be passed into the FIFO buffer one at a time. Every time a value is passed in, the kernel will do a multiplication across the FIFO buffer, which has 'n' taps. Each tap has a coefficient value associated with it. So the input at a particular element gets multiplied with the coefficient value, and all the results are then accumulated and stored in one element of the output buffer.
Note that the coefficients were generated in MATLAB; I didn't know how else to obtain these values. At first I was going to just use a coefficient of 1/n, but I am pretty sure that would just distort the values of the signal.
And that should do the trick. I am going to implement this in the code now, but if there is anything wrong with this theory, feel free to correct it.
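Here is a minimal C sketch of the scheme described above; the tap count and coefficients are placeholders (a plain 1/NUM_TAPS moving average), not the MATLAB-generated set:
#define NUM_TAPS 5

/* Placeholder coefficients; substitute the MATLAB-designed set for a
   specific cut-off frequency. */
static const float coeff[NUM_TAPS] = { 0.2f, 0.2f, 0.2f, 0.2f, 0.2f };

static float fifo[NUM_TAPS] = { 0 };  /* delay line, newest sample first */

/* Push one input sample through the FIR and return one output sample. */
float fir_step(float input)
{
    float acc = 0.0f;
    int i;

    /* Shift the delay line and insert the new sample. */
    for (i = NUM_TAPS - 1; i > 0; i--)
        fifo[i] = fifo[i - 1];
    fifo[0] = input;

    /* Multiply each tap by its coefficient and accumulate. */
    for (i = 0; i < NUM_TAPS; i++)
        acc += coeff[i] * fifo[i];

    return acc;
}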
I have 3 functions that come down to the code below, which runs 800x800 times:
Each while loop below runs exactly 800 times before iter1 == lim, so the duration was measured over 800x800x800 (512 million) iterations.
iter1, iter2 and lim are pointers to double. They point into a large enough vector of doubles.
sum is a local double variable.
s1 and s2 are local unsigned ints, both equal to 800.
First runs in 2.257 seconds:
while ( iter1 < lim )
{
    sum += *iter1 * *iter2;
    ++iter1;
    iter2 += s2;
}
Second runs in 7.364 seconds:
while ( iter1 < lim )
{
    sum += *iter1 * *iter2;
    iter1 += s1;
    iter2 += s2;
}
Third runs in 1.355 seconds:
while ( iter1 < lim )
{
    sum += *iter1 * *iter2;
    ++iter1;
    ++iter2;
}
If I remove the sum += *iter1 * *iter2; instruction from each of them, they all run in around 1.07 seconds.
If I remove the second multiplication and change the instruction to sum += *iter1;, the first and third run in 1.33 seconds, while the second runs in 1.46 seconds.
If I remove the other iterator, like this: sum += *iter2;, then the first and second run in around 2.2 seconds, while the third runs in 1.35 seconds.
Obviously, the performance drop is tied to the stride added to iter1 and iter2. I am no expert in how the processor accesses memory and dereferences pointers, so I hope someone in the community knows more than me and is willing to shed some light on my problem.
If you need any information about the hardware I ran these tests on, or anything else that can prove helpful, feel free to ask in the comments.
EDIT: The problem is that the second function is slow compared to the others, and I wanted to know if there is anything I can do to make it run faster, as it appears to be doing similar things to the other 2.
EDIT 2: All the measurements were made in a Release build.
This is just a manifestation of data locality.
It takes less time to look at something at the next page of a book than something at the 800th next page. Try for yourself at home.
The performance difference has nothing to do with the iterators. The difference is in the extra cache misses from advancing through a large amount of data with greater than unit stride.
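To put rough numbers on it: with 8-byte doubles and typical 64-byte cache lines, the unit-stride loop touches a new cache line only once every 8 loads, while a stride of 800 doubles (6400 bytes) lands on a fresh cache line, and with common 4 KiB pages a fresh page, on every single iteration. Assuming those sizes, that is roughly 8 times as many cache-line fills per striding iterator, plus extra TLB pressure.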
My guess is that this comes down to the resulting machine code, depending on the sophistication of your compiler/platform.
To retrieve a pointer value, the machine will use something like a LOAD instruction; let's call the fictional assembler code LD addr0.
addr0 refers to the address register that is used.
A lot of CPUs provide instructions like LD addr0+ that increment the address after loading the stored value. Often, this additional increment does not cost any extra cycles.
I have worked with some compilers that could only generate the addr0+ form if the address increment is done by the increment operator directly after, or inside, the dereferencing statement.
So the last one could be the example with the most efficient machine code.
It would be interesting if you could post the resulting assembler code of the compilation process for each of the examples.
So I'm trying to do what I've said above. The user will enter a precision, such as 3 decimal places, and then using the trapezium rule, the program will keep adding strips until the 3rd decimal place no longer changes, then stop and print the answer.
I'm not sure of the best way to approach this. Because the function is sinusoidal, the integral over one period of 2*PI will be almost 0. I feel like that would be the best way of approaching the problem, but I have no idea how to go about it. At the moment I'm checking the y value for each x value to see when that becomes less than the required precision, but it never really gets low enough. At x = 10 million, for example, y = -0.0002, which is still relatively large for such a large x value.
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z);
    z += stripSize;
    count++;
    printf("%lf\n", func(z));
    if (fabs(func(z)) < lowestAddition/stripSize) {
        break;
    }
}
So the above is what I'm trying to do currently, where func is the function. stripSize is set to 0.01, something relatively small to make the areas of the trapeziums more accurate. sumFirstAndLast is the sum of the first and last values, taken at 0.001 and 1000000 (just a small value and a large value).
As I mentioned, I "think" the best way to do this would be to check the value of the integral over every 2*PI, but once again I'm not sure how to go about this. My current method gives me the correct answer if I take the precision part out, but as soon as I try to put a precision in, it gives a completely wrong answer.
For a non-periodic function that converges to zero you can (sort of) check the function's value and compare it to a minimum error value, but this doesn't work for a periodic function, as you get an early exit before the integrand sum converges (as you've found out). For a non-periodic function you could also simply check the change in the integrand sum on each iteration against a minimum error, but that won't work here either.
Instead, you'll have to do as a few comments suggest and check for convergence relative to the period of the function, PI in this case (I found it works better than using 2*PI). To implement this, do something like the following code (note I changed your sum to accumulate the actual area instead of doing it at the end):
sumFirstAndLast = (0.5*func(a) + 0.5*func(b)) * stripSize;
double z = a + stripSize;
double CHECK_RANGE = 3.14159265359;
double NextCheck = CHECK_RANGE;
double LastCheckSum = 0;
double MinError = 0.0001;
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z) * stripSize;
    if (z >= NextCheck)
    {
        if (fabs(LastCheckSum - sumFirstAndLast) < MinError) break;
        NextCheck += CHECK_RANGE;
        LastCheckSum = sumFirstAndLast;
    }
    z += stripSize;
    count++;
}
This seems to work and give the result to the specified accuracy according to the value of MinError. There are probably other (better) ways to check for convergence when numerically integrating a periodic function. A quick Google search reveals this paper for example.
The integral from 0 to infinity of cos(x)/sqrt(x), or of sin(x)/sqrt(x), is well known to be sqrt(pi/2). So evaluating pi to any number of digits is an easier problem. Newton did it by integrating a quarter circle to get the area pi/4. These integrals are evaluated by the methods of complex analysis; they are done in many textbooks on the subject, and were on one of my final exams in graduate school.