I have an MP3 board attached to an ATmega microcontroller, which is additionally connected to a potentiometer. The MP3 board plays MP3 data directly through its onboard speaker, so I am also able to set the volume of the output.
So, as you might guess, I read the value from the pot and forward it to the microcontroller. Unfortunately, the volume does not increase in a linear way: from values 0 to 128 you hear nearly nothing, and from 128 to 255 (max) the volume increases rapidly.
I found that the following function could solve this problem:
vol = 1 - (1 - x)^4
but x must be between 0 and 1 and the result is also between 0 and 1.
Since I am on a microcontroller, I would like to:
- transform this formula so that I can use it with unsigned integers, and
- optimize it (maybe with some cheap bitwise operations), because I read the pot value multiple times per second. So this function has to be calculated multiple times per second, and I want to use the microcontroller for other stuff too ;-)
Maybe some of you have an idea? Would be great!
uint8_t linearize_volume(uint8_t value) {
// ideas?
// please don't use bigger data types than uint16_t
}
You can "pay" for CPU cycles with memory. If you have 256 bytes of ROM available to you, the cheapest way of computing such a function is to build a lookup table.
Make a program that prints a list of 256 8-bit numbers with the values of your non-linear function. It does not matter how fast the program is, because you are going to run it only once. Copy the numbers the program prints into your C program as an array initializer, and perform the lookup instead of calculating the function.
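As a minimal sketch, assuming the curve vol = 1 - (1 - x)^4 from the question (the generator runs once on a PC; the few firmware table values shown are just the start of its output):

/* Table generator: run once on a PC, not on the ATmega. */
#include <stdio.h>
#include <math.h>

int main(void) {
    for (int i = 0; i < 256; i++) {
        double x = i / 255.0;
        double v = 1.0 - pow(1.0 - x, 4.0);
        printf("%3d,%s", (int)(v * 255.0 + 0.5), (i % 16 == 15) ? "\n" : " ");
    }
    return 0;
}

/* Firmware side: one ROM read per call, no arithmetic. */
#include <stdint.h>

static const uint8_t volume_table[256] = { 0, 4, 8, /* ...paste the rest of the printed values... */ 255 };

uint8_t linearize_volume(uint8_t value) {
    return volume_table[value];
}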
You can get a decent estimate by treating the values as 8.8 fixed-point and raising to the power of four by squaring twice.
uint8_t linearize_volume(uint8_t value) {
// Approximate 255 * (1 - (1 - x/255)^4)
uint16_t x = 0xff - value;  // 255 * (1 - value/255)
x = (x * x) >> 8;           // square, keeping the high byte (8.8 fixed point)
x = (x * x) >> 8;           // square again: now approximately the fourth power
return 0xff - x;
}
First, be sure you're using a linear pot, not an audio-taper pot.
This is typical of audio outputs. The data is a waveform, so negative values are necessary. You can certainly convert negatives to positives for the sole purpose of measuring their power level, but you can't alter the samples without hearing a completely different sound.
Depending upon the output device, lower values may not pack enough power to energize your speaker much at all.
The "MP3 board" should include an ability to control the volume without your having to alter samples.
You state you read the pot and forward it to the micro. Aren't you reading the pot with the micro's ADC?
I'm looking for a fast way in C to hash 32-bit numbers more or less uniformly between 0 and 254. 255 is reserved for a special purpose.
As an added constraint, I'm looking for a method that would map well to being used with ISA-specific vector intrinsics or to a language like OpenCL or CUDA without introducing control flow divergence between the vector lanes/threads.
Ordinarily, I would just use the following code to hash the number between 0 and 255, as this is just a fast way of doing x mod 256.
inline uint8_t hash(uint32_t x){ return x & 255; }
I could just give in and use the following:
inline uint8_t hash(uint32_t x){ return x % 255; }
However, this solution seems unimaginative and unlikely to be the highest performing solution. I found code at this site (http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml#exmod15) that appears to provide a reasonable solution for scalar code and have inserted it here for your convenience.
uint32_t mod255( uint32_t a ) {
a = (a >> 16) + (a & 0xFFFF); /* sum base 2**16 digits */
a = (a >> 8) + (a & 0xFF); /* sum base 2**8 digits */
if (a < 255) return a;
if (a < (2 * 255)) return a - 255;
return a - (2 * 255);
}
I see two potential performance issues with this code:
The large number of if statements makes me question how easy it will be for a compiler or human :) to effectively vectorize the code without leading to control flow divergence within a warp/wavefront on a SIMT architecture or vectorized execution on a multicore CPU. If such divergence does occur, it will reduce parallel efficiency, as the divergent paths will have to be run in series.
It looks like it could be troublesome for a branch predictor (not applicable on common GPU architectures) as the code path that executes depends on the value of the input. Therefore, if there is a mix of small and large values interspersed with one another, this code will likely sacrifice some performance due to a moderate number of branch mispredictions.
Any recommendations on alternatives that I could use are most welcome. Alternatively, let me know if what I am asking for is unreasonable.
The idea that "if statements on a GPU kill performance" is a popular misconception which desperately wants to live on, it seems.
The large number of if statements makes me question how easy it will be for a compiler or human :) to vectorize the code.
First of all, I wouldn't consider two if statements a "large number of if statements", and those are so short and trivial that I'm willing to bet the compiler will turn them into branchless conditional moves or predicated instructions. There will be no performance penalty at all. (Do check the generated assembly, however.)
It looks like it could be troublesome for a branch predictor as the code path that executes depends on the value of the input. Therefore, if there is a mix of small and large values interspersed with one another, this code will likely sacrifice some performance due to a moderate number of branch mispredictions.
Current GPUs do not have branch predictors. Note, however, that depending on the underlying hardware, operations on integers (and notably shifting) may be quite costly.
I would just do this:
uchar fast_mod255( uint a32 ) {
uint a16 = (a32 >> 16) + (a32 & 0xFFFF); /* sum base 2**16 digits; needs 17 bits, so keep it in a uint */
uint a8 = (a16 >> 8) + (a16 & 0xFF); /* sum base 2**8 digits; fits in 10 bits */
return (a8 % 255);
}
Another option is to just do:
uchar fast_mod255( uchar4 a ) {
return (dot(a) % 255); // or return (distance(a) % 255);
}
GPUs are very efficient at computing distances and dot products, even in 4 dimensions, and it is a valid way of hashing as well, discarding the overflowed values.
No branching, and a clever compiler can even optimize it out. Or do you really need the values that fall in the 255 zone to have a scattered pattern instead of 1?
I wanted to answer my own question because over the last 2 years I have seen ways to get around a slow integer divide instruction. The easiest way is to make the divisor a compile-time constant. Any decent modern compiler will then replace the integer divide with an equivalent set of other instructions with typically higher throughput (how many such instructions can be retired per cycle) and lower latency (how many cycles the instruction takes to execute). If you're curious, check out Hacker's Delight (an excellent book on low-level computer arithmetic).
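As a small illustration (mod255_const is a made-up name; the point is only that the divisor is a literal the compiler can see):

#include <stdint.h>

/* With a compile-time-constant divisor, a decent compiler replaces the divide
   with a multiply-and-shift sequence; check the generated assembly to confirm. */
static inline uint32_t mod255_const(uint32_t x) {
    return x % 255u;
}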
I wanted to share another finding, which I found on Daniel Lemire's blog (located here). The code that follows doesn't compute mod 255 but does something similar, which is equally useful in a number of applications and much faster.
Suppose that you have a set of numbers S that are uniformly randomly picked from the range 0 to 2^k - 1 inclusive, where k >= 0. In this case, if you care only about mapping numbers roughly uniformly from 0 to 254 inclusive, you may do the following:
For each number n in the set S, you may map n to one of the 255 candidate values by multiplying n by 255 and then shifting the result right by k bits.
Here is the function that you call on each n for a fixed value of k:
int map_to_0_to_254(int n, int k){
return (n * 255) >> k;
}
As an example, if the values for the argument n range uniformly randomly from 0 to 4095 (2^12 - 1), then map_to_0_to_254(n, 12) will return a value in the range 0 to 254 inclusive.
Here is a more general templated version in C++ for mapping to range from 0 to range_size - 1 inclusive:
template<typename T>
T map_to_0_to_range_size_minus_1(T n, T range_size, T k){
return (n * range_size) >> k;
}
REMEMBER that this code assumes that the inputs for n are roughly uniformly randomly distributed between 0 and 2^k - 1 inclusive. If that property holds, then the outputs will be roughly uniformly distributed between 0 and range_size - 1 inclusive. The larger 2^k is relative to range_size, the more uniform the mapping will be for a fixed set of inputs.
Why This is Useful
This approach has applications to computing hash functions for hash tables where the number of bins is not a power of 2. Those operations would ordinarily require a long-latency integer divide instruction, which is often an order of magnitude slower to execute than an integer multiply, because you often do not know the number of bins in the hash table at compile time.
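As a rough sketch of that use case (pick_bin is a hypothetical helper; it assumes the 32-bit hash h is roughly uniform over the whole 32-bit range, i.e. k = 32, and widens to 64 bits so the multiplication cannot overflow):

#include <stdint.h>

/* Maps a roughly uniform 32-bit hash to a bin index in [0, num_bins). */
static inline uint32_t pick_bin(uint32_t h, uint32_t num_bins) {
    return (uint32_t)(((uint64_t)h * num_bins) >> 32);
}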
I have been wondering for a while which of the two following methods is faster or better.
MY CURRENT METHOD
I'm developing a chess game, and the pieces are stored as numbers (really bytes, to preserve memory) in a one-dimensional array. There is a cursor position corresponding to an index in the array. Accessing the piece at the current position is easy (piece = pieces[cursorPosition]).
The problem is that getting the x and y values for checking whether a move is valid requires division and modulo operators (x = cursorPosition % 8; y = cursorPosition / 8).
Likewise when using x and y to check if moves are valid (you have to do it this way for reasons that would fill the entire page), you have to do something like - purely as an example - if pieces[y * 8 + x] != 0: movePiece = False. The obvious problem is having to do y * 8 + x a bunch of times to access the array.
Ultimately, this means that getting a piece is trivial but then getting the x and y requires another bit of memory and a very small amount of time to compute it each round.
A MORE TRADITIONAL METHOD
Using a two-dimensional array, one can implement the above process a little more easily, except that piece lookup is now a little harder and more memory is used (i.e. piece = pieces[cursorPosition[0]][cursorPosition[1]] or piece = pieces[x][y]).
I don't think this is faster and it definitely doesn't look less memory intensive.
GOAL
My end goal is to have the fastest possible code that uses the least amount of memory. This will be developed for the Unix terminal (and potentially the Windows command prompt, if I can figure out how to represent the pieces without color using ANSI escape sequences). I will either be using a secure (encrypted, with its own protocol and structure) TCP connection to connect people peer-to-peer to play chess, or something else, and I don't know how much memory people will have, how fast their computers will be, or how strong their internet connections will be.
I also just want to learn to do this the best way possible and see if it can be done.
I suppose my question is one of the following:
Which of the above methods is better assuming that there are slightly more computations involving move validation (which means that the y * 8 + x has to be used a lot)?
or
Is there perhaps a method that combines the benefits of 1D and 2D arrays without as many drawbacks as I described?
First, you should profile your code to make sure that this is really a bottleneck worth spending time on.
Second, if you're representing your position as an unsigned byte, decomposing it into X and Y coordinates will be very fast. If we use the following C code:
int getX(unsigned char pos) {
return pos%8;
}
We get the following assembly with gcc 4.8 -O2:
getX(unsigned char):
movl %edi, %eax
andl $7, %eax
ret
If we get the Y coordinate with:
int getY(unsigned char pos) {
return pos/8;
}
We get the following assembly with gcc 4.8 -O2:
getY(unsigned char):
shrb $3, %dil
movzbl %dil, %eax
ret
There is no short answer to this question; it all depends on how much time you spend optimizing.
On some architectures, two-dimensional arrays might work better than one-dimensional. On other architectures, bitmapped integers might be the best.
Do not worry about division and multiplication.
You're dividing, taking the modulo, and multiplying by 8.
This number is a power of two, so any computer can use bitwise operations to achieve the result.
(x * 8) is the same as (x << 3)
(x % 8) is the same as (x & (8 - 1))
(x / 8) is the same as (x >> 3)
Those operations are normally performed in a single clock cycle. On many modern architectures (including ARM), several such operations can even be issued per cycle.
Do not worry about using bitwise operators instead of *, % and /. If you're using a compiler that's less than a decade old, it'll optimize it for you and use bitwise operations.
What you should focus on instead is how easy it will be for you to find out whether or not a move is legal, for instance. This will help your computer player to "think quickly".
If you're using an 8x8 array, then it's easy to see where a rook (castle) can move by checking whether only x or only y changes. If checking the queen, then x must either stay the same or change by the same number of steps as y.
If you use a one-dimensional array, you also have advantages.
But performance-wise, it might be a really good idea to use a 16x16 array or a 1x256 array.
Fill the entire array with 0x80 values (e.g. "illegal position"), then fill the legal fields with 0x00.
If using a 1x256 array, you can check bits 3 and 7 of the index. If either of those is set, then the position is outside the board.
Testing can be done this way:
if(position & 0x88)
{
/* move is illegal */
}
else
{
/* move is legal */
}
... or ...
if(0 == (position & 0x88))
{
/* move is legal */
}
'position' (the index) should be an unsigned byte (uint8_t in C). This way, you'll never have to worry about pointing outside the buffer.
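As a rough sketch of how the 1x256 layout fits together (the helper names are made up; bits 0-2 of the index hold the file, bits 4-6 the rank, and bits 3 and 7 are only set for off-board indexes):

#include <stdint.h>
#include <stdbool.h>

static uint8_t board[256];  /* 0x80 = illegal square, 0x00 = empty, anything else = a piece */

static uint8_t square(uint8_t x, uint8_t y) {  /* x, y in 0..7 */
    return (uint8_t)((y << 4) | x);
}

static bool on_board(uint8_t position) {
    return (position & 0x88) == 0;  /* same test as above */
}

Moving one rank up is then just position + 0x10, and one file right is position + 0x01; the same & 0x88 test catches moves that run off the board.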
Some people optimize their chess-engines by using 64-bit bitmapped integers.
While this is good for quickly comparing the positions, it has other disadvantages; for instance checking if the knight's move is legal.
It's not easy to say which is better, though.
Personally, I think the one-dimensional array in general might be the best way to do it.
I recommend getting familiar (very familiar) with AND, OR, XOR, bit-shifting and rotating.
See Bit Twiddling Hacks for more information.
I've recently come across a problem where, using a cheap 16-bit uC (MSP430 series), I've had to generate a logarithmically spaced output value based on the 10-bit ADC read. The reason for this is that I require fine-grained control at the low end of the integer space while still being able to use the larger values, albeit at lower precision (to me, the difference between 2^15 and 2^16 in my feedback loop is of little consequence). I've never done this before and I had no luck finding examples online, so I came up with a little scheme to do this on my operation-limited uC.
With my method here, the ADC result is linearly interpolated between the two closest integer powers of two using only integer multiplication, addition, and bitwise shifting (outlined below).
My question is: is there a better (faster / fewer operations) way than this to generate a smooth, or smooth-ish, set of data logarithmically spaced over the integer resolution? I haven't found anything online, hence my attempt at coming up with something from scratch in the first place.
N is the logarithmic resolution of the microcontroller (here assumed to be 16 bits). M is the number of ADC codes (here a 10-bit ADC, so 1024). ADC_READ is the value read by the ADC at a given time. On a uC that supports floating-point operations, doing this is trivial:
x = (float) N / M         # 16/1024 = 1/64
y = (float) ADC_READ
result = 2 ^ ( x * y )    # i.e. 2^(ADC_READ/64), so the full ADC range spans roughly 2^0 .. 2^16
In all of the plots below, this is the "Ideal" set of values. The "Resultant" values are generated by variations of the following:
unsigned int returnValue( unsigned int adcRead ){
    unsigned int e;
    unsigned int a;
    unsigned int rise;
    unsigned int base;
    unsigned int xoffset;
    unsigned int yoffset;
    unsigned int result;

    e = adcRead >> 6;                 /* which power-of-two segment (0..15)      */
    a = 1 << e;                       /* value at the start of the segment       */
    rise = ( 1 << (e + 1) ) - ( 1 << e );
    base = e << 6;                    /* ADC reading at the start of the segment */
    xoffset = adcRead - base;
    /* The shifts prevent overflow; rise_shift + offset_shift = log2(M/N) = 6.
       rise_shift and offset_shift are compile-time tuning constants (see below). */
    yoffset = ( rise >> rise_shift ) * ( xoffset >> offset_shift );
    result = a + yoffset;
    return result;
}
The extra declarations and whatnot are for readability only; assume the final product is condensed. Basically, it does as intended, with varying degrees of discretization at the low end and smoothness at the high end depending on the values of rise_shift and offset_shift. (Plots omitted: rise_shift = offset_shift = 3, rise_shift = 2 with offset_shift = 4, and rise_shift = 4 with offset_shift = 2.)
I'm interested to see if anyone has come up with or knows of anything better. Currently, I only have to run this code ~20-30 times a second, so I obviously have not encountered any delays. But, with a 16 MHz clock, and using information from here, I estimate this entire operation takes at most ~110 clock cycles, or ~7 us. This is on the scale of the ADC read time, which is ~4 us.
Thanks
EDIT: By "better" I do not necessarily just mean faster (it's already quite fast, apparently). Immediately, one sees that the low end has fairly drastic discretization to the integer powers of two, which results from the shifting operations used to prevent roll-over. Other than a look-up table (suggested below), the answer to how this could be improved is not immediate.
based on the 10 bit ADC read.
This ADC can output only 1024 different values (0-1023), so you can use a table of 1024 16-bit values, which would consume 2 KB of flash memory:
const uint16_t LogarithmicTable[1024] = { 0, 1, ... , 64380};
Calculating the logarithmic output is now a simple array access:
result = LogarithmicTable[ADC_READ];
You can use a tool like Excel to generate the constants in this Table for you.
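If you prefer not to use Excel, a throwaway C program works just as well. A possible sketch (run once on a PC, printing initializers for the ideal curve 2^(16*i/1024); rounding and the choice of endpoint values are up to you):

#include <stdio.h>
#include <math.h>

int main(void) {
    for (int i = 0; i < 1024; i++) {
        double v = pow(2.0, 16.0 * i / 1024.0);
        if (v > 65535.0) v = 65535.0;   /* clamp to uint16_t, just in case */
        printf("%u,%s", (unsigned)(v + 0.5), (i % 8 == 7) ? "\n" : " ");
    }
    return 0;
}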
It sounds like you want to compute the function 2^(n/64), which would map 1024 to 65536 just above the high end but maps anything up to 64 to zero (or one, depending on rounding). Other exponential functions could avoid the low-end discretization, but it's not clear whether that would help the functionality.
We can factor 2^(n/64) into 2^floor(n/64) × 2^((n mod 64)/64). Usually multiplying by an integer power of 2 involves a left shift, but because the other side is a fraction between one and two, we're better off doing a right shift.
uint16_t exp_table[ 64 ] = {
32768u,
pow( 2, 1./64 ) * 32768u,
pow( 2, 2./64 ) * 32768u,
...
};
uint16_t adc_exp( uint16_t linear ) {
return exp_table[ linear % 64 ] >> ( 15 - linear / 64 );
}
This loses no precision against a full, 2-kilobyte table. To save more space, use linear interpolation.
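One possible sketch of the interpolated variant (same setup as above: a 17-entry table holding 2^(i/16) in 1.15 fixed point, with the last entry clamped to fit a uint16_t, and linear interpolation for the 1/64 steps in between):

#include <stdint.h>

static const uint16_t exp_table16[17] = {
    32768, 34219, 35734, 37316, 38968, 40693, 42495, 44376,
    46341, 48393, 50535, 52773, 55109, 57549, 60097, 62757,
    65535   /* ideally 65536; clamped to fit in uint16_t */
};

uint16_t adc_exp_interp(uint16_t linear) {
    uint16_t frac = linear % 64;                 /* position within the octave */
    uint16_t idx  = frac >> 2;                   /* which 1/16-octave segment  */
    uint16_t rem  = frac & 3;                    /* 0..3 within that segment   */
    uint16_t lo   = exp_table16[idx];
    uint16_t hi   = exp_table16[idx + 1];
    uint16_t val  = lo + (uint16_t)(((uint32_t)(hi - lo) * rem) >> 2);
    return val >> (15 - linear / 64);
}

This cuts the 128-byte table down to 34 bytes at the cost of one multiply and a small loss of accuracy in the low bits.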
I'm trying to implement a tolerable-quality version of the rand_r interface, which has the unfortunate interface requirement that its entire state is stored in a single object of type unsigned, which for my purposes means exactly 32 bits. In addition, I need its output range to be [0,2³¹-1]. The standard solution is using an LCG and dropping the low bit (which has the shortest period), but this still leaves very poor periods for the next few bits.
My initial thought was to use two or three iterations of the LCG to generate the high/low or high/mid/low bits of the output. However, such an approach does not preserve the non-biased distribution; rather than each output value having equal frequency, many occur multiple times, and some never occur at all.
Since there are only 32 bits of state, the period of the PRNG is bounded by 2³², and in order to be non-biased, the PRNG must output each value exactly twice if it has full period or exactly once if it has period 2³¹. Shorter periods cannot be non-biased.
Is there any good known PRNG algorithm that meets these criteria?
One good (but probably not the fastest) possibility, offering very high quality, would be to use a 32-bit block cipher in CTR mode. Basically, your RNG state would simply be a 32-bit counter that gets incremented by one for each RNG call, and the output would be the encryption of that counter value using the block cipher with some arbitrarily chosen fixed key. For extra randomness, you could even provide a (non-standard) function to let the user set a custom key.
There aren't a lot of 32-bit block ciphers in common use, since such a short block size introduces problems for cryptographic use. (Basically, the birthday paradox lets you distinguish the output of such a cipher from a random function with a non-negligible probability after only about 2¹⁶ = 65536 outputs, and after 2³² outputs the non-randomness obviously becomes certain.) However, some ciphers with an adjustable block size, such as XXTEA or HPC, will let you go down to 32 bits, and should be suitable for your purposes.
(Edit: My bad, XXTEA only goes down to 64 bits. However, as suggested by CodesInChaos in the comments, Skip32 might be another option. Or you could build your own 32-bit Feistel cipher.)
The CTR mode construction guarantees that the RNG will have a full period of 2³² outputs, while the standard security claim of (non-broken) block ciphers is essentially that it is not computationally feasible to distinguish their output from a random permutation of the set of 32-bit integers. (Of course, as noted above, such a permutation is still easily distinguished from a random function taking 32-bit values.)
Using CTR mode also provides some extra features you may find convenient (even if they're not part of the official API you're developing against), such as the ability to quickly seek into any point in the RNG output stream just by adding or subtracting from the state.
On the other hand, you probably don't want to follow the common practice of seeding the RNG by just setting the internal state to the seed value, since that would cause the output streams generated from nearby seeds to be highly similar (basically just the same stream shifted by the difference of the seeds). One way to avoid this issue would be to add an extra encryption step to the seeding process, i.e. to encrypt the seed with the cipher and set the internal counter value equal to the result.
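A toy sketch of the "build your own 32-bit Feistel" route (this is not a vetted cipher; the round function and key constants are arbitrary choices, but any Feistel network is a permutation of the 32-bit values, so the counter construction keeps its full 2³² period):

#include <stdint.h>

static uint32_t counter;                                              /* the entire RNG state */
static const uint16_t keys[4] = { 0x3A5C, 0x91E7, 0x6D04, 0xC2B9 };  /* arbitrary round keys */

static uint16_t round_fn(uint16_t x, uint16_t k) {
    x = (uint16_t)((x + k) * 0x2545u);
    return (uint16_t)(x ^ (x >> 7));
}

uint32_t feistel_ctr_rand(void) {
    uint16_t left  = (uint16_t)(counter >> 16);
    uint16_t right = (uint16_t)counter;
    counter++;                                  /* CTR mode: the state is just a counter */
    for (int i = 0; i < 4; i++) {               /* more rounds give better mixing */
        uint16_t tmp = right;
        right = (uint16_t)(left ^ round_fn(right, keys[i]));
        left  = tmp;
    }
    return ((uint32_t)left << 16) | right;
}

Masking off the top bit of the result gives the [0,2³¹-1] range the question asks for, with each value appearing exactly twice per full period, so the output stays unbiased.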
A 32-bit maximal-period Galois LFSR might work for you. Try:
r = (r >> 1) ^ (-(r & 1) & 0x80200003);
The one problem with LFSRs is that you can't produce the value 0. So this one has a range of 1 to 2^32-1. You may want to tweak the output or else stick with a good LCG.
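A minimal wrapper for completeness, assuming r is seeded to any nonzero value before the first call:

#include <stdint.h>

static uint32_t r = 1;   /* any nonzero seed */

uint32_t lfsr32(void) {
    r = (r >> 1) ^ (-(r & 1) & 0x80200003u);
    return r;
}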
Besides using a Lehmer MCG, there are a couple you could use:
32-bit variants of Xorshift have a guaranteed period of 2³²−1 using a 32-bit state:
uint32_t state; /* must be seeded to a nonzero value */
uint32_t xorshift32(void) {
state ^= state << 13;
state ^= state >> 17;
state ^= state << 5;
return state;
}
That's the original 32-bit recommendation from 2003 (see paper). Depending on your definition of "decent quality", that should be fine. However it fails the binary rank tests of Diehard, and 5/10 tests of SmallCrush.
Alternate version with better mixing and constants (passes SmallCrush and Crush):
uint32_t xorshift32amx(void) {
int s = __builtin_bswap32(state * 1597334677);
state ^= state << 13;
state ^= state >> 17;
state ^= state << 5;
return state + s;
}
Based on research here and here.
There's also Mulberry32 which has a period of exactly 2³²:
uint32_t mulberry32(void) {
uint32_t z = state += 0x6D2B79F5;
z = (z ^ z >> 15) * (1 | z);
z ^= z + (z ^ z >> 7) * (61 | z);
return z ^ z >> 14;
}
This is probably your best option. It's quite good. The author states: "It passes gjrand's 13 tests with no failures and a total P-value of 0.984 (where 1 is perfect and 0.1 or less is a failure) on 4GB of generated data. That's a quarter of the full period". It appears to be an improvement over SplitMix32.
"SplitMix32", adopted from xxHash/MurmurHash3 (Weyl sequence):
uint32_t splitmix32(void) {
uint32_t z = state += 0x9e3779b9;
z ^= z >> 15; // 16 for murmur3
z *= 0x85ebca6b;
z ^= z >> 13;
z *= 0xc2b2ae35;
return z ^= z >> 16;
}
The quality might be questionable here, but its 64-bit big brother has a lot of fans (passes BigCrush). So the general structure is worth looking at.
Elaborating on my comment...
A block cipher in counter mode gives a generator in approximately the following form (except using much bigger data types):
uint32_t state = 0;
uint32_t rand()
{
state = next(state);
return temper(state);
}
Since cryptographic security hasn't been specified (and in 32 bits it would be more or less futile), a simpler, ad-hoc tempering function should do the trick.
One approach is where the next() function is simple (eg., return state + 1;) and temper() compensates by being complex (as in the block cipher).
A more balanced approach is to implement an LCG in next(), since we know that it also visits all possible states but in a random(ish) order, and to find an implementation of temper() which does just enough work to cover the remaining problems with the LCG.
Mersenne Twister includes such a tempering function on its output. That might be suitable. Also, this question asks for operations which fulfill the requirement.
I have a favourite, which is to bit-reverse the word, and then multiply it by some constant (odd) number. That may be overly complex if bit-reverse isn't a native operation on your architecture.
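A hedged sketch of that next()/temper() split (the LCG constants are the Numerical Recipes ones; the tempering multiplier is just some odd constant, and the bit reversal is spelled out since a native instruction isn't assumed):

#include <stdint.h>

static uint32_t state;

static uint32_t next(uint32_t s) {
    return s * 1664525u + 1013904223u;       /* full-period LCG step modulo 2^32 */
}

static uint32_t temper(uint32_t x) {
    /* bit-reverse the 32-bit word... */
    x = ((x & 0x55555555u) << 1)  | ((x >> 1)  & 0x55555555u);
    x = ((x & 0x33333333u) << 2)  | ((x >> 2)  & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4)  | ((x >> 4)  & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8)  | ((x >> 8)  & 0x00FF00FFu);
    x = (x << 16) | (x >> 16);
    return x * 2654435761u;                  /* ...then multiply by an odd constant */
}

uint32_t rand32(void) {
    state = next(state);
    return temper(state);
}

Dropping one bit of the result (e.g. returning temper(state) >> 1) gives the required [0,2³¹-1] range while keeping the output unbiased over the full period, since temper() is a bijection and each 31-bit value then appears exactly twice.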
I have one set of continuous integer values and a corresponding set of non-continuous values, for example:
0 -> 22
1 -> 712
2 -> 53
3 -> 12323
...
and so on.
The number of items is very large (about 10^9 to 10^10), so using just a plain array is not an option.
Is there a data structure capable of fast mapping from the first values to the second with moderate memory requirements? For example:
ret = map(0); // returns 22
ret = map(3); // returns 12323
Edit: the values in this set are actually generated using a pseudo-random number generator, so it is not possible to assume some specific distribution. The question is: is it possible to lower the memory requirements (maybe at the price of lookup speed)? I mean using something like "perfect hashing" - the time required to generate such a "perfect hash" doesn't matter.
As your keys are continuous, the obvious solution is to store your values in a contiguous int[]; then value i is arr[i]. As the values are generated by a PRNG, it will be difficult to apply further compression.
Another solution, which trades time for space, is to store the seed of your RNG and recalculate on the fly. This approach could be improved in time, and worsened in space, by storing intermediate seeds, i.e. the seed for keys 1000, 2000, etc.
You may be able to save some space by using exactly the number of bits required by each value. For example if your values are only 24 bits, you can save a byte over 32-bit integers. That said, there is only so much memory you can save.
On 64-bit machines it would be feasible to mmap() a file to a memory address, thus getting over the physical memory limit by using disk storage, at the price of performance.
But since you mentioned using a pseudo-random generator to generate the values, how about just storing the RNG seed for specific indexes and calculating the rest of the values as needed? For example you could store the seed for indexes 0, 100, 200, ... and calculate e.g. 102 by re-seeding the RNG for 100 and calling the generator function three times.
Such an approach would reduce the memory needed by a large factor (100 in this case) and you could lessen the performance cost by bunching or caching your queries.
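A rough sketch of that idea, assuming a steppable generator (an LCG here for brevity) and a checkpoint every 100 values; the names and sizes are illustrative, not an existing API:

#include <stdint.h>

#define STRIDE 100u
#define NUM_CHECKPOINTS 1000u   /* size this for the real data set in practice */

static uint32_t checkpoints[NUM_CHECKPOINTS];   /* PRNG state just before index i*STRIDE */

static uint32_t lcg_step(uint32_t s) { return s * 1664525u + 1013904223u; }

void build_checkpoints(uint32_t seed) {
    uint32_t s = seed;
    for (uint32_t i = 0; i < NUM_CHECKPOINTS; i++) {
        checkpoints[i] = s;
        for (uint32_t j = 0; j < STRIDE; j++) s = lcg_step(s);
    }
}

uint32_t value_at(uint32_t index) {
    uint32_t s = checkpoints[index / STRIDE];
    for (uint32_t j = 0; j <= index % STRIDE; j++) s = lcg_step(s);
    return s;
}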
If the range of your function is the set of numbers generated by a pseudo-random number generator in sequence then you can compress the series down to, well, to the code which generates the sequence plus the state of the PRNG before starting. For example, the (infinite) series of digits comprising the decimal expansion of pi is easily (and, technically, infinitely) compressed to the code to generate that series; your series could be seen as an example of something almost identical.
So, if you are willing to wait for a long time to get the last elements in the series, you can get very good compression, by writing your series not into a data structure but out of a function. That is at one end of your time/space trade-off spectrum.
At the other end of the spectrum is an array of all the numbers; this uses lots of space but gives very quick (O(1)) access to any desired element in the set. This doesn't seem to appeal to you for a variety of reasons, but I'm not sure that a cleverer data structure than an array will offer much space saving, or, for that matter, time saving.
The one obvious solution I see is to save a set of intermediate states of the PRNG at intervals, so your 'data' structure would become:
ret(0) = prng(seed, other_parameters, ...)
ret(10^5-1) = prng(seed', other_parameters, ...)
ret(2*(10^5)-1) = prng(seed'', other_parameters, ...)
etc. Then, to get element 9765, say, you read (the state of the PRNG at) ret(0) and generate the 9765-th pseudo-random number thereafter.
Ok, so the intent is to trade speed for less memory usage.
Imagine that you have some sort of loop that fills the array.
int array[intendedArraySize];
seed = 3;
for (size_t z = 0; z < intendedArraySize; z++)
{
array[z] = some_int_psn_generator(z); // so the value depends only on the index, matching the loop further below
}
After which you can display the values.
for (size_t z = 0; z < intendedArraySize; z++)
{
std::cout << z << " " << array[z] << std::endl;
}
If that is indeed the case, consider discarding the array altogether, by simply recalculating the value each time.
for (size_t z = 0; z < intendedArraySize; z++)
{
std::cout << z << " " << some_int_psn_generator(z) << std::endl;
}