I'm looking for a commonly understandable notation to define a fixed point number representation.
The notation should be able to define both a power-of-two factor (using fractional bits) and a generic factor (sometimes I'm forced to use the latter, though it is less efficient). It should also allow an optional offset to be defined.
I already know some possible notations, but all of them seem to be constrained to specific applications.
For example the Simulink notation would perfectly fit my needs, but it's known only in the Simulink world. Furthermore the overloaded usage of the fixdt() function is not so readable.
TI defines a really compact Q format, but the sign is implicit and it doesn't handle a generic factor (i.e. one that is not a power of two).
ASAM uses a generic 6-coefficient rational function with 2nd-degree numerator and denominator polynomials (COMPU_METHOD). Very generic, but not so friendly.
See also the Wikipedia discussion.
The question is only about the notation (not the efficiency of the representation nor fixed-point manipulation). So it's a matter of code readability, maintainability and testability.
Ah, yes. Having good naming annotations is absolutely critical to not introducing bugs with fixed-point arithmetic. I use an explicit version of the Q notation which handles any split between M integer bits and N fractional bits by appending _Q<M>_<N> to the name of the variable. This also makes it possible to include the signedness. There are no run-time performance penalties for this. Example:
uint8_t length_Q2_6; // unsigned, 2 bit integer, 6 bit fraction
int32_t sensor_calibration_Q10_21; // signed (1 bit), 10 bit integer, 21 bit fraction.
/*
* Calculations with the bc program (with '-l' argument):
*
* sqrt(3)
* 1.73205080756887729352
*
* obase=16
* sqrt(3)
* 1.BB67AE8584CAA73B0
*/
const uint32_t SQRT_3_Q7_25 = 1 << 25 | 0xBB67AE85U >> 7; /* Unsigned shift super important here! */
In case someone has not fully understood why such annotation is extremely important: can you spot the bug in the following two examples?
Example 1:
speed_fraction = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed = fix32_umul(25, speed_fraction, speed_fraction);
tmp1 = fix32_umul(25, squared_speed, SQRT_3);
tmp2 = fix32_umul(12, tmp1 >> (25-12), motor_volt << 12);
Example 2:
speed_fraction_Q7_25 = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed_Q7_25 = fix32_umul(25, speed_fraction_Q7_25, speed_fraction_Q7_25);
tmp1_Q7_25 = fix32_umul(25, squared_speed_Q7_25, SQRT_3_Q1_31);
tmp2_Q20_12 = fix32_umul(12, tmp1_Q7_25 >> (25-12), motor_volt << 12);
Imagine if one file contained #define SQRT_3 (1 << 25 | 0xBB67AE85U >> 7) and another file contained #define SQRT_3 (1 << 31 | 0xBB67AE85U >> 1), and code was moved between those files. In example 1 this has a high chance of going unnoticed and introducing the bug that is present in example 2, where it was done deliberately; with the explicit suffixes it has close to zero chance of happening accidentally.
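For reference, the fix32_umul()/fix32_udiv() helpers used above are not defined in this answer; a minimal sketch of what they might look like (my assumption, using 64-bit intermediates to avoid overflow):
#include <stdint.h>
/* Multiply/divide two unsigned fixed-point values that have `frac`
 * fractional bits; the result has `frac` fractional bits as well. */
static inline uint32_t fix32_umul(unsigned frac, uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> frac);
}
static inline uint32_t fix32_udiv(unsigned frac, uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a << frac) / b);
}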
Actually the Q format is the most used representation in commercial applications: you use it when you need to deal with fractional numbers fast and your processor does not have an FPU (floating-point unit), so it cannot use the float and double data types natively; it has to emulate instructions for them, which is very expensive.
Usually you use the Q format to represent only the fractional part (though this is not a must), because you get more precision for your representation that way. Here's what you need to consider:
the number of bits you use (Q15 uses 16-bit data types, usually short int)
the first bit is the sign bit (out of 16 bits you are left with 15 for the data value)
the rest of the bits are used to store the fractional part of your number
since you are representing fractional numbers, your value is somewhere in [0, 1)
you can choose to use some bits for the integer part as well, but you would lose precision - e.g. if you wanted to represent 3.3 in Q format, you would need 1 bit for the sign, 2 bits for the integer part, and would be left with 13 bits for the fractional part (assuming a 16-bit representation) -> this format is called 2Q13
Example: Say you want to represent 0.3 in Q15 format; you apply the Rule of Three:
1 = 2^15 = 32768 = 0x8000
0.3 = X
-------------
X = 0.3*32768 = 9830 = 0x2666
You lost precision by doing this but at least the computation is fast now.
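In code the conversion is just a scale by 2^15. A small sketch (the helper names are mine, for illustration only):
#include <stdint.h>
/* Convert between double and Q15 (1 sign bit, 15 fractional bits). */
static inline int16_t q15_from_double(double v)
{
    return (int16_t)(v * 32768.0); /* q15_from_double(0.3) == 9830 == 0x2666 */
}
static inline double q15_to_double(int16_t q)
{
    return (double)q / 32768.0;
}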
In C, you can't use a user-defined type like a built-in one. If you want to do that, you need C++. In that language you can define a class for your fixed-point type and overload all the arithmetic operators (+, -, *, /, %, +=, -=, *=, /=, %=, --, ++, casts to other types), so that instances of the class really behave like the built-in types.
In C, you need to do what you want explicitly. There are two basic approaches.
Approach 1: Do the fixed point adjustments in the user code.
This is overhead-free, but you need to remember to do the correct adjustments. I believe it is easiest to just add the number of past point bits to the end of the variable name, because the type system won't do you much good even if you typedef'd all the point positions you use. Here is an example:
int64_t a_7 = (int64_t)(7.3*(1<<7)); //a variable with 7 past point bits
int64_t b_5 = (int64_t)(3.78*(1<<5)); //a variable with 5 past point bits
int64_t sum_7 = a_7 + (b_5 << 2); //to add those two variables, we need to adjust the point position in b
int64_t product_12 = a_7 * b_5; //the product produces a number with 12 past point bits
Of course, this is a lot of hassle, but at least you can easily check at every point whether the point adjustment is correct.
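For instance, bringing the product back to 7 past point bits is itself an explicit, visible step (a one-line sketch in the same convention):
int64_t product_7 = product_12 >> 5; //rescale from 12 to 7 past point bits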
Approach 2: Define a struct for your fixed point numbers and encapsulate the arithmetic on it in a bunch of functions. Like this:
typedef struct FixedPoint {
int64_t data;
uint8_t pointPosition;
} FixedPoint;
FixedPoint fixed_add(FixedPoint a, FixedPoint b) {
if(a.pointPosition >= b.pointPosition) {
    return (FixedPoint){
        .data = a.data + (b.data << (a.pointPosition - b.pointPosition)),
        .pointPosition = a.pointPosition
    };
} else {
    return (FixedPoint){
        .data = (a.data << (b.pointPosition - a.pointPosition)) + b.data,
        .pointPosition = b.pointPosition
    };
}
}
This approach is a bit cleaner in the usage, however, it introduces significant overhead. That overhead consists of:
The function calls.
The copying of the structs for parameter and result passing, or the pointer dereferences if you use pointers.
The need to calculate the point adjustments at runtime.
This is pretty much similar to the overhead of a C++ class without templates. Using templates would move some decisions back to compile time, at the cost of losing flexibility.
This object based approach is probably the most flexible one, and it allows you to add support for non-binary point positions in a transparent way.
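For completeness, a multiplication under the same struct representation might look like this (my sketch, not part of the listing above; a real library would also rescale or saturate the result):
FixedPoint fixed_mul(FixedPoint a, FixedPoint b) {
    /* the product of numbers with x and y point bits has x + y point bits;
       note that data * data can still overflow int64_t for large operands */
    return (FixedPoint){
        .data = a.data * b.data,
        .pointPosition = a.pointPosition + b.pointPosition
    };
}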
I'm trying to implement the following function to be calculated by my STM32
y=0.0006*x^3 - 0.054*x^2 + 2.9094*x - 2.3578
x is in range 0 to 215
To avoid the use of pow or any other functions, I have written the following code
static double tmp = 0;
static unsigned short c_m;
static unsigned short c_m_3;
static unsigned short c_m_2;
c_m_3 = c_m*c_m*c_m;
c_m_2 = c_m*c_m;
tmp = (0.0006*c_m_3) - (0.054*c_m_2) + (2.9094*c_m) - 2.3578;
dati->sch_g = tmp;
For some reason the calculation is totally wrong: for instance, if c_m = 1 I should have tmp = 0.4982, but instead I get 13.
Am I missing something?
As noted by Lundin in the comments, your microcontroller type (ARM Cortex-M0) doesn't provide a floating-point unit. This in consequence means you cannot rely on native floating-point math, but need to rely on a floating-point software library like e.g. this one (note: I didn't evaluate it, it has just been the very first I stumbled upon in a quick search!).
Alternatively - and likely preferably - you might want to do the calculations in plain integers; if you additionally convert your calculation from the pattern a*x*x*x + b*x*x + c*x + d to ((a*x + b)*x + c)*x + d (Horner's method) you even save some multiplications:
int32_t c_m = ...;
c_m = ((6 * c_m - 540) * c_m + 29094) * c_m - 23578;
Note: unsigned short would be too small to hold the result on STM32, so you need to switch to at least 32 bits! Additionally you need a signed type to be able to hold the negative result arising for c_m == 0.
Your results will now be too large by a factor of 10 000, of course. As the use case is unclear, the question remains how you want to deal with that: possibly rounding (c_m = (c_m + 5000) / 10000) or presenting the fractional part by other means.
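If the value has to be displayed with its fractional digits, the scaled integer can be split instead of rounded (a sketch, assuming the x10000 scaling above):
#include <stdio.h>
/* c_m still holds the result scaled by 10000 */
int32_t whole = c_m / 10000;
int32_t frac = c_m % 10000;
if (frac < 0) frac = -frac;
printf("%ld.%04ld\n", (long)whole, (long)frac);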
short is 16 bits on all STM32, so the value 215 * 215 * 215 will not fit inside one. c_m_3 = c_m*c_m*c_m; truncates the result modulo USHRT_MAX + 1 (65536), as the C standard puts it:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Use uint32_t instead.
short is only 16 bits; the maximum value it can hold is 65535. Therefore it will overflow if the number you want to raise to the third power is over 40. This means that you must use a larger variable type like uint32_t.
You can also use ifs to detect overflow, which is better programming practice; see the sketch below.
As another note, it's better to use uint8_t and uint16_t instead of unsigned char and unsigned short in embedded programming, because they're more descriptive of the data sizes.
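As an illustration of such a check (my example; 1625 is the largest integer whose cube still fits in a uint32_t):
uint32_t c_m_3;
if (c_m <= 1625) { /* 1625^3 = 4291015625 <= UINT32_MAX */
    c_m_3 = (uint32_t)c_m * c_m * c_m;
} else {
    c_m_3 = UINT32_MAX; /* saturate, or signal an error instead */
}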
I'm writing some code for an embedded system (MSP430) without hardware floating point support. Unfortunately, I will need to work with fractions in my code as I'm doing ranging, and a short-range sensor with a precision of 1m isn't a very good sensor.
I can do as much of the math as I need in ints, but by the end there are two values that I will definitely need fractions for: range and speed. Range will be a value between 2 and 500 (cm), while speed should be no higher than -10 to 10 (m/s). I am unsure how to represent them without floating-point values, if that is possible. A simple way of rounding the fractions up or down would be best.
Some sample code I have:
voltage_difference_new = ((memval3_new - memval4_new)*3.3/4096);
where memval3_new and memval4_new are ints, but voltage_difference_new is a float.
Please let me know if more information is needed. Or if there is a blindingly easy fix.
You have rather answered your own question with the statement:
Range will be a value between 2-500 (cm),
Work in centimetre (or even millimetre) rather than metre units.
That said you don't need floating-point hardware to do floating point math; the compiler will support "soft" floating point and generate the code to perform floating point operations - it will be slower than hardware floating point or integer operations, but that may not be an issue in your application.
Nonetheless there are many reasons to avoid floating-point even with hardware support and it does not sound like your case for FP is particularly compelling, but it is hard to tell without seeing your code and a specific example. In 32 years of embedded systems development I have seldom resorted to FP even for trig, log, sqrt and digital signal processing.
A general method is to use a fixed point presentation. My earlier suggestion of using centimetres is an example of decimal fixed point, but for greater efficiency you should use binary fixed point. For example you might represent distance in 1/1024 metre units (giving > 1 mm precision). Because the fixed point is binary, all the necessary rescaling can be done with shifts rather than more expensive multiply/divide operations.
For example, say you have an 8 bit sensor generating linear output 0 to 255 corresponding to a real distance 0 to 0.5 metre.
#define Q10_SHIFT 10 // 10 bits fractional (1/1024)
typedef int q10_t ;
#define ONE_METRE (1 << Q10_SHIFT)
#define SENSOR_MAX 255
#define RANGE_MAX (ONE_METRE / 2)
q10_t distance = read_sensor() * RANGE_MAX / SENSOR_MAX ;
distance is in Q10 fixed-point representation. Performing addition and subtraction on such values is normal integer arithmetic; multiply and divide require scaling:
q10_t q10_add( q10_t a, q10_t b )
{
    return a + b ;
}
q10_t q10_sub( q10_t a, q10_t b )
{
    return a - b ;
}
q10_t q10_mul( q10_t a, q10_t b )
{
    // widen to 64 bit so the intermediate product cannot overflow
    return (q10_t)(((int64_t)a * b) >> Q10_SHIFT) ;
}
q10_t q10_div( q10_t a, q10_t b )
{
    // widen before the pre-shift for the same reason
    return (q10_t)(((int64_t)a << Q10_SHIFT) / b) ;
}
Of course you may want to be able to mix types and say multiply a q10_t by an int - providing a comprehensive library for fixed-point can get complex. Personally for that I use C++ where you have classes, function overloading and operator overloading to support more natural code. But unless your code has a great deal of general fixed point math, it may be simpler to code specific fixed point operations ad-hoc.
To take the one example you have provided:
double voltage_difference_new = ((memval3_new - memval4_new)*3.3/4096);
The floating-point there is trivially removed using millivolts:
int voltage_difference_new_mv = ((memval3_new - memval4_new) * 3300) /4096 ;
The issue then perhaps becomes one of presentation. For example if you have to present or report the value in volts to a user. In that case:
int volt_fract = abs(voltage_difference_new_mv % 1000) ;
int volt_whole = voltage_difference_new_mv / 1000 ;
printf( "%d.%03d", volt_whole, volt_fract ) ;
Background
I have a simple media client/server I've written, and I want to generate a non-obvious time value I send with each command from the client to the server. The timestamps will have a fair bit of data in them (nano-second resolution, even if it's not truly accurate, due to limitations of timer sampling in modern operating systems), etc.
What I'm trying to do (on Linux, in C) is to generate a one-to-one sequence of n-bit values (let's assume the data is stored in 128-bit array-of-int elements for now) with no overlapping/colliding values. I would then take a pseudo-random 128-bit value/number as a "salt", apply it to the timestamp, and then start sending off commands to the server, incrementing the pre-salted/pre-hashed value.
The reason the timestamp size is so large is because the timestamp may have to accommodate a very large duration of time.
Question
How could I accomplish such a sequence (non-colliding) with an initial salt value? The best approach that sounds along the lines of my goal is from this post, which notes:
If option 1 isn't "random" enough for you, use the CRC-32 hash of said
global (32-bit) counter. There is a 1-to-1 mapping (bijection) between
N-bit integers and their CRC-N so uniqueness will still be guaranteed.
However, I do not know:
If that can (efficiently) be extended to 128-bit data.
If some sort of addition-to/multiplication-by salt-value to provide the initial seed for the sequence would disrupt it or introduce collisions.
Follow-up
I realize that I could use a 128bit random hash from libssl or something similar, but I want the remote server, using the same salt value, to be able to convert the hashed timestamps back into their true values.
Thank you.
You could use a linear congruential generator. With the right parameters, it is guaranteed to produce non-repeating [unique] sequences with a full period (i.e. no collisions).
This is what random(3) uses in TYPE_0 mode. I adapted it for a full unsigned int range and the seed can be any unsigned int (See my sample code below).
I believe it can be extended to 64 or 128 bits. I'd have a look at https://en.wikipedia.org/wiki/Linear_congruential_generator to see the constraints on the parameters that prevent collisions and give good randomness.
Following the wiki page guidelines, you could produce one that can take any 128 bit value as the seed and will not repeat until all possible 128 bit numbers have been generated.
You may need to write a program to generate suitable parameter pairs and then test them for the "best" randomness. This would be a one time operation.
Once you've got them, just plug these parameters into your equation in your actual application.
Here's some code of mine that I had been playing with when I was looking for something similar:
// _prngstd -- get random number
// (u32, prng_p, PRNG_FLIP and the btc seed are types/macros from my own
// codebase; only the prng_state member matters for the technique shown)
static inline u32
_prngstd(prng_p prng)
{
long rhs;
u32 lhs;
// NOTE: random is faster and has a _long_ period, but it _only_ produces
// positive integers but jrand48 produces positive _and_ negative
#if 0
rhs = jrand48(btc->btc_seed);
lhs = rhs;
#endif
// this has collisions
#if 0
rhs = rand();
PRNG_FLIP;
#endif
// this has collisions because it defaults to TYPE_3
#if 0
rhs = random();
PRNG_FLIP;
#endif
// this is random in TYPE_0 (linear congruential) mode
#if 0
prng->prng_state = ((prng->prng_state * 1103515245) + 12345) & 0x7fffffff;
rhs = prng->prng_state;
PRNG_FLIP;
#endif
// this is random in TYPE_0 (linear congruential) mode with the mask
// removed to get full range numbers
// this does _not_ produce overlaps
#if 1
prng->prng_state = ((prng->prng_state * 1103515245) + 12345);
rhs = prng->prng_state;
lhs = rhs;
#endif
return lhs;
}
The short answer is encryption. With a set of 128-bit values, feed them into AES and get a different set of 128-bit values out. Because encryption is reversible, the outputs are guaranteed unique for unique inputs with a fixed key.
Encryption is a reversible one-to-one mapping of the input values to the output values, each set is a full permutation of the other.
Since you are presumably not repeating your inputs, then ECB mode is probably sufficient, unless you want a greater degree of security. ECB mode is vulnerable if used repeatedly with identical inputs, which does not appear to be the case here.
For inputs shorter than 128 bits, then use a fixed padding method to make them the right length. As long as the uniqueness of inputs is not affected, then padding can be reasonably flexible. Zero padding, at either end (or at the beginning of internal fields) may well be sufficient.
I do not know your detailed requirements, so feel free to modify my advice.
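As a concrete sketch using OpenSSL's low-level AES primitives (these calls exist but are deprecated since OpenSSL 3.0, so treat this as illustrative only):
#include <openssl/aes.h>
/* Obfuscate one 16-byte (128-bit) timestamp block; the server reverses it
 * with AES_set_decrypt_key()/AES_decrypt() using the same shared key. */
void obfuscate_timestamp(const unsigned char ts[16], unsigned char out[16],
                         const unsigned char key[16])
{
    AES_KEY aes;
    AES_set_encrypt_key(key, 128, &aes);
    AES_encrypt(ts, out, &aes);
}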
Somewhere between linear congruential generators and encryption functions there are hashes that can convert linear counts into passable pseudorandom numbers.
If you happen to have 128-bit integer types handy (e.g., __int128 in GCC when building for a 64-bit target), or are willing to implement such long multiplies by hand, then you could extend the construction used in SplitMix64. I did a fairly superficial search and came up with the following parameters:
uint128_t mix(uint128_t x) {
uint128_t m0 = (uint128_t)0xecfb1b9bc1f0564f << 64
| 0xc68dd22b9302d18d;
uint128_t m1 = (uint128_t)0x4a4cf0348b717188 << 64
| 0xe2aead7d60f8a0df;
x ^= x >> 59;
x *= m0;
x ^= x >> 60;
x *= m1;
x ^= x >> 84;
return x;
}
and its inverse:
uint128_t unmix(uint128_t x) {
uint128_t im0 = (uint128_t)0x367ce11aef44b547 << 64
| 0x424b0c012b51d945;
uint128_t im1 = (uint128_t)0xef0323293e8f059d << 64
| 0x351690f213b31b1f;
x ^= x >> 84;
x *= im1;
x ^= x >> 60 ^ x >> (2 * 60);
x *= im0;
x ^= x >> 59 ^ x >> (2 * 59);
return x;
}
I'm not sure if you wanted just a random sequence or a way to obfuscate an arbitrary timestamp (since you said you wanted to decode the values, they must be more interesting than a linear counter), but one derives from the other simply enough:
uint128_t encode(uint128_t time, uint128_t salt) {
return mix((time + 1) * salt);
}
uint128_t generate(uint128_t salt) {
static uint128_t t = 0;
return encode(t++, salt);
}
static uint128_t inv(uint128_t d) {
uint128_t i = d;
while (i * d != 1) {
i *= 2 - i * d;
}
return i;
}
uint128_t decode(uint128_t etime, uint128_t salt) {
return unmix(etime) * inv(salt) - 1;
}
Note that salt chooses one of 2^127 sequences of non-repeating 128-bit values (we lose one bit because salt must be odd), but there are (2^128)! possible sequences that could have been generated. Elsewhere I'm looking at extending the parameterisation so that more of these sequences can be visited, but I started goofing around with the above method of increasing the randomness of the sequence to hide any problems where the parameters could pick not-so-random (but provably distinct) sequences.
Obviously uint128_t isn't a standard type, and so my answer is not C, but you can use either a bignumber library or a compiler extension to make the arithmetic work. For clarity I relied on the compiler extension. All the operations rely on C-like unsigned overflow behaviour (take the low-order bits of the arbitrary-precision result).
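For instance, with GCC or Clang on a 64-bit target the type used above can be supplied like this (a toolchain assumption, not standard C):
typedef unsigned __int128 uint128_t;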
I need to be able to use floating-point arithmetic under my dev environment in C (CPU: ~12 MHz Motorola 68000). The standard library is not present, meaning it is a bare-bones C and no - it isn't gcc due to several other issues
I tried getting the SoftFloat library to compile, and one other 68k-specific FP library (the name of which escapes me at this moment), but their dependencies cannot be resolved for this particular platform - mostly due to libc deficiencies. I spent about 8 hrs trying to overcome the linking issues, until I got to a point where I knew I couldn't get any further.
However, it took mere half an hour to come up with and implement the following set of functions that emulate floating-point functionality sufficiently for my needs.
The basic idea is that both the fractional and non-fractional parts are 16-bit integers, thus there is no bit manipulation.
The non-fractional part has a range of [-32767, 32767] and the fractional part [-0.9999, +0.9999] - which gives us 4 digits of precision (good enough for my floating-point needs - albeit wasteful).
It looks to me like this could be used to make a faster, smaller - just 2 bytes big - alternative version of a float with ranges [-99, +99] and [-0.9, +0.9].
The question here is, what other techniques - other than IEEE - are there to make an implementation of basic floating-point functionality (+ - * /) using fixed-point functionality?
Later on, I will need some basic trigonometry, but there are lots of resources on net for that.
Since the HW has 2 MBs of RAM, I don't really care if I can save 2 bytes per soft-float (say - by reserving 9 vs 7 bits in an int). Thus - 4 bytes are good enough.
Also, from brief looking at the 68k instruction manual (and the cycle costs of each instruction), I made few early observations:
bit-shifting is slow, and unless performance is of critical importance (not the case here), I'd prefer easy debugging of my soft-float library versus 5-cycles-faster code. Besides, since this is C and not 68k ASM, it is obvious that speed is not a critical factor.
8-bit operands are as slow as 16-bit (give or take a cycle in most cases), thus it looks like it does not make much sense to compress floats for the sake of performance.
What improvements / approaches would you propose to implement floating-point in C using fixed-point without any dependency on other library/code?
Perhaps it would be possible to use a different approach and do the operations on frac & non-frac parts at the same time?
Here is the code (tested only using the calculator), please ignore the C++ - like declaration and initialization in the middle of functions (I will reformat that to C-style later):
inline int Pad (int f) // Pad the fractional part to 4 digits
{
if (f < 10) return f*1000;
else if (f < 100) return f*100;
else if (f < 1000) return f*10;
else return f;
}
// We assume fractional parts are padded to full 4 digits
inline void Add (int & b1, int & f1, int b2, int f2)
{
b1 += b2;
f1 +=f2;
if (f1 > 9999) { b1++; f1 -=10000; }
else if (f1 < -9999) { b1--; f1 +=10000; }
f1 = Pad (f1);
}
inline void Sub (int & b1, int & f1, int b2, int f2)
{
// 123.1652 - 18.9752 = 104.1900
b1 -= b2; // 105
f1 -= f2; // -8100
if (f1 < 0) { b1--; f1 +=10000; }
f1 = Pad (f1);
}
// ToDo: Implement a multiplication by float
inline void Mul (int & b1, int & f1, int num)
{
// 123.9876 * 251 = 31120.8876
b1 *=num; // 30873
long q = f1*num; //2478876
int add = q/10000; // 247
b1+=add; // 31120
f1 = q-(add*10000);//8876
f1 = Pad (f1);
}
// ToDo: Implement a division by float
inline void Div (int & b1, int & f1, int num)
{
// 123.9876 / 25 = 4.959504
int b2 = b1/num; // 4
long q = b1 - (b2*num); // 23
f1 = ((q*10000) + f1) / num; // (23000+9876) / 25 = 9595
b1 = b2;
f1 = Pad (f1);
}
You are thinking in the wrong base for a simple fixed-point implementation. It is much easier if you use binary bits for the fractional part too, e.g. using 16 bits for the integer part and 16 bits for the fraction (range -32767/32767, precision of 1/2^16, which is a lot higher precision than you have).
The best part is that addition and subtraction are simple (just add the two parts together). Multiplication is a little bit trickier: you have to be aware of overflow and so it helps to do the multiplication in 64 bit. You also have to shift the result after the multiplication (by however many bits are in your decimal).
typedef int fixed16;
fixed16 mult_f(fixed16 op1, fixed16 op2)
{
    /* do the multiply in 64 bit so the intermediate product doesn't
     * overflow; you may need to do something tricky with upper and lower
     * halves if you don't have native 64 bit, but the compiler might do
     * it for us if we are lucky
     */
    int64_t tmp;
    tmp = ((int64_t)op1 * op2) >> 16;
    /* add in error handling for overflow if you wish - this just wraps */
    return (fixed16)(tmp & 0xFFFFFFFF);
}
Division is similar.
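Under the same Q16.16 assumptions, a division sketch widens the dividend before the pre-shift (no rounding, no divide-by-zero check):
fixed16 div_f(fixed16 op1, fixed16 op2)
{
    /* widen first so (op1 << 16) cannot overflow */
    int64_t tmp = ((int64_t)op1 << 16) / op2;
    return (fixed16)tmp;
}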
Someone might have implemented almost exactly what you need (or that can be hacked to make it work) that's called libfixmath
If you decide to use fixed point, the whole number (i.e. both the int and fractional parts) should be in the same base. Using binary for the int part and decimal for the fractional part as above is not very optimal and slows down the calculation; with binary fixed point you only need to shift an appropriate amount after each operation instead of long adjustments like in your idea. If you want to use Q16.16 then libfixmath, as dave mentioned above, is a good choice. If you want a different precision or point position, such as Q14.18 or Q19.13, then write your own library or modify some library for your own use. Some examples:
BoostGSoC15/fixed_point
https://github.com/johnmcfarlane/cnl
See also What's the best way to do fixed-point math?
If you want a larger range then floating point may be the better choice. Write a library to your own requirements; choose a format that is easy to implement and that achieves good performance in software. There is no need to follow the IEEE 754 specification (which is only fast with hardware implementations, due to the odd number of bits and the strange position of the exponent bits) unless you intend to exchange data with other devices. For example, a format of exp.sign.significand with 7 exponent bits followed by a sign bit and then 24 bits of significand: the exponent doesn't need to be biased, so to get the base only an arithmetic shift by 25 is needed, and the sign bit will also be extended. But in case the shift is slower than a subtraction, then excess-n is better.
Given a N-dimensional vector of small integers is there any simple way to map it with one-to-one correspondence to a large integer number?
Say, we have N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"
Will this new 1-dimensional space possess some very special useful properties?
For the above example you have 3 * 32 = 96 bits of information, so without any a priori knowledge you need 96 bits for the equivalent long integer.
However, if you know that your x1, x2, x3, values will always fit within, say, 16 bits each, then you can pack them all into a 48 bit integer.
In either case the technique is very simple you just use shift, mask and bitwise or operations to pack/unpack the values.
Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:
uint8_t vector[3] = { 1, 2, 3 };
then you can join them into a single (24-bit number) like so:
uint32_t all = (vector[0] << 16) | (vector[1] << 8) | vector[2];
This number would, if printed using this statement:
printf("the vector was packed into %06x", (unsigned int) all);
produce the output
the vector was packed into 010203
The reverse operation would look like this:
uint8_t v2[3];
v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;
Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.
If you have sets Si, i=1..n of size Ci = |Si|, then the cartesian product set S = S1 x S2 x ... x Sn has size C = C1 * C2 * ... * Cn.
This motivates an obvious way to do the packing one-to-one. If you have elements e1,...,en from each set, each in the range 0 to Ci-1, then you give the element e=(e1,...,en) the value e1+C1*(e2 + C2*(e3 + C3*(...Cn*en...))).
You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.
In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.
If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.
(For example, one can map (n,m) to (max(n,m)-1)^2 + k where k=n if n<=m and k=n+m if n>m--you can draw this as a picture of filling in a square like so:
1 2 5   | draw along the edge of the square this way
4 3 6   v
9 8 7
if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)
I'm writing this without having time to check the details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but I hope it helps.
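As a toy illustration of the idea (my example, with tiny coprime moduli 3 and 5, so the representable range is only 0..14):
/* x is stored as the pair (x % 3, x % 5) and reconstructed below;
 * 10 mod 3 == 1 and 10 mod 5 == 0, while 6 mod 3 == 0 and 6 mod 5 == 1 */
unsigned crt_reconstruct(unsigned r3, unsigned r5)
{
    return (10 * r3 + 6 * r5) % 15; /* e.g. x = 7 -> (1, 2) -> 10 + 12 = 22 -> 7 */
}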
To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:
X = e[n];
X *= MAX_E[n-1] + 1;
X += e[n-1];
/* ... */
X *= MAX_E[0] + 1;
X += e[0];
And unpack them with:
e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);
e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);
/* ... */
e[n] = X;
(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.
The shifting / masking implementations given in the other answers are a generalisation of this, for cases where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise-or and the modulus with a bitwise-and).
There are some totally non-portable ways to make this really fast using packed unions and direct memory accesses. That you really need this kind of speed is suspicious; methods using shifts and masks should be fast enough for most purposes. If not, consider using specialized processors like GPUs, for which vector support is optimized (parallel).
This naive storage does not possess any useful property that I can foresee, except that you can perform some computations (add, sub, logical bitwise operators) on the three coordinates at once, as long as you use positive integers only and you don't overflow for add and sub.
You'd better be quite sure you won't overflow (or won't go negative for sub), or the vector will become garbage.
#include <stdint.h> // for uint8_t
long x;
uint8_t *p = (uint8_t *)&x; /* cast needed: &x is a long*, not a uint8_t* */
or
union X {
long L;
uint8_t A[sizeof(long)/sizeof(uint8_t)];
};
works if you don't care about the endianness. In my experience compilers generate better code with the union because it doesn't set off their "you took the address of this, so I must keep it in RAM" rules as quickly. Those rules will get set off if you try to index the array with something the compiler can't optimize away.
If you do care about the endianness then you need to mask and shift.
I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.
As for useful properties, these mappings are related to Gray codes.
Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.