I want to create a struct in Swift that has a small fixed number of values (say 16 floats) as instance data. The struct must not store these values on the heap, so that the address of an instance of the struct is the address of the instance vars. The values must also be accessible internally to the struct via subscript, as Arrays are.
In C you would simply define this kind of thing thusly:
struct Matrix4x4 {
float elements[16];
...
} myMatrix;
With this code, sizeof(Matrix4x4) == 64 and also &myMatrix == &myMatrix.elements[0]. In Swift, if I analogously define the elements variable as type [Float], the matrix instance only contains a pointer to the array, since the Array<Float> instance is an object stored on the heap.
Is there a way in Swift to get static allocation of the instance vars without abandoning the convenience and efficiency of array-like subscripting access?
At present, this is not possible in "pure Swift". There is a long discussion
on the swift-evolution mailing list starting at
[swift-evolution] Proposal: Contiguous Variables (A.K.A. Fixed Sized Array Type)
which asks for such a feature, e.g. to pass a matrix structure to C functions.
As far as I can see, the suggestion was well-received, but nothing concrete is planned
as of now, and it is not listed in the
currently active Swift proposals.
A C array
float elements[16];
is imported to Swift as a tuple with 16 components:
public var elements: (Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float)
and at present this seems to be the only way to define a fixed-sized structure with a given memory layout.
Joe Groff from Apple writes at
[swift-users] Mapping C semantics to Swift
Swift structs have unspecified layout. If you depend on a specific layout, you should define the struct in C and import it into Swift for now.
and later in that discussion:
You can leave the struct defined in C and import it into Swift. Swift will respect C's layout.
If the matrix type is defined in a C header file (for the sake of simplicity I am using
a 2x2 matrix as example now)
// matrix.h:
typedef struct Matrix2x2 {
float elements[4];
} Matrix2x2;
then it is imported to Swift as
public struct Matrix2x2 {
public var elements: (Float, Float, Float, Float)
public init()
public init(elements: (Float, Float, Float, Float))
}
As mentioned above, Swift preserves the C memory layout, so that the matrix, its
elements, and the first element all have the same address:
var mat = Matrix2x2(elements: (1, 2, 3, 4))
print(sizeofValue(mat)) // 16
withUnsafePointer(&mat) { print($0) } // 0x00007fff5fbff808
withUnsafePointer(&mat.elements) { print($0) } // 0x00007fff5fbff808
withUnsafePointer(&mat.elements.0) { print($0) } // 0x00007fff5fbff808
However, tuples are not subscriptable, and that makes sense if the tuple members have
different types. There is another discussion on the swift-evolution mailing list
[swift-evolution] CollectionType on uniform tuples [forked off Contiguous Variables]
to treat "uniform tuples" as collections, which would allow subscripting.
Unfortunately, this hasn't been implemented yet.
There are some methods to access tuple members by index, e.g. using Mirror()
or withUnsafe(Mutable)Pointer().
Here is a possible solution for Swift 3 (Xcode 8), which seems to work well
and involves only little overhead. The "trick" is to define C functions which
return a pointer to the element storage:
// matrix.h:
// Constant pointer to the matrix elements:
__attribute__((swift_name("Matrix2x2.pointerToElements(self:)")))
static inline const float * _Nonnull matrix2x2PointerToElements(const Matrix2x2 * _Nonnull mat)
{
return mat->elements;
}
// Mutable pointer to the matrix elements:
__attribute__((swift_name("Matrix2x2.pointerToMutableElements(self:)")))
static inline float * _Nonnull pointerToMutableElements(Matrix2x2 * _Nonnull mat)
{
return mat->elements;
}
We need two variants to make the proper value semantics work (subscript setter requires
a variable, subscript getter works with constant or variable).
The "swift_name" attribute makes the compiler import these functions as member
functions of the Matrix2x2 type, compare
SE-0044 Import as member
Now we can define the subscript methods in Swift:
extension Matrix2x2 {
public subscript(idx: Int) -> Float {
get {
precondition(idx >= 0 && idx < 4)
return pointerToElements()[idx]
}
set(newValue) {
precondition(idx >= 0 && idx < 4)
pointerToMutableElements()[idx] = newValue
}
}
}
and everything works as expected:
// A constant matrix:
let mat = Matrix2x2(elements: (1, 2, 3, 4))
print(mat[0], mat[1], mat[2], mat[3]) // 1.0 2.0 3.0 4.0
// A variable copy:
var mat2 = mat
mat2[0] = 30.0
print(mat2) // Matrix2x2(elements: (30.0, 2.0, 3.0, 4.0))
Of course you could also define matrix-like subscript methods
public subscript(row: Int, col: Int) -> Float
in a similar manner.
As the answer above alludes to, you can use a combination of withUnsafeMutableBytes() and assumingMemoryBound(to:) to treat the C array as a Swift array of Floats within the scope of the call.
withUnsafeMutableBytes(of: &mymatrix.elements) { rawPtr in
let floatPtr = rawPtr.baseAddress!.assumingMemoryBound(to: Float.self)
// Use the floats (with no bounds checking)
// ...
for i in 0..<10 {
floatPtr[i] = 42.0
}
}
I need to call a legacy C function (from Swift) that expects a 3D array of Doubles as an argument. I am fairly new to Swift and have begun converting a large ObjC and C code base written for iOS and Mac over to Swift. The C code does a lot of complex astronomical math for which Swift is just too cumbersome. I will not convert those functions, but I need to use them from Swift.
The C function is declared like this and the .H file is visible to swift:
void readSWEDayData(double dData[DATA_ROWS_PER_DAY][NUM_PLANET_ELEMENTS][NUM_ELEMENTS_PER_PLANET]);
The Constants used in the declaration are defined to be:
DATA_ROWS_PER_DAY = 1
NUM_PLANET_ELEMENTS = 35
NUM_ELEMENTS_PER_PLANET = 4
I am struggling with declaring the array of doubles in a way that Swift will allow to be passed to the C function. I've tried several approaches.
First Approach:
I declare the array and call it like so:
var data = Array(repeating: Double(EPHEMERIS_NA), count:Int(DATA_ROWS_PER_DAY * NUM_PLANET_ELEMENTS * NUM_ELEMENTS_PER_PLANET))
readSWEDayData(&data)
I get this error: Cannot convert value of type 'UnsafeMutablePointer' to expected argument type 'UnsafeMutablePointer<((Double, Double, Double, Double),...
Second Approach:
If I declare the array this way:
var data = [(Double, Double, Double, Double)](repeating: (EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA), count: Int(NUM_PLANET_ELEMENTS))
readSWEDayData(&data)
I get this error: Cannot convert value of type 'UnsafeMutablePointer<(Double, Double, Double, Double)>' to expected argument type 'UnsafeMutablePointer<((Double, Double, Double, Double),
So, how the heck does one declare a 3D Array in Swift of a specific size so that it can be passed to a C Function?
The function needs an UnsafeMutablePointer to a 35-tuple of things, where each of those things is a 4-tuple of Doubles. Yes, C arrays translate to tuples in Swift, because Swift doesn't have fixed-size arrays. You could do:
var giantTuple = (
(EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA),
(EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA),
(EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA),
// 32 more times...
)
readSWEDayData(&giantTuple)
But I don't think you'd like that. You can create an array, and use some pointer magic to convert that to a tuple, as discussed in this Swift Forums post. In fact, that post is highly relevant to your situation.
To save some typing, we can write some type aliases first:
typealias Tuple35<T> = (T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T)
typealias Double4x35 = Tuple35<(Double, Double, Double, Double)>
Then we can do:
var giantTuple = Array(repeating: (EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA), count: Int(NUM_PLANET_ELEMENTS)).withUnsafeBytes { p in
p.bindMemory(to: Double4x35.self)[0]
}
readSWEDayData(&giantTuple)
This works because tuples and arrays have essentially the same "layout" in memory.
Note that I "cheated" a little bit here, since DATA_ROWS_PER_DAY is 1, you can just create one such giantTuple, and get a pointer to it. However, if it is greater than 1, you'd have to do something like:
var giantTuples = Array(repeating:
Array(repeating: (EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA, EPHEMERIS_NA), count: Int(NUM_PLANET_ELEMENTS)).withUnsafeBytes { p in
p.bindMemory(to: Double4x35.self)[0]
},
count: Int(DATA_ROWS_PER_DAY))
readSWEDayData(&giantTuples)
To convert from the giant tuple back to an array, you can do something like this:
// converting the first giantTuples in "giantTuples" as an example
let arrayOf4Tuples = asCollection(giantTuples[0], Array.init)
let finalArray = arrayOf4Tuples.map { asCollection($0, Array.init) }
// these are adapted from the Swift forum thread
// you'll need two of these, because you have 2 types of tuples
// yes, working with C arrays is hard :(
func asCollection<T, E>(_ tuple: Tuple35<E>, _ perform: (UnsafeBufferPointer<E>)->T) -> T {
return withUnsafeBytes(of: tuple) { ptr in
let buffer = ptr.bindMemory(to: (E.self))
return perform(buffer)
}
}
func asCollection<T, E>(_ tuple: (E, E, E, E), _ perform: (UnsafeBufferPointer<E>)->T) -> T {
return withUnsafeBytes(of: tuple) { ptr in
let buffer = ptr.bindMemory(to: (E.self))
return perform(buffer)
}
}
Swift 5 can only interoperate with fixed-size multidimensional C arrays through tuples of explicitly declared structure (see Sweeper's answer above). I want to avoid that so my code stays flexible when the C library changes, so I opted to write a wrapper for the C function that makes it appear to Swift as taking a one-dimensional array.
This was necessary because the constants used in the C code change whenever readSWEDayData grows its arrays to support additional elements, and tuple declarations like this:
typealias Double4x35 = Tuple35<(Double, Double, Double, Double)>
will DEFINITELY break in a way that will be hard to find.
So my C wrapper function looks like this:
void readSWEDayDataForSwift(double *dData) {
readSWEDayData((double (*)[NUM_PLANET_ELEMENTS][NUM_ELEMENTS_PER_PLANET])dData);
}
This makes it easy to call from Swift:
var data = Array(repeating: Double(EPHEMERIS_NA), count:Int(DATA_ROWS_PER_DAY * NUM_PLANET_ELEMENTS * NUM_ELEMENTS_PER_PLANET))
readSWEDayDataForSwift(&data)
I was surprised that this far into Swift's evolution there is no better way to do this!
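The wrapper above relies on a C multidimensional array being one contiguous row-major block, so dData[r][p][e] sits at flat index (r * NUM_PLANET_ELEMENTS + p) * NUM_ELEMENTS_PER_PLANET + e. Here is a minimal sketch of that equivalence. fillDayData is a hypothetical stand-in for readSWEDayData, flatIndex is an illustrative helper, and the constants simply mirror the values given in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the question; the real ones live in the library header. */
#define DATA_ROWS_PER_DAY 1
#define NUM_PLANET_ELEMENTS 35
#define NUM_ELEMENTS_PER_PLANET 4

/* Hypothetical stand-in for readSWEDayData: stores a recognizable value per slot. */
static void fillDayData(double dData[DATA_ROWS_PER_DAY][NUM_PLANET_ELEMENTS][NUM_ELEMENTS_PER_PLANET])
{
    for (size_t r = 0; r < DATA_ROWS_PER_DAY; r++)
        for (size_t p = 0; p < NUM_PLANET_ELEMENTS; p++)
            for (size_t e = 0; e < NUM_ELEMENTS_PER_PLANET; e++)
                dData[r][p][e] = 100.0 * (double)p + (double)e;
}

/* Same shape as the wrapper above: accept a flat buffer, cast it back to 3D. */
static void fillDayDataFlat(double *flat)
{
    assert(flat != NULL);
    fillDayData((double (*)[NUM_PLANET_ELEMENTS][NUM_ELEMENTS_PER_PLANET])flat);
}

/* Flat index of dData[r][p][e] in row-major order. */
static size_t flatIndex(size_t r, size_t p, size_t e)
{
    return (r * NUM_PLANET_ELEMENTS + p) * NUM_ELEMENTS_PER_PLANET + e;
}
```

On the Swift side the same formula can be used to read data[flatIndex] from the one-dimensional array after the call returns.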
My two cents, hoping it helps others.
I had a similar problem, and I hope this saves someone else some time.
I had to pass down:
path (from String to char *)
title (from String to char *)
columns ([String] to array of char *)
a counter
To sum up, I had to call this C function:
bool OpenXLSXManager_saveIn(const char * cFullPath,
const char * sheetName,
char *const columnTitles[],
double *const values[],
int columnCount);
I started from this excellent post:
// https://oleb.net/blog/2016/10/swift-array-of-c-strings/
and expanded it a bit:
public func withArrayOfCStringsAndValues<R>(
_ args: [String],
_ values: [[Double]],
_ body: ([UnsafeMutablePointer<CChar>?] , [UnsafeMutablePointer<Double>?] ) -> R ) -> R {
var cStrings = args.map { strdup($0) }
cStrings.append(nil)
let cValuesArrr = values.map { (numbers: [Double]) -> UnsafeMutablePointer<Double> in
let pointer = UnsafeMutablePointer<Double>.allocate(capacity: numbers.count)
for (index, value) in numbers.enumerated() {
pointer.advanced(by: index).pointee = value
}
return pointer
}
defer {
cStrings.forEach { free($0) }
for pointer in cValuesArrr{
pointer.deallocate()
}
}
return body(cStrings, cValuesArrr)
}
so I can call:
func passDown(
filePath: String,
sheetName: String,
colNames: [String],
values: [[Double]]
) -> Bool {
let columnCount = Int32(colNames.count)
return withArrayOfCStringsAndValues(colNames, values) {
columnTitles, values in
let retval = OpenXLSXManager_saveIn(filePath, sheetName, columnTitles, values, columnCount)
return retval
}
}
(SORRY for formatting, S.O. formatter has BIG issues ..)
Many GSL functions take arguments as doubles or arrays of doubles. However, much of my data is nested in arrays of structs instead, for example arrays of:
struct A
{
double a;
int b;
};
I could write a wrapper that copies the data into an array of pure doubles or ints. But I was interested in something more elegant to get around this.
Not the answer you want, but since you can't change the GSL interface, if you are looking for performance, I think your best solution is probably to choose data structures that match the job from the start. So maybe something like a struct containing arrays of doubles.
If both the GSL interface and your original data structure are out of your control, then your only option is probably going to be the wrapper that you are thinking about.
If the library functions that you are using could take a 'stride' argument, you could possibly look into structure packing and padding. (But that still wouldn't convert your ints to doubles.)
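To make the stride idea concrete, here is a minimal sketch in plain C of a routine that reads one double member out of each struct without copying. It is not a GSL call: sum_strided and sum_a are hypothetical names, and the stride is expressed in bytes so that the pointer arithmetic stays inside the enclosing array of structs.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    double a;
    int b;
};

/* Sum `n` doubles that sit `stride_bytes` apart, starting at `first`.
   Stepping through a char pointer keeps the arithmetic inside the array. */
static double sum_strided(const void *first, size_t stride_bytes, size_t n)
{
    const char *p = first;
    double total = 0.0;
    assert(n == 0 || first != NULL);
    for (size_t i = 0; i < n; i++)
        total += *(const double *)(p + i * stride_bytes);
    return total;
}

/* Convenience wrapper: sum the `a` member of every struct in the array. */
static double sum_a(const struct A *items, size_t n)
{
    return n == 0 ? 0.0 : sum_strided(&items[0].a, sizeof(struct A), n);
}
```

As noted above, this only helps for the double member; the int member would still need a conversion.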
"...much of my data is nested in arrays of structs instead. ... I
could write a wrapper that copies the data into an array of pure
doubles or ints. But I was interested in something more elegant to get
around this."
There is no need to write a wrapper to copy the data into an array of pure double or int. The fact that you have an array-of-struct already provides convenient direct access to every stored value. With an array-of-struct, accessing each individual struct within the array is a simple matter of indexing the struct you want, e.g. array[n] where n is the wanted element within the array.
In your example array[n].a provides direct access to the double value in member a and array[n].b provides direct access to the int member b for each of the valid index within your array.
A short example of this indexing for direct access to each member of each struct within the array may help. The following initializes array with five struct with the double and int values shown. The int values are then incremented by 1 within the loop before each member of each struct is output, e.g.
#include <stdio.h>
typedef struct A { /* struct A (with a typedef for convenience) */
double a;
int b;
} A;
int main (void) {
/* array of struct A */
A array[] = {{1.1, 1}, {2.2, 2}, {3.3, 3}, {4.4, 4}, {5.5, 5}};
size_t nelem = sizeof array / sizeof *array; /* no. elements */
for (size_t i = 0; i < nelem; i++) {
array[i].b++; /* increment int/output stored values */
printf ("array[%zu]: {%3.1f, %d}\n", i, array[i].a, array[i].b);
}
}
Example Use/Output
Note how the integer value stored within each struct in the array-of-struct is incremented by 1 before the values in each struct within the array are passed directly to printf as the parameters being output:
$ ./bin/arraystruct
array[0]: {1.1, 2}
array[1]: {2.2, 3}
array[2]: {3.3, 4}
array[3]: {4.4, 5}
array[4]: {5.5, 6}
Your access of each member regardless how you want to use it would be the same. Look things over and let me know if you have further questions.
This is a 2 part question. To give some background, I have a C code as follows:
int c_func(const char* dir, float a, float b, float c, float d )
{
printf("%s\n", dir);
printf("%f\n",a);
printf("%f\n",b);
printf("%f\n",c);
printf("%f\n",d);
return 0;
}
This is a simple function that takes in a string and 4 floats as arguments and prints them out. I am trying to test my Python/C interface. My Python code is as follows:
calling_function = ctypes.CDLL("/home/ruven/Documents/Sonar/C interface/Interface.so")
calling_function.c_func("hello",1, 2, 3, 4)
Now since this works, instead of passing 4 individual floats, I would like to pass in a list of 4 floats. I have tried different code found online to edit my C function so that it takes in a list as one of its parameters, but I can't seem to figure out how to do so, as I am a new programmer and I am not experienced with C.
Question 1: How do I code a C function to accept a list as its arguments?
Question 2: This list of four floats is actually coming from a list of lists in my Python code. After coding the C function, would I be able to use a numpy array called testfv2[0,:] as an input of the C function? testfv2[0,:] is a list of dimensions 1x4 and testfv2 is a list of dimensions 117x4. For now, I would like to pass the data into the C function 1 row at a time, which is why I thought of using testfv2[0,:].
How do I code a C function to accept a list as its arguments?
Short answer, you can't.
Long answer: C does not have lists, but has arrays and pointers.
You have several options then:
int c_func(const char *dir, float abcd[4]) { // using an array with explicit size
int c_func(const char *dir, float abcd[]) { // Using an array (will decay to a pointer at compile-time)
int c_func(const char *dir, float *abcd) { // Using a pointer.
If you will only ever receive 4 floats, I'd suggest the first form. The compiler does not enforce the size (the parameter is still adjusted to a pointer), but it documents the expected length, so any user of your function (and mainly your future self) will know to pass an array of exactly four elements.
Calling your function from Python:
floats = [1.0, 2.0, 3.0, 4.0] # Or whatever values you want to pass, but be sure to give only 4
FloatArray4 = ctypes.c_float * 4 # Define a 4-length array of floats
parameter_array = FloatArray4(*floats) # Define the actual array to pass to your C function
Passing more than 4 values to FloatArray4 raises an IndexError, so the length is checked on the Python side.
As for your second question, if you want dynamically sized arrays (more than 4 elements), you'll have to use one of the other two prototypes for your C function, in which case I advise you to add an extra int argument for the size of the array:
int c_func(const char *dir, float floats[], int size) {
or
int c_func(const char *dir, float *floats, int size) {
You can also use the standard size_t instead of int, it's designed just for that.
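As a minimal sketch of the pointer-plus-size form, here is a variant of the question's c_func that prints however many floats it is given. The name print_floats and its return value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print a label and `count` floats; returns the number of values printed.
   `values` is only a pointer, so the length has to travel separately. */
static size_t print_floats(const char *dir, const float *values, size_t count)
{
    assert(dir != NULL && (count == 0 || values != NULL));
    printf("%s\n", dir);
    for (size_t i = 0; i < count; i++)
        printf("%f\n", values[i]);
    return count;
}
```

A caller that owns a real array can compute the count with sizeof v / sizeof v[0]; inside the function that trick no longer works, because the parameter is a pointer.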
If you want to pass a multidimensional array, you add another pair of brackets:
int c_func(const char *dir, float floats[][4]) { // Note the explicit size for second dimension
but remember that for a C array, all dimensions but the first must be explicitly specified. If the value is constant it won't be an issue; if it is not, you will have to use a pointer instead:
int c_func(const char *dir, float *floats[]) {
or
int c_func(const char *dir, float **floats) {
which are two identical syntaxes (the array will decay to a pointer). Once again, if your dimensions are dynamic, I suggest adding parameters for the sizes.
If you want supplementary dimensions, just repeat that last step.
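The two multidimensional forms expect different memory layouts, which the following sketch tries to show. The function names are hypothetical: sum_fixed_cols takes a contiguous block whose second dimension is part of the type, while sum_row_pointers takes an array of row pointers and needs both sizes passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Rows are contiguous: the second dimension (4) is part of the parameter type. */
static float sum_fixed_cols(float rows[][4], size_t nrows)
{
    float total = 0.0f;
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < 4; c++)
            total += rows[r][c];
    return total;
}

/* Rows are reached through an array of pointers, so both sizes are passed in. */
static float sum_row_pointers(float **rows, size_t nrows, size_t ncols)
{
    float total = 0.0f;
    assert(nrows == 0 || rows != NULL);
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < ncols; c++)
            total += rows[r][c];
    return total;
}
```

Note that a float grid[2][4] cannot be passed directly as float **. The caller first has to build an array of row pointers, for example float *ptrs[2] = {grid[0], grid[1]}.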
I'm trying to pass an three dimensional array to a function like this:
void example( double*** bar ) {
// Stuff
}
int main() {
double[3][2][3] foo;
// Initialize foo
example( foo );
return 0;
}
This causes the gcc to give me "Invalid pointer type". How am I supposed to be doing this? I could just make the entire argument a one-dimensional array and arrange my data to fit with that, but is there a more elegant solution to this?
edit:
In addition, I can't always specify the length of each sub-array, because they may be different sizes. e.g.:
int* foo[] = { { 3, 2, 1 }, { 2, 1 }, { 1 } };
If it helps at all, I'm trying to batch pass inputs for Neurons in a Neural Network. Each Neuron has a different number of inputs.
Just use double*. A multidimensional array is stored contiguously in memory, so you are quite welcome to give it your own stride. This is how bitmaps are passed to OpenGL.
A one-dimensional int array decays into an int pointer when passing it to a function. A multi-dimensional array decays into a pointer to an array of the next lowest dimension, which is
void example(double (*bar)[2][3]);
This syntax can be a bit baffling, so you might choose the equivalent syntax:
void example(double bar[][2][3]) {
// Stuff
}
int main() {
double foo[3][2][3];
example(foo);
return 0;
}
The first dimension does not have to be given, it's that part that is "decaying". (Note that the dimensions of arrays are not given on the type as in Java, but on the array name.)
This syntax works for variable-length arrays (VLAs) as well, as long as you pass the dimensions before the array:
void example(int x, int y, double (*bar)[x][y]) {
// Stuff
}
int main() {
double foo[3][2][3];
example(2, 3, foo);
return 0;
}
This feature requires C99 and is not compatible with C++.
If the array size is fixed, you can use:
void example(double bar[][2][3]) {
}
Otherwise, you can pass the size along with the array into the function:
void example(size_t x, size_t y, size_t z, double bar[x][y][z]) {
}
That can't be done in C the way you're thinking of. If you need a function that operates on variable-size multidimensional arrays, you'll either have to pass the sizes (all but one) explicitly to the function, or make a structure and pass that. I generally always make a structure when a 2D or 3D array is called for, even if they're of fixed size. I think it's just cleaner that way, since the structure documents itself.
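For the case in the question's edit, where each neuron has a different number of inputs, the structure approach could look roughly like this. It is only a sketch: NeuronInputs and sum_batch are hypothetical names, and each entry simply carries its own length next to a pointer to its values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one neuron's inputs, carrying its own length. */
typedef struct {
    size_t count;
    const double *values;
} NeuronInputs;

/* Sum every input of every neuron in the batch. */
static double sum_batch(const NeuronInputs *batch, size_t neurons)
{
    double total = 0.0;
    assert(neurons == 0 || batch != NULL);
    for (size_t n = 0; n < neurons; n++)
        for (size_t i = 0; i < batch[n].count; i++)
            total += batch[n].values[i];
    return total;
}
```

Because the sizes live inside the structure, the function signature does not change when the number of inputs per neuron does.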
I want to store mixed data types in an array. How could one do that?
You can make the array elements a discriminated union, aka tagged union.
struct {
enum { is_int, is_float, is_char } type;
union {
int ival;
float fval;
char cval;
} val;
} my_array[10];
The type member records which member of the union should be used for each array element. So if you want to store an int in the first element, you would do:
my_array[0].type = is_int;
my_array[0].val.ival = 3;
When you want to access an element of the array, you must first check the type, then use the corresponding member of the union. A switch statement is useful:
switch (my_array[n].type) {
case is_int:
// Do stuff for integer, using my_array[n].val.ival
break;
case is_float:
// Do stuff for float, using my_array[n].val.fval
break;
case is_char:
// Do stuff for char, using my_array[n].val.cval
break;
default:
// Report an error, this shouldn't happen
break;
}
It's left up to the programmer to ensure that the type member always corresponds to the last value stored in the union.
Use a union:
union {
int ival;
float fval;
void *pval;
} array[10];
You will have to keep track of the type of each element, though.
Array elements need to have the same size, that is why it's not possible. You could work around it by creating a variant type:
#include <stdio.h>
#define SIZE 3
typedef enum __VarType {
V_INT,
V_CHAR,
V_FLOAT,
} VarType;
typedef struct __Var {
VarType type;
union {
int i;
char c;
float f;
};
} Var;
void var_init_int(Var *v, int i) {
v->type = V_INT;
v->i = i;
}
void var_init_char(Var *v, char c) {
v->type = V_CHAR;
v->c = c;
}
void var_init_float(Var *v, float f) {
v->type = V_FLOAT;
v->f = f;
}
int main(int argc, char **argv) {
Var v[SIZE];
int i;
var_init_int(&v[0], 10);
var_init_char(&v[1], 'C');
var_init_float(&v[2], 3.14);
for( i = 0 ; i < SIZE ; i++ ) {
switch( v[i].type ) {
case V_INT : printf("INT %d\n", v[i].i); break;
case V_CHAR : printf("CHAR %c\n", v[i].c); break;
case V_FLOAT: printf("FLOAT %f\n", v[i].f); break;
}
}
return 0;
}
The size of the union is the size of its largest member, which is 4 bytes on typical platforms where int and float are both 4 bytes wide.
There's a different style of defining the tag-union (by whatever name) that IMO make it much nicer to use, by removing the internal union. This is the style used in the X Window System for things like Events.
The example in Barmar's answer gives the name val to the internal union. The example in Sp.'s answer uses an anonymous union to avoid having to specify the .val. every time you access the variant record. Unfortunately, "anonymous" internal structs and unions are not available in C89 or C99; they were only standardized in C11. Before that they are a compiler extension, and therefore inherently non-portable.
A better way IMO is to invert the whole definition. Make each data type its own struct, and put the tag (type specifier) into each struct.
typedef struct {
int tag;
int val;
} integer;
typedef struct {
int tag;
float val;
} real;
Then you wrap these in a top-level union.
typedef union {
int tag;
integer int_;
real real_;
} record;
enum types { INVALID, INT, REAL };
Now it may appear that we're repeating ourselves, and we are. But consider that this definition is likely to be isolated to a single file, and that we've eliminated the noise of specifying the intermediate .val. before you get to the data.
record i;
i.tag = INT;
i.int_.val = 12;
record r;
r.tag = REAL;
r.real_.val = 57.0;
Instead, it goes at the end, where it's less obnoxious. :D
Another thing this allows is a form of inheritance. Edit: this part is not standard C, but uses a GNU extension.
if (r.tag == INT) {
integer x = r;
x.val = 36;
} else if (r.tag == REAL) {
real x = r;
x.val = 25.0;
}
integer g = { INT, 100 };
record rg = g;
Up-casting and down-casting.
Edit: One gotcha to be aware of is if you're constructing one of these with C99 designated initializers. All member initializers should be through the same union member.
record problem = { .tag = INT, .int_.val = 3 };
problem.tag; // may not be initialized
The .tag initializer can be ignored by an optimizing compiler, because the .int_ initializer that follows aliases the same data area. Even though we know the layout (!), and it should be ok. No, it ain't. Use the "internal" tag instead (it overlays the outer tag, just like we want, but doesn't confuse the compiler).
record not_a_problem = { .int_.tag = INT, .int_.val = 3 };
not_a_problem.tag; // == INT
You can use a void * array, with a separate array of size_t. But you lose the type information.
If you need to keep the type information in some way, keep a third array of int (where the int is an enumerated value). Then code the function that casts depending on the enum value.
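A minimal sketch of the three parallel arrays might look like this. The enum values and the function element_as_double are hypothetical; the function casts each void * according to its tag and uses the size array only as a sanity check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; one per supported element type. */
enum kind { KIND_INT, KIND_DOUBLE, KIND_CHAR };

/* Convert the element at `index` to double by casting according to its tag. */
static double element_as_double(void *const items[], const size_t sizes[],
                                const int kinds[], size_t index)
{
    switch (kinds[index]) {
    case KIND_INT:
        assert(sizes[index] == sizeof(int));
        return (double)*(const int *)items[index];
    case KIND_DOUBLE:
        assert(sizes[index] == sizeof(double));
        return *(const double *)items[index];
    case KIND_CHAR:
        assert(sizes[index] == sizeof(char));
        return (double)*(const char *)items[index];
    default:
        assert(!"unknown kind");
        return 0.0;
    }
}
```

The three arrays have to be kept in step by hand, which is the main cost of this layout compared with a tagged union.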
Union is the standard way to go. But you have other solutions as well. One of those is tagged pointer, which involves storing more information in the "free" bits of a pointer.
Depending on architectures you can use the low or high bits, but the safest and most portable way is using the unused low bits by taking the advantage of aligned memory. For example in 32-bit and 64-bit systems, pointers to int must be multiples of 4 (assuming int is a 32-bit type) and the 2 least significant bits must be 0, hence you can use them to store the type of your values. Of course you need to clear the tag bits before dereferencing the pointer. For example if your data type is limited to 4 different types then you can use it like below
void* tp; // tagged pointer
enum { is_int, is_double, is_char_p, is_char } type;
// ...
uintptr_t addr = (uintptr_t)tp & ~0x03; // clear the 2 low bits in the pointer
switch ((uintptr_t)tp & 0x03) // check the tag (2 low bits) for the type
{
case is_int: // data is int
printf("%d\n", *((int*)addr));
break;
case is_double: // data is double
printf("%f\n", *((double*)addr));
break;
case is_char_p: // data is char*
printf("%s\n", (char*)addr);
break;
case is_char: // data is char
printf("%c\n", *((char*)addr));
break;
}
If you can make sure that the data is 8-byte aligned (like for pointers in 64-bit systems, or long long and uint64_t...), you'll have one more bit for the tag.
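The snippet above only shows the decoding side. As a sketch of the other half, the helpers below pack and unpack a 2-bit tag. The names are hypothetical, and the code assumes that round-tripping a pointer through uintptr_t preserves it, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x03) /* two low bits, free when data is 4-byte aligned */

/* Pack a small tag into the low bits of an aligned pointer. */
static void *tag_pointer(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0); /* pointer must be aligned */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

/* Read the tag back without touching the pointee. */
static unsigned pointer_tag(const void *tp)
{
    return (unsigned)((uintptr_t)tp & TAG_MASK);
}

/* Clear the tag bits so the pointer can be dereferenced again. */
static void *strip_tag(void *tp)
{
    return (void *)((uintptr_t)tp & ~TAG_MASK);
}
```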
This has one disadvantage that you'll need more memory if the data have not been stored in a variable elsewhere. Therefore in case the type and range of your data is limited, you can store the values directly in the pointer. This technique has been used in the 32-bit version of Chrome's V8 engine, where it checks the least significant bit of the address to see if that's a pointer to another object (like double, big integers, string or some object) or a 31-bit signed value (called smi - small integer). If it's an int, Chrome simply does an arithmetic right shift 1 bit to get the value, otherwise the pointer is dereferenced.
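Here is a rough sketch of the small-integer idea, not V8's actual code. It assumes that a low bit of 0 marks an integer and a low bit of 1 marks a pointer, and all function names are hypothetical. Decoding divides by 2 instead of shifting, because right-shifting a negative value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Low bit 0: the word holds a small integer; low bit 1: it holds a pointer. */
static uintptr_t encode_small_int(intptr_t v)
{
    assert(v >= INTPTR_MIN / 2 && v <= INTPTR_MAX / 2); /* one bit goes to the tag */
    return (uintptr_t)v * 2u;
}

static int is_small_int(uintptr_t word)
{
    return (word & 1u) == 0;
}

static intptr_t decode_small_int(uintptr_t word)
{
    assert(is_small_int(word));
    return (intptr_t)word / 2; /* exact, since the encoded value is even */
}

static uintptr_t encode_pointer(void *p)
{
    assert(((uintptr_t)p & 1u) == 0); /* needs at least 2-byte alignment */
    return (uintptr_t)p | 1u;
}

static void *decode_pointer(uintptr_t word)
{
    assert(!is_small_int(word));
    return (void *)(word & ~(uintptr_t)1);
}
```

Small integers never touch the heap in this scheme; only values that do not fit fall back to a real pointer.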
On most current 64-bit systems the virtual address space is still much narrower than 64 bits, hence the most significant bits can also be used as tags. Depending on the architecture you have different ways to use those as tags. ARM, 68k and many others can be configured to ignore the top bits, allowing you to use them freely without worrying about segfaults or anything. From the Wikipedia article linked below:
A significant example of the use of tagged pointers is the Objective-C runtime on iOS 7 on ARM64, notably used on the iPhone 5S. In iOS 7, virtual addresses are 33 bits (byte-aligned), so word-aligned addresses only use 30 bits (3 least significant bits are 0), leaving 34 bits for tags. Objective-C class pointers are word-aligned, and the tag fields are used for many purposes, such as storing a reference count and whether the object has a destructor.
Early versions of MacOS used tagged addresses called Handles to store references to data objects. The high bits of the address indicated whether the data object was locked, purgeable, and/or originated from a resource file, respectively. This caused compatibility problems when MacOS addressing advanced from 24 bits to 32 bits in System 7.
https://en.wikipedia.org/wiki/Tagged_pointer#Examples
On x86_64 you can still use the high bits as tags with care. Of course you don't need to use all of those 16 bits and can leave some out for future-proofing.
In prior versions of Mozilla Firefox they also use small integer optimizations like V8, with the 3 low bits used to store the type (int, string, object... etc.). But since JägerMonkey they took another path (Mozilla’s New JavaScript Value Representation, backup link). The value is now always stored in a 64-bit double precision variable. When the double is a normalized one, it can be used directly in calculations. However if the high 16 bits of it are all 1s, which denote an NaN, the low 32-bits will store the address (in a 32-bit computer) to the value or the value directly, the remaining 16-bits will be used to store the type. This technique is called NaN-boxing or nun-boxing. It's also used in 64-bit WebKit's JavaScriptCore and Mozilla's SpiderMonkey with the pointer being stored in the low 48 bits. If your main data type is floating-point, this is the best solution and delivers very good performance.
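To illustrate the NaN-boxing idea, here is a simplified sketch. It is not the layout used by SpiderMonkey or JavaScriptCore. It assumes IEEE-754 64-bit doubles, reserves the bit patterns whose top 13 bits are all 1 for boxed values, keeps a 3-bit tag in bits 48 to 50, and folds every genuine NaN into one canonical pattern outside that range. All names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t boxed;

/* Sign, exponent, and quiet bit all set: NaN patterns reserved for boxed values. */
#define BOX_MASK  UINT64_C(0xFFF8000000000000)
#define TAG_SHIFT 48
#define TAG_INT32 UINT64_C(1)
#define CANON_NAN UINT64_C(0x7FF8000000000000) /* positive quiet NaN, outside BOX_MASK */

static boxed box_double(double d)
{
    boxed bits;
    if (isnan(d)) /* fold every real NaN into one pattern that is not boxed */
        return CANON_NAN;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static int is_double(boxed v)
{
    return (v & BOX_MASK) != BOX_MASK;
}

static double unbox_double(boxed v)
{
    double d;
    assert(is_double(v));
    memcpy(&d, &v, sizeof d);
    return d;
}

static boxed box_int32(int32_t i)
{
    return BOX_MASK | (TAG_INT32 << TAG_SHIFT) | (uint64_t)(uint32_t)i;
}

static int is_int32(boxed v)
{
    return (v >> TAG_SHIFT) == ((BOX_MASK >> TAG_SHIFT) | TAG_INT32);
}

static int32_t unbox_int32(boxed v)
{
    assert(is_int32(v));
    return (int32_t)(uint32_t)v; /* low 32 bits carry the payload */
}
```

Ordinary doubles pass through unchanged, so floating-point arithmetic needs no unboxing step beyond the is_double check.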
Read more about the above techniques: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations