I have a block of memory which contains data. How can I access the data, even if it is misaligned, platform-independently? The data is mostly 8- and 32-bit values.
If you want complete platform independence, declare an unsigned char * to point to the memory, pick up bytes, and use '|' and '<<' as needed to assemble the values.
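For example, a 32-bit value stored little-endian could be assembled like this (a minimal sketch; the function name and the assumed byte order of the stored data are mine, so adjust them to your format):

#include <stdint.h>

/* Read a 32-bit little-endian value from an arbitrary (possibly
   misaligned) address. Only byte accesses are performed. */
static uint32_t read_u32_le(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}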
You can memcpy the data into an aligned variable:
void f(void *var_16bit_alignment) {
    uint16_t v;
    memcpy(&v, var_16bit_alignment, sizeof(v));
    // accesses to v are aligned now!
}
Note that using uint16_t * as the parameter's type would already be undefined behavior if the pointer can be misaligned, and that the type of v can be adjusted to your needs (uint32_t, ...). Note also that this does not deal with endianness, but it is easy to extend with ntohs or similar.
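For instance, if the stored value is 32-bit big-endian (network order), a sketch combining memcpy with ntohl might look like this (ntohl comes from <arpa/inet.h> on POSIX systems; elsewhere you would assemble the bytes manually, and the function name read_u32_be is just illustrative):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl(); POSIX - adjust for your platform */

uint32_t read_u32_be(const void *unaligned)
{
    uint32_t v;
    memcpy(&v, unaligned, sizeof v);  /* safe regardless of alignment */
    return ntohl(v);                  /* convert big-endian to host order */
}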
For completeness, 8 bit data is already aligned (unless you are using a very weird platform).
Related
I'm working with vectors and matrices right now and it was suggested to me that I should use SSE instead of plain float arrays. However, while reading the definitions of the C intrinsics and the assembly instructions, it looks like there are different versions of some of the functions where the vector has to be "16 byte aligned" and slower versions where the vector isn't aligned. What does having the vector be 16 byte aligned mean? How can I ensure that my vectors are 16 byte aligned?
Alignment ensures that objects are aligned on an address that is a multiple of some power of two. 16-byte-aligned means that the numeric value of the address is a multiple of 16. Alignment is important because CPUs are often less efficient or downright incapable of loading memory that doesn't have the required alignment.
Your ABI determines the natural alignment of types. In general, integer types and floating-point types are aligned to either their own size, or the size of the largest object of that kind that your CPU can treat at once, whichever is smaller. For instance, on 64-bit Intel machines, 32-bit integers are aligned on 4 bytes, 64-bit integers are aligned on 8 bytes, and 128-bit integers are also aligned on 8 bytes.
The alignment of structures and unions is the same as their most aligned field. This means that if your struct contains a field that has a 2-byte alignment and another field that has an 8-byte alignment, the structure will be aligned to 8 bytes.
In C++, you can use the alignof operator, just like the sizeof operator, to get the alignment of a type. In C, the same construct becomes available when you include <stdalign.h>; alternatively, you can use _Alignof without including anything.
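As a quick illustration of the operator (a minimal, self-contained sketch; the exact values printed are implementation-defined):

#include <stdalign.h>
#include <stdio.h>

struct example { char c; double d; };

int main(void)
{
    /* The struct's alignment equals that of its most aligned member. */
    printf("alignof(int)            = %zu\n", alignof(int));
    printf("alignof(double)         = %zu\n", alignof(double));
    printf("alignof(struct example) = %zu\n", alignof(struct example));
    return 0;
}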
Before C11 and C++11 (which added _Alignas and alignas), there was no standard way to force a specific alignment in C or C++, but there are compiler-specific extensions to do it. On Clang and GCC, you can use the __attribute__((aligned(N))) attribute:
struct s_Stuff {
    int var1;
    short var2;
    char padding[10];
} __attribute__((aligned(16)));
(When the same aligned attribute is applied to a variable declaration rather than a type, it sets the alignment of that particular variable.)
Off the top of my head, I'm not sure for Visual Studio, but according to SoronelHaetir, that would be __declspec(align(N)). Not sure where it goes on the struct declaration.
In the context of vector instructions, alignment is important because people tend to create arrays of floating-point values and operate on them, instead of using types that are known to be aligned. However, __m128, __m256 and __m512 (and all of their variants, like __m128i and such) from <immintrin.h>, if your compiler environment has it, are guaranteed to be aligned on the proper boundaries for use with aligned intrinsics.
Depending on your platform, malloc may or may not return memory that is aligned on the correct boundary for vector objects. aligned_alloc was introduced in C11 to address these issues, but not all platforms support it.
- Apple: does not support aligned_alloc; malloc returns objects on the most exigent alignment that the platform supports.
- Windows: does not support aligned_alloc; malloc returns objects aligned on the largest alignment that VC++ will naturally put an object on without an alignment specification; use _aligned_malloc for vector types.
- Linux: malloc returns objects aligned on an 8- or 16-byte boundary; use aligned_alloc (a usage sketch follows below).
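A minimal C11 sketch, assuming your platform actually provides aligned_alloc (e.g. glibc on Linux); note that the requested size must be a multiple of the alignment:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    /* 16 bytes requested, 16-byte alignment: suitable for SSE loads. */
    float *vec = aligned_alloc(16, 4 * sizeof(float));
    if (vec == NULL) {
        perror("aligned_alloc");
        return 1;
    }
    /* ... use vec ... */
    free(vec);   /* unlike _aligned_malloc, plain free() is correct here */
    return 0;
}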
In general, it's possible to request slightly more memory and perform the alignment yourself with minimal penalties (aside from the fact that you're on your own to write a free-like function that will accept a pointer returned by this function):
#include <stdint.h>   /* intptr_t */
#include <stdlib.h>   /* malloc */

void* aligned_malloc(size_t size, size_t alignment) {
    intptr_t alignment_mask = alignment - 1;        /* alignment must be a power of two */
    void* memory = malloc(size + alignment_mask);   /* over-allocate by the worst-case slack */
    intptr_t unaligned_ptr = (intptr_t)memory;
    intptr_t aligned_ptr = (unaligned_ptr + alignment_mask) & ~alignment_mask;
    return (void*)aligned_ptr;
}
Purists might argue that treating pointers as integers is evil, but at the time of writing, they probably won't have a practical cross-platform solution to offer in exchange.
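One common way to make the result free-able is to stash the pointer malloc returned just in front of the aligned block. A sketch under the same assumptions as above (the names aligned_malloc2 and aligned_free2 are mine, and the alignment is assumed to be a power of two at least sizeof(void *)):

#include <stdlib.h>
#include <stdint.h>

void *aligned_malloc2(size_t size, size_t alignment)
{
    /* Reserve room for the alignment slack plus one stored pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;     /* remember the original pointer */
    return (void *)aligned;
}

void aligned_free2(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);     /* free the original allocation */
}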
xx-byte alignment means that the variable's memory address modulo xx is 0.
Ensuring that is a compiler-specific operation; Visual C++, for example, has __declspec(align(...)), which works for variables that the compiler allocates (at file or function scope, for example). Alignment is somewhat harder for dynamic memory; you can use _aligned_malloc for that, although your library may already guarantee 16-byte alignment for malloc. It's generally larger alignments that require such a call.
New Edit to improve and focus my answer to the specific query
To ensure data alignment in memory, there are specific functions in C to force this (assuming your data is compatible, i.e. your data matches or fits neatly into your required alignment).
The function to use is _aligned_malloc instead of vanilla malloc.
// Using _aligned_malloc (MSVC, declared in <malloc.h>)
// Note: alignment must be 2^N where N is a positive integer.
size_t alignment = 16;
void *ptr = _aligned_malloc(required_size, alignment);   // required_size = bytes you need
if (ptr == NULL)
{
    printf_s("Error allocating aligned memory.");
    return -1;
}
This will (if it succeeds) force your data to align on the 16 byte boundary and should satisfy the requirements for SSE.
Older answer where I waffle on about struct member alignment, which matters - but is not directly answering the query
To ensure struct member byte alignment, you can be careful how you arrange members in your structs (largest first), or you can set this (to some degree) in your compiler settings, member attributes or struct attributes.
Assuming a 32-bit machine with 4-byte ints: the struct below is still 4-byte aligned in memory (its largest member is 4 bytes), but padded to be 16 bytes in size.
struct s_Stuff {
    int var1;           /* 4 bytes */
    short var2;         /* 2 bytes */
    char padding[10];   /* ensure total struct size is 16 */
};
The compiler usually pads each member to assist with natural alignment, but the padding may be at the end of the struct too. This is struct member data alignment.
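To see where the compiler actually puts the members and any padding, you can print offsetof and sizeof (a self-contained sketch; the exact numbers depend on your compiler and settings):

#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct s_Stuff {
    int var1;           /* 4 bytes */
    short var2;         /* 2 bytes */
    char padding[10];   /* ensure total struct size is 16 */
};

int main(void)
{
    printf("offsetof var1    = %zu\n", offsetof(struct s_Stuff, var1));
    printf("offsetof var2    = %zu\n", offsetof(struct s_Stuff, var2));
    printf("offsetof padding = %zu\n", offsetof(struct s_Stuff, padding));
    printf("sizeof           = %zu\n", sizeof(struct s_Stuff));
    return 0;
}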
Older compilers expose struct member alignment in their project settings, but this is different from data alignment, which relates to the memory allocation and storage of the data. It confuses matters that Borland labels this setting "Data Alignment" while MS uses "Struct member alignment", although both refer specifically to struct member alignment.
To maximise efficiency, you need to code for your hardware (or vector processing in this case), so let's assume 32-bit, 4-byte ints, etc. Then you want to use tight structs to save space, but padded structs may improve speed.
struct s_Stuff {
    float f1;     /* 4 bytes */
    float f2;     /* 4 bytes */
    float f3;     /* 4 bytes */
    short var2;   /* 2 bytes */
};
This struct may be padded so that its members also align to 4-byte multiples. The compiler will do this unless you specify single-byte struct member alignment, so the size ON FILE could be 14 bytes, but in MEMORY each element of an array of this struct would still be 16 bytes (with 2 bytes wasted), with an unknown data alignment (possibly 8 bytes as the malloc default, but not guaranteed; as mentioned above, you can force the data alignment in memory with _aligned_malloc on some platforms).
Also, regarding member alignment in a struct, the compiler will use multiples of the largest member to set the alignment. Or more specifically: a struct is always aligned to its largest member type's alignment requirements.
If you are using a union, you are correct that it is sized and aligned for its largest member.
Check that your compiler settings do not contradict your desired struct member alignment / padding too, or else your structs may differ in size to what you expect.
Now, why is it faster? Alignment allows the hardware to transfer discrete chunks of data and makes the best use of the hardware that passes the data around; that is, the data does not need to be split up or rearranged at every stage of hardware processing.
As a rule, it's best to set your compiler to match your hardware (and platform OS) so that your alignment (and padding) works best with your hardware's processing ability. 32-bit machines usually work best with 4-byte (32-bit) member alignment, but data written to file with 4-byte member alignment can then consume more space than wanted.
Specifically regarding SSE vectors, 4 * 4 bytes is the best way to ensure 16-byte alignment, perhaps like this (and here it is the data alignment that matters):
struct s_data {
    float array[4];
};
or simply an array of floats, or doubles.
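A sketch of how such a 16-byte-aligned block of four floats might be used with aligned SSE loads and stores (GCC/Clang attribute syntax; MSVC would use __declspec(align(16)) instead, and _aligned_malloc for heap allocations; the function name scale4 is mine):

#include <xmmintrin.h>   /* SSE intrinsics */

/* 16-byte-aligned container for four floats. */
struct s_data {
    float array[4];
} __attribute__((aligned(16)));

void scale4(struct s_data *d, float factor)
{
    __m128 v = _mm_load_ps(d->array);          /* aligned load: needs 16-byte alignment */
    v = _mm_mul_ps(v, _mm_set1_ps(factor));    /* multiply all four lanes */
    _mm_store_ps(d->array, v);                 /* aligned store */
}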
I have a function get_picture() that takes a picture. It returns a uint8_t pointer (to where the picture is stored) and takes a pointer to a variable that stores the length of the picture.
Here is the declaration:
uint8_t * get_picture(int *piclength)
Here I call it in main():
unsigned int address, value;
address = (unsigned int)get_picture((int*)& value);
My question is: because address is storing an address (which is positive), should I actually define it as an int?
I'm not sure you understand pointers.
If your function returns a uint8_t *, then you should store it in a uint8_t *, not an int.
As an example:
uint8_t* get_picture(int* piclength);
int piclength;
uint8_t* address;
address = get_picture(&piclength);
If you really want to convert a data-pointer to an integer, use the dedicated typedef instead of some random (and possibly too small) type:
uintptr_t / intptr_t (Optional typedefs in <stdint.h>)
Still, the need is rare, and I don't see it here.
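If you do need the numeric value of a pointer (say, to log it), a minimal sketch of the round trip through uintptr_t (show_address is just an illustrative name):

#include <stdint.h>
#include <inttypes.h>   /* PRIxPTR */
#include <stdio.h>

void show_address(uint8_t *p)
{
    uintptr_t n = (uintptr_t)p;                  /* integer wide enough for a data pointer */
    printf("picture lives at 0x%" PRIxPTR "\n", n);
    uint8_t *back = (uint8_t *)n;                /* converting back recovers the pointer */
    (void)back;
}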
It depends on what you are really after. Your code is fine as is if you want address to contain the address of where that picture lives. Likewise you could use an int, since bits is bits, int is the same number of bits as unsigned int and whatever consumes address can be fed those bits. It makes more sense as a human to think of addresses as unsigned, but the compiler and hardware don't care, bits is bits.
But depending on what you are doing you may want to as mentioned already, preserve this address using a pointer of the same type. See Dragan's answer.
If you want to "see" the address then it depends on how you want to see it, converting it to an unsigned int is one easy and generic way to do it.
Yes, this is very system dependent and the size of int varies by toolchain and target and may or may not completely hold an address for that system, so some masking may be required by the consumer of that variable.
So your code is fine, if I understand the question. Signed or unsigned is in the eye of the beholder; a value is only signed or unsigned for particular operations. Addresses are not themselves signed nor unsigned, they are just bits on an address bus. For a sane compiler, unsigned int and int are the same size and store the same number of bits, so as long as the compiler defines them to be at least the size of the address it uses for a pointer, this will work just fine with int or unsigned int. int feels a little wrong and unsigned int feels right, but those are human emotions; the hardware doesn't care, so long as the bits don't change on their way to the address bus.

Now if, for some reason, the code we don't see prints this variable as a decimal, for example printf("%d\n", address); (why would you printf on a microcontroller?), then it may look strange to humans but it will still be the right decimal interpretation of the bit pattern that is the address. printf("0x%X\n", address); would make more sense and be more generic, and if your printf supports it you could just use printf("%p\n", address); with Dragan's uint8_t * address declaration, which is what many folks here are probably thinking based on classical C training.

Versus: bits are bits and have no meaning whatsoever to the hardware until used, and only for that use case. An address is only an address on the address bus; when doing math on it to compute another address, it is not an address, it is a bit pattern being fed into the ALU, and signed versus unsigned might depend on the operation (add and subtract don't know signed from unsigned; multiply and divide do).
If you choose not to use uint8_t *address as the declaration, then unsigned int "feels" better and is less likely to mess you up (if an (unsigned) int on that compiler has enough bits to store an address in the first place). A signed int feels a little wrong, but technically should work. My rule is to use signed only when you specifically need signed and unsigned everywhere else; it saves a lot of bugs. Unfortunately, traditional C libraries do it the other way around, which made a big mess before the stdint.h stuff came about.
I am working on an 8051 platform which has a 16 bit pointer width.
I have a common code module for handling flash emulation and there's a function that returns the 16 bit start address of a page:
volatile u16_t start_address = find_start_address_of_page( page );
I want to pass this 'address' to a CRC function that wants a u8_t * as a parameter, so I cast it in the function call like so:
(u8_t *)start_address
This generates the warning
Warning[Pe1053]: conversion from integer to smaller pointer
Which confuses me a bit, because a u8_t * is 16 bits wide and my variable is a 16-bit variable. Is it simply that the compiler is warning about an "integer to pointer" conversion in general?
The code works fine, I just want to be sure I'm not missing something silly here...
You write that your 8051 platform has a 16 bit pointer width.
As far as I know, the 8051 has different address ranges for
- the internal RAM in the processor (max 256 Byte)
- external RAM (max 64k)
- Program Memory (max 64k)
The compiler I have worked with (Keil) therefore had at least four different pointer types:
- an 8 bit wide 'data' pointer for the internal RAM;
- a 16 bit wide 'xdata' pointer for the external RAM;
- a 16 bit wide 'code' pointer for the Program Memory;
- a 24 bit wide universal pointer which could be set to point to any of the three memory types (the first byte was used to select the memory type).
The warning text could mean that the compiler wants to convert your 16 bit value to an address in internal RAM which is only 8 bits wide.
If you want to silence the warning, you could use a union to move your information into a different type, e.g.:
union {
    u16_t origType;
    u8_t *newtype;
} u;

u.origType = start_address;

Assuming they are the same size, you can then pass u.newtype to your function.
Since start_address is a variable that holds a memory address, you should declare and use it as such, meaning a pointer:
volatile u16_t *start_address = find_start_address_of_page( page );
Of course this also means that your function find_start_address_of_page() has to return a pointer.
By the way, the fact that int and int * are both 16 bits wide (on your processor) is not enough. For example, a pointer to an int on most (all?) 16 bit processors has to be aligned to an even (multiple of 2) address, because of limitations of the assembler instructions and/or the data bus implementation.
In the same way, things like start_address++; increments differently depending if it is an int or an int * (or even a char *). If it is an int (or a char *) it will increment by one, but if it is an int * it will increment by two.
By this I'm trying to show that the compiler makes a lot of checks beyond the number of bits, depending on the type of variable (and the processor abilities).
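A small, generic C illustration of that difference in increment step (not 8051-specific; the addresses printed will vary, only the step sizes matter):

#include <stdio.h>

int main(void)
{
    int  i = 0;
    int *ip = &i;
    char *cp = (char *)&i;

    i++;    /* the integer value goes from 0 to 1          */
    ip++;   /* the pointer advances by sizeof(int) bytes   */
    cp++;   /* the char pointer advances by exactly 1 byte */

    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("&i = %p, ip = %p, cp = %p\n", (void *)&i, (void *)ip, (void *)cp);
    return 0;
}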
start_address is of type u16_t, not a pointer.
If you want to pass it to the CRC function as an address, then try this:
((u8_t *)((u16_t *)start_address))
Let's consider that I have defined a memory area like (Note: uint8 means unsigned char):
uint8 myMemoryArea[1024];
And I have a struct like:
typedef struct
{
uint8 * ptrToMyVar;
uint8 otherVar;
} myStruct_type;
I want to consider myMemoryArea as being an array of myStruct_type, so I would want to perform a random access to the memory area like, for example:
myStruct_type *myPtrToStruct = (myStruct_type *)(&(myMemoryArea[ELEMENT_TO_ACCESS * sizeof(myStruct_type)]));
myPtrToStruct->otherVar = 2;
Is this machine independent code? Should I expect troubles with alignment or padding?
I guess padding is OK here as long as I use sizeof.
Should I ensure that myMemoryArea starts at an address divisible by sizeof(char *), perhaps by defining it as an array of pointers?
There's no guarantee that myMemoryArea will be appropriately aligned. Depending on your CPU and O/S and compiler, you may get crashes or very slow access to misaligned data. (See also: Solve the memory alignment in C interview question that stumped me).
Consider what happens if your variable is declared in this context:
double d1;
uint8 c1;
uint8 myMemoryArea[1024];
uint8 c2;
double d2;
There's every reason to expect d1 to be properly aligned; the compiler would be failing you horribly if it were not. There's no reason to expect any unusual treatment for c1; a single byte can be stored at any alignment. The myMemoryArea data also does not have to be specially aligned; there might be no space around it, and it may well be at an odd address. The c2 variable doesn't need special treatment; d2 will be properly aligned (and there are likely to be 6 bytes of unused space in the data).
If myMemoryArea is on an odd-byte alignment, and you use a RISC machine to access the memory structure, you will most likely get a SIGBUS error. On an Intel machine, you may get very slow access instead.
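If you control the declaration of myMemoryArea, one way to remove the doubt is to request the alignment explicitly. A minimal C11 sketch (the uint8 and myStruct_type definitions are taken from the question; before C11 you would use a compiler extension such as __attribute__((aligned(...)))):

#include <stddef.h>   /* size_t */

typedef unsigned char uint8;   /* as stated in the question */

typedef struct {
    uint8 *ptrToMyVar;
    uint8 otherVar;
} myStruct_type;

/* _Alignas (C11 keyword) gives the buffer at least the struct's alignment. */
static _Alignas(myStruct_type) uint8 myMemoryArea[1024];

myStruct_type *element(size_t i)
{
    return (myStruct_type *)&myMemoryArea[i * sizeof(myStruct_type)];
}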
I wonder why bitfields work with unions/structs but not with a normal variable like int or short.
This works:
struct foo {
int bar : 10;
};
But this fails:
int bar : 10; // "Expected ';' at end of declaration"
Why is this feature only available in unions/structs and not with variables? Isn't it technical the same?
Edit:
If it were allowed, you could for instance make a 3-byte variable without going through a struct/union member each time. This is how I would do it with a struct:
struct int24_t {
    int x : 24 __attribute__((packed));
};

struct int24_t var; // sizeof(var) is now 3
// accessing the value would be easier:
var.x = 123;
This is a subjective question, "Why does the spec say this?" But I'll give it my shot.
Variables in a function normally have "automatic" storage, as opposed to one of the other durations (static duration, thread duration, and allocated duration).
In a struct, you are explicitly defining the memory layout of some object. But in a function, the compiler automatically allocates storage in some unspecified manner to your variables. Here's a question: how many bytes does x take up on the stack?
// sizeof(unsigned) == 4
unsigned x;
It could take up 4 bytes, or it could take up 8, or 12, or 0, or it could get placed in three different registers at the same time, or the stack and a register, or it could get four places on the stack.
The point is that the compiler is doing the allocation for you. Since you are not doing the layout of the stack, you should not specify the bit widths.
Extended discussion: Bitfields are actually a bit special. The spec states that adjacent bitfields get packed into the same storage unit. Bitfields are not actually objects.
You cannot sizeof() a bit field.
You cannot malloc() a bit field.
You cannot &addressof a bit field.
All of these things you can do with objects in C, but not with bitfields. Bitfields are a special thing made just for structures and nowhere else.
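For example, adjacent bitfields share a storage unit, which you can observe through the size of the enclosing struct, while the operations above remain unavailable on the members themselves (the struct and its field names are made up; exact packing is implementation-defined):

#include <stdio.h>

struct flags {
    unsigned ready    : 1;
    unsigned error    : 1;
    unsigned retries  : 4;
    unsigned priority : 2;   /* all eight bits typically share one storage unit */
};

int main(void)
{
    struct flags f = { .ready = 1, .retries = 3, .priority = 2 };
    printf("sizeof(struct flags) = %zu\n", sizeof(struct flags));  /* works on the struct... */
    /* printf("%zu", sizeof f.ready);   ...but not on a bitfield member (compile error) */
    /* unsigned *p = &f.error;          taking a bitfield's address is also an error    */
    (void)f;
    return 0;
}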
About int24_t (updated): It works on some architectures, but not others. It is not even slightly portable.
typedef struct {
int x : 24 __attribute__((packed));
} int24_t;
On Linux ELF/x64, OS X/x86, OS X/x64, sizeof(int24_t) == 3. But on OS X/PowerPC, sizeof(int24_t) == 4.
Note the code GCC generates for loading int24_t is basically equivalent to this:
int result = (((char *) ptr)[0] << 16) |
(((unsigned char *) ptr)[1] << 8) |
((unsigned char *)ptr)[2];
It's 9 instructions on x64, just to load a single value.
Members of a structure or union have relationships between their storage location. A compiler cannot reorder or pack them in clever ways to save space due to strict constraints on the layout; basically the only freedom a compiler has in laying out structures is the freedom to add extra padding beyond the amount that's needed for alignment. Bitfields allow you to manually give the compiler more freedom to pack information tightly by promising that (1) you don't need the address of these members, and (2) you don't need to store values outside a certain limited range.
If you're talking about individual variables rather than structure members, in the abstract machine they have no relationship between their storage locations. If they're local automatic variables in a function and their addresses are never taken, the compiler is free to keep them in registers or pack them in memory however it likes. There would be little or no benefit to providing such hints to the compiler manually.
Because it's not meaningful. Bitfield declarations are used to share and reorganize bits between the fields of a struct. If you have no members, just a single variable, that variable is of a constant size (which is implementation-defined). For example, it's a contradiction to declare a char, which is almost certainly 8 bits wide, as a one- or twelve-bit variable.
If one has a struct QBLOB which combines four 2-bit bitfields into a single byte, every use of that struct represents a savings of three bytes compared with a struct that simply contained four fields of type unsigned char. If one declares an array QBLOB myArray[1000000], such an array will take only 1,000,000 bytes; if QBLOB had been a struct with four unsigned char fields, it would have needed 3,000,000 bytes more. Thus, the ability to use bitfields may represent a big memory savings.
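A sketch of what such a QBLOB might look like (the field names and layout are assumptions; note that unsigned char bitfields are a common extension, since the standard only requires int, unsigned int, and _Bool as bitfield types):

#include <stdio.h>

typedef struct {
    unsigned char a : 2;   /* four 2-bit fields packed into one byte */
    unsigned char b : 2;
    unsigned char c : 2;
    unsigned char d : 2;
} QBLOB;

typedef struct {
    unsigned char a, b, c, d;   /* same information, one byte per field */
} QBLOB_UNPACKED;

int main(void)
{
    printf("sizeof(QBLOB)          = %zu\n", sizeof(QBLOB));           /* typically 1 */
    printf("sizeof(QBLOB_UNPACKED) = %zu\n", sizeof(QBLOB_UNPACKED));  /* 4 */
    return 0;
}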
By contrast, on most architectures, declaring a simple variable to be of an optimally-sized bitfield type could save at most 15 bits as compared with declaring it to be the smallest suitable standard integral type. Since accessing bitfields generally requires more code than accessing variables of standard integral types, there are few cases where declaring individual variables as bit fields would offer any advantage.
There is one notable exception to this principle, though: some architectures include features which can set, clear, and test individual bits even more efficiently than they can read and write bytes. Compilers for some such architectures include a bit type, and will pack eight variables of that type into each byte of storage. Such variables are often restricted to static or global scope, since the specialized instructions that handle them may be restricted to using certain areas of memory (the linker can ensure any such variables get placed where they have to go).
All objects must occupy one or more contiguous bytes or words, but a bitfield is not an object; it's simply a user-friendly way of masking out bits in a word. The struct containing the bitfield must occupy a whole number of bytes or words; the compiler just adds the necessary padding in case the bitfield sizes don't add up to a full word.
There's no technical reason why you couldn't extend C syntax to define bitfields outside of a struct (AFAIK), but they'd be of questionable utility for the amount of work involved.