How to split a 16-bit value into two 8-bit values in C

I don't know if the question is phrased right, but here is an example: the decimal value 25441 is 110001101100001 in binary. How can I split it into two 8-bit values, "01100011" and "01100001" (which are 99 and 97)? I could only think of using bit manipulation to shift it right by 8 (>> 8), but I couldn't work out how to get the 97. Here is my function; it's not a good one, but I hope it helps:
void reversecode(int input[], char result[]) { //input is 25441
    int i;
    for (i = 0; i < 1; i++) {
        result[i] = input[i] >> 8; // shift by 8 bit
        printf("%i", result[i]);   // to print result
    }
}
I was thinking of using a struct, but I have no clue how to start. I'm a beginner in C, so sorry for my bad style. Thanks in advance.

The LSB is obtained simply by masking it out with a bit mask: input[i] & 0xFF.
The code you have posted, input[i] >> 8, gives the next byte above that. However, it also gives whatever happens to be stored in the more significant bytes, in case int is 32 bits. So again you need to mask: (input[i] >> 8) & 0xFF.
Also avoid bit-shifting signed types such as int, because if they hold negative values you invoke poorly-specified behavior, which leads to bugs.
The correct way to mask out the individual bytes of an int is this:
// 16 bit system
uint8_t bytes [sizeof(int)] =
{
    ((uint16_t)i >> 0) & 0xFF, // shift by 0 not needed, of course, just stylistic
    ((uint16_t)i >> 8) & 0xFF,
};

// 32 bit system
uint8_t bytes [sizeof(int)] =
{
    ((uint32_t)i >>  0) & 0xFF,
    ((uint32_t)i >>  8) & 0xFF,
    ((uint32_t)i >> 16) & 0xFF,
    ((uint32_t)i >> 24) & 0xFF,
};
This places the LSB at index 0 in the array, similar to a little-endian representation in memory. Note, however, that the bit-shifts themselves are endianness-independent, and also fast, which is why this is the superior method.
Solutions based on unions or pointer arithmetic depend on endianness and are often buggy (pointer aliasing violations), so they should be avoided; there is no benefit to using them.
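Applied to the value from the question, a minimal sketch (assuming the 16-bit value is held in a uint16_t) might look like this:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t input = 25441;             /* 0x6361 */
    uint8_t msb = (input >> 8) & 0xFF;  /* 0x63 = 99 */
    uint8_t lsb = input & 0xFF;         /* 0x61 = 97 */
    printf("%d %d\n", msb, lsb);        /* prints: 99 97 */
    return 0;
}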

You can use the bit-masking concept, like this:
uint16_t val = 0xABCD;
uint8_t vr = (uint8_t) (val & 0x00FF);
Alternatively, this can also be done with an explicit type cast alone: converting to an 8-bit integer keeps only the least significant 8 bits of the 16-bit value and discards the remaining 8 MSBs (this happens by default when a larger value is assigned to a narrower unsigned type). Do the shifting first, then the cast.
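A small sketch of the shift-then-cast variant described above (the variable names are just for illustration):
uint16_t val = 0xABCD;
uint8_t lo = (uint8_t)val;        /* keeps the low byte: 0xCD */
uint8_t hi = (uint8_t)(val >> 8); /* shift first, then cast: 0xAB */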

Related

Copy low-order bytes of an integer whilst preserving endianness

I need to write a function that copies the specified number of low-order bytes of a given integer into an address in memory, whilst preserving their order.
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val)
I expect the usage to look like this:
uint8_t dest[3];
lo_bytes(dest, 3, 0x44332211);
// Big-endian: dest = 33 22 11
// Little-endian: dest = 11 22 33
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
All arithmetic, including bitwise arithmetic, is defined in terms of the values of the operands, not their representations. This cannot be sufficient for you because you want to obtain a result that differs depending on details of the representation style for type uint32_t.
You can operate on object representations via various approaches, but you still need to know which bytes to operate upon. That calls for some form of detection. If big-endian and little-endian are the only byte orders you're concerned with supporting, then I favor an approach similar to that given in #P__J__'s answer:
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val) {
    static const union { uint32_t i; uint8_t a[4]; } ubytes = { 1 };
    memcpy(dest, (uint8_t *)&val + (1 - ubytes.a[0]) * (4 - no_bytes), no_bytes);
}
The expression (1 - ubytes.a[0]) evaluates to 1 if the representation of uint32_t is big endian, in which case the high-order bytes occur at the beginning of the representation of val. In that case, we want to skip the first 4 - no_bytes of the representation and copy the rest. If uint32_t has a little-endian representation, on the other hand, (1 - ubytes.a[0]) will evaluate to 0, with the result that the memcpy starts at the beginning of the representation. In every case, whichever bytes are copied from the representation of val, their order is maintained. That's what memcpy() does.
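For illustration, a possible way to exercise this version, reusing the expected output from the question (this assumes the lo_bytes() definition above is in scope):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* lo_bytes() as defined above */

int main(void) {
    uint8_t dest[3];
    lo_bytes(dest, 3, 0x44332211);
    /* Big-endian host:    33 22 11
       Little-endian host: 11 22 33 */
    printf("%02X %02X %02X\n", (unsigned)dest[0], (unsigned)dest[1], (unsigned)dest[2]);
    return 0;
}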
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
No, that doesn't even make sense. Anything that cares about a specific characteristic of a platform (e.g. endianness) can't be platform independent.
Example 1 (platform independent):
// Copy the 3 least significant bytes to dest[]
dest[0] = value & 0xFF;
dest[1] = (value >> 8) & 0xFF;
dest[2] = (value >> 16) & 0xFF;
Example 2 (platform independent):
// Copy the 3 most significant bytes to dest[]
dest[0] = (value >> 8) & 0xFF;
dest[1] = (value >> 16) & 0xFF;
dest[2] = (value >> 24) & 0xFF;
Example 3 (platform dependent):
// I want the least significant bytes on some platforms and the most significant bytes on other platforms
#ifdef PLATFORM_TYPE_A
    dest[0] = value & 0xFF;
    dest[1] = (value >> 8) & 0xFF;
    dest[2] = (value >> 16) & 0xFF;
#endif
#ifdef PLATFORM_TYPE_B
    dest[0] = (value >> 8) & 0xFF;
    dest[1] = (value >> 16) & 0xFF;
    dest[2] = (value >> 24) & 0xFF;
#endif
Note that it makes no real difference what the cause of the platform dependence is (whether it's endianness or something else): as soon as you have a platform dependence, you can't have platform independence.
int detect_endianess(void) // 1 if little endian, 0 if big endian
{
    union
    {
        uint16_t u16;
        uint8_t u8[2];
    } val = {.u16 = 0x1122};

    return val.u8[0] == 0x22;
}

void lo_bytes(void *dest, uint8_t no_bytes, uint32_t val)
{
    if (detect_endianess())
    {
        memcpy(dest, &val, no_bytes);
    }
    else
    {
        memcpy(dest, (uint8_t *)(&val) + sizeof(val) - no_bytes, no_bytes);
    }
}

Casting in C: gotchas

I am programming an Atmel SAMD20 in C. I came upon an error that I have now fixed, but I'm not quite sure why it happened in the first place. Can someone point it out to me? (It's probably far too obvious, and I'm going to facepalm later.)
An array of sensors is generating uint16_t data, which I converted to uint8_t to send over I2C. So, this is how I originally wrote it:
for (i = 0; i < SENSBUS1_COUNT; ++i)
{
    write_buffer[ (i*2) ] = (uint8_t) sample_sensbus1[i] & 0xff;
    write_buffer[(i*2)+1] = (uint8_t) sample_sensbus1[i] >> 8;
}
Here, write_buffer is uint8_t and sample_sensbus1 is uint16_t.
This, for some reason, ends up messing up the most significant byte (in most cases, the most significant byte is just 1 (i.e. 0x100)). This, on the other hand, works fine, and is exactly what it should be:
for (i = 0; i < SENSBUS1_COUNT; ++i)
{
    write_buffer[ (i*2) ] = sample_sensbus1[i] & 0xff;
    write_buffer[(i*2)+1] = sample_sensbus1[i] >> 8;
}
Clearly, the implicit cast is smarter than I am.
What is going on?
write_buffer[(i*2)+1] = (uint8_t) sample_sensbus1[i] >> 8;
This is equivalent to:
write_buffer[(i*2)+1] = ((uint8_t) sample_sensbus1[i]) >> 8;
As you see, it does the cast before it does the shift. Your most significant byte is now gone.
This should work, though:
write_buffer[(i*2)+1] = (uint8_t) (sample_sensbus1[i] >> 8);
Your cast converts the uint16_t to uint8_t before it does the shift or mask. It is treated as though you wrote:
write_buffer[ (i*2) ] = ((uint8_t)sample_sensbus1[i]) & 0xff;
write_buffer[(i*2)+1] = ((uint8_t)sample_sensbus1[i]) >> 8;
You might need:
write_buffer[ (i*2) ] = (uint8_t)(sample_sensbus1[i] & 0xff);
write_buffer[(i*2)+1] = (uint8_t)(sample_sensbus1[i] >> 8);
In practice, the uncast version is OK too. Remember, a cast tells the compiler "I know more about this than you do; do as I say". That's dangerous if you don't know more than the compiler. Avoid casts whenever you can.
You might also note that shifting (left or right) by the size of the type in bits (or more) is undefined behaviour. However, ((uint8_t)sample_sensbus1[i]) >> 8 is not undefined behaviour, because the 'usual arithmetic conversions' mean that the result of (uint8_t)sample_sensbus1[i] is converted to int before the shift occurs, and an int cannot be 8 bits wide (it must be at least 16 bits to satisfy the standard), so the shift is not too big.
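A small sketch of the difference, using a hypothetical sample value whose most significant byte is 1:
uint16_t s = 0x0188;               /* hypothetical sensor sample with MSB = 1 */
uint8_t wrong = (uint8_t)s >> 8;   /* cast first: (uint8_t)0x0188 is 0x88, then >> 8 gives 0x00 */
uint8_t right = (uint8_t)(s >> 8); /* shift first: 0x0188 >> 8 is 0x01, then the cast keeps it */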
This is a question of operator precedence. In the first example, you are first converting to uint8_t and are applying the & and >> operators second. In the second example, those are applied before the implicit conversion takes place.
Casting is a unary prefix operator and as such has very high precedence.
(uint8_t) sample_sensbus1[i] & 0xff
parses as
((uint8_t)sample_sensbus1[i]) & 0xff
In this case & 0xff is redundant. But:
(uint8_t) sample_sensbus1[i] >> 8
parses as
((uint8_t)sample_sensbus1[i]) >> 8
Here the cast truncates the number to 8 bits, then >> 8 shifts everything out.
The problem is in this expression:
(uint8_t) sample_sensbus1[i] >> 8;
It is doing the following sequence:
1) Converting sample_sensbus1[i] to uint8_t, effectively truncating it to the 8 least significant bits. This is where you are losing your data.
2) Converting the above to int as part of the usual arithmetic conversions, making an int with only the 8 lower bits set.
3) Shifting the above int right by 8 bits, effectively making the whole expression zero.

Mask or not mask when converting int to byte array?

Say you have an integer and you want to convert it to a byte array. After searching various places I've seen two ways of doing this: one is shift only, and one is shift then mask. I understand the shifting part, but why the masking?
For example, scenario 1:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = someInt >> 24;
someByteArray[1] = someInt >> 16;
someByteArray[2] = someInt >> 8;
someByteArray[3] = someInt;
Scenario 2:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (someInt >> 24) & 0xFF;
someByteArray[1] = (someInt >> 16) & 0xFF;
someByteArray[2] = (someInt >> 8) & 0xFF;
someByteArray[3] = someInt & 0xFF;
Is there a reason for choosing one over the other?
uint8 and uint32 are not standard types in C. I assume they represent 8-bit and 32-bit unsigned integral types, respectively (such as supported by Microsoft compilers as a vendor-specific extension).
Anyways ....
The masking is more general - it ensures the result is between 0 and 0xFF regardless of the actual types of the elements of someByteArray and of someInt.
In this particular case, it makes no difference, since the conversion of uint32 to uint8 is guaranteed to use modulo arithmetic (modulo 0xFF + 0x01, which is equal to 0x100, or 256 in decimal). However, if your code is changed to use variables or arrays of different types, the masking is necessary to ensure the result is between 0 and 255 (inclusive).
With some compilers the masking stops compiler warnings (it effectively tells the compiler that the expression produces a value between 0 and 0xFF, which can be stored in an 8-bit unsigned type). However, some other compilers complain about the act of converting a larger type to an 8-bit type. Because of that, you will sometimes see a third variant, which truly demonstrates a "belts and suspenders" mindset.
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (uint8)((someInt >> 24) & 0xFF);
someByteArray[1] = (uint8)((someInt >> 16) & 0xFF);
someByteArray[2] = (uint8)((someInt >> 8) & 0xFF);
someByteArray[3] = (uint8)(someInt & 0xFF);
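To illustrate the modulo-conversion point with a concrete value (using the standard uint32_t/uint8_t names rather than the question's uint32/uint8), a minimal sketch:
#include <assert.h>
#include <stdint.h>

int main(void) {
    uint32_t someInt = 0x44332211;
    uint8_t a = someInt >> 8;          /* 0x443322 reduced modulo 256 -> 0x22 */
    uint8_t b = (someInt >> 8) & 0xFF; /* explicit mask               -> 0x22 */
    assert(a == b); /* for unsigned operands the mask changes nothing,
                       it only silences some compilers' truncation warnings */
    return 0;
}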

C code that reads a 4-byte little-endian number from a buffer

I encountered this piece of existing C code. I am struggling to understand it.
It supposedly reads a 4-byte unsigned value passed in a buffer (in little-endian format) into a variable of type "long".
This code runs on a 64-bit word size, little-endian x86 machine, where sizeof(long) is 8 bytes.
My guess is that this code is also intended to run on a 32-bit x86 machine, so a variable of type long is used instead of int for the sake of storing a value from four bytes of input data.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer questions below in that context
void read_Value_From_Four_Byte_Buff(char *input)
{
    /* use long so that on a 32 bit machine it can still accommodate 4 bytes */
    long intValueOfInput;

    /* Bitwise AND of input buffer's byte 0 with 0xFF gives MSB or LSB? */
    /* This code seems to assume that assignment will store into the rightmost byte - is that true on an x86 machine? */
    intValueOfInput = 0xFF & input[0];

    /* left shift byte 1 eight times, bitwise "or" places it in the 2nd byte from the right */
    intValueOfInput |= ((0xFF & input[1]) << 8);

    /* similar left shifts in multiples of 8 and bitwise "or" for the next two bytes */
    intValueOfInput |= ((0xFF & input[2]) << 16);
    intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in little-endian format. But from the code it looks like the assumption is byte 0 = MSB and bytes 1, 2, 3 follow down to the LSB. I thought so because the code reads bytes starting from byte 0, and the subsequent bytes (1 onwards) are placed in the target variable after left-shifting. Is that how it is, or am I getting it wrong?
2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy the value from a 4-byte buffer into a long variable?
3) Will the assumption that this code will run on a 64-bit machine have any bearing on how easily I can do this alternatively? I mean, is all this trouble taken to keep it agnostic of word size (I assume it's agnostic of word size now, though I'm not sure)?
Thanks for your enlightenment :-)
You have it backwards. When you left-shift, you're putting bits into more significant positions. So ((0xFF & input[3]) << 24) puts byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.
The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian and I think it might be undefined behavior according to the C standard.
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char *input)
{
    uint32_t x;
    x = 0xFF & input[0];
    x |= (0xFF & input[1]) << 8;
    x |= (0xFF & input[2]) << 16;
    x |= (uint32_t)(0xFF & input[3]) << 24;
    return x;
}
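A quick usage sketch of the rewritten function, assuming the read_value_from_four_byte_buff() definition above is in scope and a hypothetical little-endian buffer:
#include <inttypes.h>
#include <stdio.h>

/* read_value_from_four_byte_buff() as defined above */

int main(void) {
    char buf[4] = {0x11, 0x22, 0x33, 0x44}; /* little-endian encoding of 0x44332211 */
    printf("0x%08" PRIX32 "\n", read_value_from_four_byte_buff(buf)); /* prints 0x44332211 */
    return 0;
}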
From the code, byte 0 is the LSB and byte 3 is the MSB. But there are some typos; the lines should be:
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping the 0xFF masks and using unsigned char as the argument type.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    intValueOfInput |= ((unsigned char)input[i]) << shift;

How to shift an array of bytes by 12-bits

I want to shift the contents of an array of bytes by 12-bit to the left.
For example, starting with this array of type uint8_t shift[10]:
{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A, 0xBC}
I'd like to shift it to the left by 12-bits resulting in:
{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xAB, 0xC0, 0x00}
Hurray for pointers!
This code works by looking ahead 12 bits for each byte and copying the proper bits forward. 12 bits is the bottom half (nybble) of the next byte and the top half of the byte two positions away.
unsigned char length = 10;
unsigned char data[10] = {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0A,0xBC};
unsigned char *shift = data;
while (shift < data+(length-2)) {
    *shift = (*(shift+1)&0x0F)<<4 | (*(shift+2)&0xF0)>>4;
    shift++;
}
*(data+length-2) = (*(data+length-1)&0x0F)<<4;
*(data+length-1) = 0x00;
Justin wrote:
#Mike, your solution works, but does not carry.
Well, I'd say a normal shift operation does just that (called overflow), and just lets the extra bits fall off the right or left. It's simple enough to carry if you wanted to - just save the 12 bits before you start to shift. Maybe you want a circular shift, to put the overflowed bits back at the bottom? Maybe you want to realloc the array and make it larger? Return the overflow to the caller? Return a boolean if non-zero data was overflowed? You'd have to define what carry means to you.
unsigned char overflow[2];
*overflow = (*data&0xF0)>>4;
*(overflow+1) = (*data&0x0F)<<4 | (*(data+1)&0xF0)>>4;
while (shift < data+(length-2)) {
    /* normal shifting */
}
/* now would be the time to copy it back if you want to carry it somewhere */
*(data+length-2) = (*(data+length-1)&0x0F)<<4 | (*(overflow)&0x0F);
*(data+length-1) = *(overflow+1);
/* You could return a 16-bit carry int,
* but endian-ness makes that look weird
* if you care about the physical layout */
unsigned short carry = *(overflow+1)<<8 | *overflow;
Here's my solution, but even more importantly my approach to solving the problem.
I approached the problem by:
drawing the memory cells and drawing arrows from the destination to the source,
making a table showing the above drawing,
and labeling each row in the table with the relative byte address.
This showed me the pattern:
let iL be the low nybble (half byte) of a[i]
let iH be the high nybble of a[i]
iH = (i+1)L
iL = (i+2)H
This pattern holds for all bytes.
Translating into C, this means:
a[i] = (iH << 4) OR iL
a[i] = ((a[i+1] & 0x0f) << 4) | ((a[i+2] & 0xf0) >> 4)
We now make three more observations:
since we carry out the assignments left to right, we don't need to store any values in temporary variables.
we will have a special case for the tail: all 12 bits at the end will be zero.
we must avoid reading undefined memory past the array. Since we never read more than a[i+2], this only affects the last two bytes.
So, we
handle the general case by looping for N-2 bytes and performing the general calculation above
handle the next-to-last byte by setting iH = (i+1)L
handle the last byte by setting it to 0
Given a with length N, we get:
for (i = 0; i < N - 2; ++i) {
    a[i] = ((a[i+1] & 0x0f) << 4) | ((a[i+2] & 0xf0) >> 4);
}
a[N-2] = (a[N-1] & 0x0f) << 4;
a[N-1] = 0;
And there you have it... the array is shifted left by 12 bits. It could easily be generalized to shifting N bits, noting that there will be M assignment statements where M = number of bits modulo 8, I believe.
The loop could be made more efficient on some machines by translating to pointers:
for (p = a, p2 = a+N-2; p != p2; ++p) {
    *p = ((*(p+1) & 0x0f) << 4) | ((*(p+2) & 0xf0) >> 4);
}
and by using the largest integer data type supported by the CPU.
(I've just typed this in, so now would be a good time for somebody to review the code, especially since bit twiddling is notoriously easy to get wrong.)
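A quick way to check the loop above is to run it on the example array from the question; a minimal, self-contained sketch:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a[10] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x0A,0xBC};
    size_t N = sizeof a;

    /* the general case, then the two special tail bytes */
    for (size_t i = 0; i < N - 2; ++i)
        a[i] = (uint8_t)(((a[i+1] & 0x0f) << 4) | ((a[i+2] & 0xf0) >> 4));
    a[N-2] = (uint8_t)((a[N-1] & 0x0f) << 4);
    a[N-1] = 0;

    for (size_t i = 0; i < N; ++i)
        printf("%02X ", (unsigned)a[i]); /* expect: 00 00 00 00 00 00 00 AB C0 00 */
    printf("\n");
    return 0;
}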
Let's generalize it to the best way to shift N bits in an array of 8-bit integers.
N - Total number of bits to shift
F = (N / 8) - Full 8 bit integers shifted
R = (N % 8) - Remaining bits that need to be shifted
I guess from here you would have to find the most optimal way to use this data to move the bytes around in the array. A generic algorithm would be: first move whole bytes by F index positions (toward the most significant end of the array) and zero-fill the newly emptied positions, then perform an R-bit shift across all of the bytes. A sketch along these lines follows at the end of this answer.
In the case of shifting a single byte by R bits, you can calculate the overflow with a bitwise AND, and the shift with the bit-shift operator:
// 0xAB shifted left by 4 bits:
(0xAB & 0xF0) >> 4 // is the overflow (0x0A)
0xAB << 4          // is the shifted value (0xB0, once truncated to 8 bits)
Keep in mind that the 4 bits is just a simple mask: 0xF0, or 0b11110000 in binary. This is easy to calculate, to build dynamically, or you can even use a simple static lookup table.
I hope that is generic enough. I'm not good with C/C++ at all so maybe someone can clean up my syntax or be more specific.
Bonus: If you're crafty with your C you might be able to fudge multiple array indexes into a single 16, 32, or even 64 bit integer and perform the shifts. But that is probably not very portable and I would recommend against this. Just a possible optimization.
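Putting the F-whole-bytes-plus-R-bits idea above into code, a minimal sketch (the helper name is hypothetical, and it assumes index 0 is the most significant byte, as in the question):
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shift an array of bytes left by n bits,
 * treating index 0 as the most significant byte. */
static void shift_left_bits(uint8_t *a, size_t len, unsigned n)
{
    size_t F = n / 8;   /* whole bytes to move     */
    unsigned R = n % 8; /* remaining bits to shift */

    if (len == 0)
        return;

    /* Step 1: move whole bytes toward index 0 and zero-fill the vacated tail. */
    if (F >= len) {
        memset(a, 0, len);
        return;
    }
    if (F > 0) {
        memmove(a, a + F, len - F);
        memset(a + len - F, 0, F);
    }

    /* Step 2: shift the remaining R bits, carrying the overflow from the next byte. */
    if (R > 0) {
        for (size_t i = 0; i + 1 < len; i++)
            a[i] = (uint8_t)((a[i] << R) | (a[i + 1] >> (8 - R)));
        a[len - 1] = (uint8_t)(a[len - 1] << R);
    }
}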
Here is a working solution, using temporary variables:
void shift_4bits_left(uint8_t* array, uint16_t size)
{
    int i;
    uint8_t shifted = 0x00;
    uint8_t overflow = (0xF0 & array[0]) >> 4;

    for (i = (size - 1); i >= 0; i--)
    {
        shifted = (array[i] << 4) | overflow;
        overflow = (0xF0 & array[i]) >> 4;
        array[i] = shifted;
    }
}
Call this function 3 times for a 12-bit shift.
Mike's solution may be faster, since this one uses temporary variables.
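A brief usage sketch, assuming the shift_4bits_left() definition above is in scope, applying it three times to the array from the question:
uint8_t data[10] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x0A,0xBC};

shift_4bits_left(data, 10);
shift_4bits_left(data, 10);
shift_4bits_left(data, 10);
/* data is now {0x00, ..., 0x00, 0xAB, 0xC0, 0x00} */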
The 32 bit version... :-) Handles 1 <= count <= num_words
#include <stdio.h>

unsigned int array[] = {0x12345678, 0x9abcdef0, 0x12345678, 0x9abcdef0, 0x66666666};

int main(void) {
    int count;
    unsigned int *from, *to;

    from = &array[0];
    to = &array[0];
    count = 5;

    while (count-- > 1) {
        /* this word shifted left 12, OR'ed with the top 12 bits of the next word */
        *to++ = (from[0] << 12) | ((from[1] >> 20) & 0xfff);
        from++;
    }
    *to = (*from << 12);

    printf("%x\n", array[0]);
    printf("%x\n", array[1]);
    printf("%x\n", array[2]);
    printf("%x\n", array[3]);
    printf("%x\n", array[4]);
    return 0;
}
#Joseph, notice that the variables are 8 bits wide, while the shift is 12 bits wide. Your solution works only for N <= variable size.
If you can assume your array's size is a multiple of 8 bytes, you can cast the array to an array of uint64_t and then work on that. If it isn't a multiple of 8, you can work in 64-bit chunks on as much as you can and handle the remainder byte by byte.
This may be a bit more coding, but I think it's more elegant in the end.
There are a couple of edge-cases which make this a neat problem:
the input array might be empty
the last and next-to-last bytes need to be treated specially, because they have zero bits shifted into them
Here's a simple solution which loops over the array copying the low-order nibble of the next byte into its high-order nibble, and the high-order nibble of the next-next (+2) byte into its low-order nibble. To save dereferencing the look-ahead pointer twice, it maintains a two-element buffer with the "last" and "next" bytes:
void shl12(uint8_t *v, size_t length) {
    if (length == 0) {
        return; // nothing to do
    }

    if (length > 1) {
        uint8_t last_byte, next_byte;
        next_byte = *(v + 1);

        for (size_t i = 0; i + 2 < length; i++, v++) {
            last_byte = next_byte;
            next_byte = *(v + 2);
            *v = ((last_byte & 0x0f) << 4) | ((next_byte & 0xf0) >> 4);
        }

        // the next-to-last byte is half-empty
        *(v++) = (next_byte & 0x0f) << 4;
    }

    // the last byte is always empty
    *v = 0;
}
Consider the boundary cases, which activate successively more parts of the function:
When length is zero, we bail out without touching memory.
When length is one, we set the one and only element to zero.
When length is two, we set the high-order nibble of the first byte to the low-order nibble of the second byte (that is, bits 12-15 of the result), and the second byte to zero. We don't activate the loop.
When length is greater than two we hit the loop, shuffling the bytes across the two-element buffer.
If efficiency is your goal, the answer probably depends largely on your machine's architecture. Typically you should maintain the two-element buffer, but handle a machine word (32/64 bit unsigned integer) at a time. If you're shifting a lot of data it will be worthwhile treating the first few bytes as a special case so that you can get your machine word pointers word-aligned. Most CPUs access memory more efficiently if the accesses fall on machine word boundaries. Of course, the trailing bytes have to be handled specially too so you don't touch memory past the end of the array.
