Change char* to point a few bits offset - c

Say I have this code:
char num[2] = {15, 240};
char* p_num;
Now, if I have understood everything correctly, the bits in the array should be laid out like this:
00001111 11110000
My question is: is there any possible way to make the pointer p_num point to the last four bits of the first byte, so that when I execute this code:
printf("%d", *p_num);
255 will be written?
I.e. p_num will point to the bits that the brackets below enclose:
0000[1111 1111]0000

No. The minimum addressable unit of memory is a byte (at best), though you could obtain the desired value using
((num[0] & 0xF) << 4) | ((num[1] >> 4) & 0xF)
For example,
unsigned char num[2] = {15, 240};
unsigned char combined = ((num[0] & 0xF) << 4) | ((num[1] >> 4) & 0xF);
printf("%d\n", (int)combined);
Note that I used unsigned char to store 240 and 255 since char can be signed or unsigned depending on the implementation.

No, for two reasons.
In C, sizeof(char) is defined to be 1, but the width of that unit is implementation-dependent: a byte in C is CHAR_BIT bits wide, and CHAR_BIT is only guaranteed to be at least 8. So, strictly speaking, your premise that the bits of the two bytes are arranged exactly as you drew them is not something the language guarantees. It would be unusual for an implementation to do anything else, but any workaround that relies on bit shifts and masks is only assured to work on implementations with that layout and cannot be assumed to be fully portable.
A pointer to char can only point to a whole char. If you advance it with pointer arithmetic, it advances by one char at a time, and the same is true of pointers to any other type. The language provides no way to advance a pointer by a fraction of the size of the type it points to.
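A common workaround is to keep an ordinary byte pointer (or array) plus a nibble index and do the masking yourself. Below is a minimal sketch of that idea, assuming CHAR_BIT is 8; the helper name get_nibble is made up for illustration:

#include <stdio.h>

/* Illustrative helper: return the 4-bit group at nibble index i,
   counting from the most significant nibble of num[0].
   Assumes 8-bit bytes. */
unsigned get_nibble(const unsigned char *num, size_t i)
{
    unsigned char byte = num[i / 2];
    return (i % 2 == 0) ? (unsigned) (byte >> 4) : (unsigned) (byte & 0x0F);
}

int main(void)
{
    unsigned char num[2] = {15, 240};   /* 00001111 11110000 */
    /* Nibbles 1 and 2 are the [1111 1111] span from the question. */
    unsigned combined = (get_nibble(num, 1) << 4) | get_nibble(num, 2);
    printf("%u\n", combined);           /* prints 255 */
    return 0;
}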

Related

can you address single bits of an int?

As I understand it, addressing a single bit in an int variable seems possible, if it is passed as a pointer. Am I correct?
uint8_t adcState[8];
uint16_t adcMessageId[8];
void adcEnable(uint8_t* data) {
    // Set ADC Input as enabled
    adcState[(uint8_t) data[0]] = 1;
    // Get ADC Message ID
    adcMessageId[(uint8_t) data[0]] = data[2] << 8 | data[1];
}
So far, this is what I have figured out:
The function receives a pointer to 8bit int as an argument
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it as a field number for the adcState array, which is then set to 1. For example, this would mean that if data was 729, data[0] would be '9' and therefore adcState[9] becomes 1.
Is it possible? Can you use the pointers like this?
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself...
In our example, the adcMessageId[9] would become ('7' << 8) bitwise OR with '2', so just '2'.
Something in my logic seems wrong.
It would seem data is pointing to an array, not a single 8-bit int, and that:
The first element of the array is an index into the arrays adcState and adcMessageId
The second and third elements of the array comprise a data value for the array adcMessageId
As commenter #Eugene Sh. pointed out, data[2] is promoted to an int before shifting, so no bits are lost.
The pointer notation uint8_t * is as valid as array notation uint8_t [] in a function signature for passing an array; it's often how char * strings are passed, and arrays decay to a pointer to their first element when passed to functions anyway.
The function receives a pointer to 8bit int as an argument
Yes, roughly. And the function implementation assumes that the pointed-to uint8_t can be treated as the first element of an array of at least three uint8_t.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read), and uses it
as a field number for adcState array, which then is set to 1. For
example this would mean if data was 729, the data[0] would be '9' and
therefore the adcState[9] becomes '1'. Is it possible? Can you use
the pointers like this?
No, you have completely misunderstood. data[0] means exactly the same thing as *data. It refers to the 8-bit unsigned integer to which data points. The number 729 is too large to be represented as a uint8_t, but if the object to which data pointed had the value 129 then data[0] would evaluate to 129.
You are perhaps confused by the appearance later in the function of data[1] and data[2]. These refer to two uint8_t objects following *data in memory, such as will reliably be present if data points to the first element of an array, as I alluded to above. Indexing with [] does not have the effect of slicing the uint8_t to which data points.
Pay attention also that I am saying "the object to which data points". One does not pass a uint8_t value directly as this function's parameter. It is not anticipating that an integer value would be reinterpreted as a pointer. You pass a pointer to the data you want the function to work with.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of
the data int.
In the adcMessageId case, again data[0] refers to the uint8_t to which data points. data[1] refers to another whole uint8_t following that in memory, and data[2] to the next after that.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000.
uint8_t has only 8 bits, but all integer values narrower than int are converted to an integer type at least as wide as int, or perhaps to a floating-point type, for arithmetic operations. The specific promoted type depends in part on what the other operand is, and the result has the same, promoted type. Type int is at least 16 bits wide on all conforming C implementations. Thus this ...
data[2] << 8 | data[1]
... intends to pack the two uint8_t values data[2] and data[1] into one 16-bit integer, data[2] in the most-significant position. It's not entirely safe because the elements of data will be promoted to (signed) int instead of unsigned int, but that will present an issue only on implementations where int is 16 bits wide (which is uncommon these days), and even then, only if the value of data[2] is larger than 127. A safer way to express it would involve explicit casts:
(unsigned int) data[2] << 8 | (unsigned int) data[1]
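To make the calling convention concrete, here is a small self-contained sketch of how such a function is typically invoked: the caller passes a pointer to a buffer of (at least) three bytes, and data[0], data[1] and data[2] are three distinct bytes of that buffer, not digits of one number. The msg buffer and the values in it are invented for illustration:

#include <stdint.h>
#include <stdio.h>

uint8_t adcState[8];
uint16_t adcMessageId[8];

void adcEnable(uint8_t *data) {
    adcState[data[0]] = 1;
    adcMessageId[data[0]] = (unsigned int) data[2] << 8 | (unsigned int) data[1];
}

int main(void) {
    /* Channel index 3, then the low and high halves of message ID 0x1234. */
    uint8_t msg[3] = {3, 0x34, 0x12};
    adcEnable(msg);
    printf("adcState[3] = %u, adcMessageId[3] = 0x%04X\n",
           (unsigned) adcState[3], (unsigned) adcMessageId[3]);
    /* prints: adcState[3] = 1, adcMessageId[3] = 0x1234 */
    return 0;
}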
You have a few misconceptions. Or maybe just wrong wording.
The function receives a pointer to 8bit int as an argument
More precisely, it gets a pointer to the first element of an array of 8-bit integers; otherwise your usage would be invalid. Probably it gets a pointer to a string.
It takes the least significant digit of that int (the pointer is treated as an array, and its first field is being read),
That is wrong. You seem to use it as a pointer to a string holding a number.
In that case you access the first character, which is the MOST significant decimal digit.
and uses it as a field number for adcState array, which then is set to 1. For example this would mean if data was 729, the data[0] would be '9' and therefore the adcState[9] becomes '1'. Is it possible? Can you use the pointers like this?
You are messing up things a bit.
If you want to access decimal digits, we are talking about strings and there the first element is '7' which is not to be confused with 7.
For the adcMessageId array a similar approach is taken. However here the value it is assigned depends on the third and second digit of the data int.
Maybe you should not talk about int if you are using strings.
I don't understand the shift over here. Being a uint8_t value it has only 8 bits, so shifting with 8 bits always gives 0000 0000. Therefore an OR with data[1] would be just data[1] itself... In our example, the adcMessageId[9] would become ('7' << 8) bitwise OR with '2', so just '2'.
That was already addressed in comments and Govind Parmar's answer: Integer promotion takes place before shifting.
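If the intent really had been to pull decimal digits out of a string, the distinction between the character '7' and the number 7 would matter. A tiny sketch, assuming an ASCII-compatible character set:

#include <stdio.h>

int main(void) {
    const char *s = "729";
    /* '7' is a character code (55 in ASCII), not the value 7;
       subtracting '0' converts a digit character to its numeric value. */
    printf("s[0] as an int: %d, as a digit value: %d\n", s[0], s[0] - '0');
    return 0;
}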

Using a void pointer to store a 128 bit number in c

I am learning C. As an experiment I am trying to implement 128-bit addition and multiplication.
I want to do this using a void pointer as follows:
void* x = malloc(16);
memset(x, 0, 16);
Is it possible to read and write the ith bit of x? I understand how to do this using bitwise operations on standard data types, but the compiler complains about invalid operands when I try to use bitwise operations on a void pointer. In general, I am interested in whether it is possible to malloc an arbitrarily sized block of memory and then manipulate each individual bit in C?
The C standard is surprisingly (although logically) inflexible when it comes to casting of pointer types, although you can cast the result of malloc to any pointer type you want.
In order to perform bit and byte analysis, a safe type to use is char* or unsigned char*:
unsigned char* x = malloc(16);
(I'd always pick the unsigned char since some platforms have a 1's complement signed char type which, crudely put, is a pain in the neck).
Then you can use pointer arithmetic on x to manipulate the memory block on a byte-by-byte basis. Note that the standard states that sizeof(char) and sizeof(unsigned char) are both 1, so pointer arithmetic on such a type will traverse the entire memory block without skipping any part of it.
As for examining bit by bit, the standard macro CHAR_BIT (defined in <limits.h>) will tell you how many bits there are in a byte. (It's normally 8, but it would be unwise to assume that.)
Finally, don't forget to call free(x) at some point.
To set a bit at position i
*((uint8_t*)x + i/8) |= (1 << (i%8))
To clear it
*((uint8_t*)x + i/8) &= ~(1 << (i%8))
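And, following the same indexing scheme, a bit at position i can be tested like this (test_bit is just an illustrative name, assuming the buffer was allocated as above):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if bit i of the buffer is set. */
static bool test_bit(const void *x, size_t i)
{
    return (*((const uint8_t *) x + i / 8) >> (i % 8)) & 1;
}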

Copying int to different memory location, receiving extra bytes than expected

Trying to prepend a 2-byte message length after getting the length in a 4-byte int. I use memcpy to copy 2 bytes of the int. When I look at the second byte I copied, it is as expected, but printing the first byte actually prints 4 bytes.
I would expect dest[0] and dest[1] to each contain one byte of the int. Whether or not it's the most significant byte, or the order is switched, doesn't matter much; I can throw in an offset on the memcpy or swap 0 and 1. It does not have to be portable; I would just like it to work.
The same error is happening in Windows with LoadRunner and Ubuntu with GCC - so I have at least tried to rule out portability as a cause.
I'm not sure where I'm going wrong. I am suspecting it's related to my lack of using pointers recently? Is there a better approach to cast an int to a short and then put it in the first 2 bytes of a buffer?
char* src;
char* dest;
int len = 2753; // Hex - AC1
src=(char*)malloc(len);
dest=(char*)malloc(len+2);
memcpy(dest, &len, 2);
memcpy(dest+2, src, len);
printf("dest[0]: %02x", dest[0]);
// expected result: c1
// actual result: ffffffc1
printf("dest[1]: %02x", dest[1]);
// expected result: 0a
// actual result: 0a
You cannot just take a random two bytes out of a four byte object and call it a cast to short.
You will need to copy your int into a two byte int before doing your memcpy.
But actually, that isn't the best way to do it either, because you have no control over the byte order of an integer.
Your code should look like this:
dest[0] = ((unsigned)len >> 8) & 0xFF;
dest[1] = ((unsigned)len) & 0xFF;
That should write it out in network byte order aka big endian. All of the standard network protocols use this byte order.
And I'd add something like:
assert( ((unsigned)len & 0xFFFF0000) == 0 ); // should be nothing in the high bytes
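To show the round trip under that convention, here is a short, self-contained sketch (the variable names and the 0x0AC1 value mirror the question and are only illustrative):

#include <assert.h>
#include <stdio.h>

int main(void) {
    unsigned char dest[2];
    int len = 2753;                            /* 0x0AC1 */

    assert(((unsigned) len & 0xFFFF0000u) == 0);
    dest[0] = ((unsigned) len >> 8) & 0xFF;    /* most significant byte first */
    dest[1] = ((unsigned) len) & 0xFF;

    /* Reading it back is the mirror image, independent of host endianness. */
    unsigned readback = ((unsigned) dest[0] << 8) | dest[1];
    printf("%02x %02x -> %u\n", (unsigned) dest[0], (unsigned) dest[1], readback);
    /* prints: 0a c1 -> 2753 */
    return 0;
}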
Firstly, you are using printf incorrectly. This
printf("dest[0]: %02x", dest[0]);
uses the x format specifier in printf. The x format specifier requires an argument of type unsigned int. Not char, but unsigned int and only unsigned int (or, alternatively, an int with a non-negative value).
The immediate argument you supplied has type char, which is probably signed on your platform. This means that your dest[0] contains -63. A variadic argument of type char is automatically promoted to type int, which turns 0xc1 into 0xffffffc1 (as a signed representation of -63 in type int). Since printf expects an unsigned int value and you are passing a negative int value instead, the behavior is undefined. The printout that you see is nothing more than a manifestation of that undefined behavior. It is meaningless.
One proper way to print dest[0] in this case would be
printf("dest[0]: %02x", (unsigned) dest[0]);
I'm pretty sure the output will still be ffffffc1, but in this case 0xffffffc1 is the perfectly expected result of integer conversion from the negative value -63 to unsigned int type. Nothing unusual here.
Alternatively you can do
printf("dest[0]: %02x", (unsigned char) dest[0]);
which should give you your desired c1 output. Note that the conversion to int takes place in this case as well, but since the original value is positive (193), the result of the conversion to int is positive too and printf works properly.
Finally, if you want to work with raw memory directly, the proper type to use would be unsigned char from the very beginning. Not char, but unsigned char.
Secondly, an object of type int may easily occupy more than two 8-bit bytes. Depending on the platform, the 0xA and 0xC1 values might end up in completely different portions of the memory region occupied by that int object. You should not expect that copying the first two bytes of an int object will copy the 0xAC1 portion specifically.
You make the assumption that an "int" is two bytes. What justification do you have for that? Your code is highly unportable.
You make another assumption that "char" is unsigned. What justification do you have for that? Again, your code is highly unportable.
You make another assumption about the ordering of bytes in an int. What justification do you have for that? Again, your code is highly unportable.
Instead of the literal 2, use sizeof(int). Never hard-code the size of a type.
If this code should be portable, you should not use int but a fixed-size data type.
If you need 16 bits, you could use int16_t.
Also, printing the chars needs a cast to unsigned. As written, the char is promoted to an int and the sign is extended; this produces the leading FFFFs.
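For instance, a hedged sketch of a portable version of the original idea, using a fixed-width type and explicit byte extraction instead of memcpy from an int (frame_message is a made-up name; it assumes the length fits in 16 bits and stores it big endian):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Prepend a 16-bit length (big endian) to a payload.
   The caller owns the returned buffer. */
unsigned char *frame_message(const unsigned char *src, uint16_t len)
{
    unsigned char *dest = malloc((size_t) len + 2);
    if (dest == NULL)
        return NULL;
    dest[0] = (unsigned char) (len >> 8);    /* high byte */
    dest[1] = (unsigned char) (len & 0xFF);  /* low byte  */
    memcpy(dest + 2, src, len);
    return dest;
}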

bytes are swapped when pointer cast is done in C

I have an array of "unsigned short", i.e. 16 bits per element, in C. I have two "unsigned short" values which should be written back into the array in little endian order, which means that the least significant element comes first. For example, if I have the following value:
unsigned int val = 0x12345678;
it should be stored in my array as:
unsigned short buff[10];
buff[0] = 0x5678;
buff[1] = 0x1234;
I have written code to write the value in one go, rather than extracting the upper and lower 16 bits of the int value and writing them separately, since there might be atomicity problems. My code looks like this:
typedef unsigned int UINT32;
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Surprisingly, the code above works correctly and the results will be:
buff[0] is 0x5678;
buff[1] is 0x1234;
The problem is, as it is shown, I am saving the "unsigned short" values in big endian order and not little endian as I wish. In other words, when I cast the pointer from "unsigned short*" to "unsigned int*" the 16-bit elements are swapped automatically! Does anybody know what happens here and why the data gets swapped?
Your platform represents data in little endian format, and by casting buff to (UINT32 *), you are telling the compiler that buff must now be interpreted as a pointer to unsigned int. The instruction
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Just says "write (value & 0xffff0000) + (value & 0xffff) into this unsigned int (buff)". And that's what he does, how he stores it is not your business. You're not supposed to access either of the lower or upper 16 bits, because it is platform dependent which one comes first.
All you know is that if you access buff as an unsigned int, you will get the same value that you previously stored in there, but it is not safe to assume any particular byte order.
So basically your code has undefined behavior.
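If the goal really is buff[0] = low half, buff[1] = high half regardless of the platform's byte order, the portable way (atomicity concerns aside) is to store the two halves explicitly rather than through a pointer cast. A minimal sketch:

#include <stdio.h>

int main(void) {
    unsigned int val = 0x12345678;
    unsigned short buff[10];

    /* Store the halves explicitly; this does not depend on endianness
       and does not access the array through an incompatible pointer type. */
    buff[0] = (unsigned short) (val & 0xFFFFu);   /* 0x5678 */
    buff[1] = (unsigned short) (val >> 16);       /* 0x1234 */

    printf("%04x %04x\n", buff[0], buff[1]);      /* prints: 5678 1234 */
    return 0;
}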

What does it mean for a char to be signed?

Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.
In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) typically does an arithmetic shift (the behaviour for negative values is implementation-defined):
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (except for right shifts).
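A small sketch of the shift difference described above; the arithmetic-shift result in the signed case is what most implementations produce, but right-shifting a negative value is implementation-defined:

#include <stdio.h>

int main(void) {
    unsigned char u = 0x80;   /* 1000 0000 = 128 */
    signed char   s = -128;   /* 1000 0000 in two's complement */

    /* Both operands are promoted to int before the shift. */
    printf("unsigned: %d\n", u >> 1);   /* 64: logical shift of the value 128 */
    printf("signed:   %d\n", s >> 1);   /* typically -64 (arithmetic shift) */
    return 0;
}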
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means it can act like either a signed or an unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>

int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;
    signed char sa = 'A';
    signed char sb = 0xFF;
    unsigned char ua = 'A';
    unsigned char ub = 0xFF;
    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too-big or too-small integer, and char is signed, the resulting value is implementation defined, or (in C) a signal could even be raised, as for all signed types. Contrast that with the case when you assign something too big or too small to an unsigned char: the value wraps around and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte holding a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.
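A quick illustration of that defined wraparound (on implementations where CHAR_BIT is 8, UCHAR_MAX is 255):

#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned char u = -1;              /* converts to UCHAR_MAX: well defined */
    unsigned char v = UCHAR_MAX + 1;   /* wraps around to 0: also well defined */
    printf("%u %u\n", (unsigned) u, (unsigned) v);   /* 255 0 on 8-bit chars */
    return 0;
}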
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c is too big for char to represent, and the machine uses two's complement. Many implementations behave, when you assign a too-big value to a char, such that the bit pattern won't change. If an int can represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value passed would be negative; promoting to int retains that sign, and you will get a negative result. However, if char is unsigned, then the value is unsigned, and promoting it to an int yields a positive int. You can use unsigned char; then you get precisely defined behavior both for the assignment to the variable and for passing it to printf, which will then print something positive.
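For example, on a machine where plain char is signed (a sketch; the exact negative value is implementation-defined, -56 being the typical two's complement outcome):

#include <stdio.h>

int main(void) {
    char c = 0xC8;             /* 200 does not fit in a signed char; typically becomes -56 */
    unsigned char uc = 0xC8;   /* always 200 */

    printf("%d\n", c);         /* typically prints -56 after promotion to int */
    printf("%d\n", uc);        /* prints 200 */
    return 0;
}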
Note that char, unsigned char and signed char are all at least 8 bits wide. There is no requirement that char is exactly 8 bits wide. That is true for most systems, but on some you will find 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.
Another difference is that, in C, an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 to 2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values; even if you know how your compiler implements the sign (two's complement or one of the other options), there may be unused padding bits in it. In C++, there are no padding bits for any of the three character types.
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8-bit EBCDIC.)
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
I used to do that, rather. Now the newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned char types anyway).
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of char, c could be one of two values. If chars are unsigned then c will be (char)162. If they are signed then 162 doesn't fit, since the max value for a signed char is 127, and the result of the conversion is implementation-defined; typical two's complement implementations would give (char)-94.
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ascii char. Of course, it's not portable, so not very useful.

Resources