Guarantee exact data width in (void*) array in C

I have a simple function in C which provides a void* pointer to a data array. I know the size (in bytes) of each individual data-point within this memory block, and need to guarantee that I can modify each data-point in this block without accidentally altering neighboring data-points. In this example, I want to decrement each value by 1.
All data points are either 8-bit, 16-bit, or 32-bit.
e.g.:
int myFunction(void* data, size_t arraySize, size_t widthPerDataPoint)
{
    if(!data)
        return -1;

    size_t w = widthPerDataPoint;
    size_t numPoints = arraySize / widthPerDataPoint;
    size_t i;
    for(i = 0; i < numPoints; i++)
    {
        if(w == 1)      // 8 bit
            (*((int8_t*)data + i))--;
        else if(w == 2) // 16 bit
            (*((int16_t*)data + i))--;
        else if(w == 4) // 32 bit
            (*((int32_t*)data + i))--;
    }
    return 0;
}
Unfortunately, the int8_t etc. datatypes only guarantee a minimum size, according to the C99 specification, not an exact size. Is there any way to re-cast and modify the data in place and guarantee I won't smash my array or touch neighboring data points? Also, is there an equivalent technique that would somehow work for other data widths (e.g. 24-bit, 60-bit, etc.)?

int8_t is guaranteed to be exactly 8 bits, and if CHAR_BIT==8, exactly 1 byte.
Quoting the N1570 draft of the latest C standard, section 7.20.1.1:
The typedef name intN_t designates a signed integer type
with width N, no padding bits, and a two’s complement
representation. Thus, int8_t denotes such a signed integer type
with a width of exactly 8 bits.
Though for your purposes it might make more sense to use uint8_t, uint16_t, et al.
If the implementation doesn't support types with the required characteristics, it won't define them; you can detect this by checking, for example:
#include <stdint.h>
#ifdef UINT8_MAX
/* uint8_t exists */
#else
/* uint8_t doesn't exist */
#endif
(If CHAR_BIT != 8, then neither int8_t nor uint8_t will be defined.)
It's the [u]int_leastN_t and [u]int_fastN_t types for which the standard only guarantees minimum sizes.
You'll have to guarantee that both the array and the offsets within it are properly aligned for the types you're using to access it. I presume you're already taking care of that.
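If that alignment is not a given, here is a minimal sketch of a run-time check (the helper name is illustrative; converting a pointer through uintptr_t is implementation-defined, though it behaves as expected on common platforms):
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

/* true if p is suitably aligned for a type requiring 'alignment' bytes */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* e.g., before the 32-bit branch: is_aligned(data, _Alignof(int32_t)) (C11) */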

By definition, incrementing a pointer of type T* by n will shift it by n * sizeof(T) bytes. Therefore, consistency is guaranteed to you by the compiler. No worries.
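For example, a quick sketch that makes the scaling visible:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t a[4] = {0};
    /* a + 1 is exactly sizeof(int32_t) == 4 bytes past a */
    printf("%td\n", (char *)(a + 1) - (char *)a); /* prints 4 */
    return 0;
}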

The code doesn't seem entirely unreasonable. I personally would probably do something like this:
switch(widthPerDataPoint)
{
    case 1:
    {
        int8_t *dptr = data;
        for(i = 0; i < numPoints; i++)
            dptr[i]--;
    }
    break;
    case 2:
    {
        int16_t *dptr = data;
        for(i = 0; i < numPoints; i++)
            dptr[i]--;
    }
    break;
    case 4:
    {
        int32_t *dptr = data;
        for(i = 0; i < numPoints; i++)
            dptr[i]--;
    }
    break;
    default:
        fprintf(stderr, "Someone gave the wrong width - width=%zu\n",
                widthPerDataPoint);
        break;
}
The advantage here is that you don't pay for a bunch of conditions in every loop iteration. The compiler MAY sort that out anyway, but I don't always trust compilers to figure such things out - and I think it's a bit cleaner too.

How about the following?
Copy the value out to a local integer of whatever size
Perform the decrement on the local variable
Use an appropriate bit mask to zero out the location in the array
(e.g., ~0xFF for 8-bit)
'Or' the masked local variable back into the array (OR, not AND - ANDing into a slot that was just zeroed would give zero).
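A minimal sketch of that read-modify-write cycle for one 8-bit element packed in a 32-bit word (the variable names are illustrative):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = 0x11223344; /* packed data */
    unsigned byteIndex = 1;     /* decrement the 0x33 byte */
    unsigned shift = byteIndex * 8;
    uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t val = (word & mask) >> shift; /* 1. copy out */
    val = (val - 1) & 0xFF;                /* 2. decrement locally */
    word &= ~mask;                         /* 3. zero the slot */
    word |= val << shift;                  /* 4. or it back in */

    printf("%08x\n", word); /* prints 11223244 */
    return 0;
}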

How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian, and look like this: "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into an array of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17480 instead of 68.
UPDATE: Nvm. I solved it. I mixed up the indexes when shifting. It should've been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. And if you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need anything else.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
size_t num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). Be careful with ntohl() and htonl() here, though: they convert between network byte order (big-endian) and host order, so on a big-endian host they are no-ops and will not swap little-endian file data. If you do need to swap, use an explicit byte swap:
if (!host_is_little_endian) {
    data = __builtin_bswap32(data); /* GCC/Clang builtin; see the portable sketch below */
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
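A sketch of a fully portable alternative: assemble the value from the raw little-endian bytes with shifts, which works regardless of host endianness (my_file follows the snippet above; the buffer name is illustrative):
uint8_t bytes[4];
if (fread(bytes, 1, 4, my_file) != 4) { /* ... handle error ... */ }
uint32_t data = (uint32_t)bytes[0]
              | ((uint32_t)bytes[1] << 8)
              | ((uint32_t)bytes[2] << 16)
              | ((uint32_t)bytes[3] << 24);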
Signed integers
If you need a signed instead of unsigned integer, then you can do the same, but use a union (note that the members cannot actually be named unsigned and signed, as those are C keywords):
union {
    uint32_t u;
    int32_t i;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.i.
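A short usage sketch under those member names:
if (fread(&data.u, 4, 1, my_file) != 1) { /* ... handle error ... */ }
/* ... endianness handling on data.u, as above ... */
int32_t value = data.i; /* same bytes, read as signed */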
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You just said that p is now a pointer to an int, you de-referenced that pointer and assigned it to i.
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
int j;
for (j = 3; j >= 0; j--) ibuf[j] = *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];

int i, j;
unsigned char *pi = (unsigned char *)&i;
for (j = 3; j >= 0; j--) *pi++ = *p++;

// and the other endian:
int i, j;
unsigned char *pi = ((unsigned char *)&i) + 3;
for (j = 3; j >= 0; j--) *pi-- = *p++;
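For completeness, a sketch of the memcpy-based equivalent, which avoids both alignment and strict-aliasing concerns (the function name is illustrative):
#include <string.h>

/* copies four bytes into an int in native order; byte-swap the result
   afterwards if the buffer's endianness differs from the host's */
int read_native_int(const unsigned char *p)
{
    int v;
    memcpy(&v, p, sizeof v);
    return v;
}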

C - access memory directly using address?

I've been lightly studying C for a few weeks now with some book.
#include <stdio.h>

int main(void)
{
    float num = 3.15;
    int *ptr = (int *)&num; // cast so the loop below can shift the float's bit pattern
    for (int i = 0; i < 32; i++)
    {
        if (!(i % 8) && (i / 8))
            printf(" ");
        printf("%d", *ptr >> (31 - i) & 1);
    }
    return 0;
}
output : 01000000 01001001 10011001 10011010
As you see 3.15 in single precision float is 01000000 01001001 10011001 10011010.
So let's say ptr points to address 0x1efb40.
Here are the questions:
As I understood from the book, the first 8 bits of num are stored at 0x1efb40, the 2nd 8 bits at 0x1efb41, the next 8 bits at 0x1efb42, and the last 8 bits at 0x1efb43. Am I right?
If I'm right, is there any way I can directly access the 2nd 8 bits with the hex address value 0x1efb41? Could I thereby change that byte to something like 11111111?
The ordering of bytes within a datatype is known as endianness and is system specific. What you describe with the least significant byte (LSB) first is called little endian and is what you would find on x86 based processors.
As for accessing particular bytes of a representation, you can use a pointer to an unsigned char to point to the variable in question to view the specific bytes. For example:
float num = 3.15;
unsigned char *p = (unsigned char *)&num;
int i;
for (i = 0; i < sizeof(num); i++) {
    printf("byte %d = %02x\n", i, p[i]);
}
Note that accessing the bytes this way is only allowed through a character pointer, not an int *, as the latter would violate strict aliasing.
The code you wrote is not actually valid C. C has a rule called "strict aliasing," which states that if a region of memory contains a value of one type (e.g. float), it cannot be accessed as though it were another type (e.g. int). This rule has its origins in performance optimizations that let the compiler generate faster code. I can't say it's an obvious rule, but it's the rule.
You can work around this by using a union. If you make a union like union { float num; int numAsInt; }, you can store a float and then read it as an integer. The result is unspecified. Alternatively, you are always permitted to access the bytes of a value as chars (just not anything larger). char is given special treatment (presumably to make it possible to copy a buffer of data as bytes, then cast it to your data's type and access it, which is something that happens a lot in low-level code like network stacks).
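For instance, a minimal sketch of that workaround (assuming float and int are both 32 bits, as on typical platforms):
#include <stdio.h>

int main(void)
{
    union { float num; int numAsInt; } u;
    u.num = 3.15f;
    printf("%08x\n", (unsigned)u.numAsInt); /* the float's bit pattern */
    return 0;
}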
Welcome to a fun corner of learning C. There's unspecified behavior and undefined behavior. Informally, unspecified behavior says "we won't say what happens, but it will be reasonable." The C spec will not say what order the bytes are in. But it will say that you will get some bytes. Undefined behavior is nastier. Undefined behavior says anything can happen, ranging from compiler errors to exceptions at runtime, to absolutely nothing at all (making you think your code is valid when it is not).
As for the values, dbush points out in his answer that the order of the bytes is defined by the platform you are on. You are seeing a "little endian" representation of an IEEE 754 floating point number. On other platforms, it may be different.
Union punning is much safer:
#include <stdio.h>

typedef union
{
    unsigned char uc[sizeof(double)];
    float f;
    double d;
} u_t;

void print(u_t u, size_t size, int endianness)
{
    size_t start = 0;
    int increment = 1;

    if(endianness)
    {
        start = size - 1;
        increment = -1;
    }

    for(size_t index = 0; index < size; index++)
    {
        printf("%hhx ", u.uc[start]);
        start += increment;
    }
    printf("\n");
}

int main(void)
{
    u_t u;

    u.f = 3.15f;
    print(u, sizeof(float), 0);
    print(u, sizeof(float), 1);

    u.d = 3.15;
    print(u, sizeof(double), 0);
    print(u, sizeof(double), 1);

    return 0;
}
you can test it yourself: https://ideone.com/7ABZaj

Changing the endianness of an integer which can be 2, 4, or 8 bytes using a switch-case statement

In a (real time) system, computer 1 (big endian) gets integer data from computer 2 (which is little endian). Given that we do not know the size of int, I check it using a sizeof() switch statement and use the __builtin_bswapX method accordingly, as follows (assume that this builtin method is usable).
...
int data;
getData(&data); // not the actual function call. just represents what data is.
...
switch (sizeof(int)) {
case 2:
    intVal = __builtin_bswap16(data);
    break;
case 4:
    intVal = __builtin_bswap32(data);
    break;
case 8:
    intVal = __builtin_bswap64(data);
    break;
default:
    break;
}
...
Is this a legitimate way of swapping the bytes of integer data? Or is this switch-case statement totally unnecessary?
Update: I do not have access to the internals of the getData() method, which communicates with the other computer and gets the data. It then just returns an integer which needs to be byte-swapped.
Update 2: I realize that I caused some confusion. The two computers have the same int size but we do not know that size. I hope it makes sense now.
Seems odd to assume the size of int is the same on 2 machines yet compensate for variant endian encodings.
The check below only reveals the int size on the receiving side, not the sending side.
switch(sizeof(int))
sizeof(int) is the size, in units of char, of an int on the local machine. It should be sizeof(int)*CHAR_BIT to get the bit size. [Op has edited the post]
The sending machine should detail the data width as 16, 32, or 64-bit without regard to its own int size, and the receiving end should be able to detect that value as part of the message, or an agreed-upon width should be used.
Much like the hton() family converts from local endian to network endian, these interfaces are moving toward fixed-width integers:
#include <netinet/in.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
So I suggest sending/receiving the "int" as a 32-bit uint32_t in network endian.
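A minimal sketch of that scheme (the helper names are illustrative; htonl()/ntohl() are POSIX):
#include <stdint.h>
#include <arpa/inet.h>

/* host int -> 32-bit network-endian value for the wire */
uint32_t encode_for_wire(int value)
{
    return htonl((uint32_t)value);
}

/* 32-bit network-endian value from the wire -> host int */
int decode_from_wire(uint32_t wire)
{
    return (int)ntohl(wire);
}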
[Edit]
Consider that computers exist with different endians (little and big are the most common; others exist) and various int sizes, with bit widths of 32 (common), 16, 64, and maybe even some odd-ball 36-bit and such, with room for growth to 128-bit. Let us assume N combinations. Rather than write routines to convert from 1 of N formats to each of the N others (N*N routines), let us define a network format and fix its endian to big and its bit-width to 32. Now each computer does not care about, nor need to know, the int width/endian of the sender/recipient of data. Each platform sends/receives data in a locally optimized method, converting between its own endian/int-width and the network endian/int-width.
OP describes not knowing the sender's int width, yet hints that the int width on the sender/receiver might be the same as on the local machine. If the int widths are specified to be the same and the endians are specified to be one big/one little as described, then OP's coding works.
However, such an "endians are opposite and int-width the same" situation seems very selective. I would prepare code to cope with an interchange standard (a network standard), as certainly, even if today it is "opposite endian, same int", tomorrow it will evolve toward a network standard.
A portable approach would not depend on any machine properties, but only rely on mathematical operations and a definition of the communication protocol that is also hardware independent. For example, given that you want to store bytes in a defined way:
void serializeLittleEndian(uint8_t *buffer, uint32_t data) {
    size_t i;
    for (i = 0; i < sizeof(uint32_t); ++i) {
        buffer[i] = data % 256;
        data /= 256;
    }
}
and to restore that data to whatever machine:
uint32_t deserializeLittleEndian(uint8_t *buffer) {
    uint32_t data = 0;
    size_t i = sizeof(uint32_t);
    while (i-- > 0) { /* start from the most significant byte, buffer[3] */
        data *= 256;
        data += buffer[i];
    }
    return data;
}
EDIT: This is not portable to systems with other than 8 bits per byte, due to the uses of uint8_t and uint32_t. The use of type uint8_t implies a system with 8-bit chars. However, it will not compile for systems where these conditions are not met. Thanks to Olaf and Chqrlie.
Yes, this is totally cool - given that your switch cases match the byte counts sizeof actually returns, as above. One might be a little fancy and provide, for example, template specializations based on the size of int. But a switch like this is totally cool and will not produce any branches in optimized code.
As already mentioned, you generally want to define a protocol for communications across networks, which the hton/ntoh functions are mostly meant for. Network byte order is generally treated as big endian, which is what the hton/ntoh functions use. If the majority of your machines are little endian, it may be better to standardize on it instead though.
A couple of people have been critical of using __builtin_bswap, which I personally consider fine as long as you don't plan to target compilers that don't support it. Although, you may want to read Dan Luu's critique of intrinsics.
For completeness, I'm including a portable version of bswap that (at the very least with Clang) compiles into a bswap for x86(64).
#include <stddef.h>
#include <stdint.h>

size_t bswap(size_t x) {
    for (size_t i = 0; i < sizeof(size_t) >> 1; i++) {
        size_t d = sizeof(size_t) - i - 1;
        size_t mh = ((size_t) 0xff) << (d << 3);
        size_t ml = ((size_t) 0xff) << (i << 3);
        size_t h = x & mh;
        size_t l = x & ml;
        size_t t = (l << ((d - i) << 3)) | (h >> ((d - i) << 3));
        x = t | (x & ~(mh | ml));
    }
    return x;
}

How can I split integers into bytes without using arithmetic in C?

I am implementing the four basic arithmetic functions (add, sub, division, multiplication) in C.
The basic structure of these functions, as I imagined it:
the program gets two operands from the user using scanf,
and the program splits these values into bytes and computes!
I've completed addition and subtraction,
but I forgot that I shouldn't use arithmetic operators,
so when splitting an integer into single bytes,
I wrote code like
while(quotient != 0){
    bin[i] = quotient % 2;
    quotient = quotient / 2;
    i++;
}
but since % and / are arithmetic operators that I shouldn't use,
I have to rewrite that splitting part,
but I really have no idea how I can split an integer into single bytes without using % or /.
To access the bytes of a variable, type punning can be used.
According to standard C (C99 and C11), only unsigned char brings certainty to perform this operation in a safe way.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;

union {
    myint_t val;
    unsigned char byte[sizeof(myint_t)];
} u;
Now, you can of course access the bytes of x in this way:
u.val = x;
for (int j = 0; j < sizeof(myint_t); j++)
    printf("%d ", u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in a particular order.
So, before doing any computation with bytes, your program needs to check how the endianness works.
#include <limits.h> /* To use UCHAR_MAX */

unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */

u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0; j--)
    u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have value 0.
I assume 32-bit integers (if that is not the case then just change the sizes); there are several approaches:
BYTE pointer
#include <stdio.h>

int x; // your integer or whatever else data type
BYTE *p = (BYTE*)&x;

x = 0x11223344;
printf("%x\n", p[0]);
printf("%x\n", p[1]);
printf("%x\n", p[2]);
printf("%x\n", p[3]);
just get the address of your data as a BYTE pointer
and access the bytes directly via a 1D array
union
#include <stdio.h>

union
{
    int x;     // your integer or whatever else data type
    BYTE p[4];
} a;

a.x = 0x11223344;
printf("%x\n", a.p[0]);
printf("%x\n", a.p[1]);
printf("%x\n", a.p[2]);
printf("%x\n", a.p[3]);
and access the bytes directly via a 1D array
[notes]
if you do not have BYTE defined then change it to unsigned char
with the ALU you can use not only %,/ but also >>,& which is way faster but still uses arithmetic
now depending on the platform endianness the output can be 11,22,33,44 or 44,33,22,11 so you need to take that into account (especially for code used on multiple platforms)
you need to handle the sign of the number; for unsigned integers there is no problem,
but for signed ones C uses 2's complement, so it is better to separate the sign before splitting, like:
int s;
if (x < 0) { s = -1; x = -x; } else s = +1;
// now split ...
[edit2] logical/bit operations
x<<n, x>>n - bit shift of x left or right by n bits
x&y - bitwise logical AND (performs logical AND on each pair of bits separately)
so when you have, for example, a 32-bit unsigned int (called DWORD) you can split it into BYTEs like this:
DWORD x;             // input 32 bit unsigned int
BYTE a0, a1, a2, a3; // output BYTEs, a0 is the least significant, a3 is the most significant

x = 0x11223344;
a0 = (BYTE)((x      ) & 255); // should be 0x44
a1 = (BYTE)((x >>  8) & 255); // should be 0x33
a2 = (BYTE)((x >> 16) & 255); // should be 0x22
a3 = (BYTE)((x >> 24) & 255); // should be 0x11
this approach is not affected by endianness
but it uses the ALU
the point is to shift the bits you want into positions 0..7 and mask out the rest
the &255 and the (BYTE) cast are not needed on all compilers, but some do weird stuff without them, especially with signed variables like char or int
x>>n is the same as x/(pow(2,n))=x/(1<<n)
x&((1<<n)-1) is the same as x%(pow(2,n))=x%(1<<n)
so (x>>8)=x/256 and (x&255)=x%256
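Applied to the question's loop, a minimal sketch with % and / replaced by the equivalent bit operations (bin, quotient, and i as in the question):
while (quotient != 0) {
    bin[i] = quotient & 1; /* same as quotient % 2 */
    quotient >>= 1;        /* same as quotient / 2 */
    i++;
}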

How to cast a uint8_t array of 4 to uint32_t in C

I am trying to cast an array of uint8_t to an array of uint32_t, but it seems not to be working.
Can anyone help me with this? I need to get the uint8_t values into a uint32_t.
I can do this with shifting but I think there is an easier way.
uint32_t *v4full;
v4full = (uint32_t *)v4;

while (*v4full) {
    if (*v4full & 1)
        printf("1");
    else
        printf("0");

    *v4full >>= 1;
}
printf("\n");
Given the need to get uint8_t values to uint32_t, and the specs on in4_pton()...
Try this with a possible correction on the byte order:
uint32_t i32 = v4[0] | ((uint32_t)v4[1] << 8) | ((uint32_t)v4[2] << 16) | ((uint32_t)v4[3] << 24);
There is a problem with your example - actually with what you are trying to do (since you don't want the shifts).
See, it is a little known fact, but you're not allowed to switch pointer types in this manner
specifically, code like this is illegal:
type1 *vec1=...;
type2 *vec2=(type2*)vec1;
// do stuff with *vec2
The only case where this is legal is if type2 is char (or unsigned char or const char etc.), but if type2 is any other type (uint32_t in your example) it's against the standard and may introduce bugs to your code if you compile with -O2 or -O3 optimization.
This is called the "strict-aliasing rule" and it allows compilers to assume that pointers of different types never point to the same memory - so that if you change the memory through one pointer, the compiler doesn't have to reload all the other pointers.
It's hard for compilers to find instances of breaking this rule, unless you make it painfully clear to it. For example, if you change your code to do this:
uint32_t v4full=*((uint32_t*)v4);
and compile using -O3 -Wall (I'm using gcc) you'll get the warning:
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
So you can't avoid using the shifts.
Note: it will work on lower optimization settings, and it will also work on higher settings if you never change the data pointed to by v4 and v4full. It will work, but it's still a bug and still "against the rules".
If v4full is a pointer, then the lines
uint32_t *v4full;
v4full = (uint32_t)&v4;
should throw an error or at least a compiler warning. Maybe you meant to do
uint32_t *v4full;
v4full = (uint32_t *)v4;
where I assume v4 is itself a pointer to a uint8_t array. I realize I am extrapolating from incomplete information…
EDIT since the above appears to have addressed a typo, let's try again.
The following snippet of code works as expected - and as I think you want your code to work. Please comment on this - how is this code not doing what you want?
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint8_t v4[4] = {1, 2, 3, 4};
    uint32_t *allOfIt;

    allOfIt = (uint32_t*)v4;
    printf("the number is %08" PRIx32 "\n", *allOfIt);
    return 0;
}
Output:
the number is 04030201
Note - the order of the bytes in the printed number is reversed - you get 04030201 instead of 01020304 as you might have expected / wanted. This is because my machine (x86 architecture) is little-endian. If you want to make sure that the order of the bytes is the way you want it (in other words, that element [0] corresponds to the most significant byte) you are better off using #bvj's solution - shifting each of the four bytes into the right position in your 32 bit integer.
Incidentally, you can see this earlier answer for a very efficient way to do this, if needed (telling the compiler to use a built in instruction of the CPU).
One other issue that makes this code non-portable is that many architectures require a uint32_t to be aligned on a four-byte boundary, but allow uint8_t to have any address. Calling this code on an improperly-aligned array would then cause undefined behavior, such as crashing the program with SIGBUS. On these machines, the only way to cast an arbitrary uint8_t[] to a uint32_t[] is to memcpy() the contents. (If you do this in four-byte chunks, the compiler should optimize to whichever of an unaligned load or two-loads-and-a-shift is more efficient on your architecture.)
If you have control over the declaration of the source array, you can #include <stdalign.h> and then declare alignas(uint32_t) uint8_t bytes[]. The classic solution is to declare both the byte array and the 32-bit values as members of a union and type-pun between them. It is also safe to use pointers obtained from malloc(), since these are guaranteed to be suitably-aligned.
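A sketch of the alignment-safe declaration mentioned above, combined with the aliasing-safe memcpy() (C11 for alignas; the variable names are illustrative):
#include <stdalign.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* guaranteed suitably aligned for uint32_t access */
    alignas(uint32_t) uint8_t bytes[4] = {1, 2, 3, 4};
    uint32_t out;

    memcpy(&out, bytes, sizeof out); /* no aliasing or alignment issues */
    printf("%08" PRIx32 "\n", out);  /* 04030201 on a little-endian host */
    return 0;
}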
This is one solution:
/* convert character array to integer */
uint32_t buffChar_To_Int(char *array, size_t n){
    int number = 0;
    int mult = 1;

    n = (int)n < 0 ? -n : n; /* quick absolute value check */

    /* for each character in array */
    while (n--){
        /* if not a digit or '-', break if a number was built, else skip */
        if((array[n] < '0' || array[n] > '9') && array[n] != '-'){
            if(number)
                break;
            else
                continue;
        }

        if(array[n] == '-'){ /* if '-' and a number was built, negate and break */
            if(number){
                number = -number;
                break;
            }
        }
        else{ /* convert digit to numeric value */
            number += (array[n] - '0') * mult;
            mult *= 10;
        }
    }
    return number;
}
One more solution:
u32 ip;

if (!in4_pton(str, -1, (u8 *)&ip, -1, NULL))
    return -EINVAL;

... use ip as defined above (as a variable of type u32)
Here we use the result of the in4_pton function (ip) without any additional variables or casts.
