Using bit-fields as representation for integers in C [duplicate]

This question already has answers here:
bit vector implementation of sets
(2 answers)
Closed 6 years ago.
In my C class we were given an assignment:
Write an interactive program (standard input/output). Define the new type set using typedef which can hold a set of integers in the range 0-127. The data structure has to be as efficient as possible in terms of storage (hint: working with bits). Also you need to define 6 global variables A,B,C,D,E,F of type set. All operations on sets in the program will be on these 6 variables.
The command read_set A,5,6,7,4,5,4,-1 reads the user's integers into set A, where -1 marks the end of the input. The other commands are: print_set A, which prints the set in increasing order; union_set A,B,C, which computes the union of the first two sets and stores the result in the third; and intersect_set A,B,C, which computes the intersection of the first two sets and stores the result in the third.
As far as I understand, I need to use bit-fields. I could create a table of integers from 0-127, then create the 6 variables A,B,C,D,E,F using the set type definition, giving 128 bit-fields to each variable. Then, if a user inputs 15, I would turn on the bit that represents 15. I'm really not sure this is the right way, because it's not clear to me how I would arrange the bit-fields so that I can turn on exactly the 15th bit when needed; I would somehow need to convert an integer to a bit-field name... Also, print_set prints the set in increasing order, so how could I rearrange the bit-fields for this?
Really hope you have some ideas.

Yes, each of the sets called A, B, C, D, E and F can be represented by a pair of unsigned long long integers like this:
typedef struct {
    unsigned long long high;
    unsigned long long low;
} Set;
See https://en.wikipedia.org/wiki/C_data_types
This gives you 128 bits of data in a Set (64 bits for the high numbers 64 to 127, and 64 bits for the low numbers 0 to 63).
Then you just need to do some bit manipulation like this: http://www.tutorialspoint.com/ansi_c/c_bits_manipulation.htm
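For a number x between 0 and 63, you'd shift 1ULL to the left x times and then set that bit in the "low" field.
For a number x between 64 and 127, you'd shift 1ULL to the left x-64 times and then set that bit in the "high" field. (Use 1ULL rather than a plain 1: shifting an int by its width or more, typically 32 bits, is undefined behavior.)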
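A minimal sketch of the approach above, in C. The Set struct is repeated so the block compiles on its own; the function names set_insert, set_contains, set_union, set_intersect and set_members are hypothetical and not part of the assignment, and parsing the read_set commands is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* low holds members 0..63, high holds members 64..127. */
typedef struct {
    unsigned long long high;
    unsigned long long low;
} Set;

/* 1ULL (not plain 1) so shifts of up to 63 bits are well defined.
   Values outside 0..127 are ignored. */
void set_insert(Set *s, int x) {
    if (x >= 0 && x < 64)
        s->low |= 1ULL << x;
    else if (x >= 64 && x < 128)
        s->high |= 1ULL << (x - 64);
}

bool set_contains(const Set *s, int x) {
    if (x >= 0 && x < 64)
        return (s->low >> x) & 1ULL;
    if (x >= 64 && x < 128)
        return (s->high >> (x - 64)) & 1ULL;
    return false;
}

/* union_set and intersect_set reduce to one bitwise operation per word. */
Set set_union(Set a, Set b) {
    Set r = { a.high | b.high, a.low | b.low };
    return r;
}

Set set_intersect(Set a, Set b) {
    Set r = { a.high & b.high, a.low & b.low };
    return r;
}

/* Testing bits 0..127 in order yields the members in increasing order.
   Writes them to out and returns how many there are. */
size_t set_members(const Set *s, int out[128]) {
    size_t n = 0;
    for (int x = 0; x < 128; x++)
        if (set_contains(s, x))
            out[n++] = x;
    return n;
}
```

print_set would call set_members and print out[0] to out[n - 1]; because the bits are tested from 0 upward, no sorting is needed.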
Hope this helps!

Using bitfields for this assignment will prove very cumbersome because of alignment issues, and you cannot define arrays of bitfields anyway. I would suggest using an array of bytes (unsigned char) and packing values into this array. Each 7-bit value then spans at most 2 bytes.
The array for count values should be allocated with a size of (count * 7 + 7) / 8 bytes (assuming 8-bit bytes). In order to conserve space, you can store small sets in an integer and larger sets using an allocated array.
The datatype would look like:
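#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
typedef struct set {
    size_t count;           /* number of 7-bit values stored */
    union {                 /* anonymous union: requires C11 */
        uintptr_t v;        /* small sets: values packed into the integer itself */
        unsigned char *a;   /* larger sets: allocated byte array */
    };
} set;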
Here is how to extract the n-th value:
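/* Returns the n-th 7-bit value, or -1 if s is NULL or n is out of range. */
int get_7bits(const set *s, size_t n) {
    if (s == NULL || n >= s->count) {
        return -1;
    } else if (s->count <= sizeof(uintptr_t) * CHAR_BIT / 7) {
        /* small set: all values are packed inline in s->v */
        return (int)((s->v >> (n * 7)) & 127);
    } else {
        size_t bit = n * 7;                 /* bit offset of the value */
        size_t i = bit / CHAR_BIT;          /* byte holding its low bits */
        int shift = (int)(bit % CHAR_BIT);
        if (shift <= CHAR_BIT - 7) {
            /* value fits in one byte */
            return (s->a[i] >> shift) & 127;
        } else {
            /* value spans 2 bytes */
            return ((s->a[i] | (s->a[i + 1] << CHAR_BIT)) >> shift) & 127;
        }
    }
}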
You can write the other access functions and complete your assignment.
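A sketch of the "other access functions" for the byte-array case, written against a plain unsigned char buffer instead of the set struct so it stands alone. put7 and get7 are hypothetical names, and the buffer is assumed to hold at least (count * 7 + CHAR_BIT - 1) / CHAR_BIT bytes. Note that this stores a sequence of 7-bit values, so for set semantics you would still have to check for duplicates before appending.

```c
#include <limits.h>
#include <stddef.h>

/* Writes a 7-bit value (0..127) at index n, leaving neighbouring values intact. */
void put7(unsigned char *buf, size_t n, unsigned value) {
    size_t bit = n * 7;                          /* bit offset of the n-th value */
    size_t i = bit / CHAR_BIT;                   /* byte holding its low bits */
    unsigned shift = (unsigned)(bit % CHAR_BIT);
    unsigned mask = 127u << shift;               /* the 7 target bits, relative to buf[i] */
    value = (value & 127u) << shift;
    buf[i] = (unsigned char)((buf[i] & ~mask) | value);        /* low part */
    if (shift + 7 > CHAR_BIT)                    /* value spans two bytes */
        buf[i + 1] = (unsigned char)((buf[i + 1] & ~(mask >> CHAR_BIT)) | (value >> CHAR_BIT));
}

/* Reads the 7-bit value stored at index n. */
unsigned get7(const unsigned char *buf, size_t n) {
    size_t bit = n * 7;
    size_t i = bit / CHAR_BIT;
    unsigned shift = (unsigned)(bit % CHAR_BIT);
    unsigned window = buf[i];
    if (shift + 7 > CHAR_BIT)                    /* pull in the next byte too */
        window |= (unsigned)buf[i + 1] << CHAR_BIT;
    return (window >> shift) & 127u;
}
```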

Related

How to analyze bytes of a variable's value in C

Is it possible to divide, for example, an integer into n-bit pieces?
For example, since an int variable has a size of 32 bits (4 bytes), is it possible to divide the number into 4 "pieces" of 8 bits and put them in 4 other variables that have a size of 8 bits?
I solved it using an unsigned char pointer to the variable whose bytes I want to analyze, something like this:
#include <stdio.h>
int main(void)
{
    int x = 10;
    unsigned char *p = (unsigned char *) &x;
    //Since my cpu is little endian I'll print bytes from the end
    for (int i = sizeof(int) - 1; i >= 0; i--)
        printf("%.2x ", p[i]);   //print hexadecimal bytes
    return 0;
}
Yes, of course it is. But generally we just use bit operations directly on the bits (called bitops) using bitwise operators defined for all discrete integer types.
For instance, if you need to test the 5th least significant bit, you can use x &= 1 << 4 so that x keeps only the 5th bit and all other bits are set to zero (this overwrites x, so work on a copy if you still need the original value). Then you can use if (x) to test whether it was set; in a condition, C treats zero as false and any other value as true. If you store 1 << 4 in a constant, you have created a "(bit) mask" for that particular bit.
If you need a value of 0 or 1, you can shift the other way instead: x = (x >> 4) & 1. This is all covered in most C books, so I'd urge you to read about these bit operations there.
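The mask idioms above can be wrapped in small helper functions. This is a sketch with hypothetical names; it uses unsigned so that shifting into the top bit is well defined, assumes n is smaller than the width of unsigned, and, unlike x &= 1 << 4, leaves the argument unchanged.

```c
#include <stdbool.h>

/* n counts from 0 = least significant bit. */
bool bit_is_set(unsigned x, unsigned n)     { return (x & (1u << n)) != 0; }
unsigned bit_value(unsigned x, unsigned n)  { return (x >> n) & 1u; }
unsigned bit_set(unsigned x, unsigned n)    { return x | (1u << n); }
unsigned bit_clear(unsigned x, unsigned n)  { return x & ~(1u << n); }
```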
There are many Q&As here on how to split integers into bytes, see e.g. here. In principle you can store the pieces in chars, but if you need integer operations on them you can also split the int into multiple int values. One problem with that is that an int is only guaranteed to be at least 16 bits wide, so the number of bytes in an int varies between platforms.
In principle it is also possible to use bit fields but I'd be hesitant to use those. With an int you will at least know that the bits will be stored in the least significant bits.
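A sketch of splitting an integer into bytes with shifts and masks, as recommended above, rather than through a pointer or bit-fields. split_bytes is a hypothetical name; unsigned is used to avoid sign issues, and padding bits are assumed absent. Because it works on the value rather than on memory, the result (least significant byte first) is the same on little- and big-endian machines.

```c
#include <limits.h>
#include <stddef.h>

/* Stores the bytes of x into out[0..sizeof x - 1], least significant first. */
void split_bytes(unsigned x, unsigned char out[sizeof(unsigned)]) {
    for (size_t i = 0; i < sizeof x; i++)
        out[i] = (unsigned char)((x >> (i * CHAR_BIT)) & UCHAR_MAX);
}
```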

What are bit vectors and how do I use them to convert chars to ints?

Here's the explanation for our task when implementing a set data structure in C: "The set is constructed as a Bit vector, which in turn is implemented as an array of the data type char."
My confusion arises from the fact that almost all the functions we're given take a set and an int, as in the function below, yet our array is made up of chars. How would I call functions that only take ints when I have an array of chars? Here's my attempt at calling the function in my main function, as well as the struct and an example of a function used.
int main(){
    set *setA = set_empty();
    set_insert("green",setA );
}
struct set {
    int capacity;
    int size;
    char *array;
};
void set_insert(const int value, set *s)
{
    if (!set_member_of(value, s)) {
        int bit_in_array = value; // To make the code easier to read
        // Increase the capacity if necessary
        if (bit_in_array >= s->capacity) {
            int no_of_bytes = bit_in_array / 8 + 1;
            s->array = realloc(s->array, no_of_bytes);
            for (int i = s->capacity / 8 ; i < no_of_bytes ; i++) {
                s->array[i] = 0;
            }
            s->capacity = no_of_bytes * 8;
        }
        // Set the bit
        int byte_no = bit_in_array / 8;
        int bit = 7 - bit_in_array % 8;
        s->array[byte_no] = s->array[byte_no] | 1 << bit;
        s->size++;
    }
}
TL;DR: The types of the index (value in your case) and the indexed element of an array (array in your case) are independent from each other. There is no conversion.
Most digital systems these days store their values in bits, each of which can hold only 0 or 1.
An integer value can therefore be viewed as a binary number, a value to the base of 2. It is a sequence of bits, each of which is assigned a power of two. See the Wikipedia page on two's complement for details. But this aspect is not relevant for your issue.
What is relevant is the view that an integer value is a sequence of bits. The simplest integer type of C is the char. It commonly holds 8 bits. We can assign indexes to these bits, and therefore think of them as a "vector", mathematically. Some people start to count "from the left", others start to count "from the right". Other common terms in this area are "MSB" and "LSB", see this Wikipedia page for more.
To access an element of a vector, you use its index. On a common char this is a value between 0 and 7, inclusive. Remember, in CS we start counting from zero. The type of this index can be any integer type wide enough to hold the value, for example an int. This is why you use an int in your case. This data type is independent of the type of the elements in the vector.
How do you solve the problem if you need more than 8 bits? Well, then you use more chars. This is why your structure holds (a pointer to) an array of chars. The n chars of the array represent a vector of n * 8 bits, and you call this number the "capacity".
Another option is to use a wider type, like a long or even a long long. And you can build an array of elements of these types, too. However, the widths of such types are commonly not equal in all systems.
BTW, the mathematical "vector" is the same thing as an "array" in CS. Different science areas, different terms.
Now, what is a "set"? I hope your script explains that a bit better than I can... It is a collection that contains an element only once or not at all. All elements are distinct. In your case the elements are represented by (small) integers.
Given a vector of bits of arbitrary capacity, we can "map" an element of a set on a bit of this vector by its index. This is done by storing a 1 in the mapped bit, if the element is present in the set, or 0, if it is not.
To access the correct bit, we need the index of the single char in the array, and the index of the bit in this char. You calculate these values in the lines:
int byte_no = bit_in_array / 8;
int bit = 7 - bit_in_array % 8;
All variables are of type int, most probably because this is the common type. It can be any other integer type, like a size_t for example, as long as it can hold the necessary values, even different types for the different variables.
With these two values at hand, you can "insert" the element into the set. For this action, you set the respective bit to 1:
s->array[byte_no] = s->array[byte_no] | 1 << bit;
Please note that the shift operator << has a higher precedence than the bit-wise OR operator |. Some coding style rules request to use parentheses to make this clear, but you can also use this even clearer assignment:
s->array[byte_no] |= 1 << bit;
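The question's set_insert relies on a set_member_of function that isn't shown. A possible version using the same layout (element v in byte v / 8, at bit 7 - v % 8) might look like the sketch below; it is a guess at the missing function, not the course's code, and the typedef is added so the block compiles on its own. It also illustrates why the value must be an integer such as an enum constant: a string like "green" has no bit position until you map it to one.

```c
#include <stdbool.h>

typedef struct set {
    int capacity;   /* number of bits currently allocated */
    int size;       /* number of elements in the set */
    char *array;
} set;

bool set_member_of(const int value, const set *s) {
    if (value < 0 || value >= s->capacity)
        return false;                  /* beyond the allocated bits: not a member */
    int byte_no = value / 8;           /* which char holds the bit */
    int bit = 7 - value % 8;           /* same bit order as set_insert */
    /* cast to unsigned char so a negative (signed) char shifts predictably */
    return ((unsigned char)s->array[byte_no] >> bit) & 1;
}
```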

Find number of bits in a data type

I need to write a macro named CountBitsM. This macro has one parameter and produces a value of type int. The parameter is any expression with an object data type or the literal name of any object data type, so I used int. This macro determines the number of bits of storage used for the data type on any machine on which it runs. And I can use a macro from limits.h. Here is what I wrote; does this look right?
#ifndef COUNTBITSM_H
#define COUNTBITSM_H
#include <limits.h>
#define CountBitsM(int) ((int)*(CHAR_BIT))
#endif
The second question was to create a function CountIntBitsF that counts the number of bits used to represent a type int value on any machine. However, I can NOT USE any #define, or #include header files, or any macro. I also cannot use any multiplications or divisions. The hint given was to start with a value of 1 in a type unsigned int variable and left-shift it one bit at a time, keeping count of the number of shifts, until the variable's value becomes 0. Here is what I have so far:
int CountIntBitsF(void)
{
    int IntgMax = 8;
    unsigned int count = 1;
    while (IntgMax = IntgMax>>2) count++;
    return count;
}
First off, I am not supposed to use division or multiplication, so am I doing the shift properly? And I can't assume a char/byte contains 8 or any other specific number of bits. So how or what should I set my IntgMax to? Thanks for any help. I am new to C.
Macro for Bits in a Type
A macro to produce the number of bits used to represent a type in storage is:
#define CountBitsM(x) (sizeof (x) * CHAR_BIT)
However, this produces a result with type size_t (usually). If you really need an int result as stated in the question, convert it (but be aware overflow becomes possible):
#define CountBitsM(x) ((int) (sizeof (x) * CHAR_BIT))
Counting Bits
The second question asks to count the number of bits “to represent a type int value” by shifting bits in an unsigned value. There are two theoretical problems here. One is that the number of bits used to represent a value may include padding bits, and counting the bits by shifting a 1 through them only counts the value bits, not the padding bits. The second is that an int may have more padding bits than an unsigned; it may use fewer bits for the sign and value. Overwhelmingly, modern systems will not have these issues; the number of used bits in an int will be the same as the total number of bits used to store it and the number of bits used in an unsigned.
That said, you can count the number of bits in an unsigned object with:
int count = 0;
for (unsigned u = 1; 0 != u; u <<= 1)
    ++count;
This repeatedly shifts the bit in u left until it is shifted out, while counting the number of iterations required to do this. Note that the bits in an int cannot properly be counted this way, because the behavior of left shift is not defined by the C standard when it overflows an int.
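One well-defined alternative for int, sketched here, is to start from INT_MAX and shift right (which is defined for non-negative values), counting the value bits instead of shifting a 1 into the sign bit. This needs <limits.h>, so it does not satisfy the no-#include rule of the assignment, and count_int_value_bits is a hypothetical name. On typical systems without padding bits, the result plus one (for the sign bit) equals sizeof(int) * CHAR_BIT.

```c
#include <limits.h>

/* Counts the value bits of int (the sign bit is not included). */
int count_int_value_bits(void) {
    int count = 0;
    for (int v = INT_MAX; v != 0; v >>= 1)   /* right shift of a positive int is well defined */
        ++count;
    return count;
}
```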
Question one
#define NBITS(type_or_object) (sizeof(type_or_object) * CHAR_BIT)
or without multiplication
#define NBITS(type_or_object) (sizeof(type_or_object) << (CHAR_BIT == 8 ? 3 : CHAR_BIT == 16 ? 4 : CHAR_BIT == 32 ? 5 : 0))
Second question:
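This version is for signed types and assumes the most popular representation, two's complement (I think it also works with sign-magnitude, because the loop stops as soon as the value is no longer greater than 0). Unsigned types are easier.
#include <stdio.h>
int CountIntBits(void)
{
    int IntgMax = 1;
    int count = 1;           // 1 for the sign bit; the loop adds one per value bit
    while (IntgMax > 0)
    {
        count++;
        IntgMax <<= 1;       // caution: shifting into the sign bit is undefined behavior in C
    }
    return count;
}
int main(void)
{
    printf("%d\n", CountIntBits());
}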
or (also no multiplication :) )
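#include <limits.h>
int CountIntBits(void)
{
    // shift = log2(CHAR_BIT) for common byte sizes; 0 (giving a wrong result) otherwise
    int shift = CHAR_BIT == 8 ? 3 : CHAR_BIT == 16 ? 4 : CHAR_BIT == 32 ? 5 : 0;
    return (int)(sizeof(int) << shift);
}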
for unsigned types:
int CountIntBits(void)
{
    unsigned IntgMax = 1;
    int count = 0;
    while (IntgMax)
    {
        count++;
        IntgMax <<= 1;
    }
    return count;
}

How can I split integers into bytes without using arithmetic in C?

I am implementing four basic arithmetic functions (addition, subtraction, division, multiplication) in C.
The basic structure of these functions, as I imagined it, is:
the program gets two operands from the user using scanf,
and the program splits these values into bytes and computes!
I've completed addition and subtraction,
but I forgot that I shouldn't use arithmetic operators,
so when splitting an integer into single bytes,
I wrote code like
while(quotient!=0){
    bin[i]=quotient%2;
    quotient=quotient/2;
    i++;
}
but since it uses arithmetic operators that I shouldn't use,
I have to rewrite that splitting part,
but I really have no idea how to split an integer into single bytes without using
% or /.
To access the bytes of a variable, type punning can be used.
According to Standard C (C99 and C11), only unsigned char is guaranteed to perform this operation safely.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;
union {
    myint_t val;
    unsigned char byte[sizeof(myint_t)];
} u;
Now you can, of course, access the bytes of x this way:
u.val = x;
for (size_t j = 0; j < sizeof(myint_t); j++)
    printf("%d ", u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in any particular order.
So, before doing any computation with the bytes, your program needs to check the machine's endianness.
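#include <limits.h> /* To use UCHAR_MAX */
unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */
u.val = 0;
/* Store each byte's significance index in it: the most significant byte gets sizeof(myint_t) - 1, the least significant gets 0 */
for (int j = sizeof(myint_t) - 1; j >= 0; j--)
    u.val = u.val * ByteFactor + j;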
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will hold the value 0.
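A minimal runtime check of the byte order, using the fact that any object may be inspected through an unsigned char pointer. is_little_endian is a hypothetical name, and it only tells you whether the least significant byte is stored first; unusual mixed-endian layouts would need the full byte-index test above.

```c
#include <stdbool.h>

/* True if the least significant byte of an unsigned int is stored first. */
bool is_little_endian(void) {
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}
```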
Assuming 32-bit integers (if that is not the case, just change the sizes), there are several approaches:
BYTE pointer
#include <stdio.h>
typedef unsigned char BYTE;   // see the notes below
int main(void)
{
    int x;                    // your integer or whatever else data type
    BYTE *p = (BYTE *)&x;
    x = 0x11223344;
    printf("%x\n", p[0]);
    printf("%x\n", p[1]);
    printf("%x\n", p[2]);
    printf("%x\n", p[3]);
    return 0;
}
Just take the address of your data as a BYTE pointer
and access the bytes directly as a 1D array.
union
#include <stdio.h>
typedef unsigned char BYTE;   // see the notes below
union
{
    int x;                    // your integer or whatever else data type
    BYTE p[4];
} a;
int main(void)
{
    a.x = 0x11223344;
    printf("%x\n", a.p[0]);
    printf("%x\n", a.p[1]);
    printf("%x\n", a.p[2]);
    printf("%x\n", a.p[3]);
    return 0;
}
Then access the bytes directly as a 1D array.
[notes]
If you do not have BYTE defined, replace it with unsigned char.
With the ALU you can use not only % and / but also >> and &, which are much faster but still count as arithmetic.
Depending on the platform's endianness, the output can be 11,22,33,44 or 44,33,22,11, so you need to keep that in mind (especially for code used on multiple platforms).
You need to handle the sign of the number. For unsigned integers there is no problem,
but signed integers are almost always stored in two's complement, so it is better to separate the sign before splitting, like:
int s;
if (x < 0) { s = -1; x = -x; } else s = +1;   // note: -x overflows if x == INT_MIN
// now split ...
[edit2] logical/bit operations
x<<n, x>>n: shift x left or right by n bits
x&y: bitwise AND (performs a logical AND on each bit separately)
So when you have, for example, a 32-bit unsigned int (called DWORD), you can split it into BYTEs like this:
#include <stdint.h>
typedef uint32_t DWORD;       // 32-bit unsigned int
typedef unsigned char BYTE;
int main(void)
{
    DWORD x = 0x11223344;     // input 32 bit unsigned int
    BYTE a0, a1, a2, a3;      // output BYTEs: a0 is the least significant, a3 the most significant
    a0 = (BYTE)((x      ) & 255);  // should be 0x44
    a1 = (BYTE)((x >>  8) & 255);  // should be 0x33
    a2 = (BYTE)((x >> 16) & 255);  // should be 0x22
    a3 = (BYTE)((x >> 24) & 255);  // should be 0x11
    return 0;
}
This approach is not affected by endianness,
but it uses the ALU.
The point is to shift the bits you want into positions 0..7 and mask out the rest.
The & 255 and the explicit cast are not needed with all compilers, but some do weird things without them, especially on signed variables like char or int.
For unsigned (non-negative) x, x>>n is the same as x/pow(2,n) = x/(1<<n)
and x&((1<<n)-1) is the same as x%pow(2,n) = x%(1<<n),
so (x>>8) == x/256 and (x&255) == x%256.
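Applying these identities to the loop from the question (which actually splits into bits, not bytes), % 2 and / 2 become & 1 and >> 1. A sketch with the hypothetical name to_bits and an unsigned operand so the identities hold; whether shifts and masks count as "arithmetic" for the assignment is for the instructor to decide.

```c
#include <stddef.h>

/* Stores the binary digits of q, least significant first, into bin[0..max-1].
   Returns the number of digits written (0 for q == 0). */
size_t to_bits(unsigned q, unsigned char bin[], size_t max) {
    size_t i = 0;
    while (q != 0 && i < max) {
        bin[i] = q & 1u;   /* same as q % 2 for unsigned q */
        q >>= 1;           /* same as q / 2 for unsigned q */
        i++;
    }
    return i;
}
```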

Enum and strings in C

I have a char* string coming in, and I need to store it accordingly.
The string can be any of these values: { UK, GD, BD, ER, WR, FL }.
If I want to keep them as an enumerated type, which data type is best to use? For 6 values three bits are enough, but how do I store three bits in C?
What you want is a Bit Field:
typedef struct {
    unsigned char val : 2; //use 2 bits
    unsigned char : 6; // remaining 6 bits
} valContainer;
...
valContainer x;
x.val = GD;
Do note that there isn't really a way to store less than one byte, as the definition of a byte is the smallest amount of memory the computer can address. This is just a method of having names associated with different bits in a byte.
Also, of course, 2 bits is not enough for 6 values (2 bits hold 4 distinct values). So you really want at least 3 bits (8 distinct values).
Just store them as an unsigned short. Unless you're storing other things in your struct to fill out a whole word, you're WAY prematurely optimizing. The compiler will have to pad out your data anyway.
As the answer by Eric Finn suggests, you can use bit fields to store a data element of 3 bits. However, this is only good if you have something else to store in the same byte.
struct {
    unsigned char value: 3;
    unsigned char another: 4;
    unsigned char yet_another: 5;
    // 12 bits declared so far; 4 more "padding" bits are unusable
} whatever;
If you want to store an array of many such small elements, you have to do it in a different way, for example, clumping 10 elements in each 32-bit word.
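#include <stdint.h>
#include <stdlib.h>
int n = ...; // number of elements to store
uint32_t *data = calloc((n + 9) / 10, sizeof(*data));  // round up: 10 elements per word
for (int i = 0; i < n; i++)
{
    int value = read_string_and_convert_to_int();  // placeholder: returns 0..7
    data[i / 10] &= ~(7u << (i % 10 * 3));          // clear the 3-bit slot
    data[i / 10] |= (uint32_t)value << (i % 10 * 3);
}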
If you want to have only one element (or a few), just use enum or int.
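Putting the two ideas together: a sketch that maps the incoming strings to an enum and packs the 3-bit codes ten per 32-bit word, as in the answer above. The enum code type, the code_names table and the functions code_from_string, pack_code and unpack_code are hypothetical; the caller is assumed to allocate (n + 9) / 10 words, for example with calloc.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical enum for the six codes in the question; CODE_INVALID also fits in 3 bits. */
enum code { UK, GD, BD, ER, WR, FL, CODE_INVALID };

static const char *const code_names[] = { "UK", "GD", "BD", "ER", "WR", "FL" };

/* Maps a string such as "GD" to its enum value, or CODE_INVALID if unknown. */
enum code code_from_string(const char *s) {
    for (int i = 0; i < CODE_INVALID; i++)
        if (strcmp(s, code_names[i]) == 0)
            return (enum code)i;
    return CODE_INVALID;
}

/* Element i lives in word i / 10, at bit offset (i % 10) * 3. */
void pack_code(uint32_t *data, int i, enum code c) {
    unsigned shift = (unsigned)(i % 10 * 3);
    data[i / 10] &= ~(UINT32_C(7) << shift);   /* clear the 3-bit slot */
    data[i / 10] |= (uint32_t)c << shift;      /* store the new code */
}

enum code unpack_code(const uint32_t *data, int i) {
    unsigned shift = (unsigned)(i % 10 * 3);
    return (enum code)((data[i / 10] >> shift) & 7u);
}
```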
