C variable smaller than 8-bit - c

I'm writing a C implementation of Conway's Game of Life and am pretty much done with the code, but I'm wondering what the most efficient way to store the net in the program is.
The net is two dimensional and stores whether cell (x, y) is alive (1) or dead (0). Currently I'm doing it with unsigned char like that:
struct:
typedef struct {
int rows;
int cols;
unsigned char *vec;
} net_t;
allocation:
n->vec = calloc( n->rows * n->cols, sizeof(unsigned char) );
filling:
i = ( n->cols * (x - 1) ) + (y - 1);
n->vec[i] = 1;
searching:
if( n->vec[i] == 1 )
but I don't really need 0-255 values - I only need 0 - 1, so I'm feeling that doing it like that is a waste of space, but as far as I know 8-bit char is the smallest type in C.
Is there any way to do it better?
Thanks!

The smallest unit of memory you can declare and address in C is a single byte, which is unsigned char in your case.
If you want to really save on space, you could make use of masking off individual bits in a character, or using bit fields via a union. The trade-off will be that your code will execute a bit slower, and will certainly be more complicated.
#include <stdio.h>
union both {
struct {
unsigned char b0: 1;
unsigned char b1: 1;
unsigned char b2: 1;
unsigned char b3: 1;
unsigned char b4: 1;
unsigned char b5: 1;
unsigned char b6: 1;
unsigned char b7: 1;
} bits;
unsigned char byte;
};
int main ( ) {
union both var;
var.byte = 0xAA;
if ( var.bits.b0 ) {
printf("Yes\n");
} else {
printf("No\n");
}
return 0;
}
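The masking alternative mentioned above could be sketched as follows. One caveat with the union version: the order in which bit-fields are laid out inside a byte is implementation-defined, so whether b0 is the least significant bit depends on the compiler. With shifts and masks the bit numbering is fixed by the code itself. The helper names (bit_is_set, bit_set, bit_clear) are made up for this illustration, and bit 0 is taken to be the least significant bit.

```c
#include <stdbool.h>

/* Hypothetical helpers; bit 0 is the least significant bit of the byte. */
static bool bit_is_set(unsigned char byte, unsigned bit)
{
    return (byte >> bit) & 1u;
}

/* Return a copy of byte with the given bit switched on. */
static unsigned char bit_set(unsigned char byte, unsigned bit)
{
    return (unsigned char)(byte | (1u << bit));
}

/* Return a copy of byte with the given bit switched off. */
static unsigned char bit_clear(unsigned char byte, unsigned bit)
{
    return (unsigned char)(byte & ~(1u << bit));
}
```

With the value 0xAA (binary 10101010) used in the example, bit 0 is clear and bit 1 is set under this numbering.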
References
Union and Bit Fields, Accessed 2014-04-07, <http://www.rightcorner.com/code/CPP/Basic/union/sample.php>
Access Bits in a Char in C, Accessed 2014-04-07, <https://stackoverflow.com/questions/8584577/access-bits-in-a-char-in-c>
Struct - Bit Field, Accessed 2014-04-07, <http://cboard.cprogramming.com/c-programming/10029-struct-bit-fields.html>

Unless you're working on an embedded platform, I wouldn't be too concerned about the size your net takes up by using an unsigned char to store only a 1 or 0.
To address your specific question: char is the smallest of the C data types. char, signed char, and unsigned char are all only going to take up 1 byte each.
If you want to make your data smaller you can use bitfields to decrease the amount of space you take up, but that will increase the complexity of your code.
For a simple exercise like this, I'd be more concerned about readability than size. One way you can make it more obvious what you're doing is to switch to a bool instead of a char.
#include <stdbool.h>
typedef struct {
int rows;
int cols;
bool *vec;
} net_t;
You can then use true and false which, IMO, will make your code much easier to read and understand when all you need is 1 and 0.
It will take up at least as much space as the way you're doing it now, but like I said, consider what's really important in the program you're writing for the platform you're writing it for... it's probably not the size.

As far as I know, the smallest types in C are char (signed or unsigned depending on the implementation), signed char (-128 to 127), and unsigned char (0 to 255). Each of them takes a whole byte, so if you are storing several single-bit values in separate variables, you can instead use one unsigned char as a group of bits.
unsigned char lives = 128;
At this point, lives has the decimal value 128, which is 10000000 in binary, so now you can use a bitwise operator to get a single bit from this variable (like an array of bits):
if((lives >> 7) == 1) {
//This code will run if the 8th bit counting from the right (decimal 128) is set
}
It's a little complex, but you'll end up with a bit array, so instead of using multiple variables to store single TRUE / FALSE values, you can use a single unsigned char variable to store 8 TRUE / FALSE values.
Note: I've been away from the C/C++ world for a while, so I'm not 100% sure that it's "lives >> 7", but it does use the '>>' operator; a little research on it and you'll be ready to go.

You're correct that a char is the smallest type - and it is typically (8) bits, though this is a minimum requirement. And sizeof(char) or (unsigned char) is (1). So, consider using an (unsigned) char to represent (8) columns.
How many chars are required per row? It's (cols / 8), but we have to round up to an integer value:
int byte_cols = (cols + 7) / 8;
or:
int byte_cols = (cols + 7) >> 3;
which you may wish to store within the net_t data structure. Then:
calloc(n->rows * n->byte_cols, 1) is sufficient for a contiguous bit vector.
Address columns and rows by x and y respectively. Setting (x, y) (relative to 0) :
n->vec[y * byte_cols + (x >> 3)] |= (1 << (x & 0x7));
Clearing:
n->vec[y * byte_cols + (x >> 3)] &= ~(1 << (x & 0x7));
Searching:
if (n->vec[y * byte_cols + (x >> 3)] & (1 << (x & 0x7)))
/* ... (x, y) is set... */
else
/* ... (x, y) is clear... */
These are bit manipulation operations. And it's fundamentally important to learn how (and why) this works. Google the term for more resources. This uses an eighth of the memory of a char per cell, so I certainly wouldn't consider it premature optimization.
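Pulling the pieces of this answer together, a minimal sketch might look like the following. It assumes net_t gains a byte_cols field as suggested, and the function names (net_init, net_set, net_clear, net_get) are invented for the example. Coordinates are 0-based, with x as the column and y as the row.

```c
#include <stdlib.h>

/* Sketch only: the asker's net_t extended with a byte_cols field. */
typedef struct {
    int rows;
    int cols;
    int byte_cols;        /* bytes per row: cols / 8, rounded up */
    unsigned char *vec;
} net_t;

/* Allocate a zeroed grid; returns 0 on success, -1 if calloc fails. */
static int net_init(net_t *n, int rows, int cols)
{
    n->rows = rows;
    n->cols = cols;
    n->byte_cols = (cols + 7) / 8;
    n->vec = calloc((size_t)rows * (size_t)n->byte_cols, 1);
    return n->vec ? 0 : -1;
}

/* Mark cell (x, y) as alive. */
static void net_set(net_t *n, int x, int y)
{
    n->vec[y * n->byte_cols + (x >> 3)] |= (unsigned char)(1u << (x & 7));
}

/* Mark cell (x, y) as dead. */
static void net_clear(net_t *n, int x, int y)
{
    n->vec[y * n->byte_cols + (x >> 3)] &= (unsigned char)~(1u << (x & 7));
}

/* Return 1 if cell (x, y) is alive, 0 otherwise. */
static int net_get(const net_t *n, int x, int y)
{
    return (n->vec[y * n->byte_cols + (x >> 3)] >> (x & 7)) & 1;
}
```

Because each row starts on a byte boundary, a 10-column grid uses 2 bytes per row and leaves 6 bits of padding at the end of each row. That wastes a little space but keeps the row offset a simple multiplication.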

Related

Array declaration of an unsigned char

I have a problem which states that an unsigned char array stores unsigned int's, with each int using 3 bytes only.
Why would my teacher initialize the vector like this unsigned char s[]="\x12\x34\x78\x9A\xBC\xDE\xFF" and not just simply with ints? I believe that the numbers are represented in hexadecimal?
Because your teacher is being stingy with his memory usage, probably.
By using a "packed" array of unsigned chars, each 24-bit integer can be stored using just 24 bits (assuming an 8-bit char, which is not very controversial here I hope).
Note also that choosing to use string notation makes it more compact in the source (although less readable); the first integer is "\x12\x34\x78", which in array notation would be 0x12, 0x34, 0x78, which is longer due to the commas (and spaces, which of course could be removed, unlike the commas).
A possible compromise could be to use the fact that in C adjacent string literals are concatenated, and write each 24-bit number as a string of its own:
unsigned char s[] = "\x12\x34\x78" "\x9a\xbc\xde";
That makes it easier to spot the boundaries of each number, but of course the repeated quotes take up space.
You can extract a single integer like so:
unsigned int unpack24(size_t index)
{
if(index >= (sizeof s) / 3)
return 0;
const unsigned int hi = s[3 * index];
const unsigned int mid = s[3 * index + 1];
const unsigned int low = s[3 * index + 2];
return low | (mid << 8) | (hi << 16);
}
Note that the above assumes big-endian numbers, so the first one would unpack to 0x123478; I can't know that this is correct of course.
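For completeness, the reverse direction could be sketched like this, under the same big-endian assumption. Unlike unpack24 above, these hypothetical helpers (pack24_at, unpack24_at) take the buffer as a parameter instead of using the global s, and they do no bounds checking, so the caller must make sure the buffer holds at least 3 * (index + 1) bytes.

```c
#include <stddef.h>

/* Store the low 24 bits of value, most significant byte first,
   at buf[3*index] .. buf[3*index + 2]. */
static void pack24_at(unsigned char *buf, size_t index, unsigned long value)
{
    buf[3 * index]     = (unsigned char)((value >> 16) & 0xFF);
    buf[3 * index + 1] = (unsigned char)((value >> 8) & 0xFF);
    buf[3 * index + 2] = (unsigned char)(value & 0xFF);
}

/* Read back the 24-bit value stored at the given index. */
static unsigned long unpack24_at(const unsigned char *buf, size_t index)
{
    return ((unsigned long)buf[3 * index] << 16)
         | ((unsigned long)buf[3 * index + 1] << 8)
         |  (unsigned long)buf[3 * index + 2];
}
```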

How to split and recombine an unsigned long into signed shorts?

I need to store a large number, but due to limitations in an old game engine, I am restricted to working with signed short (I can, however, use as many of these as I want).
I need to split an unsigned long (0 to 4,294,967,295) into multiple signed short (-32,768 to 32,767). Then I need to recombine the multiple signed short into a new unsigned long later.
For example, take the number 4,000,000,000. This should be split into multiple signed short and then recombined into unsigned long.
Is this possible in C? Thanks.
In addition to dbush's answer you can also use a union, e.g.:
union
{
unsigned long longvalue;
signed short shortvalues[2];
}
value;
The array of two shorts overlays the single long value.
I assume your problem is finding a place to store these large values. There are options we haven't yet explored which don't involve splitting the values up and recombining them:
Write them to a file, and read them back later. This might seem silly at first, but considering the bigger picture, if the values end up in a file later on then this might seem like the most attractive option.
Declare your unsigned long to have static storage duration e.g. outside of any blocks of code A.K.A globally (I hate that term) or using the static keyword inside a block of code.
None of the other answers so far are strictly portable, not that it seems like it should matter to you. You seem to be describing a twos complement 16-bit signed short representation and a 32-bit unsigned long representation (you should put assertions in place to ensure this is the case), which has implications that restrict the options for the implementation (that is, the C compiler, the OS, the CPU, etc)... so the portability issues associated with them are unlikely to occur. In case you're curious, however, I'll discuss those issues anyway.
The portability issues associated are that one type or the other might have padding bits causing the sizes to mismatch, and that there might be trap representations for short.
Changing the type but not the representation is by far cleaner and easier to get right, though not portable; this includes the union hack. You could also avoid the union by casting an unsigned long * to a short *. These are the cleanest solutions, which makes Ken Clement's answer my favourite so far, despite the non-portability.
The shift (>> and <<), and (&), and or (|) operators introduce additional portability issues when you use them on signed types; they're also bulky and clumsy, leading to more code to debug and a higher chance that mistakes are made.
You need to consider that while ULONG_MAX is guaranteed to be at least 4,294,967,295, SHRT_MIN is not guaranteed by the C standard to be -32,768; it might be -32,767 (which is quite uncommon indeed, though still possible)... There might be a negative zero or trap representation in place of that -32,768 value.
This means you can't portably rely upon a pair of signed shorts being able to represent all of the values of an unsigned long; even when the sizes match up you need another bit to account for the two missing values.
With this in mind, you could use a third signed char... The implementation-defined and undefined behaviours of the shift approaches could be avoided that way.
signed short x = (value ) & 0xFFF,
y = (value >> 12) & 0xFFF,
z = (value >> 24) & 0xFFF;
value = (unsigned long) x
+ ((unsigned long) y << 12)
+ ((unsigned long) z << 24);
You can do it like this (I used fixed size types to properly illustrate how it works):
#include<stdio.h>
#include<stdint.h>
int main()
{
uint32_t val1;
int16_t val2a, val2b;
uint32_t val3;
val1 = 0x11223344;
printf("val1=%08x\n", val1);
// to short
val2a = val1 >> 16;
val2b = val1 & 0xFFFF;
printf("val2a=%04x\n", val2a);
printf("val2b=%04x\n", val2b);
// to long
val3 = (uint32_t)val2a << 16;
val3 |= (uint16_t)val2b; // go through uint16_t so a negative val2b does not sign-extend into the high half
printf("val3=%08x\n", val3);
return 0;
}
Output:
val1=11223344
val2a=1122
val2b=3344
val3=11223344
There are any number of ways to do it. One thing to consider is that unsigned long may not have the same size on different hardware/operating systems. You can use exact-width types found in stdint.h to avoid ambiguity (e.g. uint8_t, uint16_t, etc.). One implementation incorporating exact-width types (and cheesy hex values) would be:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
int main (void) {
uint64_t a = 0xfacedeadbeefcafe, b = 0;
uint16_t s[4] = {0};
uint32_t i = 0, n = 0;
printf ("\n a : 0x%16"PRIx64"\n\n", a);
/* separate uint64_t into 4 uint16_t */
for (i = 0; i < sizeof a; i += 2, n++)
printf (" s[%"PRIu32"] : 0x%04"PRIx16"\n", n,
(s[n] = (a >> (i * CHAR_BIT))));
/* combine 4 uint16_t into uint64_t */
for (n = i = 0; i < sizeof b; i += 2, n++)
b |= (uint64_t)s[n] << i * CHAR_BIT;
printf ("\n b : 0x%16"PRIx64"\n\n", b);
return 0;
}
Output
$ ./bin/uint64_16
a : 0xfacedeadbeefcafe
s[0] : 0xcafe
s[1] : 0xbeef
s[2] : 0xdead
s[3] : 0xface
b : 0xfacedeadbeefcafe
This is one possible solution (which assumes ulong is 32-bits, and sshort is 16-bits):
unsigned long L1, L2;
signed short S1, S2;
L1 = 0x12345678; /* Initial ulong to store away into two sshort */
S1 = L1 & 0xFFFF; /* Store component 1 */
S2 = L1 >> 16; /* Store component 2*/
L2 = (unsigned long)(unsigned short)S1 | ((unsigned long)(unsigned short)S2 << 16); /* Retrieve ulong from two sshort; the casts prevent sign extension */
/* Print results */
printf("Initial value: 0x%08lx\n",L1);
printf("Stored component 1: 0x%04hx\n",S1);
printf("Stored component 2: 0x%04hx\n",S2);
printf("Retrieved value: 0x%08lx\n",L2);
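A different approach that sidesteps the signed conversion questions raised above is to bias each 16-bit half by 32768, so that every stored value lands in -32768..32767 by plain arithmetic. The sketch below is not taken from any of the answers; it assumes the exact-width types from stdint.h are available, and the names split32 and join32 are made up.

```c
#include <stdint.h>

/* Split a 32-bit value into two int16_t halves. Each 16-bit half (0..65535)
   is shifted down by 32768 so it fits in int16_t without any
   implementation-defined conversion. */
static void split32(uint32_t value, int16_t *hi, int16_t *lo)
{
    *hi = (int16_t)((int32_t)(value >> 16) - 32768);
    *lo = (int16_t)((int32_t)(value & 0xFFFFu) - 32768);
}

/* Undo the bias and recombine the two halves. */
static uint32_t join32(int16_t hi, int16_t lo)
{
    uint32_t h = (uint32_t)((int32_t)hi + 32768);
    uint32_t l = (uint32_t)((int32_t)lo + 32768);
    return (h << 16) | l;
}
```

For the asker's example of 4,000,000,000 (0xEE6B2800), the halves come out as 28267 and -22528, and join32 gives back the original number.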

Enum and strings in C

I have a char* string coming in. I need to store it accordingly.
The string can be any of those values { UK, GD, BD, ER, WR, FL}
If I want to keep them as enumerated type, which data type is the best to use. Like for 6 values three bits is enough, but how to store three bits in C?
What you want is a Bit Field:
typedef struct {
unsigned char val : 2; //use 2 bits
unsigned char : 6; // remaining 6 bits
} valContainer;
...
valContainer x;
x.val = GD;
Do note that there isn't really a way to store less than one byte, as the definition of a byte is the smallest amount of memory the computer can address. This is just a method of having names associated with different bits in a byte.
Also, of course, 2 bits is not enough for 6 values (2 bits hold 4 distinct values). So you really want at least 3 bits (8 distinct values).
Just store them as an unsigned short. Unless you're storing other things in your struct to fill out a whole word, you're WAY prematurely optimizing. The compiler will have to pad out your data anyway.
As the answer by Eric Finn suggests, you can use bit fields to store a data element of 3 bits. However, this is only good if you have something else to store in the same byte.
struct {
unsigned char value: 3;
unsigned char another: 4;
unsigned char yet_another: 5;
// 12 bits declared so far; 4 more "padding" bits are unusable
} whatever;
If you want to store an array of many such small elements, you have to do it in a different way, for example, clumping 10 elements in each 32-bit word.
int n = ...; // number of elements to store
uint32_t *data = calloc((n + 9) / 10, sizeof(*data)); // round up so a partial last word is allocated too
for (int i = 0; i < n; i++)
{
int value = read_string_and_convert_to_int();
data[i / 10] &= ~(7 << (i % 10 * 3));
data[i / 10] |= value << (i % 10 * 3);
}
If you want to have only one element (or a few), just use enum or int.
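The "10 elements per 32-bit word" idea could be wrapped in small accessors so the shifting lives in one place. This is only a sketch; packed_alloc, packed_set and packed_get are hypothetical names, and values are masked to 3 bits so an out-of-range input cannot spill into a neighbouring element.

```c
#include <stdint.h>
#include <stdlib.h>

enum { PER_WORD = 10, BITS_PER = 3 };   /* ten 3-bit elements per uint32_t */

/* Allocate zeroed storage for n elements, rounding the word count up. */
static uint32_t *packed_alloc(size_t n)
{
    return calloc((n + PER_WORD - 1) / PER_WORD, sizeof(uint32_t));
}

/* Store value (0..7) as element i. */
static void packed_set(uint32_t *data, size_t i, unsigned value)
{
    unsigned shift = (unsigned)(i % PER_WORD) * BITS_PER;
    data[i / PER_WORD] &= ~(UINT32_C(7) << shift);          /* clear old bits */
    data[i / PER_WORD] |= (uint32_t)(value & 7u) << shift;  /* write new bits */
}

/* Read element i back as a value in 0..7. */
static unsigned packed_get(const uint32_t *data, size_t i)
{
    unsigned shift = (unsigned)(i % PER_WORD) * BITS_PER;
    return (unsigned)((data[i / PER_WORD] >> shift) & 7u);
}
```

The top 2 bits of every word go unused with this layout, which is the price of never letting an element straddle two words.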

Defining smallest possible sized macro in C

I want to define a boolean macro in C that uses less than 4 bytes. I have looked into this, and maybe it is possible to define an asm macro, with gcc, that could be less. It is important that the definition will be small because I will have tens of thousands of matrices which hold these boolean values, and it is important that they can be as memory efficient as possible. Ideally, I want to define a 4-bit, or 8-bit macro that represents true and false, and will evaluate as such in an if-statement.
Edit:
When I define a macro
#define True 0
#define False !True
and then print the size, it returns a size of 4 bytes, which is very inefficient.
Edit2:
I just read up on bitpacking, and the fewer bits I can use for a boolean, the better. I'm just not too sure how to bitpack a struct that has the size of a few bits.
Edit3:
#include <stdio.h>
#include <string.h>
#define false (unsigned char(0))
#define true (!false)
int main() {
if (true) {
printf("The size of true is %d\n", sizeof(true));
}
}
gives the following output
test.c: In function ‘main’:
test.c:8:9: error: expected ‘)’ before numeric constant
test.c:9:51: error: expected ‘)’ before numeric constant
Try this instead for your macros:
#define false ((unsigned char) 0)
#define true (!false)
This won't fix your space needs though. For more efficient storage, you need to use bits:
void SetBoolValue(int bitOffset, unsigned char *array, bool value)
{
int index = bitOffset >> 3;
int mask = 1 << (bitOffset & 0x07);
if (value)
array[index] |= mask;
else
array[index] &= ~mask;
}
bool GetBoolValue(int bitOffset, unsigned char *array)
{
int index = bitOffset >> 3;
int mask = 1 << (bitOffset & 0x07);
return array[index] & mask;
}
Where each value of "array" can hold 8 bools. On modern systems, it can be faster to use a U32 or U64 as the array, but it can take up more space for smaller amounts of data.
To pack larger amounts of data:
void SetMultipleBoolValues(int bitOffset, unsigned char *array, int value, int numBitsInValue)
{
for(int i=0; i<numBitsInValue; i++)
{
SetBoolValue(bitOffset + i, array, (value & (1 << i)));
}
}
And here would be a driver:
int main(void)
{
static unsigned char array[32]; // Static so it starts 0'd; unsigned char to match the function parameters.
int value = 1234; // An 11-bit value to pack
for(int i=0; i<4; i++)
SetMultipleBoolValues(i * 11, array, value, 11); // 11 = 11-bits of data - do it 4 times
for(int i=0; i<32; i++)
printf("%c", array[i]);
return 0;
}
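The answer only shows the writing side for multi-bit values. A matching reader could look roughly like the sketch below. The name GetMultipleBoolValues is made up to mirror the setter, and the function reads the bits least significant first, which is the order SetMultipleBoolValues writes them in.

```c
/* Read numBitsInValue bits starting at bitOffset and reassemble them,
   least significant bit first. Hypothetical counterpart to the setter. */
static unsigned GetMultipleBoolValues(int bitOffset, const unsigned char *array,
                                      int numBitsInValue)
{
    unsigned value = 0;
    for (int i = 0; i < numBitsInValue; i++) {
        int pos = bitOffset + i;
        unsigned bit = (array[pos >> 3] >> (pos & 0x07)) & 1u;
        value |= bit << i;
    }
    return value;
}
```

For instance, 1234 is 0x4D2, so packing it at bit offset 0 leaves 0xD2 in the first byte and 0x04 in the second; reading 11 bits from offset 0 should give 1234 back.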
If you are using this in a structure, then you will want to use a bit field.
struct {
unsigned flag : 1;
/* other fields */
};
If you are wanting an array of boolean values, you should implement a bit vector (I was about to implement one, but Michael Dorgan's already done it).
First of all, there's no storage associated with your macros; they expand to the integer constants 0 and 1. The sizeof evaluates to 4 because the expressions have integer type. You can certainly assign those values to objects of smaller type (short or char).
For me, life got a lot simpler when I stopped using TRUE and FALSE macros [1]. Remember that in C, a zero-valued integral expression evaluates to false, and all non-zero-valued integral expressions evaluate to true.
If you want to store values into something smaller than 8 bits, then you're going to have to do your own bit packing, something like
#define TEST(x,bit) ((x) & (1 << (bit)))
#define SET(x,bit) ((x) |= (1 << (bit)))
#define CLEAR(x,bit) ((x) &= ~(1 << (bit)))
The smallest useful type for this is unsigned char. So if you need to store N single-bit values, you need an array of N/CHAR_BIT+1 elements (one more than strictly necessary when N is an exact multiple of CHAR_BIT). For example, to store 10 single-bit boolean values, you need 2 eight-bit array elements. Bits 0 through 7 will be stored in element 0, and bits 8 and 9 will be stored in element 1.
So, something like
#define MAX_BITS 24
unsigned char bits[MAX_BITS / CHAR_BIT + 1];
int bit = ...;
SET(bits[bit/CHAR_BIT], bit % CHAR_BIT);
if ( TEST(bits[bit/CHAR_BIT], bit % CHAR_BIT) )
{
// do something if bit is set
}
CLEAR(bits[bit/CHAR_BIT], bit % CHAR_BIT);
No warranties express or implied; I don't do a lot of bit twiddling. But hopefully this at least points you in the right direction.
1. The precipitating event was someone dropping a header where TRUE == FALSE. Not the most productive afternoon.
You should probably just use an unsigned char, it will be the smallest individually addressable type:
typedef unsigned char smallBool;
smallBool boolMatrix[M][N];
The above will use M * N bytes for the matrix.
Of course, wasting CHAR_BIT - 1 bits to store a single bit is ... wasteful. Consider bit-packing the boolean values.

How to convert from integer to unsigned char in C, given integers larger than 256?

As part of my CS course I've been given some functions to use. One of these functions takes a pointer to unsigned chars to write some data to a file (I have to use this function, so I can't just make my own purpose built function that works differently BTW). I need to write an array of integers whose values can be up to 4095 using this function (that only takes unsigned chars).
However am I right in thinking that an unsigned char can only have a max value of 256 because it is 1 byte long? I therefore need to use 4 unsigned chars for every integer? But casting doesn't seem to work with larger values for the integer. Does anyone have any idea how best to convert an array of integers to unsigned chars?
Usually an unsigned char holds 8 bits, with a max value of 255. If you want to know this for your particular compiler, print out CHAR_BIT and UCHAR_MAX from <limits.h>. You could extract the individual bytes of a 32-bit int:
#include <stdint.h>
void
pack32(uint32_t val,uint8_t *dest)
{
dest[0] = (val & 0xff000000) >> 24;
dest[1] = (val & 0x00ff0000) >> 16;
dest[2] = (val & 0x0000ff00) >> 8;
dest[3] = (val & 0x000000ff) ;
}
uint32_t
unpack32(uint8_t *src)
{
uint32_t val;
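val = (uint32_t)src[0] << 24; /* cast first: shifting a promoted int by 24 can overflow */
val |= (uint32_t)src[1] << 16;
val |= (uint32_t)src[2] << 8;
val |= (uint32_t)src[3] ;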
return val;
}
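Since the asker's values only go up to 4095, two bytes per value would be enough. A sketch along the same lines as pack32 follows. It assumes the integers have first been copied into a uint16_t array, and the function names pack16_array and unpack16_array are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Write each value as two bytes, most significant byte first.
   dest must have room for 2 * count bytes. */
static void pack16_array(const uint16_t *values, size_t count, uint8_t *dest)
{
    for (size_t i = 0; i < count; i++) {
        dest[2 * i]     = (uint8_t)(values[i] >> 8);
        dest[2 * i + 1] = (uint8_t)(values[i] & 0xFF);
    }
}

/* Rebuild count values from 2 * count bytes written by pack16_array. */
static void unpack16_array(const uint8_t *src, size_t count, uint16_t *values)
{
    for (size_t i = 0; i < count; i++)
        values[i] = (uint16_t)(((unsigned)src[2 * i] << 8) | src[2 * i + 1]);
}
```

Fixing the byte order in the code like this means the file reads back the same on any machine, which is the main advantage over casting the int array to unsigned char *.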
Unsigned char has a size of 1 byte, therefore you can decompose any other type into an array of unsigned chars (e.g. for a 4-byte int you can use an array of 4 unsigned chars). Your exercise is probably about generics. You should write the file as a binary file using the fwrite() function, and just write byte after byte to the file.
The following example should write a number (of any data type) to the file. I am not sure if it works since you are forcing the cast to unsigned char * instead of void *.
#include <stdio.h>
int homework(unsigned char *foo, size_t size)
{
// open file for binary writing
FILE *f = fopen("work.txt", "wb");
if(f == NULL)
return 1;
// write all size bytes of the data to the file in one call
fwrite(foo, sizeof(char), size, f);
fclose(f);
return 0;
}
I hope the given example at least gives you a starting point.
Yes, you're right; a char/byte only allows up to 8 distinct bits, so that is 2^8 distinct numbers, which is zero to 2^8 - 1, or zero to 255. Do something like this to get the bytes:
int x = 0;
char* p = (char*)&x;
for (int i = 0; i < sizeof(x); i++)
{
//Do something with p[i]
}
(This isn't officially C because of the order of declaration but whatever... it's more readable. :) )
Do note that this code may not be portable, since it depends on the processor's internal storage of an int.
If you have to write an array of integers then just convert the array into a pointer to char then run through the array.
int main()
{
int data[] = { 1, 2, 3, 4 ,5 };
size_t size = sizeof(data)/sizeof(data[0]); // Number of integers.
unsigned char* out = (unsigned char*)data;
for(size_t loop =0; loop < (size * sizeof(int)); ++loop)
{
MyProfSuperWrite(out + loop); // Write 1 unsigned char
}
}
Now people have mentioned that 4095 will fit in fewer bits than a normal integer. Probably true. Thus you can save space and not write out the top bits of each integer. Personally I think this is not worth the effort. The extra code to write the value and process the incoming data is not worth the savings you would get (maybe if the data was the size of the Library of Congress). Rule one: do as little work as possible (it's easier to maintain). Rule two: optimize if asked (but ask why first). You may save space, but it will cost in processing time and maintenance.
The part of the assignment that says "integers whose values can be up to 4095 using this function (that only takes unsigned chars)" should be giving you a huge hint. 4095 unsigned is 12 bits.
You can store the 12 bits in a 16 bit short, but that is somewhat wasteful of space -- you are only using 12 of 16 bits of the short. Since you are dealing with more than 1 byte in the conversion of characters, you may need to deal with the endianness of the result. Easiest.
You could also do a bit field or some packed binary structure if you are concerned about space. More work.
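One way the "packed binary structure" option could look is to put two 12-bit values into three bytes, so nothing is wasted. The layout below (first value in the high bits) is just one possible choice, and pack12_pair and unpack12_pair are hypothetical names.

```c
#include <stdint.h>

/* Pack two 12-bit values (0..4095) into three bytes:
   out[0] = a bits 11..4
   out[1] = a bits 3..0 (high nibble) | b bits 11..8 (low nibble)
   out[2] = b bits 7..0 */
static void pack12_pair(uint16_t a, uint16_t b, uint8_t out[3])
{
    out[0] = (uint8_t)(a >> 4);
    out[1] = (uint8_t)(((a & 0x0F) << 4) | ((b >> 8) & 0x0F));
    out[2] = (uint8_t)(b & 0xFF);
}

/* Reverse of pack12_pair. */
static void unpack12_pair(const uint8_t in[3], uint16_t *a, uint16_t *b)
{
    *a = (uint16_t)((in[0] << 4) | (in[1] >> 4));
    *b = (uint16_t)(((in[1] & 0x0F) << 8) | in[2]);
}
```

This saves a quarter of the space compared with one short per value, at the cost of handling an odd trailing value separately if the array length is not even.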
It sounds like what you really want to do is call sprintf to get a string representation of your integers. This is a standard way to convert from a numeric type to its string representation. Something like the following might get you started:
char num[5]; // Room for 4095
// Array is the array of integers, and arrayLen is its length
for (i = 0; i < arrayLen; i++)
{
sprintf (num, "%d", array[i]);
// Call your function that expects a pointer to chars
printfunc (num);
}
Without information on the function you are directed to use regarding its arguments, return value and semantics (i.e. the definition of its behaviour) it is hard to answer. One possibility is:
Given:
void theFunction(unsigned char* data, int size);
then
int array[SIZE_OF_ARRAY];
theFunction((unsigned char*)array, sizeof(array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(*array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(int));
All of which will pass all of the data to theFunction(), but whether that makes any sense will depend on what theFunction() does.

Resources