Cache block tag size

I'm writing a cache simulation program in C on Linux, using gcc as the compiler, and I'm mostly done. Only a few test cases go wrong (a few of the thousands of addresses fed in that should hit are missing instead). I specify the cache properties on the command line. I suspect the error in my code has to do with the tag (if things aren't hitting, then their tags aren't matching when they should). So my question is: am I calculating the tag correctly?
//setting sizes of bits
int offsetSize = log2(lineSize);
int indexSize = 0;
if (strcmp(associativity,"direct") == 0){//direct associativity
indexSize = log2(numLines);
}else if (assocNum == numLines){//fully associative
indexSize = 0;
}else{//set associative
indexSize = log2(assocNum);
}
address = (int) strtol(readAddress,&eptr,16);
unsigned long long int mask = 0;
//get the offset Bits
mask = (1 << offsetSize) - 1;
offsetBits = address & mask;
//get the index bits
mask = (1 << (indexSize)) - 1;
mask = mask << offsetSize;
indexBits = (address & mask) >> offsetSize;
//get tag bits
tagBits = address >> (offsetSize+indexSize);
The addresses being fed in are usually 48 bits, so the variables address and mask are of type unsigned long long int. I think the problem I'm having is that I'm taking all the upper bits of the address, when I should only be taking a small set of bits from the large address.
For example: I have 32 cache lines in a 4-way set associative cache with a block size of 4.
offsetSize = log2(4) = 2
indexSize = log2(4) = 2
My code currently takes the upper bits of the address no matter the address size, minus the last 4 bits. Should I be taking only the upper 28 bits instead? (tagSize = (8*4)-3-2)

My code currently takes the upper bits of the address no matter the address size, minus the last 4 bits. Should I be taking only the upper 28 bits instead?
The tag has to contain all upper bits so that the tag can be used to determine if it is or isn't a cache hit.
If addresses are 48-bits and are split into 3 fields, you'd have a 2-bit "offset in cache line" field, a 2-bit "index in cache" field and a 44-bit "upper bits that have to be stored in the tag" field. If you only store 28 bits in the tag then you get cache hits when you should get cache misses (because the entry in the cache happens to contain data for a different address where the 28 bits happened to match).
Note that you can/should think of "associativity" as the number of sets of cache lines that happen to operate in parallel (where direct mapped is just "associativity = 1", and where fully associative is just "associativity = total_cache_size / cache_line_size"). The associativity has no direct effect on the index size (only the size of the sets of cache lines matters for index size), and the problem you're having is probably related to indexSize = log2(assocNum); (which doesn't make sense).
In other words:
if( direct_mapped ) {
associativity = 1;
} else {
max_associativity = total_cache_size / cache_line_size;
if( fully_associative || (associativity > max_associativity) ) {
associativity = max_associativity;
}
}
set_size = total_cache_size / associativity;
number_of_lines_in_set = set_size / cache_line_size;
offset_size = log2(cache_line_size);
index_size = log2(number_of_lines_in_set);
tag_size = address_size - index_size - offset_size;
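As a rough sketch of how the sizes above translate into splitting an address, assuming all sizes are powers of two: the names ilog2, addr_fields and split_address are made up for illustration and are not from the question's code. The tag is simply everything above the offset and index bits, however wide the address is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: integer log2, exact for powers of two. */
static unsigned ilog2(uint64_t x) {
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical result type: the three fields of an address. */
typedef struct {
    uint64_t tag;
    uint64_t index;
    uint64_t offset;
} addr_fields;

/* Split an address using the same arithmetic as the pseudocode above.
 * associativity is 1 for direct mapped, and
 * total_cache_size / cache_line_size for fully associative. */
static addr_fields split_address(uint64_t address, uint64_t total_cache_size,
                                 uint64_t cache_line_size, uint64_t associativity) {
    assert(cache_line_size != 0 && associativity != 0);

    uint64_t set_size = total_cache_size / associativity;
    uint64_t number_of_lines_in_set = set_size / cache_line_size;
    unsigned offset_size = ilog2(cache_line_size);
    unsigned index_size = ilog2(number_of_lines_in_set);

    addr_fields f;
    /* 64-bit constants avoid overflowing a plain int when shifting. */
    f.offset = address & ((UINT64_C(1) << offset_size) - 1);
    f.index = (address >> offset_size) & ((UINT64_C(1) << index_size) - 1);
    f.tag = address >> (offset_size + index_size); /* all remaining upper bits */
    return f;
}
```

For the example in the question (32 lines of 4 bytes, 4-way, so 128 bytes in total), split_address(0xABCDE, 128, 4, 4) uses a 2-bit offset and a 3-bit index, and yields offset 2, index 7 and tag 0x55E6.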

Related

What are bit vectors and how do I use them to convert chars to ints?

Here's the explanation for our task when implementing a set data structure in C "The set is constructed as a Bit vector, which in turn is implemented as an array of the data type char."
My confusion arises from the fact that almost all the functions we're given take a set and an int, as in the function below, yet our array is made up of chars. How would I call these functions if they can only take ints when I have an array of chars? Here's my attempt at calling the function in my main function, along with the struct and an example of the functions used.
int main(){
set *setA = set_empty();
set_insert("green",setA );
}
struct set {
int capacity;
int size;
char *array;
};
void set_insert(const int value, set *s)
{
if (!set_member_of(value, s)) {
int bit_in_array = value; // To make the code easier to read
// Increase the capacity if necessary
if (bit_in_array >= s->capacity) {
int no_of_bytes = bit_in_array / 8 + 1;
s->array = realloc(s->array, no_of_bytes);
for (int i = s->capacity / 8 ; i < no_of_bytes ; i++) {
s->array[i] = 0;
}
s->capacity = no_of_bytes * 8;
}
// Set the bit
int byte_no = bit_in_array / 8;
int bit = 7 - bit_in_array % 8;
s->array[byte_no] = s->array[byte_no] | 1 << bit;
s->size++;
}
}

TL;DR: The types of the index (value in your case) and the indexed element of an array (array in your case) are independent from each other. There is no conversion.
Most digital systems these days store their value in bits, each of them can hold only 0 or 1.
An integer value can therefore be viewed as a binary number, a value to the base of 2. It is a sequence of bits, each of which assigned a power of two. See the Wikipedia page on two's complement for details. But this aspect is not relevant for your issue.
Relevant is the view that an integer value is a sequence of bits. The simplest integer type of C is the char. It holds commonly 8 bits. We can assign indexes to these bits, and therefore think of them as a "vector", mathematically. Some people start to count "from the left", others start to count "from the right". Other common terms in this area are "MSB" and "LSB", see this Wikipedia page for more.
To access an element of a vector, you use its index. For a common char this is a value between 0 and 7, inclusive. Remember, in CS we start counting from zero. The type of this index can be any integer type wide enough to hold the value, for example an int. This is why you use an int in your case. This data type is independent of the type of the elements in the vector.
How to solve the problem, if you need more than 8 bits? Well, then you can use more chars. This is the reason why your structure holds (a pointer to) an array of chars. All n chars of the array represent a vector of n * 8 bits, and you call this amount the "capacity".
Another option is to use a wider type, like a long or even a long long. And you can build an array of elements of these types, too. However, the widths of such types are commonly not equal in all systems.
BTW, the mathematical "vector" is the same thing as an "array" in CS. Different science areas, different terms.
Now, what is a "set"? I hope your script explains that a bit better than I can... It is a collection that contains an element only once or not at all. All elements are distinct. In your case the elements are represented by (small) integers.
Given a vector of bits of arbitrary capacity, we can "map" an element of a set on a bit of this vector by its index. This is done by storing a 1 in the mapped bit, if the element is present in the set, or 0, if it is not.
To access the correct bit, we need the index of the single char in the array, and the index of the bit in this char. You calculate these values in the lines:
int byte_no = bit_in_array / 8;
int bit = 7 - bit_in_array % 8;
All variables are of type int, most probably because this is the common type. It can be any other integer type, like a size_t for example, as long as it can hold the necessary values, even different types for the different variables.
With these two values at hand, you can "insert" the element into the set. For this action, you set the respective bit to 1:
s->array[byte_no] = s->array[byte_no] | 1 << bit;
Please note that the shift operator << has a higher precedence than the bit-wise OR operator |. Some coding style rules request to use parentheses to make this clear, but you can also use this even clearer assignment:
s->array[byte_no] |= 1 << bit;
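To make the point about independent types concrete, here is a minimal sketch of such a set, assuming a simplified struct: the names bitset, bitset_insert and bitset_member_of are hypothetical and not the ones from your assignment. The elements are small ints, the storage is an array of unsigned char, and no conversion between the two is involved.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified version of the struct in the question. */
typedef struct {
    int capacity;          /* number of bits the array can hold */
    unsigned char *array;  /* capacity / 8 bytes of storage */
} bitset;

/* Returns true if value is in the set. */
static bool bitset_member_of(int value, const bitset *s) {
    if (value < 0 || value >= s->capacity) {
        return false;
    }
    int byte_no = value / 8;   /* which char holds the bit */
    int bit = 7 - value % 8;   /* which bit inside that char */
    return (s->array[byte_no] >> bit) & 1;
}

/* Inserts value, growing the array if needed; returns false on failure. */
static bool bitset_insert(int value, bitset *s) {
    if (value < 0) {
        return false;
    }
    if (value >= s->capacity) {
        int no_of_bytes = value / 8 + 1;
        unsigned char *p = realloc(s->array, (size_t)no_of_bytes);
        if (p == NULL) {
            return false;
        }
        for (int i = s->capacity / 8; i < no_of_bytes; i++) {
            p[i] = 0;          /* new bytes start out empty */
        }
        s->array = p;
        s->capacity = no_of_bytes * 8;
    }
    s->array[value / 8] |= (unsigned char)(1u << (7 - value % 8));
    return true;
}
```

With this sketch you would call bitset_insert(10, &s) on a set initialised as bitset s = {0, NULL}; passing a string such as "green" does not fit, because the elements are integers.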

Data handling in bit field

I am reading the code that involves some bitwise operations as shown below:
unsigned char data = 0;
unsigned char status = 0;
//DAQmx functions for reading data
DAQmxReadDigitalLines(taskHandleIn,1,10.0,DAQmx_Val_GroupByChannel,dataIn,8,&read,&bytesPerSamp,NULL);
DAQmxReadDigitalLines(taskHandleOut,1,10.0,DAQmx_Val_GroupByChannel,dataOutRead,8,&read,&bytesPerSamp,NULL);
for (int i = 0; i < 8; i++)
{
if (dataOutRead[i] == 1)
data = data | (0x01 << i);
else
data = data & ~(0x01 << i);
}
for (int i = 0; i < 4; i++)
{
if (dataIn[i] == 1)
status = status | (0x01 << (7 - i));
else
status = status & ~(0x01 << (7 - i));
}
ctrl = 0;
In the code above, dataOutRead and dataIn are both 8-element uInt8 arrays, originally initialized to zero.
I don't quite understand what the code is actually doing. Can anyone walk me through it?

Key part of understanding this code is the conditional with a bitwise operation inside:
if(dataOutRead[i]==1) {
data = data | (0x01 << i);
} else {
data = data & ~(0x01 << i);
}
It uses bytes of dataOutRead as a sequence of ones and not ones (presumably, but not necessarily, zeros). This sequence is "masked" into bits of data starting with the least significant one:
When dataOutRead[i] is 1, the corresponding bit is set
When dataOutRead[i] is not 1, the corresponding bit is cleared. This step is unnecessary, because data is zeroed out before entering the loop.
This could be thought of as converting a "byte-encoded-binary" (one byte per bit) into its corresponding binary number.
The second loop does the same thing in reverse bit order: it processes only the first four elements of dataIn and places them into the upper nibble of the status byte, so dataIn[0] lands in bit 7 and dataIn[3] in bit 4.
It's hard to speculate on the purpose of this approach, but it could be useful in applications that use arrays of full-byte Booleans to control the state of some hardware register, e.g. in a microcontroller.
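The two loops can be sketched as a pair of small helpers. This is only an illustration: the function names pack_lsb_first and pack_msb_first are made up, and the DAQmx calls are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to 8 one-byte flags into a byte: flags[0] goes to bit 0. */
static uint8_t pack_lsb_first(const uint8_t *flags, size_t n) {
    uint8_t out = 0;
    for (size_t i = 0; i < n && i < 8; i++) {
        if (flags[i] == 1) {
            out |= (uint8_t)(1u << i);
        }
    }
    return out;
}

/* Pack up to 8 one-byte flags into a byte: flags[0] goes to bit 7. */
static uint8_t pack_msb_first(const uint8_t *flags, size_t n) {
    uint8_t out = 0;
    for (size_t i = 0; i < n && i < 8; i++) {
        if (flags[i] == 1) {
            out |= (uint8_t)(1u << (7 - i));
        }
    }
    return out;
}
```

Under these assumptions, data corresponds to pack_lsb_first(dataOutRead, 8) and status to pack_msb_first(dataIn, 4). Because the result starts at zero, no explicit clearing branch is needed.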

The first loop builds an unsigned char from dataOutRead, replicating its contents in data: it checks whether the i-th element is 1 and sets or clears bit i of data accordingly.
The second loop does the same with only the first four elements of dataIn, copying them into the most significant bits of status (bits 7 to 4) in reverse order. To clarify further:
7 6 5 4 3 2 1 0
x y z w w z y x
So if element 2 of dataIn is set/reset in the second loop, then bit 5 of status is set/reset.

Get list of bits set in BitMap

In C, is there an optimized way of retrieving the list of bit positions that are set, without checking each bit?
Consider the following example:
int bitmap[4];
So there are 4 * 32 bit positions. The values are as follows:
bitmap = { 0x1, 0x0, 0x0, 0x0010001 }
I want to retrieve the position of each set bit without scanning all 4 * 32 positions.

First of all, int is a poor choice for a bitmap in C: shifting a bit left into the sign bit has undefined behaviour, and C guarantees neither that the representation is two's complement nor that an int has 32 bits. The easiest way to avoid these pitfalls is to use uint32_t from <stdint.h> instead. Thus
#include <stdint.h>
uint32_t bitmap[4];
Number the bits 0 ... 127 across array indexes 0 ... 3, with bits 0 ... 31 inside each element. You can then get the index into the array and the bit number within that element like this:
int bit_number = 0; // any value from 0 ... 127
int index = bit_number >> 5; // divide by 32, the number of bits in each element
int bit_in_value = bit_number & 31; // take modulo 32 to get the bit within the element
Now you can index the integer by:
bitmap[index];
and the bit mask for the desired value is
uint32_t mask = (uint32_t)1 << bit_in_value;
so you can check if the bit is set by doing
bit_is_set = !!(bitmap[index] & mask);
Now to speed things up, you can skip any index for which bitmap[index] is 0 because it doesn't contain any bits set; likewise, within each index you can speed things up by shifting bits in the uint32_t from the bitmap right by 1 and masking with 1; and breaking the loop when the uint32_t becomes 0:
for (int index = 0; index <= 3; index ++) {
uint32_t entry = bitmap[index];
if (! entry) {
continue;
}
int bit_number = 32 * index;
while (entry) {
if (entry & 1) {
printf("bit number %d is set\n", bit_number);
}
entry >>= 1;
bit_number ++;
}
}
Other than that there is not much to speed up, besides lookup tables or compiler intrinsics (for example, one that finds the lowest set bit), but you would still have to loop over the set bits anyway.
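As a sketch of the intrinsic-based variant, assuming GCC or Clang for __builtin_ctz (with a plain loop as a fallback elsewhere): the helper names lowest_set_bit and list_set_bits are made up for illustration. Each inner iteration handles exactly one set bit, because entry &= entry - 1 clears the lowest set bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit; x must be nonzero. */
static int lowest_set_bit(uint32_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(x);   /* compiler builtin, undefined for x == 0 */
#else
    int n = 0;                 /* portable fallback */
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
#endif
}

/* Store the positions of all set bits in out (at most max_out of them)
 * and return how many were stored. */
static size_t list_set_bits(const uint32_t *bitmap, size_t words,
                            int *out, size_t max_out) {
    size_t count = 0;
    for (size_t w = 0; w < words; w++) {
        uint32_t entry = bitmap[w];
        while (entry != 0 && count < max_out) {
            out[count++] = (int)(32 * w) + lowest_set_bit(entry);
            entry &= entry - 1;   /* clear the lowest set bit */
        }
    }
    return count;
}
```

For the bitmap in the question, { 0x1, 0x0, 0x0, 0x0010001 }, this stores the positions 0, 96 and 112.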

An optimal solution which runs in O(k), where k = the total number of set bits in your entire list, can be achieved by using a lookup table. For example, you can use a table of 256 entries to describe the bit positions of every set bit in that byte. The index would be the actual value of the Byte.
For each entry you could use the following structure.
struct
{
    int numberOfSetBits;
    char* list; // use malloc and allocate the list according to numberOfSetBits
};
You can then iterate across the list member of each structure, and the number of iterations equals the number of set bits for that byte. For a 32-bit integer you will have to go through 4 of these structs, one for each byte. To determine which entry to look up, mask off the low byte of the bitmap word, then shift the word right by 8 bits to reach the next byte. Note that the bit positions are relative to that byte, so you may have to add an offset of 24, 16, or 8 depending on the byte you are iterating through (assuming a 32-bit integer).
Note: if additional memory usage is not a problem for you, you could build a 64K Table of 16-bit entries and you will decrease the number of your structs by half.
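A minimal sketch of the 256-entry table idea follows. To keep it short it swaps the malloc'ed list for a fixed 8-element array per entry, and all names (byte_bits, byte_table, build_table, set_bits_in_word) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: which bits are set in a given byte value. */
typedef struct {
    uint8_t count;   /* number of set bits in the byte */
    uint8_t pos[8];  /* their positions, lowest first */
} byte_bits;

static byte_bits byte_table[256];

/* Fill the table once, before any lookups. */
static void build_table(void) {
    for (int v = 0; v < 256; v++) {
        byte_table[v].count = 0;
        for (int b = 0; b < 8; b++) {
            if (v & (1 << b)) {
                byte_table[v].pos[byte_table[v].count++] = (uint8_t)b;
            }
        }
    }
}

/* Write the positions of the set bits of word into out (room for 32),
 * adding base to each position; returns the number of positions. */
static size_t set_bits_in_word(uint32_t word, int base, int *out) {
    size_t n = 0;
    for (int byte = 0; byte < 4; byte++) {
        const byte_bits *e = &byte_table[(word >> (8 * byte)) & 0xFF];
        for (int i = 0; i < e->count; i++) {
            out[n++] = base + 8 * byte + e->pos[i];   /* byte offset 0, 8, 16, 24 */
        }
    }
    return n;
}
```

After build_table(), calling set_bits_in_word(bitmap[3], 96, out) for the example word 0x0010001 stores 96 and 112.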

Related to this question, see: What is the fastest way to return the positions of all set bits in a 64-bit integer?
A simple solution, but perhaps not the fastest, depending on the times of the log and pow functions:
#include<math.h>
#include<stdio.h>
void getSetBits(unsigned int num, int offset){
int bit;
while(num){
bit = log2(num);
num -= pow(2, bit);
printf("%i\n", offset + bit); // use bit number
}
}
int main(){
int i, bitmap[4] = {0x1, 0x0, 0x0, 0x0010001};
for(i = 0; i < 4; i++)
getSetBits(bitmap[i], i * 32);
}
Complexity: O(D), where D is the number of set bits.

Determining details of a cache

Your machine has an L1 cache and memory with the following properties.
Memory address space: 24 bits
Cache block size: 16 bytes
Cache associativity: direct-mapped
Cache size: 256 bytes
I am asked to determine the following: 1. the number of tag bits. 2. the number of bits of the cache index. 3. the number of bits for cache size.
tag bits = m - (s+b)
m = 24. s = log2 S, S = C/(B*E). E = 1 due to it being direct mapped. so S = 256/16 = 16. s = log2 16 = 4. B = 16 (cache block size) b = log2 B; which is log2 16= 4. so s=4,b=4,m=24. t = 24-(4+4) = 16 total tag bits.
I am not sure how to figure this out.
I believe number of bits for cache size is just C*(num bits/byte) = 256*8 = 2048.
Can anyone help me figure out 2., and determine if the logic in 1. & 3. are correct?

1) This is correct: with m = 24, t = 24 - (4 + 4) = 16 tag bits.
2) The number of index bits is the number of bits needed to address a block in the cache when it is direct-mapped, since the index identifies the set (which consists of only one block in this case). If the cache were 2-way, one bit fewer would be needed for the index (and it would be added to the tag bits). For this problem, since there are 16 sets, you need to distinguish 16 index values, which takes log2(16) = 4 index bits.
3) It is not completely clear how to interpret this question. I would understand it as the number of bits needed to address the cache, which would be 4 in this case? If indeed, as you assume, the number of bits stored in the cache was meant, you would have to add 16*16 = 256 bits for the tags to your solution.
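The arithmetic for all three parts can be checked with a short sketch. It uses the question's symbols (m, C, B, E), assumes every size is a power of two, and ignores valid or dirty bits; the names cache_layout, cache_layout_bits and ilog2u are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: integer log2, exact for powers of two. */
static unsigned ilog2u(unsigned long x) {
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

typedef struct {
    unsigned tag_bits;               /* t = m - (s + b) */
    unsigned index_bits;             /* s = log2(S) */
    unsigned offset_bits;            /* b = log2(B) */
    unsigned long data_bits;         /* C * 8 */
    unsigned long tag_storage_bits;  /* one tag per cache line */
} cache_layout;

/* m: address bits, C: cache size in bytes, B: block size in bytes,
 * E: lines per set (1 for direct-mapped). */
static cache_layout cache_layout_bits(unsigned m, unsigned long C,
                                      unsigned long B, unsigned long E) {
    assert(B != 0 && E != 0);
    cache_layout L;
    unsigned long S = C / (B * E);   /* number of sets */
    L.offset_bits = ilog2u(B);
    L.index_bits = ilog2u(S);
    L.tag_bits = m - (L.index_bits + L.offset_bits);
    L.data_bits = C * 8;
    L.tag_storage_bits = S * E * L.tag_bits;
    return L;
}
```

For m = 24, C = 256, B = 16 and E = 1 this gives 16 tag bits, 4 index bits, 4 offset bits, 2048 data bits and 256 bits of tag storage.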

How to define and work with an array of bits in C?

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).
Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.
I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.
This is basically what the code looks like in 1D, simplified:
int main()
{
/* Define lattice */
array = (char*)malloc(N * sizeof(char));
total_c = 0;
/* Carry out RSA multiple times */
for (i = 0; i < 1000; i++)
rand_seq_ads();
/* Calculate average coverage efficiency at jamming */
printf("coverage efficiency = %lf", total_c/1000);
return 0;
}
void rand_seq_ads()
{
/* Initialise array, initial conditions */
memset(array, 0, N * sizeof(char));
available_sites = N;
count = 0;
/* While the lattice still has enough room... */
while(available_sites != 0)
{
/* Generate random site location */
x = rand();
/* Deposit dimer (if site is available) */
if(array[x] == 0)
{
array[x] = 1;
array[x+1] = 1;
count += 1;
available_sites += -2;
}
/* Mark site left of dimer as unavailable (if its empty) */
if(array[x-1] == 0)
{
array[x-1] = 1;
available_sites += -1;
}
}
/* Calculate coverage %, and add to total */
c = count/N;
total_c += c;
}
For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).
I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?
Thanks for your answers

If I am not too late, this page gives awesome explanation with examples.
An array of int can be used to deal with an array of bits. Assuming the size of an int is 4 bytes, each int holds 32 bits. Say we have int A[10]: then we are working with 10*4*8 = 320 bits, where each element of the array consists of 4 bytes of 8 bits each.
So, to set the kth bit in array A:
// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void SetBit( int A[], int k )
{
    int i = k/32; // index of the element of A that holds bit k
    int pos = k%32; // position of bit k within A[i]
    unsigned int flag = 1; // flag = 0000.....00001
    flag = flag << pos; // flag = 0000...010...000 (shifted pos positions)
    A[i] = A[i] | flag; // set the bit at position pos in A[i]
}
or in the shortened version
void SetBit( int A[], int k )
{
    A[k/32] |= 1u << (k%32); // 1u: shifting a signed 1 into bit 31 is undefined
}
similarly, to clear the kth bit:
void ClearBit( int A[], int k )
{
    A[k/32] &= ~(1u << (k%32));
}
and to test the kth bit:
int TestBit( int A[], int k )
{
    return ( (A[k/32] & (1u << (k%32) )) != 0 ) ;
}
These manipulations can also be written as macros:
// Due to order of operations, wrap 'k' in parentheses in case it
// is passed as an expression, e.g. i + 1, otherwise the first
// part evaluates to "A[i + (1/32)]" not "A[(i + 1)/32]"
#define SetBit(A,k) ( (A)[(k)/32] |= (1u << ((k)%32)) )
#define ClearBit(A,k) ( (A)[(k)/32] &= ~(1u << ((k)%32)) )
#define TestBit(A,k) ( (A)[(k)/32] & (1u << ((k)%32)) )
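Applied to the lattice in the question, the same operations could look roughly like the sketch below. It uses uint32_t words instead of int to sidestep the sign-bit issue, and it only checks that both sites of a dimer are free; it does not model the extra blocking of neighbouring sites described in the question. All names (lattice, lattice_init, lattice_set, lattice_test, deposit_dimer) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A 1D lattice stored as one bit per site. */
typedef struct {
    uint32_t *words;
    size_t nbits;
} lattice;

/* Allocate a zeroed lattice of nbits sites; returns false on failure. */
static bool lattice_init(lattice *l, size_t nbits) {
    l->words = calloc((nbits + 31) / 32, sizeof *l->words);
    l->nbits = nbits;
    return l->words != NULL;
}

static void lattice_set(lattice *l, size_t k) {
    l->words[k / 32] |= UINT32_C(1) << (k % 32);
}

static bool lattice_test(const lattice *l, size_t k) {
    return (l->words[k / 32] >> (k % 32)) & 1u;
}

/* Try to place a dimer on sites x and x+1; returns true on success. */
static bool deposit_dimer(lattice *l, size_t x) {
    if (x + 1 >= l->nbits) {
        return false;               /* would run off the end */
    }
    if (lattice_test(l, x) || lattice_test(l, x + 1)) {
        return false;               /* at least one site is occupied */
    }
    lattice_set(l, x);
    lattice_set(l, x + 1);
    return true;
}
```

Note that a dimer at site 31 straddles two words (bit 31 of words[0] and bit 0 of words[1]); indexing by k / 32 and k % 32 handles that without special cases.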

typedef unsigned long bfield_t[ size_needed/sizeof(long) ];
// long because that's probably what your cpu is best at
// The size_needed (in bytes) should be evenly divisible by sizeof(long), or
// you could use (sizeof(long)-1+size_needed)/sizeof(long) to force it to round up
Now, each long in a bfield_t can hold sizeof(long)*8 bits.
You can calculate the index of the long you need by:
bindex = index / (8 * sizeof(long) );
and your bit number by
b = index % (8 * sizeof(long) );
You can then look up the long you need and then mask out the bit you need from it.
result = my_field[bindex] & (1UL<<b);
or
result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0
The first one may be faster on some CPUs, or may save you shifting back up if you need
to perform operations between the same bit in multiple bit arrays. It also mirrors
the setting and clearing of a bit in the field more closely than the second implementation.
set:
my_field[bindex] |= 1UL<<b;
clear:
my_field[bindex] &= ~(1UL<<b);
You should remember that you can use bitwise operations on the longs that hold the fields
and that's the same as the operations on the individual bits.
You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. On POSIX systems ffs is declared in strings.h. It's there just for this purpose -- a string of bits.
Anyway, it is find first set and essentially:
int ffs(int x) {
    unsigned int u = (unsigned int)x;
    if (u == 0) return 0; // no bit set
    int c = 1; // ffs numbers bits starting from 1
    while (!(u & 1)) {
        c++;
        u >>= 1;
    }
    return c;
}
This is a common operation for processors to have an instruction for and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has an instruction for this, by the way. Oh, and ffsl and ffsll are the same function except take long and long long, respectively.

You can use | (bitwise OR) and << (left shift) to set bits.
For example, (1 << 3) results in "00001000" in binary. So your code could look like:
char eightBits = 0;
//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);
//eightBits now looks like "00110000".
Then just scale it up with an array of chars and figure out the appropriate byte to modify first.
If you prefer named masks, you could define a list of bit masks in advance and put them in an array:
#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80
unsigned char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};
Then you can index your bits instead of shifting (a constant shift is cheap, so expect readability rather than speed to be the main benefit), turning the previous code into:
eightBits |= (bits[2] | bits[3]);
Alternatively, if you can use C++, you could just use an std::vector<bool> which is internally defined as a vector of bits, complete with direct indexing.

bitarray.h:
#include <inttypes.h> // defines uint32_t
//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;
#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
((bit)&1 ? ((array)[DW_INDEX(index)] |= (bitarray_t)1<<BIT_INDEX(index)) \
: ((array)[DW_INDEX(index)] &= ~((bitarray_t)1<<BIT_INDEX(index))) \
, 0 \
)
Use:
bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2; // the least significant bit is 0
putbit(arr,6,x); // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x); // sets bit 6 to 1 because !!2 is 1
EDIT the docs:
"dword" = "double word" = 32-bit value (unsigned, but that's not really important)
RESERVE_BITS: number_of_bits --> number_of_dwords
RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
DW_INDEX(i) is the index of dword where the i-th bit is stored.
Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
If i is the number of some bit in the array, BIT_INDEX(i) is the number
of that bit in the dword where the bit is stored.
And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0
getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.
putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).

It's a trade-off:
(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory
(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory
If you have enough memory available then go for (1), otherwise consider (2).
