Embedded C - How to create a cache for expensive external reads? - c

I am working with a microcontroller that has an external EEPROM containing tables of information.
There is a large amount of information, however there is a good chance that we will request the same information cycle to cycle if we are fairly 'stable' - i.e. if we are at a constant temperature for example.
Reads from the EEPROM take around 1ms, and we do around 30 per cycle. Our cycle is currently about 100ms so there is significant savings to be had.
I am therefore looking at implementing a RAM cache. A hit should be significantly faster than 1ms since the microcontroller core is running at 8Mhz.
The lookup involves a 16-bit address returning 16-bit data. The microcontroller is 32-bit.
Any input on caching would be greatly appreciated, especially if I am totally missing the mark and should be using something else, like a linked list, or even a pre-existing library.
Here is what I think I am trying to achieve:
-A cache made up of an array of structs. The struct would contain the address, data and some sort of counter indicating how often this piece of data has been accessed (readCount).
-The array would be sorted by address normally. I would have an efficient lookup() function to lookup an address and get the data (suggestions?)
-If I got a cache miss, I would sort the array by readCount to determine the least used cached value and throw it away. I would then fill its position with the new value I have looked up from EEPROM. I would then reorder the array by address. Any sorting would use an efficient sort (shell sort? - not sure how to handle this with arrays)
-I would somehow decrement all of the readCount variables to that they would tend to zero if not used. This should preserve constantly used variables.
Here are my thoughts so far (pseudocode, apologies for my coding style):
#define CACHE_SIZE 50
//one piece of data in the cache
struct cacheItem
{
uint16_t address;
uint16_t data;
uint8_t readCount;
};
//array of cached addresses
struct cacheItem cache[CACHE_SIZE];
//function to get data from the cache
uint16_t getDataFromCache(uint16_t address)
{
uint8_t cacheResult;
struct cacheItem * cacheHit; //Pointer to a successful cache hit
//returns CACHE_HIT if in the cache, else returns CACHE_MISS
cacheResult = lookUpCache(address, cacheHit);
if(cacheResult == CACHE_MISS)
{
//Think this is necessary to easily weed out the least accessed address
sortCacheByReadCount();//shell sort?
removeLastCacheEntry(); //delete the last item that hasn't been accessed for a while
data = getDataFromEEPROM(address); //Expensive EEPROM read
//Add on to the bottom of the cache
appendToCache(address, data, 1); //1 = setting readCount to 1 for new addition
//Think this is necessary to make a lookup function faster
sortCacheByAddress(); //shell sort?
}
else
{
data = cacheHit->data; //We had a hit, so pull the data
cacheHit->readCount++; //Up the importance now
}
return data;
}
//Main function
main(void)
{
testData = getDataFromCache(1234);
}
Am I going down the completely wrong track here? Any input is appreciated.

Repeated sorting sounds expensive to me. I would implement the cache as a hash table on the address. To keep things simple, I would start by not even counting hits but rather evicting old entries immediately on seeing a hash collision:
const int CACHE_SIZE=32; // power of two
struct CacheEntry {
int16_t address;
int16_t value
};
CacheEntry cache[CACHE_SIZE];
// adjust shifts for different CACHE_SIZE
inline int cacheIndex(int adr) { return (((adr>>10)+(adr>>5)+adr)&(CACHE_SIZE-1)); }
int16_t cachedRead( int16_t address )
{
int idx = cacheIndex( address );
CacheEntry * pCache = cache+idx;
if( address != pCache->address ) {
pCache->value = readEeprom( address );
pCache->address = address;
}
return pCache->value
}
If this proves not effective enough, I would start by fiddling around with the hash function.

Don't be afraid to do more computations, in most cases I/O is slower.
This is the simpliest implementation I can think of:
#define CACHE_SIZE 50
something cached_vals[CACHE_SIZE];
short int cached_item_num[CACHE_SIZE];
char cache_hits[CACHE_SIZE]; // 0 means free.
void inc_hits(char index){
if (cache_hits[index] > 127){
for (int i = 0; i < CACHE_SIZE; i++)
cache_hits[i] <<= 1;
cache_hits[i]++; // 0 is reserved as "free" marker
};
cache_hits[index]++;
}:
int get_new_space(short int item){
for (int i = 0; i < CACHE_SIZE; i++)
if (!cache_hits[i]) {
inc_hits(i);
return i;
};
// no free values, dropping the one with lowest count
int min_val = 0;
for (int i = 1; i < CACHE_SIZE; i++)
min_val = min(cache_hits[min_val], cache_hits[i]);
cache_hits[min_val] = 2; // just to give new values more chanches to "survive"
cached_item_num[min_val] = item;
return min_val;
};
something* get_item(short int item){
for (int i = 0; i < CACHE_SIZE; i++){
if (cached_item_num[i] == item){
inc_hits(i);
return cached_vals + i;
};
};
int new_item = get_new_space(item);
read_from_eeprom(item, cached_vals + new_item);
return chached_vals + new_item;
};

Sorting and moving data seems like a bad idea, and it's not clear you gain anything useful from it.
I'd suggest a much simpler approach. Allocate 4*N (for some N) bytes of data, as an array of 4-byte structs each containing an address and the data. To look up a value at address A, you look at the struct at index A mod N; if its stored address is the one you want, then use the associated data, otherwise look up the data off the EEPROM and store it there along with address A. Simple, easy to implement, easy to test, and easy to understand and debug later.
If the location of your current lookup tends to be near the location of previous lookups, that should work quite well -- any time you're evicting data, it's going to be from at least N locations away in the table, which means you're probably not likely to want it again any time soon -- I'd guess that's at least as good a heuristic as "how many times did I recently use this". (If your EEPROM is storing several different tables of data, you could probably just do a cache for each one as the simplest way to avoid collisions there.)

You said that which entry you need from the table relates to the temperature, and that the temperature tends to remain stable. As long as the temperature does not change too quickly then it is unlikely that you will need an entry from the table which more than 1 entry away from the previously needed entry.
You should be able to accomplish your goal by keeping just 3 entries in RAM. The first entry is the one you just used. The next entry is the one corresponding to the temperature just below the last temperature measurement, and the other one is the temperature just above the last temperature measurement. When the temperature changes one of these entries probably becomes the new current one. You can then preform whatever task it is you need using this data, and then go ahead and read the entry you need (higher or lower than the current temperature) after you have finished other work (before reading the next temperature measure).
Since there are only 3 entries in RAM at a time you don't have to be clever about what data structure you need to store them in to access them efficiently, or even keeping them sorted because it will never be that long.
If temperatures can move faster than 1 unit per examination period then you could just increase the size of your cache and maybe have a few more anticipatory entries (in the direction that temperature seems to be heading) than you do trailing entries. Then you may want to store the entries in an efficient structure, though. I wouldn't worry about how recently you accessed an entry, though, because next temperature probability distribution predictions based on current temperature will usually be pretty good. You will need to make sure you handle the case where you are way off and need to read in the entry for a just read temperature immediately, though.

There are my suggestions:
Replace oldest, or replace least recent policy would be better, as reolacing least accessed would quickly fill up cache and then just repeatedly replace last element.
Do not traverse all array, but take some pseudo-random (seeded by address) location to replace. (special case of single location is already presented by #ruslik).
My idea would be:
#define CACHE_SIZE 50
//one piece of data in the cache
struct cacheItem
{
uint16_t address;
uint16_t data;
uint8_t whenWritten;
};
//array of cached addresses
struct cacheItem cache[CACHE_SIZE];
// curcular cache write counter
unit8_t writecount = 0;
// this suggest cache location either contains actual data or to be rewritten;
struct cacheItem *cacheLocation(uint16_t address) {
struct cacheLocation *bestc, *c;
int bestage = -1, age, i;
srand(address); // i'll use standard PRNG to acquire locations; as it initialized
// it will always give same sequence for same location
for(i = 0; i<4; i++) { // any number of iterations you find best
c = &(cache[rand()%CACHE_SIZE]);
if(c->address == address) return c; // FOUND!
age = (writecount - whenWritten) & 0xFF; // after age 255 comes age 0 :(
if(age > bestage) {
bestage = age;
bestc = c;
}
}
return c;
}
....
struct cacheItem *c = cacheLocation(addr);
if(c->address != addr) {
c->address = addr;
c->data = external_read(addr);
c->whenWritten = ++writecount;
}
cache age will wrap after 255 to 0 but but it's hust slightly randomizes cache replacements, so it did not make workaround.

Related

Solving Lights out for AI Course

So I was given the following task: Given that all lights in a 5x5 version of a game are turned on, write an algorithm using UCS / A* / BFS / Greedy best first search that finds a solution.
What I did first was realize that UCS would be unnecessary as the cost from moving from one state to another is 1(pressing a button that flips itself and neighbouring ones). So what I did is wrote BFS instead. It turned out that it works too long and fills up a queue, even though I was paying attention to removing parent nodes when I was finished with them not to overflow the memory. It would work for around 5-6mins and then crash because of memory.
Next, what I did is write DFS(even though it was not mentioned as one of possibilities) and it did find a solution in 123 secs, at depth 15(I used depth-first limited because I knew that there was a solution at depth 15).
What I am wondering now is am I missing something? Is there some good heuristics to try to solve this problem using A* search? I figured out absolutely nothing when it's about heuristics, because it doesn't seem any trivial to find one in this problem.
Thanks very much. Looking forward to some help from you guys
Here is my source code(I think it's pretty straightforward to follow):
struct state
{
bool board[25];
bool clicked[25];
int cost;
int h;
struct state* from;
};
int visited[1<<25];
int dx[5] = {0, 5, -5};
int MAX_DEPTH = 1<<30;
bool found=false;
struct state* MakeStartState()
{
struct state* noviCvor = new struct state();
for(int i = 0; i < 25; i++) noviCvor->board[i] = false, noviCvor->clicked[i] = false;
noviCvor->cost = 0;
//h=...
noviCvor->from = NULL;
return noviCvor;
};
struct state* MakeNextState(struct state* temp, int press_pos)
{
struct state* noviCvor = new struct state();
for(int i = 0; i < 25; i++) noviCvor->board[i] = temp->board[i], noviCvor->clicked[i] = temp->clicked[i];
noviCvor->clicked[press_pos] = true;
noviCvor->cost = temp->cost + 1;
//h=...
noviCvor->from = temp;
int temp_pos;
for(int k = 0; k < 3; k++)
{
temp_pos = press_pos + dx[k];
if(temp_pos >= 0 && temp_pos < 25)
{
noviCvor->board[temp_pos] = !noviCvor->board[temp_pos];
}
}
if( ((press_pos+1) % 5 != 0) && (press_pos+1) < 25 )
noviCvor->board[press_pos+1] = !noviCvor->board[press_pos+1];
if( (press_pos % 5 != 0) && (press_pos-1) >= 0 )
noviCvor->board[press_pos-1] = !noviCvor->board[press_pos-1];
return noviCvor;
};
bool CheckFinalState(struct state* temp)
{
for(int i = 0; i < 25; i++)
{
if(!temp->board[i]) return false;
}
return true;
}
int bijection_mapping(struct state* temp)
{
int temp_pow = 1;
int mapping = 0;
for(int i = 0; i < 25; i++)
{
if(temp->board[i])
mapping+=temp_pow;
temp_pow*=2;
}
return mapping;
}
void BFS()
{
queue<struct state*> Q;
struct state* start = MakeStartState();
Q.push(start);
struct state* temp;
visited[ bijection_mapping(start) ] = 1;
while(!Q.empty())
{
temp = Q.front();
Q.pop();
visited[ bijection_mapping(temp) ] = 2;
for(int i = 0; i < 25; i++)
{
if(!temp->clicked[i])
{
struct state* next = MakeNextState(temp, i);
int mapa = bijection_mapping(next);
if(visited[ mapa ] == 0)
{
if(CheckFinalState(next))
{
printf("NADJENO RESENJE\n");
exit(0);
}
visited[ mapa ] = 1;
Q.push(next);
}
}
}
delete temp;
}
}
PS. As I am not using map anymore(switched to array) for visited states, my DFS solution improved from 123 secs to 54 secs but BFS still crashes.
First of all, you may already recognize that in Lights Out you never have to flip the same switch more than once, and it doesn't matter in which order you flip the switches. You can thus describe the current state in two distinct ways: either in terms of which lights are on, or in terms of which switches have been flipped. The latter, together with the starting pattern of lights, gives you the former.
To employ a graph-search algorithm to solve the problem, you need a notion of adjacency. That follows more easily from the second characterization: two states are adjacent if there is exactly one switch about which they they differ. That characterization also directly encodes the length of the path to each node (= the number of switches that have been flipped), and it reduces the number of subsequent moves that need to be considered for each state considered, since all possible paths to each node are encoded in the pattern of switches.
You could use that in a breadth-first search relatively easily (and this may be what you in fact tried). BFS is equivalent to Dijkstra's algorithm in that case, even without using an explicit priority queue, because you enqueue new nodes to explore in priority (path-length) order.
You can also convert that to an A* search with addition of a suitable heuristic. For example, since each move turns off at most five lights, one could take as the heuristic the number of lights still on after each move, divided by 5. Though that's a bit crude, I'm inclined to think that it would be of some help. You do need a real priority queue for that alternative, however.
As far as implementation goes, do recognize that you can represent both the pattern of lights currently on and the pattern of switches that have been pressed as bit vectors. Each pattern fits in a 32-bit integer, and a list of visited states requires 225 bits, which is well within the capacity of modern computing systems. Even if you use that many bytes, instead, you ought to be able to handle it. Moreover, you can perform all needed operations using bitwise arithmetic operators, especially XOR. Thus, this problem (at its given size) ought to be computable relatively quickly.
Update:
As I mentioned in comments, I decided to solve the problem for myself, with -- it seemed to me -- very good success. I used a variety of techniques to achieve good performance and minimize memory usage, and in this case, those mostly were complementary. Here are some of my tricks:
I represented each whole-system state with a single uint64_t. The top 32 bits contain a bitmask of which switches have been flipped, and the bottom 32 contain a bitmask of which lights are on as a result. I wrapped these in a struct along with a single pointer to link them together as elements of a queue. A given state can be tested as a solution with one bitwise-and operation and one integer comparison.
I created a pre-initialized array of 25 uint64_t bitmasks representing the effect of each move. One bit set among the top 32 of each represents the switch that is flipped, and between 3 and five bits set among the bottom 32 represent the lights that are toggled as a result. The effect of flipping one switch can then be computed simply as new_state = old_state ^ move[i].
I implemented plain breadth-first search instead of A*, in part because I was trying to put something together quickly, and in particular because that way I could use a regular queue instead of a priority queue.
I structured my BFS in a way that naturally avoided visiting the same state twice, without having to actually track which states had ever been enqueued. This was based on some insight into how to efficiently generate distinct bit patterns without repeating, with those having fewer bits set generated before those having more bits set. The latter criterion was satisfied fairly naturally by the queue-based approach required anyway for BFS.
I used a second (plain) queue to recycle dynamically-allocated queue nodes after they were removed from the main queue, to minimize the number calls to malloc().
Overall code was a bit less than 200 lines, including blank and comment lines, data type declarations, I/O, queue implementation (plain C, no STL) -- everything.
Note, by the way, that the priority queue employed in standard Dijkstra and in A* is primarily about finding the right answer (shortest path), and only secondarily about doing so efficiently. Enqueueing and dequeueing from a standard queue can both be O(1), whereas those operations on a priority queue are o(log m) in the number of elements in the queue. A* and BFS both have worst-case queue size upper bounds of O(n) in the total number of states. Thus, BFS will scale better than A* with problem size; the only question is whether the former reliably gives you the right answer, which in this case, it does.

What is the best way to copy memory for X indexes to many locations within a single array in C?

What is the best way to copy memory for X indexes to many locations within a single array in C?
The problem trying to be solved is emulated memory for a CPU emulator. The original hardware has this type of memory mirroring, and I am attempting to replicate it via code.
Say you have an array:
int memory[100] = {0};
and you have 10 indexes which are mirrored at different locations. For example if memory[0] is changed, index 0, 10, 20, 30... should change to that value or if memory[3] is changed, index 3, 13, 23, 33 should be mirrored.
Likewise if any mirrored location is changed all other mirror locations should reflect this, such as if index 23 is changed, 3, 13, 23, 33... etc should reflect this.
Another requirement is a way to specify where the start and ends of the mirrored locations are. For example index 10-19 could be mirrored at index 30-39, then again at 70-79 leaving unmodified space in between segments of mirrored indexes.
Would using memcpy be the fastest/most efficient way of achieving this if this, or would some sort of iterating loop and pointer math be better for efficiency? How would the pointer math be done to calculate the start address to copy to as well as the destination? Would an array of pointers holding the start addresses that live inside the memory array be the best way to handle this?
Something maybe like (this probably won't compile it is just pseudo code for an idea of mine):
#define NUMBER_OF_MIRRORS 3
#define LENGTH_OF_MIRRORS 10
int memory[100] = {0};
int *memory_mirror_starts[3] = {&memory[10], &memory[30], &memory[70]};
// When memory needs to be mirrored
for(int i = 0; i < NUMBER_OF_MIRRORS; i++) {
for(int n = 0; n < LENGTH_OF_MIRRORS; n++) {
memory_mirror_starts[i][n] = memory_mirror_starts[0][n];
}
}
I think I may be on the right track, but this wouldn't satisfy all my requests since this specifically copies the results of the first mirror to the rest. If a write was to any of the other mirrors it would be overwritten rather than copied to the other mirrors.
Thanks for any tips and advice.
To specify where each mirror starts in the memory array, the "array of pointers into 'memory' works",
int *memory_mirror_starts[3] = {&memory[10], &memory[30], &memory[70]}; // (A)
or you could simply give each offset:
int memory_mirror_starts[3] = { 10, 30, 70 }; // (B)
Then to ensure each write to a given mirror is indeed replicated to all mirrors, without copying the whole thing all the time, you could have a poke function to write at a given index in a mirror (for each approach, (A) and (B))
void poke(int index, int value) {
int j;
for (j=0 ; j<NUMBER_OF_MIRRORS ; j++)
memory_mirror_starts[j][index] = value; // (A)
}
or
void poke(int index, int value) {
int j;
for (j=0 ; j<NUMBER_OF_MIRRORS ; j++)
memory[memory_mirror_starts[j] + index] = value; // (B)
}
Having a function to centralize write access hides the mirrors complexity to developers and ensures all mirrors are indeed updated correctly.
Note index could be checked to be >=0 and < LENGTH_OF_MIRRORS.
Performance wise,
adding inline to the functions declarations will change the function calls with the function code in place, saving the calls. The poke function being small, the code shouldn't get much bigger.
(A) does basically *(*(memory_mirror_starts + j) + index) = value
(B) does basically *(memory + *(memory_mirror_starts + j) + index) = value
so (A) could be a tad faster, (but optimizers have their say, and it is better to test both solutions)
Inline:
inline void poke(int index, int value) { ...

Expand hash table without rehash?

I am looking to for a hash table data structure that does not require rehash for expansion and shrink?
Rehash is a CPU consuming effort. I was wondering if it is possible to design hash table data structure in a way that does not require rehash at all? Have you heard about such a data structure before?
does not require rehash for expansion and shrink? Rehash is a CPU consuming effort. I was wondering if it is possible to design hash table data structure in a way that does not require rehash at all? Have you heard about such a data structure before?
That depends on what you call "rehash":
If you simply mean that the table-level rehash shouldn't reapply the hash function to each key during resizing, then that's easy with most libraries: e.g. wrap the key and its raw (pre-modulo-table-size) real hash value together a la struct X { size_t hash_; Key key_ };, supply the hashtable library with a hash function that returns hash_, but a comparison function that compares key_s (depending on the complexity of key_ comparison, you may be able to use hash_ to optimise, e.g. lhs.hash_ == rhs.hash_ && lhs.key_ == rhs.key_).
This will help most if the hashing of keys was particularly time consuming (e.g. cryptographic strength on longish keys). For very simple hashing (e.g. passthrough of ints) it'll slow you down and waste memory.
If you mean the table-level operation of increasing or decreasing memory storage and reindexing all stored values, then yes - it can be avoided - but to do so you have to fundamentally change the way the hash table works, and the normal performance profile. Discussed below.
As just one example, you could leverage a more typical hashtable implementation (let's call it H) by having your custom hashtable (C) have an H** p that - up to an initial size limit - will have p[0] be the only instance of H, and simply ferry operations/results through. If the table grows beyond that, you keep p[0] referencing the existing H, while creating a second H hashtable to be tracked by p[1]. Then things start getting dicey:
to search or erase in C, your implementation needs to search p[1] then p[0] and report any match from either
to insert a new value in C, your implementation must confirm it's not in p[0], then insert to p[1]
with each insert (and potentially even for other operations), it could optionally migrate any matching - or an arbitrary p[0] entry - to p[1] so gradually p[0] empties; you can easily guarantee p[0] will be empty before p[1] will be so full (and consequently a larger table will be needed). When p[0] is empty you may want to p[0] = p[1]; p[1] = NULL; to keep the simple mental model of what's where - lots of options.
Some existing hash table implementations are very efficient at iterating over elements (e.g. GNU C++ std::unordered_set), as there's a singly linked list of all the values, and the hash table is really only a collection of pointers (in C++ parlance, iterators) into the linked list. This can mean that if your utilisation falls below some threshold (e.g. 10% load factor) for your only/larger hash table, you know you can very efficiently migrate the remaining elements to a smaller table.
These kind of tricks are used by some hash tables to avoid a sudden heavy cost during rehashing, and instead spread the pain more evenly over a number of subsequent operations, avoiding a possibly nasty spike in latency.
Some of the implementation options only make sense for either an open or a closed hashing implementation, or are only useful when the keys and/or values are small or large and depending on whether the table embeds them or points to them. Best way to learn about it is to code....
It depends what you want to avoid. Rehashing implies recomputing the hash values. You can avoid that by storing the hash values in the hash structures. Redispatching the entries into the reallocated hashtable may be less expensive (typically a single modulo or masking operation) and is hardly avoidable for simple hashtable implementations.
Assuming you actually do need this.. It is possible. Here I'll give a trivial example you can build on.
// Basic types we deal with
typedef uint32_t key_t;
typedef void * value_t;
typedef struct
{
key_t key;
value_t value;
} hash_table_entry_t;
typedef struct
{
uint32_t initialSize;
uint32_t size; // current max entries
uint32_t count; // current filled entries
hash_table_entry_t *entries;
} hash_table_t;
// Hash function depends on the size of the table
key_t hash(value_t value, uint32_t size)
{
// Simple hash function that just does modulo hash table size;
return *(key_t*)&value % size;
}
void init(hash_table_t *pTable, uint32_t initialSize)
{
pTable->initialSize = initialSize;
pTable->size = initialSize;
pTable->count = 0;
pTable->entries = malloc(pTable->size * sizeof(*pTable->entries));
/// #todo handle null return;
// Set to ~0 to signal invalid keys.
memset(pTable->entries, ~0, pTable->size * sizeof(*pTable->entries));
}
void insert(hash_table_t *pTable, value_t val)
{
key_t key = hash(val, pTable->size);
for (key_t i = key; i != (key-1); i=(i+1)%pTable->size)
{
if (pTable->entries[i].key == ~0)
{
pTable->entries[i].key = key;
pTable->entries[i].value = val;
pTable->count++;
break;
}
}
// Expand when 50% full
if (pTable->count > pTable->size/2)
{
pTable->size *= 2;
pTable->entries = realloc(pTable->entries, pTable->size * sizeof(*pTable->entries));
/// #todo handle null return;
memset(pTable->entries + pTable->size/2, ~0, pTable->size * sizeof(*pTable->entries));
}
}
_Bool contains(hash_table_t *pTable, value_t val)
{
// Try current size first
uint32_t sizeToTry = pTable->size;
do
{
key_t key = hash(val, sizeToTry);
for (key_t i = key; i != (key-1); i=(i+1)%pTable->size)
{
if (pTable->entries[i].key == ~0)
break;
if (pTable->entries[i].key == key && pTable->entries[i].value == val)
return true;
}
// Try all previous sizes we had. Only report failure if found for none.
sizeToTry /= 2;
} while (sizeToTry != pTable->initialSize);
return false;
}
The idea is that the hash function depends on the size of the table. When you change the size of the table, you don't rehash current entries. You add new ones with the new hash function. When reading the entries, you try all the hash functions that have ever been used on this table.
This way, get()/contains() and similar operations take longer the more times you expanded your table, but you don't have the huge spike of rehashing. I can imagine some systems where this would be a requirement.

Most efficient way to check if elements in an array have changed

I have an array in c and I need to perform some operation only if the elements in an array have changed. However the time and memory taken for this is very important. I realized that an efficient way to do this would probably be to hash all the elements of the array and compare the result with the previous result. If they match that means the elements dont change. I would however like to know if this is the most efficient way of doing things. Also since the array is only 8 bytes long(1 byte for each element) which hashing function would be least time consuming?
The elements in an array are actually being received from another microcontroller. So they may or may not change depending on whether what the other micro-controller measured is the same or not
If you weren't tied to a simple array, you could create a "MRU" List of structures where the structure could contain a flag that indicates if the item was changed since it was last inspected.
Every time an item changes set the "changed flag" and move it to the head of the list. When you need to check for the changed items you traverse the list from the head and unset the changed flags and stopping at the first element with its change flag not set.
Sorry, I missed the part about the array being only 8 bytes long. With that info and with the new info from your edit, I'm thinking the previous suggestion is not ideal.
If the array is only 8-bytes long why not just cache a copy of the previous array and compare it to the new array received?
Below is a clarification of my comment about "shortcutting" the compares. How you implement this would depend on what the sizeof(int) is on the platform used.
Using a 64-bit integer you could get away with one compare to determine if the array has changed. For example:
#define ARR_SIZE 8
unsigned char cachedArr[ARR_SIZE];
unsigned char targetArr[ARR_SIZE];
unsigned int *ic = (unsigned int *)cachedArr;
unsigned int *it = (unsigned int *)targetArr;
// This assertion needs to be true for this implementation to work
// correctly.
assert(sizeof(int) == sizeof(cachedArr));
/*
** ...
** assume initialization and other suff here
** leading into the main loop that is receiving the target array data.
** ...
*/
if (*ic != *it)
{
// Target array has changed; find out which element(s) changed.
// If you only cared that there was a change and did not care
// to know which specific element(s) had changed you could forego
// this loop altogether.
for (int i = 0; i < ARR_SIZE; i++)
{
if (cachedArr[i] != targetArr[i])
{
// Do whatever needs to be done based on the i'th element
// changed
}
}
// Cache the array again since it has changed.
memcpy(cachedArr, targetArr, sizeof(cachedArr));
}
// else no change to the array
If the native integer size was smaller than 64-bit you could use the same theory, but you'd have to loop over the array sizeof(cachedArr) / sizeof(unsigned int) times; and there would be a worst-case scenario involved (but isn't there always) if the change was in the last chunk tested.
It should be noted that with doing any char to integer type casting you may need to take into consideration alignment (if the char data is aligned to the appropriate word-size boundary).
Thinking further upon this however, it might be better altogether to just unroll the loop yourself and do:
if (cachedArr[0] != targetArr[0])
{
doElement0ChangedWork();
}
if (cachedArr[1] != targetArr[1])
{
doElement1ChangedWork();
}
if (cachedArr[2] != targetArr[2])
{
doElement2ChangedWork();
}
if (cachedArr[3] != targetArr[3])
{
doElement3ChangedWork();
}
if (cachedArr[4] != targetArr[4])
{
doElement4ChangedWork();
}
if (cachedArr[5] != targetArr[5])
{
doElement5ChangedWork();
}
if (cachedArr[6] != targetArr[6])
{
doElement6ChangedWork();
}
if (cachedArr[7] != targetArr[7])
{
doElement7ChangedWork();
}
Again, depending on whether or not knowing which specific element(s) changed that could be tightened up. This would result in more instruction memory needed but eliminates the loop overhead (the good old memory versus speed trade-off).
As with anything time/memory related test, measure, compare, tweak and repeat until desired results are achieved.
only if the elements in an array have changed
Who else but you is going to change them? You can just keep track of whether you've made a change since the last time you did the operation.
If you don't want to do that (perhaps because it'd require recording changes in too many places, or because the record-keeping would take too much time, or because another thread or other hardware is messing with the array), just save the old contents of the array in a separate array. It's only 8 bytes. When you want to see whether anything has changed, compare the current array to the copy element-by-element.
As others have said, the elements will only change if the code changed them.
Maybe this data can be changed by another user? Otherwise you would know that you had changed an entry.
As far as the hash function, there are only 2^8 = 256 different values that this array can take. A hash function won't really help here. Also, a hash function has to be computed, which costs memory so I don't think that will work for your application.
I would just compare bits until you find one has changed. If one has changed, the you will check 4 bits on average before you that your array has changed (assuming that each bit is equally likely to change).
If one hasn't changed, that is worst case scenario and you will have to check all eight bits to conclude that none have changed.
If array only 8 bytes long, you can treat it as if it is a long long type number. Suppose original array is char data[8].
long long * pData = (logn long *)data;
long long olddata = *pData;
if ( olddata != *pData )
{
// detect which one changed
}
I mean, this way you operate all data in one shot, this is much faster than access each element using index. hash is slower n this case.
If it is byte oriented with only eight elements, doing an XOR function would be more efficient than any other comparison.
If ((LocalArray[0] ^ received Array [0]) & (LocalArray[1] ^ received Array [1]) & ...)
{
//Yes it is changed
}

Efficient way to scan struct nested arrays

I have a struct that has several arrays members:
typedef myData someStruct {
uint16_t array1 [ARRAY_LENGTH]
uint16_t array2 [ARRAY_LENGTH]
} myData;
myData testData = {0}; // Global struct
At some point in my program I need to set the arrays to some set of predefined values, e.g., set array1 to all 0, array2 to all 0xFF, etc. My first instinct was to write out a for loop something like:
void someFunction (myData * test) {
for (uint16_t i = 0; i < ARRAY_LENGTH; ++i) {
test->array1[i] = 0xFF;
test->array2[i] = 0xCC;
}
}
However I then reasoned that the actions required by the program to do this would go something like:
load address of array1 first position
set value 0xFF;
load far address of array2 first postion
set value 0xCC;
load far address of array1 second position
set value 0xFF;
// and so on...
Whereas if I used a separate loop for each array the addresses would be a lot nearer each other (as arrays and structs stored contiguously), so the address loads are only to the next byte each time, making the code actually more efficient as follows:
void someFunction (myData * test) {
uint16_t i = 0;
for (i; i < ARRAY_LENGTH; ++i)
test->array1[i] = 0xFF;
for (i = 0; i < ARRAY_LENGTH; ++i)
test->array2[i] = 0xCC;
}
Is my reasoning correct, is the second one better? Furthermore, would a compiler (say gcc, for e.g.) normally be able to make this optimization itself?
It's going to depend on your system architecture. For example, on, say, a SPARC system, the cache line size is 64-bytes, and there are enough cache slots for both arrays, so the first version would be efficient. The load of the first array element would populate the cache, and subsequent loads would be very fast. If the compiler is smart enough, it can use prefetch as well.
On ISAs that support offset addressing, it doesn't actually fetch the address of the array element each time, it just increments an offset. So it only fetches the base address of the array, once, and then uses a load instruction with the base and offset. Each time through the loop it increments the offset in a register. Some instruction sets even have auto-increment.
The best thing to do would be to write a sample program/function, and try it. Optimizations at this low a level require either a thorough knowledge of the CPu/system, or lots of trial and error.
My humble recommendation: try and see. One loop solution saves arithmetic operations around increment and test of i. Two loops will probably profit of better cache optimization, especially if arrays are aligned to memory pages. In such case each access may cause a cache miss and cache reload. Personally, if the speed really matters I would prefer two loops with some unfolding.

Resources