I have two data structures:
typedef struct{
int a;
int b;
int c;
}EVENTS;
EVENTS typeone[20];
EVENTS typetwo[20];
These have been filled. typeone has been filled till typeone[5] and typetwo till typetwo[8].
I just want to compare the first six of typeone and typetwo and see if there are equal in all their members.
Is there a way to do typeone[1] == typetwo[1]
Basically comparing all the values inside the datastructure at [1].
Is there a short way to do this or would I have to loop through each member and compare separately?
Thanks
This is a comp.lang.c FAQ. In a nutshell, no, C does not support struct comparison with the == operator (the answer in the FAQ states a few reasons as to why this is hard in the general case). You have to write your own function and compare member by member. As was pointed out, memcmp() is not a guaranteed way due to unspecified behavior when accessing padding bytes.
int eventsequal (const EVENTS *const a, const EVENTS *const b)
{
if (a->a != b->a) return 0;
if (a->b != b->b) return 0;
if (a->c != b->c) return 0;
return 1;
}
And then your example typeone[1] == typetwo[1] becomes
if (eventsequal (typeone + 1, typetwo + 1)) {
/* They're equal. */
}
To avoid the problem of padding, you have to compare the fields individually. This doesn't have to be all that horrible:
#include <stdbool.h>
bool EVENTS_equal(const EVENTS *e1, const EVENTS *e2)
{
return e1->a == e2->a && e1->b == e2->b && e1->c == e2->c;
}
then just loop:
size_t i;
bool equal = true;
for(i = 0; i < 6; ++i)
{
if(!EVENTS_equal(typeone + i, typetwo + i))
{
equal = 0;
break;
}
}
it's not that much code really, and of course you could trivially encapsulate the looping in a function that cross-compares the n first slots of two EVENTS arrays.
typedef struct{
int a;
int b;
int c;
}EVENTS;
#pragma pack(1)
EVENTS typeone[20];
EVENTS typetwo[20];
#pragma pack()
int equal(EVENTS* v1, EVENTS* v2)
{
return 0==memcmp(v1, v2, sizeof(*v1));
}
Note #pragma pack(1). It ensures that there are no padding bytes in the structures. This way memcmp will not try to compare padding bytes and the comparison is way faster than a field-by-field method, but while in this case the performance is unlikely to be adversely affected, take:
typedef struct{
char a;
long b;
} somestruct;
#pragma pack(1)
somestruct foo;
#pragma pack()
Retrieving foo.b will take much more machine code than in case of padded structures, because it will miss word-aligned position where it can be retrieved with a single 32-bit instruction, it will have to be picked out with four byte-reads, and then assembled into the target register from these four pieces. So, take the performance impact into account.
Also, check if your compiler supports #pragma pack. Most modern compilers do, but exceptions may still happen.
Related
Context:
I am experimenting with functional programming patterns in C90.
Goal:
This is what I'm trying to achieve in ISO C90:
struct mut_arr tmp = {0};
/* ... */
struct arr const res_c99 = {tmp};
Initializing a const struct member of type struct mut_arr with a lvalue (tmp).
#include <stdio.h>
enum
{
MUT_ARR_LEN = 4UL
};
struct mut_arr
{
unsigned char bytes[sizeof(unsigned char const) * MUT_ARR_LEN];
};
struct arr {
struct mut_arr const byte_arr;
};
static struct arr map(struct arr const* const a,
unsigned char (*const op)(unsigned char const))
{
struct mut_arr tmp = {0};
size_t i = 0UL;
for (; i < sizeof(tmp.bytes); ++i) {
tmp.bytes[i] = op(a->byte_arr.bytes[i]);
}
struct arr const res_c99 = {tmp};
return res_c99;
}
static unsigned char op_add_one(unsigned char const el)
{
return el + 1;
}
static unsigned char op_print(unsigned char const el)
{
printf("%u", el);
return 0U;
}
int main() {
struct arr const a1 = {{{1, 2, 3, 4}}};
struct arr const a2 = map(&a1, &op_add_one);
map(&a2, &op_print);
return 0;
}
This is what I tried in C90:
#include <stdio.h>
#include <string.h>
enum {
MUT_ARR_LEN = 4UL
};
struct mut_arr {
unsigned char bytes[sizeof(unsigned char const) * MUT_ARR_LEN];
};
struct arr {
struct mut_arr const byte_arr;
};
struct arr map(struct arr const* const a,
unsigned char (*const op)(unsigned char const))
{
struct arr const res = {0};
unsigned char(*const res_mut_view)[sizeof(res.byte_arr.bytes)] =
(unsigned char(*const)[sizeof(res.byte_arr.bytes)]) & res;
struct mut_arr tmp = {0};
size_t i = 0UL;
for (; i < sizeof(tmp.bytes); ++i) {
tmp.bytes[i] = op(a->byte_arr.bytes[i]);
}
memcpy(res_mut_view, &tmp.bytes[0], sizeof(tmp.bytes));
return res;
}
unsigned char op_add_one(unsigned char const el) { return el + 1; }
unsigned char op_print(unsigned char const el) {
printf("%u", el);
return 0U;
}
int main() {
struct arr const a1 = {{{1, 2, 3, 4}}};
struct arr const a2 = map(&a1, &op_add_one);
map(&a2, &op_print);
return 0;
}
All I do is to create an "alternate view" (making it essentially writable). Hence, I cast the returned address to unsigned char(*const)[sizeof(res.byte_arr.bytes)].
Then, I use memcpy, and copy the contents of the tmp to res.
I also tried to use the scoping mechanism to circumvent initializing in the beginning.
But it does not help, since there cannot be a runtime evaluation.
This works, but it is not anything like the C99 solution above.
Is there perhaps a more elegant way to pull this off?
PS: Preferably, the solution should be as portable as possible, too. (No heap allocations, only static allocations. It should remain thread-safe. These programs above seem to be, as I only use stack allocation.)
Union it.
#include <stdio.h>
#include <string.h>
enum {
MUT_ARR_LEN = 4UL
};
struct mut_arr {
unsigned char bytes[sizeof(unsigned char) * MUT_ARR_LEN];
};
struct arr {
const struct mut_arr byte_arr;
};
struct arr map(const struct arr *a, unsigned char (*op)(unsigned char)) {
union {
struct mut_arr tmp;
struct arr arr;
} u;
size_t i = 0;
for (; i < sizeof(u.tmp.bytes); ++i) {
u.tmp.bytes[i] = op(a->byte_arr.bytes[i]);
}
return u.arr;
}
unsigned char op_add_one(unsigned char el) {
return el + 1;
}
unsigned char op_print(unsigned char el) {
printf("%u", el);
return 0U;
}
int main() {
const struct arr a1 = {{{1, 2, 3, 4}}};
const struct arr a2 = map(&a1, &op_add_one);
map(&a2, &op_print);
return 0;
}
Let's throw some standard stuffs from https://port70.net/~nsz/c/c89/c89-draft.html .
One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them. Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members.
Two types have compatible type if their types are the same.
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type;
The idea is that "common initial sequence" of mut_arr and arr is unsigned char [sizeof(unsigned char) * MUT_ARR_LEN]; so you can access one using the other.
However, as I read it now, it is unspecified if "initial sequence if corresponding members" includes nested struct members or not. So technically to be super standard compliant, you would:
struct arr map(const struct arr *a, unsigned char (*op)(unsigned char)) {
struct mutmut_arr {
struct mut_arr byte_arr;
};
union {
struct mutmut_arr tmp;
struct arr arr;
} u;
size_t i = 0;
for (; i < sizeof(u.tmp.bytes); ++i) {
u.tmp.byte_arr.bytes[i] = op(a->byte_arr.bytes[i]);
}
return u.arr;
}
#subjective I do want to note two things.
The placement of const type qualifier in your code is very confusing. It's typical in C to write const <type> not <type> const. It's typical to align * to the right with space on the left. I was not able to read your code efficiently at all. I removed almost all const from the code above.
Creating such interface as presented will be pain with no great benefits, with a lot of edge cases with lurking undefined behaviors around the corner. In C programming language, trust the programmer - it's one of the principles of C programming language. Do not prevent the programmer to do what has to be done (initializing a structure member). I would advise making the member mutable and have one structure definition and call it day. const qualified structure members usually are just hard to deal with, with no big benefits.
My answer might sound outrageous at first glance. It is
STOP WHAT YOU ARE DOING, NOW!
I will take my time to explain and give you a glimpse into your future (which is dim, if you pursue this idea) and try to convince you. But the gist of my answer is the bold line above.
Your prototype omits crucial parts to have some lasting solution to your "functional programming in C" approach. For example, you only have arrays of bytes (unsigned char). But for a "real" solution for "real" programmers, you need to consider different types. If you go to hoogle (Haskells online type and function browser engine thingy), you will notice, that fmap, which is the functional feature you try to achieve in C is defined as:
fmap :: Functor f => (a -> b) -> f a -> f b
This means, the mapping is not always from type a to type a. It's a monadic thingy, you try to offer your C programming fellows. So, an array of type element type a needs to be mapped to an array of element type b. Hence, your solution needs to offer not just arrays of bytes.
In C, arrays can reside in different types of memory and we cannot hide this very well. (In real functional languages, memory management is kind of abstracted away for the larger part and you just do not care. But in C, you must care. The user of your library must care and you need to allow them to dutifully care. Arrays can be global, on the stack, on the heap, in shared memory, ... and you need to offer a solution, allowing all that. Else, it will always just be a toy, propagating an illusion, that "it is possible and useful".
So, with just allowing arrays of different, custom types (someone will want arrays of arrays of a type as well, mind you!) and to be aware of memory management, how could a header file of your next evolution look like. Here is what I came up with:
#ifndef __IMMUTABLE_ARRAY_H
#define __IMMUTABLE_ARRAY_H
#include <stdint.h>
#include <stdlib.h>
#include <stdatomic.h>
// lacking namespaces or similar facilities in C, we use
// the prefix IA (Immutable Array) in front of all the stuff
// declared in this header.
// Wherever you see a naked `int`, think "bool".
// 0 -> false, 1 -> true.
// We do not like stdbool.h because sometimes trouble
// ensues in mixed C/C++ code bases on some targets, where
// sizeof(C-bool) != sizeof(C++-bool) o.O. So we cannot use
// C-bool in headers...
// We need storage classes!
// There are arrays on heap, static (global arrays),
// automatic arrays (on stack, maybe by using alloca),
// arrays in shared memory, ....
// For those different locations, we need to be able to
// perform different actions, e.g. for cleanup.
// IAStorageClass_t defines the behavior for a specific
// storage class.
// There is also the case of an array of arrays to consider...
// where we would need to clean up each member of the array
// once the array goes out of scope.
struct IAArray_tag;
typedef struct IAArray_tag IAArray_t;
typedef struct IAStorageClass_tag IAStorageClass_t;
typedef int (*IAArrayAllocator) (IAStorageClass_t* sclass,
size_t elementSize,
size_t capacity,
void* maybeStorage,
IAArray_t* target);
typedef void (*IAArrayDeleter) (IAArray_t* arr);
typedef void (*IAArrayElementDeleter) (IAArray_t* arr);
typedef int64_t (*IAArrayAddRef) (IAArray_t* arr);
typedef int64_t (*IAArrayRelease) (IAArray_t* arr);
typedef struct IAStorageClass_tag {
IAArrayAllocator allocator;
IAArrayDeleter deleter;
IAArrayElementDeleter elementDeleter;
IAArrayAddRef addReffer;
IAArrayRelease releaser;
} IAStorageClass_t;
enum IAStorageClassID_tag {
IA_HEAP_ARRAY = 0,
IA_STACK_ARRAY = 1,
IA_GLOBAL_ARRAY = 2,
IA_CUSTOM_CLASSES_BEGIN = 100
};
typedef enum IAStorageClassID_tag IAStorageClassID_t;
// creates the default storage classes (for heap and automatic).
void IAInitialize();
void IATerminate();
// returns a custom and dedicated identifier of the storage class.
int32_t
IARegisterStorageClass
(IAArrayAllocator allocator,
IAArrayDeleter deleter,
IAArrayElementDeleter elementDeleter,
IAArrayAddRef addReffer,
IAArrayRelease releaser);
struct IAArray_tag {
const IAStorageClass_t* storageClass;
int64_t refCount;
size_t elementSize; // Depends on the type you want to store
size_t capacity;
size_t length;
void* data;
};
// to make sure, uninitialized array variables are properly
// initialized to a harmless state.
IAArray_t IAInitInstance();
// allows to check if we ran into some uninitialized instance.
// In C++, this would be like after default constructor.
// See IAInitInstance().
int IAIsArray(IAArray_t* arr);
int
IAArrayCreate
(int32_t storageClassID,
size_t elementSize, // the elementSize SHALL be padded to
// a system-acceptable alignment size.
size_t capacity,
size_t size,
void* maybeStorage,
IAArray_t* target);
typedef
int
(*IAInitializerWithIndex_t)
(size_t index,
void* elementPtr);
int
IAArrayCreateWithInitializer
(int32_t storageClassID,
size_t elementSize,
size_t capacity,
void* maybeStorage,
IAInitializerWithIndex_t initializer,
IAArray_t* target);
IAArray_t* IAArrayAddReference(IAArray_t* arr);
void IAArrayReleaseReference(IAArray_t* arr);
// The one and only legal way to access elements within the array.
// Shortcutters, clever guys and other violators get hung, drawn
// and quartered!
const void * const IAArrayAccess(IAArray_t* arr, size_t index);
typedef void (*IAValueMapping_t)
(size_t index,
void* sourceElementPtr,
size_t sourceElementSize,
void* targetElementPtr,
size_t targetElementSize);
size_t IAArraySize(IAArray_t* arr);
size_t IAArrayCapacity(IAArray_t* arr);
size_t IAArrayElementSize(IAArray_t* arr);
// Because of reasons, we sometimes want to recycle
// an array and populate it with new values.
// This can only be referentially transparent and safe,
// if there are no other references to this array stored
// anywhere. i.e. if refcount == 1.
// If our app code passed the array around to other functions,
// some nasty ones might sneakily store themselves a pointer
// to an array and then the refcount > 1 and we cannot
// safely recycle the array instance.
// Then, we have to release it and create ourselves a new one.
int IACanRecycleArray(IAArray_t* arr);
// Starship troopers reporter during human invasion
// of bug homeworld: "It is an ugly planet, a bug planet!"
// This is how we feel about C. Map needs some noisy extras,
// just because C does not allow to build new abstractions with
// types. Yes, we could send Erich Gamma our regards and pack
// all the noise into some IAArrayFactory * :)
int
IAArrayMap(IAValueMapping_t mapping,
IAArray_t* source,
int32_t targetStorageClassID,
size_t targetElementSize,
void* maybeTargetStorage,
IAArray_t* target);
#endif
Needless to say, that I did not bother to implement my cute immutable-array.h in my still empty immutable-array.c, yes?
But once we did it, the joy woulds begin and we could write robust, functional C programs, yes? No! This is how well written functional C application code using those arrays might look like:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <math.h>
#include <assert.h>
#include "immutable-array.h"
typedef struct F64FloorResult_tag {
double div;
double rem;
} F64FloorResult_t;
void myFloor(double number, F64FloorResult_t* result) {
if (NULL != result) {
result->div = floor(number);
result->rem = number - result->div;
}
}
int randomDoubleInitializer(size_t index, double* element) {
if (NULL != element) {
*element = ((double)rand()) / (double)RAND_MAX;
return 1;
}
return 0;
}
void
doubleToF64FloorMapping
(size_t index,
double* input,
size_t inputElementSize,
F64FloorResult_t *output,
size_t outputElementSize) {
assert(sizeof(double) == inputElementSize);
assert(sizeof(F64FloorResult_t) == outputElementSize);
assert(NULL != input);
assert(NULL != output);
myFloor(*input, output);
}
int main(int argc, const char* argv[]) {
IAInitialize();
{
double sourceData[20];
IAArray_t source = IAInitInstance();
if (IAArrayCreateWithInitializer
((IAStorageClassID_t)IA_STACK_ARRAY,
sizeof(double),
20,
&sourceData[0],
(IAInitializerWithIndex_t)randomDoubleInitializer,
&source)) {
IAArray_t result = IAInitInstance();
F64FloorResult_t resultData[20];
if (IAArrayMap
((IAValueMapping_t)doubleToF64FloorMapping,
&source,
(int32_t)IA_STACK_ARRAY,
sizeof(F64FloorResult_t),
&result)) {
assert(IAArraySize(&source) == IAArraySize(&result));
for (size_t index = 0;
index < IAArraySize(&source);
index++) {
const double* const ival =
(const double* const)IAArrayAccess(&source, index);
const F64FloorResult_t* const oval =
(const F64FloorResult_t* const)
IAArrayAccess(&result,index);
printf("(%g . #S(f64floorresult_t :div %g :rem %g))\n",
*ival, oval->div, oval->rem);
}
IAArrayReleaseReference(&result);
}
IAArrayReleaseReference(&source);
}
}
IATerminate();
return 0;
}
I see already the knives coming out of the satchels of your colleagues if you try to impose such a monstrosity upon them. They will hate you, you will hate yourself. Eventually, you will hate that you ever had the idea to even try.
Especially, if in a more suitable language, the same code might look like this:
(map 'list #'(lambda (x) (multiple-value-list (floor x)))
(loop repeat 20
for x = (random 1.0)
collecting x))
I was reading over some of the source code behind pngquant (here)
I got confused when I saw plus-equals seemingly assigning a new value to an array of structs (base += r in the code snippet below):
static void hist_item_sort_range(hist_item base[], unsigned int len, unsigned int sort_start)
{
for(;;) {
const unsigned int l = qsort_partition(base, len), r = l+1;
if (l > 0 && sort_start < l) {
len = l;
}
else if (r < len && sort_start > r) {
base += r; len -= r; sort_start -= r;
}
else break;
}
}
The hist_item definition is given as:
typedef struct {
f_pixel acolor;
float adjusted_weight, // perceptual weight changed to tweak how mediancut selects colors
perceptual_weight; // number of pixels weighted by importance of different areas of the picture
float color_weight; // these two change every time histogram subset is sorted
union {
unsigned int sort_value;
unsigned char likely_colormap_index;
} tmp;
} hist_item;
I apologize ahead of time, because I'm sure to those in the know this must be a really dumb question, but how is plus-equals operating on base, which appears to be an array of structs, and some integer r? It seems to me that this operation should be undefined for the combination of those two types.
I haven't had to write C for almost ten years, and I'm admittedly pretty rusty; however, searching for about thirty minutes only turned up answers to the wrong questions, and any help is appreciated. Thanks!
As explained in What is array decaying?
static void hist_item_sort_range(hist_item base[], unsigned int len, unsigned int sort_start)
becomes
static void hist_item_sort_range(hist_item* base, unsigned int len, unsigned int sort_start)
Where base is a pointer to the first element of the array. Therefore base += r; performs simple pointer arithmetic, i.e.: modifies the pointer to point to an offset of r elements from the start of the array.
Due to the += the original pointer is modified, so any access happens with offset from the now pointed to element.
To use the example from the comment:
After base += 1; accessing the "first" element via &base[0]; yields a pointer to the same element as &base[1]; before the increment
To optimize functions which have same pattern, I am considering two ways of implementation.
A environment of this function may used is inside of interrupts on embedded software. This is why I am faced of difficulty since considering of speed capacity is required.
In my opinion, following case 1 and 2 have same speed feature. However my colleague said there might be difference since first case need to access using pointer but second does not.
Which one is more faster?
I need your help to implement efficient code with speed.
typedef struct
{
unsigned char member1;
unsigned char member2;
..
unsigned char member10;
} my_struct
my_struct input[10];
void My_ISR1( void )
{
...
sub_func1( input[1] );
return 0;
}
void My_ISR2( void )
{
...
sub_func1( input[2] );
return 0;
}
void sub_func1( my_struct my_struct_input )
{
if( my_struct_input.member1 < my_struct_input.member2 )
{
...
}
...
return 0;
}
CASE2)
unsigned char member1of1;
unsigned char member2of1;
...
unsigned char member10of10;
void My_ISR1( void )
{
...
sub_func1( member1of1, ..., member10of1 );
return 0;
}
void My_ISR2( void )
{
...
sub_func1( member1of2, ..., member10of2 );
return 0;
}
void sub_func1( unsigned char member1,
unsigned char member2, ...,
unsigned char member 10 )
{
if( member1 < member2 )
{
...
}
...
return 0;
}
The only way to be sure if one implementation is faster than another, for your compiler, and your problem space, in your code, on your hardware, for your particular use case, is to measure it.
However, of the two options presented, I would expect the pass-by-struct to be slightly faster (by the way, in your code you are not passing by pointer)
In both presented cases, a copy of the variables is passed to the function.
In both cases, this results in a copy of 10 bytes, however given the struct is contiguous, this may be slightly faster.
However, a better option might be to pass by pointer eg:
void sub_func1( my_struct* my_struct_input )
{
if(my_struct_input->member1 < my_struct_input<member2)
///........
}
This way, instead of copying 10 individual variables, or a struct of 10 bytes, we are only copying one (presumably 32-bit, but it depends) address.
It does have the downside that you are now operating on the exact same struct as the caller, but that can be resolved using const pointers.
One further thing to consider, is that while the function call might be faster in one scenario or another, you have to look at the bigger picture. While passing a struct-pointer should be faster, you have to also consider the overhead in constructing the struct - if you have to assign the struct members from existing variables, this obviously adds extra processing, which must be factored into the consideration.
I was confronted with a tricky (IMO) question. I needed to compare two MAC addresses, in the most efficient manner.
The only thought that crossed my mind in that moment was the trivial solution - a for loop, and comparing locations, and so I did, but the interviewer was aiming to casting.
The MAC definition:
typedef struct macA {
char data[6];
} MAC;
And the function is (the one I was asked to implement):
int isEqual(MAC* addr1, MAC* addr2)
{
int i;
for(i = 0; i<6; i++)
{
if(addr1->data[i] != addr2->data[i])
return 0;
}
return 1;
}
But as mentioned, he was aiming for casting.
Meaning, to somehow cast the MAC address given to an int, compare both of the addresses, and return.
But when casting, int int_addr1 = (int)addr1;, only four bytes will be casted, right? Should I check the remaining ones? Meaning locations 4 and 5?
Both char and int are integer types so casting is legal, but what happens
in the described situation?
If he is really dissatisfied with this approach (which is essentially a brain fart, since you aren't comparing megabytes or gigabytes of data, so one shan't really be worrying about "efficiency" and "speed" in this case), just tell him that you trust the quality and speed of the standard library:
int isEqual(MAC* addr1, MAC* addr2)
{
return memcmp(&addr1->data, &addr2->data, sizeof(addr1->data)) == 0;
}
If your interviewer demands that you produce undefined behavior, I would probably look for a job elsewhere.
The correct initial approach would be to store the MAC address in something like a uint64_t, at least in-memory. Then comparisons would be trivial, and implementable efficiently.
Cowboy time:
typedef struct macA {
char data[6];
} MAC;
typedef struct sometimes_works {
long some;
short more;
} cast_it
typedef union cowboy
{
MAC real;
cast_it hack;
} cowboy_cast;
int isEqual(MAC* addr1, MAC* addr2)
{
assert(sizeof(MAC) == sizeof(cowboy_cast)); // Can't be bigger
assert(sizeof(MAC) == sizeof(cast_it)); // Can't be smaller
if ( ( ((cowboy_cast *)addr1)->hack.some == ((cowboy_cast *)addr2)->hack.some )
&& ( ((cowboy_cast *)addr1)->hack.more == ((cowboy_cast *)addr2)->hack.more ) )
return (0 == 0);
return (0 == 42);
}
There is nothing wrong with an efficient implementation, for all you know this has been determined to be hot code that is called many many times. And in any case, its okay for interview questions to have odd constraints.
Logical AND is a priori a branching instruction due to short-circuit evaluation even if it doesn't compile this way, so lets avoid it, we don't need it. Nor do we need to convert our return value to a true bool (true or false, not 0 or anything that's not zero).
Here is a fast solution on 32-bit:
XOR will capture the differences, OR will record difference in both parts, and NOT will negate the condition into EQUALS, not UNEQUAL. The LHS and RHS are independent computations, so a superscalar processor can do this in parallel.
int isEqual(MAC* addr1, MAC* addr2)
{
return ~((*(int*)addr2 ^ *(int*)addr1) | (int)(((short*)addr2)[2] ^ ((short*)addr1)[2]));
}
EDIT
The purpose of the above code was to show that this could be done efficiently without branching. Comments have pointed out this C++ classifies this as undefined behavior. While true, VS handles this fine. Without changing the interviewer's struct definition and function signature, in order to avoid undefined behavior an extra copy must be made. So the non-undefined behavior way without branching but with an extra copy would be as follows:
int isEqual(MAC* addr1, MAC* addr2)
{
struct IntShort
{
int i;
short s;
};
union MACU
{
MAC addr;
IntShort is;
};
MACU u1;
MACU u2;
u1.addr = *addr1; // extra copy
u2.addr = *addr2; // extra copy
return ~((u1.is.i ^ u2.is.i) | (int)(u1.is.s ^ u2.is.s)); // still no branching
}
This would work on most systems,and be faster than your solution.
int isEqual(MAC* addr1, MAC* addr2)
{
return ((int32*)addr1)[0] == ((int32*)addr2)[0] && ((int16*)addr1)[2] == ((int16*)addr2)[2];
}
would inline nicely too, could be handy at the center of loop on a system where you can check the details are viable.
Non-portable casting solution.
In a platform I use (PIC24 based), there is a type int48, so making a safe assumption char is 8 bits and the usual alignment requirements:
int isEqual(MAC* addr1, MAC* addr2) {
return *((int48_t*) &addr1->data) == *((int48_t*) &addr2->data);
}
Of course, this is not usable on many platforms, but then so are a number of solutions that are not portable either, depending on assumed int size, no padding, etc.
The highest portable solution (and reasonably fast given a good compiler) is the memcmp() offered by #H2CO3.
Going to a higher design level and using a wide enough integer type like uint64_t instead of struct macA, as suggested by Kerrek SB, is very appealing.
To do type punning correctly you have to use an union. Otherwise you will break the rules strict aliasing which certain compilers follow, and the result will be undefined.
int EqualMac( MAC* a , MAC* b )
{
union
{
MAC m ;
uint16_t w[3] ;
} ua , ub ;
ua.m = *a ;
ub.m = *b ;
if( ua.w[0] != ub.w[0] )
return 0 ;
if( ua.w[1] != ub.w[1] )
return 0 ;
if( ua.w[2] != ub.w[2] )
return 0 ;
return 1 ;
}
According to C99 it is safe to read from an union member that is not the last used to store a value in it.
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
You have a MAC structure (which contains an array of 6 bytes),
typedef struct {
char data[6];
} MAC;
Which agrees with this article about typedef for fixed length byte array.
The naive approach would be to assume the MAC address is word aligned (which is probably what the interviewer wanted), albeit not guaranteed.
typedef unsigned long u32;
typedef signed long s32;
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u32 m1 = *(u32*)mac1->data;
U32 m2 = *(u32*)mac2->data;
if( m1 != m2 ) return (s32)m1 - (s32)m2;
u16 m3 = *(u16*)(mac1->data+4);
u16 m2 = *(u16*)(mac2->data+4);
return (s16)m3 - (s16)m4;
}
Slightly safer would be to interpret the char[6] as a short[3] (MAC more likely to be aligned on even byte boundaries than odd),
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u16* p1 = (u16*)mac1->data;
u16* p2 = (u16*)mac2->data;
for( n=0; n<3; ++n ) {
if( *p1 != *p2 ) return (s16)*p1 - (s16)*p2;
}
return(0);
}
Assume nothing, and copy to word aligned storage, but the only reason for typecasting here is to satisfy the interviewer,
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u16 m1[3]; u16 p2[3];
memcpy(m1,mac1->data,6);
memcpy(m2,mac2->data,6);
for( n=0; n<3; ++n ) {
if( m1[n] != m2[n] ) return (s16)m1[n] - (s16)m2[n];
}
return(0);
}
Save yourself lots of work,
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1);
return memcmp(mac1->data,mac2->data,6);
}
Function memcmp will eventually do the loop itself. So by using it, you would basically just make things less efficient (due to the additional function-call).
Here is an optional solution:
typedef struct
{
int x;
short y;
}
MacAddr;
int isEqual(MAC* addr1, MAC* addr2)
{
return *(MacAddr*)addr1 == *(MacAddr*)addr2;
}
The compiler will most likely convert this code into two comparisons, since the MacAddr structure contains two fields.
Cavity: unless your CPU supports unaligned load/store operations, addr1 and addr2 must be aligned to 4 bytes (i.e., they must be located in addresses that are divisible by 4). Otherwise, a memory access violation will most likely occur when the function is executed.
You may divide the structure into 3 fields of 2 bytes each, or 6 fields of 1 byte each (reducing the alignment restriction to 2 or 1 respectively). But bare in mind that a single comparison in your source code is not necessarily a single comparison in the executable image (i.e., during runtime).
BTW, unaligned load/store operations by themselves may add runtime latency, if they require more "nops" in the CPU pipeline. This is really a matter of CPU architecture, which I doubt they meant to "dig into" that far in your job interview. However, in order to assert that the compiled code does not contain such operations (if indeed they are "expensive"), you could ensure that the variables are always aligned to 8 bytes AND add a #pragma (compiler directive) telling the compiler "not to worry about this".
May be he had in mind a definition of MAC that used unsigned char and was thinking to:
int isEqual(MAC* addr1, MAC* addr2) { return strncmp((*addr1).data,(*addr2).data,6)==0; }
which implies a cast from (unsigned char *) to (char *).
Anyway bad question.
By the way, for those truly looking for a performant answer, the following is branchless, and while it does more fetches (one per char), they should all be from the same cache line, so not very expensive.
int isEqual(MAC* addr1, MAC* addr2)
{
return
(addr1->data[0] == addr2->data[0])
& (addr1->data[1] == addr2->data[1])
& (addr1->data[2] == addr2->data[2])
& (addr1->data[3] == addr2->data[3])
& (addr1->data[4] == addr2->data[4])
& (addr1->data[5] == addr2->data[5])
;
}
See it live (and branchless) here
I know that memcmp() cannot be used to compare structs that have not been memset() to 0 because of uninitialized padding. However, in my program I have a struct with a few different types at the start, then several dozen of the same type until the end of the struct. My thought was to manually compare the first few types, then use a memcmp() on the remaining contiguous memory block of same typed members.
My question is, what does the C standard guarantee about structure padding? Can I reliably achieve this on any or all compilers? Does the C standard allow struct padding to be inserted between same type members?
I have implemented my proposed solution, and it seems to work exactly as intended with gcc:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
struct foo
{
char a;
void *b;
int c;
int d;
int e;
int f;
};
static void create_struct(struct foo *p)
{
p->a = 'a';
p->b = NULL;
p->c = 1;
p->d = 2;
p->e = 3;
p->f = 4;
}
static int compare(struct foo *p1, struct foo *p2)
{
if (p1->a != p2->a)
return 1;
if (p1->b != p2->b)
return 1;
return
/* Note the typecasts to char * so we don't get a size in ints. */
memcmp(
/* A pointer to the start of the same type members. */
&(p1->c),
&(p2->c),
/* A pointer to the start of the last element to be compared. */
(char *)&(p2->f)
/* Plus its size to compare until the end of the last element. */
+sizeof(p2->f)
/* Minus the first element, so only c..f are compared. */
-(char *)&(p2->c)
) != 0;
}
int main(int argc, char **argv)
{
struct foo *p1, *p2;
int ret;
/* The loop is to ensure there isn't a fluke with uninitialized padding
* being the same.
*/
do
{
p1 = malloc(sizeof(struct foo));
p2 = malloc(sizeof(struct foo));
create_struct(p1);
create_struct(p2);
ret = compare(p1, p2);
free(p1);
free(p2);
if (ret)
puts("no match");
else
puts("match");
}
while (!ret);
return 0;
}
There is no guarantee of this in the C standard. From a practical standpoint it's true as part of the ABI for every current C implementation, and there seems to be no purpose in adding padding (e.g. it could not be used for checking against buffer overflows, since a conforming program is permitted to write to the padding). But strictly speaking it's not "portable".
Sadly, there is no C standard (that I have ever heard of) that allows you to control structure padding. There is the fact that automatic allocation that is initialized like this
struct something val = { 0 };
will cause all the members in val to be initialized to 0. But the padding in between is left to the implementation.
There are compiler extensions you can use like GCC's __attribute__((packed)) to eliminate most if not all structure padding, but aside from that you may be at a loss.
I also know that without major optimizations in place, most compilers won't bother to add structure padding in most cases, which would explain why this works under GCC.
That said, if your structure members cause odd alignment issues like this
struct something { char onebyte; int fourbyte; };
they will cause the compiler to add padding after the onebyte member to satisfy the alignment requirements of the fourbyte member.