Is this approach to hashing any generic object correct?

Using the OpenJDK's hashCode, I tried to implement a generic hashing routine in C:
U32 hashObject(void *object_generic, U32 object_length) {
if (object_generic == NULL) return 0;
U8 *object = (U8*)object_generic;
U32 hash = 1;
for (U32 i = 0; i < object_length; ++i) {
// hash = 31 * hash + object[i]; // Original prime used in OpenJDK
hash = 92821 * hash + object[i]; // Better constant found here: https://stackoverflow.com/questions/1835976/what-is-a-sensible-prime-for-hashcode-calculation
}
return hash;
}
The idea is that I can pass a pointer to any C object (primitive type, struct, array, etc.) and the object will be uniquely hashed. However, since this is the first time I am doing something like this, I'd like to ask: is this the right approach? Are there any pitfalls that I need to be aware of?

There are decidedly pitfalls. The program below, which uses your function, prints a different value for each of two equivalent objects (and a different value every time it's compiled) under gcc -O0:
#include <stddef.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
struct foo {
char c;
int i;
};
static uint32_t hashObject(void const* object_generic, uint32_t object_length) {
if (object_generic == NULL) return 0;
uint8_t const* object = (uint8_t const*)object_generic;
uint32_t hash = 1;
for (uint32_t i = 0; i < object_length; ++i) {
hash = 92821 * hash + object[i];
}
return hash;
}
int main() {
struct foo a[2];
a[0].c = 'A';
a[0].i = 1;
a[1].c = 'A';
a[1].i = 1;
_Static_assert(
sizeof(struct foo) == offsetof(struct foo, i) + sizeof(int),
"struct has no end padding"
);
printf("%u\n", (unsigned)hashObject(&a[0], sizeof *a));
printf("%u\n", (unsigned)hashObject(&a[1], sizeof *a));
return EXIT_SUCCESS;
}
This happens because padding can contain anything.

In the comments you ask what would happen if you zero out the struct objects before using them.
It would not help. The hashes could still differ, because padding bytes take unspecified values whenever a value is stored into a struct object or into a member of a struct object (see the quotation from the standard below). The unspecified values may change on every store.
There is an additional problem with other types. Any scalar type (pointers, integers, and floating types) may have different representations of the same value. This is similar to the problem struct types have with padding bytes: the bit representation of a scalar object may change even though its value did not, and the resulting hash will then be different.
(Quoted from: ISO/IEC 9899:201x 6.2.6 Representation of types 6.2.6.1 General 6)
When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.
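One way to sidestep the padding problem is to hash the members one at a time instead of the raw bytes of the whole struct. The sketch below assumes the `struct foo` from the earlier example; `hash_bytes` and `hash_foo` are hypothetical names, and feeding each member into a running hash is only one possible design. It still assumes that `char` and `int` have a single representation per value on the target platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
    char c;
    int i;
};

/* Fold `len` bytes into a running hash (same multiplier as the question). */
uint32_t hash_bytes(uint32_t hash, void const *data, size_t len) {
    unsigned char const *bytes = data;
    assert(bytes != NULL || len == 0);
    for (size_t k = 0; k < len; ++k) {
        hash = 92821u * hash + bytes[k];
    }
    return hash;
}

/* Hash only the members of struct foo, so padding bytes are never read. */
uint32_t hash_foo(struct foo const *f) {
    uint32_t hash = 1;
    assert(f != NULL);
    hash = hash_bytes(hash, &f->c, sizeof f->c);
    hash = hash_bytes(hash, &f->i, sizeof f->i);
    return hash;
}
```

With this shape, two `struct foo` objects whose members compare equal produce the same hash regardless of what their padding bytes happen to contain. The cost is that every struct type needs its own hashing function.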

No.
std::vector<int> v1 = {1, 2, 3, 4};
std::vector<int> v2 = {1, 2, 3, 4};
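std::cout << "hash1=" << hashObject(&v1, sizeof(v1))
<< " hash2=" << hashObject(&v2, sizeof(v2)) << std::endl;
would report two different hash values even though the vectors hold the same elements, which is probably not the intended behaviour. The bytes of a std::vector object contain pointers to its heap storage, not the elements themselves.
PS: the question is about C rather than C++, but a similar pointer-holding structure can be written in C.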
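The same failure can be reproduced in plain C with any object that owns memory through a pointer: hashing the raw bytes of the handle hashes the pointer value, not the elements it points to. A minimal sketch, assuming a hypothetical `struct int_vec` handle and a hypothetical `hash_int_vec` helper that walks the pointed-to elements instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical growable-array handle: the elements live elsewhere. */
struct int_vec {
    int *data;
    size_t len;
};

/* Hash the length and the elements, never the pointer value itself. */
uint32_t hash_int_vec(struct int_vec const *v) {
    uint32_t hash = 1;
    assert(v != NULL);
    hash = 92821u * hash + (uint32_t)v->len;
    for (size_t k = 0; k < v->len; ++k) {
        hash = 92821u * hash + (uint32_t)v->data[k];
    }
    return hash;
}
```

Two handles that point at different arrays with equal contents now hash the same, which a byte-wise hash of the handle itself cannot guarantee.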

Related

type-punning a char array struct member

Consider the following code:
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
int f(void) {
blob_t a = {0};
*(byte_t *)a.bytes = (byte_t){10};
return a.bytes[0];
}
Does this give aliasing problems in the return statement? You do have that a.bytes dereferences a type that does not alias the assignment in patch, but on the other hand, the [0] part dereferences a type that does alias.
I can construct a slightly larger example where gcc -O1 -fstrict-aliasing does make the function return 0, and I'd like to know if this is a gcc bug, and if not, what I can do to avoid this problem (in my real-life example, the assignment happens in a separate function so that both functions look really innocent in isolation).
Here is a longer more complete example for testing:
#include <stdio.h>
typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;
static char *find(char *buf) {
for (int i = 0; i < 1; i++) { if (buf[0] == 0) { return buf; }}
return 0;
}
void patch(char *b) {
*(byte_t *) b = (byte_t) {10};
}
int main(void) {
blob_t a = {0};
char *b = find(a.bytes);
if (b) {
patch(b);
}
printf("%d\n", a.bytes[0]);
}
Building with gcc -O1 -fstrict-aliasing produces 0

The main issue here is that those two structs are not compatible types. And so there can be various problems with alignment and padding.
That issue aside, the standard 6.5/7 only allows for this (the "strict aliasing rule"):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
...
an aggregate or union type that includes one of the aforementioned types among its members
Looking at *(byte_t *)a.bytes, then a.bytes has the effective type char[10]. Each individual member of that array has in turn the effective type char. You de-reference that with byte_t, which is not a compatible struct type nor does it have a char[10] among its members. It does have char though.
The standard is not exactly clear on how to treat an object whose effective type is an array. If you read the above part strictly, then your code does indeed violate strict aliasing, because you access a char[10] through a struct which doesn't have a char[10] member. I'd also be a bit concerned about the compiler padding either struct to meet alignment.
Generally, I'd simply advise against doing fishy things like this. If you need type punning, then use a union. And if you wish to use raw binary data, then use uint8_t instead of the potentially signed & non-portable char.
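Besides a union, copying the bytes with `memcpy` is another commonly used way to move a value into a character buffer without ever dereferencing a cast pointer. The sketch below swaps that technique into the `patch`/`f` pair from the question; it is an illustration under the assumption that `byte_t` fits inside the buffer, not a statement about what gcc will or will not optimize.

```c
#include <assert.h>
#include <string.h>

typedef struct { char byte; } byte_t;
typedef struct { char bytes[10]; } blob_t;

_Static_assert(sizeof(byte_t) <= sizeof(blob_t), "byte_t must fit in the blob");

/* Copy the representation of a byte_t into the buffer; no byte_t lvalue
   is ever used to access the char array. */
void patch(char *b) {
    byte_t v = {10};
    assert(b != NULL);
    memcpy(b, &v, sizeof v);
}

int f(void) {
    blob_t a = {{0}};
    patch(a.bytes);
    return a.bytes[0];
}
```

Because `byte` is the first member of `byte_t`, it sits at offset 0, so `f` returns 10 here.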

The error is in *(byte_t *)a.bytes = (byte_t){10};. The C spec has a special rule about character types (6.5§7), but that rule only applies when using character type to access any other type, not when using any type to access a character.
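For contrast, the direction that the character-type rule does permit is inspecting the bytes of some other object through a character lvalue. A minimal sketch, with `sum_bytes` as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the bytes of any object's representation through unsigned char. */
unsigned sum_bytes(void const *object, size_t size) {
    unsigned char const *bytes = object;
    unsigned total = 0;
    assert(object != NULL);
    for (size_t k = 0; k < size; ++k) {
        total += bytes[k];
    }
    return total;
}
```

Here a `long`, a struct, or any other object may be passed in, because each access is made through an lvalue of type `unsigned char`.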

According to the Standard, the syntax array[index] is shorthand for *((array)+(index)). Thus, p->array[index] is equivalent to *((p->array) + (index)), which uses the address of p to compute the address of p->array, and then without regard for p's type, adds index (scaled by the size of the array-element type), and then dereferences the resulting pointer to yield an lvalue of the array-element type. Nothing in the wording of the Standard would imply that an access via the resulting lvalue is an access to an lvalue of the underlying structure type. Thus, if the struct member is an array of character type, the constraints of N1570 6.5p7 would allow an lvalue of that form to access storage of any type.
The maintainers of some compilers such as gcc, however, appear to view the laxity of the Standard there as a defect. This can be demonstrated via the code:
struct s1 { char x[10]; };
struct s2 { char x[10]; };
union s1s2 { struct s1 v1; struct s2 v2; } u;
int read_s1_x(struct s1 *p, int i)
{
return p->x[i];
}
void set_s2_x(struct s2 *p, int i, int value)
{
p->x[i] = value;
}
__attribute__((noinline))
int test(void *p, int i)
{
if (read_s1_x(p, 0))
set_s2_x(p, i, 2);
return read_s1_x(p, 0);
}
#include <stdio.h>
int main(void)
{
u.v2.x[0] = 1;
int result = test(&u, 0);
printf("Result = %d / %d", result, u.v2.x[0]);
}
The code abides by the constraints in N1570 6.5p7 because all accesses to any portion of u are performed using lvalues of character type. Nonetheless, the code generated by gcc will not allow for the possibility that the storage accessed by ((struct s1*)p)->x[0] might also be accessed by ((struct s2*)p)->x[i], despite the fact that both accesses use lvalues of character type.

Point to a function with already-provided arguments [duplicate]

I would like this to work, but it does not:
#include <stdio.h>
typedef struct closure_s {
void (*incrementer) ();
void (*emitter) ();
} closure;
closure emit(int in) {
void incrementer() {
in++;
}
void emitter() {
printf("%d\n", in);
}
return (closure) {
incrementer,
emitter
};
}
main() {
closure test[] = {
emit(10),
emit(20)
};
test[0] . incrementer();
test[1] . incrementer();
test[0] . emitter();
test[1] . emitter();
}
It actually does compile and does work for 1 instance ... but the second one fails. Any idea how to get closures in C?
It would be truly awesome!

Using FFCALL,
#include <callback.h>
#include <stdio.h>
static void incrementer_(int *in) {
++*in;
}
static void emitter_(int *in) {
printf("%d\n", *in);
}
int main() {
int in1 = 10, in2 = 20;
int (*incrementer1)() = alloc_callback(&incrementer_, &in1);
int (*emitter1)() = alloc_callback(&emitter_, &in1);
int (*incrementer2)() = alloc_callback(&incrementer_, &in2);
int (*emitter2)() = alloc_callback(&emitter_, &in2);
incrementer1();
incrementer2();
emitter1();
emitter2();
free_callback(incrementer1);
free_callback(incrementer2);
free_callback(emitter1);
free_callback(emitter2);
}
But usually in C you end up passing extra arguments around to fake closures.
Apple has a non-standard extension to C called blocks, which do work much like closures.
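The "extra arguments" style mentioned above can be sketched in standard C by pairing a function pointer with a pointer to its state. The names `struct int_closure`, `increment`, `emit_value`, and `invoke` are hypothetical, chosen to mirror the question's incrementer and emitter:

```c
#include <assert.h>
#include <stdio.h>

/* A "closure" here is just a function pointer paired with its context. */
struct int_closure {
    void (*call)(int *state);
    int *state;
};

void increment(int *state) { ++*state; }
void emit_value(int *state) { printf("%d\n", *state); }

/* Invoke a closure by handing its own context back to the function. */
void invoke(struct int_closure c) {
    assert(c.call != NULL && c.state != NULL);
    c.call(c.state);
}
```

Two closures built from `increment` share the code but keep separate state, which is the behaviour the second instance in the question was missing. The caller is responsible for keeping each `state` object alive for as long as the closure is used.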

ANSI C has no support for closures or for nested functions. A workaround is to use a simple struct.
A simple example of a closure that sums two numbers:
#include <stdio.h>
#include <stdlib.h>
// Structure that keeps the captured first operand and a pointer to the function
typedef struct _closure{
int x;
char* (*call)(struct _closure *str, int y);
} closure;
// Function invoked through the closure; returns the result as a
// heap-allocated string that the caller must free
char *
sumY(closure *_closure, int y) {
char *msg = calloc(48, sizeof(char)); // room for three ints plus separators
if (msg == NULL) return NULL;
int sum = _closure->x + y;
snprintf(msg, 48, "%d + %d = %d", _closure->x, y, sum);
return msg;
}
// Function that returns a closure for summing two numbers
closure *
sumX(int x) {
closure *func = (closure*)malloc(sizeof(closure));
if (func == NULL) return NULL;
func->x = x;
func->call = sumY;
return func;
}
Usage:
int main (int argc, char **argv)
{
closure *sumBy10 = sumX(10);
puts(sumBy10->call(sumBy10, 1));
puts(sumBy10->call(sumBy10, 3));
puts(sumBy10->call(sumBy10, 2));
puts(sumBy10->call(sumBy10, 4));
puts(sumBy10->call(sumBy10, 5));
}
Result:
10 + 1 = 11
10 + 3 = 13
10 + 2 = 12
10 + 4 = 14
10 + 5 = 15
In C++11 the same thing can be achieved with a lambda expression.
#include <iostream>
int main (int argc, char **argv)
{
int x = 10;
auto sumBy10 = [x] (int y) {
std::cout << x << " + " << y << " = " << x + y << std::endl;
};
sumBy10(1);
sumBy10(2);
sumBy10(3);
sumBy10(4);
sumBy10(5);
}
The result, after compiling with the flag -std=c++11:
10 + 1 = 11
10 + 2 = 12
10 + 3 = 13
10 + 4 = 14
10 + 5 = 15

A Working Definition of a Closure with a JavaScript Example
A closure is a kind of object that contains a pointer or reference of some kind to a function to be executed, along with an instance of the data needed by the function.
An example in JavaScript from https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures is
function makeAdder(x) {
return function(y) { // create the adder function and return it along with
return x + y; // the captured data needed to generate its return value
};
}
which could then be used like:
var add5 = makeAdder(5); // create an adder function which adds 5 to its argument
console.log(add5(2)); // displays a value of 2 + 5 or 7
Some of the Obstacles to Overcome with C
Unlike JavaScript, C is a statically typed language without garbage collection, and it lacks some other features that make closures easy in JavaScript or other languages with intrinsic support for them.
One large obstacle for closures in Standard C is the lack of language support for the kind of construct in the JavaScript example in which the closure includes not only the function but also a copy of data that is captured when the closure is created, a way of saving state which can then be used when the closure is executed along with any additional arguments provided at the time the closure function is invoked.
However C does have some basic building blocks which can provide the tools for creating a kind of closure. Some of the difficulties are (1) memory management is the duty of the programmer, no garbage collection, (2) functions and data are separated, no classes or class type mechanics, (3) statically typed so no run time discovery of data types or data sizes, and (4) poor language facilities for capturing state data at the time the closure is created.
One thing that makes something of a closure facility possible with C is the void * pointer and using unsigned char as a kind of general purpose memory type which is then transformed into other types through casting.
An update with new approach
My original posted answer seems to have been helpful enough that people have upvoted it however it had a constraint or two that I didn't like.
Getting a notification of a recent upvote, I took a look at some of the other posted answers and realized that I could provide a second approach that would overcome the problem that bothered me.
A new approach that removes a problem of the original approach
The original approach required function arguments to be passed on the stack. This new approach eliminates that requirement. It also seems much cleaner. I'm keeping the original approach below.
The new approach uses a single struct, ClosureStruct, along with two functions to build the closure, makeClosure() and pushClosureArg().
This new approach also uses the variable argument functionality of stdarg.h to process the captured arguments in the closure data.
Using the following in a C source code file requires the following includes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
typedef struct {
void (*p)(); // pointer to the function of this closure
size_t sargs; // size of the memory area allocated for closure data
size_t cargs; // current memory area in use for closure data
unsigned char * args; // pointer to the allocated closure data area
} ClosureStruct;
void * makeClosure(void (*p)(), size_t sargs)
{
// allocate the space for the closure management data and the closure data itself.
// we do this with a single call to calloc() so that we have only one pointer to
// manage.
ClosureStruct* cp = calloc(1, sizeof(ClosureStruct) + sargs);
if (cp) {
cp->p = p; // save a pointer to the function
cp->sargs = sargs; // save the total size of the memory allocated for closure data
cp->cargs = 0; // initialize the amount of memory used
cp->args = (unsigned char *)(cp + 1); // closure data is after closure management block
}
return cp;
}
void * pushClosureArg(void* cp, size_t sarg, void* arg)
{
if (cp) {
ClosureStruct* p = cp;
if (p->cargs + sarg <= p->sargs) {
// there is room in the closure area for this argument so make a copy
// of the argument and remember our new end of memory.
memcpy(p->args + p->cargs, arg, sarg);
p->cargs += sarg;
}
}
return cp;
}
This code is then used similar to the following:
// example functions that we will use with closures
// funcadd() is a function that accepts a closure with two int arguments
// along with three additional int arguments.
// it is similar to the following function declaration:
// void funcadd(int x1, int x2, int a, int b, int c);
//
void funcadd(ClosureStruct* cp, int a, int b, int c)
{
// using the variable argument functionality we will set our
// variable argument list address to the closure argument memory area
// and then start pulling off the arguments that are provided by the closure.
va_list jj;
va_start(jj, cp->args); // get the address of the first argument
int x1 = va_arg(jj, int); // get the first argument of the closure
int x2 = va_arg(jj, int);
printf("funcadd() = %d\n", a + b + c + x1 + x2);
}
int zFunc(ClosureStruct* cp, int j, int k)
{
va_list jj;
va_start(jj, cp->args); // get the address of the first argument
int i = va_arg(jj, int);
printf("zFunc() i = %d, j = %d, k = %d\n", i, j, k);
return i + j + k;
}
typedef struct { char xx[24]; } thing1;
int z2func( ClosureStruct* cp, int i)
{
va_list jj;
va_start(jj, cp->args); // get the address of the first argument
thing1 a = va_arg(jj, thing1);
printf("z2func() i = %d, %s\n", i, a.xx);
return 0;
}
int mainxx(void)
{
ClosureStruct* p;
int x;
thing1 xpxp = { "1234567890123" };
p = makeClosure(funcadd, 256);
x = 4; pushClosureArg(p, sizeof(int), &x);
x = 10; pushClosureArg(p, sizeof(int), &x);
p->p(p, 1, 2, 3);
free(p);
p = makeClosure(z2func, sizeof(thing1));
pushClosureArg(p, sizeof(thing1), &xpxp);
p->p(p, 45);
free(p);
p = makeClosure(zFunc, sizeof(int));
x = 5; pushClosureArg(p, sizeof(int), &x);
p->p(p, 12, 7);
return 0;
}
The output from the above usage is:
funcadd() = 20
z2func() i = 45, 1234567890123
zFunc() i = 5, j = 12, k = 7
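The C standard only defines `va_start` inside a function declared with a variable argument list (`...`), so the listings above depend on the compiler accepting it elsewhere. A more portable sketch of the same idea reads the captured bytes back with `memcpy`. All names here (`Closure`, `closure_new`, `closure_push`, `closure_read`, `z_func`) are hypothetical, and the closure function's signature is fixed to `int (..., int, int)` to keep the example short:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable variant: captured bytes follow the header. */
typedef struct Closure {
    int (*fn)(struct Closure const *self, int j, int k);
    size_t capacity;   /* bytes reserved for captured data */
    size_t used;       /* bytes currently holding captured data */
} Closure;

/* Allocate header and data area in one block; the caller frees it. */
Closure *closure_new(int (*fn)(Closure const *, int, int), size_t capacity) {
    Closure *c = calloc(1, sizeof *c + capacity);
    if (c) {
        c->fn = fn;
        c->capacity = capacity;
    }
    return c;
}

/* Append a copy of `size` bytes to the captured data; returns 0 if full. */
int closure_push(Closure *c, void const *arg, size_t size) {
    assert(c != NULL && arg != NULL);
    if (size > c->capacity - c->used) return 0;
    memcpy((unsigned char *)(c + 1) + c->used, arg, size);
    c->used += size;
    return 1;
}

/* Copy `size` captured bytes starting at `offset` into `out`. */
void closure_read(Closure const *c, size_t offset, void *out, size_t size) {
    assert(c != NULL && out != NULL);
    assert(offset <= c->used && size <= c->used - offset);
    memcpy(out, (unsigned char const *)(c + 1) + offset, size);
}

/* Counterpart of zFunc() above without the printf: i is the captured value. */
int z_func(Closure const *self, int j, int k) {
    int i;
    closure_read(self, 0, &i, sizeof i);
    return i + j + k;
}
```

Copying through `memcpy` also avoids any alignment assumptions about the data area, at the cost of the function having to know the offset and type of each captured value.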
However there is an issue with the above implementation, you have no way of getting the return value of a function that returns a value. In other words, the function zFunc() used in a closure above returns an int value which is ignored. If you try to capture the return value with something like int k = pint->p(pint, 12, 7); you will get an error message because the function pointer argument of ClosureStruct is void (*p)(); rather than int (*p)();.
To work around this restriction, we will add two C Preprocessor macros to help us create individual versions of the ClosureStruct struct that specify a function return type other than void.
#define NAME_CLOSURE(t) ClosureStruct_ ## t
#define DEF_CLOSURE(t) \
typedef struct { \
t (*p)(); \
size_t sargs; \
size_t cargs; \
unsigned char* args; \
} NAME_CLOSURE(t);
We then redefine the two functions, zFunc() and z2func(), as follows using the macros.
DEF_CLOSURE(int) // define closure struct that returns an int
int zFunc(NAME_CLOSURE(int)* cp, int j, int k)
{
va_list jj;
va_start(jj, cp->args); // get the address of the first argument
int i = va_arg(jj, int);
printf("zFunc() i = %d, j = %d, k = %d\n", i, j, k);
return i + j + k;
}
typedef struct { char xx[24]; } thing1;
int z2func( NAME_CLOSURE(int) * cp, int i)
{
va_list jj;
va_start(jj, cp->args); // get the address of the first argument
thing1 a = va_arg(jj, thing1);
printf("z2func() i = %d, %s\n", i, a.xx);
return 0;
}
And we use this as follows:
int mainxx(void)
{
ClosureStruct* p;
NAME_CLOSURE(int) *pint;
int x;
thing1 xpxp = { "1234567890123" };
p = makeClosure(funcadd, 256);
x = 4; pushClosureArg(p, sizeof(int), &x);
x = 10; pushClosureArg(p, sizeof(int), &x);
p->p(p, 1, 2, 3);
free(p);
pint = makeClosure(z2func, sizeof(thing1));
pushClosureArg(pint, sizeof(thing1), &xpxp);
int k = pint->p(pint, 45);
free(pint);
pint = makeClosure(zFunc, sizeof(int));
x = 5; pushClosureArg(pint, sizeof(int), &x);
k = pint->p(pint, 12, 7);
return 0;
}
First Implementation With Standard C and a Bit of Stretching Here and There
NOTE: The following example depends on a stack based argument passing convention as is used with most x86 32 bit compilers. Most compilers also allow for a calling convention to be specified other than stack based argument passing such as the __fastcall modifier of Visual Studio. The default for x64 and 64 bit Visual Studio is to use the __fastcall convention by default so that function arguments are passed in registers and not on the stack. See Overview of x64 Calling Conventions in the Microsoft MSDN as well as How to set function arguments in assembly during runtime in a 64bit application on Windows? as well as the various answers and comments in How are variable arguments implemented in gcc? .
One thing that we can do to solve this problem of providing some kind of closure facility for C is to simplify the problem. Better to provide an 80% solution that is useful for a majority of applications than no solution at all.
One such simplification is to only support functions that do not return a value, in other words functions declared as void func_name(). We are also going to give up compile time type checking of the function argument list since this approach builds the function argument list at run time. Neither of the things we are giving up is trivial, so the question is whether the value of this approach to closures in C outweighs what we are giving up.
First of all let's define our closure data area. The closure data area represents the memory area we are going to use to contain the information we need for a closure. The minimum amount of data I can think of is a pointer to the function to execute and a copy of the data to be provided to the function as arguments.
In this case we are going to provide any captured state data needed by the function as an argument to the function.
We also want to have some basic safe guards in place so that we will fail reasonably safely. Unfortunately the safety rails are a bit weak with some of the work arounds we are using to implement a form of closures.
The Source Code
The following source code was developed using Visual Studio 2017 Community Edition in a .c C source file.
The data area is a struct that contains some management data, a pointer to the function, and an open ended data area.
typedef struct {
size_t nBytes; // current number of bytes of data
size_t nSize; // maximum size of the data area
void(*pf)(); // pointer to the function to invoke
unsigned char args[1]; // beginning of the data area for function arguments
} ClosureStruct;
Next we create a function that will initialize a closure data area.
ClosureStruct * beginClosure(void(*pf)(), int nSize, void *pArea)
{
ClosureStruct *p = pArea;
if (p) {
p->nBytes = 0; // number of bytes of the data area in use
p->nSize = nSize - sizeof(ClosureStruct); // max size of the data area
p->pf = pf; // pointer to the function to invoke
}
return p;
}
This function is designed to accept a pointer to a data area which gives flexibility as to how the user of the function wants to manage memory. They can either use some memory on the stack or static memory or they can use heap memory via the malloc() function.
unsigned char closure_area[512];
ClosureStruct *p = beginClosure (xFunc, 512, closure_area);
or
ClosureStruct *p = beginClosure (xFunc, 512, malloc(512));
// do things with the closure
free (p); // free the malloced memory.
Next we provide a function that allows us to add data and arguments to our closure. The purpose of this function is to build up the closure data so that when closure function is invoked, the closure function will be provided any data it needs to do its job.
ClosureStruct * pushDataClosure(ClosureStruct *p, size_t size, ...)
{
if (p && p->nBytes + size < p->nSize) {
va_list jj;
va_start(jj, size); // get the address of the first argument
memcpy(p->args + p->nBytes, jj, size); // copy the specified size to the closure memory area.
p->nBytes += size; // keep up with how many total bytes we have copied
va_end(jj);
}
return p;
}
And to make this a bit simpler to use, let's provide a wrapping macro, which is generally handy but does have limitations since it is C Preprocessor text manipulation.
#define PUSHDATA(cs,d) pushDataClosure((cs),sizeof(d),(d))
so we could then use something like the following source code:
unsigned char closurearea[256];
int iValue = 34;
ClosureStruct *dd = PUSHDATA(beginClosure(z2func, 256, closurearea), iValue);
dd = PUSHDATA(dd, 68);
execClosure(dd);
Invoking the Closure: The execClosure() Function
The last piece to this is the execClosure() function to execute the closure function with its data. What we are doing in this function is to copy the argument list supplied in the closure data structure onto the stack as we invoke the function.
What we do is cast the args area of the closure data to a pointer to a struct containing an unsigned char array and then dereference the pointer so that the C compiler will put a copy of the arguments onto the stack before it calls the function in the closure.
To make it easier to create the execClosure() function, we will create a macro that makes it easy to create the various sizes of structs we need.
// helper macro to reduce typing and reduce the chance of typing errors.
#define CLOSEURESIZE(p,n) if ((p)->nBytes < (n)) { \
struct {\
unsigned char x[n];\
} *px = (void *)p->args;\
p->pf(*px);\
}
Then we use this macro to create a series of tests to determine how to call the closure function. The sizes chosen here may need tweaking for particular applications. These sizes are arbitrary and since the closure data will rarely be of the same size, this is not efficiently using stack space. And there is the possibility that there may be more closure data than we have allowed for.
// execute a closure by calling the function through the function pointer
// provided along with the created list of arguments.
ClosureStruct * execClosure(ClosureStruct *p)
{
if (p) {
// the following structs are used to allocate a specified size of
// memory on the stack which is then filled with a copy of the
// function argument list provided in the closure data.
CLOSEURESIZE(p,64)
else CLOSEURESIZE(p, 128)
else CLOSEURESIZE(p, 256)
else CLOSEURESIZE(p, 512)
else CLOSEURESIZE(p, 1024)
else CLOSEURESIZE(p, 1536)
else CLOSEURESIZE(p, 2048)
}
return p;
}
We return the pointer to the closure in order to make it easily available.
An Example Using the Library Developed
We can use the above as follows. First a couple of example functions that don't really do much.
int zFunc(int i, int j, int k)
{
printf("zFunc i = %d, j = %d, k = %d\n", i, j, k);
return i + j + k;
}
typedef struct { char xx[24]; } thing1;
int z2func(thing1 a, int i)
{
printf("i = %d, %s\n", i, a.xx);
return 0;
}
Next we build our closures and execute them.
{
unsigned char closurearea[256];
thing1 xpxp = { "1234567890123" };
thing1 *ypyp = &xpxp;
int iValue = 45;
ClosureStruct *dd = PUSHDATA(beginClosure(z2func, 256, malloc(256)), xpxp);
free(execClosure(PUSHDATA(dd, iValue)));
dd = PUSHDATA(beginClosure(z2func, 256, closurearea), *ypyp);
dd = PUSHDATA(dd, 68);
execClosure(dd);
dd = PUSHDATA(beginClosure(zFunc, 256, closurearea), iValue);
dd = PUSHDATA(dd, 145);
dd = PUSHDATA(dd, 185);
execClosure(dd);
}
Which gives an output of
i = 45, 1234567890123
i = 68, 1234567890123
zFunc i = 45, j = 145, k = 185
Well What About Currying?
Next we could make a modification to our closure struct to allow us to do currying of functions.
typedef struct {
size_t nBytes; // current number of bytes of data
size_t nSize; // maximum size of the data area
size_t nCurry; // last saved nBytes for curry and additional arguments
void(*pf)(); // pointer to the function to invoke
unsigned char args[1]; // beginning of the data area for function arguments
} ClosureStruct;
with the supporting functions for currying and resetting of a curry point being
ClosureStruct *curryClosure(ClosureStruct *p)
{
p->nCurry = p->nBytes;
return p;
}
ClosureStruct *resetCurryClosure(ClosureStruct *p)
{
p->nBytes = p->nCurry;
return p;
}
The source code for testing this could be:
{
unsigned char closurearea[256];
thing1 xpxp = { "1234567890123" };
thing1 *ypyp = &xpxp;
int iValue = 45;
ClosureStruct *dd = PUSHDATA(beginClosure(z2func, 256, malloc(256)), xpxp);
free(execClosure(PUSHDATA(dd, iValue)));
dd = PUSHDATA(beginClosure(z2func, 256, closurearea), *ypyp);
dd = PUSHDATA(dd, 68);
execClosure(dd);
dd = PUSHDATA(beginClosure(zFunc, 256, closurearea), iValue);
dd = PUSHDATA(dd, 145);
dd = curryClosure(dd);
dd = resetCurryClosure(execClosure(PUSHDATA(dd, 185)));
dd = resetCurryClosure(execClosure(PUSHDATA(dd, 295)));
}
with the output of
i = 45, 1234567890123
i = 68, 1234567890123
zFunc i = 45, j = 145, k = 185
zFunc i = 45, j = 145, k = 295

Clang has the blocks extension (enabled with -fblocks), as did Apple's GCC; it is essentially closures in C.

GCC supports inner functions, but not closures. C++11 (known as C++0x before it was finalized) has closures in the form of lambda expressions. No version of C that I'm aware of, and certainly no standard version, provides that level of awesome.
Phoenix, which is part of Boost, provides closures in C++.

On this page you can find a description on how to do closures in C:
http://brodowsky.it-sky.net/2014/06/20/closures-in-c-and-scala/
The idea is that a struct is needed, and that struct contains the function pointer but is also passed to the function as its first argument. Apart from the fact that it requires a lot of boilerplate code and that memory management is of course an issue, this works and provides the power and possibilities of other languages' closures.

You can achieve this with the -fblocks flag, but it does not look as nice as in JS or TS:
#include <stdio.h>
#include <stdlib.h>
#include <Block.h>
#define NEW(T) ({ \
T* __ret = (T*)calloc(1, sizeof(T)); \
__ret; \
})
typedef struct data_t {
int value;
} data_t;
typedef struct object_t {
int (^get)(void);
void (^set)(int);
void (^free)(void);
} object_t;
object_t const* object_create(void) {
data_t* priv = NEW(data_t);
object_t* pub = NEW(object_t);
priv->value = 123;
pub->get = Block_copy(^{
return priv->value;
});
pub->set = Block_copy(^(int value){
priv->value = value;
});
pub->free = Block_copy(^{
free(priv);
free(pub);
});
return pub;
}
int main() {
object_t const* obj = object_create();
printf("before: %d\n", obj->get());
obj->set(321);
printf("after: %d\n", obj->get());
obj->free();
return 0;
}
clang main.c -o main.o -fblocks -fsanitize=address; ./main.o
before: 123
after: 321

The idiomatic way of doing it in C is passing a function pointer and a void pointer to the context.
However, some time ago I came up with a different approach. Surprisingly, there is a family of builtin types in C that carries both data and the code itself. Those are pointers to a function pointer.
The trick is to use this single object to obtain the code by dereferencing it to a function pointer, and then to pass the very same double function pointer as the context in the first argument. It looks a bit convoluted, but it actually results in a very flexible and readable mechanism for closures.
See the code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// typedefing function types usually makes code more readable
typedef double double_fun_t(void*, double);
struct exponential {
// closure must be placed as the first member to allow safe casting
// between a pointer to `closure` and `struct exponential`
double_fun_t *closure;
double temperature;
};
double exponential(void *ctx_, double x) {
struct exponential *ctx = ctx_;
return exp(x / ctx->temperature);
}
// the "constructor" of the closure for exponential
double_fun_t **make_exponential(double temperature) {
struct exponential *e = malloc(sizeof *e);
e->closure = exponential;
e->temperature = temperature;
return &e->closure;
}
// now simple closure with no context, a pure x -> x*x mapping
double square(void *_unused, double x){
(void)_unused;
return x*x;
}
// use compound literal to transform a function to a closure
double_fun_t **square_closure = & (double_fun_t*) { square };
// the worker that processes closures, note that `double_fun_t` is not used
// because `double(**)(void*,double)` is a builtin type
double somme(double* liste, int length, double (**fun)(void*,double)){
double poids = 0;
for(int i=0;i<length;++i)
// calling a closure, note that `fun` is used both for obtaining
// the function pointer and for passing the context
poids = poids + (*fun)(fun, liste[i]);
return poids;
}
int main(void) {
double list[3] = { 1, 2, 3 };
printf("%g\n", somme(list, 3, square_closure));
// a dynamic closure
double_fun_t **exponential = make_exponential(42);
printf("%g\n", somme(list, 3, exponential));
free(exponential);
return 0;
}
The advantage of this approach is that the closure exports a pure interface for calling double->double functions. There is no need to introduce any boxing structures used by all clients of the closure. The only requirement is the "calling convention" which is very natural and does not require sharing any code.

Answer
#include <stdio.h>
#include <stdlib.h>
/*
File Conventions
----------------
alignment: similar statements only
int a = 10;
int* omg = {120, 5};
functions: dofunction(a, b, c);
macros: _do_macro(a, b, c);
variables: int dovariable=10;
*/
////Macros
#define _assert(got, expected, teardownmacro) \
do { \
if((got)!=(expected)) { \
fprintf(stderr, "line %i: ", __LINE__); \
fprintf(stderr, "%i != %i\n", (got), (expected)); \
teardownmacro; \
return EXIT_FAILURE; \
} \
} while(0)
////Internal Helpers
static void istarted() {
fprintf(stderr, "Start tests\n");
}
static void iended() {
fprintf(stderr, "End tests\n");
}
////Tests
int main(void)
{
///Environment
int localvar = 0;
int* localptr = NULL;
///Closures
#define _setup_test(mvar, msize) \
do { \
localptr=calloc((msize), sizeof(int)); \
localvar=(mvar); \
} while(0)
#define _teardown_test() \
do { \
free(localptr); \
localptr=NULL; \
} while(0)
///Tests
istarted();
_setup_test(10, 2);
_assert(localvar, 10, _teardown_test());
_teardown_test();
_setup_test(100, 5);
_assert(localvar, 100, _teardown_test());
_teardown_test();
iended();
return EXIT_SUCCESS;
}
Context
I was curious about how others accomplished this in C. I wasn't totally surprised when I didn't see this answer. Warning: This answer is not for beginners.
I live a lot more in the Unix style of thinking: lots of my personal programs and libraries are small and do one thing very well. Macros as "closures" are much safer in this context. I believe organization and explicit conventions for readability are super important, so that the code is readable by us later, and a macro looks like a macro and a function looks like a function. To clarify, I don't mean literally these personal conventions, just having some that are specified and followed to distinguish different language constructs (macros and functions). We all should be doing that anyway.
Don't be afraid of macros. When it makes sense, use them. The advanced part is knowing when. My example is one of those cases. They are ridiculously powerful and not that scary.
Rambling
I sometimes use a proper closure/lambda in other languages to execute a set of expressions over and over within a function. It's a little context-aware private helper function. Regardless of its proper definition, that's something a closure can do. It helps me write less code. Another benefit is that you don't need to reference a struct to know how to use it or understand what it's doing. The other answers do not have this benefit and, if it wasn't obvious, I value readability very highly. I strive for simple, legible solutions. This one time I wrote an iOS app and it was wonderful and as simple as I could get it. Then I wrote the same "app" in bash in like 5 lines of code and cursed.
Also embedded systems.

How to assign a predefined value to a struct member on type definition?

This is a long shot, but maybe there will be some ideas. On a system I am programming, I have defined structures to program processor registers. The registers comprise several fields of a few bits each, with potentially "reserved" bits in between. When writing to a register, the reserved bits must be written as zeros.
For example:
typedef struct {
uint32_t power : 3;
uint32_t reserved : 24;
uint32_t speed : 5;
} ctrl_t;
void set_ctrl(void)
{
ctrl_t r = {
.power = 1,
.speed = 22,
.reserved = 0,
};
uint32_t *addr = (uint32_t *) 0x12345678;
*addr = *((uint32_t *) &r);
return;
}
I want to be able to set the reserved field to a default value (0 in this example), and to spare the need for an explicit assignment (which happens a lot in our system).
Note that if the instantiated object is static, then by default an uninitialized field will be 0. However, in the above example there is no guarantee, and also I need to set any arbitrary value.

Structure type definitions in C cannot express values for structure members. There is no mechanism for it. Structure instance definitions can.
I want to be able to set the reserved field to a default value (0 in
this example), and to spare the need for an explicit assignment (which
happens a lot in our system).
Note that if the instantiated object is static, then by default an
uninitialized field will be 0. However, in the above example there is
no guarantee, and also I need to set any arbitrary value.
That the default value you want is 0 is fortuitous. You seem to have a misunderstanding, though: you cannot partially initialize a C object. If you provide an initializer in your declaration of a structure object, then any members not explicitly initialized get the same value that they would do if the object had static storage duration and no initializer.
Thus, you can do this:
void set_ctrl() {
ctrl_t r = {
.power = 1,
.speed = 22,
// not needed:
// .reserved = 0
};
// ...
If you want an easy way to initialize the whole structure with a set of default values, some non-zero, then you could consider writing a macro for the initializer:
#define CTRL_INITIALIZER { .power = 1, .speed = 22 }
// ...
void set_other_ctrl() {
ctrl_t r = CTRL_INITIALIZER;
// ...
Similarly, you can define a macro for partial content of an initializer:
#define CTRL_DEFAULTS .power = 1 /* no .speed = 22 */
// ...
void set_other_ctrl() {
ctrl_t r = { CTRL_DEFAULTS, .speed = 22 };
// ...
In this case you can even override the defaults:
ctrl_t r = { CTRL_DEFAULTS, .power = 2, .speed = 22 };
... but it is important to remember to use only designated member initializers, as above, not undesignated values.
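The zero-initialization rule this answer relies on can be checked with a small sketch like the one below. The function name `ctrl_with_defaults` is hypothetical; the type and the `CTRL_DEFAULTS` macro are copied from the question and the answer.

```c
#include <stdint.h>

typedef struct {
    uint32_t power    : 3;
    uint32_t reserved : 24;
    uint32_t speed    : 5;
} ctrl_t;

/* Partial initializer content, designated members only. */
#define CTRL_DEFAULTS .power = 1

/* `reserved` is not mentioned in the initializer, so it is initialized
   as if the object had static storage duration, that is, to zero. */
ctrl_t ctrl_with_defaults(void) {
    ctrl_t r = { CTRL_DEFAULTS, .speed = 22 };
    return r;
}
```

Because the object has an initializer, the omitted member does not keep whatever happened to be on the stack, even though `r` is an automatic variable.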

It can't be done.
Values don't have "constructors" in the C++ sense in C. There's no way to guarantee that arbitrary code is run whenever a value of a certain type is created, so this can't be done. In fact, "creation" of a value is quite a loose concept in C.
Consider this:
char buf[sizeof (ctrl_t)];
ctrl_t * const my_ctrl = (ctrl_t *) buf;
In this code, the pointer assignment would have to also include code to set bits of buf to various defaults, in order for it to work like you want.
In C, "what you see is what you get" often holds and the generated code is typically quite predictable, or better due to optimizations. But that kind of "magic" side-effect is really not how C tends to work.
It is probably better not to expose the "raw" register, but instead to abstract away the existence of the reserved bits:
void set_ctrl(uint8_t power, uint8_t speed)
{
const uint32_t reg = ((uint32_t) power << 29) | speed;
*(uint32_t *) 0x12345678 = reg;
}
This explicitly computes reg in a way that sets the unused bits to 0. You might of course add asserts to make sure the 3- and 5-bit range limits are not exceeded.
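The range checks mentioned above might look like the following sketch. The helper name `make_ctrl_value` is hypothetical; it only computes the register value (using the same bit positions as the function above) so that it can be exercised without writing to the hardware address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds the register value, leaving the
   reserved bits between the two fields at zero. */
uint32_t make_ctrl_value(uint8_t power, uint8_t speed) {
    assert(power < (1u << 3));  /* power must fit in 3 bits */
    assert(speed < (1u << 5));  /* speed must fit in 5 bits */
    return ((uint32_t) power << 29) | speed;
}
```

`set_ctrl` could then store `make_ctrl_value(power, speed)` to the register. Note that `assert` is compiled out when `NDEBUG` is defined, so release builds may want an explicit check instead.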

How do you cast from a bit-field to a pointer?

I've written the following bit of code that is producing a
warning: initialization makes pointer from integer without a cast
OR A
warning: cast to pointer from integer of different size
from gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)
struct my_t {
unsigned int a : 1;
unsigned int b : 1;
};
struct my_t mine = {
.a = 1,
.b = 0
};
const void * bools[] = { "ItemA", mine->a, "ItemB", mine->b, 0, 0 };
int i;
for (i = 0; bools[i] != NULL; i += 2)
fprintf(stderr, "%s = %d\n", bools[i], (unsigned int) bools[i + 1] ? "true" : "false");
How do I get the warning to go away? No matter what I've tried casting, a warning always seems to appear.
Thanks,
Chenz

Hmm, why do you insist on using pointers as booleans? How about this alternative?
struct named_bool {
const char* name;
int val;
};
const struct named_bool bools[] = {{ "ItemA", 1 }, { "ItemB", 1 }, { 0, 0 }};

const void * bools[] = { "ItemA", mine->a, "ItemB", mine->b, 0, 0 };
There are several problems with this snippet:
mine isn't declared as a pointer type (at least not in the code you posted), so you shouldn't be using the -> component selection operator;
If you change that to use the . selection operator, you'd be attempting to store the boolean value in a or b as a pointer, which isn't what you want;
But that doesn't matter, since you cannot take the address of a bit-field (§ 6.5.3.2, paragraph 1).
If you're trying to associate a boolean value with another object, you'd be better off declaring a type like
struct checkedObject {void *objPtr; int check;};
and initialize an array as
struct checkedObject objects[] = {{"ItemA", 1}, {"ItemB", 0}, {NULL, 0}};
Bit-fields have their uses, but this isn't one of them. You're really not saving any space in this case, since at least one complete addressable unit of storage (byte, word, whatever) needs to be allocated to hold the two bitfields.
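A minimal sketch of that approach, under the assumption that the goal is just to walk a name/value table: each bit-field is copied by value into an ordinary `int`, so no address of a bit-field is ever needed. The names `named_flag` and `count_set_flags` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct my_t {
    unsigned int a : 1;
    unsigned int b : 1;
};

struct named_flag {
    const char *name;  /* NULL marks the end of the table */
    int value;         /* copy of the bit-field's value */
};

/* Walks the table up to the sentinel and counts entries that are set. */
size_t count_set_flags(const struct named_flag *flags) {
    assert(flags != NULL);
    size_t set = 0;
    for (size_t i = 0; flags[i].name != NULL; ++i) {
        if (flags[i].value) ++set;
    }
    return set;
}
```

The same loop shape can print each entry with `"%s = %s\n"` and `flags[i].value ? "true" : "false"`.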

Two problems:
Not sure why you are trying to convert unsigned int a:1 to a void*. If you are trying to reference it, the syntax would be &mine->a rather than mine->a, but...
You can't create a pointer to a bit in C (at least as far as I know). If you're trying to create a pointer to a bit, you may want to consider one of the following options:
Create a pointer to the bitfield structure (i.e. struct my_t *), and (if necessary) use a separate number to indicate which bit to use. Example:
struct bit_ref {
struct my_t *bits;
unsigned int a_or_b; // 0 for bits->a, 1 for bits->b
};
Don't use a bit field. Use char for each flag, as it is the smallest data type that you can create a pointer to.
Do use a bit field, but implement it manually with boolean operations. Example:
typedef unsigned int my_t;
#define MY_T_A (1u << 0)
#define MY_T_B (1u << 1)
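struct bit_ref {
my_t *bits; // my_t is a plain typedef here, not a struct tag
unsigned int shift;
};
int deref(const struct bit_ref bit_ref)
{
// dereference the pointer before masking out the selected bit
return !!(*bit_ref.bits & (1u << bit_ref.shift));
}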

There's a few ways you could get rid of the warning, but still use a pointer value as a boolean. For example, you could do this:
const void * bools[] = { "ItemA", mine.a ? &bools : 0, "ItemB", mine.b ? &bools : 0, 0, 0 };
This uses a NULL pointer for false, and a non-null pointer (in this case, &bools, but a pointer to any object of the right storage duration would be fine) for true. You would also then remove the cast to unsigned int in the test, so that it is just:
fprintf(stderr, "%s = %s\n", (const char *) bools[i], bools[i + 1] ? "true" : "false");
(A null pointer always evaluates as false, and a non-null pointer as true).
However, I do agree that you are better off creating an array of structs instead.

size of a datatype without using sizeof

I have a data type, say X, and I want to know its size without declaring a variable or pointer of that type and of course without using sizeof operator.
Is this possible? I thought of using standard header files which contain size and range of data types but that doesn't work with user defined data type.

To my mind, this fits into the category of "how do I add two ints without using ++, += or + ?". It's a waste of time. You can try and avoid the monsters of undefined behaviour by doing something like this.
size_t size = (size_t)(1 + ((X*)0));
Note that I don't declare a variable of type or pointer to X.

Look, sizeof is the language facility for this. The only one, so it is the only portable way to achieve this.
For some special cases you could generate un-portable code that used some other heuristic to understand the size of particular objects[*] (probably by making them keep track of their own size), but you'd have to do all the bookkeeping yourself.
[*] Objects in a very general sense rather than the OOP sense.

Well, I am an amateur..but I tried out this problem and I got the right answer without using sizeof. Hope this helps..
I am trying to find the size of an integer.
int *a,*s, v=10;
a=&v;
s=a;
a++;
int intsize=(int)a-(int)s;
printf("%d",intsize);

The correct answer to this interview question is "Why would I want to do that, when sizeof() does that for me, and is the only portable method of doing so?"

The possibility of padding rules out any hope of doing this without knowing the rules used to introduce it, and those rules are implementation-dependent.

You could puzzle it out by reading the ABI for your particular processor, which explains how structures are laid out in memory. It's potentially different for each processor. But unless you're writing a compiler it's surprising you don't want to just use sizeof, which is the One Right Way to solve this problem.

if X is datatype:
#define SIZEOF(X) (unsigned int)( (X *)0+1 )
if X is a variable:
#define SIZEOF(X) (unsigned int)( (char *)(&X+1)-(char *)(&X) )

Try this:
int a;
printf("%u\n", (int)(&a+1)-(int)(&a));

Look into the compiler sources. You will get :
the size of standard data types.
the rules for padding of structs
and from this, the expected size of anything.
If you could at least allocate space for the variable, and fill some sentinel value into it, you could change it bit by bit, and see if the value changes, but this still would not tell you any information about padding.

Try This:
#include<stdio.h>
int main(){
int *ptr = 0;
ptr++;
printf("Size of int: %zu\n", (size_t)ptr);
return 0;
}

Here is a solution, available since C89, that in user code:
Does not declare a variable of type X.
Does not declare a pointer to type X.
Does not use the sizeof operator.
This is easy enough to do using standard code, as hinted by @steve jessop:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member ..., from the beginning of its structure ... (C11 §7.19 ¶3)
#include <stddef.h>
#include <stdio.h>
typedef struct {
X member;
unsigned char uc;
} sud03r_type;
int main() {
printf("Size X: %zu\n", offsetof(sud03r_type, uc));
return 0;
}
Note: This code uses "%zu" which requires C99 onward.
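To try the idea with concrete types, a sketch along these lines may help. The macro `DECLARE_SIZE_PROBE`, the example type `struct pair`, and the `size_of_*` functions are hypothetical names. The sketch assumes, as the answer does, that the compiler inserts no padding before the `unsigned char` member, which has an alignment requirement of 1.

```c
#include <stddef.h>

/* Hypothetical helper: declares a wrapper type `name` whose second
   member is expected to start right after a member of type T. */
#define DECLARE_SIZE_PROBE(T, name) \
    typedef struct { T member; unsigned char uc; } name

struct pair { char c; int i; };  /* example user-defined type */

DECLARE_SIZE_PROBE(int, probe_int);
DECLARE_SIZE_PROBE(double, probe_double);
DECLARE_SIZE_PROBE(struct pair, probe_pair);

/* Each offset is the number of bytes occupied by the first member,
   including any internal padding of that member's own type. */
size_t size_of_int(void)    { return offsetof(probe_int, uc); }
size_t size_of_double(void) { return offsetof(probe_double, uc); }
size_t size_of_pair(void)   { return offsetof(probe_pair, uc); }
```

On common implementations these values agree with `sizeof`, including the padding inside `struct pair`.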

This is the code:
The trick is to make a pointer object, save its address, increment the pointer and then subtract the new address from the previous one.
Key point is when a pointer is incremented, it actually moves by the size equal to the object it is pointing, so here the size of the class (of which the object it is pointing to).
#include<iostream>
using namespace std;
class abc
{
int a[5];
float c;
};
main()
{
abc* obj1;
long int s1;
s1=(int)obj1;
obj1++;
long int s2=(int)obj1;
printf("%d",s2-s1);
}
Regards

A lot of these answers are assuming you know what your structure will look like. I believe this interview question is intended to ask you to think outside the box. I was looking for the answer but didn't find any solutions I liked here. I will make a better assumption saying
struct foo {
int a;
banana b;
char c;
...
};
By creating foo[2], I will now have 2 consecutive foo objects in memory. So...
foo buffer[2];
foo *a = &buffer[0];
foo *b = &buffer[1];
return (char *)b - (char *)a;
Assuming I did my pointer arithmetic correctly, this should be the ticket, and it's portable! Unfortunately, things like padding, compiler settings, etc. would all play a part too.
Thoughts?

put this to your code
then check the linker output ( map file)
unsigned int uint_nabil;
unsigned long ulong_nabil;
you will get something like this ;
uint_nabil 700089a8 00000004
ulong_nabil 700089ac 00000004
4 is the size !!

One simple way of doing this would be using arrays.
Now, we know for a fact that array elements of the same data type are stored in a contiguous block of memory. So, by exploiting this fact, I came up with the following:
#include <iostream>
using namespace std;
int main()
{
int arr[2];
int* ptr = &arr[0];
int* ptr1 = &arr[1];
cout <<(size_t)ptr1-(size_t)ptr;
}
Hope this helps.

Try this,
#define sizeof_type( type ) ((size_t)((type*)1000 + 1 )-(size_t)((type*)1000))
For the following user-defined datatype,
struct x
{
char c;
int i;
};
sizeof_type(x) = 8
(size_t)((x*)1000 + 1 ) = 1008
(size_t)((x*)1000) = 1000

This takes into account that a C++ byte is not always 8 binary bits, and that only unsigned types have well defined overflow behaviour.
#include <iostream>
int main () {
unsigned int i = 1;
unsigned int int_bits = 0;
while (i!=0) {
i <<= 1;
++int_bits;
}
unsigned char uc = 1;
unsigned int char_bits = 0;
while (uc!=0) {
uc <<= 1;
++char_bits;
}
std::cout << "Type int has " << int_bits << " bits.\n";
std::cout << "This would be " << int_bits/8 << " IT bytes and "
<< int_bits/char_bits << " C++ bytes on your platform.\n";
std::cout << "Anyways, not all bits might be usable by you. Hah.\n";
}
Surely, you could also just #include <limits> or <climits>.
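In C, the counterpart of `<climits>` is `<limits.h>`, which provides `CHAR_BIT`. The sketch below redoes the bit-counting loop in C and divides by `CHAR_BIT` instead of counting the bits of `unsigned char` by hand. The function names are hypothetical, and the byte count is only meaningful under the assumption that `unsigned int` has no padding bits.

```c
#include <limits.h>

/* Counts the value bits of unsigned int by shifting a single set bit
   left until it falls off the top (well defined for unsigned types). */
unsigned int count_unsigned_int_bits(void) {
    unsigned int i = 1;
    unsigned int bits = 0;
    while (i != 0) {
        i <<= 1;
        ++bits;
    }
    return bits;
}

/* Size in C bytes, assuming unsigned int has no padding bits. */
unsigned int unsigned_int_size_in_bytes(void) {
    return count_unsigned_int_bits() / CHAR_BIT;
}
```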

main()
{
clrscr();
int n;
float x,*a,*b;//line 1
a=&x;
b=(a+1);
printf("size of x is %d",
n=(char*)(b)-(char*)a);
}
With this code, the size of any data type can be calculated without the sizeof operator. Just replace the float in line 1 with the type whose size you want to calculate.

#include <stdio.h>
struct node {
int a;
char c;
};
void main() {
struct node*temp;
printf("%d",(char*)(temp+1)-(char*)temp);
}

# include<stdio.h>
struct node
{
int a;
char c;
};
void main()
{
struct node*ptr;
ptr=(struct node*)0;
printf("%d",++ptr);
}

#include <bits/stdc++.h>
using namespace std;
int main()
{
// take any datatype here
char *a = 0; // output: 1
int *b = 0; // output: 4
long *c = 0; // output: 8
a++;
b++;
c++;
printf("%d",a);
printf("%d",b);
printf("%d",c);
return 0;
}
