What are the considerations when returning by value vs by reference - Set ADT in C

When looking at the header file of a Set ADT in C, I'm trying to understand why the functions setUnion and setIntersection were declared this way:
Set setUnion(Set set1, Set set2);
Set setIntersection(Set set1, Set set2);
I couldn't find the implementation, but I assume that inside those functions we allocate more space, create a new set, and then add all the necessary elements.
I thought that set1 and set2 are passed by reference, so why not update one of them (the left parameter, for example), save the memory allocations, and just return an enum that reports whether the update succeeded?
If they're not passed by reference, how can I change the signature so that they are?
Thank you!

Set is almost certainly a pointer hidden behind a typedef, so the internal struct is effectively passed by reference, which is all that counts.
More often than not one needs to calculate a union or an intersection of two sets without mutating either of them. In fact it is quite probable that
Set result = setIntersection(set1, set2);
freeSet(set1);
set1 = result;
would not be any less performant than your proposed alternative
setIntersectionInPlace(set1, set2);
whereas the more common case, intersecting two sets that must both stay unmodified, would with setIntersectionInPlace need to be written
Set result = setCopy(set1);
setIntersectionInPlace(result, set2);
which makes a needless copy of set1, a set that is at least as large as result.
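To make the trade-off concrete, here is a minimal sketch of both variants. The struct layout, setCreate, setAdd and setContains are assumptions made for illustration (the real header hides its implementation), and the set holds plain ints in an unsorted array for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical layout: a real Set ADT would keep this struct in its .c file. */
typedef struct Set_t {
    int *items;
    size_t size;
    size_t capacity;
} *Set;   /* Set is a pointer, so passing a Set never copies the elements */

Set setCreate(void) {
    Set set = malloc(sizeof *set);
    if (set == NULL) return NULL;
    set->items = NULL;
    set->size = 0;
    set->capacity = 0;
    return set;
}

void freeSet(Set set) {
    if (set != NULL) {
        free(set->items);
        free(set);
    }
}

bool setContains(Set set, int value) {
    for (size_t i = 0; i < set->size; i++) {
        if (set->items[i] == value) return true;
    }
    return false;
}

/* Returns false only on allocation failure; adding a duplicate is a no-op. */
bool setAdd(Set set, int value) {
    if (setContains(set, value)) return true;
    if (set->size == set->capacity) {
        size_t newCapacity = set->capacity ? 2 * set->capacity : 4;
        int *grown = realloc(set->items, newCapacity * sizeof *grown);
        if (grown == NULL) return false;
        set->items = grown;
        set->capacity = newCapacity;
    }
    set->items[set->size++] = value;
    return true;
}

/* Non-mutating: builds a new set and leaves both arguments untouched.
   Returns NULL on allocation failure. */
Set setIntersection(Set set1, Set set2) {
    assert(set1 != NULL && set2 != NULL);
    Set result = setCreate();
    if (result == NULL) return NULL;
    for (size_t i = 0; i < set1->size; i++) {
        int value = set1->items[i];
        if (setContains(set2, value) && !setAdd(result, value)) {
            freeSet(result);
            return NULL;
        }
    }
    return result;
}

/* Mutating variant: keeps in set1 only the elements also present in set2. */
void setIntersectionInPlace(Set set1, Set set2) {
    assert(set1 != NULL && set2 != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < set1->size; i++) {
        if (setContains(set2, set1->items[i])) {
            set1->items[kept++] = set1->items[i];
        }
    }
    set1->size = kept;
}
```

The non-mutating version allocates only as much as the result needs, while emulating it with the in-place version requires copying all of set1 first.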

Related

Why does the type signature of linear array change compared to normal array?

I'm going through an example in A Taste of Linear Logic.
It first introduces the standard array with the usual operations defined (page 24):
Then suggests that a linear equivalent (using a linear logic for type signatures to restrict array copying) would have a slightly different type signature:
This is designed with the idea that array contains values that are cheap to copy but that the array itself is expensive to copy and thus should be passed along from use to use as a handle.
Question: The signatures for lookup and update correspond well to the standard signatures, but how do I interpret the signature for new?
In particular:
The function new does not seem to return an array. How can I get an array to use if one is not provided?
I think I do understand that Arr ⊸ Arr ⊗ X is not derivable using linear logic and therefore a function to extract individual values without consuming the array is needed, but I don't understand why new doesn't provide that function directly.
In practical terms, this is about garbage collection.
Linear logic avoids making copies as well as leaving unused values lying around. So when you create an array with new, you also need to make sure it's eventually cleaned up again.
How can you make sure it is cleaned up? Well, in this example they do it by not giving back the array as the result, but instead “lending” it to the caller. The function Arr ⊸ Arr ⊗ X must give an array back in the end, in addition to the result you're actually interested in. It's assumed that this will be a modified form of the array you started out with. Only the X is passed back to the caller, the Arr is deallocated.
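The "lending" idea can be loosely imitated in C with a callback. This is only an analogy: C cannot enforce linearity, so nothing stops the borrower from keeping the pointer, and the names withNewArray, ArrayUser and sumOfSquares are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* The borrower receives the array and returns the value it computed (the X).
   By convention it must not keep the pointer after it returns. */
typedef int (*ArrayUser)(int *array, size_t length);

/* Rough analogue of `new`: allocate, lend to the caller's function,
   deallocate, and pass back only the X.
   Returns 0 on success, -1 if allocation fails. */
int withNewArray(size_t length, ArrayUser use, int *result) {
    assert(length > 0 && use != NULL && result != NULL);
    int *array = calloc(length, sizeof *array);
    if (array == NULL) return -1;
    *result = use(array, length);   /* the array is "lent" for this call only */
    free(array);                    /* the caller never owns the array */
    return 0;
}

/* Example borrower: fill the array with squares and return their sum. */
int sumOfSquares(int *array, size_t length) {
    int sum = 0;
    for (size_t i = 0; i < length; i++) {
        array[i] = (int)(i * i);
        sum += array[i];
    }
    return sum;
}
```

Calling withNewArray(4, sumOfSquares, &x) leaves the sum in x, and the array itself never escapes, which mirrors how only the X reaches the caller of new.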

Why is LINKED_SET unable to compare objects?

I'd like to check whether some object is in a LINKED_SET so that I can prune it, but I'm unable to make the set compare by object value instead of by reference.
changeable_comparison_criterion: BOOLEAN
-- May `object_comparison' be changed?
-- (Answer: only if set empty; otherwise insertions might
-- introduce duplicates, destroying the set property.)
do
Result := is_empty
ensure then
only_on_empty: Result = is_empty
end
In the SET class (as above) it seems that it's not possible to switch a set to compare_objects. So my questions are:
What is the rationale for not being able to compare objects in a SET?
If my choice of LINKED_SET is wrong because I misunderstood its semantics, what should I use to get a collection of unique items based on object comparison, and then prune an item, again based on object comparison?
The comparison criterion should be set right after container creation, then it works without a problem. If there are some objects in the set already, it becomes unclear what should be done to them if the comparison criterion changes.
For example, if there is a set {A, B} of two distinct objects A and B that have the same value, i.e. are equal, what should be done if the comparison criterion changes from compare_references to compare_objects? Clearly, the set now should have only one object, because according to the new setting, it cannot hold two or more equal objects. Does it mean, object A should be removed and B should be kept? Or should it be done in reverse order? The precondition you are referring to removes this ambiguity.
The solution is to modify the setting before there are any objects in the container:
create my_set.make
my_set.compare_objects

How do I create array with dynamic length rather than slice in golang?

For example: I want to use reflect to get a slice's data as an array to manipulate it.
func inject(data []int) {
sh := (*reflect.SliceHeader)(unsafe.Pointer(&data))
dh := (*[len(data)]int)(unsafe.Pointer(sh.Data))
printf("%v\n", dh)
}
This function fails to compile because len(data) is not a constant. How should I fix it?
To add to @icza's comment, you can easily extract the underlying array by using &data[0], assuming data is an initialized, non-empty slice. In other words, there's no need to jump through hoops here: the address of the slice's first element is actually the address of the first slot in the slice's underlying array. No magic here.
Since taking an address of an element of an array is creating
a reference to that memory—as long as the garbage collector is
concerned—you can safely let the slice itself go out of scope
without the fear of that array's memory becoming inaccessible.
The only thing which you can't really do with the resulting
pointer is passing around the result of dereferencing it.
That's simply because arrays in Go have their length encoded in
their type, so you'll be unable to create a function to accept
such array—because you do not know the array's length in advance.
Now please stop and think.
After extracting the backing array from a slice, you have
a pointer to the array's memory.
To sensibly carry it around, you'll also need to carry around
the array's length… but this is precisely what slices do:
they pack the address of a backing array with the length of the
data in it (and also the capacity).
Hence really I think you should reconsider your problem
as from where I stand I'm inclined to think it's a non-problem
to begin with.
There are cases where wielding pointers to the backing arrays
extracted from slices may help: for instance, when "pooling"
such arrays (say, via sync.Pool) to reduce memory churn
in certain situations, but these are concrete problems.
If you have a concrete problem, please explain it,
not your attempted solution to it, as @Flimzy said.
Update: I think I should better explain the
"the only thing which you can't really do with the resulting
pointer is passing around the result of dereferencing it"
bit.
A crucial point about arrays in Go (as opposed to slices)
is that arrays—as everything in Go—are passed around
by value, and for arrays that means their data is copied.
That is, if you have
var a, b [8 * 1024 * 1024]byte
...
b = a
the statement b = a would really copy 8 MiB of data.
The same obviously applies to arguments of functions.
Slices sidestep this problem by holding a pointer
to the underlying (backing) array. So a slice value
is a little struct type containing
a pointer and two integers.
Hence copying it is really cheap but "in exchange" it
has reference semantics: both the original value and
its copy point to the same backing array—that is,
reference the same data.
I really advise you to read these two pieces,
in the indicated order:
https://blog.golang.org/go-slices-usage-and-internals
https://blog.golang.org/slices

In Matlab, can I access an element of an array, which is in turn a value of a containers.Map?

Here is a code snippet, that shows what I want and the error, that follows:
a = [1, 2];
m = containers.Map('KeyType','char', 'ValueType','any');
m('stackoverflow.com') = a;
pull_the_first_element_of_the_stored_array = m('stackoverflow.com')(1);
??? Error: ()-indexing must appear last in an index expression.
How do I access an element of the array, which is in turn a value of a map object?
I could have done this:
temp = m('stackoverflow.com');
pull_the_first_element_of_the_stored_array = temp(1);
But I do not want to create an intermediate array only to pull a single value out of it.
EDIT: This is a duplicate of "How can I index a MATLAB array returned by a function without first assigning it to a local variable?" The answer is there.
This is another case where you can get around syntax limitations with small helper functions. EG:
getFirst = @(x) x(1);
pull_the_first_element_of_the_stored_array = getFirst(m('stackoverflow.com'));
This still needs two lines, but you can often reuse the function definition. More generally, you could write:
getNth = @(x, n) x(n);
And then use:
getNth (m('stackoverflow.com'),1);
Although this question is a duplicate of this previous question, I feel compelled to point out one small difference between the problems they are addressing, and how my previous answer could be adapted slightly...
The previous question dealt with how to get around the syntax issue involved in having a function call immediately followed by an indexing operation on the same line. This question instead deals with two indexing operations immediately following one another on the same line. The two solutions from my other answer (using SUBSREF or a helper function) also apply, but there is actually an alternative way to use SUBSREF that combines the two indexing operations, like so:
value = subsref(m,struct('type','()','subs',{'stackoverflow.com',{1}}));
Note how the sequential index subscripts 'stackoverflow.com' and 1 are combined into a cell array to create a 1-by-2 structure array to pass to SUBSREF. It's still an ugly one-liner, and I would still advocate using the temporary variable solution for the sake of readability.

C - How to implement Set data structure?

Is there any clever way to implement a set data structure (a collection of unique values) in C? All elements in a set will be of the same type, and plenty of RAM is available.
As far as I know, for integers it can be done quickly and easily using value-indexed arrays. But I'd like to have a very general Set data type. And it would be nice if a set could include itself.
There are multiple ways of implementing set (and map) functionality, for example:
tree-based approach (ordered traversal)
hash-based approach (unordered traversal)
Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.
Beware of the advantages and disadvantages of hash-based vs. tree-based approaches.
You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:
all hashables in a bucket have the same hash value
a bucket can be implemented as a dynamic array or linked list of hashables
a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)
With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash method, drastically reduce the probability of collision, achieving virtually constant-time performance.
You would have to implement:
the hash function for the type being hashed
an equality function for the type being used to test whether two hashables are equal or not
the hash-set contains/insert/remove functionality.
You can also use open addressing as an alternative to maintaining and managing buckets.
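A minimal sketch of the chained hash-set described above, assuming linked-list buckets and caller-supplied hash and equality functions. All names here (HashSet, hashSetInsert, hashInt and so on) are made up for the example, and the set stores pointers without owning what they point to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef size_t (*HashFn)(const void *item);
typedef bool (*EqualFn)(const void *a, const void *b);

typedef struct Node {
    const void *item;
    struct Node *next;
} Node;

typedef struct HashSet {
    Node **buckets;       /* fixed-size array of chains */
    size_t bucketCount;
    HashFn hash;
    EqualFn equal;
} HashSet;

HashSet *hashSetCreate(size_t bucketCount, HashFn hash, EqualFn equal) {
    assert(bucketCount > 0 && hash != NULL && equal != NULL);
    HashSet *set = malloc(sizeof *set);
    if (set == NULL) return NULL;
    set->buckets = malloc(bucketCount * sizeof *set->buckets);
    if (set->buckets == NULL) { free(set); return NULL; }
    for (size_t i = 0; i < bucketCount; i++) set->buckets[i] = NULL;
    set->bucketCount = bucketCount;
    set->hash = hash;
    set->equal = equal;
    return set;
}

/* Returns the link that points at the matching node, or at the chain's
   terminating NULL if the item is absent. */
static Node **findLink(HashSet *set, const void *item) {
    Node **link = &set->buckets[set->hash(item) % set->bucketCount];
    while (*link != NULL && !set->equal((*link)->item, item)) {
        link = &(*link)->next;
    }
    return link;
}

bool hashSetContains(HashSet *set, const void *item) {
    return *findLink(set, item) != NULL;
}

/* Returns false only on allocation failure; inserting a duplicate is a no-op. */
bool hashSetInsert(HashSet *set, const void *item) {
    Node **link = findLink(set, item);
    if (*link != NULL) return true;
    Node *node = malloc(sizeof *node);
    if (node == NULL) return false;
    node->item = item;
    node->next = NULL;
    *link = node;
    return true;
}

void hashSetRemove(HashSet *set, const void *item) {
    Node **link = findLink(set, item);
    if (*link != NULL) {
        Node *dead = *link;
        *link = dead->next;
        free(dead);
    }
}

void hashSetDestroy(HashSet *set) {
    for (size_t i = 0; i < set->bucketCount; i++) {
        for (Node *node = set->buckets[i]; node != NULL; ) {
            Node *next = node->next;
            free(node);
            node = next;
        }
    }
    free(set->buckets);
    free(set);
}

/* Example hash/equality pair for items that point to int. */
size_t hashInt(const void *item) {
    return (size_t)*(const int *)item * 2654435761u;
}

bool equalInt(const void *a, const void *b) {
    return *(const int *)a == *(const int *)b;
}
```

Because membership is decided by the equality function rather than by address, two different int variables holding the same value count as the same element.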
Sets are usually implemented as some variety of a binary tree. Red-black trees have good worst-case performance.
These can also be used to build a map to allow key/value lookups.
This approach requires some sort of ordering on the elements of the set and the key values in a map.
I'm not sure how you would manage a set that could possibly contain itself using binary trees if you limit set membership to well defined types in C ... comparison between such constructs could be problematic. You could do it easily enough in C++, though.
The way to get genericity in C is by void *, so you're going to be using pointers anyway, and pointers to different objects are unique. This means you need a hash map or binary tree containing pointers, and this will work for all data objects.
The downside of this is that you can't enter rvalues independently. You can't have a set containing the value 5; you have to assign 5 to a variable, which means it won't match a random 5. You could enter it as (void *) 5, and for practical purposes this is likely to work with small integers, but if your integers can get into large enough sizes to compete with pointers this has a very small probability of failing.
Nor does this work with string values. Given char a[] = "Hello, World!"; char b[] = "Hello, World!";, a set of pointers would find a and b to be different. You would probably want to hash the values, but if you're concerned about hash collisions you should save the string in the set and do a strncmp() to compare the stored string with the probing string.
(There's similar problems with floating-point numbers, but trying to represent floating-point numbers in sets is a bad idea in the first place.)
Therefore, you'd probably want a tagged value, one tag for any sort of object, one for integer value, and one for string value, and possibly more for different sorts of values. It's complicated, but doable.
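One possible shape for such a tagged value is sketched below. The type and function names are hypothetical, and the hash is only a simple FNV-1a-style mix chosen for brevity: objects compare by address, integers by value, and strings by contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { VALUE_OBJECT, VALUE_INT, VALUE_STRING } ValueTag;

typedef struct {
    ValueTag tag;
    union {
        const void *object;   /* compared by address */
        long integer;         /* compared by value */
        const char *string;   /* compared by contents */
    } as;
} Value;

Value valueFromObject(const void *object) {
    Value v = { VALUE_OBJECT, { .object = object } };
    return v;
}

Value valueFromInt(long integer) {
    Value v = { VALUE_INT, { .integer = integer } };
    return v;
}

Value valueFromString(const char *string) {
    assert(string != NULL);
    Value v = { VALUE_STRING, { .string = string } };
    return v;
}

/* Values with different tags are never equal, so the integer 5 and the
   string "5" are distinct set members. */
bool valueEquals(Value a, Value b) {
    if (a.tag != b.tag) return false;
    switch (a.tag) {
    case VALUE_OBJECT: return a.as.object == b.as.object;
    case VALUE_INT:    return a.as.integer == b.as.integer;
    case VALUE_STRING: return strcmp(a.as.string, b.as.string) == 0;
    }
    return false;
}

/* Equal values must hash equally, so strings are hashed by contents. */
size_t valueHash(Value v) {
    switch (v.tag) {
    case VALUE_OBJECT: return (size_t)(uintptr_t)v.as.object;
    case VALUE_INT:    return (size_t)v.as.integer;
    case VALUE_STRING: {
        size_t h = 2166136261u;
        for (const unsigned char *p = (const unsigned char *)v.as.string; *p; p++) {
            h = (h ^ *p) * 16777619u;
        }
        return h;
    }
    }
    return 0;
}
```

A hash set or tree keyed on valueHash and valueEquals would then treat the arrays a and b from the example above as the same string element, while still telling them apart when they are stored as objects.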
If the maximum number of elements in the set (the cardinality of the underlying data type) is small enough, you might want to consider using a plain old array of bits (or whatever you call them in your favourite language).
Then you have a simple set membership check: bit n is 1 if element n is in the set. You could even count 'ordinary' members from 1, and only make bit 0 equal to 1 if the set contains itself.
This approach will probably require some other data structure (or function) to translate from the member data type to the position in the bit array (and back), but it makes basic set operations (union, intersection, membership test, difference, insertion, removal, complement) very easy. It is only suitable for relatively small universes, though: you probably wouldn't want to use it for sets of 32-bit integers.
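A minimal sketch of the bit-array approach, assuming a universe of 256 possible members that have already been mapped to the indices 0..255. The names and the universe size are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNIVERSE_SIZE 256u                      /* assumed number of possible members */
#define WORD_BITS 64u
#define WORD_COUNT (UNIVERSE_SIZE / WORD_BITS)  /* UNIVERSE_SIZE is a multiple of 64 */

typedef struct {
    uint64_t words[WORD_COUNT];   /* bit n is 1 if element n is in the set */
} BitSet;

BitSet bitSetEmpty(void) {
    BitSet s = { {0} };
    return s;
}

void bitSetInsert(BitSet *s, unsigned n) {
    assert(n < UNIVERSE_SIZE);
    s->words[n / WORD_BITS] |= UINT64_C(1) << (n % WORD_BITS);
}

void bitSetRemove(BitSet *s, unsigned n) {
    assert(n < UNIVERSE_SIZE);
    s->words[n / WORD_BITS] &= ~(UINT64_C(1) << (n % WORD_BITS));
}

bool bitSetContains(const BitSet *s, unsigned n) {
    assert(n < UNIVERSE_SIZE);
    return (s->words[n / WORD_BITS] >> (n % WORD_BITS)) & 1u;
}

/* The struct is small, so the operations take and return it by value;
   each parameter is a local copy and the caller's sets are untouched. */
BitSet bitSetUnion(BitSet a, BitSet b) {
    for (unsigned i = 0; i < WORD_COUNT; i++) a.words[i] |= b.words[i];
    return a;
}

BitSet bitSetIntersection(BitSet a, BitSet b) {
    for (unsigned i = 0; i < WORD_COUNT; i++) a.words[i] &= b.words[i];
    return a;
}

BitSet bitSetDifference(BitSet a, BitSet b) {
    for (unsigned i = 0; i < WORD_COUNT; i++) a.words[i] &= ~b.words[i];
    return a;
}

BitSet bitSetComplement(BitSet a) {
    for (unsigned i = 0; i < WORD_COUNT; i++) a.words[i] = ~a.words[i];
    return a;
}
```

Every operation is a handful of word-wide bitwise instructions, which is why this representation is attractive when the universe is small.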
