Why is LINKED_SET unable to compare objects? - eiffel

I'd like to check whether some object is in a LINKED_SET so that I can prune it, but I'm unable to compare by object value instead of by reference.
changeable_comparison_criterion: BOOLEAN
        -- May `object_comparison' be changed?
        -- (Answer: only if set empty; otherwise insertions might
        -- introduce duplicates, destroying the set property.)
    do
        Result := is_empty
    ensure then
        only_on_empty: Result = is_empty
    end
Looking at the SET class (above), it seems it's not possible to switch a non-empty set to compare_objects. So my questions are:
What is the semantics of not being able to compare objects in a SET?
If my choice of LINKED_SET is wrong through a misunderstanding of its semantics, how should I build a collection of unique items based on object comparison, and then prune an item, again based on object comparison?

The comparison criterion should be set right after container creation, then it works without a problem. If there are some objects in the set already, it becomes unclear what should be done to them if the comparison criterion changes.
For example, if there is a set {A, B} of two distinct objects A and B that have the same value, i.e. are equal, what should be done if the comparison criterion changes from compare_references to compare_objects? Clearly, the set should now hold only one object, because according to the new setting it cannot contain two or more equal objects. Does it mean object A should be removed and B kept? Or the other way around? The precondition you are referring to removes this ambiguity.
The solution is to modify the setting before there are any objects in the container:
create my_set.make
my_set.compare_objects

How to modify array under specific JSONB key in PostgreSQL?

We're storing various heterogeneous data in a JSONB column called ext, and under some keys we have arrays of values. I know how to replace the whole key (||). If I want to add one or two values I still need to extract the original values first (that would be ext->'key2' in the example below) - in some cases this may be too much data.
I realize this is a trivial problem in the relational world, and that PG still needs to overwrite the whole row anyway, but at least I don't need to pull the unchanged part of the data from the DB to the application and push it back.
I can construct the final value of the array in a SELECT, but I don't know how to merge it into the final value of ext so that it is usable in an UPDATE statement:
select ext,  -- whole JSONB
    ext->'key2',  -- JSONB array
    ARRAY(select jsonb_array_elements_text(ext->'key2')) || array['asdf'],  -- array + concat
    ext || '{"key2":["new", "value"]}'  -- JSONB with whole "key2" key replaced (not what I want)
from (select '{"key1": "val1", "key2": ["val2-1", "val2-2"]}'::jsonb ext) t
So the question: How to write such a modification into the UPDATE statement?
The example uses a jsonb_*_text function; some values are non-textual (e.g. numbers) and would need the non-_text function, but I know the value's type when I construct the query, so no problem here.
We also need to remove the values from the arrays as well, in which case if the array is completely empty we would like to remove the key from the JSONB altogether.
Currently we achieve this with the following expression in the UPDATE statement:
coalesce(ext, '{}')::jsonb - <array of items to delete> || <jsonb with additions> (the <parts> are symbolic here; we use a single JDBC parameter for each value). If the final value of the array is empty, the key for that value goes into the first array; otherwise the final value appears in the JSONB after the || operator.
To be clear:
I know the path to the JSONB value I want to change - it's actually always a single key on the top level.
I know whether that key stores a single value (no problem for those) or an array (that's where I don't have a satisfying solution yet), because we know the definition of each key; this is stored separately.
I need to add and/or remove multiple values that I provide, but I don't know what is in the array at that moment - that's the whole point, so that the application doesn't need to read it.
I may also want to replace the whole array under the key, but this is a trivial case and I know how to do it.
Finally, if removal results in an empty array, we'd like to get rid of the key as well.
I could probably write a function doing it all if necessary but I've not committed to that yet.
Obviously, restructuring the data out of that JSONB column is not an option. Eventually I want to make it more flexible and data with these characteristics would go to some other table, but at this moment we're not able to do it with our application.
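In Python terms, the combined add/remove/drop-empty-key behaviour I'm after looks roughly like this (a pure-Python illustration of the intended transformation, not the SQL itself; merge_ext is a hypothetical helper name, and the key names match the example above):

```python
# Pure-Python illustration of the intended JSONB transformation
# (merge_ext is a hypothetical name; key names match the example above).
def merge_ext(ext, key, to_add, to_remove):
    # keep existing values that are not scheduled for removal
    values = [v for v in ext.get(key, []) if v not in to_remove]
    # append new values, skipping duplicates
    values += [v for v in to_add if v not in values]
    if values:
        return {**ext, key: values}
    # removal emptied the array: drop the key altogether
    return {k: v for k, v in ext.items() if k != key}

ext = {"key1": "val1", "key2": ["val2-1", "val2-2"]}
assert merge_ext(ext, "key2", ["asdf"], []) == \
    {"key1": "val1", "key2": ["val2-1", "val2-2", "asdf"]}
assert merge_ext(ext, "key2", [], ["val2-1", "val2-2"]) == {"key1": "val1"}
```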
You can use jsonb_set to modify an array stored under some key.
To update a value in the array, specify its zero-based index within the array, as in the example below.
To add a new element at the start/end, specify a negative/positive index greater than the array's length.
UPDATE <table>
SET ext = jsonb_set(ext, '{key2, <index>}', '5')
WHERE <condition>

What are the considerations when returning by value vs by reference - Set ADT C

When looking at the header file of a Set ADT in C, I'm trying to understand why the functions setUnion and setIntersection were declared this way:
Set setUnion(Set set1, Set set2);
Set setIntersection(Set set1, Set set2);
I couldn't find the implementation, but I'm assuming that inside those functions we allocate space for a new set and then add all the necessary elements.
I thought that set1 and set2 are passed by reference, so why not update one of them (for example the left parameter) to save memory allocations, and just return some enum that indicates whether the update succeeded?
If they're not passed by reference, how can I change the signature in order to do so?
Thank you!
The Set is almost certainly a pointer hidden behind a typedef, so the internal struct is effectively passed by reference, which is all that counts.
More often than not one needs to calculate the union or intersection of two sets without mutating either of them. In fact it is quite probable that
Set result = setIntersection(set1, set2);
freeSet(set1);
set1 = result;
would be no less performant than your proposed alternative
setIntersectionInPlace(set1, set2);
whereas the more common case of calculating the intersection of immutable sets would, using setIntersectionInPlace, need to be written
Set result = setCopy(set1);
setIntersectionInPlace(result, set2);
which makes a needless copy of set1, which is larger than or equal in size to result.

Concise way to change object across multiple arrays

I'd like to know whether changing an object in one array will change it in others.
I have an array of tasks, each of which is a hash object with :id, :user, and :task keys.
I then use duplicates = tasks.select{|task| sample code} to select some tasks from that array.
If I change a task in duplicates, will it change in tasks as well? If not, are there any good ways to search for that same task?
There is no such thing as "changing an object across arrays". An object does not know and does not care whether it is in an array or not.
Changing an object changes an object. Period. If that object is contained in multiple arrays, then you will observe that change, regardless of how you obtain the object. But it is not "changed across arrays". It is simply changed.
If you cut your hair, everybody who looks at you will observe that your hair is short, regardless of whether they perceive you as "student #1234", "the quarterback of the Backwater Swamprats", or "the youngest son of the Smith family". What reference is used to address you has no bearing on whether your hair is short or not.
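To answer the concrete question: select does not copy the hashes, it returns references to the very same objects, so a change made through duplicates is visible in tasks (a sketch using the hash shape from the question):

```ruby
# `select` returns the same object references, not copies, so mutating a
# task found via `duplicates` is visible in `tasks` as well.
tasks = [
  { id: 1, user: "amy", task: "mow lawn" },
  { id: 2, user: "bob", task: "wash car" }
]

duplicates = tasks.select { |t| t[:task] == "mow lawn" }
duplicates.first[:user] = "carol"

tasks.first[:user]                   # => "carol" (same hash object)
tasks.first.equal?(duplicates.first) # => true
```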

Avoiding database when checking for existing values

Is there a way, using hashes, bitwise operators, or some other algorithm, to avoid using a database when simply checking whether a string or value has appeared before?
Assume there is no way to store the whole history of the strings that have appeared; only a little information can be stored.
You may be interested in Bloom filters. They don't let you authoritatively say, "yes, this value is in the set of interest", but they do let you say "yes, this value is probably in the set" vs. "no, this value definitely is not in the set". For many situations, that's enough to be useful.
The way it works is:
You create an array of Boolean values (i.e. of bits). The larger you can afford to make this array, the better.
You create a bunch of different hash functions that each take an input string and map it to one element of the array. You want these hash functions to be independent, so that even if one hash function maps two strings to the same element, a different hash function will most likely map them to different elements.
To record that a string is in the set, you apply each of your hash functions to it in turn — giving you a set of elements in the array — and you set all of the mapped-to elements to TRUE.
To check if a string is (probably) in the set, you do the same thing, except that now you just check whether the mapped-to elements are TRUE. If all of them are TRUE, then the string is probably in the set; otherwise, it definitely isn't.
If you're interested in this approach, see https://en.wikipedia.org/wiki/Bloom_filter for detailed analysis that can help you tune the filter appropriately (choosing the right array-size and number of hash functions) to get useful probabilities.
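A minimal sketch of the steps above in Python (the array size m and hash count k are arbitrary here; tune them as described in the Wikipedia article; the salted-SHA-256 scheme is just one easy way to get k independent hash functions):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions
    derived from salted SHA-256 digests."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, s):
        # k independent-ish hash functions via a per-function salt
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{s}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, s):
        for p in self._positions(s):
            self.bits[p] = True

    def probably_contains(self, s):
        # all bits set -> probably present; any bit clear -> definitely absent
        return all(self.bits[p] for p in self._positions(s))

bf = BloomFilter()
bf.add("hello")
assert bf.probably_contains("hello")  # always True once added
```

Note that a negative answer is authoritative, while a positive one has a small false-positive probability that grows as the bit array fills up.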

Hiding vars in strings VS using objects with properties?

So, I've got a word analyzing program in Excel with which I hope to be able to import over 30 million words.
At first, I created a separate object for each of these words, so that each word has a...
.value '(string), the actual word itself
.bool1 '(boolean)
.bool2 '(boolean)
.bool3 '(boolean)
.isUsed '(boolean)
.cancel '(boolean)
When I found out I may have 30 million of these objects (all stored in a single collection), I thought that this could be a monster to compile. And so I decided that all my words would be strings, and that I would stick them into an array.
So my array idea is to prefix each of the 30 million strings with 5 spaces (one per bool), each space representing a false bool value. E.g.,
If Mid$(arr(n), 3, 1) = " " Then
    'my 3rd bool val is false.
ElseIf Mid$(arr(n), 3, 1) = "*" Then '(I'll insert a '*' to denote true)
    'my 3rd bool val is true.
End If
Anyway, what do you guys think? Which way (collection or array) should I go about this (for optimization specifically)?
(I wanted to make this a comment but it became too long)
An answer would depend on how you want to access and process the words, once stored.
There are significant benefits and distinct advantages for 3 candidates:
Arrays are very efficient to populate and to retrieve all items at once (e.g. range to array and array back to range), but much slower at resizing and at inserting items in the middle. Each ReDim copies the entire memory block to a larger location, and if Preserve is used, all values are copied over as well. This may translate to perceived slowness for every operation (in a potential application)
More details (arrays vs collections) here (VB specific but it applies to VBA as well)
Collections are linked lists backed by hash tables - quite slow to populate, but after that you get instant access to any element in the collection, and they are just as fast at reordering (sorting) and resizing. This can translate into a slow file open, but all other operations are instant. Other aspects:
Items can be other collections, arrays, objects
While keys must be unique, they are also optional
An item can be returned in reference to its key, or in reference to its index value
Keys are always strings, and always case insensitive
Items are accessible and retrievable, but their keys are not
Cannot remove all items at once (either one by one, or destroy and recreate the Collection)
Enumerating with For...Each...Next lists all items
More info here and here
Dictionaries: same as collections but with the extra benefit of the .Exists() method which, in some scenarios, makes them much faster than collections. Other aspects:
Keys are mandatory and always unique to that Dictionary
An item can only be returned in reference to its key
The key can take any data type; for string keys, by default a Dictionary is case sensitive
Exists() method to test for the existence of a particular key (and item)
Collections have no similar test; instead, you must attempt to retrieve a value from the Collection, and handle the resulting error if the key is not found
Items AND keys are always accessible and retrievable to the developer
Item property is read/write, so it allows changing the item associated with a particular key
Allows you to remove all items in a single step without destroying the Dictionary itself
Using For...Each...Next, dictionaries enumerate their keys
A Dictionary supports implicit adding of an item using the Item property.
In Collections, items must be added explicitly
More details here
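For instance, the .Exists() test versus the Collection error-handling workaround mentioned above looks roughly like this (a sketch; late binding via CreateObject is used so no library reference is needed):

```vba
' Dictionary: direct existence test
Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
d.Add "apple", 1
If d.Exists("apple") Then Debug.Print "found"

' Collection: no Exists - attempt retrieval and trap the error
Dim c As New Collection
Dim v As Variant
c.Add 1, "apple"
On Error Resume Next
v = c("banana")                 ' raises run-time error 5 if key is absent
If Err.Number <> 0 Then Debug.Print "not found"
On Error GoTo 0
```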
Other links: optimizing loops and optimizing strings (same site)
