perl compare sorted array using smart match - arrays

The following is the code. @arr0 and @arr1 are not equal, even after sorting, so why is "eq" printed? At first I suspected the return value of the sort function, but it does return an array, so what's the reason?
my @arr0 = (1,2);
my @arr1 = ("a","b");
if ( (sort @arr0) ~~ (sort @arr1) ) {
print "eq\n";
};

[Note: All links to documentation in this answer are to the documentation for version 5.12.1. This ensures the answer is useful for the original poster - it might make it less useful for other people.]
It's important to realise that arrays and lists are not the same. This is one case where the behaviour is different.
It's also important to read the documentation for sort(), which starts by saying:
In list context, this sorts the LIST and returns the sorted list value. In scalar context, the behaviour of sort() is undefined.
There are two important things there. Firstly, in list context, sort() returns a list, not an array. And secondly, in scalar context, its behaviour is undefined.
Now let's look at the smartmatch documentation. That's a big table of left- and right-operands that I won't reproduce here. But note that it doesn't mention lists at all. So, almost certainly, smartmatch is calling sort() in scalar context and doing either a string or numeric comparison on the results (one of the last few rows in the table).
But we know that sort()'s behaviour in scalar context is undefined. So who knows what value smartmatch is comparing. But I guess that whatever random value it is returning, it is (at least) returning the same random value for both of your lists. Which means they appear to be equal.
As you've said in a comment, it works when you save the sorted results in arrays and pass arrays to smartmatch. That's because arrays have special behaviours defined in the smartmatch table.
Arrays are not lists
Don't call sort() in scalar context
Update: As ThisSuitIsNotBlack mentions in the comments, smartmatch has been rather unstable since it was introduced in Perl 5.10. Its behaviour has been tweaked in pretty much every Perl release since then and its final form still isn't completely agreed. For that reason, I strongly discourage you from using it at all.


Why does the type signature of linear array change compared to normal array?

I'm going through an example in A Taste of Linear Logic.
It first introduces the standard array with the usual operations defined (page 24):
Then suggests that a linear equivalent (using a linear logic for type signatures to restrict array copying) would have a slightly different type signature:
This is designed with the idea that array contains values that are cheap to copy but that the array itself is expensive to copy and thus should be passed along from use to use as a handle.
Question: The signatures for lookup and update correspond well to the standard signatures, but how do I interpret the signature for new?
In particular:
The function new does not seem to return an array. How can I get an array to use if one is not provided?
I think I do understand that Arr ⊸ Arr ⊗ X is not derivable using linear logic and therefore a function to extract individual values without consuming the array is needed, but I don't understand why new doesn't provide that function directly.
In practical terms, this is about garbage collection.
Linear logic avoids making copies as well as leaving unused values lying around. So when you create an array with new, you also need to make sure it's eventually cleaned up again.
How can you make sure it is cleaned up? Well, in this example they do it by not giving back the array as the result, but instead “lending” it to the caller. The function Arr ⊸ Arr ⊗ X must give an array back in the end, in addition to the result you're actually interested in. It's assumed that this will be a modified form of the array you started out with. Only the X is passed back to the caller, the Arr is deallocated.

Perl structure flow to C

I've started working on a program which is in Perl and has to be transformed into C.
There are a lot of subroutines which have structure member accessing which is unfamiliar to me, because I have little to no knowledge about Perl syntax and structure flow.
Example:
$ref->{$struct2[$value]->{field1}}->{struct_insideStruct2}->{$ref2->{field}}
$ref is a third structure
$ref2 is a local copy of a parameter which is of type struct1
My question is: How do you create a line like this in C?
Do I need to create nested multiple structures?
I need to understand how multiple access operators in Perl work and whether I can create something similar in C. Thanks in advance.
I recommend not trying to translate directly between languages, as this is likely to result in clumsy and unnatural code. That would certainly be the case here, as discussed further down. The best I can do for this question is to explain what the expression does:
$ref -> { $struct2[$value]->{field1} }
-> { struct_insideStruct2 }
-> { $ref2->{field} }
The $ref is a reference to a hash (associative array); it's OK to think of it as a pointer to a hash. One can tell because the -> ("arrow") operator dereferences, and the {...} on its right means that on its left there must be a hash reference; this returns a value that it points to.
In this case, the key with which it is dereferenced (the index into the associative array) involves an element of the array @struct2 at index $value; that element is another hash reference, being dereferenced (indexed into) with a key field1 (string literal†).
What this returns is another hash reference, which is then indexed into (dereferenced) with the key struct_insideStruct2 (string), and this again returns a hash reference.
That last one is indexed with a key which itself is produced by dereferencing another hash reference, $ref2, with a key field (string).
This is an example of a Perl complex data structure. How do you like it? I don't, not very much. Even in Perl, ideally I'd like to see this rewritten as a class, as it goes too deep and wide and so it packs too much complexity without any supporting structure which a class can provide.
If you still really wish to do that kind of thing in C, you can. You may want to find a good hash implementation (or use structs and nest them carefully), and probably dust off your function pointer syntax and such. But I would recommend not getting into all that.
Instead, once you understand the deep-nested data structure explained above, and the data it represents, find a way to implement what it means and does in your code in a native C way. We always want to use logic, techniques, and idioms native to the language at hand.
Along with linked documentation also see the short and sweet perlintro. The full reference for Perl's references is perlref.
† Normally such "barewords" need to be quoted, 'string' (or using "", or the q() or qq() operators ...). But if the bareword is the only thing between {} then the quoting may be omitted.

Minimal element of an array

An abstract question, not related to any particular language:
If I have a function as follows
min(int, int) :: int
which returns the smaller of two values, and
concat([int], [int]) :: [int]
which combines two arrays, how should I write a function like
minInArray([int]) :: int
which returns the smallest element in an array, but where the output could be chained like so, even with an empty input array:
min(minInArray(array1), minInArray(array2)) == minInArray(concat(array1, array2))
In other words, is there any commonly-used neutral element which minInArray could return on empty input, which wouldn't mess up min()?
One option would be to return some neutral value like null or NaN if the array has no elements, and then if the min() function is run and one of the arguments is the neutral value, then you just return the min of the other array. Another option would be to return the closest value the language has to +Infinity if the array is empty; this works and does not require modifying min(), but does have the side effect of returning an infinite value sometimes when the minInArray() function is called. This infinite value could work as a truly neutral value that works with the default min() function, but it may cause some confusion if the minimum value in an array really is infinite.
Have minInArray(arr1) return null if arr1 is empty.
min() should then prefer non-null values over null. That is, min() returns null only if both parameters are null; otherwise it returns the minimum non-null value.
While thinking about the issue we've come to seemingly the only solution possible:
if an array is empty - we should return the maximum possible value for int to satisfy the condition.
Not that nice actually...
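To make the "maximum possible int" idea concrete, here is a minimal sketch in Go. The names minInt and minInArray are just this thread's functions spelled in Go, and using math.MaxInt as the neutral element is an assumption of the sketch, not the only option. It shows that the chaining property holds even when one input is empty:

```go
package main

import (
	"fmt"
	"math"
)

// minInt returns the smaller of two ints (the thread's min()).
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// minInArray returns the smallest element of xs.
// Assumption of this sketch: an empty input yields math.MaxInt,
// which acts as the neutral element for minInt.
func minInArray(xs []int) int {
	m := math.MaxInt
	for _, x := range xs {
		m = minInt(m, x)
	}
	return m
}

func main() {
	array1 := []int{4, 2, 9}
	array2 := []int{} // empty input

	// Concatenate into a fresh slice so array1 is left untouched.
	concat := append(append([]int{}, array1...), array2...)

	chained := minInt(minInArray(array1), minInArray(array2))
	fmt.Println(chained == minInArray(concat)) // true
}
```

The caveat raised above still applies: a caller cannot tell an empty array from one whose real minimum is math.MaxInt.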
Just to add some perspectives (not that this is a duplicate of the listed questions) -
Most of these throw errors of some kind when asked to calculate the min of an empty list or array: Java, Scala, Python, numpy, C#. (JavaScript's Math.min() is an exception: called with no arguments it returns Infinity.) Probably more, but that's as far as I looked. I'm sure there are some that don't, but I'd expect most of those to be languages which have traded understandability and clarity for speed.
This question is about a specific language, but has answers relevant to all languages.
Note here how one can get around the issue in something like Python.
For Haskell in particular, note the advice in this question.
And lastly here's a response for a more general case of your question.
In general, it is always most important for code to work, but a close second to that is it must be understandable to humans. Perhaps it doesn't matter for your current project, if you'll be the only one dealing with that function, but the last thing I'd expect when calling a 'get_minimum' function, is Int.MAX.
I understand it makes the coding simple, but I'd urge you to beware of code that is easy to write and tricky to understand. A little more time spent making the code easy to read, with as much as possible having an immediately obvious meaning, will always save much more time later on.

Avoiding database when checking for existing values

Is there a way, through hashes, bitwise operators, or some other algorithm, to avoid using a database when simply checking whether a string or value has appeared before?
Assume there is no way to store the whole history of the strings that have appeared; only a little information can be stored.
You may be interested in Bloom filters. They don't let you authoritatively say, "yes, this value is in the set of interest", but they do let you say "yes, this value is probably in the set" vs. "no, this value definitely is not in the set". For many situations, that's enough to be useful.
The way it works is:
You create an array of Boolean values (i.e. of bits). The larger you can afford to make this array, the better.
You create a bunch of different hash functions that each take an input string and map it to one element of the array. You want these hash functions to be independent, so that even if one hash function maps two strings to the same element, a different hash function will most likely map them to different elements.
To record that a string is in the set, you apply each of your hash functions to it in turn — giving you a set of elements in the array — and you set all of the mapped-to elements to TRUE.
To check if a string is (probably) in the set, you do the same thing, except that now you just check the mapped-to elements to see if they are TRUE. If all of them are TRUE, then the string is probably in the set; otherwise, it definitely isn't.
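The steps above can be sketched in Go roughly as follows. This is a minimal illustration, not a tuned implementation: the BloomFilter type and its method names are hypothetical, and instead of k truly independent hash functions it derives k positions from two FNV hashes (a common "double hashing" shortcut), which is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is a hypothetical minimal filter: a bit array plus k derived hashes.
type BloomFilter struct {
	bits []bool
	k    int
}

// NewBloomFilter creates a filter with size bits and k hash positions per value.
func NewBloomFilter(size, k int) *BloomFilter {
	return &BloomFilter{bits: make([]bool, size), k: k}
}

// indexes maps s to k positions in the bit array.
// Assumption: positions are derived as h1 + i*h2 from two FNV hashes
// rather than from k independent hash functions.
func (b *BloomFilter) indexes(s string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(s))
	h2 := fnv.New64()
	h2.Write([]byte(s))
	a, c := h1.Sum64(), h2.Sum64()

	out := make([]int, b.k)
	for i := range out {
		out[i] = int((a + uint64(i)*c) % uint64(len(b.bits)))
	}
	return out
}

// Add records that s is in the set by setting all mapped-to bits.
func (b *BloomFilter) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i] = true
	}
}

// MightContain reports false if s is definitely not in the set,
// and true if it is probably in the set.
func (b *BloomFilter) MightContain(s string) bool {
	for _, i := range b.indexes(s) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	f := NewBloomFilter(1024, 3)
	f.Add("hello")
	fmt.Println(f.MightContain("hello")) // true: added values are never reported missing
	fmt.Println(f.MightContain("world")) // false unless a false positive occurs
}
```

A []bool uses one byte per entry; a real filter would pack the bits into a []uint64 or similar to save memory.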
If you're interested in this approach, see https://en.wikipedia.org/wiki/Bloom_filter for detailed analysis that can help you tune the filter appropriately (choosing the right array-size and number of hash functions) to get useful probabilities.

Why are lists used infrequently in Go?

Is there a way to create an array/slice in Go without a hard-coded array size? Why is List ignored?
In all the languages I've worked with extensively: Delphi, C#, C++, Python - Lists are very important because they can be dynamically resized, as opposed to arrays.
In Golang, there is indeed a list.List struct, but I see very little documentation about it - whether in Go By Example or the three Go books that I have - Summerfield, Chisnall and Balbaert - they all spend a lot of time on arrays and slices and then skip to maps. In source code examples I also find little or no use of list.List.
It also appears that, unlike Python, Range is not supported for List - big drawback IMO. Am I missing something?
Slices are lovely, but they still need to be based on an array with a hard-coded size. That's where List comes in.
Just about always when you are thinking of a list - use a slice instead in Go. Slices are dynamically re-sized. Underlying them is a contiguous slice of memory which can change size.
They are very flexible as you'll see if you read the SliceTricks wiki page.
Here is an excerpt:
Copy
b = make([]T, len(a))
copy(b, a) // or b = append([]T(nil), a...)
Cut
a = append(a[:i], a[j:]...)
Delete
a = append(a[:i], a[i+1:]...) // or a = a[:i+copy(a[i:], a[i+1:])]
Delete without preserving order
a[i], a = a[len(a)-1], a[:len(a)-1]
Pop
x, a = a[len(a)-1], a[:len(a)-1]
Push
a = append(a, x)
Update: Here is a link to a blog post all about slices from the Go team itself, which does a good job of explaining the relationship between slices and arrays, and slice internals.
I asked this question a few months ago, when I first started investigating Go. Since then, every day I have been reading about Go, and coding in Go.
Because I did not receive a clear-cut answer to this question (although I had accepted one answer) I'm now going to answer it myself, based on what I have learned, since I asked it:
Is there a way to create an array/slice in Go without a hard-coded array size?
Yes. Slices do not require a hard coded array to slice from:
var sl []int = make([]int, len, cap)
This code allocates slice sl, of size len with a capacity of cap - len and cap are variables that can be assigned at runtime.
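As a small runnable illustration of sizing a slice at run time, here is a sketch; makeSquares is a hypothetical helper invented for this example, and n stands for any value that is only known while the program runs.

```go
package main

import "fmt"

// makeSquares builds a slice whose size is decided at run time.
// No fixed-size array appears anywhere in the code.
func makeSquares(n int) []int {
	sl := make([]int, 0, n) // length 0, capacity n
	for i := 0; i < n; i++ {
		sl = append(sl, i*i)
	}
	return sl
}

func main() {
	sl := makeSquares(5)
	fmt.Println(sl, len(sl), cap(sl)) // [0 1 4 9 16] 5 5
}
```

Pre-sizing the capacity is optional; appending to a nil slice works too and simply grows the backing array as needed.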
Why is list.List ignored?
It appears the main reasons list.List seems to get little attention in Go are:
As has been explained in @Nick Craig-Wood's answer, there is virtually nothing that can be done with lists that cannot be done with slices, often more efficiently and with a cleaner, more elegant syntax. For example the range construct:
for i := range sl {
sl[i] = i
}
cannot be used with list - a C style for loop is required. And in many cases, C++ collection style syntax must be used with lists: push_back etc.
Perhaps more importantly, list.List is not strongly typed - it is very similar to Python's lists and dictionaries, which allow for mixing various types together in the collection. This seems to run contrary to the Go approach to things. Go is a very strongly typed language - for example, implicit type conversions are never allowed in Go; even an upcast from int to int64 must be explicit. But all the methods for list.List take empty interfaces - anything goes.
One of the reasons that I abandoned Python and moved to Go is because of this sort of weakness in Python's type system, although Python claims to be "strongly typed" (IMO it isn't). Go's list.List seems to be a sort of "mongrel", born of C++'s vector<T> and Python's List(), and is perhaps a bit out of place in Go itself.
It would not surprise me if at some point in the not too distant future, we find list.List deprecated in Go, although perhaps it will remain, to accommodate those rare situations where, even using good design practices, a problem can best be solved with a collection that holds various types. Or perhaps it's there to provide a "bridge" for C family developers to get comfortable with Go before they learn the nuances of slices, which are unique to Go, AFAIK. (In some respects slices seem similar to stream classes in C++ or Delphi, but not entirely.)
Coming from a Delphi/C++/Python background, in my initial exposure to Go I found list.List to be more familiar than Go's slices. As I have become more comfortable with Go, I have gone back and changed all my lists to slices. I haven't yet found anything that slice and/or map do not allow me to do, such that I need to use list.List.
I think that's because there's not much to say about them as the container/list package is rather self-explanatory once you absorbed what is the chief Go idiom for working with generic data.
In Delphi (without generics) or in C you would store pointers or TObjects in the list, and then cast them back to their real types when obtaining from the list. In C++ STL lists are templates and hence parameterized by type, and in C# (these days) lists are generic.
In Go, container/list stores values of type interface{}, which is a special type capable of representing values of any other (real) type by storing a pair of pointers: one to the type info of the contained value, and one to the value itself (or the value directly, if its size is no greater than the size of a pointer). So when you want to add an element to the list, you just do that, as function parameters of type interface{} accept values of any type. But when you extract values from the list and want to work with their real types, you have to either type-assert them or do a type switch on them; both approaches are just different ways to do essentially the same thing.
Here is an example taken from here:
package main
import ("fmt" ; "container/list")
func main() {
var x list.List
x.PushBack(1)
x.PushBack(2)
x.PushBack(3)
for e := x.Front(); e != nil; e=e.Next() {
fmt.Println(e.Value.(int))
}
}
Here we obtain an element's value using e.Value and then type-assert it as int, the type of the originally inserted values.
You can read up on type assertions and type switches in "Effective Go" or any other introductory book. The container/list package's documentation summarizes all the methods lists support.
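For comparison with the type assertion used in the example, here is a sketch of the type-switch alternative for a list holding mixed types. The describe helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"container/list"
	"fmt"
)

// describe inspects the dynamic type of v with a type switch.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	default:
		return fmt.Sprintf("other %v", x)
	}
}

func main() {
	l := list.New()
	l.PushBack(1)
	l.PushBack("two")
	l.PushBack(3.0)

	// Walk the list; each e.Value is an interface{} of unknown dynamic type.
	for e := l.Front(); e != nil; e = e.Next() {
		fmt.Println(describe(e.Value))
	}
}
```

Unlike a bare type assertion such as e.Value.(int), which panics when the stored value has a different type, the switch handles every case explicitly.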
Note that Go slices can be expanded via the append() builtin function. While this will sometimes require making a copy of the backing array, it won't happen every time, since Go will over-size the new array giving it a larger capacity than the reported length. This means that a subsequent append operation can be completed without another data copy.
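A rough way to observe this over-allocation is to count how often the capacity changes while appending. The sketch below does that; countReallocations is a hypothetical helper, and the exact count it reports depends on the Go version's growth strategy, so no specific number is claimed here.

```go
package main

import "fmt"

// countReallocations appends n values to an empty slice and counts how often
// the capacity changes, i.e. how often a new backing array was allocated.
func countReallocations(n int) int {
	var sl []int
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(sl)
		sl = append(sl, i)
		if cap(sl) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	// Far fewer reallocations than appends.
	fmt.Println(countReallocations(1000))
}
```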
While you do end up with more data copies than with equivalent code implemented with linked lists, you remove the need to allocate elements in the list individually and the need to update the Next pointers. For many uses the array based implementation provides better or good enough performance, so that is what is emphasised in the language. Interestingly, Python's standard list type is also array backed and has similar performance characteristics when appending values.
That said, there are cases where linked lists are a better choice (e.g. when you need to insert or remove elements from the start/middle of a long list), and that is why a standard library implementation is provided. I guess they didn't add any special language features to work with them because these cases are less common than those where slices are used.
From: https://groups.google.com/forum/#!msg/golang-nuts/mPKCoYNwsoU/tLefhE7tQjMJ
It depends a lot on the number of elements in your lists whether a true list or a slice will be more efficient when you need to do many deletions in the 'middle' of the list.
#1
The more elements, the less attractive a slice becomes.
#2
When the ordering of the elements isn't important, it is most efficient to use a slice and delete an element by replacing it with the last element in the slice and reslicing the slice to shrink the len by 1 (as explained in the SliceTricks wiki).
So:
1. Use a slice if the order of elements in the list is not important and you need to delete: just swap the element to delete with the last element and re-slice to (length-1).
2. Use a List when there are more elements (whatever "more" means).
There are ways to mitigate the deletion problem -- e.g. the swap trick you mentioned, or just marking the elements as logically deleted.
But it's impossible to mitigate the problem of slowness of walking linked lists.
So use a slice if you need speed in traversal.
Unless the slice is updated way too often (delete, add elements at random locations) the memory contiguity of slices will offer excellent cache hit ratio compared to linked lists.
Scott Meyers' talk on the importance of cache:
https://www.youtube.com/watch?v=WDIkqP4JbkE
list.List is implemented as a doubly linked list. Array-based lists (vectors in C++, or slices in golang) are a better choice than linked lists in most conditions if you don't frequently insert into the middle of the list. The amortized time complexity for append is O(1) for both array lists and linked lists, even though an array list has to extend its capacity and copy over existing values. Array lists have faster random access, a smaller memory footprint, and, more importantly, are friendlier to the garbage collector because there are no pointers inside the data structure.
