How to optimize a list search in SML/NJ? - arrays

I am writing a piece of code in SML/NJ and need, at some point, to access a list that I have already created. I know that in C, for example, accessing an array takes constant time, so I thought that would be the case for ML as well. Apparently, however, the built-in List.nth(l,i) function has complexity that is linear in the length of the list given as argument.
I then turned to arrays, but I think that the Array.sub function also has linear complexity.
So, given that accessing a tuple, as in #2 (12, 5.6, "foo"), has O(1) complexity, I would like to ask whether there is a way I could use a tuple instead of a list, but access it dynamically.
For example, say I want to write a function that takes a tuple with only boolean values and an integer n, and returns true if the n-th element of the tuple is true. Something like:
fun isTrue (n,tup) =
if #n(tup) then true
else false;
I know this isn't valid SML, so is there a way to write such a function?
Thanks a lot in advance for your time!

The Array.sub function has O(1) complexity, so feel free to use it! (You can build an array from your existing list with Array.fromList.)
e.g.
fun isTrue (n, arr) =
    if Array.sub (arr, n) then true
    else false;
As far as tuples are concerned, you can only use a literal constant as the selector, not a variable: the field number in #n has to be known at compile time.
e.g.
fun isTrue (n, tup) =
    if #2 tup then true
    else false;

Related

Does Ruby recalculate array size every time when you call `array.size`?

Could someone confirm whether Ruby recalculates the array size every time you call array.size, array.length or array.count?
Thanks in advance.
Update
To make things clearer, by recalculate I mean whether Ruby needs to loop through the whole array again and again to count its elements every time we call array.size.
Pragmatically Speaking, Array#length is Dynamic in Ruby
Your question can't really be answered canonically, because the lookup and storage implementations of arrays are often platform- and VM-specific. However, as a practical matter, from the Ruby interpreter's perspective the answer is yes, because each call sends a message to an Array object, asking it to return its current length.
Some languages store the current length of the array as an element of the array itself. Other approaches exist, too. In Ruby 2.7.1:
static VALUE
rb_ary_length(VALUE ary)
{
    long len = RARRAY_LEN(ary);
    return LONG2NUM(len);
}
the C implementation appears to retrieve the stored length of the array at the time of the call, but you'd have to dig deeper into the source code if you want to understand all the ins-and-outs of how the VM optimizes this (or not).
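To see the general "store the length with the array" idea outside of Ruby, here is a minimal C sketch; it is only an illustration of the technique, not Ruby's actual internal layout, and the names IntArray and int_array_length are invented for the example.
#include <stdio.h>
#include <stddef.h>

/* A dynamic array that stores its element count alongside the buffer,
   so asking for the length is a field read, not a scan of the elements. */
typedef struct {
    size_t len;      /* current number of elements */
    size_t capacity; /* allocated slots */
    int *items;
} IntArray;

static size_t int_array_length(const IntArray *a)
{
    return a->len;   /* O(1), regardless of how many elements there are */
}

int main(void)
{
    int storage[4] = {10, 20, 30, 40};
    IntArray a = { 4, 4, storage };
    printf("length = %zu\n", int_array_length(&a));
    return 0;
}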

Iterating for `setindex!`

I have some specially-defined arrays in Julia which you can think of being just a composition of many arrays. For example:
type CompositeArray{T}
    x::Vector{T}
    y::Vector{T}
end
with an indexing scheme
getindex(c::CompositeArray, i::Int) = i <= length(c.x) ? c.x[i] : c.y[i - length(c.x)]
I do have one caveat: the higher-dimensional indexing scheme just goes to x itself:
getindex(c::CompositeArray, i::Int...) = c.x[i...]
Now the iterator through these can easily be made as the chain of the iterator on x and then on y. This makes iterating through the values have almost no extra cost. However, can something similar be done for iteration to setindex!?
I was thinking of having a separate dispatch on CartesianIndex{2} just for indexing x vs y and the index, and building an eachindex iterator for that, similar to what CatViews.jl does. However, I'm not certain how that will interact with the i... dispatch, or whether it will be useful in this case.
In addition, will broadcasting automatically use this fast iteration scheme if it's built on eachindex?
Edits:
length(c::CompositeArray) = length(c.x) + length(c.y)
In the real case, x can be any AbstractArray (and thus has a linear index), but since only the linear indexing is used (except for that one user-facing getindex function), the problem really boils down to finding out how to do this with x a Vector.
Making X[CartesianIndex(2,1)] mean something different from X[2,1] is certainly not going to end well. And I would expect similar troubles from the fact that X[100,1] may mean something different from X[100] or if length(X) != prod(size(X)). You're free to break the rules, but you shouldn't be surprised when functions in Base and other packages expect you to follow them.
The safe way to do this would be to make eachindex(::CompositeArray) return a custom iterator over objects that you control entirely. Maybe just throw a wrapper around and forward methods to CartesianRange and CartesianIndex{2} if that data structure is helpful. Then when you get one of these custom index types, you know that SplitIndex(CartesianIndex(1,2)) is indeed intending to refer to the first element in the second array.

Find the first "1" in zero array

I have an array "0000011111"
I need to find the first occurrence of "1".
How can I do that in an efficient way?
My solution is (I think there is a better way):
$array = array(0,0,1,1,1);
for ($i = 0; $i < count($array); $i++)
{
    if ($array[$i] == 1)
    {
        var_dump($i);
        return;
    }
}
Your solution is already as efficient as possible, but there's a built-in method in PHP that will do this for you:
$array = array(0,0,1,1,1);
var_dump(array_search(1, $array)); // int(2)
Note that array_search will return the boolean FALSE in the case where there are no 1s in the array.
EDIT
I made the assumption that the original code is PHP just because it looked that way. :-)
Unfortunately, since there is no guarantee that any of the numbers is a "1", and since you are only going through the array once, this is the most efficient solution. Binary search or any such algorithm won't work, as this array is quite obviously not sorted.
Sample inputs:
0101101
1000101
In either case, binary search would not work.
If you can somehow convert the array efficiently to a number, it is possible to find the first 1 efficiently with log base 2, which gives the position of the most significant set bit.
var number = 0b010000010;
console.log(Math.floor(Math.log2(number)))
EDIT The main reason for doing this is that there are hardware instructions for computing log base 2 (i.e., finding the highest set bit) that make it constant time.
Of course, if you cannot store your array as a binary string because it is too long or something like that, this solution is not for you.
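In C (or anything that can call into it), the same trick is usually written with a count-leading-zeros builtin rather than floating-point log. A minimal sketch, assuming a GCC/Clang-style __builtin_clz; the helper name highest_set_bit is made up:
#include <stdio.h>

/* Index of the most significant set bit, counted from the least significant
   bit. Assumes a 32-bit unsigned int and x != 0 (__builtin_clz is undefined
   for 0). */
static int highest_set_bit(unsigned int x)
{
    return 31 - __builtin_clz(x);
}

int main(void)
{
    unsigned int bits = 0x1F;          /* 0000011111 in binary */
    int msb = highest_set_bit(bits);   /* 4 */
    int digits = 10;                   /* length of the original string */
    printf("index of the first 1 from the left: %d\n", digits - 1 - msb);
    return 0;
}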

Array.isDefinedAt for n-dimensional arrays in scala

Is there an elegant way to express
val a = Array.fill(2,10) {1}
def do_to_elt(i: Int, j: Int) {
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
}
in scala?
I recommend that you not use arrays of arrays for 2D arrays, for three main reasons. First, it allows inconsistency: not all columns (or rows, take your pick) need to be the same size. Second, it is inefficient--you have to follow two pointers instead of one. Third, very few library functions exist that work transparently and usefully on arrays of arrays as 2D arrays.
Given these things, you should either use a library that supports 2D arrays, like scalala, or you should write your own. If you do the latter, among other things, this problem magically goes away.
So in terms of elegance: no, there isn't a way. But beyond that, the path you're starting on contains lots of inelegance; you would probably do best to step off of it quickly.
You just need to check with isDefinedAt whether the array at index i exists:
def do_to_elt(i: Int, j: Int): Unit =
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j))
EDIT: Missed that part about the elegant solution as I focused on the error in the code before your edit.
Concerning elegance: no, per se there is no way to express it more elegantly. Some might tell you to use the pimp-my-library pattern to make it look more elegant, but in fact it does not help in this case.
If your only use case is to execute a function with an element of a multidimensional array when the indices are valid, then this code does that and you should use it. You could generalize the method by changing the signature to take the function to apply to the element, and maybe a value to return if the indices are invalid, like this:
def do_to_elt[A](i: Int, j: Int)(f: Int => A, g: => A = ()) =
  if (a.isDefinedAt(i) && a(i).isDefinedAt(j)) f(a(i)(j)) else g
but I would not change anything beyond this. This also does not look more elegant but widens your use case.
(Also: If you are working with arrays you mostly do that for performance reasons and in that case it might even be better to not use isDefinedAt but perform validity checks based on the length of the arrays.)

efficient sort with custom comparison, but no callback function

I have a need for an efficient sort that doesn't have a callback, but is as customizable as using qsort(). What I want is for it to work like an iterator, where it continuously calls into the sort API in a loop until it is done, doing the comparison in the loop rather than off in a callback function. This way the custom comparison is local to the calling function (and therefore has access to local variables, is potentially more efficient, etc). I have implemented this for an inefficient selection sort, but need it to be efficient, so I would prefer a quicksort derivative.
Has anyone done anything like this? I tried to do it for quick sort, but trying to turn the algorithm inside out hurt my brain too much.
Below is how it might look in use.
// the array of data we are sorting
MyData array[5000], *firstP, *secondP;
// (assume data is filled in)
Sorter sorter;
// initialize sorter
int result = sortInit (&sorter, array, 5000,
    (void **)&firstP, (void **)&secondP, sizeof(MyData));
// loop until complete
while (sortIteration (&sorter, result) == 0) {
    // here's where we do the custom comparison...here we
    // just sort by member "value" but we could do anything
    result = firstP->value - secondP->value;
}
Turning the sort function inside out as you propose isn't likely to make it faster. You're trading indirection on the comparison function for indirection on the item pointers.
It appears you want your comparison function to have access to state information. The quick-n-dirty way is to create global variables or a global structure, assuming you don't have more than one thread going at once. The qsort function won't return until all the data is sorted, so in a single-threaded environment this should be safe.
The only other thing I would suggest is to locate the source of a qsort implementation and modify it to take an extra parameter, a pointer to your state structure. You can then pass this pointer into your comparison function.
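As a hedged illustration of the global-state route with plain ISO C qsort (the MyData struct and the sort key are invented for the example; some platforms also offer a reentrant variant such as glibc's qsort_r that passes a context pointer for you, though its signature varies between systems):
#include <stdio.h>
#include <stdlib.h>

typedef struct { int value; } MyData;

/* Quick-n-dirty: the comparison's "state" lives in a global,
   which is only safe while a single thread is sorting. */
static int g_descending = 0;

static int compare_mydata(const void *a, const void *b)
{
    const MyData *x = a, *y = b;
    int diff = (x->value > y->value) - (x->value < y->value);
    return g_descending ? -diff : diff;
}

int main(void)
{
    MyData array[5] = {{3}, {1}, {4}, {1}, {5}};
    g_descending = 1;                 /* state read by the comparator */
    qsort(array, 5, sizeof(MyData), compare_mydata);
    for (int i = 0; i < 5; i++)
        printf("%d ", array[i].value);
    printf("\n");
    return 0;
}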
Take an existing implementation of qsort and update it to reference the Sorter object for its local variables. Instead of calling a compare function passed in, it would update its state and return to the caller.
Because of recursion in qsort, you'll need to keep some sort of a state stack in your Sorter object. You could accomplish that with an array or a linked list using dynamic allocation (less efficient). Since most qsort implementations use tail recursion for the larger half and make a recursive call to qsort for the smaller half of the pivot point, you can sort at least 2^n elements if your array can hold n states.
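For reference, here is a minimal sketch of the explicit range stack that replaces the recursion: a plain iterative quicksort over int with the comparison written inline in the loop. It is not the sortInit/sortIteration API from the question, just the bookkeeping such an API would need to keep inside Sorter; the names are made up.
#include <stdio.h>

#define MAX_STACK 64  /* plenty when the smaller half is always popped next */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Iterative quicksort: the recursion is replaced by an explicit stack of
   (lo, hi) ranges, and the comparison is an ordinary expression in the loop. */
static void quicksort_iterative(int *a, int n)
{
    int lo_stack[MAX_STACK], hi_stack[MAX_STACK];
    int top = 0;
    lo_stack[top] = 0;
    hi_stack[top] = n - 1;

    while (top >= 0) {
        int lo = lo_stack[top], hi = hi_stack[top];
        top--;
        if (lo >= hi)
            continue;

        /* Lomuto partition with the last element as pivot. */
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot)        /* the "custom comparison" lives here */
                swap_int(&a[i++], &a[j]);
        }
        swap_int(&a[i], &a[hi]);

        /* Push the larger half first so the smaller half is popped next,
           keeping the stack depth logarithmic. */
        if (i - lo > hi - i) {
            lo_stack[++top] = lo;     hi_stack[top] = i - 1;
            lo_stack[++top] = i + 1;  hi_stack[top] = hi;
        } else {
            lo_stack[++top] = i + 1;  hi_stack[top] = hi;
            lo_stack[++top] = lo;     hi_stack[top] = i - 1;
        }
    }
}

int main(void)
{
    int a[] = {5, 2, 9, 1, 7, 3};
    quicksort_iterative(a, 6);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}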
A simple solution is to use an inlinable sort function and an inlinable compare callback. When compiled with optimization, both calls get flattened into each other exactly like you want. The only downside is that your choice of sort algorithm is limited, because if you recurse or allocate more memory you potentially lose any benefit from doing this. Methods with small overhead, like this one, work best with small data sets.
You can also use a generic sort function parameterized by element size, key offset, and stride. That way the custom comparison is specified by parameters rather than a callback, and you can use any algorithm. Just add a few macros to fill in the most common cases, because otherwise you end up with a lot of function arguments.
Also, check out the STB library (https://github.com/nothings/stb).
It has a sorting function similar to this, among many other useful C tools.
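A minimal sketch of the inlinable-sort suggestion above, using an insertion sort so there is no recursion or allocation; the names compare_items and insertion_sort are made up. With optimization enabled, the compiler can fold the comparison straight into the sort loop:
#include <stdio.h>

typedef struct { int value; } Item;

/* Both functions are static and tiny, so an optimizing compiler can inline
   the comparison directly into the sort loop, removing the call overhead. */
static inline int compare_items(const Item *a, const Item *b)
{
    return (a->value > b->value) - (a->value < b->value);
}

static inline void insertion_sort(Item *a, int n)
{
    for (int i = 1; i < n; i++) {
        Item key = a[i];
        int j = i - 1;
        while (j >= 0 && compare_items(&a[j], &key) > 0) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int main(void)
{
    Item items[] = {{4}, {2}, {7}, {1}};
    insertion_sort(items, 4);
    for (int i = 0; i < 4; i++) printf("%d ", items[i].value);
    printf("\n");
    return 0;
}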
What you're asking for has already been done -- it's called std::sort, and it's already in the C++ standard library. Better support for this (among many other things) is part of why well-written C++ is generally faster than C.
You could write a preprocessor macro to output a sort routine, and have the macro take a comparison expression as an argument.
#define GENERATE_SORT(name, type, comparison_expression) \
void name(type* begin, type* end) \
{ /* ... when needed, fill a and b and use comparison_expression */ }
GENERATE_SORT(sort_ints, int, (*a<*b))
void foo()
{
    int array[10];
    sort_ints(array, array+10);
}
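For concreteness, here is one way the macro skeleton above might be filled in. The generated body is an insertion sort, and the convention that a and b are element pointers visible to the comparison expression is an assumption for illustration, not the original author's code:
#include <stdio.h>

/* One possible expansion: the generated routine is an insertion sort, and
   a and b are element pointers that the comparison expression may use. */
#define GENERATE_SORT(name, type, comparison_expression)   \
void name(type* begin, type* end)                          \
{                                                          \
    for (type* i = begin + 1; i < end; i++) {              \
        type key = *i;                                     \
        type* j = i;                                       \
        for (; j > begin; j--) {                           \
            type* a = &key;                                \
            type* b = j - 1;                               \
            if (comparison_expression)                     \
                *j = *(j - 1);                             \
            else                                           \
                break;                                     \
        }                                                  \
        *j = key;                                          \
    }                                                      \
}

GENERATE_SORT(sort_ints, int, (*a < *b))

int main(void)
{
    int array[10] = {9, 3, 7, 1, 8, 2, 6, 5, 0, 4};
    sort_ints(array, array + 10);
    for (int i = 0; i < 10; i++)
        printf("%d ", array[i]);
    printf("\n");
    return 0;
}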
Two points.
I). _asm
II). Basic design limits of compilers.
Compilers have, as a basic purpose, the design goal of shielding you from assembler or machine code, and they achieve this by imposing certain limits. In this case we give up a flexibility that is easy to get in assembly code: split the generated code of the sort into two pieces at the call to the compare function, copy the first half somewhere, copy the generated code of the compare function right after it, and then copy the last half of the sort code. Finally, we have to deal with a whole series of minor details. See also the concept of "hot patching" running programs.
