Using Array[Boolean] in Scala to find out progress of foreach - arrays

I have a class in Scala that has a method to perform a bunch of calculations sequentially using foreach on a list which is provided in the constructor. The class has a field val progress: Array[Boolean] = list.map(_ => false).toArray. Some of these calculations can take a long time so at the end of each one I set the appropriate index in progress to true. Then I can get progress to determine where I am in the calculations from outside the class.
This does not seem like the best approach in Scala (because I'm using a mutable data structure) so any advice to improve it would be much appreciated.

I don't think your approach is bad. The alternative is a var progress: List[Boolean]: an immutable data structure, with the variable pointing at a new immutable list after every update. You don't really gain anything: you lose the ability to reserve exactly the memory you need in a single step, and the repeated allocation is going to make this slower.
There is a reason why mutable data structures exist: they are incredibly useful and very much needed, which is also why you can still define a var instead of a val. The important point is not that one is "bad" and the other "good"; it is a matter of knowing when you can use val and sacrifice mutability in exchange for safety. In your example you just can't.
Side note: Instead of using
val progress: Array[Boolean] = list.map(_ => false).toArray
This is much clearer and faster IMHO:
val progress = Array.fill(list.size)(false)

Well, it depends on what you want to do with that information. If you are interested in specific events (e.g., 50% done or something like that), you could pass a listener into your foreach method and ask to be notified. But if you really need to inquire about the current state at any time, then ... well, if you need to know the state, then you have to keep the state, there is no way around that :)
An array of booleans seems to be overkill (you could just keep the current index instead), but you mentioned that you were planning to keep some additional info around as well, so it looks reasonable.

Related

Iterating for `setindex!`

I have some specially-defined arrays in Julia which you can think of being just a composition of many arrays. For example:
type CompositeArray{T}
x::Vector{T}
y::Vector{T}
end
with an indexing scheme
getindex(c::CompositeArray,i::Int) = i <= length(c.x) ? c.x[i] : c.y[i-length(c.x)]
I do have one caveat: the higher indexing scheme just goes to x itself:
getindex(c::CompositeArray,i::Int...) = c.x[i...]
Now the iterator through these can easily be made as the chain of the iterator on x and then on y. This makes iterating through the values have almost no extra cost. However, can something similar be done for iteration to setindex!?
I was thinking of having a separate dispatch on CartesianIndex{2} just for indexing x vs y and the index, and building an eachindex iterator for that, similar to what CatViews.jl does. However, I'm not certain how that will interact with the i... dispatch, or whether it will be useful in this case.
In addition, will broadcasting automatically use this fast iteration scheme if it's built on eachindex?
Edits:
length(c::CompositeArray) = length(c.x) + length(c.y)
In the real case, x can be any AbstractArray (and thus has a linear index), but since only the linear indexing is used (except for that one user-facing getindex function), the problem really boils down to finding out how to do this with x a Vector.
Making X[CartesianIndex(2,1)] mean something different from X[2,1] is certainly not going to end well. And I would expect similar troubles from the fact that X[100,1] may mean something different from X[100] or if length(X) != prod(size(X)). You're free to break the rules, but you shouldn't be surprised when functions in Base and other packages expect you to follow them.
The safe way to do this would be to make eachindex(::CompositeArray) return a custom iterator over objects that you control entirely. Maybe just throw a wrapper around and forward methods to CartesianRange and CartesianIndex{2} if that data structure is helpful. Then when you get one of these custom index types, you know that SplitIndex(CartesianIndex(1,2)) is indeed intending to refer to the first element in the second array.

Ruby: Hash, Arrays and Objects for storage information

I am learning Ruby, reading a few books, tutorials, forums and so on... so I am brand new to this.
I am trying to develop a stock system so I can learn by doing.
My questions are the following:
I created the following to store transactions: (just few parts of the code)
transactions.push type: "BUY", date: Date.strptime(date.to_s, '%d/%m/%Y'), quantity: quantity, price: price.to_money(:BRL), fees: fees.to_money(:BRL)
And one colleague here suggested to create a Transaction class to store this.
So, for the next storage information that I had, I did:
@dividends_from_stock << DividendsFromStock.new(row["Approved"], row["Value"], row["Type"], row["Last Day With"], row["Payment Day"])
Now, FIRST question: which way is better? Hash in Array or Object in Array? And why?
This @dividends_from_stock is returned by the method 'dividends'.
I want to find all the dividends that were paid above a specific date:
puts ciel3.dividends.find_all {|dividend| Date.parse(dividend.last_day_with) > Date.parse('12/05/2014')}
I get the following:
#<DividendsFromStock:0x2785e60>
#<DividendsFromStock:0x2785410>
#<DividendsFromStock:0x2784a68>
#<DividendsFromStock:0x27840c0>
#<DividendsFromStock:0x1ec91f8>
#<DividendsFromStock:0x2797ce0>
#<DividendsFromStock:0x2797338>
#<DividendsFromStock:0x2796990>
OK, with this I am able to spot (I think) all the objects that have a date later than 12/05/2014. But (SECOND question) how can I get the information regarding the 'value' (or other information) stored inside the objects?
Generally it is always better to define classes. Classes have names. They will help you understand what is going on when your program gets big. You can always see the class of each variable like this: var.class. If you use hashes everywhere, you will be confused because these calls will always return Hash. But if you define classes for things, you will see your class names.
Define methods in your classes that return the information you need. If you define a method called to_s, Ruby will call it behind the scenes on the object when you print it or use it in an interpolation (puts "Some #{var} here").
You probably want a first-class model of some kind to represent the concept of a trade/transaction and a list of transactions that serves as a ledger.
I'd advise steering closer to a database for this instead of manipulating toy objects in memory. Sequel can be a pretty simple ORM if used minimally, but ActiveRecord is often a lot more beginner friendly and has fewer sharp edges.
Using naked hashes or arrays is good for prototyping and seeing if something works in principle. Beyond that it's important to give things proper classes so you can relate them properly and start to refine how these things fit together.
I'd even start with TransactionHistory being a class derived from Array where you get all that functionality for free, then can go and add on custom things as necessary.
For example, you have a pretty gnarly interface to DividendsFromStock which could be cleaned up by having that format of row be accepted to the initialize function as-is.
Don't forget to write a to_s or inspect method for any custom classes you want to be able to print or have a look at. These are usually super simple to write and come in very handy when debugging.
thank you!
I will answer my question, based on the information provided by tadman and Ilya Vassilevsky (and also B. Seven).
1- It is better to create a class and use objects. It will help me organize and debug my code, and locate who is who and who is doing what. It also seems better suited to use with a DB.
2- I am a little bit ashamed of my question after figuring out the solution. It is far simpler than I thought. It just needed two steps:
willpay = ciel3.dividends.find_all {|dividend| Date.parse(dividend.last_day_with) > Date.parse('10/09/2015')}
willpay.each do |dividend|
puts "#{ciel3.code} has approved #{dividend.type} on #{dividend.approved} and will pay by #{dividend.payment_day} the value of #{dividend.value.format} per share, for those that had the asset on #{dividend.last_day_with}"
puts
end

Optional array vs. empty array in Swift

I have a simple Person class in Swift that looks about like this:
class Person {
var name = "John Doe"
var age = 18
var children: [Person]?
// init function goes here, but does not initialize children array
}
Instead of declaring children to be an optional array, I could simply declare it and initialize it as an empty array like this:
var children = [Person]()
I am trying to decide which approach is better. Declaring the array as an optional array means that it will not take up any memory at all, whereas an empty array has at least some memory allocated for it, correct? So using the optional array means that there will be at least some memory saving. I guess my first question is: Is there really any actual memory saving involved here, or are my assumptions about this incorrect?
On the other hand, if it is optional then each time I try to use it I will have to check whether it is nil before adding or removing objects from it. So there will be some loss of efficiency there (but not much, I imagine).
I kind of like the optional approach. Not every Person will have children, so why not let children be nil until the Person decides to settle down and raise a family?
At any rate, I would like to know if there are any other specific advantages or disadvantages to one approach or the other. It is a design question that will come up over and over again.
I'm going to make the opposite case from Yordi - an empty array just as clearly says "this Person has no children", and will save you a ton of hassle. children.isEmpty is an easy check for the existence of kids, and you won't ever have to unwrap or worry about an unexpected nil.
Also, as a note, declaring something as optional doesn't mean it takes zero space - it's the .None case of an Optional<Array<Person>>.
Being able to choose between an empty array and an optional lets us apply the one that better describes the data from a semantic point of view.
I would choose:
An empty array if the list can be empty, but it's a transient status and in the end it should have at least one element. Being non optional makes clear that the array should not be empty
An optional if it's possible for the list to be empty for the entire life cycle of the container entity. Being an optional makes clear that the array can be empty
Let me make some examples:
Purchase order with master and details (one detail per product): a purchase order can have 0 details, but that's a transient status, because it wouldn't make sense having a purchase order with 0 products
Person with children: a person can have no children for their entire life. It is not a transient status (although not a permanent one either), but with an optional it's clear that it's legitimate for a person to have no children.
Note that my opinion is only about making the code clearer and more self-explanatory - I don't think there is any significant difference in terms of performance, memory usage, etc. between one option and the other.
Interestingly enough, we have recently had a few discussions regarding this very same question at work.
Some suggest that there are subtle semantic differences. E.g. nil means a person has no children whatsoever, but then what does 0 mean? Does it mean "has children, the whole 0 of them"? Like I said, this is pure semantics: "has 0 children" and "has no children" make no difference when working with this model in code. In that case, why not choose the more straightforward and less guard-let-?-y approach?
Some suggest that keeping a nil there may be an indication that, for example, when fetching model from backend something went wrong and we got error instead of children. But I think model should not try to have this type of semantics and nil should not be used as indication of some error in the past.
I personally think that the model should be as dumb as possible and the dumbest option in this case is empty array.
Having an optional will make you drag that ? until the end of days and use guard let, if let or ?? over and over again.
You will have to have extra unwrapping logic for NSCoding implementation, you will have to do person.children?.count ?? 0 instead of straightforward person.children.count when you display that model in any view controller.
The final goal of all that manipulation is to display something on UI.
Would you really say
"This person has no children" and "This person has 0 children" for nil and empty array correspondingly? I hope you would not :)
Last Straw
Finally, and this is really the strongest argument I have
What is the type of subviews property of UIView: it's var subviews: [UIView] { get }
What is the type of children property of SKNode: it's var children: [SKNode] { get }
There's tons of examples like this in Cocoa framework: UIViewController::childViewControllers and more.
Even from pure Swift world: Dictionary::keys though this may be a bit far fetched.
Why is it OK for person to have nil children, but not for SKNode? For me the analogy is perfect. Hey, even the SKNode's method name is children :)
My view: there must be an obvious reason for keeping those arrays as optionals, like a really good one, otherwise empty array offers same semantics with less unwrapping.
The Last Last Straw
Finally, some references to very good articles, each of those
http://www.theswiftlearner.com/2015/05/08/empty-or-optional-arrays/
https://www.natashatherobot.com/ios-optional-vs-empty-data-source-swift/
In Natasha's post, you will find a link to NSHipster's blog post and in Swiftification paragraph you can read this:
For example, instead of marking NSArray return values as nullable, many APIs have been modified to return an empty array—semantically these have the same value (i.e., nothing), but a non-optional array is far simpler to work with
Sometimes there's a difference between something not existing and being empty.
Let's say we have an app where a user can modify a list of phone numbers and we save said modifications as modifiedPhoneNumberList. If no modification has ever occurred the array should be nil. If the user has modified the parsed numbers by deleting them all the array should be empty.
Empty means we're going to delete all the existing phone numbers, nil means we keep all the existing phone numbers. The difference matters here.
When we can't differentiate between a property being empty and not existing, or when the difference doesn't matter, empty is the way to go. If a Person were to lose their only child, we should simply be able to remove that child and be left with an empty array, rather than having to check whether the count is 1 and then set the entire array to nil.
I always use empty arrays.
In my humble opinion, the most important purpose of optionals in Swift is to safely wrap some value that may be nil. An array already acts as this type of wrapper - you can ask the array if it has anything inside & access its value(s) safely with for loops, mapping, etc. Do we need to put a wrapper within a wrapper? I don't think so.
Swift is designed to take advantage of optional values and optional unwrapping.
You could also declare the array as nil, as it will save you a very small (almost unnoticeable) amount of memory.
I would go with an optional array instead of an array that represents a nil value to keep Swift's Design Patterns happy :)
I also think
if let children = children {
}
looks nicer than :
if(children != nil){
}

Worthwhile to convert ArrayList to array for performance?

I recently discovered the ArrayList's "toArray() : Object[]" method and I wonder if I should leverage that to increase performance.
I know the ArrayList is more expensive than an Object[] array, but it is of course more convenient since it resizes automatically. But I figure maybe if I use the ArrayList to build my object list, I can turn it into an Object[] array when I do the more intense data operations.
Is it good practice to turn an ArrayList into an Object[] array after the ArrayList is finished building? Sometimes if I am looping through ArrayLists of 200K objects and comparing them against another ArrayList of objects, it takes a while. Would I benefit performance-wise?
First of all you should never make assumptions about performance. You need to measure it. I've just proven my "feeling" wrong. I was quite sure that the performance of ArrayList vs Object[] would be very much comparable. With initialCapacity set, ArrayList is just a simple wrapper on an array. And those wrapper methods are surely inlined by the JVM.
Turns out I was wrong. I wrote a simple test to get some real numbers. And on my machine (Oracle Java 7 64bit, Linux) the numbers are:
ArrayList write: 105.8ms
String[] write: 39.8ms
ArrayList read: 64.1ms
String[] read: 40.9ms
So ArrayList takes more than twice as long on set() and is over 50% slower on get().
That was without autoboxing kicking in. I also ran a test on ArrayList<Integer> vs int[]:
ArrayList<Integer> write: 2660.0ms
int[] write: 27.5ms
ArrayList<Integer> read: 59.5ms
int[] read: 20.2ms
GC did not run during the String[] test, but during the int[] test it ran about 25 times.
Nevertheless copying ArrayList to Object[] just to sort does not make sense:
14.98ms Collections.sort(list)
15.32ms Arrays.sort(list.toArray(new String[list.size()]));
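For anyone who wants to repeat the read/write comparison, here is a minimal sketch of how such a measurement could be set up. It is not the test that produced the numbers above: the class name, the 200K size (borrowed from the question), the warm-up count and the use of add() for the write step are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical micro-benchmark sketch, not the answerer's original test.
final class ListVersusArrayTiming {
    static final int SIZE = 200_000; // assumed size, taken from the question

    // Write step for the list: append into an ArrayList with a preset capacity.
    static List<String> fillList(int size) {
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add("item");
        }
        return list;
    }

    // Write step for the array: assign every slot of a plain String[].
    static String[] fillArray(int size) {
        String[] array = new String[size];
        for (int i = 0; i < size; i++) {
            array[i] = "item";
        }
        return array;
    }

    // Read step: total the string lengths so the loop has an observable result.
    static long readList(List<String> list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i).length();
        }
        return total;
    }

    static long readArray(String[] array) {
        long total = 0;
        for (int i = 0; i < array.length; i++) {
            total += array[i].length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Warm-up rounds so the JIT has a chance to compile the loops first.
        for (int round = 0; round < 20; round++) {
            readList(fillList(SIZE));
            readArray(fillArray(SIZE));
        }

        long t0 = System.nanoTime();
        List<String> list = fillList(SIZE);
        long t1 = System.nanoTime();
        String[] array = fillArray(SIZE);
        long t2 = System.nanoTime();
        long listTotal = readList(list);
        long t3 = System.nanoTime();
        long arrayTotal = readArray(array);
        long t4 = System.nanoTime();

        System.out.printf("ArrayList write: %.1fms%n", (t1 - t0) / 1e6);
        System.out.printf("String[] write: %.1fms%n", (t2 - t1) / 1e6);
        System.out.printf("ArrayList read: %.1fms%n", (t3 - t2) / 1e6);
        System.out.printf("String[] read: %.1fms%n", (t4 - t3) / 1e6);
        System.out.println("checksums: " + listTotal + " / " + arrayTotal);
    }
}
```

Timings from a hand-rolled loop like this vary a lot between runs and JVMs, so treat them as a rough indication only; a dedicated harness such as JMH is the safer choice when the decision really matters.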
To complete the previous answers: when performance matters, it is also good practice to look at the real implementation of ArrayList and analyse the particular operations you will use the most. You can look at the overhead in the get and set operations (here the overhead comes from generics, range checking and the function call). This will help you decide whether it is worth the work.
(As a general remark, do not optimize everywhere, but only at the few points where it really matters.)
It really depends on the situation. Sometimes it's better to convert it, sometimes it isn't. You have to weigh the pros and cons against each other - if you have to do a lot of operations on the object, then an ArrayList can be the best solution, especially because other people in your project might depend on an ArrayList.
Otherwise, if you know that development of the application won't be harmed by using Array[] (meaning no one else depends on it, and you don't mind doing some extra work writing your own sort etc.), then an Array[] can be the best choice.
Array[] = more performant for the program
ArrayList -> Easier to write / Saves time.
It all depends on what you want to do :)

Is it correct to return IndexedSeq instead of Array if an immutable array is needed in Scala?

A function of mine produces an array - an ordered, contiguously numbered set of records. But as far as I know, a Scala Array is a mutable collection, while the functional approach suggests it would make more sense to return an immutable collection in the general case. So I just call Array.toIndexedSeq to return an IndexedSeq instead of an Array. Can this be considered a correct thing to do? Doesn't it introduce any non-obvious behaviour which can influence the function and the use of the result, and which would probably be considered undesirable? Are there any better practices for the issue?
Can this be considered a correct thing to do?
Yes.
Doesn't it introduce any non-obvious behaviour which can influence the function and the use of the result, and which would probably be considered undesirable?
No, not that I know of.
Are there any better practices for the issue?
If possible, try to eliminate the use of array altogether, unless of course the performance is paramount.
The only possibly non-obvious thing I can think of is that array.toIndexedSeq doesn't create a simple wrapper over the array itself like Java's Collections.unmodifiable* methods do, but copies the elements into a new collection. (Otherwise later changes in the array could cause the "immutable" sequence to suddenly mutate.)
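To make the difference concrete on the Java side of that comparison, here is a small sketch contrasting a read-only wrapper with a defensive copy. The class and method names are made up for illustration; Collections.unmodifiableList and Arrays.asList are the standard library calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
final class ViewVersusCopy {
    // Read-only wrapper: no elements are copied, the list is still backed by the array.
    static List<String> readOnlyView(String[] backing) {
        return Collections.unmodifiableList(Arrays.asList(backing));
    }

    // Defensive copy: the elements are copied first, so later array writes are not seen.
    static List<String> readOnlyCopy(String[] backing) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(backing)));
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        List<String> view = readOnlyView(records);
        List<String> copy = readOnlyCopy(records);

        records[0] = "changed"; // mutate the original array afterwards

        System.out.println(view.get(0)); // prints "changed": the wrapper sees the write
        System.out.println(copy.get(0)); // prints "a": the copy is unaffected
    }
}
```

The copying variant is the one that matches the toIndexedSeq behaviour described above: it costs one pass over the elements, but the result can no longer change underneath its users.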
