Iterate through multi map in Java - loops

I have a multimap whose keys are Strings and whose values are Integers. I would like to iterate through all the Integers for each key in order to calculate their mean value, and finally store just the key and the mean value.
This is what I have written at the moment
int visits = 0;
for (String key : result.keys()) {
Object[] val = result.get(key).toArray();
for (int i=0; i<val.length; i++){
visits+=(Integer)val[i];
}
visits=visits/val.length;
result.removeAll(key);
result.put(key, visits);
}
But I'm getting this error
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1150)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at subset.calcMax.meanCalc(calcMax.java:147)
at subset.calcMax.main(calcMax.java:208)
It points to the line for (String key : result.keys()), but the error is not in this iteration itself, because if I delete what is inside the for loop it works. So my problem is in the iteration through the values stored for each key.
I would appreciate your help.
Thanks in advance!

As explained in the comments, collections throw ConcurrentModificationExceptions when modified while being iterated. It's bad practice to mutate the source collection anyway, so you're better off creating a new collection and returning that.
I would write:
ImmutableMultimap<String, Integer> computeMeanVisits(Multimap<String, Integer> multimap) {
ImmutableMultimap.Builder<String, Integer> result = ImmutableMultimap.builder();
for (String key : multimap.keySet()) {
Collection<Integer> values = multimap.get(key);
result.put(key, mean(values));
}
return result.build();
}
int mean(Collection<Integer> values) {
int sum = 0;
for (Integer value : values) {
sum += value;
}
return sum / values.size();
}
As an aside:
I don't like your use of .toArray() to iterate on the values. In Java, it's usually preferred to manipulate collections directly. Direct Array manipulations should be reserved for very specific, high performance code, or when you have to deal with bad APIs that only accept arrays. Note that in your example, using .toArray() also makes you lose genericity, forcing you to cast each value to an Integer.
You should use the .keySet() method instead of the keys() method, which returns a Multiset. When iterating over this Multiset, keys associated with multiple values will appear multiple times.
Your "visits" variable is not reset to 0 before computing a new mean, so each mean is contaminated by the previous one.

Related

Iterating through a list of mongoose documents returns 0 [duplicate]

I've been told not to use for...in with arrays in JavaScript. Why not?
The reason is that one construct:
var a = []; // Create a new empty array.
a[5] = 5; // Perfectly legal JavaScript that resizes the array.
for (var i = 0; i < a.length; i++) {
// Iterate over numeric indexes from 0 to 5, as everyone expects.
console.log(a[i]);
}
/* Will display:
undefined
undefined
undefined
undefined
undefined
5
*/
can sometimes be totally different from the other:
var a = [];
a[5] = 5;
for (var x in a) {
// Shows only the explicitly set index of "5", and ignores 0-4
console.log(x);
}
/* Will display:
5
*/
Also consider that JavaScript libraries might do things like this, which will affect any array you create:
// Somewhere deep in your JavaScript library...
Array.prototype.foo = 1;
// Now you have no idea what the below code will do.
var a = [1, 2, 3, 4, 5];
for (var x in a){
// Now foo is a part of EVERY array and
// will show up here as a value of 'x'.
console.log(x);
}
/* Will display:
0
1
2
3
4
foo
*/
The for-in statement by itself is not a "bad practice", however it can be mis-used, for example, to iterate over arrays or array-like objects.
The purpose of the for-in statement is to enumerate over object properties. This statement will go up in the prototype chain, also enumerating over inherited properties, a thing that sometimes is not desired.
Also, the order of iteration is not guaranteed by the spec., meaning that if you want to "iterate" an array object, with this statement you cannot be sure that the properties (array indexes) will be visited in the numeric order.
For example, in JScript (IE <= 8), the order of enumeration, even on Array objects, is the order in which the properties were created:
var array = [];
array[2] = 'c';
array[1] = 'b';
array[0] = 'a';
for (var p in array) {
//... p will be "2", "1" and "0" on IE
}
Also, speaking about inherited properties, if you, for example, extend the Array.prototype object (as some libraries like MooTools do), those properties will also be enumerated:
Array.prototype.last = function () { return this[this.length-1]; };
for (var p in []) { // an empty array
// last will be enumerated
}
As I said before, to iterate over arrays or array-like objects, the best thing is to use a sequential loop, such as a plain-old for/while loop.
When you want to enumerate only the own properties of an object (the ones that aren't inherited), you can use the hasOwnProperty method:
for (var prop in obj) {
if (obj.hasOwnProperty(prop)) {
// prop is not inherited
}
}
And some people even recommend calling the method directly from Object.prototype to avoid having problems if somebody adds a property named hasOwnProperty to our object:
for (var prop in obj) {
if (Object.prototype.hasOwnProperty.call(obj, prop)) {
// prop is not inherited
}
}
There are three reasons why you shouldn't use for..in to iterate over array elements:
for..in will loop over all own and inherited properties of the array object which aren't DontEnum; that means if someone adds properties to the specific array object (there are valid reasons for this - I've done so myself) or changed Array.prototype (which is considered bad practice in code which is supposed to work well with other scripts), these properties will be iterated over as well; inherited properties can be excluded by checking hasOwnProperty(), but that won't help you with properties set in the array object itself
for..in isn't guaranteed to preserve element ordering
it's slow because you have to walk all properties of the array object and its whole prototype chain and will still only get the property's name, ie to get the value, an additional lookup will be required
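To illustrate the first point, here is a minimal sketch. The array `tags`, its `source` property and the helper `ownForInKeys` are made-up names for the example. The hasOwnProperty() check filters out inherited properties, but a property set directly on the array object still comes through:

```javascript
// Hypothetical helper: collects the keys a filtered for...in loop visits.
function ownForInKeys(arr) {
  const keys = [];
  for (const key in arr) {
    // Excludes inherited properties, but not own non-index properties.
    if (Object.prototype.hasOwnProperty.call(arr, key)) {
      keys.push(key);
    }
  }
  return keys;
}

const tags = ['a', 'b'];
tags.source = 'user input'; // an own property that is not an array index

console.log(ownForInKeys(tags)); // [ '0', '1', 'source' ]
```

So if you only want the elements, the filter alone is not enough; you would also have to check that each key is an array index.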
Because for...in enumerates through the object that holds the array, not the array itself. If I add a function to the array's prototype chain, that will also be included. I.e.
Array.prototype.myOwnFunction = function() { alert(this); }
a = new Array();
a[0] = 'foo';
a[1] = 'bar';
for(x in a){
document.write(x + ' = ' + a[x]);
}
This will write:
0 = foo
1 = bar
myOwnFunction = function() { alert(this); }
And since you can never be sure that nothing will be added to the prototype chain just use a for loop to enumerate the array:
for(i=0,x=a.length;i<x;i++){
document.write(i + ' = ' + a[i]);
}
This will write:
0 = foo
1 = bar
As of 2016 (ES6) we may use for…of for array iteration, as John Slegers already noticed.
I would just like to add this simple demonstration code, to make things clearer:
Array.prototype.foo = 1;
var arr = [];
arr[5] = "xyz";
console.log("for...of:");
var count = 0;
for (var item of arr) {
console.log(count + ":", item);
count++;
}
console.log("for...in:");
count = 0;
for (var item in arr) {
console.log(count + ":", item);
count++;
}
The console shows:
for...of:
0: undefined
1: undefined
2: undefined
3: undefined
4: undefined
5: xyz
for...in:
0: 5
1: foo
In other words:
for...of counts from 0 to 5, and also ignores Array.prototype.foo. It shows array values.
for...in lists only the 5, ignoring undefined array indexes, but adding foo. It shows array property names.
Short answer: It's just not worth it.
Longer answer: It's just not worth it, even if sequential element order and optimal performance aren't required.
Long answer: It's just not worth it...
Using for (var property in array) will cause array to be iterated over as an object, traversing the object prototype chain and ultimately performing slower than an index-based for loop.
for (... in ...) is not guaranteed to return the object properties in sequential order, as one might expect.
Using hasOwnProperty() and !isNaN() checks to filter the object properties is an additional overhead causing it to perform even slower and negates the key reason for using it in the first place, i.e. because of the more concise format.
For these reasons an acceptable trade-off between performance and convenience doesn't even exist. There's really no benefit unless the intent is to handle the array as an object and perform operations on the object properties of the array.
In isolation, there is nothing wrong with using for-in on arrays. For-in iterates over the property names of an object, and in the case of an "out-of-the-box" array, the properties correspond to the array indexes. (The built-in properties like length, toString and so on are not included in the iteration.)
However, if your code (or the framework you are using) adds custom properties to arrays or to the array prototype, then these properties will be included in the iteration, which is probably not what you want.
Some JS frameworks, like Prototype, modify the Array prototype. Other frameworks, like jQuery, don't, so with jQuery you can safely use for-in.
If you are in doubt, you probably shouldn't use for-in.
An alternative way of iterating through an array is using a for-loop:
for (var ix=0;ix<arr.length;ix++) alert(ix);
However, this has a different issue. The issue is that a JavaScript array can have "holes". If you define arr as:
var arr = ["hello"];
arr[100] = "goodbye";
Then the array has two items, but a length of 101. Using for-in will yield two indexes, while the for-loop will yield 101 indexes, 99 of which have a value of undefined.
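If you want to keep the index-based loop but skip the holes, one option is to test each index with the `in` operator. This is only a sketch; `presentIndexes` and `sparse` are hypothetical names used for illustration:

```javascript
// Returns the indexes that actually exist on the array, in numeric order.
function presentIndexes(arr) {
  const found = [];
  for (let ix = 0; ix < arr.length; ix++) {
    // `ix in arr` is false for holes, true for assigned elements
    // (including elements explicitly set to undefined).
    if (ix in arr) {
      found.push(ix);
    }
  }
  return found;
}

const sparse = ['hello'];
sparse[100] = 'goodbye';

console.log(sparse.length);          // 101
console.log(presentIndexes(sparse)); // [ 0, 100 ]
```

The indexes stay numbers and are visited in order, at the cost of still walking all 101 positions.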
In addition to the reasons given in other answers, you may not want to use the "for...in" structure if you need to do math with the counter variable because the loop iterates through the names of the object's properties and so the variable is a string.
For example,
for (var i=0; i<a.length; i++) {
// parentheses are needed so that i+1 is evaluated before the string concatenation
document.write(i + ', ' + typeof i + ', ' + (i+1));
}
will write
0, number, 1
1, number, 2
...
whereas,
for (var ii in a) {
// ii is a string here, so ii+1 is a string concatenation
document.write(ii + ', ' + typeof ii + ', ' + (ii+1));
}
will write
0, string, 01
1, string, 11
...
Of course, this can easily be overcome by including
ii = parseInt(ii, 10);
in the loop, but the first structure is more direct.
Aside from the fact that for...in loops over all enumerable properties (which is not the same as "all array elements"!), see http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf, section 12.6.4 (5th edition) or 13.7.5.15 (7th edition):
The mechanics and order of enumerating the properties ... is not specified...
(Emphasis mine.)
That means if a browser wanted to, it could go through the properties in the order in which they were inserted. Or in numerical order. Or in lexical order (where "30" comes before "4"! Keep in mind all object keys -- and thus, all array indexes -- are actually strings, so that makes total sense). It could go through them by bucket, if it implemented objects as hash tables. Or take any of that and add "backwards". A browser could even iterate randomly and be ECMA-262 compliant, as long as it visited each property exactly once.
In practice, most browsers currently like to iterate in roughly the same order. But there's nothing saying they have to. That's implementation specific, and could change at any time if another way was found to be far more efficient.
Either way, for...in carries with it no connotation of order. If you care about order, be explicit about it and use a regular for loop with an index.
Mainly two reasons:
One
Like others have said, You might get keys which aren't in your array or that are inherited from the prototype. So if, let's say, a library adds a property to the Array or Object prototypes:
Array.prototype.someProperty = true
You'll get it as part of every array:
for(var item in [1,2,3]){
console.log(item) // will log the keys 0,1,2 but also "someProperty"
}
you could solve this with the hasOwnProperty method:
var ary = [1,2,3];
for(var item in ary){
if(ary.hasOwnProperty(item)){
console.log(item) // will log only the keys 0,1,2
}
}
but this is true for iterating over any object with a for-in loop.
Two
Usually the order of the items in an array is important, but the for-in loop won't necessarily iterate in the right order, that's because it treats the array as an object, which is the way it is implemented in JS, and not as an array.
This seems like a small thing, but it can really screw up applications and is hard to debug.
I don't think I have much to add to eg. Triptych's answer or CMS's answer on why using for...in should be avoided in some cases.
I would, however, like to add that in modern browsers there is an alternative to for...in that can be used in those cases where for...in can't be used. That alternative is for...of:
for (var item of items) {
console.log(item);
}
Note :
Unfortunately, no version of Internet Explorer supports for...of (Edge 12+ does), so you'll have to wait a bit longer until you can use it in your client side production code. However, it should be safe to use in your server side JS code (if you use Node.js).
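If you also need the index, for...of can be combined with the array's entries() iterator. The following is just a sketch; `indexedPairs` and `items` are names chosen for the example:

```javascript
// Builds "index = value" strings using for...of over entries().
function indexedPairs(list) {
  const pairs = [];
  for (const [index, item] of list.entries()) {
    // Unlike for...in, `index` is a number here, not a string.
    pairs.push(index + ' = ' + item);
  }
  return pairs;
}

const items = ['foo', 'bar'];
console.log(indexedPairs(items)); // [ '0 = foo', '1 = bar' ]
```

Properties added to Array.prototype are not visited, because for...of uses the array's iterator rather than enumerating property names.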
Because it enumerates through object fields, not indexes. You can get keys that are not array indexes at all (for example, properties added to the array or to Array.prototype), and I doubt you want this.
The problem with for ... in ... — and this only becomes a problem when a programmer doesn't really understand the language; it's not really a bug or anything — is that it iterates over all members of an object (well, all enumerable members, but that's a detail for now). When you want to iterate over just the indexed properties of an array, the only guaranteed way to keep things semantically consistent is to use an integer index (that is, a for (var i = 0; i < array.length; ++i) style loop).
Any object can have arbitrary properties associated with it. There would be nothing terrible about loading additional properties onto an array instance, in particular. Code that wants to see only indexed array-like properties therefore must stick to an integer index. Code that is fully aware of what for ... in does and really needs to see all properties, well then that's ok too.
TL;DR: Using the for in loop in arrays is not evil, in fact quite the opposite.
I think the for in loop is a gem of JS if used correctly in arrays. You are expected to have full control over your software and know what you are doing. Let's see the mentioned drawbacks and disprove them one by one.
It loops through inherited properties as well: First of all, any extensions to the Array.prototype should have been done by using Object.defineProperty(), and their enumerable descriptor should be set to false. Any library not doing so should not be used at all.
Properties that you add to the inheritance chain later get counted: When doing array sub-classing by Object.setPrototypeOf or by class extends, you should again use Object.defineProperty(), which by default sets the writable, enumerable and configurable property descriptors to false. Let's see an array sub-classing example here...
function Stack(...a){
var stack = new Array(...a);
Object.setPrototypeOf(stack, Stack.prototype);
return stack;
}
Stack.prototype = Object.create(Array.prototype); // now stack has full access to array methods.
Object.defineProperty(Stack.prototype,"constructor",{value:Stack}); // now Stack is a proper constructor
Object.defineProperty(Stack.prototype,"peak",{value: function(){ // add Stack "only" methods to the Stack.prototype.
return this[this.length-1];
}
});
var s = new Stack(1,2,3,4,1);
console.log(s.peak());
s[s.length] = 7;
console.log("length:",s.length);
s.push(42);
console.log(JSON.stringify(s));
console.log("length:",s.length);
for(var i in s) console.log(s[i]);
So you see.. for in loop is now safe since you cared about your code.
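The first point (extending Array.prototype without polluting for in) can be sketched in the same way. The method name `lastItem` is hypothetical, picked only for this example:

```javascript
// Add a helper to Array.prototype as a NON-enumerable property.
Object.defineProperty(Array.prototype, 'lastItem', {
  value: function () { return this[this.length - 1]; },
  enumerable: false,  // already the default for defineProperty, spelled out here
  writable: true,
  configurable: true
});

const seenKeys = [];
for (const key in [10, 20, 30]) {
  seenKeys.push(key);
}

console.log(seenKeys);                // [ '0', '1', '2' ]
console.log([10, 20, 30].lastItem()); // 30
```

Compare that with a plain assignment such as Array.prototype.lastItem = ..., which creates an enumerable property that every for in loop over every array would then visit.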
The for in loop is slow: Hell no. It's by far the fastest method of iteration if you are looping over sparse arrays, which are needed from time to time. This is one of the most important performance tricks that one should know. Let's see an example. We will loop over a sparse array.
var a = [];
a[0] = "zero";
a[10000000] = "ten million";
console.time("for loop on array a:");
for(var i=0; i < a.length; i++) a[i] && console.log(a[i]);
console.timeEnd("for loop on array a:");
console.time("for in loop on array a:");
for(var i in a) a[i] && console.log(a[i]);
console.timeEnd("for in loop on array a:");
Also, due to semantics, the way for...in treats arrays (i.e. the same as any other JavaScript object) is not aligned with other popular languages.
// C#
char[] a = new char[] {'A', 'B', 'C'};
foreach (char x in a) System.Console.Write(x); //Output: "ABC"
// Java
char[] a = {'A', 'B', 'C'};
for (char x : a) System.out.print(x); //Output: "ABC"
// PHP
$a = array('A', 'B', 'C');
foreach ($a as $x) echo $x; //Output: "ABC"
// JavaScript
var a = ['A', 'B', 'C'];
for (var x in a) document.write(x); //Output: "012"
Here are the reasons why this is (usually) a bad practice:
for...in loops iterate over all their own enumerable properties and the enumerable properties of their prototype(s). Usually in an array iteration we only want to iterate over the array itself. And even though you yourself may not add anything to the array, your libraries or framework might add something.
Example:
Array.prototype.hithere = 'hithere';
var array = [1, 2, 3];
for (let el in array){
// the hithere property will also be iterated over
console.log(el);
}
for...in loops do not guarantee a specific iteration order. Although index order is what you usually see in most modern browsers these days, there is still no 100% guarantee.
for...in loops skip array elements which have never been assigned (holes). An element that was explicitly set to undefined is still visited.
Example:
const arr = [];
arr[3] = 'foo'; // resize the array to 4
arr[4] = undefined; // add another element with value undefined to it
// iterate over the array, a for loop does show the undefined elements
for (let i = 0; i < arr.length; i++) {
console.log(arr[i]);
}
console.log('\n');
// for in skips the unassigned indexes 0-2, but still visits index 4 (explicitly set to undefined)
for (let el in arr) {
console.log(arr[el]);
}
In addition to the other problems, the "for..in" syntax is probably slower, because the index is a string, not an integer.
var a = ["a"]
for (var i in a)
alert(typeof i) // 'string'
for (var i = 0; i < a.length; i++)
alert(typeof i) // 'number'
An important aspect is that for...in only iterates over properties of an object whose enumerable property attribute is set to true. So if one attempts to iterate over an object using for...in, properties may be missed if their enumerable property attribute is false. It is quite possible to alter the enumerable property attribute for normal Array objects so that certain elements are not enumerated, though in general the property attributes tend to be applied to function properties within an object.
One can check the value of a property's enumerable property attribute by:
myobject.propertyIsEnumerable('myproperty')
Or to obtain all four property attributes:
Object.getOwnPropertyDescriptor(myobject,'myproperty')
This is a feature available in ECMAScript 5 - in earlier versions it was not possible to alter the value of the enumerable property attribute (it was always set to true).
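As a small sketch of the point about Array objects (the array `letters` is just an example name), marking one element as non-enumerable hides it from for...in even though it is still part of the array:

```javascript
const letters = ['a', 'b', 'c'];

// Only the enumerable attribute changes; value, writable and
// configurable keep their existing values.
Object.defineProperty(letters, 1, { enumerable: false });

const visited = [];
for (const key in letters) {
  visited.push(key);
}

console.log(visited);                         // [ '0', '2' ]
console.log(letters.propertyIsEnumerable(1)); // false
console.log(letters[1], letters.length);      // b 3
console.log(Object.getOwnPropertyDescriptor(letters, '1'));
// { value: 'b', writable: true, enumerable: false, configurable: true }
```

An index-based for loop would still see all three elements.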
The for/in works with two types of variables: hashtables (associative arrays) and array (non-associative).
JavaScript will automatically determine the way it passes through the items. So if you know that your array is really non-associative you can use for (var i=0; i<arrayLen; i++), and skip the auto-detection iteration.
But in my opinion, it's better to use for/in, the process required for that auto-detection is very small.
A real answer for this will depend on how the browser parsers/interpret the JavaScript code. It can change between browsers.
I can't think of any other reason not to use for/in:
//Non-associative
var arr = ['a', 'b', 'c'];
for (var i in arr)
alert(arr[i]);
//Associative
var arr = {
item1 : 'a',
item2 : 'b',
item3 : 'c'
};
for (var i in arr)
alert(arr[i]);
Because it will iterate over properties belonging to objects up the prototype chain if you're not careful.
You can use for.. in, just be sure to check each property with hasOwnProperty.
It's not necessarily bad (based on what you're doing), but in the case of arrays, if something has been added to Array.prototype, then you're going to get strange results. Where you'd expect this loop to run three times:
var arr = ['a','b','c'];
for (var key in arr) { ... }
If a function called helpfulUtilityMethod has been added to Array's prototype, then your loop would end up running four times: key would be 0, 1, 2, and helpfulUtilityMethod. If you were only expecting integers, oops.
You should use the for(var x in y) only on property lists, not on objects (as explained above).
Using the for...in loop for an array is not wrong, although I can guess why someone told you that:
1.) There is already a higher order function, or method, that has that purpose for an array, but has more functionality and leaner syntax, called 'forEach': Array.prototype.forEach(function(element, index, array) {} );
2.) Arrays always have a length, but for...in and forEach do not execute a function for indexes that have never been assigned (holes), only for the indexes that actually exist on the array. So if you only assign one value, these loops will only execute a function once. The array's length, however, is always one more than its highest assigned index, and that length could go unnoticed when using these loops.
3.) The standard for loop will execute a function as many times as you define in the parameters, and since an array is numbered, it makes more sense to define how many times you want to execute a function. Unlike the other loops, the for loop can then execute a function for every index in the array, whether the value is defined or not.
In essence, you can use any loop, but you should remember exactly how they work. Understand the conditions upon which the different loops reiterate, their separate functionalities, and realize they will be more or less appropriate for differing scenarios.
Also, it may be considered a better practice to use the forEach method than the for...in loop in general, because it is easier to write and has more functionality, so you may want to get in the habit of only using this method and standard for, but your call.
See below that the first two loops only execute the console.log statements once, while the standard for loop executes the function as many times as specified, in this case, array.length = 6.
var arr = [];
arr[5] = 'F';
for (var index in arr) {
console.log(index);
console.log(arr[index]);
console.log(arr)
}
// 5
// 'F'
// => (6) [undefined x 5, 'F']
arr.forEach(function(element, index, arr) {
console.log(index);
console.log(element);
console.log(arr);
});
// 5
// 'F'
// => Array (6) [undefined x 5, 'F']
for (var index = 0; index < arr.length; index++) {
console.log(index);
console.log(arr[index]);
console.log(arr);
};
// 0
// undefined
// => Array (6) [undefined x 5, 'F']
// 1
// undefined
// => Array (6) [undefined x 5, 'F']
// 2
// undefined
// => Array (6) [undefined x 5, 'F']
// 3
// undefined
// => Array (6) [undefined x 5, 'F']
// 4
// undefined
// => Array (6) [undefined x 5, 'F']
// 5
// 'F'
// => Array (6) [undefined x 5, 'F']
A for...in loop always enumerates the keys.
Objects properties keys are always String, even the indexed properties of an array :
var myArray = ['a', 'b', 'c', 'd'];
var total = 0
for (elem in myArray) {
total += elem
}
console.log(total); // 00123
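If you really do want to add up the indexes with for...in, each key has to be converted back to a number first. A minimal sketch, where `sumOfIndexes` and `sampleArray` are illustrative names:

```javascript
// Sums the index keys of an array, converting each key from string to number.
function sumOfIndexes(arr) {
  let total = 0;
  for (const key in arr) {
    total += Number(key); // without Number(), += would concatenate strings
  }
  return total;
}

const sampleArray = ['a', 'b', 'c', 'd'];
console.log(sumOfIndexes(sampleArray)); // 6
```

A plain index loop avoids the conversion altogether, since its counter is already a number.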
for...in is useful when working on an object in JavaScript, but not on an Array. We cannot say it's the wrong way, but it's not recommended. Look at this example below using a for...in loop:
let txt = "";
const person = {fname:"Alireza", lname:"Dezfoolian", age:35};
for (const x in person) {
txt += person[x] + " ";
}
console.log(txt); //Alireza Dezfoolian 35
OK, let's do it with Array now:
let txt = "";
const person = ["Alireza", "Dezfoolian", 35];
for (const x in person) {
txt += person[x] + " ";
}
console.log(txt); //Alireza Dezfoolian 35
As you see, the result is the same...
But let's try something, let's prototype something to Array...
Array.prototype.someoneelse = "someoneelse";
Now we create a new Array();
let txt = "";
const arr = new Array();
arr[0] = 'Alireza';
arr[1] = 'Dezfoolian';
arr[2] = 35;
for(x in arr) {
txt += arr[x] + " ";
}
console.log(txt); //Alireza Dezfoolian 35 someoneelse
You see the someoneelse!!!... We are actually looping through the properties of the new Array object in this case!
So that's one of the reasons why we need to use for..in carefully, but it's not always the case...
Since JavaScript elements are saved as standard object properties, it
is not advisable to iterate through JavaScript arrays using for...in
loops because normal elements and all enumerable properties will be
listed.
From https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Indexed_collections
Although not specifically addressed by this question, I would add that there's a very good reason not to ever use for...in with a NodeList (as one would obtain from a querySelectorAll call): it doesn't yield the returned elements themselves, and instead iterates over the NodeList's property names.
in the case of a single result, I got:
var nodes = document.querySelectorAll(selector);
nodes
▶ NodeList [a._19eb]
for (node in nodes) {console.log(node)};
VM505:1 0
VM505:1 length
VM505:1 item
VM505:1 entries
VM505:1 forEach
VM505:1 keys
VM505:1 values
which explained why my for (node in nodes) node.href = newLink; was failing.
A for in loop yields the indices as strings when traversing an array.
For example, in the code below, the second loop initialises j with i + 1, but i is the index as a string ("0", "1", etc.), and string + number in JS is a string. When JS encounters "0" + 1 it returns "01".
var maxProfit = function(prices) {
let maxProfit = 0;
for (let i in prices) {
for (let j = i + 1; j < prices.length; j++) {
console.log(prices[j] - prices[i], "i,j", i, j, typeof i, typeof j);
if ((prices[j] - prices[i]) > maxProfit) maxProfit = (prices[j] - prices[i]);
}
}
return maxProfit;
};
maxProfit([7, 1, 5, 3, 6, 4]);
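One way to avoid the problem is to use a numeric index for the outer loop as well. This is only a sketch of the fix; `maxProfitFixed` is a hypothetical name to keep it apart from the function above:

```javascript
// Same brute-force search, but both loop counters are numbers.
function maxProfitFixed(prices) {
  let best = 0;
  for (let i = 0; i < prices.length; i++) {
    // i is a number, so i + 1 is arithmetic, not string concatenation
    for (let j = i + 1; j < prices.length; j++) {
      best = Math.max(best, prices[j] - prices[i]);
    }
  }
  return best;
}

console.log(maxProfitFixed([7, 1, 5, 3, 6, 4])); // 5
```

Keeping for in and writing Number(i) + 1 in the inner loop would also work, but the index loop makes the intent clearer.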

Duplicate Int in Array , Dictionary or Set in SWIFT

Reading up on Sets and Arrays I find that a Set cannot, or is not able to store duplicate values ( Ints, Strings, etc ).
Knowing this, if we want to find out whether an array contains a duplicate Int, and one method is to convert the Array to a Set, how come we don't get an error when the Array is converted to a Set?
The methods below simply return a Bool value if the array contains duplicates.
import UIKit
func containsDuplicatesDictionary(a: [Int]) -> Bool {
var aDict = [Int : Int]()
for value in a {
if let count = aDict[value] {
aDict[value] = count + 1
return true
} else {
aDict[value] = 1
}
}
return false
}
containsDuplicatesDictionary(a: [1,2,2,4,5])
func containsDuplicatesSet(a: [Int]) -> Bool {
return Set(a).count != a.count
}
containsDuplicatesSet(a: [1,2,2,4])
In the first function, containsDuplicatesDictionary, I convert the array to a Dictionary, which of course takes a for loop as well. The Set method can be done in one line, which is really nice. But since I am new to this, I would have thought that converting the array would throw an error immediately, since there are duplicate values.
What am I missing about what happens when it's converted?
Thank you.
Set, by design is an unordered, unique collection of elements. The implementation of Set takes care of duplicate values itself, when you try to add a duplicate value, it checks whether the value is already present in the Set or not and if it is, the value is not added.
When you call the initializer of Set that takes a sequence as its input parameter (this is what you use when writing Set(a), where a is of type [Int]), under the hood the initializer adds the elements one by one, checking whether each new element is already present in the Set or not.
You could make a custom initializer method for Set that would throw an error if you would try to add a duplicate value to it, but it wouldn't really have any advantages for any users of Swift, hence the current implementation that just doesn't add the value if it is already present in the Set and doesn't throw an error. This way, you can safely and easily get rid of any duplicates in a non-unique collection of elements (such as an array).

Efficient algorithm for difference of array and a known subsequence?

I'm passing an array to a library function which returns an array which is a subsequence of the input array. That is to say the orders of the first and second array are identical but the second array may be lacking any number of elements of the first array. There will be no duplicates in either array!
I want to then build a new array of all the elements which were in the input but are not in the output of the function.
For some reason though it sounds trivial I keep getting it wrong, especially at the ends of the arrays it seems.
Example 1 (typical):
input array a:
[ yyz, ltn, tse, uln, ist, gva, doh, hhn, vlc, ios, app, tlv, lcy ]
input array b:
[ yyz, ltn, tse, uln, ist, gva, doh, hhn, vlc, tlv, lcy ]
output array "diff":
[ ios, app ]
Example 2 (minimal, reveals some bugs when the difference is at the end of the strings):
input array a:
[ usa ]
input array b:
[ ]
output array "diff":
[ usa ]
(I'm going to implement it in JavaScript / jQuery but I'm more interested in a generic algorithm in pseudocode since I'll actually be dealing with arrays of objects. So please I'm looking for algorithms which specifically use array indexing rather than pointers like I would in C/C++)
As the second array b is a subsequence of the first array a (same elements in the same order, with some missing), you can walk both in parallel, compare the current values, and take the current value of a if it is different from the current value of b:
var a = ['yyz','ltn','tse','uln','ist','gva','doh','hhn','vlc','ios','app','tlv','lcy'],
b = ['yyz','ltn','tse','uln','ist','gva','doh','hhn','vlc','tlv','lcy'],
diff = [];
var i=0, j=0, n=a.length, m=b.length;
while (i<n && j<m) {
if (a[i] !== b[j]) {
diff.push(a[i]);
} else {
j++;
}
i++;
}
while (i<n) {
diff.push(a[i++]);
}
Or if you prefer just one while loop:
// …
while (i<n) {
if (j<m && a[i] === b[j]) {
j++;
} else {
diff.push(a[i]);
}
i++;
}
In Java I would probably do something like this if I had to use arrays. You have to loop over all the objects you get back and compare them to all of those you sent in, so in the worst case you will have O(n^2) complexity, I believe. You can probably improve this by sorting the list you send in and then using pointers to check each position (but since you didn't want to use pointers I leave this sample out); then you might be able to compare in O(n).
public void doYourJob(){
Object[] allObjects = new Object[10]; //hold all original values
Object[] recivedArray = yourBlackBox(allObjects); //send in the array an gets the smaller one
Object[] missingArray = new Object[allObjects.length - recivedArray.length];
int missingCount = 0; //next free slot in missingArray
for(Object inObj : allObjects){
boolean foundObject = false;
for(Object obj : recivedArray){
if(inObj.equals(obj)){
foundObject = true;
break;
}
}
if(!foundObject)
missingArray[missingCount++] = inObj; //add the missing object
}
}
If I were allowed to use something from the Collection interface then this would be much simpler, since you can use a "myList.contains()" method.
With Lists instead
public void doYourJob(){
List<Object> allObjects = new ArrayList<Object>(); //hold all original values
List<Object> recivedArray = yourBlackBox(allObjects); //send in the array an gets the smaller one
List<Object> missingArray = new ArrayList<Object>();
for(Object inObj : allObjects){
if(!recivedArray.contains(inObj))
missingArray.add(inObj);
}
}
Do you have a guaranteed ordering imposed on your arrays? If so, it should be relatively simple to do something like:
# our inputs are array1 and array2; array2 is the one with 0 or more missing elements
ix1 = 0
ix2 = 0
diff = new array
while ix2 < length(array2)
    while (ix1 < length(array1)) and (array1[ix1] != array2[ix2])
        add array1[ix1] to diff
        ix1 = ix1 + 1
    ix1 = ix1 + 1
    ix2 = ix2 + 1
# anything left in array1 after array2 is exhausted is also missing
while ix1 < length(array1)
    add array1[ix1] to diff
    ix1 = ix1 + 1
return diff
If you do not have an ordering, you can either impose one (sort both arrays) or you can use a hash table.
hash = new hash
diff = new array
for each element in array1
    hash[element] = 1
for each element in array2
    hash[element] = hash[element] + 1
for each key in hash
    if hash[key] == 1
        add key to diff
Both of these should run in (roughly) O(n), if (and only if) adding an element to an array is O(1) (if you double the size of the result array every time it gets filled, it's at least asymptotically O(1)).
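For what it's worth, here is a minimal Java sketch of the hash table idea above. The class and method names (MissingElements, missing) are made up for illustration. Unlike the pseudocode it counts down the occurrences seen in the returned list, so it also copes with duplicate values and keeps the original order of the missing items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class MissingElements {

    // Returns the elements of 'all' that are absent from 'received'.
    // Each occurrence in 'received' cancels exactly one occurrence in 'all'.
    static <T> List<T> missing(List<T> all, List<T> received) {
        Map<T, Integer> remaining = new HashMap<>();
        for (T item : received) {
            remaining.merge(item, 1, Integer::sum); // count occurrences
        }
        List<T> diff = new ArrayList<>();
        for (T item : all) {
            Integer count = remaining.get(item);
            if (count == null || count == 0) {
                diff.add(item);                     // nothing left to cancel it
            } else {
                remaining.put(item, count - 1);     // cancel one occurrence
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("yyz", "ltn", "ios", "app", "tlv");
        List<String> b = Arrays.asList("yyz", "ltn", "tlv");
        System.out.println(missing(a, b)); // prints [ios, app]
    }
}
```

Both passes are single loops with expected O(1) map operations, so the whole thing is expected O(n + m) and does not depend on the two lists sharing an order.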

Why should I use foreach instead of for (int i=0; i<length; i++) in loops?

It seems like the cool way of looping in C# and Java is to use foreach instead of C style for loops.
Is there a reason why I should prefer this style over the C style?
I'm particularly interested in these two cases, but please address as many cases as you need to explain your points.
I wish to perform an operation on each item in a list.
I am searching for an item in a list, and wish to exit when that item is found.
Imagine that you're the head chef for a restaurant, and you're all preparing a huge omelette for a buffet. You hand a carton of a dozen eggs to each of two of the kitchen staff, and tell them to get cracking, literally.
The first one sets up a bowl, opens the crate, grabs each egg in turn - from left to right across the top row, then the bottom row - breaking it against the side of the bowl and then emptying it into the bowl. Eventually he runs out of eggs. A job well done.
The second one sets up a bowl, opens the crate, and then dashes off to get a piece of paper and a pen. He writes the numbers 0 through 11 next to the compartments of the egg carton, and the number 0 on the paper. He looks at the number on the paper, finds the compartment labelled 0, removes the egg and cracks it into the bowl. He looks at the 0 on the paper again, thinks "0 + 1 = 1", crosses out the 0 and writes 1 on the paper. He grabs the egg from compartment 1 and cracks it. And so on, until the number 12 is on the paper and he knows (without looking!) that there are no more eggs. A job well done.
You'd think the second guy was a bit messed in the head, right?
The point of working in a high-level language is to avoid having to describe things in a computer's terms, and to be able to describe them in your own terms. The higher-level the language, the more true this is. Incrementing a counter in a loop is a distraction from what you really want to do: process each element.
Further to that, linked-list type structures can't be processed efficiently by incrementing a counter and indexing in: "indexing" means starting over counting from the beginning. In C, we can process a linked list that we made ourselves by using a pointer for the loop "counter" and dereferencing it. We can do this in modern C++ (and to an extent in C# and Java) using "iterators", but this still suffers from the indirectness problem.
Finally, some languages are high-enough level that the idea of actually writing a loop to "perform an operation on each item in a list" or "search for an item in a list" is appalling (in the same way that the head chef shouldn't have to tell the first kitchen staff member how to ensure that all the eggs are cracked). Functions are provided that set up that loop structure, and you tell them - via a higher-order function, or perhaps a comparison value, in the searching case - what to do within the loop. (In fact, you can do these things in C++, although the interfaces are somewhat clumsy.)
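To illustrate that last point in Java terms, here is a rough sketch assuming Java 8 or later, where the stream API sets up the loop and you only pass in what to do per element. The class and method names (HigherOrderLoops, upperCaseAll, firstContaining) are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical example class; only the java.util.stream calls are real API.
final class HigherOrderLoops {

    // "Perform an operation on each item": map every name to upper case.
    static List<String> upperCaseAll(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // "Search for an item and stop when found": findFirst short-circuits,
    // so elements after the first match are not examined.
    static Optional<String> firstContaining(List<String> names, String fragment) {
        return names.stream()
                .filter(name -> name.contains(fragment))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Anne", "Pierre", "Jean-Pierre");
        System.out.println(upperCaseAll(names));                             // prints [ANNE, PIERRE, JEAN-PIERRE]
        System.out.println(firstContaining(names, "Pierre").orElse("none")); // prints Pierre
    }
}
```

There is no counter and no explicit loop anywhere; the "how" of the iteration is left to the library.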
Two major reasons I can think of are:
1) It abstracts away from the underlying container type. This means, for example, that you don't have to change the code that loops over all the items in the container when you change the container -- you're specifying the goal of "do this for every item in the container", not the means.
2) It eliminates the possibility of off-by-one errors.
In terms of performing an operation on each item in a list, it's intuitive to just say:
for(Item item: lst)
{
op(item);
}
It perfectly expresses the intent to the reader, as opposed to manually doing stuff with iterators. Ditto for searching for items.
foreach is simpler and more readable
It can be more efficient for constructions like linked lists
Not all collections support random access; the only way to iterate a HashSet<T> or a Dictionary<TKey, TValue>.KeyCollection is foreach.
foreach allows you to iterate through a collection returned by a method without an extra temporary variable:
foreach(var thingy in SomeMethodCall(arguments)) { ... }
One benefit for me is that it's less easy to make mistakes such as
for(int i = 0; i < maxi; i++) {
for(int j = 0; j < maxj; i++) {
...
}
}
UPDATE:
This is one way the bug happens. I make a sum
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
and then decide to aggregate it more. So I wrap the loop in another.
int total = 0;
for(int i = 0; i < maxi; i++) {
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
total += sum;
}
Compile fails, of course, so we hand edit
int total = 0;
for(int i = 0; i < maxi; i++) {
int sum = 0;
for(int j = 0; j < maxj; i++) {
sum += a[i];
}
total += sum;
}
There are now at least TWO mistakes in the code (and more if we've muddled maxi and maxj ) which will only be detected by runtime errors. And if you don't write tests... and it's a rare piece of code - this will bite someone ELSE - badly.
That is why it's a good idea to extract the inner loop into a method:
int total = 0;
for(int i = 0; i < maxi; i++) {
total += totalTime(maxj);
}
private int totalTime(int maxi) {
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
return sum;
}
and it's more readable.
foreach will perform identically to a for in all scenarios[1], including straightforward ones such as you describe.
However, foreach has certain non-performance-related advantages over for:
Convenience. You do not need to keep an extra local i around (which has no purpose in life other than facilitating the loop), and you do not need to fetch the current value into a variable yourself; the loop construct has already taken care of that.
Consistency. With foreach, you can iterate over sequences which are not arrays with the same ease. If you want to use for to loop over a non-array ordered sequence (e.g. a map/dictionary) then you have to write the code a little differently. foreach is the same in all cases it covers.
Safety. With great power comes great responsibility. Why open opportunities for bugs related to incrementing the loop variable if you don't need it in the first place?
So as we see, foreach is "better" to use in most situations.
That said, if you need the value of i for other purposes, or if you are handling a data structure that you know is an array (and there is an actual specific reason for it being an array), the increased functionality that the more down-to-the-metal for offers will be the way to go.
[1] "In all scenarios" really means "all scenarios where the collection is friendly to being iterated", which would actually be "most scenarios" (see comments below). I really think that an iteration scenario involving an iteration-unfriendly collection would have to be engineered, however.
You should probably consider also LINQ if you are targeting C# as a language, since this is another logical way to do loops.
By perform an operation on each item in a list do you mean modify it in place in the list, or simply do something with the item (e.g. print it, accumulate it, modify it, etc.)? I suspect it is the latter, since foreach in C# won't allow you to modify the collection you are looping over, or at least not in a convenient way...
Here are two simple constructs, first using for and then using foreach, which visit all strings in a list and turn them into uppercase strings:
List<string> list = ...;
List<string> uppercase = new List<string> ();
for (int i = 0; i < list.Count; i++)
{
string name = list[i];
uppercase.Add (name.ToUpper ());
}
(Note that using the end condition i < list.Count instead of i < length with some precomputed length constant is considered good practice in .NET, since the bounds have to be checked anyway when list[i] is invoked in the loop; if my understanding is correct, the compiler is able in some circumstances to optimize away the upper bound check it would normally have done.)
Here is the foreach equivalent:
List<string> list = ...;
List<string> uppercase = new List<string> ();
foreach (string name in list)
{
uppercase.Add (name.ToUpper ());
}
Note: basically, the foreach construct can iterate over any IEnumerable or IEnumerable<T> in C#, not just over arrays or lists. The number of elements in the collection might therefore not be known beforehand, or might even be infinite (in which case you certainly would have to include some termination condition in your loop, or it won't exit).
Here are a few equivalent solutions I can think of, expressed using C# LINQ (and which introduces the concept of a lambda expression, basically an inline function taking an x and returning x.ToUpper () in the following examples):
List<string> list = ...;
List<string> uppercase = new List<string> ();
uppercase.AddRange (list.Select (x => x.ToUpper ()));
Or with the uppercase list populated by its constructor:
List<string> list = ...;
List<string> uppercase = new List<string> (list.Select (x => x.ToUpper ()));
Or the same using the ToList function:
List<string> list = ...;
List<string> uppercase = list.Select (x => x.ToUpper ()).ToList ();
Or still the same with type inference:
List<string> list = ...;
var uppercase = list.Select (x => x.ToUpper ()).ToList ();
or if you don't mind getting the result as an IEnumerable<string> (an enumerable collection of strings), you could drop the ToList:
List<string> list = ...;
var uppercase = list.Select (x => x.ToUpper ());
Or maybe another one with the C# SQL-like from and select keywords, which is fully equivalent:
List<string> list = ...;
var uppercase = from name in list
select name.ToUpper ();
LINQ is very expressive and very often, I feel that the code is more readable than a plain loop.
Your second question, searching for an item in a list and exiting when that item is found, can also be implemented very conveniently using LINQ. Here is an example of a foreach loop:
List<string> list = ...;
string result = null;
foreach (string name in list)
{
if (name.Contains ("Pierre"))
{
result = name;
break;
}
}
Here is the straightforward LINQ equivalent:
List<string> list = ...;
string result = list.Where (x => x.Contains ("Pierre")).FirstOrDefault ();
or with the query syntax:
List<string> list = ...;
var results = from name in list
where name.Contains ("Pierre")
select name;
string result = results.FirstOrDefault ();
The results enumeration is only executed on demand, which means that effectively, the list will only be iterated until the condition is met, when invoking the FirstOrDefault method on it.
I hope this brings some more context to the for or foreach debate, at least in the .NET world.
As Stuart Golodetz answered, it's an abstraction.
If you're only using i as an index, as opposed to using the value of i for some other purpose like
String[] lines = getLines();
for( int i = 0 ; i < 10 ; ++i ) {
System.out.println( "line " + i + lines[i] ) ;
}
then there's no need to know the current value of i, and being able to just leads to the possibility of errors:
Line[] pages = getPages();
for( int i = 0 ; i < 10 ; ++i ) {
for( int j = 0 ; j < 10 ; ++i )
System.out.println( "page " + i + "line " + j + pages[i].getLines()[j] );
}
As Andrew Koenig says, "Abstraction is selective ignorance"; if you don't need to know the details of how you iterate some collection, then find a way to ignore those details, and you'll write more robust code.
Reasons to use foreach:
It prevents errors from creeping in (e.g. you forgot to i++ in the for loop) that could cause the loop to malfunction. There are lots of ways to screw up for loops, but not many ways to screw up foreach loops.
It looks much cleaner / less cryptic.
A for loop may not even be possible in some cases (for example, if you have an IEnumerable<T>, which cannot be indexed like an IList<T> can).
Reasons to use for:
These kinds of loops have a slight performance advantage when iterating over flat lists (arrays) because there is no extra level of indirection created by using an enumerator. (However, this performance gain is minimal.)
The object you want to enumerate does not implement IEnumerable<T> -- foreach only operates on enumerables.
Other specialized situations; for example, if you are copying from one array to another, foreach will not give you an index variable that you can use to address the destination array slot. for is about the only thing that makes sense in such cases.
The two cases you list in your question are effectively identical when using either loop -- in the first, you just iterate all the way to the end of the list, and in the second you break; once you have found the item you are looking for.
Just to explain foreach further, this loop:
IEnumerable<Something> bar = ...;
foreach (var foo in bar) {
// do stuff
}
is syntactic sugar for:
IEnumerable<Something> bar = ...;
IEnumerator<Something> e = bar.GetEnumerator();
try {
Something foo;
while (e.MoveNext()) {
foo = e.Current;
// do stuff
}
} finally {
((IDisposable)e).Dispose();
}
If you are iterating over a collection that implements IEnumerable, it is more natural to use foreach because the next member in the iteration is assigned at the same time that the test for reaching the end is done. E.g.,
foreach (string day in week) {/* Do something with the day ... */}
is more straightforward than
for (int i = 0; i < week.Length; i++) { day = week[i]; /* Use day ... */ }
You can also use a for loop in your class's own implementation of IEnumerable. Simply have your GetEnumerator() implementation use the C# yield keyword in the body of your loop:
yield return my_array[i];
Java has both of the loop types you have pointed to. You can use either of the for loop variants depending on your need. Your need can be like this
You want to return the index of your search item in the list.
You want to get the item itself.
In the first case you should use the classic (C style) for loop, but in the second case you should use the foreach loop.
The foreach loop can be used in the first case as well, but then you need to maintain your own index.
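A small sketch of both variants for the "return the index" case. The names IndexSearch, indexWithFor and indexWithForEach are invented for this example; both methods return -1 when the target is not found.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class, not from any library.
final class IndexSearch {

    // Classic for loop: the index is the loop variable itself.
    static int indexWithFor(String[] items, String target) {
        for (int i = 0; i < items.length; i++) {
            if (target.equals(items[i])) {
                return i;
            }
        }
        return -1;
    }

    // Enhanced for loop: works on any Iterable, but the index
    // has to be tracked by hand.
    static int indexWithForEach(Iterable<String> items, String target) {
        int index = 0;
        for (String item : items) {
            if (target.equals(item)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] array = {"a", "b", "c"};
        List<String> list = Arrays.asList(array);
        System.out.println(indexWithFor(array, "c"));    // prints 2
        System.out.println(indexWithForEach(list, "c")); // prints 2
    }
}
```

The hand-maintained counter in the second method is exactly the kind of extra bookkeeping that makes the classic loop the more natural choice when the index is what you are after.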
If you can do what you need with foreach then use it; if not -- for example, if you need the index variable itself for some reason -- then use for. Simple!
(And your two scenarios are equally possible with either for or foreach.)
One reason not to use foreach, at least in Java, is that it creates an iterator object which will eventually be garbage collected. So if you are trying to write code that avoids garbage collection, it is better to avoid foreach. However, I believe it is fine for plain arrays, because no iterator is created for them.
I could think of several reasons
You can't mess up indexes. Also, in a mobile environment you may not have compiler optimizations, and a poorly written for loop could do several boundary checks, whereas a for-each loop does only one.
You can't change the size of the input (add or remove elements) while iterating over it, so your code does not break that easily. If you need to filter or transform data, use other loops.
You can iterate over data structures that can't be accessed by index but can be traversed. For-each just needs you to implement the Iterable interface (Java) or IEnumerable (C#).
You can have less boilerplate. For example, when parsing XML it's the difference between DOM and StAX: the former needs an in-memory copy of the document to refer to an element, while the latter just iterates over the data (it is not as fast, but it is memory efficient).
Note that if you are searching for an item in a list with for-each, you are most likely doing it wrong. Consider using a HashMap or BiMap to skip the searching altogether.
Assuming that the programmer wants to use a for loop as a for-each using iterators, there is a common bug of skipping elements. So in that scenario for-each is safer.
for ( Iterator<T> elements = input.iterator(); elements.hasNext(); ) {
// Inside here, nothing stops the programmer from calling `elements.next();`
// more than once.
}
Talking about clean code, a foreach statement is much quicker to read than a for statement!
Linq (in C#) can do much the same, but novice developers tend to have a hard time reading them!
It looks like most items are covered... the following are some extra notes that I do not see mentioned regarding your specific questions. These are hard rules as opposed to style preferences one way or the other:
I wish to perform an operation on each item in a list
In a foreach loop, you can not change the value of the iteration variable, so if you are looking to change the value of a specific item in your list you have to use for.
It is also worth noting that the "cool" way is now to use LINQ; there are plenty of resources you can search for if you are interested.
foreach can be slower than for on implementation-heavy collections, although in my measurements the gap is a few percent, not an order of magnitude.
These are my findings.
I used the following simple profiler to test their performance:
static void Main(string[] args)
{
DateTime start = DateTime.Now;
List<string> names = new List<string>();
Enumerable.Range(1, 1000).ToList().ForEach(c => names.Add("Name = " + c.ToString()));
for (int i = 0; i < 100; i++)
{
//For the for loop. Uncomment the other when you want to profile foreach loop
//and comment this one
//for (int j = 0; j < names.Count; j++)
// Console.WriteLine(names[j]);
//for the foreach loop
foreach (string n in names)
{
Console.WriteLine(n);
}
}
DateTime end = DateTime.Now;
Console.WriteLine("Time taken = " + end.Subtract(start).TotalMilliseconds + " milli seconds");
}
And I got the following results
Time taken = 11320.73 milli seconds (for loop)
Time taken = 11742.3296 milli seconds (foreach loop)
A foreach also notifies you if the collection you're enumerating through changes (i.e. you HAD 7 items in your collection...until another operation on a separate thread removed one and now you only have 6 #_#)
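In Java the same fail-fast behaviour shows up as a ConcurrentModificationException, and it does not even need a second thread. Here is a minimal sketch (class and method names are made up) that triggers it on an ArrayList and shows one safe alternative, removeIf, available from Java 8. Note that the JDK documents this detection as best effort, so it should not be relied on for correctness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not from any library.
final class FailFastDemo {

    // Returns true if removing from the list inside a for-each loop
    // was detected by the iterator.
    static boolean removalDetected() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7));
        try {
            for (Integer n : numbers) {
                if (n == 2) {
                    numbers.remove(n); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;               // the next call to next() noticed the change
        }
    }

    // Safe alternative: let the collection do the removal itself.
    static List<Integer> withoutEvens(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeIf(n -> n % 2 == 0);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(removalDetected());                         // prints true
        System.out.println(withoutEvens(Arrays.asList(1, 2, 3, 4)));   // prints [1, 3]
    }
}
```

An indexed for loop over the same list would not complain at all; it would just silently skip the element that slid into the removed slot.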
Just wanted to add that whoever thinks that foreach gets translated into for and therefore has no performance difference is dead wrong. There are many things that happen under the hood, i.e. the enumeration of the object type which is NOT in a simple for loop. It looks more like an iterator loop:
Iterator iter = o.getIterator();
while (iter.hasNext()){
obj value = iter.next();
...do something
}
which is significantly different from a simple for loop. If you don't understand why, look up vtables. Furthermore, who knows what's in the hasNext function? For all we know it could be:
while (!haveiwastedtheprogramstimeenough){
}
now advance
Exaggeration aside, there are functions of unknown implementation and efficiency being called. Since compilers don't optimize across function boundaries, there is NO optimization happening here, just your simple vtable lookup and function call. This is not just theory; in practice, I have seen significant speedups by switching from foreach to for on the standard C# ArrayList. To be fair, it was an ArrayList with about 30,000 items, but still.

Find object matches from arrays

Let's say I have 4 arrays:
[1,3,54,4]
[54,2,3,9]
[3,2,9,54]
[54,8,4,3]
I need to get the objects (in this case integers, but they will be custom objects) that are present in (common to) all of the arrays. In the case above I would need the result to be [54,3], as those are the only two items that are in all four arrays.
Order does not matter, speed matters greatly, array sizes and the number of arrays will vary greatly.
I'm using C# 4 and ASP.NET. The arrays will be List although they could just be converted.
Thanks :)
How about:
ISet<int> intersection = new HashSet<int>(firstArray);
intersection.IntersectWith(secondArray);
intersection.IntersectWith(thirdArray);
intersection.IntersectWith(fourthArray);
Note that this should be more efficient than the more obvious:
var x = firstArray.Intersect(secondArray)
.Intersect(thirdArray)
.Intersect(fourthArray);
as the latter will create a new hash set for each method call.
Obviously with multiple arrays you'd just loop, e.g.
static ISet<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> collections)
{
using (IEnumerator<IEnumerable<T>> iterator = collections.GetEnumerator())
{
if (!iterator.MoveNext())
{
return new HashSet<T>();
}
HashSet<T> items = new HashSet<T>(iterator.Current);
while (iterator.MoveNext())
{
items.IntersectWith(iterator.Current);
}
return items;
}
}
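For anyone doing the same thing in Java, here is a rough equivalent of the loop above, using Set.retainAll in place of IntersectWith. The class and method names (IntersectAllSketch, intersectAll) are made up for illustration, and the early exit on an empty intermediate result is my own addition.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class IntersectAllSketch {

    // Returns the elements common to every collection (empty set for no input).
    static <T> Set<T> intersectAll(Iterable<? extends Collection<T>> collections) {
        Iterator<? extends Collection<T>> it = collections.iterator();
        if (!it.hasNext()) {
            return new HashSet<>();
        }
        Set<T> items = new HashSet<>(it.next());
        // Once the running intersection is empty it can never grow again.
        while (it.hasNext() && !items.isEmpty()) {
            // Wrapping in a HashSet keeps each contains() check at expected O(1).
            items.retainAll(new HashSet<>(it.next()));
        }
        return items;
    }

    public static void main(String[] args) {
        List<List<Integer>> arrays = Arrays.asList(
                Arrays.asList(1, 3, 54, 4),
                Arrays.asList(54, 2, 3, 9),
                Arrays.asList(3, 2, 9, 54),
                Arrays.asList(54, 8, 4, 3));
        System.out.println(intersectAll(arrays).size()); // prints 2 (the values 3 and 54)
    }
}
```

For custom objects this relies on equals() and hashCode() being implemented consistently, just as HashSet<T> in C# relies on Equals and GetHashCode.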

Resources