I want to loop over a slice of an array. I basically have two main options.
ar.each_with_index{|e,i|
next if i < start_ind
break if i > end_ind
foo(e)
#maybe more code...
}
Another option, which I think is more elegant, would be to run:
ar[start_ind..end_ind].each{|e|
foo(e)
#maybe more code...
}
My concern is Ruby potentially creating a huge array under the hood and doing a lot of memory allocation. Or is there something "smarter" at play that does not create a copy?
You could loop over the index values instead. It's not as elegant as your second solution, but it's economical.
(start_ind..end_ind).each do |index|
foo(ar[index])
# maybe more code
end
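For comparison, here is the same index-range idea sketched in Python. The thread is about Ruby, so treat this as an analogy rather than a statement about Ruby's behaviour; the helper names are hypothetical. `range` produces indices lazily, so no sub-list is built, and `itertools.islice` is a standard-library alternative that walks the original list without slicing it.

```python
from itertools import islice

def visit_by_index(ar, start_ind, end_ind, foo):
    """Call foo on ar[start_ind] .. ar[end_ind] (inclusive) without building a sub-list."""
    for index in range(start_ind, end_ind + 1):  # range yields indices lazily
        foo(ar[index])

def visit_with_islice(ar, start_ind, end_ind, foo):
    """Same traversal with islice, which walks the original list instead of copying it."""
    # islice still steps past the first start_ind elements one by one,
    # much like the `next if i < start_ind` version, but it stops after end_ind.
    for e in islice(ar, start_ind, end_ind + 1):
        foo(e)

ar = list(range(10))
seen = []
visit_by_index(ar, 2, 5, seen.append)
print(seen)  # [2, 3, 4, 5]
```

Both helpers touch only the elements they need to and allocate nothing proportional to the slice length.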
You may want to refer to the methods' C source code, but it takes a bit of time to read. Let me help you with that.
First: each_with_index
Its C source is tricky, but it boils down to something similar to each, which looks like this:
VALUE rb_ary_each(VALUE ary) {
long i;
RETURN_SIZED_ENUMERATOR(ary, 0, 0, ary_enum_length);
for (i=0; i<RARRAY_LEN(ary); i++) {
rb_yield(RARRAY_AREF(ary, i));
}
return ary;
}
It does not create any other array internally by itself. It simply loops through the elements and passes each one to the block you provide (the rb_yield part). What's actually inside the block that you provide is a different story.
Second: [...].each
Notice that this is actually two method calls. The second one, each, is of little interest here since it is described above. The first one is []. Logically, you would expect it to return a subarray, which has to be stored, at least temporarily.
Let's verify. The C source is rather long, but the piece of greatest importance to you is:
VALUE rb_ary_aref(int argc, const VALUE *argv, VALUE ary) {
// some code
if (argc == 2) {
beg = NUM2LONG(argv[0]);
len = NUM2LONG(argv[1]);
if (beg < 0) {
beg += RARRAY_LEN(ary);
}
return rb_ary_subseq(ary, beg, len);
}
// some more code
}
This branch actually handles a call like ar[start_ind, length] rather than ar[start_ind..end_ind]. The difference is immaterial (the range form is converted to a start and a length and ends up in the same function), but this version is easier to understand.
The thing that answers your question is "rb_ary_subseq". As you may guess from its name, or learn from its source, it does create a new array. So a new array, of length equal to or less than the original, is created under the hood.
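Python's list slicing makes the same trade-off and is easy to observe from the outside. This sketch illustrates Python semantics only, not Ruby's internals: the slice is a separate list object (holding references to the same elements), so memory proportional to its length is allocated even if you only iterate over it once.

```python
ar = [10, 20, 30, 40, 50]

sub = ar[1:4]      # slicing allocates a new list holding references to 20, 30, 40
print(sub is ar)   # False: sub is a separate list object
sub[0] = 99        # changing the slice...
print(ar)          # ...does not touch the original: [10, 20, 30, 40, 50]
```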
You might also want to consider the computational cost of the method calls, but your question is about memory.
I've been trying to make a program that picks (for example) 3 cards at random.
But I don't want my program to grab the same card twice, so it can't have duplicates, and I don't know how to do this with an image array.
String[] card = {
"Aclubs.png",
"2clubs.png",
"3clubs.png",
};
PImage[] cards = new PImage [card.length];
void setup() {
size(1000,1000);
randomCards();
drawCards();
}
int randomCards() {
int i = (round(random(0,2)));
cards[i] = loadImage(card[i]);
return i;
}
void drawCards() {
for (int g = 0; g < 12000; g = g+round((displayWidth * 0.9))/12) {
image(cards[randomCards()], 25+g, 50);
}
}
Instead of using an array, use an ArrayList. Then remove the cards you use. Here's a small example:
ArrayList<String> things = new ArrayList<String>();
things.add("cat");
things.add("dog");
things.add("lizard");
while (!things.isEmpty()) {
int index = int(random(things.size()));
String thing = things.remove(index);
println(thing);
}
Of course, this isn't the only way to do it. You could use a Java Set, you could use a data structure that holds what you've already picked, or you could store all of the options in a data structure, shuffle it, and then choose from an index that you increment. Or you could use one of the array functions in the reference to do it.
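The shuffle-then-increment option can be sketched in Python. This is a switch from the question's Processing, so read it as an illustration of the algorithm rather than drop-in code; `deal` is a hypothetical helper name, and the card file names are the ones from the question.

```python
import random

# File names taken from the question; any number of cards works the same way.
CARDS = ["Aclubs.png", "2clubs.png", "3clubs.png"]

def deal(cards, k, rng=random):
    """Return k distinct cards by shuffling a copy of the deck and walking an index."""
    if k > len(cards):
        raise ValueError("cannot deal more cards than the deck holds")
    deck = list(cards)          # copy, so the caller's list keeps its order
    rng.shuffle(deck)           # random order, each card still appears exactly once
    hand = []
    next_index = 0              # incremented instead of re-rolling a random index
    for _ in range(k):
        hand.append(deck[next_index])
        next_index += 1
    return hand

hand = deal(CARDS, 2)
print(hand)
```

Because every card appears exactly once in the shuffled deck, advancing an index can never pick the same card twice. In Python specifically, `random.sample(cards, k)` does the same job in one call.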
It's hard to answer general "how do I do this" type questions. Stack Overflow is designed for more specific "I tried X, expected Y, but got Z instead" type questions. So you really should get into the habit of trying things out first. Ask yourself how you would do this in real life, then look at the reference to see if there are any classes or functions that would help with that. Break your problem down into smaller pieces. Write down how you would do this in real life, in English. Pretend you're handing those instructions to a friend. Could they follow your instructions to accomplish the goal? When you have those instructions written out, that's an algorithm that you can start thinking about implementing in code. Staring at code you've already written won't get you very far. Then when you do get stuck, you can ask a more specific question, and it'll be a lot easier to help you.
//runs through the initial values and sets them to "NULL" and zero
for(int g =0;g<Arraysize;g++){Array1[g].word="NULL";Array1[g].usage=0;}
//struct
int Arraysize = 100;
struct HeavyWords{
string word;
int usage;
};
//runs through the txt file and checks whether the word has already been stored; if it hasn't,
//it adds it as the next element in the array of structs; if it has, it increments the usage int at that element
while (myfile >> Bookword)
{totalwords++;cout<<Bookword<<endl;
bool foundWord = false;
for(int q = 0;q<counter;q++)
{
if(Array1[q].word == Bookword)
{
Array1[q].usage++;
foundWord = true;
}
}
if(foundWord == false) {
Array1[counter].word = Bookword;
Array1[counter].usage = 1;
counter++;
//cout<<counter<<endl;
}
//double size of array when the counter reaches array size
if(counter==Arraysize)
{
HeavyWords * Array2;
Array2 = new HeavyWords[2*Arraysize];
for (int k= 0;k<Arraysize;k++)
{
Array2[k].word = Array1[k].word;
}
Arraysize = 2*Arraysize;
Arraydouble++;
HeavyWords* cursor = Array1;
Array1 = Array2;
delete [] cursor;
}
}
//I just started programming in C++, so I apologize if this code is an explosion of nonsense.
//here is my code,
//I have been racking my brain as to why it is not correctly storing the usage of each word: when I run it, it gives me the wrong number of times certain words are used
//I would really love it if someone could tell me where my logic went wrong
Immediate problem: while copying Array1 to Array2 you are not copying the usage.
Solution: copy the usage too. A statement such as Array2[k] = Array1[k] would do.
Suggestions:
You are also not breaking out of the search loop when you find a match in Array1 for the word you are looking for. Your code needlessly continues to iterate over the rest of the array when, say, a match was already found at the 10th index and you could have left the for loop.
You are reinventing the wheel. You need an expandable array, and the C++ STL has one ready-made for you: it is called vector.
Also, an array or vector does NOT look like the right choice for what you are trying to do. For each word you are doing a linear search over Array1. A map from the C++ STL would neatly AND efficiently do what you are trying to do, and your code would also be much shorter. You can look up how to code with maps. If you write some code, I can help further. Or wait; someone here will write out the entire code for you :).
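The map-based approach is sketched below in Python, where a dict plays the role of the C++ map (a Python dict is a hash table, so it is closer to std::unordered_map than to the ordered std::map). The helper name count_words is hypothetical, and the input is a string rather than a file.

```python
from collections import defaultdict

def count_words(text):
    """Count whitespace-separated words, the job a map<string, int> would do in C++."""
    usage = defaultdict(int)   # a missing word starts at 0: no foundWord flag, no linear search
    for word in text.split():  # split() on whitespace, similar to `myfile >> Bookword`
        usage[word] += 1
    return dict(usage)

counts = count_words("the cat and the hat and the bat")
print(counts["the"], counts["and"], counts["cat"])  # 3 2 1
```

Each lookup is a hash lookup rather than a scan, and there is no array to resize, so the lost-usage bug cannot happen.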
I have almost completed the code for this problem, which I'll state as follows:
Given:
Array of length 'n' (say n = 10000) declared as below,
char **records = malloc(10000*sizeof(*records));
Each records[i] is a char pointer and points to a non-empty string.
records[i] = malloc(11);
The strings are of fixed length (10 chars + '\0').
Requirement:
Return the most frequently occurring string in the above array.
But now, I am interested in obtaining a slightly less brutal algorithm than the primitive one which I have currently, which is to sift through the entire array in two for loops :(, storing strings encountered by the two loops in a temporary array of similar size ('n' - in case all are unique strings) for comparison with the next strings. The inner loop iterates from 'outer loop position + 1' to 'n'. At the same time, I have an integer array, of similar size - 'n', for counting repeat occurrences, with each i th element corresponding to the i th (unique) string in the comparison array. Then find the largest integer and use its index in the comparison array to return the most frequently occurring string.
I hope I am clear enough. I am quite ashamed of the algo myself, but it had to be done. I am sure there is a much smarter way to do this in C.
Have a great Sunday,
Cheers!
I'm not good at clever algorithms (Google, Wikipedia and Stack Overflow are good enough for me), but one solution that comes off the top of my head is to sort the array, then use a single loop to go through the entries. As long as the current string is the same as the previous one, increase a counter for that string. When done, you have a "list" of strings and their occurrence counts, which can then be sorted if needed.
In most languages, the usual approach would be to construct a hashtable, mapping strings to counts. This has O(N) complexity.
For example, in Python (you would usually use collections.Counter for this, and even this code can be made more concise with more specialised Python knowledge, but I've made it explicit for demonstration):
def most_common(strings):
    counts = {}
    for s in strings:
        if s not in counts:
            counts[s] = 0
        counts[s] += 1
    return max(counts, key=counts.get)
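For comparison, the collections.Counter version mentioned above fits on one line. This is a sketch; most_common_counter is a hypothetical name chosen so it does not clash with the explicit version. Counter orders elements with equal counts by first occurrence, so ties go to the string seen first.

```python
from collections import Counter

def most_common_counter(strings):
    """Return the most frequent string; raises IndexError if strings is empty."""
    # Counter builds the string -> count table; most_common(1) returns [(string, count)].
    return Counter(strings).most_common(1)[0][0]

print(most_common_counter(["ab", "cd", "ab", "ef", "ab", "cd"]))  # ab
```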
But C has no hashtable in its standard library (C++ does: std::unordered_map, or the older non-standard hash_map), so a sort and scan can be done instead. It's O(N log N), which is worse than optimal, but quite practical.
Here's some C (actually C99) code that implements this.
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, i.e. pointers to char pointers. */
int compare_strings(const void *s0, const void *s1) {
    return strcmp(*(const char *const *)s0, *(const char *const *)s1);
}

const char *most_common(const char **records, size_t n) {
    qsort(records, n, sizeof(records[0]), compare_strings);
    const char *best = 0; // The most common string found so far.
    size_t max = 0;       // The longest run found.
    size_t run = 0;       // The length of the current run.
    for (size_t i = 0; i < n; i++) {
        /* records[i - run] is the first string of the current run. */
        if (!strcmp(records[i], records[i - run])) {
            run += 1;
        } else {
            run = 1;
        }
        if (run > max) {
            best = records[i];
            max = run;
        }
    }
    return best;
}
It seems like the cool way of looping in C# and Java is to use foreach instead of C style for loops.
Is there a reason why I should prefer this style over the C style?
I'm particularly interested in these two cases, but please address as many cases as you need to explain your points.
I wish to perform an operation on each item in a list.
I am searching for an item in a list, and wish to exit when that item is found.
Imagine that you're the head chef for a restaurant, and you're all preparing a huge omelette for a buffet. You hand a carton of a dozen eggs to each of two of the kitchen staff, and tell them to get cracking, literally.
The first one sets up a bowl, opens the crate, grabs each egg in turn - from left to right across the top row, then the bottom row - breaking it against the side of the bowl and then emptying it into the bowl. Eventually he runs out of eggs. A job well done.
The second one sets up a bowl, opens the crate, and then dashes off to get a piece of paper and a pen. He writes the numbers 0 through 11 next to the compartments of the egg carton, and the number 0 on the paper. He looks at the number on the paper, finds the compartment labelled 0, removes the egg and cracks it into the bowl. He looks at the 0 on the paper again, thinks "0 + 1 = 1", crosses out the 0 and writes 1 on the paper. He grabs the egg from compartment 1 and cracks it. And so on, until the number 12 is on the paper and he knows (without looking!) that there are no more eggs. A job well done.
You'd think the second guy was a bit messed up in the head, right?
The point of working in a high-level language is to avoid having to describe things in a computer's terms, and to be able to describe them in your own terms. The higher-level the language, the more true this is. Incrementing a counter in a loop is a distraction from what you really want to do: process each element.
Further to that, linked-list type structures can't be processed efficiently by incrementing a counter and indexing in: "indexing" means starting over counting from the beginning. In C, we can process a linked list that we made ourselves by using a pointer for the loop "counter" and dereferencing it. We can do this in modern C++ (and to an extent in C# and Java) using "iterators", but this still suffers from the indirectness problem.
Finally, some languages are high-enough level that the idea of actually writing a loop to "perform an operation on each item in a list" or "search for an item in a list" is appalling (in the same way that the head chef shouldn't have to tell the first kitchen staff member how to ensure that all the eggs are cracked). Functions are provided that set up that loop structure, and you tell them - via a higher-order function, or perhaps a comparison value, in the searching case - what to do within the loop. (In fact, you can do these things in C++, although the interfaces are somewhat clumsy.)
Two major reasons I can think of are:
1) It abstracts away from the underlying container type. This means, for example, that you don't have to change the code that loops over all the items in the container when you change the container -- you're specifying the goal of "do this for every item in the container", not the means.
2) It eliminates the possibility of off-by-one errors.
In terms of performing an operation on each item in a list, it's intuitive to just say:
for(Item item: lst)
{
op(item);
}
It perfectly expresses the intent to the reader, as opposed to manually doing stuff with iterators. Ditto for searching for items.
foreach is simpler and more readable
It can be more efficient for constructions like linked lists
Not all collections support random access: you can't index into a HashSet<T> or a Dictionary<TKey, TValue>.KeyCollection, so foreach is the natural way to iterate them.
foreach allows you to iterate through a collection returned by a method without an extra temporary variable:
foreach(var thingy in SomeMethodCall(arguments)) { ... }
One benefit for me is that it's less easy to make mistakes such as
for(int i = 0; i < maxi; i++) {
for(int j = 0; j < maxj; i++) {
...
}
}
UPDATE:
This is one way the bug happens. I make a sum
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
and then decide to aggregate it more. So I wrap the loop in another.
int total = 0;
for(int i = 0; i < maxi; i++) {
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
total += sum;
}
Compile fails, of course, so we hand edit
int total = 0;
for(int i = 0; i < maxi; i++) {
int sum = 0;
for(int j = 0; j < maxj; i++) {
sum += a[i];
}
total += sum;
}
There are now at least TWO mistakes in the code (and more if we've muddled maxi and maxj) which will only be detected by runtime errors. And if you don't write tests... and it's a rare piece of code - this will bite someone ELSE - badly.
That is why it's a good idea to extract the inner loop into a method:
int total = 0;
for(int i = 0; i < maxi; i++) {
total += totalTime(maxj);
}
private int totalTime(int maxi) {
int sum = 0;
for(int i = 0; i < maxi; i++) {
sum += a[i];
}
return sum;
}
and it's more readable.
foreach will perform identically to a for in all scenarios[1], including straightforward ones such as you describe.
However, foreach has certain non-performance-related advantages over for:
Convenience. You do not need to keep an extra local i around (which has no purpose in life other than facilitating the loop), and you do not need to fetch the current value into a variable yourself; the loop construct has already taken care of that.
Consistency. With foreach, you can iterate over sequences which are not arrays with the same ease. If you want to use for to loop over a non-array ordered sequence (e.g. a map/dictionary) then you have to write the code a little differently. foreach is the same in all cases it covers.
Safety. With great power comes great responsibility. Why open opportunities for bugs related to incrementing the loop variable if you don't need it in the first place?
So as we see, foreach is "better" to use in most situations.
That said, if you need the value of i for other purposes, or if you are handling a data structure that you know is an array (and there is an actual specific reason for it being an array), the increased functionality that the more down-to-the-metal for offers will be the way to go.
[1] "In all scenarios" really means "all scenarios where the collection is friendly to being iterated", which would actually be "most scenarios" (see comments below). I really think that an iteration scenario involving an iteration-unfriendly collection would have to be engineered, however.
You should probably consider also LINQ if you are targeting C# as a language, since this is another logical way to do loops.
By perform an operation on each item in a list do you mean modify it in place in the list, or simply do something with the item (e.g. print it, accumulate it, modify it, etc.)? I suspect it is the latter, since foreach in C# won't allow you to modify the collection you are looping over, or at least not in a convenient way...
Here are two simple constructs, first using for and then using foreach, which visit all strings in a list and turn them into uppercase strings:
List<string> list = ...;
List<string> uppercase = new List<string> ();
for (int i = 0; i < list.Count; i++)
{
string name = list[i];
uppercase.Add (name.ToUpper ());
}
(Note that using the end condition i < list.Count instead of i < length with some precomputed length constant is considered good practice in .NET: the bound is checked anyway when list[i] is invoked in the loop, and, if my understanding is correct, the JIT compiler can in some circumstances optimize away the upper-bound check it would normally have done.)
Here is the foreach equivalent:
List<string> list = ...;
List<string> uppercase = new List<string> ();
foreach (string name in list)
{
uppercase.Add (name.ToUpper ());
}
Note: basically, the foreach construct can iterate over any IEnumerable or IEnumerable<T> in C#, not just over arrays or lists. The number of elements in the collection might therefore not be known beforehand, or might even be infinite (in which case you certainly would have to include some termination condition in your loop, or it won't exit).
Here are a few equivalent solutions I can think of, expressed using C# LINQ (and which introduces the concept of a lambda expression, basically an inline function taking an x and returning x.ToUpper () in the following examples):
List<string> list = ...;
List<string> uppercase = new List<string> ();
uppercase.AddRange (list.Select (x => x.ToUpper ()));
Or with the uppercase list populated by its constructor:
List<string> list = ...;
List<string> uppercase = new List<string> (list.Select (x => x.ToUpper ()));
Or the same using the ToList function:
List<string> list = ...;
List<string> uppercase = list.Select (x => x.ToUpper ()).ToList ();
Or still the same with type inference:
List<string> list = ...;
var uppercase = list.Select (x => x.ToUpper ()).ToList ();
or if you don't mind getting the result as an IEnumerable<string> (an enumerable collection of strings), you could drop the ToList:
List<string> list = ...;
var uppercase = list.Select (x => x.ToUpper ());
Or maybe another one with the C# SQL-like from and select keywords, which is fully equivalent:
List<string> list = ...;
var uppercase = from name in list
select name.ToUpper ();
LINQ is very expressive and very often, I feel that the code is more readable than a plain loop.
Your second question, searching for an item in a list and exiting when that item is found, can also be implemented very conveniently using LINQ. Here is an example of a foreach loop:
List<string> list = ...;
string result = null;
foreach (string name in list)
{
if (name.Contains ("Pierre"))
{
result = name;
break;
}
}
Here is the straightforward LINQ equivalent:
List<string> list = ...;
string result = list.Where (x => x.Contains ("Pierre")).FirstOrDefault ();
or with the query syntax:
List<string> list = ...;
var results = from name in list
where name.Contains ("Pierre")
select name;
string result = results.FirstOrDefault ();
The results enumeration is only executed on demand, which means that effectively, the list will only be iterated until the condition is met, when invoking the FirstOrDefault method on it.
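The same short-circuiting behaviour can be sketched with a Python generator expression, which, like a LINQ query, is evaluated lazily. This is a Python analogy, not LINQ; first_or_default and contains_pierre are hypothetical names, and the names list is made up for the example.

```python
def first_or_default(items, predicate, default=None):
    """Return the first item for which predicate is true, or default if none is."""
    # The generator expression is lazy: next() pulls items only until the first match.
    return next((x for x in items if predicate(x)), default)

checked = []

def contains_pierre(name):
    checked.append(name)            # record which names were actually examined
    return "Pierre" in name

names = ["Anne", "Pierre Dupont", "Marie", "Pierre Martin"]
print(first_or_default(names, contains_pierre))  # Pierre Dupont
print(checked)                                   # ['Anne', 'Pierre Dupont']
```

The `checked` list shows that the search stopped at the first match and never looked at "Marie" or "Pierre Martin".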
I hope this brings some more context to the for or foreach debate, at least in the .NET world.
As Stuart Golodetz answered, it's an abstraction.
If you're only using i as an index, as opposed to using the value of i for some other purpose like
String[] lines = getLines();
for( int i = 0 ; i < 10 ; ++i ) {
System.out.println( "line " + i + lines[i] ) ;
}
then there's no need to know the current value of i, and being able to just leads to the possibility of errors:
Page[] pages = getPages();
for( int i = 0 ; i < 10 ; ++i ) {
for( int j = 0 ; j < 10 ; ++i ) // bug: increments i, not j
System.out.println( "page " + i + " line " + j + pages[i].getLines()[j] );
}
As Andrew Koenig says, "Abstraction is selective ignorance"; if you don't need to know the details of how you iterate some collection, then find a way to ignore those details, and you'll write more robust code.
Reasons to use foreach:
It prevents errors from creeping in (e.g. you forgot to i++ in the for loop) that could cause the loop to malfunction. There are lots of ways to screw up for loops, but not many ways to screw up foreach loops.
It looks much cleaner / less cryptic.
A for loop may not even be possible in some cases (for example, if you have an IEnumerable<T>, which cannot be indexed like an IList<T> can).
Reasons to use for:
These kinds of loops have a slight performance advantage when iterating over flat lists (arrays) because there is no extra level of indirection created by using an enumerator. (However, this performance gain is minimal.)
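The object you want to enumerate does not implement IEnumerable<T>: foreach only operates on enumerables (strictly, in C#, on types that implement IEnumerable/IEnumerable<T> or expose a suitable GetEnumerator() method).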
Other specialized situations; for example, if you are copying from one array to another, foreach will not give you an index variable that you can use to address the destination array slot. for is about the only thing that makes sense in such cases.
The two cases you list in your question are effectively identical when using either loop -- in the first, you just iterate all the way to the end of the list, and in the second you break; once you have found the item you are looking for.
Just to explain foreach further, this loop:
IEnumerable<Something> bar = ...;
foreach (var foo in bar) {
// do stuff
}
is syntactic sugar for:
IEnumerable<Something> bar = ...;
IEnumerator<Something> e = bar.GetEnumerator();
try {
while (e.MoveNext()) {
Something foo = e.Current; // since C# 5, foo is scoped to each iteration
// do stuff
}
} finally {
((IDisposable)e).Dispose();
}
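Python's for statement is built on the same protocol under different names. A simplified sketch (manual_for is a hypothetical name, and it ignores the loop's optional else clause):

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does under the hood."""
    iterator = iter(iterable)        # like GetEnumerator()
    while True:
        try:
            item = next(iterator)    # like MoveNext() and Current combined
        except StopIteration:        # the iterator signals "no more items"
            break
        body(item)

collected = []
manual_for({"a": 1, "b": 2}, collected.append)
print(collected)  # ['a', 'b']
```

As in C#, the loop never asks for a length or an index; it only asks the iterator for the next item until it reports that there are none.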
If you are iterating over a collection that implements IEnumerable, it is more natural to use foreach because the next member in the iteration is assigned at the same time that the test for reaching the end is done. E.g.,
foreach (string day in week) {/* Do something with the day ... */}
is more straightforward than
for (int i = 0; i < week.Length; i++) { string day = week[i]; /* Use day ... */ }
You can also use a for loop in your class's own implementation of IEnumerable. Simply have your GetEnumerator() implementation use the C# yield keyword in the body of your loop:
yield return my_array[i];
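A comparable pattern in Python: a class whose `__iter__` is a generator can use an index-based loop internally while its callers only ever write a for-each loop. Week is a hypothetical class for illustration.

```python
class Week:
    """A hypothetical collection that exposes its items through a generator."""

    def __init__(self, days):
        self._days = list(days)

    def __iter__(self):
        # An index-based loop inside; callers simply write `for day in week`.
        for i in range(len(self._days)):
            yield self._days[i]

for day in Week(["Mon", "Tue", "Wed"]):
    print(day)
```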
Java has both of the loop types you mention, and you can use either depending on your needs. For example, you might want to:
return the index of the item you are searching for in the list, or
get the item itself.
In the first case you should use the classic (C-style) for loop, but in the second case you should use the foreach loop.
The foreach loop can be used in the first case too, but then you need to maintain your own index.
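Some languages remove the need to maintain that index yourself. In Python, enumerate pairs each item with its position while keeping the for-each style. index_of is a hypothetical helper, and returning -1 is a convention chosen here (Python's own list.index raises ValueError instead).

```python
def index_of(items, target):
    """Return the index of the first item equal to target, or -1 if there is none."""
    for index, item in enumerate(items):  # enumerate keeps the index for you
        if item == target:
            return index                   # leaving early, like `break`
    return -1

print(index_of(["red", "green", "blue"], "blue"))  # 2
```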
If you can do what you need with foreach then use it; if not -- for example, if you need the index variable itself for some reason -- then use for. Simple!
(And your two scenarios are equally possible with either for or foreach.)
One reason not to use foreach, at least in Java, is that it creates an Iterator object that will eventually be garbage collected. So if you are trying to write code that avoids garbage collection, it is better to avoid foreach. However, I believe it is fine for plain arrays, because there it doesn't create an iterator.
I can think of several reasons:
You can't mess up indexes. Also, in a mobile environment you may not have compiler optimizations, and a poorly written for loop could do several bounds checks, whereas a for-each loop does only one.
You can't change the size of the input (add or remove elements) while iterating over it, so your code doesn't break as easily. If you need to filter or transform data, use other loops.
You can iterate over data structures that can't be accessed by index but can be traversed. For-each just needs you to implement the Iterable interface (Java) or IEnumerable (C#).
You can have less boilerplate. For example, when parsing XML it's the difference between DOM and StAX: the former needs an in-memory copy of the whole document to refer to an element, while the latter just iterates over the data (it is not as fast, but it is memory efficient).
Note that if you are searching for an item in a list with for-each, you are most likely doing it wrong. Consider using a hash map or a bimap to skip the searching altogether.
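A minimal sketch of the hash-map idea in Python: pay for one pass to build a dict, after which each lookup takes O(1) time on average instead of a linear scan. It only pays off when you search the same collection repeatedly; build_index and the sample data are hypothetical.

```python
def build_index(items):
    """Map each item to the position of its first occurrence, in one O(n) pass."""
    positions = {}
    for i, item in enumerate(items):
        positions.setdefault(item, i)  # keep the first position if an item repeats
    return positions

users = ["ann", "bob", "cid", "bob"]
index = build_index(users)
print(index["bob"])     # 1, found by hashing instead of scanning
print("dan" in index)   # False
```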
Assuming the programmer wants to use a for loop as a for-each by driving an iterator by hand, there is a common bug of skipping elements, so in that scenario for-each is safer.
for ( Iterator<T> elements = input.iterator(); elements.hasNext(); ) {
// Inside here, nothing stops the programmer from calling `elements.next();`
// more than once, which silently skips elements.
}
Talking about clean code, a foreach statement is much quicker to read than a for statement!
LINQ (in C#) can do much the same, but novice developers tend to have a hard time reading it!
It looks like most items are covered... the following are some extra notes that I do not see mentioned regarding your specific questions. These are hard rules as opposed to style preferences one way or the other:
I wish to perform an operation on each item in a list
In a foreach loop, you cannot change the value of the iteration variable, so if you are looking to change the value of a specific item in your list you have to use for.
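The same pitfall exists in other languages in a quieter form. Python, for instance, lets you assign to the loop variable (C# rejects this at compile time), but the assignment only rebinds a local name and never reaches the list; updating elements in place needs an index:

```python
prices = [10, 20, 30]

for price in prices:
    price = price * 2           # only rebinds the local name; prices is unchanged

print(prices)                   # [10, 20, 30]

for i in range(len(prices)):
    prices[i] = prices[i] * 2   # writes through the index, so the list changes

print(prices)                   # [20, 40, 60]
```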
It is also worth noting that the "cool" way is now to use LINQ; there are plenty of resources you can search for if you are interested.
foreach can be measurably slower for implementation-heavy collections.
Here is what I found.
I used the following simple profiler to test their performance
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
    static void Main(string[] args)
    {
        DateTime start = DateTime.Now;
        List<string> names = new List<string>();
        Enumerable.Range(1, 1000).ToList().ForEach(c => names.Add("Name = " + c.ToString()));
        for (int i = 0; i < 100; i++)
        {
            // To profile the for loop, uncomment these two lines
            // and comment out the foreach loop below.
            //for (int j = 0; j < names.Count; j++)
            //    Console.WriteLine(names[j]);
            // The foreach loop being profiled.
            foreach (string n in names)
            {
                Console.WriteLine(n);
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine("Time taken = " + end.Subtract(start).TotalMilliseconds + " milli seconds");
    }
}
And I got the following results
Time taken = 11320.73 milli seconds (for loop)
Time taken = 11742.3296 milli seconds (foreach loop)
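One caveat about the measurement above: every iteration calls Console.WriteLine, so the timing is likely dominated by console output rather than by the loop mechanics. Below is a sketch of a fairer harness, written in Python with timeit purely for illustration; it says nothing about .NET performance, and no results are claimed here. It keeps I/O out of the timed region and repeats the work many times. The function names are hypothetical.

```python
import timeit

names = [f"Name = {c}" for c in range(1, 1001)]

def index_loop():
    total = 0
    for j in range(len(names)):
        total += len(names[j])    # access by index
    return total

def direct_loop():
    total = 0
    for name in names:
        total += len(name)        # iterate the elements directly
    return total

for fn in (index_loop, direct_loop):
    # No printing inside the timed code, so I/O does not swamp the loop cost.
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.3f} s for 1000 runs")
```

Both loops compute the same total, so any difference in the printed times comes from the iteration style, not from the work done.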
For the standard collections such as List<T>, foreach also tells you if the collection you're enumerating changes: the enumerator throws an InvalidOperationException (e.g. you HAD 7 items in your collection until another operation on a separate thread removed one, and now you only have 6).
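Python has a similar safeguard for dicts and sets: changing their size while iterating over them raises RuntimeError (plain lists do not check this). A minimal sketch with hypothetical function names and data, including the usual fix of iterating over a snapshot of the keys:

```python
def remove_empty_unsafely(stock):
    for fruit in stock:
        if stock[fruit] == 0:
            del stock[fruit]        # mutates the dict while it is being iterated

def remove_empty_safely(stock):
    for fruit in list(stock):       # iterate over a snapshot of the keys instead
        if stock[fruit] == 0:
            del stock[fruit]

try:
    remove_empty_unsafely({"apples": 3, "pears": 0, "plums": 5})
except RuntimeError as err:
    print("caught:", err)           # dictionary changed size during iteration

stock = {"apples": 3, "pears": 0, "plums": 5}
remove_empty_safely(stock)
print(stock)                        # {'apples': 3, 'plums': 5}
```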
Just wanted to add that whoever thinks that foreach gets translated into for and therefore has no performance difference is dead wrong. There are many things that happen under the hood, i.e. the enumeration of the object type which is NOT in a simple for loop. It looks more like an iterator loop:
Iterator<Object> iter = o.iterator();
while (iter.hasNext()) {
Object value = iter.next();
// ...do something
}
which is significantly different from a simple for loop. If you don't understand why, look up vtables. Furthermore, who knows what's in the hasNext function? For all we know it could be:
while (!haveIWastedTheProgramsTimeEnough()) {
}
// now advance
Exaggeration aside, you are calling functions of unknown implementation and efficiency. Unless the compiler or JIT can see through those calls (interface and virtual calls are often hard to inline), little optimization happens across them: you just get a vtable lookup and a function call. This is not just theory: in practice, I have seen significant speedups by switching from foreach to for on the standard C# ArrayList. To be fair, it was an ArrayList with about 30,000 items, but still.