Why do most/all languages not have a multi-loop break function?

I haven't seen any programming language that has a simple function to break out of nested loops. I'm wondering:
Is there something at a low level making this impossible?
Here is an example of what I mean (C#):
while (true)
{
    foreach (String s in myStrings)
    {
        break 2; // This would break out of both the foreach and while
    }
}
Just break; would be like break 1;, and take you out of only the innermost loop.
Some languages have messy alternatives like goto, which work but are not as elegant as the above proposed solution.

No, there's nothing at a low level that would prevent creating something like what you propose. Plenty of languages are themselves implemented in a high-level language, which would certainly allow for this kind of behaviour.
There are, however, several language-design considerations that speak against this sort of construct, at least from my point of view.
The main argument is that it allows a very complex structure in something that is probably already overly complex. Nested loops are inherently hard to follow.
Adding your construct could make them even more complex.
Unless you consider a return statement a sort of break, of course :)
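To illustrate the remark about return acting as a multi-level break, here is a minimal C sketch. The function name find_first and the flat row-major grid layout are assumptions made for the example, not something from the question.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 and stores the position of the first cell
   equal to `target`, or returns 0 if the grid does not contain it.
   The grid is assumed to be stored row by row in one flat array. */
int find_first(const int *grid, size_t rows, size_t cols, int target,
               size_t *out_row, size_t *out_col)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_row = r;
                *out_col = c;
                return 1;   /* leaves both loops at once */
            }
        }
    }
    return 0;
}
```

Because the loops live in their own function, the early return does the job that "break 2" would do, and the caller never needs to know how deeply the loops are nested.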

Perl does have something very similar to this feature. Instead of the number of nestings, you use a label much like goto.
FOREVER: while (1) {
    for my $string (@strings) {
        last FOREVER;
    }
}
The feature is intended to remove ambiguities when using loop controls in deeply nested loops. Using a label improves readability and it protects you should your level of nesting change. It reduces the amount of knowledge the inner loops have about the outer loops, though they still have knowledge.
The nesting is also non-lexical: it will cross subroutine boundaries. This is where it gets weird and more goto-like.
FOREVER: while (1) {
    for my $string (@strings) {
        do_something();
    }
}
sub do_something {
    last FOREVER;
}
This is not considered a good feature for all the reasons @Codor lays out in their answer. This sort of feature encourages very complex logic. You're nearly always better off restructuring deeply nested loops into multiple subroutine calls.
sub forever {
    while (1) {
        for my $string (@strings) {
            return;
        }
    }
}
What you're asking for is, essentially, a restricted goto. It carries with it most of the same arguments for and against.
while (1) {
    for my $string (@strings) {
        goto FOREVER;
    }
}
FOREVER: print "I have escaped!\n";
The idea of saying "break out of the nth nested loop" is worse than a goto from a structural perspective. It means inner code has intimate knowledge of its surroundings. Should the nesting change, all of the inner loop controls may break. It creates a barrier to refactoring. For example, should you want to perform an extract method refactoring on the inner loop...
while (1) {
    twiddle_strings(@strings);
}
sub twiddle_strings {
    for my $string (@strings) {
        last 2;
    }
}
Now the code controlling the outer loop is in a completely different function from the inner loop. What if the outer loop changes?
while (1) {
    while (wait_for_lock) {
        twiddle_strings(@strings);
    }
}
sub twiddle_strings {
    for my $string (@strings) {
        last 2;
    }
}

PHP has had it since version 4. IMHO it's not very good: it's quite easy to abuse, especially when you add or remove levels of iteration or change the logic inside the loops. Refactoring or optimization usually begins with an overview of the iterations and an attempt to reduce cycles to conserve CPU usage. During this kind of optimization it's easy to miss a continue level, and finding the resulting bug is not an easy task when multiple people are working on a project.
I'd prefer goto any time, since it's (usually) much safer, unless you know exactly what you are doing. Also, goto brings BASIC to mind.

Although this is more speculation, there are some arguments both for and against such a possibility.
Pro:
Might be very elegant in some cases.
Con:
Might result in a temptation to write deeply nested loops, which is seen as undesirable by some.
The desired behaviour can be implemented with goto.
The desired behaviour can be implemented with auxiliary variables.
Nested loops can in most cases be refactored to use just one loop.
The desired behaviour can be implemented using exception handling (although, on the other hand, controlling the expected flow of execution is not the primary task of exception handling).
That being said, I consider responsible usage of goto to be legitimate.
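Two of the alternatives listed above, the auxiliary variable and goto, can be sketched in C as follows. The function names and the task (counting cells until the first negative value) are assumptions chosen only to make the comparison concrete.

```c
#include <stddef.h>

/* Variant 1: an auxiliary flag that both loop conditions check. */
int count_until_negative_flag(const int *grid, size_t rows, size_t cols)
{
    int visited = 0;
    int done = 0;
    for (size_t r = 0; r < rows && !done; r++) {
        for (size_t c = 0; c < cols && !done; c++) {
            if (grid[r * cols + c] < 0)
                done = 1;       /* stops the inner and the outer loop */
            else
                visited++;
        }
    }
    return visited;
}

/* Variant 2: a forward goto to a label placed right after the outer loop. */
int count_until_negative_goto(const int *grid, size_t rows, size_t cols)
{
    int visited = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] < 0)
                goto out;
            visited++;
        }
    }
out:
    return visited;
}
```

Both variants return the same count. The flag version pays for it with an extra test in every loop condition, while the goto version keeps the loop headers plain at the cost of a label.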

Related

Swift: Do For-In Loop Parameters Optimize Away Needless Accesses?

Context:
Suppose I want to enumerate the NSTableColumn objects of an NSTableView and I do this:
for column: NSTableColumn in someTableView.tableColumns
{
[...]
}
In Objective-C, it was a very well-known performance optimization to avoid unnecessary selector calls in loop parameters, so we'd factor that out like this:
NSArray *columns = [someTableView tableColumns];
for (NSTableColumn *column in columns)
{
[...]
}
Question:
Is this still needed in Swift, or will the compiler now factor out the repetitive access during optimization? Since Array is now a value type, if each loop iteration involved re-creating it, that would seem expensive.
(Apologies if it's been asked before; it's a really hard thing to Google.)

How to manage a large number of variables in C?

In an implementation of the Game of Life, I need to handle user events, perform some regular (as in periodic) processing and draw to a 2D canvas. The details are not particularly important. Suffice it to say that I need to keep track of a large(-ish) number of variables. These are things like: a structure representing the state of the system (live cells), pointers to structures provided by the graphics library, current zoom level, coordinates of the origin and I am sure a few others.
In the main function, there is a game loop like this:
// Setup stuff
while (!finished) {
    while (get_event(&e) != 0) {
        if (e.type == KEYBOARD_EVENT) {
            switch (e.key.keysym) {
            case q:
            case x:
                // More branching and nesting follows
The maximum level of nesting at the moment is 5. It quickly becomes unmanageable and difficult to read, especially on a small screen. The solution then is to split this up into multiple functions. Something like:
while (!finished) {
    while (get_event(&e) != 0) {
        handle_event(state, origin_x, origin_y, &canvas, e...) // More parameters
This is the crux of the question. The subroutine must necessarily have access to the state (represented by the origin, the canvas, the live cells etc.) in order to function. Passing them all explicitly is error prone (which order does the subroutine expect them in?) and can also be difficult to read. Aside from that, having functions with potentially 10+ arguments strikes me as a symptom of other design flaws. However, the alternatives that I can think of don't seem any better.
To summarise:
Accept deep nesting in the game loop.
Define functions with very many arguments.
Collate (somewhat) related arguments into structs - This really only hides the problem, especially since the arguments are only loosely related.
Define variables that represent the application state with file scope (static int origin_x; for example). If it weren't for the fact that it has been drummed into me that global variables are usually a terrible idea, this would be my preferred option. But if I want to display two views of the same instance of the Game of Life in the future, then the file scope no longer looks so appealing.
The question also applies in slightly more general terms I suppose: How do you pass state around a complicated program safely and in a readable way?
EDIT:
My motivations here are not speed or efficiency or performance or anything like this. If the code takes 20% longer to run as a result of the choice made here, that's just fine. I'm primarily interested in what is least likely to confuse me and will cause the least headache in 6 months' time.
I would consider the canvas as one variable, containing a large 2D array.
Consider static allocation:
bool canvas[ROWS][COLS];
or dynamic allocation, with N rows and M columns:
bool (*canvas)[M] = malloc(N * sizeof *canvas);
In both cases you can refer to the cell at position i,j as canvas[i][j]. (With a plain bool * pointing at N*M elements you would have to write canvas[i*M + j] instead.)
For dynamic allocation, do not forget to free(canvas) at the end. You can then use a nested loop to update your state.
Search for examples or tutorials on allocating and handling a 2D array in C, for example https://www.geeksforgeeks.org/nested-loops-in-c-with-examples/
Also consider this: Fastest way to zero out a 2d array in C?
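As a complement to the 2D-array suggestion, here is a minimal sketch of a grid wrapped in a single struct, so that one pointer can be passed around instead of many loose arguments. The type life_grid and its functions are hypothetical names, and the flat row-major buffer is just one possible layout.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state bundle; field and function names are illustrative. */
typedef struct {
    size_t rows, cols;
    bool *cells;            /* rows * cols entries, stored row by row */
} life_grid;

/* Allocates a grid with every cell dead; returns false if allocation fails. */
bool life_grid_init(life_grid *g, size_t rows, size_t cols)
{
    g->cells = calloc(rows * cols, sizeof *g->cells);
    if (g->cells == NULL)
        return false;
    g->rows = rows;
    g->cols = cols;
    return true;
}

bool life_grid_get(const life_grid *g, size_t r, size_t c)
{
    return g->cells[r * g->cols + c];
}

void life_grid_set(life_grid *g, size_t r, size_t c, bool alive)
{
    g->cells[r * g->cols + c] = alive;
}

/* Releases the buffer and clears the pointer to guard against reuse. */
void life_grid_free(life_grid *g)
{
    free(g->cells);
    g->cells = NULL;
}
```

A handler could then take a life_grid * (or a larger struct that contains one) rather than a long parameter list, and two views of the same game would simply share the same pointer.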

Using "using" statements for every object implementing IDisposable?

I'm currently skimming through some code that reads Active Directory entries and manipulates them. Since I haven't dealt with this kind of stuff before, I F12'd the classes (DirectoryEntry, SearchResultCollection, ...), and I found out they all implement the IDisposable interface, but I couldn't see any using blocks in our code.
Are those even necessary in this case (i.e., should I blindly refactor them in)?
Another question of mine regarding this (there are very many instantiated IDisposable objects in the code): Isn't IDisposable making stuff very "ugly" in this case? I mean, I like using statements as they basically free my mind from worrying about things, but in many cases the code has a layout similar to the following:
using (var one = myObject.GetSomeDisposableObject())
using (var two = myObject.GetSomeOtherDisposableObject())
{
    one.DoSomething();
    using (var foo = new DisposableFoo())
    {
        MyMethod(foo);
        using (...)
        using (...)
        {
            ...
        }
    }
}
I feel that this becomes quite unreadable due to high indentation levels (even stacking the using statements). But extracting some of this into new methods can lead to many parameters that need to be passed, since naturally the "inner" code often needs the objects created in the using statements.
What is an elegant way to solve this without losing readability?
For the first part, this question refers to 'memory used by the task increasing constantly' when AD references are not disposed of.
For the second, a using block is syntactic sugar for a try/finally with the Dispose call in the finally block. Writing the try/finally yourself is an alternative construct that lets you dispose of everything in one place, reducing indentation.

purpose of if (true)

I've seen some code written like this:
if (true) {
... // do something
}
Why would you want to do something like this? Is there any thing special about this structure?
Thanks
Pretty much any modern compiler would just optimize this away. My guess is that someone put it there during development to let them easily remove a block of code (by changing true to false), and either forgot or didn't bother to remove it when they were done.
This is one of many ways to segment out code during testing/development. Many might debate whether or not it is good coding practice, but it can be a quick and handy way to compartmentalize code. It is also a quick way to execute code that follows a complex conditional statement that you want to test.
Might be able to use it like this:
/* if (my_complex_or_rare_conditional_case) then */
if (true) then
{
    lots of code here....
} /* End if */
There have been times where I've added true || or false && inside a condition to force it to execute the branch and test the code - but only during development. The code you've posted doesn't need the if condition.

To use goto or not?

This question may sound cliched, but I am in a situation here.
I am trying to implement a finite state automaton to parse a certain string in C. As I started writing the code, I realised the code may be more readable if I used labels to mark the different states and use goto to jump from one state to another as the case comes.
Using the standard breaks and flag variables is quite cumbersome in this case and hard to keep track of the state.
What approach is better? More than anything else I am worried it may leave a bad impression on my boss, as I am on an internship.
There is nothing inherently wrong with goto. The reason they are often considered "taboo" is because of the way that some programmers (often coming from the assembly world) use them to create "spaghetti" code that is nearly impossible to understand. If you can use goto statements while keeping your code clean, readable, and bug-free, then more power to you.
Using goto statements and a section of code for each state is definitely one way of writing a state machine. The other method is to create a variable that will hold the current state and to use a switch statement (or similar) to select which code block to execute based on the value of the state variable. See Aidan Cully's answer for a good template using this second method.
In reality, the two methods are very similar. If you write a state machine using the state variable method and compile it, the generated assembly may very well resemble code written using the goto method (depending on your compiler's level of optimization). The goto method can be seen as optimizing out the extra variable and loop from the state variable method. Which method you use is a matter of personal choice, and as long as you are producing working, readable code I would hope that your boss wouldn't think any differently of you for using one method over the other.
If you are adding this code to an existing code base which already contains state machines, I would recommend that you follow whichever convention is already in use.
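The goto style described in this answer, with one labelled section of code per state, could look roughly like the sketch below. The accepted language (an optionally signed run of decimal digits), the function name and the label names are assumptions picked for illustration, not the asker's actual parser.

```c
/* Returns 1 if s is an optionally signed run of decimal digits, else 0.
   Each label stands for one state of the automaton. */
int is_signed_integer(const char *s)
{
    /* State: start. An optional sign may appear here. */
    if (*s == '+' || *s == '-')
        s++;
    goto first_digit;

first_digit:    /* State: at least one digit is required. */
    if (*s >= '0' && *s <= '9') {
        s++;
        goto more_digits;
    }
    goto reject;

more_digits:    /* State: further digits, or the end of the input. */
    if (*s >= '0' && *s <= '9') {
        s++;
        goto more_digits;
    }
    if (*s == '\0')
        goto accept;
    goto reject;

accept:
    return 1;
reject:
    return 0;
}
```

Every jump stays inside the function and each label reads as the name of a state, which is what keeps this form from turning into spaghetti code.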
Using a goto for implementing a state machine often makes good sense. If you're really concerned about using a goto, a reasonable alternative is often to have a state variable that you modify, and a switch statement based on that:
typedef enum {s0,s1,s2,s3,s4,...,sn,sexit} state;
state nextstate;
int done = 0;
nextstate = s0; /* set up to start with the first state */
while (!done)
    switch (nextstate)
    {
    case s0:
        nextstate = do_state_0();
        break;
    case s1:
        nextstate = do_state_1();
        break;
    case s2:
        nextstate = do_state_2();
        break;
    case s3:
    .
    .
    .
    .
    case sn:
        nextstate = do_state_n();
        break;
    case sexit:
        done = 1;
        break;
    default:
        /* some sort of unknown state */
        break;
    }
I'd use a FSM generator, like Ragel, if I wanted to leave a good impression on my boss.
The main benefit of this approach is that you are able to describe your state machine at a higher level of abstraction and don't need to concern yourself with whether to use goto or a switch. Not to mention, in the particular case of Ragel, that you can automatically get pretty diagrams of your FSM, insert actions at any point, automatically minimize the number of states, and various other benefits. Did I mention that the generated FSMs are also very fast?
The drawbacks are that they're harder to debug (automatic visualization helps a lot here) and that you need to learn a new tool (which is probably not worth it if you have a simple machine and you are not likely to write machines frequently.)
I would use a variable that tracks what state you are in and a switch to handle them:
fsm_ctx_t ctx = ...;
state_t state = INITIAL_STATE;
while (state != DONE)
{
    switch (state)
    {
    case INITIAL_STATE:
    case SOME_STATE:
        state = handle_some_state(ctx);
        break;
    case OTHER_STATE:
        state = handle_other_state(ctx);
        break;
    }
}
Goto isn't necessarily evil, and I have to strongly disagree with Denis. Yes, goto might be a bad idea in most cases, but there are uses. The biggest fear with goto is so-called "spaghetti code": untraceable code paths. If you can avoid that, if it will always be clear how the code behaves, and if you don't jump out of the function with a goto, there is nothing against goto. Just use it with caution, and if you are tempted to use it, really evaluate the situation and try to find a better solution. If you are unable to do so, goto can be used.
Avoid goto unless the complexity added to avoid it is more confusing.
In practical engineering problems, there's room for goto used very sparingly. Academics and non-engineers wring their hands needlessly over using goto. That said, if you paint yourself into an implementation corner where a lot of goto is the only way out, rethink the solution.
A correctly working solution is usually the primary objective. Making it correct and maintainable (by minimizing complexity) has many life cycle benefits. Make it work first, and then clean it up gradually, preferably by simplifying and removing ugliness.
I don't know your specific code, but is there a reason something like this:
typedef enum {
    STATE1, STATE2, STATE3
} myState_e;
void myFsm(void)
{
    myState_e State = STATE1;
    while (1)
    {
        switch (State)
        {
        case STATE1:
            State = STATE2;
            break;
        case STATE2:
            State = STATE3;
            break;
        case STATE3:
            State = STATE1;
            break;
        }
    }
}
wouldn't work for you? It doesn't use goto, and is relatively easy to follow.
Edit: All those State = fragments violate DRY, so I might instead do something like:
typedef int (*myStateFn_t)(int OldState, void *ObjP);
int myStateFn_Reset(int OldState, void *ObjP);
int myStateFn_Start(int OldState, void *ObjP);
int myStateFn_Process(int OldState, void *ObjP);
myStateFn_t myStateFns[] = {
#define MY_STATE_RESET 0
    myStateFn_Reset,
#define MY_STATE_START 1
    myStateFn_Start,
#define MY_STATE_PROCESS 2
    myStateFn_Process
};
int myStateFn_Reset(int OldState, void *ObjP)
{
    return shouldStart(ObjP) ? MY_STATE_START : MY_STATE_RESET;
}
int myStateFn_Start(int OldState, void *ObjP)
{
    resetState(ObjP);
    return MY_STATE_PROCESS;
}
int myStateFn_Process(int OldState, void *ObjP)
{
    return (process(ObjP) == DONE) ? MY_STATE_RESET : MY_STATE_PROCESS;
}
/* A state is valid if it indexes an entry of the function table. */
int stateValid(int StateFnSize, int State)
{
    return (State >= 0 && State < StateFnSize);
}
/* Runs the handler for State; the caller is expected to validate State first. */
int stateFnRunOne(myStateFn_t *StateFns, int StateFnSize, int State, void *ObjP)
{
    return (StateFns[State])(State, ObjP);
}
void stateFnRun(myStateFn_t *StateFns, int StateFnSize, int CurState, void *ObjP)
{
    int NextState;
    while (stateValid(StateFnSize, CurState))
    {
        NextState = stateFnRunOne(StateFns, StateFnSize, CurState, ObjP);
        if (!stateValid(StateFnSize, NextState))
            LOG_THIS(CurState, NextState);
        CurState = NextState;
    }
}
which is, of course, much longer than the first attempt (funny thing about DRY). But it's also more robust: failure to return the state from one of the state functions will result in a compiler warning, rather than silently ignoring a missing State = as in the earlier code.
I would recommend the "Dragon book": Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman. (It is rather expensive to buy, but you will surely find it in a library.) There you will find everything you need to parse strings and build finite automata. I could not find a single place in it that uses a goto. Usually the states are a data table and the transitions are functions like accept_space().
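The idea of keeping the states in a data table can be sketched in C as below. This is not taken from the book; the state names, the character classes and the accepted language (an optionally signed run of digits) are assumptions made for the example.

```c
enum state { S_START, S_SIGN, S_DIGITS, S_ERROR, S_COUNT };
enum char_class { C_SIGN, C_DIGIT, C_OTHER, C_COUNT };

/* transitions[current state][class of the next character] = next state */
static const enum state transitions[S_COUNT][C_COUNT] = {
    /*              C_SIGN   C_DIGIT   C_OTHER */
    [S_START]  = { S_SIGN,  S_DIGITS, S_ERROR },
    [S_SIGN]   = { S_ERROR, S_DIGITS, S_ERROR },
    [S_DIGITS] = { S_ERROR, S_DIGITS, S_ERROR },
    [S_ERROR]  = { S_ERROR, S_ERROR,  S_ERROR },
};

/* Maps a character to the column of the table it selects. */
static enum char_class classify(char ch)
{
    if (ch == '+' || ch == '-')
        return C_SIGN;
    if (ch >= '0' && ch <= '9')
        return C_DIGIT;
    return C_OTHER;
}

/* Returns 1 if the automaton ends in its only accepting state, else 0. */
int table_accepts(const char *s)
{
    enum state st = S_START;
    for (; *s != '\0'; s++)
        st = transitions[st][classify(*s)];
    return st == S_DIGITS;
}
```

Changing the accepted language then means editing the table rather than the control flow, which is the main attraction of this style.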
I can't see much of a difference between goto and switch. I might prefer switch/while because it gives you a place guaranteed to execute after the switch (where you could throw in logging and reason about your program). With GOTO you just keep jumping from label to label, so to throw in logging you'd have to put it at every label.
But aside from that there shouldn't be much difference. Either way, if you didn't break it up into functions and not every state uses/initializes all local variables you may end up with a mess of almost spaghetti code not knowing which states changed which variables and making it very difficult to debug/reason about.
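The point about switch/while offering one guaranteed place for logging can be sketched like this. The three states and the function run_machine are hypothetical, and the log target is passed in only so that the example stays self-contained.

```c
#include <stdio.h>

typedef enum { ST_IDLE, ST_RUNNING, ST_DONE } run_state;

/* Runs a three-state machine and returns how many transitions it made.
   Pass NULL as log to suppress the output. */
int run_machine(FILE *log)
{
    run_state state = ST_IDLE;
    int transitions = 0;

    while (state != ST_DONE) {
        run_state previous = state;

        switch (state) {
        case ST_IDLE:
            state = ST_RUNNING;
            break;
        case ST_RUNNING:
            state = ST_DONE;
            break;
        case ST_DONE:
            break;
        }

        /* Every iteration passes through here, so a single log call
           covers all states. */
        if (log != NULL)
            fprintf(log, "state %d -> %d\n", (int)previous, (int)state);
        transitions++;
    }
    return transitions;
}
```

With the goto form, the same logging would have to be repeated at every label.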
As an aside, can you maybe parse the string using a regular expression? Most programming languages have libraries that allow using them. The regular expressions often create an FSM as part of their implementation. Generally regular expressions work for non arbitrarily nested items and for everything else there is a parser generator(ANTLR/YACC/LEX). It is generally much easier to maintain a grammar/regex than the underlying state machine. Also you said you were on an internship, and generally they might give you easier work than say a senior developer, so there is a strong chance that a regex may work on the string. Also regular expressions generally aren't emphasized in college so try using Google to read up on them.
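If a regular expression turns out to be enough, a C program on a POSIX system can use regcomp and regexec from <regex.h>. The sketch below assumes a POSIX environment; the wrapper name matches is hypothetical, and the pattern in the usage note is only an example since the asker's string format is not known.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended regular expression `pattern`,
   0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

For instance, matches("^[+-]?[0-9]+$", input) checks for an optionally signed integer. Compiling the pattern on every call keeps the sketch short; code that matches many strings would compile once and reuse the regex_t.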

Resources