Loops: "I have n items to process but only n-1 update steps"

Consider the following loop:
marker_stream = 0
for character in input_file:
    if character != ',':
        marker_stream |= 1
    marker_stream <<= 1
For each character in input_file, this loop does a processing step, stores the result of that step (either a 0 or 1 bit) in marker_stream, and then shifts marker_stream left by one position to prepare for the next iteration.
Here's the problem: I want to process every character in the input file, but I only want to shift marker_stream (number of characters in the file) - 1 times. The loop above shifts marker_stream one time too many.
Now, I know I could add marker_stream >>= 1 after the for loop, or I could maintain a flag that says whether the character currently being processed is the last character in the file, but neither solution seems that great. The flag solution involves flags (yuck), and the extra line could be confusing if the processing loop were longer.
I'm looking for a more elegant solution to this problem and, more generally, to the "I have n items to process but an update step I want to run only n-1 times" problem.

Process the first element in the file separately; treat the file as a one-entry head with a tail holding the rest of the entries:
// Process head
entry <- readNext(inFile)
write(entry, outFile)
// Process tail
while (NOT inFile.endOfFile)
    write(separator, outFile)
    entry <- readNext(inFile)
    write(entry, outFile)
endwhile
The head entry does not follow a separator; the tail entries all follow a separator. You get your 'n-1' effect by treating the single head entry differently from the n-1 tail entries in the file.
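Applied to the original marker_stream loop, the head/tail idea might look like the Python sketch below. The function name build_marker_stream is made up for illustration, and it takes the input as a string rather than a file. The first character is handled outside the loop, and the shift moves to the top of the loop body, so it runs once per tail character, i.e. n-1 times.

```python
def build_marker_stream(text: str) -> int:
    """Set a 1 bit for every non-comma character, shifting only n-1 times."""
    chars = iter(text)
    head = next(chars, None)
    if head is None:          # empty input: nothing to process
        return 0
    # Process head: no shift before the first bit.
    marker_stream = 1 if head != ',' else 0
    # Process tail: shift first, then record this character's bit.
    for character in chars:
        marker_stream <<= 1
        if character != ',':
            marker_stream |= 1
    return marker_stream

print(bin(build_marker_stream("a,b")))  # 0b101
```

This gives the same result as the original loop followed by marker_stream >>= 1, without the extra correction line.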

Related

Why won't my python code execute the loop to pick random key words?

I am creating code for a class project and part of it is that I need to find out if a set of randomly selected key words has a collective character length between two desired values. The issue I'm having is that Python stops executing as soon as it reaches the loop section.
Here is the whole code; I am having trouble with the part after the second user input.
import random #allows us to generate random numbers elsewhere in the code
set1=[] # creates empty sets
print("type genPassword() to initiate the program")
def genPassword():
    print("Answer the following questions in order. What is your first pet’s name? What is your favorite word? What was your first car? What city/town were you born in? What was the name of your first partner? dont forget to press enter after each word, no spaces.")
    for c in range(0,5): #allows the user to add elements to the list, each element has to be on a separate line
        ele=input()
        set1.append(ele)
    print(set1)#displays the current set, currently used for debugging purposes, making sure the code works
    minlen=int(input("what is the minimum length you would like the password to be?: "))
    maxlen=int(input("what is the max length you would like your password to be? (must be more than the shortest word you input in previous section): "))
    passlen=0
    while minlen >= passlen >= maxlen:
        set2=[] #empties set 2
        amnt = random.randint (1,5) #selects a random number for the amount of keywords to use
        for f in range(0,amnt):
            keys=random.sample(set1,1) #selects a random key word
            set2.append(keys) #adds word to set 2
        print(set2)#shows the words it chose
        set_amnt=len(set2) #how many words in the set
        iteration=0
        string_len=0
        for i in range(0,set_amnt):
            word_len=len(set2[iteration][0]) #takes the length of each word and adds it to the length of the whole string
            iteration=iteration+1
            string_len=string_len+word_len
        print(string_len) #shows how long the string is (how many letters it is total)
        passlen=string_len
I tried switching the loop types and moving where the variables are in the code, but neither of those worked and I can't think of what else to try. If I take the section out of the looping statement it works, but for some reason it has trouble inside the loop. I expect it to select a random number of words, tell me which words it picked (out of the set of five) and how long the whole string is, and then, if the string is too long or too short, repick again and again until the string's character length is within the accepted range, printing the length and the chosen words on each iteration.
Edit: just tidying up this answer, since it's been changed a few times based on my discussion in comments with the OP
This is your problem:
minlen=int(input("what is the minimum length you would like the password to be?: "))
maxlen=int(input("what is the max length you would like your password to be? (must be more than the shortest word you input in previous section): "))
passlen=0
print ("Entering while loop ", minlen, passlen, maxlen)
while minlen >= passlen >= maxlen:
passlen is set to zero on the line above the while, so the condition is never met (with any positive maximum length), as shown in my example output:
['Happy', 'Blue', 'Citroen', 'Liverpool', 'Rachel']
what is the minimum length you would like the password to be?: 5
what is the max length you would like your password to be? (must be more than the shortest word you input in previous section): 19
Entering while loop 5 0 19
I think you are logically trying to say:
if maxlen is bigger than minlen and minlen is bigger than passlen, do the loop.
But what you are actually saying is
if minlen is at least passlen and passlen is at least maxlen, then do the loop
so the while line condition becomes (with my sample data):
while 5 >= 0 >= 19:
    # some code that will never get executed unless they enter 0 (or a negative number) for maximum password length
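Python chains comparisons, so a >= b >= c means (a >= b) and (b >= c). A small sketch (the function names are made up) that makes this reading explicit with the sample values from the output above:

```python
def original_condition(minlen, passlen, maxlen):
    return minlen >= passlen >= maxlen

def spelled_out(minlen, passlen, maxlen):
    # What Python actually evaluates for the chained comparison above
    return (minlen >= passlen) and (passlen >= maxlen)

print(original_condition(5, 0, 19))  # False: 5 >= 0 holds, but 0 >= 19 does not
print(original_condition(5, 0, 0))   # True: only a maximum of 0 (or less) lets it run
```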
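Firstly, you should rewrite the logic of the while condition test so it does what you intend.
while maxlen >= minlen >= passlen:
gets the loop running. Note, though, that it stops as soon as passlen exceeds minlen, even if passlen also exceeds maxlen. To keep repicking until the length is within range, test the range itself: while not (minlen <= passlen <= maxlen):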
If that is not what you intended, and what you originally wrote was deliberate, then (unless you completely refactor the while, and maybe even the for, loop) your goal is to have the code executed at least once: passlen only gets a meaningful value at the end of the loop, so it has no meaningful value until the loop has run once.
When examining logic, I ask myself:
Is the logic of the condition statement correct?
What am I trying to achieve with the conditional block/loop?
Does my condition test do what I intended, to achieve that goal?
Would it be better served a different way?
Finally, I would add an if before the while, to test user input. If the min len is bigger than the max len, tell the user they made a boo-boo and exit.
This executes fine, and also goes around the inner for loop (amnt times, a random number from 1 to 5) on each pass:
minlen=int(input("what is the minimum length you would like the password to be?: "))
maxlen=int(input("what is the max length you would like your password to be? (must be more than the shortest word you input in previous section): "))
passlen=0
if minlen >= maxlen:
print ("minimum length is greater than or equal to maximum length! Abort")
exit()
while maxlen >= minlen >= passlen:
# the rest of your code
Some stuff I did not want to delete but is no longer the answer
edit: My original answer (before I tidied it up) went off on a tangent with a bunch of waffle about implementing a do code...while test (a do foo until bar) loop for python, since there is no in-built one. I subsequently spotted the (in hindsight obvious) logic error and so don't necessarily need this tangent, but the text might be useful, so rather than delete it, I'm just plopping it here at the end:
How do we get one run of the loop before exiting? Instead of a "while test: do code" loop, we need a "do code, then check test" (do-until) loop, which Python does not have!
There is a fudge-y workaround to get do-until functionality into python, as detailed here, which goes along the lines of:
while True:
    # do code
    if condition_X:
        break
This is checking the condition manually at the end of the loop, so that it always gets at least one pass. If you don't like the break, and I am not keen either, you could have
boolTest = True
while boolTest:
    # do code
    if condition_to_test:
        boolTest = False
Here, setting boolTest to False causes the while loop to terminate without having to use a break. (I don't know about you, but when I see break, especially if there are several, I am instantly reminded of BBC BASIC and my heroic use of goto as a child!)
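Putting the do-while shape and a range check together, a sketch of the repick loop the question describes might look like this in Python. pick_words is a made-up name; it uses random.choice in place of random.sample(set1, 1) and picks between 1 and len(words) words. It loops forever if no combination can land in the range, so validating minlen and maxlen first (as suggested above) still matters.

```python
import random

def pick_words(words, minlen, maxlen, rng=random):
    """Repick random words until their combined length is in [minlen, maxlen].
    Do-while shape: the body always runs once, the exit test sits at the bottom."""
    while True:
        amount = rng.randint(1, len(words))             # how many words to use
        chosen = [rng.choice(words) for _ in range(amount)]
        total = sum(len(word) for word in chosen)       # combined character count
        print(chosen, total)                            # what was picked this pass
        if minlen <= total <= maxlen:                   # the "until" condition
            return chosen, total
```

Passing a seeded random.Random instance as rng makes a run repeatable, which helps when debugging.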

Could someone explain to me what this line of code means?

I was wondering what this code really means. I would like to know what it does, in what order, and what the ? and : signs mean; all explained.
printf(" %c", ((sq & 8) && (sq += 7)) ? '\n' : pieces[board[sq] & 15]);
Thanks.
The first argument, " %c", means that printf prints a space followed by a single character.
The second argument is the character that the function prints.
In this case, the second argument is a ternary operator expression. You can read the link provided, but in short, it's shorthand for an if-else block. This is what it looks like in your example:
((sq & 8) && (sq += 7)) ? '\n' : pieces[board[sq] & 15]
Let's separate it into three parts:
((sq & 8) && (sq += 7))
'\n'
pieces[board[sq] & 15]
The first part is a condition (if);
this expression (sq & 8) uses what is called a bitwise AND operation (read more here). Basically, 8 in binary is 1000, and that part checks whether sq has a 1 in that position (it can be 1000, 11000, 101000, etc.); if it does, the expression equals 8 (any nonzero number means true), and if it doesn't, it equals 0 (which means false).
&& means logical AND: both the left and right expressions need to be true. It also short-circuits, so the right-hand side, sq += 7, is only evaluated when (sq & 8) is nonzero.
sq += 7 adds 7 to sq, and the result counts as true if the new value is not 0.
The second part, '\n', is returned (and in your case printed out) if the condition is true; otherwise the third part (pieces[board[sq] & 15]) is printed.
This is fairly obfuscated code, so it's best to try to understand it in the context in which it appears. By obfuscating it in this way, the author is trying to tell you "you don't really need to understand the details". So let's try to understand what this does from the 'top down', inferring the details of the context, rather than bottom up.
printf prints, in this case, " %c", which is a space and a single character. The single character will be one of these (from the ?: ternary expression):
a newline '\n'
a piece from space sq on the board
Which one it will be depends on the condition before the ?. It first tests a single bit of sq (the & 8 does a bitwise AND with a constant that has one set bit). If that bit is set, it adds 7 to sq and prints the newline[1]; if it is not set, it prints the piece.
So now we really need to know the context. This is probably in a loop that starts with sq = 0 and increments sq each time around (i.e., something like for (int sq = 0; ...some condition...; ++sq)). So what it is doing is printing out the pieces on a row of the board and, when it gets to the end of the row, printing a newline and going on to the next row: at sq = 8, adding 7 gives 15, and the loop's ++sq then lands on 16. A lot of this depends on how exactly the board array is organized. It would seem to be a 1D array with a 2D board embedded in it: the first row at indexes 0..7, the second at indexes 16..23, the third at indexes 32..39 and so on[2].
[1] Technically, when the bit is set, it tests the result of adding 7, but that will be true unless sq was -7, which is probably impossible from the context (a loop that starts at 0 and only increments from there).
[2] The gaps here are inferred from the test in the line of code: those indexes with bit 3 set (for which sq & 8 will be true) are not valid board spaces, but are instead "gaps" between the rows. They might be used for something else elsewhere in the code.
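To see the flow concretely, here is a Python emulation of that statement inside the kind of loop guessed at above. Assumptions, none of them from the question: the loop runs sq from 0 to 127, board has 128 entries with the gap indexes unused (a layout like the "0x88" board representation used in chess programs), and pieces is any 16-character string. render_board is a made-up name.

```python
def render_board(board, pieces):
    """Emulate the C statement inside an assumed loop over sq = 0..127,
    where indexes with bit 3 set (8..15, 24..31, ...) are gaps between rows."""
    out = []
    sq = 0
    while sq < 128:
        # C: ((sq & 8) && (sq += 7)) ? '\n' : pieces[board[sq] & 15]
        # && short-circuits, so sq += 7 only runs when bit 3 of sq is set;
        # sq is at least 8 there, so sq + 7 is nonzero and the test is true.
        if sq & 8:
            sq += 7                  # jump to the last gap index of this row
            ch = "\n"
        else:
            ch = pieces[board[sq] & 15]
        out.append(" " + ch)         # the format string " %c": a space, then the char
        sq += 1                      # the assumed loop's ++sq
    return "".join(out)
```

Each of the 8 rows comes out as eight " x" pairs followed by " \n", and the gap entries of board are never read.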
Ok, thank you all! I've looked at it and it now works as expected. Thanks!

Reading only odd/even lines from a file in C [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
The task is to read a random line (odd or even, as I specify) from a txt file; the user then types exactly the same text, and if it matches, the program prints success. The problem is that I don't know how to read only odd or only even lines, and the choice of odd or even should be randomly generated as well. Thanks
The usual way to loop and ignore the even items is...
for( int i = 0; thing = get_thing(); i++ ) {
/* It's even, skip it */
if( abs(i % 2) == 0 )
continue;
...do stuff with thing...
}
abs(i % 2) == 0 checks for even numbers, abs(i % 2) == 1 checks for odd. If you want to select one or the other, use int parity and abs(i % 2) == parity.
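In Python the same filter is a little simpler, because % with a positive divisor never returns a negative result, so no abs() is needed. A sketch (lines_with_parity is a made-up name) that numbers lines from 1, the way people usually count them, so parity 1 selects the 1st, 3rd, 5th... lines:

```python
def lines_with_parity(lines, parity):
    """Yield (line_number, line) pairs whose 1-based line number % 2 == parity."""
    for number, line in enumerate(lines, start=1):
        if number % 2 != parity:
            continue          # wrong parity: skip it, like the C loop's continue
        yield number, line

print(list(lines_with_parity(["a", "b", "c", "d"], 1)))  # [(1, 'a'), (3, 'c')]
```

Note that the C loop above starts i at 0, so its "even" items are what a person would call the odd-numbered lines. To pick the parity at random, random.randint(0, 1) is enough.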
Efficiently selecting a random line out of a file takes a small amount of math. You must read the file once, because it's impossible to know where lines start and end otherwise.
An inefficient algorithm would read in all N lines then pick one, potentially consuming a lot of memory. A more efficient algorithm would just store where each line starts. Both are O(n) memory (meaning memory usage grows as the file size grows). But there's an O(1) memory algorithm which only has to store one line.
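Each line should have a 1/N chance of being selected, where N is the total number of lines. The efficient algorithm (a one-line form of reservoir sampling) applies this to each line as it's read in, but with N being the number of lines read so far. In pseudo-code, where roll(n) returns a random integer from 1 to n: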
int num_lines = 0;
char *picked_line = NULL;
while( line = getline() ) {
num_lines++;
if( roll(num_lines) == num_lines ) {
picked_line = line;
}
}
So the first line has a 1/1 chance of being selected. The second line has a 1/2 chance of being selected. The third has a 1/3 chance, and so on.
Why does this add up to each line having a 1/N chance? Well, let's walk through the first line being selected out of three.
1st line has a 1/1 chance.
It is picked. The 1st line's total odds are now 1.
2nd line has a 1/2 chance.
1st line, the picked one, has a 1/2 chance of surviving.
1st line's odds of being picked are now 1 * 1/2 or 1/2.
3rd line has a 1/3 chance.
1st line, the picked one, has a 2/3 chance of surviving.
1st line's odds of being picked are now 1 * 1/2 * 2/3 or 2/6 or 1/3.
You can do the same for the 2nd line and 3rd lines to prove to yourself this works.
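A Python sketch of the pseudo-code above (pick_random_line is a made-up name); random.randint(1, n) plays the role of roll(n), and only the currently picked line is kept:

```python
import random

def pick_random_line(lines, rng=random):
    """Choose one line uniformly at random while keeping only one line in memory."""
    picked = None
    for num_lines, line in enumerate(lines, start=1):
        # roll(num_lines) == num_lines: true with probability 1/num_lines
        if rng.randint(1, num_lines) == num_lines:
            picked = line
    return picked
```

To restrict the pick to odd-numbered lines, pass a filtered iterable such as (line for n, line in enumerate(f, 1) if n % 2 == 1); with an open file object f, the file is still read only once.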

Very large loop counts in C

How can I run a loop in C for a very large count, e.g. 2^1000 times?
Also, using two nested loops that run a and b times, we get a resulting block that runs a*b times. Is there any smart method for running a loop a^b times?
You could loop recursively, e.g.
void loop( unsigned a, unsigned b ) {
unsigned int i;
if ( b == 0 ) {
printf( "." );
} else {
for ( i = 0; i < a; ++i ) {
loop( a, b - 1 );
}
}
}
...will print a^b . characters.
While I cannot answer your first question (although look into libgmp, which might help you work with large numbers), a way to perform an action a^b times would be to use recursion.
function (a, b) {
    if (b == 0) { perform_action(); return; } // perform_action: placeholder for the work
    for (i = 0; i < a; i++) {
        function(a, b - 1);
    }
}
Each level loops a times and recurses with b - 1, so the action at b == 0 runs a^b times.
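The same recursion in Python, as a sketch (repeat_power and action are made-up names). Python's default recursion limit (about 1000) caps b, and of course no recursion makes 2^1000 calls finish in practice:

```python
def repeat_power(a, b, action):
    """Call action() a**b times: each recursion level loops a times."""
    if b == 0:
        action()              # innermost level: do the actual work once
        return
    for _ in range(a):
        repeat_power(a, b - 1, action)

calls = []
repeat_power(2, 3, lambda: calls.append("."))
print(len(calls))  # 8
```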
Regarding your reply to one of the comments ("But if I have two lines of input and 2^n lines of trash between them, how do I skip past them?"): can you tell me a real-life scenario where you would see 2^1000 lines of trash that you have to monitor?
For a more reasonable (smaller) number of inputs, you may be able to solve what sounds like your real need (i.e. handling only the relevant lines of input) not by iterating an index, but by simply checking each line for the relevant component as it is processed in a while loop...
pseudo code:
BOOL criteriaMet = FALSE;
while(1)
{
while(!criteriaMet)
{
//test next line of input
//if criteria met, set criteriaMet = TRUE;
//if criteria met, handle line of input
//if EOF or similar, break out of loops
}
//criteria met, handle it here and continue
criteriaMet = FALSE;//reset for more searching...
}
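In Python, a file object is already an iterator over its lines, so the nested loops above collapse into one loop with a test, and nothing counts the skipped lines. A sketch where handle_relevant_lines is a made-up name and is_relevant and handle are placeholder callables you would supply:

```python
def handle_relevant_lines(lines, is_relevant, handle):
    """Check each line as it streams past; only relevant ones are handled."""
    for line in lines:          # works for a list or an open file object
        if is_relevant(line):
            handle(line)        # criteria met: handle it here and continue
        # irrelevant lines are simply dropped; no index or count is kept
```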
Use a b-sized array i[] where each cell holds values from 0 to a-1. For example, for 2^3 use a 3-sized array of booleans.
On each iteration, increment i[0]. If a==i[0], set i[0] to 0 and increment i[1]. If a==i[1], set i[1] to 0 and increment i[2], and so on until you increment a cell without reaching a. This can easily be done in a loop:
for(int j=0;j<b;++j){
    ++i[j];
    if(i[j]<a){
        break;
    }
    i[j]=0; // this cell reached a: reset it and carry into the next one
}
After a iterations, i[0] will return to zero. After a^2 iterations, i[0] and i[1] will both be zero. After a^b iterations, all cells will be 0 and you can exit the loop. You don't need to check the whole array each time: the moment you reset i[b-1], you know the whole array is back to zero.
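A Python sketch of this counter (odometer_loop and action are made-up names). Python's for...else runs the else branch only when the loop finishes without break, which is exactly the moment the last cell wraps:

```python
def odometer_loop(a, b, action):
    """Call action() a**b times using a b-digit, base-a counter."""
    if b == 0:
        action()              # a**0 == 1: exactly one call
        return
    if a == 0:
        return                # a**b == 0 for b > 0: no calls
    digits = [0] * b          # digits[0] is the least significant cell
    while True:
        action()
        for j in range(b):
            digits[j] += 1
            if digits[j] < a:
                break         # no carry: the counter has a new value
            digits[j] = 0     # carry into the next cell
        else:
            return            # every cell wrapped: back to all zeros after a**b calls
```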
Your question doesn't make sense. Even when your loop is empty, you'd be hard pressed to do more than 2^32 iterations per second. Even in this best-case scenario, processing 2^64 loop iterations, which you can do with a simple uint64_t variable, would take 136 years (2^64 / 2^32 = 2^32 seconds, which is about 136 years). This is when the loop does absolutely nothing.
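The arithmetic is easy to check. A sketch assuming the answer's best case of 2^32 empty iterations per second and a 365.25-day year (years_to_loop is a made-up name):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # Julian year

def years_to_loop(n, iterations_per_second=2**32):
    """Years needed for 2**n empty iterations at the assumed best-case rate."""
    return 2**n / iterations_per_second / SECONDS_PER_YEAR

print(round(years_to_loop(64)))      # 136
print(round(years_to_loop(55), 2))   # 0.27, i.e. about three months
```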
The same goes for skipping lines, as you later explained in the comments. Skipping or counting lines in text is a matter of counting newlines. In 2006 it was estimated that the world had around 10*2^64 bytes of storage. If we assume that all the data in the world is text (it isn't) and the average line is 10 characters including the newline (it probably isn't), you'd still (roughly) fit the number of lines in all the data in the world in one uint64_t. This processing would of course still take at least 136 years, even if the cache of your CPU were fed straight from four 10 Gbps network interfaces (since it's inconceivable that your machine could have that much disk).
In other words, whatever problem you think you're solving is not a problem of looping more than a normal uint64_t in C can handle. The n in your 2^n can't reasonably be more than 50-55 on any hardware your code can be expected to run on.
So to answer your question: if looping a uint64_t is not enough for you, your best option is to wait at least 30 years until Moore's law has caught up with your problem and solve the problem then. It will go faster than trying to start running the program now. I'm sure we'll have a uint128_t at that time.

How would I find an infinite loop in an array of pointers?

I have an array of pointers (this is algorithmic, so don't go into language specifics). Most of the time, this array points to locations outside of the array, but it degrades to a point where every pointer in the array points to another pointer in the array. Eventually, these pointers form an infinite loop.
So on the assumption that the entire array consists of pointers to another location in the array and you start at the beginning, how could you find the length of the loop with the highest efficiency in both time and space? I believe the best time efficiency would be O(n), since you have to loop over the array, and the best space efficiency would be O(1), though I have no idea how that would be achieved.
Index:  0   1   2   3   4   5   6
Value:  *3  *2  *5  *4  *1  *1  *D
D is data that was being pointed to before the loop began. In this example, the cycle is 1, 2, 5 and it repeats infinitely, but indices 0, 3, and 4 are not part of the cycle.
This is an instance of the cycle-detection problem. An elegant O(n) time O(1) space solution was discovered by Robert W. Floyd in 1960; it's commonly known as the "Tortoise and Hare" algorithm because it consists of traversing the sequence with two pointers, one moving twice as fast as the other.
The idea is simple: the cycle must have a loop with length k, for some k. At each iteration, the hare moves two steps and the tortoise moves one, so the distance between them is one greater than it was in the previous iteration. Every k iterations, therefore, they are a multiple of k steps apart from each other, and once they are both in the cycle (which will happen once the tortoise arrives), if they are a multiple of k steps apart, they both point at the same element.
If all you need to know is the length of the cycle, you wait for the hare and the tortoise to reach the same spot; then you step along the cycle, counting steps until you get back to the same spot again. In the worst case, the total number of steps will be the length of the tail plus twice the length of the cycle, which must be less than twice the number of elements.
Note: The second paragraph was edited to possibly make the idea "more obvious", whatever that might mean. A formal proof is easy and so is an implementation, so I provided neither.
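A minimal Python sketch of the two phases described above, assuming the array is given as a list of indices (next_index[i] is the slot that slot i points to) and that every pointer reached from the start stays inside the array. cycle_length is a made-up name.

```python
def cycle_length(next_index, start=0):
    """Floyd's tortoise and hare: O(n) time, O(1) extra space."""
    # Phase 1: tortoise moves 1 step, hare moves 2, until they meet inside the cycle.
    tortoise = next_index[start]
    hare = next_index[next_index[start]]
    while tortoise != hare:
        tortoise = next_index[tortoise]
        hare = next_index[next_index[hare]]
    # Phase 2: walk once around the cycle from the meeting point, counting steps.
    length = 1
    probe = next_index[tortoise]
    while probe != tortoise:
        probe = next_index[probe]
        length += 1
    return length

# The example from the question; None stands in for *D and is never reached from index 0.
print(cycle_length([3, 2, 5, 4, 1, 1, None]))  # 3
```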
Follow the pointers from the first element, building a directed graph as you go (a node points to the node its element points to), and for each node keep track of its in-degree (the number of pointers followed into it). As soon as a node's in-degree reaches 2, the traversal has come back to a node it already visited, so that node is part of the infinite cycle.
The above fails if the first element is included in the infinite cycle, so before the algorithm starts, add 1 indegree to the first element to resolve this.
The array becomes, as you describe it, a graph (more properly a forest) where each vertex has an out-degree of exactly one. The components of such a graph can only consist of chains that each possibly end in a single loop. That is, each component is shaped either like an O or like a 6. (I am assuming no pointers are null, but this is easy to deal with: you end up with 1-shaped components with no cycles at all.)
You can trace all these components by "visiting" and keeping track of where you've been with a "visited" hash or flags array.
Here's an algorithm.
Edit: It's just DFS of a forest, simplified for the case of one child per node, which eliminates the need for a stack (or recursion) because backtracking is not needed.
Let A[0..N-1] be the array of pointers.
Let V[0..N-1] be an array of "visited by trace t" marks, initially 0 (unvisited).
Let C[0..N-1] be an array of integer counts, initially zero.
Let S[0..N-1] be an array of "step counts" for each component trace.
longest = 0 // length of longest cycle
t = 0 // number of the current trace
for i in 0..N-1, increment C[j] if A[i] points to A[j]
for each k such that C[k] = 0
  // No "in edges", so must be the start of a 6-shaped component
  t = t + 1
  s = 0
  while V[k] = 0
    V[k] = t
    S[k] = s
    s = s + 1
    k = index of the array location that A[k] points to
  end
  // If this trace ran into its own path, a loop was found; its length is s - S[k].
  // Otherwise it joined a component traced earlier, whose loop was already counted.
  if V[k] = t
    longest = max(longest, s - S[k])
end
// Rest of loops must be of the O variety
while there exists V[k] = 0
  Let k be such that V[k] = 0.
  t = t + 1
  s = 0
  while V[k] = 0
    V[k] = t
    s = s + 1
    k = index of the array location that A[k] points to
  end
  // Found loop of length s
  longest = max(longest, s)
end
Space and execution time are both proportional to the size of the input array A. You can get rid of the S array if you're willing to trace 6-shaped components twice.
Addition: I fully agree that if it's not necessary to find the cycle of maximum size, then the ancient "two pointer" algorithm for finding cycles in a linked list is superior, since it requires only constant space.
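A Python sketch of the tracing approach above (longest_cycle is a made-up name). Each node records which trace visited it rather than a plain true/false flag, so a trace that runs into an earlier trace's nodes is recognized as joining a component whose loop was already counted. With that mark, the order of start points no longer matters, so this sketch simply tries every index instead of starting from the in-degree-zero nodes first.

```python
def longest_cycle(A):
    """Longest cycle length when A[i] is the index that slot i points to.
    Assumes every pointer stays inside the array (no nulls)."""
    n = len(A)
    trace = [0] * n     # 0 = unvisited, otherwise the number of the trace that visited it
    step = [0] * n      # step count at which that trace reached the node
    longest = 0
    t = 0
    for start in range(n):
        if trace[start]:
            continue            # already part of a traced component
        t += 1
        k, s = start, 0
        while trace[k] == 0:
            trace[k] = t
            step[k] = s
            s += 1
            k = A[k]
        if trace[k] == t:       # walked back into this trace's own path: a new loop
            longest = max(longest, s - step[k])
    return longest

print(longest_cycle([3, 2, 5, 4, 1, 1]))  # 3
```

Like the pseudo-code, this uses O(N) time and O(N) extra space.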
