I have a homework assignment (I'm not going to sugar-coat it): write a 32-bit assembly program that uses a loop and indexed addressing to calculate the sum of the gaps between successive array elements, which are in non-decreasing order. (Ex: dwarray dword 0,2,5,9,10)
What I don't know how to do is subtract the (n-1)th element of an array from the nth element using a loop. If I did, I would store the result in a different register and keep adding the results into that register until the last element has been reached. I'm only looking to be pointed in the right direction (I'm not looking for the answer). Does anyone have any suggestions?
Since you will be using a loop you'll need a loop counter equal to the number of elements in the array minus 1.
Convenient instructions would be add eax,[ebx+ecx*4] and sub eax,[ebx+ecx*4-4]
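To make the indexing concrete, here is a rough C model of that loop, not the assembly itself. The function name sum_of_gaps and its parameters are made up for illustration; i stands in for ECX, total for EAX, and arr for the base address in EBX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of the gaps between successive elements of a non-decreasing array.
   Hypothetical helper: i plays the role of ECX, total the role of EAX. */
uint32_t sum_of_gaps(const uint32_t *arr, size_t count)
{
    assert(arr != NULL || count == 0);
    uint32_t total = 0;
    if (count < 2)
        return 0;                      /* fewer than two elements: no gaps */
    /* Loop counter starts at count - 1 and runs down to 1. */
    for (size_t i = count - 1; i > 0; i--) {
        total += arr[i];               /* add eax,[ebx+ecx*4]   */
        total -= arr[i - 1];           /* sub eax,[ebx+ecx*4-4] */
    }
    return total;
}
```

For the example array 0,2,5,9,10 the gaps are 2, 3, 4 and 1, so the result is 10. Because the intermediate terms cancel, this is the same as the last element minus the first.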
I am trying to count the maximum number of operations in a program, given the pseudocode. I am given the pseudocode for finding the maximum number in an array (provided in our class slides by our professor). Now I need to count maximum possible number of primitive operations (I know usually order is the only thing we need to care about, but nonetheless I want to try this out.)
Some examples of primitive operations considered are:
Evaluating an expression
Assigning a value to a variable
Indexing into an array
Calling a method
Returning from a method, etc...
Now I understand why the first line takes 2 operations, we are indexing into the array A and assigning the value to the variable currentMax, thus there are 2 operations in total being carried out in the first line. I understand that the maximum number of primitive operations (that is the worst case) is going to be 2(n-1) for the 3 lines inside the for loop, but I am having trouble with the line that is labelled as having 2n operations.
In my mind, what is happening is that the first time through, i=1 is assigned and checked against n-1 (2 operations), and after that the only thing occurring is the check of i against n-1 each time. The incrementing happens in the line where it says {increment counter i}. We aren't assigning any value to i in the for loop line. Thus I think the for loop line should have 2 + n operations instead of 2n.
I found another slide on the internet after a bit of searching that has the same structure, but it says 2+n instead of 2n and the total number of operations is then 7n-1 instead of 8n-2. This is shown here:
Which one is the correct one here?
I'm having some trouble figuring out how to multiply bignums recursively. For this problem, I have to input two strings using fgets and if the string consists of entirely digits, then I am to add the two "numbers" together recursively. I was able to do this part.
The next part is to then multiply the first number input from fgets by a character d, which is supposed to represent the numbers 2-9. I had to do this recursively as well. Lastly, the part I am stuck on is figuring out how to multiply these two numbers recursively.
For adding the two bignums together, I wrote a function that essentially traverses the numbers from right to left, adding the numbers as the recursion progresses and storing the result in a character array that grows over the time of the function. So essentially, my resultant array is in reverse order, meaning that the first element of my array is actually not the first number of the sum, but the last. However, I solved this problem by printing the elements in reverse order using a for loop.
For my recursive multiplication with a char d ranging from 2-9, I had a similar algorithm with the result in reverse order of the actual product, but again, I was able to solve this problem. Now I am having some trouble finding out how to multiply these two bignums together.
The idea I had was that I could somehow use the method of long multiplication/grade-school multiplication by calling my single-digit multiplication algorithm on the first string multiplied by the last element and second-to-last element of the second string, then call my sum function to add them together and store the result in a variable called accumulation then repeat this process in a recursion.
However, some problems I am having are that my single-digit multiplication recursive algorithm returns the result in reverse order, so I cannot necessarily add the two results together. Also, when multiplying the first string by the second-to-last element of the second string, I would have to multiply the result by 10 (since this digit is in the tens place), but I am not so sure how to implement that either.
Do you guys have any ideas on how I could possibly solve this problem using recursion only (no loops)? The result should look something like this:
First number > 57983985376
Second number > 543647645777735
Sum is 543705629763111
57983985376 times 2 is 115967970752
...
57983985376 times 9 is 521855868384
Product is 31522857142473014386403360
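One possible way to organise this, as a sketch only: keep every intermediate number in reverse order (least significant digit first), so the partial results can be added directly and "times 10" becomes putting a '0' in front. All names here (MAXD, add_rev, mul_digit_rev, mul_rev, reverse_into, multiply) are invented for the example, inputs are assumed to be digit-only strings, and the fixed buffer size is an assumption.

```c
#include <assert.h>
#include <string.h>

#define MAXD 256   /* assumed upper bound on digits, including the NUL */

/* out = a + b; all strings hold digits least-significant first. */
static void add_rev(const char *a, const char *b, int carry, char *out)
{
    if (*a == '\0' && *b == '\0') {
        if (carry) *out++ = (char)('0' + carry);
        *out = '\0';
        return;
    }
    int s = carry;
    if (*a) s += *a++ - '0';
    if (*b) s += *b++ - '0';
    *out = (char)('0' + s % 10);
    add_rev(a, b, s / 10, out + 1);
}

/* out = a * d for a single digit d (0..9), same reversed layout. */
static void mul_digit_rev(const char *a, int d, int carry, char *out)
{
    if (*a == '\0') {
        if (carry) *out++ = (char)('0' + carry);
        *out = '\0';
        return;
    }
    int p = (*a - '0') * d + carry;
    *out = (char)('0' + p % 10);
    mul_digit_rev(a + 1, d, p / 10, out + 1);
}

/* out = a * b, using a*b = a*b[0] + 10 * (a * remaining digits of b). */
static void mul_rev(const char *a, const char *b, char *out)
{
    if (*b == '\0') { out[0] = '\0'; return; }   /* empty string means 0 */
    char rest[MAXD], partial[MAXD];
    rest[0] = '0';                               /* leading '0' = times 10 */
    mul_rev(a, b + 1, rest + 1);
    mul_digit_rev(a, *b - '0', 0, partial);
    add_rev(partial, rest, 0, out);
}

/* Copy the first n characters of s into out in reverse order. */
static void reverse_into(const char *s, size_t n, char *out)
{
    if (n == 0) { *out = '\0'; return; }
    *out = s[n - 1];
    reverse_into(s, n - 1, out + 1);
}

/* x and y are ordinary digit strings; out receives the product. */
void multiply(const char *x, const char *y, char *out)
{
    char rx[MAXD], ry[MAXD], rp[MAXD];
    assert(strlen(x) + strlen(y) + 2 <= MAXD);
    reverse_into(x, strlen(x), rx);
    reverse_into(y, strlen(y), ry);
    mul_rev(rx, ry, rp);
    reverse_into(rp, strlen(rp), out);
}
```

Calling multiply("12", "34", out) leaves "408" in out. The sketch does not strip leading zeros from the inputs, so an input such as "007" can produce leading zeros in the product.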
I'm posting this because I found this assignment rather difficult, and StackOverflow has helped me countless times. Hopefully this will help someone else out.
Problem Description:
Read in N integers, terminated by a value of zero (zero is not used). Sort these numbers into ascending order and print them out. An error message must be produced if either no data is entered prior to a value of zero, or if too much data is entered and would overflow the array. This program should be able to handle up to 100 integer values.
Approach:
I decided to use a bubble sort, which iterates through each value in the array N times for an array of size N. It looks at each value and compares it to the next value; if the first value is higher than the next, it switches them. This could easily be modified to list them in descending order as well. Anyway, what was most difficult here was handling a nested loop in nasm, and properly looping with all the ecx values and whatnot. The code I'm posting is well commented. Also, any constructive criticism is welcome, as are questions.
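For readers who want the nested-loop structure without the register juggling, here is a small C sketch of the same bubble sort idea. It is not a translation of the nasm code; the name bubble_sort is made up, and shrinking the inner loop on each pass is an optional trim that the description above does not mention.

```c
#include <assert.h>
#include <stddef.h>

/* Sort values[0..n-1] into ascending order with a bubble sort. */
void bubble_sort(int *values, size_t n)
{
    assert(values != NULL || n == 0);
    if (n < 2)
        return;
    for (size_t pass = 0; pass < n - 1; pass++) {        /* outer loop */
        /* Inner loop: compare each value with its right-hand neighbour.
           The last `pass` elements are already in their final place. */
        for (size_t i = 0; i + 1 < n - pass; i++) {
            if (values[i] > values[i + 1]) {
                int tmp = values[i];                     /* swap the pair */
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
}
```

Changing the comparison from > to < gives descending order.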
Reflections on this project:
I think there may be a better way to go through elements in the array, rather than using ebx. Masm has a pointer thing that can be used to iterate through the values. This code works and meets the requirements, but it could probably be better. Also, bubble sort may not be the best way to do it. I know there are other sorting algorithms, but bubble seemed the easiest to implement in nasm.
It would have been better to include your solution in the question. As posted, it looked like the question had already been answered and did not need any more attention. Lucky you, here are my thoughts about it:
You have to initialize the array with 101 doublewords because you will have stored that many inputs before jumping to the label tooManyInts.
When decrementing ECX you comment it is because of 0-based indexing. This is not true. It's because a series of N elements requires N-1 comparisons.
I find it hard to keep track of what's on the stack. Might I suggest you comment somewhat like this:
push ecx ;(1)
dec ecx
push ecx ;(2)
...
loop L3
pop ecx ;(2)
SetWrite:
xor ebx,ebx
pop ecx ;(1)
tl;dr: What is the fastest way to sort an uint8x16_t?
I need to sort many arrays of exactly 16 unsigned bytes (in descending order, which doesn't matter, of course), and I'm trying to optimize sorting by means of ARM NEON vectorization.
And I find it to be quite a fancy puzzle, as it seems that there "must" exist a short combination of NEON instructions (such as vmax/vpmax/vmin/vpmin, vzip/vuzp) that reliably results in a sorted array.
For example, if we transform a pair (A, B) of two 8-byte arrays into (vpmax(A,B), vpmin(A,B)), we obtain same 16 values, just in different order. If we repeat this operation four times, we reliably have the array maximum in the first cell and the array minimum in the last cell; we cannot be sure about the middle elements though.
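The pairwise step can be checked on any machine with a scalar model. The sketch below is plain C, not NEON code: pmax8 and pmin8 are hypothetical stand-ins that imitate how vpmax/vpmin combine adjacent pairs (first half of the result from A, second half from B), and four_rounds repeats the (vpmax, vpmin) transformation four times on a 16-byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for vpmax on two 8-byte vectors: maxima of adjacent
   pairs of a fill r[0..3], maxima of adjacent pairs of b fill r[4..7]. */
static void pmax8(const uint8_t a[8], const uint8_t b[8], uint8_t r[8])
{
    for (int i = 0; i < 4; i++) {
        r[i]     = a[2 * i] > a[2 * i + 1] ? a[2 * i] : a[2 * i + 1];
        r[i + 4] = b[2 * i] > b[2 * i + 1] ? b[2 * i] : b[2 * i + 1];
    }
}

/* Scalar stand-in for vpmin, same layout as pmax8. */
static void pmin8(const uint8_t a[8], const uint8_t b[8], uint8_t r[8])
{
    for (int i = 0; i < 4; i++) {
        r[i]     = a[2 * i] < a[2 * i + 1] ? a[2 * i] : a[2 * i + 1];
        r[i + 4] = b[2 * i] < b[2 * i + 1] ? b[2 * i] : b[2 * i + 1];
    }
}

/* Apply (A, B) -> (vpmax(A,B), vpmin(A,B)) four times, in place.
   v[0..7] is A and v[8..15] is B. */
void four_rounds(uint8_t v[16])
{
    assert(v != NULL);
    for (int round = 0; round < 4; round++) {
        uint8_t hi[8], lo[8];
        pmax8(v, v + 8, hi);
        pmin8(v, v + 8, lo);
        memcpy(v, hi, 8);
        memcpy(v + 8, lo, 8);
    }
}
```

After four_rounds, v[0] holds the maximum and v[15] the minimum, while the 16 values as a whole are unchanged; the middle elements are in no guaranteed order.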
Another example: if we first do (C,D)=(vmax(A,B),vmin(A,B)), then we do (E,F)=(vpmax(C,D),vpmin(C,D)), then we do (G,H)=vzip(E,F), then we get our array split into four parts of four bytes, and in each part we already know the largest element and the smallest element. Probably the next naive step would be to deinterleave this array to have the top four bytes at the start of the array (which won't necessarily be the top 4 elements of the array, just the top bytes of their respective groups) and repeat; I'm not yet sure where that leads in the end.
Is there any known method for this particular problem or for other similar problems (for different array sizes or whatever)? Any ideas are appreciated :)
I need to find the position (or index), say i, in an integer array A of size 100, such that A[i]=0. 99 elements of array A are 1 and only one element is 0. I want the most efficient way of solving this problem (so no one-by-one element comparison).
Others have already answered the fundamental question - you will have to check all entries, or at least, up until the point where you find the zero. This would be a worst case of 99 comparisons. (Because if the first 99 are ones then you already know that the last entry must be the zero, so you don't need to check it)
The possible flaw in these answers is the assumption that you can only check one entry at a time.
In reality we would probably use wider memory reads to compare several integers at once. (e.g. if your "integer" is 32 bits, then processors with SIMD instructions could compare 128 bits at once to see if any entry in a group of 4 values contains the zero; this would make your brute force scan just under 4 times faster. Obviously the smaller the integer, the more entries you could compare at once).
But that isn't the optimal solution. If you can dictate the storage of these values, then you could store the entire "array" as binary bits (0/1 values) in just 100 bits (the easiest would be to use two 64-bit integers (128 bits) and fill the spare 28 bits with 1's) and then you could do a "binary chop" to find the data.
Essentially a "binary chop" works by chopping the data in half. One half will be all 1's, and the other half will have the zero in it. So a single comparison allows you to reject half of the values at once. (You can do a single comparison because half of your array will fit into a 64-bit long, so you can just compare it to 0xffffffffffffffff to see if it is all 1's). You then repeat on the half that contains the zero, chopping it in two again and determining which half holds the zero... and so on. This will always find the zero value in 7 comparisons - much better than comparing all 100 elements individually.
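A minimal sketch of that binary chop, assuming the 100 flags are packed into two 64-bit words with bit k of words[0] holding entry k, bit k of words[1] holding entry 64+k, and the 28 spare bits set to 1. The function name find_zero_bit and this bit layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the single 0 bit among the packed entries. */
int find_zero_bit(const uint64_t words[2])
{
    /* Precondition: at least one word contains a zero bit. */
    assert(words[0] != UINT64_MAX || words[1] != UINT64_MAX);

    int base = 0;
    uint64_t chunk = words[0];
    if (chunk == UINT64_MAX) {          /* comparison 1: which 64-bit half? */
        base = 64;
        chunk = words[1];
    }
    /* Comparisons 2..7: halve the candidate range each time. */
    for (int width = 32; width >= 1; width /= 2) {
        uint64_t mask = ((uint64_t)1 << width) - 1;
        if ((chunk & mask) == mask) {   /* low half is all 1's */
            base += width;              /* so the zero is in the high half */
            chunk >>= width;
        }
    }
    return base;
}
```

Each call performs exactly 7 comparisons against an all-ones pattern, regardless of where the zero sits.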
This could be further optimised because once you get down to the level of one or two bytes you could simply look up the byte/word value in a precalculated look-up table to tell you which bit is the zero. Since each comparison halves the 128 bits (64, 32, 16, 8), this would bring the algorithm down to 3 comparisons and one look-up (in a 64kB table), or 4 comparisons and one look-up (in a 256-byte table).
So we're down to about 5 operations in the worst case.
But if you could dictate the storage of the array, you could just "store" the array by noting down the index of the zero entry. There is no need at all to store all the individual values. This would only take 1 byte of memory to store the state, and this byte would already contain the answer, giving you a cost of just 1 operation (reading the stored value).
You cannot do better than a linear scan unless the data is sorted or you have some extra information about it. In the worst case you need to read essentially all of the data, since you have no clue where this 0 is hiding.
If it is [sorted] - just access the relevant [minimum] location.
Something tells me that the expected answer is "compare pairs":
while (a[i] == a[i+1]) i += 2;
Although it looks better than the obvious approach, it's still O(n).
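Filled out into a complete function, the pair comparison might look like the sketch below. The name find_zero_by_pairs is made up, and the code assumes the array holds exactly one 0 and otherwise only 1s; the extra bounds check is there only so an odd length cannot read past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the single 0 in an array of 1s. */
size_t find_zero_by_pairs(const int *a, size_t n)
{
    assert(a != NULL && n > 0);
    size_t i = 0;
    /* Skip pairs whose two elements are equal (both must be 1). */
    while (i + 1 < n && a[i] == a[i + 1])
        i += 2;
    /* The zero is one of a[i], a[i + 1]; one more test decides which. */
    return a[i] == 0 ? i : i + 1;
}
```

For 100 elements this makes at most 50 pair comparisons plus one final test, which halves the constant but leaves the growth linear.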
Keep track of it as you insert to build the array. Then just access the stored value directly. O(1) with a very small set of constants.
Imagine 100 sea shells, under one is a pearl. There is no more information.
There is really no way to find it faster than trying to turn them all over. The computer can't do any better with the same knowledge. In other words, a linear scan is the best you can do unless you save the position of the zero earlier in the process and just use that.
More trivia than anything else, but if you happen to have a quantum computer this can be done faster than linear.
Grover's algorithm