The code for a calculator I made doesn't act like I thought it would? (Python 3.x) - arrays

I'm actively learning Python at the moment. I've already had some experience with code, but not nearly enough that I would call myself a good coder (or even a competent one tbh).
I tried to create a (pretty) simple calculator script. I wanted to make sure the user could choose how many different values (s)he wanted to calculate together. To do that I created a while loop.
uArray = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
def divide():
    uAmount = int(input("How many different values do you want to add together? (max 10) "))
    if uAmount <= 10:
        for amount in range(0, uAmount):
            uArray[amount] = int(input("enter a number: "))
    else:
        print("ERROR\nMax 10 different values supported")
        return 1
    global uTotal
    uTotal = 1
    for amount1 in range(0, (uAmount - 1)):
        uTotal /= uArray[amount1]
    print("The result is: " + str(uTotal))
I know this code might look REAL ugly to a lot of you and I'm sure that the same process could be done way easier and simpler if I knew how.
I just can't figure out why my current method doesn't work, even after trying to google it.
EXAMPLE: If I choose to use 2 different values. And I make those values 50 and 2, it should give 25 of course. But it gives 0.02 instead.
Thanks in advance to anyone willing to help! (and sorry if this is a noob question ahaha)

I just can't figure out why my current method doesn't work
Simple. You start with uTotal equal to 1, then divide by each of your numbers except the last. Since you only have two numbers, 50 and 2, and you never use the second one (because of range(0, uAmount - 1)), the whole calculation comes down to this:
1 / 50 # => 0.02
How to fix?
Instead of setting uTotal to 1, set it to the value of the first element. Then apply your operation (division, in this case), using all other elements (except the first).
Array unpacking syntax may come in handy here:
total, *rest = uArray
for operand in rest:
    total /= operand
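As a minimal sketch of that fix, the unpacking can be wrapped in a small function. The name divide_all is hypothetical, and the sketch assumes it receives only the values the user actually entered: uArray in the question is padded with zeros up to length 10, so unpacking it directly would divide by zero.

```python
def divide_all(values):
    """Divide the first value by each of the remaining values, left to right."""
    if not values:
        raise ValueError("need at least one value")
    # The first element seeds the total; the rest are the divisors.
    total, *rest = values
    for operand in rest:
        total /= operand
    return total


print(divide_all([50, 2]))  # 25.0
```

In the question's code this would be called as divide_all(uArray[:uAmount]), so the unused zero slots never reach the division.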

Currently it's taking 1 and dividing it by your first input, which is 50.
Note that the for loop won't iterate like you were thinking. Need to remove the '- 1'
Here's a version:
uArray = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
def divide():
    uAmount = int(input("How many different values do you want to add together? (max 10) "))
    if uAmount <= 10:
        for amount in range(0, uAmount):
            uArray[amount] = int(input("enter a number: "))
    else:
        print("ERROR\nMax 10 different values supported")
        return 1
    global uTotal
    uTotal = uArray[0]
    for amount1 in range(1, (uAmount)):
        uTotal /= uArray[amount1]
    print("The result is: " + str(uTotal))

Related

How can I update a range within an array with a sequence

Given an array of values, how can I update a range with a sequence within that array, efficiently?
Updates are performed multiple times. After all updates are performed, we can query any index of the array for its final value.
If we update a value of v at index i, every element at index j is increased by max { v - | i - j | , 0 }.
For example.
array = {1,1,1,1,1,1}
Now I do an update at index 4 with a value of 3 the resulting array will look like this:
array = {1,1,2,3,4,3}
I want to perform both operations efficiently.
You can't update a range of elements "efficiently". Questions like these are always about figuring out how to avoid updating a range of elements altogether.
To figure out this one, consider two operations:
INTEGRATE(A) takes an array and replaces every element A[i] with sum(A[0]...A[i]).
DIFF(A) takes an array and replaces every element with its difference from the previous element (the first element is left unaltered).
These operations have some important properties:
They are inverses: INTEGRATE(DIFF(A)) = DIFF(INTEGRATE(A)) = A for all arrays A; and
They are linear: If A = B+C, then INTEGRATE(A) = INTEGRATE(B) + INTEGRATE(C), and similarly for DIFF.
Your final array is the sum of the original array, plus a whole bunch of those "triangle" arrays. Let's say it's A + T1 + T2 + T3... etc.
Each one of those triangles has a whole bunch of non-zero elements, but watch what happens when you apply DIFF twice:
[0,0,1,2,3,2,1,0,0] -> [0,0,1,1,1,-1,-1,-1,0] -> [0,0,1,0,0,-2,0,0,1]
The result has only 3 non-zero elements. That gives us a way to calculate your final array quickly.
Let D(X) = DIFF(DIFF(X)) and let I(X) = INTEGRATE(INTEGRATE(X)). Then instead of calculating A + T1 + T2 + T3..., you calculate I( D(A) + D(T1) + D(T2) + D(T3)... )
Since all those D(Tx) have at most 3 non-zero elements, it's quick and easy to add them into the result.
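A minimal Python sketch of this idea follows. The function name apply_triangle_updates and the (index, v) update format are assumptions made for illustration. It also uses linearity to add A back at the end instead of computing D(A), and it pads the working array on the left so that triangles clipped by the start of the array are still handled; it assumes 0 <= index < len(values).

```python
from itertools import accumulate


def apply_triangle_updates(values, updates):
    """Apply every (index, v) update, each adding max(v - |index - j|, 0) at j."""
    n = len(values)
    updates = [(i, v) for i, v in updates if v > 0]
    # Left padding so a triangle that starts before index 0 keeps its full shape.
    pad = max((v for _, v in updates), default=0)
    second_diff = [0] * (n + pad)
    for i, v in updates:
        # DIFF applied twice to a triangle leaves just three non-zero entries.
        for pos, delta in ((i - v + 1, 1), (i + 1, -2), (i + v + 1, 1)):
            shifted = pos + pad
            if shifted < len(second_diff):  # entries past the end affect nothing
                second_diff[shifted] += delta
    # INTEGRATE twice (two running sums) to rebuild the sum of all triangles.
    triangles = list(accumulate(accumulate(second_diff)))
    return [a + t for a, t in zip(values, triangles[pad:])]


print(apply_triangle_updates([1, 1, 1, 1, 1, 1], [(4, 3)]))  # [1, 1, 2, 3, 4, 3]
```

Each update costs O(1) and the final pass is linear in the array length plus the padding, which is why this only pays off when all updates come before the lookups.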
I'm deliberately explaining how to solve it, without giving you full code. This also handles the complex case of interleaved updates and lookups, but therefore is more complex than what Matt Timmermans came up with.
You obviously can't use an array as your representation. It makes lookups fast, but an update with value k will be an O(k) operation.
Our second try is to just keep a list of the updates. Now updates are O(1), but after m updates a lookup is O(m).
What we need is to have a way to store updates such that both adding an update and doing a lookup are fast.
The first step is to change an update from "update at a value" to "update a range by a linear rule". That is currently you say:
update at 4 by 3
Instead we'd say:
from 2 to 3:
update by x - 1
from 4 to 5:
update by 7 - x
This isn't yet a win. But it becomes one when you rewrite the ranges in terms of a standard set of intervals. First the original array
from 0 to 5 1 + 0x
Now the array after update:
from 0 to 5, 1 + 0x +
from 2 to 3, -1 + x
from 4 to 5, 7 - x
This can be represented compactly in 2 arrays:
m = [0, 0, 1, 0, -1, 0]
b = [1, 0, -1, 0, 7, 0]
And as complicated as it feels, now both updates and lookups wind up with O(log(n)) work.
For example for a lookup:
def rising_binary (n):
    power = 1
    m = 0
    yield m
    while m < n:
        if n & power:
            m += power
            yield m
        power *= 2
...
answer = 0
for bin in rising_binary(k):
    answer += m[bin] * k + b[bin]

Need help optimizing this array builder

My current code has two major bottlenecks, one I can improve for sure, but this one has me stuck. It eats up roughly 50% of my run time, and only gets worse.
What should it do?
It should take an array (a walk) from Walks and break it into two new arrays, A and B. The rules look a bit odd, but I'm sure they're straightforward enough.
Each walk should have even-N non-negative integers, and a pair is simply a list of 2 lists of integers, each list also being length N.
L is N/2.
#example pair: [[1,2,5,6,-4,-1],[8,12,-3,7,4,9]]
#example walks:[[1,0,2,5,3,1]] just 1 walk in this example. Could be k many.
#L = 3
newpairs=[]
for walk in walks:
    Anew = [0 for j in range(2*L)]
    Bnew = [0 for j in range(2*L)]
    for r in range(L):
        Anew[r] = int((pair[0][r]+walk[r])/2)
        Anew[r+L] = int((pair[0][r]-walk[r])/2)
        Bnew[r] = int((pair[1][r]+walk[r+L])/2)
        Bnew[r+L] = int((pair[1][r]-walk[r+L])/2)
    newpair = [Anew,Bnew]
    newpairs.append(newpair)
#output:[[[1, 1, 3, 0, 1, 1], [6, 7, -1, 1, 4, -2]]]
I realize this may be a shot in the dark, but I'm happy to answer any questions to further clarify aspects of the code. My project cannot go much further without optimizing this piece. It's blowing up run times by over 50% and will only get worse as I push bigger sets through.
Your algorithm seems simple enough and doesn't have any glaring performance mistakes. You probably won't be reducing the run time by an order of magnitude or anything like it. There are some smaller optimizations you can do, though.
1) Use list multiplication notation for initializing your Anew and Bnew lists. Replace this:
Anew = [0 for j in range(2*L)]
Bnew = [0 for j in range(2*L)]
with this:
Anew = [0]*2*L
Bnew = [0]*2*L
Benchmarking:
>>> timeit.timeit('[0 for x in range(300)]')
7.822149500000023
>>> timeit.timeit('[0]*300')
0.8999562000000196
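2) Use floor division. One caveat: // rounds toward negative infinity, while int(x/2) truncates toward zero, so the two give different results when the sum or difference is negative and odd. If that can happen in your data, check which rounding you need. Replace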
Anew[r] = int((pair[0][r]+walk[r])/2)
and similar lines, with this:
Anew[r] = (pair[0][r]+walk[r])//2
Benchmark:
>>> timeit.timeit('[int((x+y)/2) for x in range(-5,5) for y in range(-5,5)]')
23.69675469999993
>>> timeit.timeit('[(x+y)//2 for x in range(-5,5) for y in range(-5,5)]')
11.680407500000001
Beyond that, you might want to look into using numpy as it's almost always faster than the standard library for working with lists/arrays.

Having trouble randomly generating numbers in multidimensional arrays

I'm trying to generate coordinates in a multidimensional array.
the range for each digit in the coords is -1 to 1. <=> seems like the way to go comparing two random numbers. I'm having trouble because randomizing it takes forever, coords duplicate and sometimes don't fill all the way through. I've tried uniq! which only causes the initialization to run forever while it tries to come up with the different iterations.
the coords look something like this. (-1, 0, 1, 0, 0)
5 digits give position. I could write them out but I'd like to generate the coords each time the program is initiated. The coords would then be assigned to a hash tied to a key. 1 - 242.
I could really use some advice.
edited to add code. It does start to iterate but it doesn't fill out properly. Short of just writing out an array with all possible combos and randomizing before merging it with the key. I can't figure out how.
room_range = (1..241)
room_num = [*room_range]
p room_num
$rand_loc_cords = []
def Randy(x)
  srand(x)
  y = (rand(100) + 1) * 1500
  z = (rand(200) + 1) * 1000
  return z <=> y
end
def rand_loc
  until $rand_loc_cords.length == 243 do
    x = Time.new.to_i
    $rand_loc_cords.push([Randy(x), Randy(x), Randy(x), Randy(x), Randy(x)])
    $rand_loc_cords.uniq!
    p $rand_loc_cords
  end
  #p $rand_loc_cords
end
rand_loc
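You are trying to get all possible permutations of -1, 0 and 1 with a length of 5 by sheer luck, which can take forever. Worse, Randy reseeds the generator with the same timestamp on every call, so all five digits of a coordinate come out identical and the loop can never reach 243 unique entries. There are indeed 243 of them (3**5), and you can generate them directly: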
coords = [-1,0,1].repeated_permutation(5).to_a
Shuffle the array if the order should be randomized.

Plotting moving average with numpy and csv

I need help plotting a moving average on top of the data I am already able to plot (see below)
I am trying to make m (my moving average) equal to the length of y (my data) and then within my 'for' loop, I seem to have the right math for my moving average.
What works: plotting x and y
What doesn't work: plotting m on top of x & y and gives me this error
RuntimeWarning: invalid value encountered in double_scalars
My theory: I am setting m to np.arrays = y.shape and then creating my for loop to make m equal to the math set within the loop thus replacing all the 0's to the moving average
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import csv
import math
def graph():
    date, value = np.loadtxt("CL1.csv", delimiter=',', unpack=True,
                             converters = {0: mdates.strpdate2num('%d/%m/%Y')})
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1, axisbg = 'white')
    plt.plot_date(x=date, y=value, fmt = '-')
    y = value
    m = np.zeros(y.shape)
    for i in range(10, y.shape[0]):
        m[i-10] = y[i-10:1].mean()
    plt.plot_date(x=date, y=value, fmt = '-', color='g')
    plt.plot_date(x=date, y=m, fmt = '-', color='b')
    plt.title('NG1 Chart')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.show()
graph ()
I think that lmjohns3's answer is correct, but you have a couple of problems with your moving average function. First of all, there is the indexing problem that lmjohns3 pointed out. Take the following data for example:
In [1]: import numpy as np
In [2]: a = np.arange(10)
In [3]: a
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Your function gives the following moving average values:
In [4]: for i in range(3, a.shape[0]):
...: print a[i-3:i].mean(),
1.0 2.0 3.0 4.0 5.0 6.0 7.0
The size of this array (7) is too small by one number. The last value in the moving average should be (7+8+9)/3=8. To fix that you could change your function as follows:
In [5]: for i in range(3, a.shape[0] + 1):
...: print a[i-3:i].sum()/3,
1 2 3 4 5 6 7 8
The second problem is that in order to plot two sets of data, the total number of data points needs to be the same. Your function returns a new set of data that is smaller than the original data set. (You maybe didn't notice because you preassigned a zeros array of the same size. Your for loop will always produce an array with a bunch of zeros at the end.)
The convolution function gives you the correct data, but it has two extra values (one at each end) because of the 'same' argument, which ensures that the new data array has the same size as the original.
In [6]: np.convolve(a, [1./3]*3, 'same')
Out[6]:
array([ 0.33333333, 1. , 2. , 3. , 4. ,
5. , 6. , 7. , 8. , 5.66666667])
As an alternate method, you could vectorize your code by using Numpy's cumsum function.
In [7]: cs = np.cumsum(a)
In [8]: (cs[3-1:] - np.append(0,cs[:-3]))/3.
Out[8]: array([ 1., 2., 3., 4., 5., 6., 7., 8.])
(This last one is a modification of the answer in a previous post.)
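To make the cumulative-sum trick easier to follow, here is a dependency-free sketch of the same idea in plain Python. The helper name moving_average is hypothetical; it returns only the averages over full windows, so the result has len(values) - window + 1 entries, like the 'valid' convolution mode.

```python
from itertools import accumulate


def moving_average(values, window):
    """Return the mean of every full run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        return []
    # prefix[k] is the sum of the first k values, so prefix[0] == 0.
    prefix = [0] + list(accumulate(values))
    # The sum of values[i - window:i] is a difference of two prefix sums.
    return [(prefix[i] - prefix[i - window]) / window
            for i in range(window, len(prefix))]


print(moving_average(list(range(10)), 3))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Because the output is shorter than the input by window - 1 entries, the x values passed to the plot need to be trimmed by the same amount.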
The trick might be that you should drop the first values of your date array. For example use the following plotting call, where n is the number of points in your average:
plt.plot_date(x=date[n-1:], y=m, fmt = '-', color='b')
The problem here lives in your computation of the moving average: you just have a couple of off-by-one problems in the indexing!
y = value
m = np.zeros(y.shape)
for i in range(10, y.shape[0]):
    m[i-10] = y[i-10:1].mean()
Here you've got everything right except for the :1]. This tells the interpreter to take a slice starting at whatever i-10 happens to be, and ending just before 1. But if i-10 is 1 or larger, this results in an empty slice, and the mean of an empty slice is NaN, which is where the RuntimeWarning comes from. To fix it, just replace 1 with i.
Additionally, your range needs to be extended by one at the end. Replace y.shape[0] with y.shape[0]+1.
Alternative
I just thought I'd mention that you can compute the moving average more automatically by using np.convolve (docs) :
m = np.convolve(y, [1. / 10] * 10, 'same')
In this case, m will have the same length as y, but the moving average values might look strange at the beginning and end. This is because 'same' effectively causes y to be padded with zeros at both ends so that there are enough y values to use when computing the convolution.
If you'd prefer to get only moving average values that are computed using values from y (and not from additional zero-padding), you can replace 'same' with 'valid'. In this case, as Ryan points out, m will be shorter than y (more precisely, len(m) == len(y) - len(filter) + 1), which you can address in your plot by removing the first or last elements of your date array.
Okay, either I'm going nuts or it actually worked - I compared my chart vs. another chart and it seemed to have worked.
Does this make sense?
m = np.zeros(y.shape)
for i in range(10, y.shape[0]):
    m[i-10] = y[i-10:i].mean()
plt.plot_date(x=date, y=m, fmt = '-', color='r')

Algorithm to find "most common elements" in different arrays

I have for example 5 arrays with some inserted elements (numbers):
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
I need to find the most common elements in those arrays, and every element should go all the way till the end (see example below). In this example that would be the combination 4, 4, 4, 2, shown in bold in the original post (or the same one but with "30" on the end, it's the "same"), because it contains the smallest number of different elements (only two, 4 and 2/30).
This combination (see below) isn't good because if I have for ex. "4" it must "go" till it ends (next array mustn't contain "4" at all). So combination must go all the way till the end.
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
EDIT2: OR
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
OR anything else is NOT good.
Is there some algorithm to speed this thing up (if I have thousands of arrays with hundreds of elements in each one)?
To make it clear - solution must contain lowest number of different elements and the groups (of the same numbers) must be grouped from first - larger ones to the last - smallest ones. So in upper example 4,4,4,2 is better than 4,2,2,2 because in first example group of 4's is larger than group of 2's.
EDIT: To be more specific. Solution must contain the smallest number of different elements and those elements must be grouped from first to last. So if I have three arrays like
1,2,3
1,4,5
4,5,6
Solution is 1,1,4 or 1,1,5 or 1,1,6 NOT 2,5,5 because 1's have larger group (two of them) than 2's (only one).
Thanks.
EDIT3: I can't be more specific :(
EDIT4: @spintheblack 1,1,1,2,4 is the correct solution because number used first time (let's say at position 1) can't be used later (except it's in the SAME group of 1's). I would say that grouping has the "priority"? Also, I didn't mention it (sorry about that) but the numbers in arrays are NOT sorted in any way, I typed it that way in this post because it was easier for me to follow.
Here is the approach you want to take, if arrays is an array that contains each individual array.
1. Starting at i = 0
2. current = arrays[i]
3. Loop i from i+1 to len(arrays)-1
4. new = current & arrays[i] (set intersection, finds common elements)
5. If there are any elements in new, do step 6, otherwise skip to 7
6. current = new, return to step 3 (continue loop)
7. print or yield an element from current, current = arrays[i], return to step 3 (continue loop)
Here is a Python implementation:
def mce(arrays):
    count = 1
    current = set(arrays[0])
    for i in range(1, len(arrays)):
        new = current & set(arrays[i])
        if new:
            count += 1
            current = new
        else:
            # the run ended: emit one of its common elements, once per array
            print(" ".join([str(current.pop())] * count), end=" ")
            count = 1
            current = set(arrays[i])
    print(" ".join([str(current.pop())] * count))
>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2
If all are number lists, and are all sorted, then,
1. Convert to array of bitmaps.
2. Keep 'AND'ing the bitmaps till you hit zero. The position of the 1 in the previous value indicates the first element.
3. Restart step 2 from the next element
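The bitmap steps above can be sketched with Python integers as the bitmaps. The function names are hypothetical, and the sketch assumes every array is non-empty and holds non-negative integers; with Python's arbitrary-size ints the arrays do not need to be sorted. Like the set-based version, this is a greedy pass.

```python
def to_bitmap(values):
    """Encode non-negative integers as an int with bit v set for each value v."""
    mask = 0
    for v in values:
        mask |= 1 << v
    return mask


def lowest_value(mask):
    """Return the smallest value whose bit is set in a non-zero mask."""
    return (mask & -mask).bit_length() - 1


def greedy_runs(arrays):
    result = []
    current, run = 0, 0
    for values in arrays:
        mask = to_bitmap(values)
        if run and current & mask:
            # Still something in common: keep AND-ing and extend the run.
            current &= mask
            run += 1
        else:
            # The AND hit zero: emit the finished run and restart from here.
            if run:
                result.extend([lowest_value(current)] * run)
            current, run = mask, 1
    if run:
        result.extend([lowest_value(current)] * run)
    return result


print(greedy_runs([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]))
# [4, 4, 4, 2]
```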
This has now turned into a graphing problem with a twist.
The problem is a directed acyclic graph of connections between stops, and the goal is to minimize the number of lines switches when riding on a train/tram.
ie. this list of sets:
1,4,8,10 <-- stop A
1,2,3,4,11,15 <-- stop B
2,4,20,21 <-- stop C
2,30 <-- stop D, destination
He needs to pick lines that are available at his exit stop, and his arrival stop, so for instance, he can't pick 10 from stop A, because 10 does not go to stop B.
So, this is the set of available lines and the stops they stop on:
A B C D
line 1 -----X-----X-----------------
line 2 -----------X-----X-----X-----
line 3 -----------X-----------------
line 4 -----X-----X-----X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
If we consider that a line under consideration must go between at least 2 consecutive stops, let me highlight the possible choices of lines with equal signs:
A B C D
line 1 -----X=====X-----------------
line 2 -----------X=====X=====X-----
line 3 -----------X-----------------
line 4 -----X=====X=====X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
He then needs to pick a way that transports him from A to D, with the minimal number of line switches.
Since he explained that he wants the longest rides first, the following sequence seems the best solution:
take line 4 from stop A to stop C, then switch to line 2 from C to D
Code example:
stops = [
    [1, 4, 8, 10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30],
]
def calculate_possible_exit_lines(stops):
    """
    only return lines that are available at both exit
    and arrival stops, discard the rest.
    """
    result = []
    for index in range(0, len(stops) - 1):
        lines = []
        for value in stops[index]:
            if value in stops[index + 1]:
                lines.append(value)
        result.append(lines)
    return result
def all_combinations(lines):
    """
    produce all combinations which travel from one end
    of the journey to the other, across available lines.
    """
    if not lines:
        yield []
    else:
        for line in lines[0]:
            for rest_combination in all_combinations(lines[1:]):
                yield [line] + rest_combination
def reduce(combination):
    """
    reduce a combination by returning the number of
    times each value appear consecutively, ie.
    [1,1,4,4,3] would return [2,2,1] since
    the 1's appear twice, the 4's appear twice, and
    the 3 only appear once.
    """
    result = []
    while combination:
        count = 1
        value = combination[0]
        combination = combination[1:]
        while combination and combination[0] == value:
            combination = combination[1:]
            count += 1
        result.append(count)
    return tuple(result)
def calculate_best_choice(lines):
    """
    find the best choice by reducing each available
    combination down to the number of stops you can
    sit on a single line before having to switch,
    and then picking the one that has the most stops
    first, and then so on.
    """
    available = []
    for combination in all_combinations(lines):
        count_stops = reduce(combination)
        available.append((count_stops, combination))
    available = [k for k in reversed(sorted(available))]
    return available[0][1]
possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))
This code prints:
possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]
Since, as I said, I list lines between stops, the above solution can be read either as the lines you leave each stop on or as the lines you arrive on at the next stop.
So the route is:
Hop onto line 4 at stop A and ride on that to stop B, then to stop C
Hop onto line 2 at stop C and ride on that to stop D
There are probably edge-cases here that the above code doesn't work for.
However, I'm not bothering more with this question. The OP has demonstrated a complete incapability in communicating his question in a clear and concise manner, and I fear that any corrections to the above text and/or code to accommodate the latest comments will only provoke more comments, which leads to yet another version of the question, and so on ad infinitum. The OP has gone to extraordinary lengths to avoid answering direct questions or to explain the problem.
I am assuming that "distinct elements" do not have to actually be distinct, they can repeat in the final solution. That is if presented with [1], [2], [1] that the obvious answer [1, 2, 1] is allowed. But we'd count this as having 3 distinct elements.
If so, then here is a Python solution:
def find_best_run(first_array, *argv):
    # initialize data structures.
    this_array_best_run = {}
    for x in first_array:
        this_array_best_run[x] = (1, (1,), (x,))
    for this_array in argv:
        # find the best runs ending at each value in this_array
        last_array_best_run = this_array_best_run
        this_array_best_run = {}
        for x in this_array:
            for (y, pattern) in last_array_best_run.items():
                (distinct_count, lengths, elements) = pattern
                if x == y:
                    lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
                else:
                    distinct_count += 1
                    lengths = tuple(lengths + (1,))
                    elements = tuple(elements + (x,))
                if x not in this_array_best_run:
                    this_array_best_run[x] = (distinct_count, lengths, elements)
                else:
                    (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
                    # fewer distinct values wins; longer early runs only break ties
                    if distinct_count < prev_count or (
                            distinct_count == prev_count and prev_lengths < lengths):
                        this_array_best_run[x] = (distinct_count, lengths, elements)
    # find the best overall run
    best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
    for (distinct_count, lengths, elements) in this_array_best_run.values():
        if distinct_count < best_count:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
        elif distinct_count == best_count and best_lengths < lengths:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
    # convert it into a more normal representation.
    answer = []
    # pair each run length with the winning run's element, not the loop leftover
    for (length, element) in zip(best_lengths, best_elements):
        answer.extend([element] * length)
    return answer
# example
print(find_best_run(
    [1,4,8,10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30])) # prints [4, 4, 4, 2]; [4, 4, 4, 30] is an equally good answer
Here is an explanation. The ...this_run dictionaries have keys which are elements in the current array, and they have values which are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count, then maximize lengths (lengths is a tuple, so this will prefer the element with the largest value in the first spot) and are tracking elements for the end. At each step I construct all possible runs which are a combination of a run up to the previous array with this element next in sequence, and find which ones are best to the current. When I get to the end I pick the best possible overall run, then turn it into a conventional representation and return it.
If you have N arrays of length M, this should take O(N*M*M) time to run.
I'm going to take a crack here based on the comments, please feel free to comment further to clarify.
We have N arrays and we are trying to find the 'most common' value over all arrays when one value is picked from each array. There are several constraints 1) We want the smallest number of distinct values 2) The most common is the maximal grouping of similar letters (changing from above for clarity). Thus, 4 t's and 1 p beats 3 x's 2 y's
I don't think either problem can be solved greedily. Here's a counterexample: [[1,4],[1,2],[1,2],[2],[3,4]]. A greedy algorithm would pick [1,1,1,2,4] (3 distinct numbers), whereas [4,2,2,2,4] uses only two distinct numbers.
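The counterexample can be checked with a small brute-force sketch. The function name fewest_distinct is hypothetical, and the search tries every way of picking one value per array, so it is exponential and only meant for tiny inputs like this one.

```python
from itertools import product


def fewest_distinct(arrays):
    """Try every pick of one value per array; keep the one with fewest distinct values."""
    return min(product(*arrays), key=lambda combo: len(set(combo)))


arrays = [[1, 4], [1, 2], [1, 2], [2], [3, 4]]
best = fewest_distinct(arrays)
print(best, len(set(best)))  # (4, 2, 2, 2, 4) 2
```

This only minimizes the number of distinct values; it ignores the grouping rule from the question, which is the interpretation this answer started from.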
This looks like a bipartite matching problem, but I'm still coming up with the formulation..
EDIT : ignore; This is a different problem, but if anyone can figure it out, I'd be really interested
EDIT 2 : For anyone that's interested, the problem that I misinterpreted can be formulated as an instance of the Hitting Set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover. Basically the left hand side of the bipartite graph would be the arrays and the right hand side would be the numbers, edges would be drawn between arrays that contain each number. Unfortunately, this is NP complete, but the greedy solutions described above are essentially the best approximation.

Resources