Monty Hall simulation issue regarding arrays

I am running into an issue with a simulation I wrote of the "Monty Hall" statistics riddle. I should get consistent results (33%/66%), but every third time I run the simulation the results end up being 0/100, and then they flip to 66%/33%. I believe the issue might be with how I am creating my arrays (i.e., the results of one simulation are bleeding over into the next), but I can't pinpoint it. Also, if you have any tips on a better way to write my simulation, I would appreciate that as well.
Below is my code:
import numpy as np

# simulates guessing a door
def sim_guess(nsim):
    answer = []
    guess = [0,1,2]
    stratagy = [0.2,0.6,0.2]
    for element in range(nsim):
        answer.append(np.random.choice(guess, p=stratagy))
    return answer

# simulates placing the prize
def simulate_prizedoor(nsim):
    doors = [0,1,2]
    answer = []
    for element in range(nsim):
        answer.append(np.random.choice(doors))
    return answer

# simulates opening a door to reveal a goat
def goat_door(prize, guess):
    answer = []
    for i in range(len(prize)):
        door = [0,1,2]
        if prize[i] == guess[i]:
            door.remove(prize[i])
            answer.append(np.random.choice(door))
        else:
            door.remove(prize[i])
            door.remove(guess[i])
            answer.append(door[0])
    return answer

# simulates changing guess after goat has been revealed
def switch_guess(goat, guess):
    answer = []
    for i in range(len(goat)):
        door = [0,1,2]
        door.remove(goat[i])
        door.remove(guess[i])
        answer.append(door[0])
    return answer

# shows percentages after 10,000 trials
def win_percentage(prize, guess):
    wins = []
    for element in prize:
        wins.append(prize[element] == guess[element])
    answer = (float(np.sum(wins))/len(guess))*100
    return answer

prize = simulate_prizedoor(10000)
guess = sim_guess(10000)
# wins without changing guess
print(win_percentage(prize, guess))
# wins with changing guess
goat = goat_door(prize, guess)
switch = switch_guess(goat, guess)
print(win_percentage(prize, switch))

I feel like this would be a lot easier to do with objects, as each game is separate from the others. I would probably do it something like
import random

class Game:
    winningDoor = 0
    chosenDoor = 0
    goat = 0

    def __init__(self):
        self.winningDoor = random.randint(1,3)

    def play(self, move, willSwap):
        self.chosenDoor = move
        self.goat = 0
        i=1
        # the goat door is the first door that is neither the prize nor the guess
        while(self.goat <= 0):
            if(i != self.winningDoor and i != self.chosenDoor):
                self.goat = i
            i += 1
        if(willSwap):
            self.chosenDoor = 6-(self.chosenDoor + self.goat)
        return (self.winningDoor == self.chosenDoor)

def main():
    swapwins = 0
    staywins = 0
    for i in range(0,10000):
        testswap = Game()
        if(testswap.play(random.randint(1,3), 1)):
            swapwins += 1
        teststay = Game()
        if(teststay.play(random.randint(1,3), 0)):
            staywins += 1
    print(swapwins, staywins)

main()
Here, each game is created separately and played once. It's not the most efficient use of objects, and this could probably be just a subroutine, but if you want more statistics, this approach will end up being much better. The only potentially confusing thing here would be
self.chosenDoor = 6-(self.chosenDoor + self.goat)
which is a simplification of the statement that if goat is 1 and 2 was chosen, change it to 3; if goat was 2 and 3 was chosen, change it to 1; if goat was 3 and chosen was 1, change it to 2; etc. It works because the three door numbers always sum to 6.
As for why your original code didn't work: returning everything in groups of 10000 looks odd and is difficult to debug. Your random numbers could also be generated with randint, which would make your code more human-readable.
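One thing that stands out in the question's code: win_percentage loops with "for element in prize" and then uses element as an index, so it only ever compares the games at positions 0, 1 and 2 instead of all 10000. Looping over range(len(prize)) would compare every game. Below is a minimal sketch that avoids the parallel lists altogether by playing one game at a time; play_round and win_rate are hypothetical helper names, and the contestant here guesses uniformly rather than with the weighted strategy from the question.

```python
import random

def play_round(rng, switch):
    """Play one game and return True if the contestant wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    guess = rng.choice(doors)
    # The host opens a door that is neither the guess nor the prize.
    goat = rng.choice([d for d in doors if d != guess and d != prize])
    if switch:
        # Move to the one door that is neither the guess nor the opened door.
        guess = next(d for d in doors if d != guess and d != goat)
    return guess == prize

def win_rate(n_games, switch, seed=0):
    """Fraction of games won over n_games independent rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(rng, switch) for _ in range(n_games))
    return wins / n_games

if __name__ == "__main__":
    print("stay:  ", win_rate(10000, switch=False))
    print("switch:", win_rate(10000, switch=True))
```

With enough games the two rates should settle near 1/3 and 2/3 respectively.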

Related

Restarting a loop after a specific number of iterations?

I'm new to coding and have been asked to create a simulation of a 10-year experiment, run 1000 times. I have it at a number lower than 1000 to speed up testing. This is a partial copy of my code. I can get the other parameters of the task to work, but instead of stopping and restarting every 10 years, it seems to accumulate the results of the previous years.
For example, the code is supposed to compound money earned in the year following a 'Success'. I can get it to compound, but my code seems to compound into years 11 and 12 instead of stopping at year 10 and restarting at year 1.
I tried .count() to keep track of how many elements I'm iterating through, and also tried a while loop, but I can't get either to work.
for sim in range(5):
    for yr in range(10):
        experiment = "Success" if np.random.random() <= 0.1 else "Failure"
        expense = 25000
        margin = 0
        results1.append(experiment)
        expenses1.append(expense)
        margins1.append(margin)
    iter = 0
    if iter < 10:
        for i in range(len(results1)):
            if i + 1 < len(results1) and i - 1 >= 0:
                if results1[i] == 'Success':
                    expenses1[i + 1] = 0
                    margins1[i + 1] = 10000
                if results1[i - 1] == 'Success':
                    expenses1[i] = 0
                    if margins1[i] != 10000:
                        margins1[i] = 10000
                if expenses1[i - 1] == 0:
                    expenses1[i] = 0
                    expenses1[i + 1] = 0
                if margins1[i] >= 10000:
                    margins1[i + 1] = margins1[i] * 1.2
            iter += 1
    else:
        continue
    iter = 0
all_data1 = zip(results1, expenses1, margins1)
df1 = pd.DataFrame(all_data1, columns=["Results", "R&D", "Margins"])
Let's look at the following simplified code snippet of your implementation.
import numpy as np
results1 = []
for sim in range(10):
    for yr in range(10):
        experiment = "Success" if np.random.random() <= 0.1 else "Failure"
        results1.append(experiment)
What is the problem here? Well, what I assume you want is that for every year we add a result to a list for that simulation. However, what is the difference between what you currently have, and this following code?
import numpy as np
results1 = []
for yr in range(10 * 10):
    experiment = "Success" if np.random.random() <= 0.1 else "Failure"
    results1.append(experiment)
Well, unless you left something out of your included code, it looks like not much! You want the simulation to reset after n years (in this example 10), but you don't change anything in your code! Here is an example of how this reset could be represented:
import numpy as np
results1 = []
for sim in range(10):
    sim_results1 = []
    for yr in range(10):
        experiment = "Success" if np.random.random() <= 0.1 else "Failure"
        sim_results1.append(experiment)
    results1.append(sim_results1)
Now, your results1 list will contain m lists (where m is the number of simulations) with each sublist showing whether the experiment was a success or failure over n years. In short: if you just add each experimental result to a big list that is the same across all simulation runs, it's not a surprise that it looks like you are simulating into year 11, 12, etc. What is actually happening is year 11 is really year 1 of simulation number 2, but you do not separate the simulations currently.
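To make the per-simulation reset easy to check, here is a small self-contained sketch of the same idea. The function name run_simulations and its parameters are made up for illustration, and it uses the standard random module with a seed instead of NumPy so that runs are repeatable.

```python
import random

def run_simulations(n_sims, n_years, p_success=0.1, seed=0):
    """Return one list of yearly outcomes per simulation."""
    rng = random.Random(seed)
    all_results = []
    for _ in range(n_sims):
        sim_results = []  # fresh list, so year 1 really is year 1
        for _ in range(n_years):
            outcome = "Success" if rng.random() <= p_success else "Failure"
            sim_results.append(outcome)
        all_results.append(sim_results)
    return all_results

results = run_simulations(n_sims=5, n_years=10)
for sim_number, sim_results in enumerate(results, start=1):
    print(sim_number, sim_results.count("Success"), "successes in", len(sim_results), "years")
```

Any year-over-year logic (such as compounding after a success) can then run inside one sublist at a time, so it cannot spill from year 10 of one simulation into year 1 of the next.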

Logical Error - 2D array using ArrayList in Kotlin

I am making a simple tic-tac-toe game while learning Kotlin, following a tutorial.
When I input any combination as my turn in the game, let's say 1, 3, the X appears in all places of that column. I have spent almost 3 hours trying to find the error, and I think it has something to do with how the ArrayList is built. Kindly help me. The code is shown below.
var board = arrayListOf<ArrayList<String>>()

fun main(args: Array<String>) {
    for (i in 0..2){
        val row = arrayListOf<String>()
        for (j in 0..2){
            row.add("")
            board.add(row)
        }
    }
    printBoard()
    var continueGame = true
    do{
        println("Please enter a position. (e.g: 1, 3)")
        val input = readLine()?:""
        var x = 0
        var y = 0
        try {
            val positions = input.split(",")
            x = positions[0].trim().toInt()
            y = positions[1].trim().toInt()
            println("x is $x")
            println("x is $y")
            if(board[x-1][y-1] != "") {
                println("position already taken")
            }else{
                board[x-1][y-1] ="X"
                printBoard()
            }
        }catch (e: Exception){
            println("Invalid input, please try again")
        }
    }while(continueGame)
}

fun printBoard(){
    println("----------------")
    for (i in 0..2){
        for (j in 0..2){
            when (board[i][j]){
                "X" -> print("| X ")
                "O" -> print("| O ")
                else -> print("|   ")
            }
        }
        println("|")
        println("----------------")
    }
}
Move board.add(row) outside your inner for loop (the for (j... loop). You are adding each row to the outer ArrayList three times, so when you start using the 2D list later and assume it only has three rows, all three of those rows are the same first row repeated, and you're ignoring the last six rows.
But actually, when you know that your collections will not ever change size, Arrays are a cleaner solution than Lists. You can create your 3x3 2D Array in one line like this:
val board = Array(3) { Array(3) { "" } }
Here's how I looked at the problem.
Debug printBoard()
I first looked at the printBoard() function. I put a breakpoint in, and saw that board already had 3 Xs in it.
So the problem is happening further up the chain.
Debugging the X assignment
There's only one place in your code where X's are added to the board, so let's take a look there.
board[x - 1][y - 1] = "X"
I put a breakpoint on that line, and ran the program in debug mode.
When I inspect the board object, I see it's an ArrayList with 9 elements. Each element is also an ArrayList, each with 3 elements.
In total that's 27 squares, and a tic-tac-toe board only has 9! board only needs 3 ArrayLists.
Debugging board creation
If we take a look at where board is created...
for (i in 0..2) {
    val row = arrayListOf<String>()
    for (j in 0..2) {
        row.add("")
        board.add(row) // hmmmm
    }
}
board.add(row) is nested inside both for loops. That means it will be called 9 times in total.
The Fix
So, quick fix, move add to the outer loop.
for (i in 0..2) {
    val row = arrayListOf<String>()
    for (j in 0..2) {
        row.add("")
    }
    board.add(row) // better!
}
The program now works!
Why were there 3 X's?
I think it's interesting to understand why the X was appearing on the board three times.
If we look at the two for loops, the row list is being created in the outer loop - which means only 3 rows will be created. But because board.add(row) was in the inner loop, it will add the same row 3 times!
We can actually see that in the debug inspection. ArrayList#1073 is a unique ID for a specific row object, and it appears 3 times. So do ArrayList#1074 and ArrayList#1075
board[0], board[1], and board[2] all fetch the same row object, so in the printBoard() function, it loops over the first 3 elements of board... which are all exactly the same object!
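The same aliasing effect is not specific to Kotlin. As a rough illustration, here is the buggy construction transcribed into Python (build_board_buggy is a made-up name); it shows that the outer list ends up with 9 entries and that the first three are one and the same row object.

```python
def build_board_buggy():
    board = []
    for i in range(3):
        row = []
        for j in range(3):
            row.append("")
            board.append(row)  # same mistake: runs 9 times, 3 per row object
    return board

board = build_board_buggy()
board[0][2] = "X"  # the player's move "1, 3"
print(len(board))                         # 9
print(board[0] is board[1] is board[2])   # True
print([r[2] for r in board[:3]])          # ['X', 'X', 'X']
```

Writing through board[0] changes what board[1] and board[2] show, which is exactly the column of X's described in the question.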
Preventing the problem
The next step is to think about how to stop this problem from happening in the first place. I think that the for loops were confusing - they're easy to get wrong, and have 'magic numbers'. Fortunately Kotlin has lots of useful tools we can use to write clearer code.
Kotlin's Array class has a constructor that accepts a size: Int and an initialising lambda that is used to fill each element of the array.
Here's a demo:
println(
    Array(5) { i -> "I'm element $i" }.joinToString()
)
// output:
// I'm element 0, I'm element 1, I'm element 2, I'm element 3, I'm element 4
Each element of the Array has a value, based on i (the index of the array). For the tic-tac-toe board we don't care about the index, so we can ignore it.
// var board = arrayListOf<ArrayList<String>>() // old
var board = Array(3) { Array(3) { "" } } // new!
Output:
----------------
|   |   |   |
----------------
|   |   |   |
----------------
|   |   |   |
----------------
Please enter a position. (e.g: 1, 3)
There we go, a 3x3 board - nice and clear!
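For comparison, the closest Python analogue of an initialiser that runs once per element is a nested list comprehension. This is only a sketch of the idea in another language (make_board is a hypothetical helper), not part of the Kotlin fix.

```python
def make_board(size=3):
    # The inner comprehension runs once per row, so every row is a new list.
    return [["" for _ in range(size)] for _ in range(size)]

board = make_board()
board[0][2] = "X"
print(board)  # [['', '', 'X'], ['', '', ''], ['', '', '']]
```

Only the first row changes, because each row was created by its own evaluation of the inner expression. Python's shortcut [[""] * 3] * 3, by contrast, repeats one row object three times and brings the shared-row problem back.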
I hope this helps!

How many random requests do I need to make to a set of records to get 80% of the records?

Suppose I have an array of 100_000 records (this is Ruby code, but any language will do)
ary = ['apple','orange','dog','tomato', 12, 17,'cat','tiger' .... ]
results = []
I can only make random calls to the array (I cannot traverse it in any way)
results << ary.sample
# in ruby this will pull a random record from the array, and
# push into results array
How many random calls like that do I need to make to get at least 80% of the records from ary? Or, expressed another way: what should the size of results be so that results.uniq contains around 80_000 records from ary?
From my rusty memory of stats class in college, I think it needs to be 2 * the result set size, or around 160_000 requests (assuming the random function is truly random and there is no other underlying issue). My testing seems to confirm this.
ary = [*1..100_000];
result = [];
160_000.times{result << ary.sample};
result.uniq.size # ~ 80k
This is stats, so we are talking about probabilities, not guaranteed results. I just need a reasonable guess.
So the question really is: what's the formula to confirm this?
I would just perform a quick simulation study. In R,
N = 1e5
# Simulate 300 times
s = replicate(300, sample(x = 1:N, size = 1.7e5, replace = TRUE))
Now work out when you hit your target
f = function(i) which(i == unique(i)[80000])[1]
stats = apply(s, 2, f)
To get
summary(stats)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  159711  160726  161032  161037  161399  162242
So in 300 trials, the maximum number of draws needed was 162242, with a median of 161032 (mean 161037).
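As for a formula: when drawing with replacement from N records, the expected number of distinct records after n draws is N * (1 - (1 - 1/N)^n). Setting that equal to 0.8 * N gives n = ln(0.2) / ln(1 - 1/N), which is roughly 1.61 * N. The sketch below just evaluates those two expressions; draws_needed and expected_distinct are names chosen for illustration. Note that this is the point where the expected coverage reaches 80%, which is close to, but not the same statistic as, the simulated hitting times above.

```python
import math

def draws_needed(n_records, fraction):
    """Draws (with replacement) at which the expected share of distinct records reaches `fraction`."""
    return math.log(1.0 - fraction) / math.log(1.0 - 1.0 / n_records)

def expected_distinct(n_records, n_draws):
    """Expected number of distinct records seen after n_draws draws with replacement."""
    return n_records * (1.0 - (1.0 - 1.0 / n_records) ** n_draws)

n = draws_needed(100_000, 0.8)
print(round(n))                                     # a little under 161,000
print(round(expected_distinct(100_000, 160_000)))   # slightly below 80,000
```

So the "2 * result set size" rule of thumb lands very near the right answer for an 80% target, but the factor comes from -ln(0.2) ≈ 1.61 and would differ for other targets (for example, about 2.3 * N for 90%).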
With a Fisher-Yates shuffle you could get 80K items from exactly 80K random calls.
I have no knowledge of Ruby, but looking at https://gist.github.com/mindplace/3f3a08299651ebf4ab91de3d83254fbc and modifying it:
def shuffle(array, counter)
  # Partial Fisher-Yates: only the first `counter` slots get shuffled
  counter.times do |i|
    # item selected from the not-yet-placed part of the array (slots i..end)
    random_index = i + rand(array.length - i)
    # swap the items at those locations
    array[i], array[random_index] = array[random_index], array[i]
  end
  array
end
indices = (0...100_000).to_a # 0 up to 99999
counter = 80_000
shuffle(indices, counter)
res = []
i = 0
while counter > 0
  res[i] = ary[indices[i]]
  counter -= 1
  i += 1
end
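For readers more comfortable with Python, the partial Fisher-Yates idea can be sketched as follows. The function name sample_without_replacement is made up for this example; the standard library's random.sample offers the same behaviour out of the box.

```python
import random

def sample_without_replacement(items, k, seed=None):
    """Pick k distinct items using exactly k random calls."""
    rng = random.Random(seed)
    pool = list(items)  # copy so the caller's sequence is untouched
    for i in range(k):
        # Choose uniformly among the items not yet placed (positions i..end).
        j = rng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]

picked = sample_without_replacement(range(100_000), 80_000, seed=0)
print(len(picked), len(set(picked)))  # 80000 80000
```

Each loop iteration makes one random call and fixes one more distinct item, which is why 80K calls yield exactly 80K unique records.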
UPDATE
Packing sampled indices into custom RNG (bear with me, know nothing about Ruby)
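class FYRandom
  def initialize(indices, max)
    @indices = indices # indices already shuffled as above
    @max = max         # number of draws allowed
    @idx = 0
  end
  # Assumption: Array#sample asks a custom generator for rand(limit)
  # and expects an integer below limit, so hand back the next index.
  def rand(limit)
    raise "only #{@max} draws available" if @idx >= @max
    r = @indices[@idx]
    @idx += 1
    r
  end
end
And code for sample would be
rng = FYRandom.new(indices, 80_000)
results << ary.sample(random: rng)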

How can I vectorize this code?

First of all I should say that I couldn't find the appropriate title for my question so I would appreciate anyone who will edit the title!
Suppose that I have an 18432x1472 matrix and I want to convert it to a 3072x1472 one (18432/6 = 3072) in this form:
the mean of elements (1,1),(2,1),...,(6,1) of the old matrix will go to element (1,1) of the new one
the mean of elements (7,1),(8,1),...,(12,1) of the old matrix will go to element (2,1) of the new one, and so on for every column
Up to now I have written this code:
function Out = MultiLooking( In )
    MatrixIn = double(In);
    m = size(In,1);
    InTranspose = MatrixIn';
    A = zeros(m,m/6);
    for i = 1:(m/6)
        A(6*(i-1)+1,i) = 1;
        A(6*(i-1)+2,i) = 1;
        A(6*(i-1)+3,i) = 1;
        A(6*(i-1)+4,i) = 1;
        A(6*(i-1)+5,i) = 1;
        A(6*(i-1)+6,i) = 1;
    end
    X = (InTranspose*A)/6;
    Out1 = X';
    Out = uint8(Out1);
end
But it is a little slow, and with my polarimetric SAR data the computer hangs for a while when running this code, so I need it to run faster.
Can anyone suggest faster code for this purpose?
An alternative to Divakar's nice answer: use blockproc (Image Processing Toolbox):
blockproc(MatrixIn, [6 size(MatrixIn,2)], @(x) mean(x.data))
Try this -
%// Assuming MatrixIn is your input matrix
reshape(mean(reshape(MatrixIn,6,[])),size(MatrixIn,1)/6,[])
Alternative Solution using cell arrays (performance improvement over previous code is doubtful though) -
c1 = cellfun(@mean,mat2cell(MatrixIn,6.*ones(1,size(MatrixIn,1)/6),size(MatrixIn,2)),'uni',0)
out = vertcat(c1{:})
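For anyone doing the same multilooking step outside MATLAB, the reshape-then-mean idea carries over to NumPy. This is only a sketch under the assumption that NumPy is available; block_row_means is a hypothetical name. Because NumPy arrays are row-major by default, the reshape groups consecutive rows directly, without the column-major trick used in the MATLAB one-liner.

```python
import numpy as np

def block_row_means(matrix, block=6):
    """Average every `block` consecutive rows, column by column."""
    rows, cols = matrix.shape
    if rows % block != 0:
        raise ValueError("row count must be a multiple of the block size")
    # Shape (rows/block, block, cols): axis 1 indexes the rows inside each block.
    return matrix.reshape(rows // block, block, cols).mean(axis=1)

demo = np.arange(24, dtype=float).reshape(12, 2)
print(block_row_means(demo))
# [[ 5.  6.]
#  [17. 18.]]
```

For an 18432x1472 input this returns a 3072x1472 array of floats; converting back to an 8-bit type, as the original function does with uint8, would be a separate step.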

Efficient way to convert Scala Array to Unique Sorted List

Can anybody optimize following statement in Scala:
// maybe large
val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6)
// output a sorted list which contains unique element from the array without 0
val newList=(someArray filter (_>0)).toList.distinct.sort((e1, e2) => (e1 > e2))
Since the performance is critical, is there a better way?
Thank you.
This simple line is one of the fastest codes so far:
someArray.toList.filter (_ > 0).sortWith (_ > _).distinct
but the clear winner so far, according to my measurements, is Jed Wesley-Smith's code. Once Rex's code is fixed, things may look different.
Typical disclaimer 1 + 2:
I modified the codes to accept an Array and return an List.
Typical benchmark considerations:
This was random data, equally distributed. For 1 Million elements, I created an Array of 1 Million ints between 0 and 1 Million. So with more or less zeros, and more or less duplicates, it might vary.
It might depend on the machine etc.. I used a single core CPU, Intel-Linux-32bit, jdk-1.6, scala 2.9.0.1
Here is the underlying benchcoat-code and the concrete code to produce the graph (gnuplot). Y-axis: time in seconds. X-axis: 100 000 to 1 000 000 elements in Array.
update:
After finding the problem with Rex's code, his code is as fast as Jed's, but the last operation is a transformation of his Array to a List (to fulfill my benchmark interface). Using a var result = List [Int], and result = someArray (i) :: result speeds his code up, so that it is about twice as fast as Jed's code.
Another, maybe interesting, finding is: If I rearrange my code in the order of filter/sort/distinct (fsd) => (dsf, dfs, fsd, ...), all 6 possibilities don't differ significantly.
I haven't measured, but I'm with Duncan, sort in place then use something like:
util.Sorting.quickSort(array)
array.foldRight(List.empty[Int]){
  case (a, b) =>
    if (!b.isEmpty && b(0) == a)
      b
    else
      a :: b
}
In theory this should be pretty efficient.
Without benchmarking I can't be sure, but I imagine the following is pretty efficient:
val list = collection.SortedSet(someArray.filter(_>0) :_*).toList
Also try adding .par after someArray in your version. It's not guaranteed to be quicker, but it might be. You should run a benchmark and experiment.
sort is deprecated. Use .sortWith(_ > _) instead.
Boxing primitives is going to give you a 10-30x performance penalty. Therefore if you really are performance limited, you're going to want to work off of raw primitive arrays:
def arrayDistinctInts(someArray: Array[Int]) = {
  java.util.Arrays.sort(someArray)
  var overzero = 0
  var ndiff = 0
  var last = 0
  var i = 0
  // First pass: find where the positive values start and count the distinct ones
  while (i < someArray.length) {
    if (someArray(i)<=0) overzero = i+1
    else if (someArray(i)>last) {
      last = someArray(i)
      ndiff += 1
    }
    i += 1
  }
  val result = new Array[Int](ndiff)
  var j = 0
  i = overzero
  last = 0
  // Second pass: copy each distinct positive value once
  while (i < someArray.length) {
    if (someArray(i) > last) {
      result(j) = someArray(i)
      last = someArray(i)
      j += 1
    }
    i += 1
  }
  result
}
You can get slightly better than this if you're careful (and be warned, I typed this off the top of my head; I might have typoed something, but this is the style to use), but if you find the existing version too slow, this should be at least 5x faster and possibly a lot more.
Edit (in addition to fixing up the previous code so it actually works):
If you insist on ending with a list, then you can build the list as you go. You could do this recursively, but I don't think in this case it's any clearer than the iterative version, so:
def listDistinctInts(someArray: Array[Int]): List[Int] = {
  // Sort first so the largest value really is the last element
  java.util.Arrays.sort(someArray)
  if (someArray.length == 0 || someArray(someArray.length-1) <= 0) List[Int]()
  else {
    var last = someArray(someArray.length-1)
    var list = last :: Nil
    var i = someArray.length-2
    // Walk backwards, prepending each new smaller value until we reach 0 or below
    while (i >= 0) {
      if (someArray(i) < last) {
        last = someArray(i)
        if (last <= 0) return list;
        list = last :: list
      }
      i -= 1
    }
    list
  }
}
Also, if you may not destroy the original array by sorting, you are by far best off if you duplicate the array and destroy the copy (array copies of primitives are really fast).
And keep in mind that there are special-case solutions that are far faster yet depending on the nature of the data. For example, if you know that you have a long array, but the numbers will be in a small range (e.g. -100 to 100), then you can use a bitset to track which ones you've encountered.
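To illustrate the small-range special case, here is a rough Python sketch that uses a list of flags as a stand-in for a bitset. The function name and the default bounds of -100 to 100 are assumptions taken from the example in the text, and it returns the values largest first, matching the ordering asked for in the question.

```python
def distinct_positive_desc(values, low=-100, high=100):
    """Unique values > 0, largest first, assuming every value lies in [low, high]."""
    seen = [False] * (high - low + 1)  # one flag per possible value
    for v in values:
        if not low <= v <= high:
            raise ValueError("value outside the assumed range: %r" % (v,))
        seen[v - low] = True
    # Walk the possible values from high down to 1; no sorting step is needed.
    return [v for v in range(high, 0, -1) if v >= low and seen[v - low]]

print(distinct_positive_desc([9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6]))
# [9, 6, 5, 4, 2, 1]
```

The work is one pass over the input plus one pass over the range, so it only pays off when the range is small relative to the array.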
For efficiency, depending on your value of large:
val a = someArray.toSet.filter(_>0).toArray
java.util.Arrays.sort(a) // quicksort, mutable data structures bad :-)
res15: Array[Int] = Array(1, 2, 4, 5, 6, 9)
Note that this does the sort using qsort on an unboxed array.
I'm not in a position to measure, but some more suggestions...
Sorting the array in place before converting to a list might well be more efficient, and you might look at removing dups from the sorted list manually, as they will be grouped together. The cost of removing 0's before or after the sort will also depend on their ratio to the other entries.
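The sort-first, then drop-adjacent-duplicates approach described here can be sketched in a few lines of Python. This is an illustration of the idea rather than a tuned implementation, and sorted_unique_positive is a hypothetical name.

```python
def sorted_unique_positive(values):
    """Sort descending, then keep each positive value once."""
    ordered = sorted(values, reverse=True)
    result = []
    for v in ordered:
        if v <= 0:
            break  # everything after this point is zero or negative
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != v:
            result.append(v)
    return result

print(sorted_unique_positive([9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6]))
# [9, 6, 5, 4, 2, 1]
```

Because the zeros and negatives sort to the end, they are skipped with a single early exit instead of a separate filter pass.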
How about adding everything to a sorted set?
val a = scala.collection.immutable.SortedSet(someArray filter (0 !=): _*)
Of course, you should benchmark the code to check what is faster, and, more importantly, that this is truly a hot spot.

Resources