JAGS model with unknown data points - theory

I'm new to JAGS and trying to make this model work. My data is an n by m matrix of 1s, 2s, 3s, and empty values where a subject was not presented with the question (currently represented by the number 7, but I'm not married to it). I would like the model to be able to handle missing data values, in which case the posterior distribution should draw evenly from the three options. Right now it runs fine if and only if the data contains nothing but 1, 2, and 3; if any other values are present, I get a "Node inconsistent with parents" error. Any advice? Here's the model code (it's an extension of Bayesian Cultural Consensus Theory, if that is any help):
model{
  for (i in 1:n){
    for (k in 1:m){
      D[i,k] <- (theta[i]*(1-delta[k]))/(theta[i]*(1-delta[k])+(1-theta[i])*delta[k])
      for (j in 1:3){
        pY[i,k,j] <- ifelse(z[k]==j, D[i,k], (1-D[i,k])/(l-3))
      }
      Y[i,k] ~ dcat(pY[i,k,1:3])
    }
  }
  for (i in 1:n){
    g[i] ~ dunif(0,1)
    theta[i] ~ dunif(0,1)
  }
  for (k in 1:m){
    for (j in 1:3){
      iniv[k,j] <- 1/3
    }
    z[k] ~ dcat(iniv[k,1:3])
  }
  delta[1] <- 0.5
  for (k in 2:m){
    delta[k] <- tempdelta[k-1]
    tempdelta[k-1] ~ dunif(0,1)
  }
}

Determining whether a given point would create an island

I'm currently working on porting the game Hitori, aka Singles, to the Game Boy in C using GBDK. One of the rules of this game is that no area of the board can be completely closed off from other areas. For example, if the current state of the board is:
00100
01000
00000
00000
00000
the solution cannot contain a 1 at (0,0) or (0,2). The board generation function needs to be able to detect this and not place a black tile there. I'm currently using a non-recursive depth-first search, which works, but is very slow on larger boards. Every other implementation of this game I can find on the internet uses DFS. The Game Boy is just too slow.
What I need is an algorithm that, when given a coordinate, can tell whether or not a 1 can be placed at that location without dividing the board. I've looked into scanline-based filling algorithms, but I'm not sure how much faster they'll be since boards rarely have long horizontal lines in them. I also thought of using an algorithm to follow along an edge, but I think that would fail if the edge wasn't connected to the side of the board:
00000
00100
01010
00100
00000
Are there any other types of algorithm that can do this efficiently?
I looked at another generator code, and it repeatedly chooses a tile to
consider blackening, doing so if that doesn’t lead to an invalid board.
If your generator works the same way, we can exploit the relatedness of
the connectivity queries. The resulting algorithm will require
O(n²)-time initialization and then process each update in amortized
O(log n) time (actually inverse Ackermann if you implement balanced
disjoint set merges). The constants should be OK as algorithms go,
though n = 15 is small.
Treating the board as a subset of the grid graph with the black tiles
removed, we need to detect when the number of connected components would
increase from 1. To borrow an idea from my colleagues Jakub Łącki and
Piotr Sankowski (“Optimal Decremental Connectivity in Planar Graphs”,
Lemma 2), we can use the Euler characteristic and planar duality to help
accomplish this.
Let me draw an empty board (with numbered tiles) and its grid graph.
+-+-+-+
|1|2|3|
+-+-+-+
|4|5|6|
+-+-+-+
|7|8|9|
+-+-+-+
1-2-3
|a|b|
4-5-6
|c|d|
7-8-9 i
In the graph I have lettered the faces (finite faces a, b, c, d
and the infinite face i). A planar graph satisfies the formula V − E +
F = 2 if and only if it is connected and nonempty. You can verify that
this one indeed does, with V = 9 vertices and E = 12 edges and F = 5
faces.
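To make that count concrete, here is a quick sanity check (my own helper, not part of the answer's implementation) that tallies V, E, and F for the grid graph of an n × n board with no black tiles:

```python
def euler_terms(n):
    """V, E, F for the grid graph of an n x n all-white board."""
    V = n * n                  # one vertex per tile
    E = 2 * n * (n - 1)        # n rows and n columns of (n - 1) edges each
    F = (n - 1) ** 2 + 1       # finite faces plus the infinite face
    return V, E, F

V, E, F = euler_terms(3)
print(V, E, F, V - E + F)  # 9 12 5 2
```

For any n the sum V − E + F comes out to 2, as the formula predicts for a connected, nonempty planar graph.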
By blackening a tile, we remove its vertex and the neighboring edges
from the graph. The interesting thing here is what happens to the faces.
If we remove the edge 2-5, for example, then we connect face a with
face b. This is planar duality at work. We’ve turned a difficult
decremental problem in the primal into an incremental problem in the
dual! This incremental problem can be solved the same way as it is in
Kruskal’s algorithm, via the disjoint set data structure.
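For reference, that incremental merging can be sketched with a stand-alone union-find. This is a minimal illustration (my own, separate from the Face class in the implementation further down) with path compression and union by size, i.e. the balanced merges mentioned above:

```python
class DisjointSet:
    """Minimal union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path halving: point every other visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Union by size keeps trees shallow; returns False if already merged.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True

# Merging faces a (0) and b (1) succeeds once; a repeat merge is a no-op.
ds = DisjointSet(4)
print(ds.union(0, 1))  # True
print(ds.union(0, 1))  # False
```

The `union` return value is exactly the signal we need: a merge that fails means the two faces were already the same face.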
To show how this works, suppose we blacken 6. Then the graph would
look like this:
1-2-3
|a|
4-5
|c|
7-8-9 i
This graph has V = 8 and E = 9 and F = 3, so V − E + F = 2. If we were
to remove 2, then vertex 3 is disconnected. The resulting graph
would have V = 7 and E = 6 and F = 2 (c and i), but V − E + F = 3 ≠
2.
Just to make sure I didn’t miss anything, here’s a tested implementation
in Python. I have aimed for readability over speed since you’re going to
be translating it into C and optimizing it.
import random

# Represents a board with black and non-black tiles.
class Board:
    # Constructs an n x n board.
    def __init__(self, n):
        self._n = n
        self._black = [[False] * n for i in range(n)]
        self._faces = [[Face() for j in range(n - 1)] for i in range(n - 1)]
        self._infinite_face = Face()

    # Blackens the tile at row i, column j if possible. Returns True if
    # successful.
    def blacken(self, i, j):
        neighbors = list(self._neighbors(i, j))
        if self._black[i][j] or any(self._black[ni][nj] for (ni, nj) in neighbors):
            return False
        incident_faces = self._incident_faces(i, j)
        delta_V = -1
        delta_E = -len(neighbors)
        delta_F = 1 - len(incident_faces)
        if delta_V - delta_E + delta_F != 2 - 2:
            return False
        self._black[i][j] = True
        f = incident_faces.pop()
        for g in incident_faces:
            f.merge(g)
        return True

    # Returns the coordinates of the tiles adjacent to row i, column j.
    def _neighbors(self, i, j):
        if i > 0:
            yield i - 1, j
        if j > 0:
            yield i, j - 1
        if j < self._n - 1:
            yield i, j + 1
        if i < self._n - 1:
            yield i + 1, j

    # Returns the faces incident to the tile at row i, column j.
    def _incident_faces(self, i, j):
        return {self._face(fi, fj) for fi in [i - 1, i] for fj in [j - 1, j]}

    def _face(self, i, j):
        return (
            self._faces[i][j]
            if 0 <= i < self._n - 1 and 0 <= j < self._n - 1
            else self._infinite_face
        ).rep()

# Tracks facial merges.
class Face:
    def __init__(self):
        self._parent = self

    # Returns the canonical representative of this face.
    def rep(self):
        while self != self._parent:
            grandparent = self._parent._parent
            self._parent = grandparent
            self = grandparent
        return self

    # Merges self and other into one face.
    def merge(self, other):
        other.rep()._parent = self.rep()

# Reference implementation with DFS.
class DFSBoard:
    def __init__(self, n):
        self._n = n
        self._black = [[False] * n for i in range(n)]

    # Blackens the tile at row i, column j if possible. Returns True if
    # successful.
    def blacken(self, i, j):
        neighbors = list(self._neighbors(i, j))
        if self._black[i][j] or any(self._black[ni][nj] for (ni, nj) in neighbors):
            return False
        self._black[i][j] = True
        if not self._connected():
            self._black[i][j] = False
            return False
        return True

    # Returns the coordinates of the tiles adjacent to row i, column j.
    def _neighbors(self, i, j):
        if i > 0:
            yield i - 1, j
        if j > 0:
            yield i, j - 1
        if j < self._n - 1:
            yield i, j + 1
        if i < self._n - 1:
            yield i + 1, j

    def _connected(self):
        non_black_count = sum(
            not self._black[i][j] for i in range(self._n) for j in range(self._n)
        )
        visited = set()
        for i in range(self._n):
            for j in range(self._n):
                if not self._black[i][j]:
                    self._depth_first_search(i, j, visited)
                    return len(visited) == non_black_count

    def _depth_first_search(self, i, j, visited):
        if (i, j) in visited:
            return
        visited.add((i, j))
        for ni, nj in self._neighbors(i, j):
            if not self._black[ni][nj]:
                self._depth_first_search(ni, nj, visited)

def generate_board(n, board_constructor=Board):
    deck = [(i, j) for i in range(n) for j in range(n)]
    random.shuffle(deck)
    board = board_constructor(n)
    return {(i, j) for (i, j) in deck if board.blacken(i, j)}

def generate_and_print_board(n):
    black = generate_board(n)
    for i in range(n):
        print("".join(chr(9633 - ((i, j) in black)) for j in range(n)))

def test():
    n = 4
    boards = set(frozenset(generate_board(n, Board)) for i in range(1000000))
    reference_boards = set(
        frozenset(generate_board(n, DFSBoard)) for k in range(1000000)
    )
    assert len(boards) == len(reference_boards)

if __name__ == "__main__":
    generate_and_print_board(15)
    test()

R routine always samples from last row of array instead of random rows

I have been debugging the following routine for some time.
A problem that came to my attention is that sampling is always done on the last row of my array every time I run the simulation. I want it to select rows at random each time the code is run.
Here's what I have:
N <- 10
Hstar <- 5
perms <- 10  ## how many permutations are we considering
specs <- 1:N

## Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(perms, N))
haps <- as.character(1:Hstar)

## Assign probabilities
probs <- rep(1/Hstar, Hstar)

## Generate permutations
for (i in 1:perms) {
  pop[i, ] <- sample(haps, size = N, replace = TRUE, prob = probs)
}

## Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(perms, N))
for (j in specs) {
  for (i in 1:perms) {
    ind.index <- sample(specs, size = j, replace = FALSE)  ## which individuals will we sample
    hap.plot <- pop[i, ind.index]  ## pull those individuals from a permutation
    HAC.mat[i, j] <- length(unique(hap.plot))  ## how many haplotypes did we get for a given sampling intensity (j) from each permutation (i)
  }
}
When I look at ind.index and hap.plot, I notice that values from haps are always taken from the last row of the pop variable, and I can't quite understand why this is occurring. I would like it to randomly sample from a given row in pop.
Any help is greatly appreciated.
I have found a workaround that looks like it works.
hap.plot <- pop[sample(nrow(pop), size = 1, replace = TRUE), ]

Divide an array into subarrays as equally as possible for core-mapping

What algorithms are used to map an image array to multiple cores for processing? I've been trying to come up with something that will return a list of (disjoint) ranges over which to iterate in an array, and so far I have the following.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np

def divider(arr_dims, coreNum=1):
    """ Get a bunch of iterable ranges;
    Example input: [[[0, 24], [15, 25]]]
    """
    if coreNum == 1:
        return arr_dims
    elif coreNum < 1:
        raise ValueError(
            'partitioner expected a positive number of cores, got %d' % coreNum
        )
    elif coreNum % 2:
        raise ValueError(
            'partitioner expected an even number of cores, got %d' % coreNum
        )

    total = []
    # Split each coordinate pair in arr_dims in _half_
    for arr_dim in arr_dims:
        dY = arr_dim[0][1] - arr_dim[0][0]
        dX = arr_dim[1][1] - arr_dim[1][0]

        if (coreNum,) * 2 > (dY, dX):
            coreNum = max(dY, dX)
            coreNum -= 1 if (coreNum % 2 and coreNum > 1) else 0

        new_c1, new_c2 = [], []

        if dY >= dX:
            # Subimage height is greater than its width
            half = dY // 2
            new_c1.append([arr_dim[0][0], arr_dim[0][0] + half])
            new_c1.append(arr_dim[1])
            new_c2.append([arr_dim[0][0] + half, arr_dim[0][1]])
            new_c2.append(arr_dim[1])
        else:
            # Subimage width is greater than its height
            half = dX // 2
            new_c1.append(arr_dim[0])
            new_c1.append([arr_dim[1][0], arr_dim[1][0] + half])
            new_c2.append(arr_dim[0])
            new_c2.append([arr_dim[1][0] + half, arr_dim[1][1]])

        total.append(new_c1)
        total.append(new_c2)

    # If the number of cores is 1, we get back the total; else,
    # we split each entry in total, etc.; it's turtles all the way down
    return divider(total, coreNum // 2)

if __name__ == '__main__':
    X = np.random.randn(25 - 1, 36 - 1)
    dims = [list(pair) for pair in zip([0, 0], X.shape)]
    print(divider([dims], 2))
It's incredibly limited, however, because it only accepts a number of cores that's a power of 2, and I'm certain there are edge cases I'm overlooking. Running it returns [[[0, 24], [0, 17]], [[0, 24], [17, 35]]], and then using pathos I've mapped the first set to one core on my laptop and the second to another.
I guess I just don't know how to geometrically walk my way through partitioning an image into segments that are as similar in size as possible, so that each core on a given machine has the same amount of work to do.
I'm not too sure what you're trying to achieve, but if you want to split an array (of whatever dimensions) into multiple parts, you can look into the numpy.array_split function.
It splits an array into a given number of near-equal parts, so it works even when the number of partitions cannot cleanly divide the array.
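As a quick sketch of that suggestion (using the 24 × 35 image shape from the question), splitting the longer axis into four near-equal chunks, one per core, would look like this:

```python
import numpy as np

# A 24 x 35 image, as in the question.
X = np.random.randn(24, 35)

# Split the wider axis into 4 near-equal chunks, one per core.
# Chunk widths differ by at most one when 4 does not divide 35.
chunks = np.array_split(X, 4, axis=1)

print([c.shape for c in chunks])  # [(24, 9), (24, 9), (24, 9), (24, 8)]
```

Each chunk is a view into `X`, so handing the chunks to worker processes or threads does not copy the image data up front.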

loop through column and add other row

EDIT: I've made some progress. I read up on subsets and was able to break down my dataframe under a certain condition. Let's say titleCSV[3] consists of file names ("file1", "file2", "file3", etc.) and titleCSV[13] contains values (-18, -8, -2, etc.). Code below:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(x.RMS in titleRMS[2]){
  x.RMS <- gsub("[A-Za-z]", "", x.RMS)
  x.RMS <- gsub(" ", "", x.RMS)
  x.RMS <- abs(as.numeric(x.RMS))
}
x.titleRMSJudge <- data.frame(titleRMS[1], x.RMS)
x.titleRMSResult <- subset(x.titleRMSJudge, x.RMS < 12)
My question now is, what's the best way to print each row of the first column of x.titleRMSResult with a message saying that it's loud? Thanks, guys!
BTW, here is the dput of my titleRMS:
dput(titleRMS)
structure(list(FILE.NAME = c("00-Introduction.mp3", "01-Chapter_01.mp3",
"02-Chapter_02.mp3", "03-Chapter_03.mp3", "04-Chapter_04.mp3",
"05-Chapter_05.mp3", "06-Chapter_06.mp3", "07-Chapter_07.mp3",
"08-Chapter_08.mp3", "09-Chapter_09.mp3", "10-Chapter_10.mp3",
"11-Chapter_11.mp3", "12-Chapter_12.mp3", "Bonus_content.mp3",
"End.mp3"), AVG.RMS..dB. = c(-14, -10.74, -9.97, -10.53, -10.94,
-12.14, -11, -9.19, -10.42, -11.51, -14, -10.96, -11.71, -11,
-16)), .Names = c("FILE.NAME", "AVG.RMS..dB."), row.names = c(NA,
-15L), class = "data.frame")
ORIGINAL POST BELOW
Newb here! Coding in R. I am trying to analyze a csv file. One column has 10 rows with different file names, while the other has 10 rows with different values. I want to run the 2nd column through a loop, and if a value is greater/less than a certain threshold, print the associated file name along with a message. I don't know how to run both columns through the loop together so that the proper file name prints with the proper value/message. I wrote a loop that ends up checking each value once for every row in the other column. At the moment, all 10 rows meet the criteria for the message I want to print, so I've been getting 100 messages!
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(title in titleRMS[1]){
  title <- gsub(" ", "", title)
}
for(r in titleRMS[2]){
  r <- gsub("[A-Za-z]", "", r)
  r <- gsub(" ", "", r)
  r <- abs(as.numeric(r))
  for(t in title){
    for(f in r){
      if (f < 18 & f > 0) {
        message(t, "is Loud!")
      }
    }
  }
}
And this line of code only prints the first file name for each message:
for(r in titleRMS[2]){
  r <- gsub("[A-Za-z]", "", r)
  r <- gsub(" ", "", r)
  r <- abs(as.numeric(r))
  for(f in r){
    if (f < 18 & f > 0) {
      message(t, "is Loud!")
    }
  }
}
Can someone throw me some tips or even re-write what I wrote to show me how to get what I need? Thanks, guys!
I've figured out my own issue. Here is what I wrote to come to the conclusion I wanted:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
filesHighRMS <- vector()
x.titleRMSJudge <- data.frame(titleCSV[3], titleCSV[13])
x.titleRMSResult <- subset(x.titleRMSJudge, titleCSV[13] > -12 & titleCSV[15] > -1)
for(i in x.titleRMSResult[,1]){
  filesHighRMS <- append(filesHighRMS, i, 999)
}
emailHighRMS <- paste(filesHighRMS, collapse=", ")
blurbHighRMS <- paste0(nrow(x.titleRMSResult), " file(s) (", emailHighRMS, ") have a high RMS and are too loud.")
Being new to code, I bet there is a simpler way, I'm just glad I was able to work this out on my own. :-)
You're making things hard on yourself. You don't need regex for this, and you probably don't need a loop, at least not through your data frame. You definitely don't need nested loops.
I think this will do what you say you want...
indicesToMessage <- titleRms[, 2] > 0 & titleRms[, 2] < 18
myMessages <- paste(titleRms[indicesToMessage, 1], "is Loud!")
for (i in 1:length(myMessages)) {
  message(myMessages[i])
}
A more R-like way (read: without an explicit loop) to do the last line is like this:
invisible(lapply(myMessages, message))
The invisible is needed because message() doesn't return anything; it just has the side effect of printing to the console. lapply, however, expects a return value and will print NULL if there is none, so invisible simply masks the NULL.
Edit: Negative data
Since your data is negative, I assume you actually want messages when the absolute value abs() is between 0 and 18. This works for that case.
indicesToMessage <- abs(titleRms[, 2]) > 0 & abs(titleRms[, 2]) < 18
myMessages <- paste(titleRms[indicesToMessage, 1], "is Loud!")
invisible(lapply(myMessages, message))

lapply and rbind not properly appending the results

SimNo <- 10
for (i in 1:SimNo){
  z1 <- rnorm(1000,0,1)
  z2 <- rnorm(1000,0,1)
  z3 <- rnorm(1000,0,1)
  z4 <- rnorm(1000,0,1)
  z5 <- rnorm(1000,0,1)
  z6 <- rnorm(1000,0,1)
  X <- cbind(z1,z2,z3,z4,z5,z6)
  sx <- scale(X)/sqrt(999)
  det1 <- det(t(sx)%*%sx)
  detans <- do.call(rbind, lapply(1:SimNo, function(x) ifelse(det1<1, det1, 0)))
}
When I run all the commands inside the loop except the last one, I get a different value of the determinant each time; but when I run the whole loop at once, the last value of the determinant is repeated for every iteration.
Please help me understand how to handle situations like this.
Is there a short and efficient way to write this code so that each individual variable can also be accessed?
Whenever you are repeating the same operation multiple times with no varying inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
  X <- replicate(6, rnorm(1000, 0, 1))
  sx <- scale(X) / sqrt(999)
  det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked with your for loop. You needed to create a vector for storing your outputs at each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
  z1 <- rnorm(1000,0,1)
  z2 <- rnorm(1000,0,1)
  z3 <- rnorm(1000,0,1)
  z4 <- rnorm(1000,0,1)
  z5 <- rnorm(1000,0,1)
  z6 <- rnorm(1000,0,1)
  X <- cbind(z1,z2,z3,z4,z5,z6)
  sx <- scale(X)/sqrt(999)
  det1 <- det(t(sx)%*%sx)
  detans[i] <- ifelse(det1<1, det1, 0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop throughout that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
  sx <- scale(x) / sqrt(999)
  det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
  z1 <- rnorm(1000,0,1)
  z2 <- rnorm(1000,0,1)
  z3 <- rnorm(1000,0,1)
  z4 <- rnorm(1000,0,1)
  z5 <- rnorm(1000,0,1)
  z6 <- rnorm(1000,0,1)
  X <- cbind(z1,z2,z3,z4,z5,z6)
  sx <- scale(X)/sqrt(999)
  det1 <- det(t(sx)%*%sx)
  matdet[i] <- ifelse(det1 < 1, det1, 0)
}
matdet
