Set intersection is wrong - arrays

I am creating a program that works with geographic information. I have data that contains what geographic units touch each other. The function that fails is intended to add neighboring units to an array based on population (for example, it starts with a unit, then adds the most populous neighboring unit to the array, and then adds the most populous unit that touches one of the units in the array, and continues this until it reaches a population limit). The way I am doing this is using a for loop, and then using an array of the total units that have been sorted by population. I then find the index of the first (and therefore most populous) neighbor using the intersection between the neighbors of the units in the array and the neighbors of each unit in the array of total units. The following is my code (please excuse the clunkiness):
func createDistrict () {
    if useBoard.isEmpty == false {
        useBoard.sort(by: {$0.population > $1.population})
        var maxPop = useBoard.first!.population
        district.removeAll()
        district.append(useBoard.first!)
        useBoard.removeFirst()
        for i in 0...useBoard.count - 1 {
            if useBoard.indices.contains(i) {
                if useBoard[i] == nil || district.map({$0.population}).reduce(0,+) > districtMax {
                    break
                }
            } else {
                break
            }
            useBoard.sort(by: {$0.population > $1.population})
            var superArray:[Precinct] = []
            district.forEach { (z) in
                superArray += z.neighbors
                Array(Set(superArray))
            }
            var nextPre = useBoard.firstIndex { (l) -> Bool in
                Set(l.neighbors).intersection(Set(superArray)).isEmpty == false
            }
            if nextPre == nil {
                break
            }else {
                var temporary = Set(useBoard[nextPre!].neighbors).intersection(Set(superArray))
                var newString = ""
                var newTemp = Array(temporary)
                for t in 0...newTemp.count - 1 {
                    var next = useBoard.firstIndex { (k) -> Bool in
                        k == newTemp[t]
                    }
                    newString.append("\(newTemp[t]) (\(next)), ")
                }
                print("\(useBoard[nextPre!].precinctID) (\(nextPre!)) touches \(newString)")
            }
            district.append(useBoard[nextPre!])
            useBoard.remove(at: nextPre!)
        }
    }
    district.forEach { (p) in
        print("\(p.precinctID)")
    }
}
In this function, var nextPre = useBoard.firstIndex { (l) -> Bool in Set(l.neighbors).intersection(Set(superArray)).isEmpty == false} is used to find the index of the most populous neighbor. However, when I test it using print, I get an incorrect output. In the following excerpt, the values in the parentheses are just the indices and don't really matter. The output:
2104 (8) touches 1987 (Optional(710)), 2676 (Optional(1591)),
2387 (10) touches 2105 (Optional(2140)),
2274 (11) touches 2273 (Optional(52)), 2386 (Optional(236)),
2275 (14) touches 2276 (Optional(22)), 2105 (Optional(2138)), 2273 (Optional(51)),
2276 (21) touches 2389 (Optional(1638)), 2273 (Optional(50)), 2274 (nil), 2275 (nil), 2277 (Optional(2771)), 2386 (Optional(234)),
2067 (35) touches 2404 (Optional(76)), 2212 (Optional(944)),
2406 (40) touches 2404 (Optional(75)), 2070 (Optional(1771)),
2440 (42) touches 2212 (Optional(942)), 2388 (Optional(497)), 2441 (Optional(1364)),
2273 (46) touches 2386 (Optional(230)), 2276 (nil), 2064 (Optional(384)), 2275 (nil), 2105 (Optional(2133)), 2274 (nil), 2387 (nil),
1795 (55) touches 1891 (Optional(1212)),
1908 (41) touches 2638 (Optional(2568)), 1869 (Optional(474)),
2404 (70) touches 2212 (Optional(938)), 2070 (Optional(1766)), 2069 (Optional(365)), 2068 (Optional(581)), 1743 (Optional(2453)), 2405 (Optional(2442)), 2387 (nil), 2105 (Optional(2130)), 2284 (Optional(2792)),
2736 (70) touches 2548 (Optional(1314)), 2420 (Optional(1305)),
1798 (52) touches 2419 (Optional(270)),
1907 (45) touches 1912 (Optional(1611)), 2737 (Optional(2082)),
As you can see, the neighbors are off by 1. For example, 2104 is the first unit. Then 2387 touches 2105, which is 2104 + 1. Then 2274 touches 2386, which is 2387 - 1. Then 2275 touches 2105, which is 2104 + 1. The .intersection should find the first unit that touches the units in the array, yet it finds the first unit that touches a unit's name + 1. I have no idea how this is occurring, as the geographic units are stored in a custom object, not an integer or any other number variable. Here is the custom object:
class Precinct {
    var precinctID:String
    var population:Int
    var neighbors:[Precinct]
    init(precinctID:String, population:Int, neighbors:[Precinct]){
        self.precinctID = precinctID
        self.population = population
        self.neighbors = neighbors
    }
}
extension Precinct: Equatable {
    static func == (lhs: Precinct, rhs: Precinct) -> Bool {
        return lhs.precinctID == rhs.precinctID && lhs.population == rhs.population && lhs.neighbors == rhs.neighbors
    }
}
extension Precinct: Hashable {
    var hashValue: Int {
        return precinctID.hashValue ^ population.hashValue
    }
}
extension Precinct: CustomStringConvertible {
    var description: String {
        return "\(precinctID)"
    }
}
What's going wrong and how can I fix it? Thanks.

The problem is that you are removing elements from useBoard while you are iterating over it. You print the indices with "\(newTemp[t]) (\(next)), " and then, five lines later, remove an element before repeating the process. You can change the values of a collection you are iterating over, but never change its size at the same time.
A first step may be to copy useBoard before running the outer loop. Keep the original constant so that you iterate over all of its contents, and use the copy for all of your logic. I have trouble following the intent of your code.
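To illustrate the point about changing the size, here is a minimal sketch. The `queue` and `taken` arrays and the cap of 9 are made up for illustration and are not part of the question's code. It shows that the range of a for loop is built once from the count at that moment, so removing elements leaves it pointing at stale indices, whereas a while loop re-reads the array on every pass.

```swift
// Hypothetical data: populations already sorted in descending order.
var queue = [5, 4, 3, 2, 1]

// The range is built once from the count at this moment (0...4).
let staleRange = 0...queue.count - 1
queue.removeFirst()
// queue now has 4 elements, so the last index in staleRange no longer exists.
let lastIndexStillValid = queue.indices.contains(staleRange.upperBound)

// A while loop re-reads the array on each pass, so shrinking it is safe.
var taken: [Int] = []
while let next = queue.first, taken.reduce(0, +) + next <= 9 {
    taken.append(queue.removeFirst())
}
```

This is why the guard `useBoard.indices.contains(i)` is needed in the original loop: the loop variable keeps counting up to the old size.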
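The hash is not your problem, but it is not ideal either. hashValue is deprecated in favor of hash(into:), and Swift only synthesizes Hashable for structs and enums, so a class like Precinct has to supply its own implementation. Hashing just the ID is enough and stays consistent with your ==. Change that extension to this:
extension Precinct: Hashable { func hash(into hasher: inout Hasher) { hasher.combine(precinctID) } }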
There are a couple of other issues. Remove the call that sorts useBoard inside the outer loop; it has no effect because useBoard was already sorted before entering the loop, and removing elements does not disturb the order. Also, Array(Set(superArray)) does nothing for you, because its result is discarded.
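Putting these points together, here is a minimal sketch of the greedy loop, written under several assumptions. The `Precinct` below is a trimmed-down stand-in that compares and hashes by `precinctID` only, and `buildDistrict`, `remaining`, and `frontier` are hypothetical names. It also assumes the intent is to pick the most populous precinct that is itself a neighbor of a district member. One reading of the printed output is that the original closure intersects a candidate's neighbors with the district's neighbors, so it matches precincts that share a neighbor with the district (2104 and 2387 both touch 2105) rather than precincts that touch it; the sketch tests the candidate itself instead.

```swift
// Hypothetical stand-in for the question's class: identity is the ID only.
final class Precinct: Hashable {
    let precinctID: String
    let population: Int
    var neighbors: [Precinct] = []

    init(precinctID: String, population: Int) {
        self.precinctID = precinctID
        self.population = population
    }

    static func == (lhs: Precinct, rhs: Precinct) -> Bool {
        return lhs.precinctID == rhs.precinctID
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(precinctID)
    }
}

// Grows a district from the most populous precinct until the cap is exceeded.
func buildDistrict(from board: [Precinct], districtMax: Int) -> [Precinct] {
    // Sort once; removing elements later keeps the order intact.
    var remaining = board.sorted { $0.population > $1.population }
    guard !remaining.isEmpty else { return [] }

    var district = [remaining.removeFirst()]
    var total = district[0].population

    while total <= districtMax {
        // Every precinct that touches at least one district member.
        let frontier = Set(district.flatMap { $0.neighbors })
        // Test the candidate itself, not its neighbors, against the frontier.
        guard let index = remaining.firstIndex(where: { frontier.contains($0) }) else { break }
        let next = remaining.remove(at: index)
        district.append(next)
        total += next.population
    }
    return district
}

// Made-up chain 2104 - 2105 - 2387: 2387 does not touch 2104 directly.
let a = Precinct(precinctID: "2104", population: 900)
let b = Precinct(precinctID: "2105", population: 500)
let c = Precinct(precinctID: "2387", population: 800)
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b]
let ids = buildDistrict(from: [a, b, c], districtMax: 2_000).map { $0.precinctID }
```

With this made-up chain, 2105 is added before 2387 even though 2387 is more populous, because only 2105 touches 2104. The while loop re-evaluates `remaining` on each pass, so removing elements never invalidates a loop index.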
Good luck.

Related

How to interpret oglmx function in R programming?

I am currently working on a project in which I am supposed to model public acceptance of pricing schemes.
The independent variables used in the model (age, gender, income, etc.) are categorical, so I converted them to factors with the as.factor() function.
Age Gender Income
0 1 2
0 0 0
0 0 1
I have certain other variables, such as transit satisfaction and environment improvement, which are ordered factors on a scale of 1 to 5, with 1 being extremely dissatisfied and 5 being very satisfied.
My model is as follows:
mdl = oglmx( prcing ~Ann_In1+Edu+Env_imp+rs_imp,data=cpdat, link = "logit", constantMEAN = F, constantSD = F, delta = 0, threshparam = NULL)
summary(mdl)
            Estimate    Std. error  t value  Pr(>|t|)
Ann_In11     0.1605540  0.3021613    0.5314  0.5951749
Ann_In12    -0.9556992  0.4218504   -2.2655  0.0234824 *
Edu1         0.0710699  0.2678081    0.2654  0.7907196
Edu2         1.0732587  0.7112519    1.5090  0.1313061
Env_imp.L   -0.8524288  0.4899275   -1.7399  0.0818752 .
Env_imp.Q    0.0784353  0.3936332    0.1993  0.8420595
Env_imp.C    0.4589036  0.4498676    1.0201  0.3076878
Env_imp^4   -0.2219108  0.4423486   -0.5017  0.6159032
rd_sft.L     2.6335035  0.7362206    3.5771  0.0003475 ***
rd_sft.Q    -0.7064391  0.5773880   -1.2235  0.2211377
rd_sft.C     0.0130127  0.4408486    0.0295  0.9764519
rd_sft^4    -0.2886550  0.3582014   -0.8058  0.4203318
I obtained the results shown above but am unable to interpret them. Any leads would be very helpful.
In the case of rd_sft (road safety), since rd_sft.L (linear) is significant while the other contrasts are not, can we drop the other terms (.Q, .C, ^4) when forming the model?
Please shed some light on the model formulation and its interpretation, as I am new to R.

Text mining Clustering Analysis in R - Error :Two dimensional array

I'm trying to follow a document that has some code on text mining clustering analysis.
I'm fairly new to R and to the concept of text mining/clustering, so please bear with me if I sound illiterate.
I create a simple matrix called dtm and then run kmeans to produce 3 clusters. The code I'm having issues with is where a function is defined to get the "five most common words of the documents in the cluster":
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
  v = sort(colSums(df[clusters == n, ]),
           decreasing = TRUE)
  d = data.frame(word = names(v), freq = v)
  d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
Upon running this code I get the following error:
Error in colSums(df[clusters == n, ]) :
'x' must be an array of at least two dimensions
Could someone help me fix this please?
Thank you.
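I can't reproduce your error; it works fine for me. Update your question with a reproducible example and you might get a more useful answer. Perhaps your input data object is empty; what do you get with dim(dtm0.75)? Another possible cause is a cluster that contains only one document: df[clusters == n, ] then drops to a plain vector, and colSums fails with exactly this message. Writing df[clusters == n, , drop = FALSE] keeps it a matrix.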
Here it is working fine on the data that comes with the tm package:
library(tm)
data(crude)
dt0.75 <- DocumentTermMatrix(crude)
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
  v = sort(colSums(df[clusters == n, ]),
           decreasing = TRUE)
  d = data.frame(word = names(v), freq = v)
  d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
                 word freq
the               the   69
and               and   25
for               for   12
government government   11
oil               oil   10

customizable PageRank algorithm in Gremlin?

I'm looking for a Gremlin version of a customizable PageRank algorithm. There are a few old versions out there, one (from: http://www.infoq.com/articles/graph-nosql-neo4j) is pasted below. I'm having trouble fitting the flow into the current GremlinGroovyPipeline-based structure. What is the modernized equivalent of this or something like it?
$_g := tg:open()
g:load('data/graph-example-2.xml')
$m := g:map()
$_ := g:key('type', 'song')[g:rand-nat()]
repeat 2500
  $_ := ./outE[@label='followed_by'][g:rand-nat()]/inV
  if count($_) > 0
    g:op-value('+',$m,$_[1]/@name, 1.0)
  end
  if g:rand-real() > 0.85 or count($_) = 0
    $_ := g:key('type', 'song')[g:rand-nat()]
  end
end
g:sort($m,'value',true())
Another version is available on slide 55 of http://www.slideshare.net/slidarko/gremlin-a-graphbased-programming-language-3876581. The ability to use the if statements and change the traversal based on them is valuable for customization.
many thanks
I guess I'll answer it myself in case somebody else needs it. Be warned that this is not a very efficient PageRank calculation. It should only be viewed as a learning example.
g = new TinkerGraph()
g.loadGraphML('graph-example-2.xml')
m = [:]
g.V('type','song').sideEffect{m[it.name] = 0}
// pick a random song node that has 'followed_by' edge
def randnode(g) {
  return(g.V('type','song').filter{it.outE('followed_by').hasNext()}.shuffle[0].next())
}
v = randnode(g)
for(i in 0..2500) {
  v = v.outE('followed_by').shuffle[0].inV
  v = v.hasNext()?v.next():null
  if (v != null) {
    m[v.name] += 1
  }
  if ((Math.random() > 0.85) || (v == null)) {
    v = randnode(g)
  }
}
msum = m.values().sum()
m.each{k,v -> m[k] = v / msum}
println "top 10 songs: (normalized PageRank)"
m.sort {-it.value }[0..10]
Here's a good reference for a simplified one-liner:
https://groups.google.com/forum/m/#!msg/gremlin-users/CRIlDpmBT7g/-tRgszCTOKwJ
(as well as the Gremlin wiki: https://github.com/tinkerpop/gremlin/wiki)

Segmentation fault: 11 c structures

I have an array of structure records and a function, insert, that inserts or updates records.
The insert function takes list (an array of records), name (the book name), author, year, copies, and the size n of the list.
It updates the record if it finds one; otherwise it inserts a new one. Here n=7.
void insert(struct books *list,char name[],char author[],int year,int copies,int n)
{
    int i,found=0,empty;
    for(i=0;(i<n) && (found==0);i++)
    {
        // update works fine
        if( strcmp(name,list[i].name)==0 && strcmp(author,list[i].author)==0 )
        {
            list[i].copies=copies;
            list[i].year=year;
            printf("\n\n####################################################\n");
            printf("####\tRecord was successfully updated!\t####\n");
            printf("####################################################\n");
            found=1;
        }
        //get an empty record
        if(strcmp(list[i].author,"i")==0){empty=i;}
    }
    //insert gives segmentation error
    if(found==0)
    {
        strcpy(list[empty].name,name);
        strcpy(list[empty].author,author);
        list[empty].year=year;
        list[empty].copies=copies;
        printf("\n\n####################################################\n");
        printf("####\tRecord was successfully inserted!\t####\n");
        printf("####################################################\n");
    }
}
My list array is:
A
Ruby On Rails
2004
100
B
Inferno
1993
453
C
Harry Potter and the soccers stones
2012
150
D
Harry Potter and the soccers stone
2012
150
E
Learn Python Easy Way
1967
100
F
Ruby On Rails
2004
130
i
i
0
0
Why is it giving Segmentation error: 11?
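You probably need to initialize empty. If no record matches the "i" placeholder, empty is never assigned, and list[empty] then indexes the array with an indeterminate value. Do
empty=0;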
And really, SO is not a debugging service. So stop asking questions like this.

Getting current position of one of the multiple objects in a figure?

I wrote a script that returns several text boxes in a figure. The text boxes are moveable (I can drag and drop them), and their positions are predetermined by the data in an input matrix (the data from the input matrix is applied to the respective positions of the boxes by nested for loop). I want to create a matrix which is initially a copy of the input matrix, but is UPDATED as I change the positions of the boxes by dragging them around. How would I update their positions? Here's the entire script
function drag_drop=drag_drop(tsinput,infoinput)
[x,~]=size(tsinput);
dragging = [];
orPos = [];
fig = figure('Name','Docker Tool','WindowButtonUpFcn',@dropObject,...
    'units','centimeters','WindowButtonMotionFcn',@moveObject,...
    'OuterPosition',[0 0 25 30]);
% Setting variables to zero for the loop
plat_qty=0;
time_qty=0;
k=0;
a=0;
% Start loop
z=1:2
for idx=1:x
if tsinput(idx,4)==1
color='red';
else
color='blue';
end
a=tsinput(idx,z);
b=a/100;
c=floor(b); % hours
d=c*100;
e=a-d; % minutes
time=c*60+e; % time quantity to be used in 'position'
time_qty=time/15;
plat_qty=tsinput(idx,3)*2;
box=annotation('textbox','units','centimeters','position',...
    [time_qty plat_qty 1.5 1.5],'String',infoinput(idx,z),...
    'ButtonDownFcn',@dragObject,'BackgroundColor',color);
% need to new=get(box,'Position'), fill out matrix OUT of loop
end
fillmenu=uicontextmenu;
hcb1 = 'set(gco, ''BackgroundColor'', ''red'')';
hcb2 = 'set(gco, ''BackgroundColor'', ''blue'')';
item1 = uimenu(fillmenu, 'Label', 'Train Full', 'Callback', hcb1);
item2 = uimenu(fillmenu, 'Label', 'Train Empty', 'Callback', hcb2);
hbox=findall(fig,'Type','hggroup');
for jdx=1:x
set(hbox(jdx),'uicontextmenu',fillmenu);
end
end
new_arr=tsinput;
function dragObject(hObject,eventdata)
dragging = hObject;
orPos = get(gcf,'CurrentPoint');
end
function dropObject(hObject,eventdata,box)
if ~isempty(dragging)
newPos = get(gcf,'CurrentPoint');
posDiff = newPos - orPos;
set(dragging,'Position',get(dragging,'Position') + ...
[posDiff(1:2) 0 0]);
dragging = [];
end
end
function moveObject(hObject,eventdata)
if ~isempty(dragging)
newPos = get(gcf,'CurrentPoint');
posDiff = newPos - orPos;
orPos = newPos;
set(dragging,'Position',get(dragging,'Position') + [posDiff(1:2) 0 0]);
end
end
end
% Testing purpose input matrices:
% tsinput=[0345 0405 1 1 ; 0230 0300 2 0; 0540 0635 3 1; 0745 0800 4 1]
% infoinput={'AJ35 NOT' 'KL21 MAN' 'XPRES'; 'ZW31 MAN' 'KM37 NEW' 'VISTA';
% 'BC38 BIR' 'QU54 LON' 'XPRES'; 'XZ89 LEC' 'DE34 MSF' 'DERP'}
If I understand you correctly (please post some code if I don't), then all you need is a set/get combination.
If boxHandle is a handle to the text-box object, then you get its current position by:
pos = get (boxHandle, 'position')
where pos is the output array of [x, y, width, height].
In order to set to a new position, you use:
set (boxHandle, 'position', newPos)
where newPos is the array of desired position (with the same structure as pos).
EDIT
Regarding updating your matrix: since you have the handle of the object you move, you actually DO have access to the specific text box.
When you create each text box, set a property called 'UserData' to the indices of tsinput used for that box. In your nested for loop, add this
set (box, 'UserData', [idx, z]);
after the box is created, and in your moveObject callback get the data by
udata = get(dragging,'UserData');
Then udata contains the indices of the elements you want to update.
