read a graph by vertices not as an edge list in R - arrays

To explain: I have an undirected graph stored in a text file as edges where each line consist of two values represent an edge, like:
5 10
1000 2
212 420
.
.
.
Normally when reading a graph in R from a file (using igraph), it will be read as edges so to call the edges of the graph "g" we write E(g) and to call the vertices of "g" we write V(g) and to call both vertices of a certain edge (i.e to call a certain edge (edge i)) we write E(g)[i].
My question: Is there a similar way to call one vertex only inside an edge not to call both of them.
For example, if I need the second vertex in the third edge then what I need to type?
Also from the beginning, is there something on igraph to read the graph as vertices and not as edges? like to read the graph as a table with two columns such that each edge to be read as X[i][1], X[i][2].
I need this because I want to do a loop among all vertices and to choose them separately from the edge and I think it is possible if each vertex was labeled like an element in a table.
Many thanks in advance for any help

If you have a two column table with vertices, you could use graph_from_data_frame to convert it into graph. To get nodes on particular edge, you can use ends.
#DATA
set.seed(2)
m = cbind(FROM = sample(LETTERS[1:5], 10, TRUE), TO = sample(LETTERS[6:10], 10, TRUE))
#Convert to graph
g = graph_from_data_frame(m, directed = FALSE)
#plot(g)
#Second vertex on third edge
ends(graph = g, es = 3)[2]
#[1] "I"

Related

Create a regular graph through one vertex deletion

The problem: Given an undirected graph, implemented using adjacency list. I'm looking for an algorithm to transform it to a regular graph (each vertex has same degree) through one vertex deletion.
For example:
Iterate all vertex, partition them by their degrees.
If all have same degree, its only possible if there is a vertex that has degree n - 1.
If you can partition them into 2 different degrees set: Let´s call X the set with the lower degree and Y the one with higher. Lets call dg(X) and dg(Y) the degree of those vertex
If one of the partitions has only 1 vertex and its degree is either 0 or the amount of vertex in the other set, remove it
If dg(Y) - dg(X) > 1, its not possible
If dg(Y) - dg(X) = 1 and |Y| = dg(X), check if a vertex from X is connected to all vertex from Y and remove it.
If dg(Y) - dg(X) = 1 and |X| = dg(Y), check if a vertex from Y is connected to all vertex from X and remove it.
Any other case is not possible with 2 partitions
If you can partition into 3 sets:
One of them must have only 1 vertex and that vertex has to be connected to all vertex from the other highest degree set, and to none of the remaining set. The degree difference between the other highest degree set and the remaining set must also be 1
Any other case, its not possible

Matlab: Saving data in a matrix

New to Matlab! I've been messing around with this open source code to make it do what I want and it does, but not the way I want it to. I just need some help wrapping this up.
This is what I have so far:
clear
global geodesic_library;
geodesic_library = 'geodesic_debug'; %"release" is faster and "debug" does additional checks
rand('state', 0); %comment this statement if you want to produce random mesh every time
load V3_Elements4GeodesicD.k
load V3_Nodes4GeodesicD.k
vertices = V3_Nodes4GeodesicD (:, [2 3 4]);
faces = V3_Elements4GeodesicD (:, [3 4 5]);
N = 12240; %number of points in a mesh
mesh = geodesic_new_mesh(vertices,faces); %initilize new mesh
algorithm = geodesic_new_algorithm(mesh, 'exact'); %initialize new geodesic algorithm
vertex_id = 6707 ; %create a single source at vertex #1
source_points = {geodesic_create_surface_point('vertex',vertex_id,vertices(vertex_id,:))};
geodesic_propagate(algorithm, source_points); %propagation stage of the algorithm (the most time-consuming)
vertex_id = 12240; %create a single destination at vertex #N
destination = geodesic_create_surface_point('vertex',vertex_id,vertices(vertex_id,:));
path = geodesic_trace_back(algorithm, destination); %find a shortest path from source to destination
distances = zeros(N,1); %find distances to all vertices of the mesh (actual pathes are not computed)
[source_id, distances] = geodesic_distance_and_source(algorithm) %find distances to all vertices of the mesh; in this example we have a single source, so source_id is always equal to 1
geodesic_delete; %delete all meshes and algorithms
It prints out distances and then, in subsequent code, it plots the path.
So here's my problem. It prints out 12000+ distances corresponding to each of my "sources" but I only care about distances between 10 sources and 12 destinations on my mesh, given by vertices and faces. How can I get it to print the 120 distances I care about and store them in a matrix?
In MATALB, if you don't put a semicolon at the end of your statement, then, the output of that statement gets printed on the console. So, your following statement:
[source_id, distances] = geodesic_distance_and_source(algorithm)
does not have a semicolon. I suspect that's where you see 12000 distances printed out.
To answer your second question: I don't have enough information about the structure of matrix distances. I think you can use indexing to find out distance between source m and destination n as distances(m,n). That's how usually distance matrices are structured, but I can't say for sure.

Splitting an array into n parts and then joining them again forming a histogram

I am new to Matlab.
Lets say I have an array a = [1:1:1000]
I have to divide this into 50 parts 1-20; 21-40 .... 981-1000.
I am trying to do it this way.
E=1000X
a=[1:E]
n=50
d=E/n
b=[]
for i=0:n
b(i)=a[i:d]
end
But I am unable to get the result.
And the second part I am working on is, depending on another result, say if my answer is 3, the first split array should have a counter and that should be +1, if the answer is 45 the 3rd split array's counter should be +1 and so on and in the end I have to make a histogram of all the counters.
You can do all of this with one function: histc. In your situation:
X = (1:1:1000)';
Edges = (1:20:1000)';
Count = histc(X, Edges);
Essentially, Count contains the number of elements in X that fall into the categories defined in Edges, where Edges is a monotonically increasing vector whose elements define the boundaries of sequential categories. A more common example might be to construct X using a probability density, say, the uniform distribution, eg:
X = 1000 * rand(1000, 1);
Play around with specifications for X and Edges and you should get the idea. If you want the actual histogram plot, look into the hist function.
As for the second part of your question, I'm not really sure what you're asking.

What's the fastest way to find deepest path in a 3D array?

I've been trying to find solution to my problem for more than a week and I couldn't find out anything better than a milion iterations prog, so I think it's time to ask someone to help me.
I've got a 3D array. Let's say, we're talking about the ground and the first layer is a surface.
Another layers are floors below the ground. I have to find deepest path's length, count of isolated caves underground and the size of the biggest cave.
Here's the visualisation of my problem.
Input:
5 5 5 // x, y, z
xxxxx
oxxxx
xxxxx
xoxxo
ooxxx
xxxxx
xxoxx
and so...
Output:
5 // deepest path - starting from the surface
22 // size of the biggest cave
3 // number of izolated caves (red ones) (izolated - cave that doesn't reach the surface)
Note, that even though red cell on the 2nd floor is placed next to green one, It's not the same cave because it's placed diagonally and that doesn't count.
I've been told that the best way to do this, might be using recursive algorithm "divide and rule" however I don't really know how could it look like.
I think you should be able to do it in O(N).
When you parse your input, assign each node a 'caveNumber' initialized to 0. Set it to a valid number whenever you visit a cave:
CaveCount = 0, IsolatedCaveCount=0
AllSizes = new Vector.
For each node,
ProcessNode(size:0,depth:0);
ProcessNode(size,depth):
If node.isCave and !node.caveNumber
if (size==0) ++CaveCount
if (size==0 and depth!=0) IsolatedCaveCount++
node.caveNumber = CaveCount
AllSizes[CaveCount]++
For each neighbor of node,
if (goingDeeper) depth++
ProcessNode(size+1, depth).
You will visit each node 7 times at worst case: once from the outer loop, and possibly once from each of its six neighbors. But you'll only work on each one once, since after that the caveNumber is set, and you ignore it.
You can do the depth tracking by adding a depth parameter to the recursive ProcessNode call, and only incrementing it when visiting a lower neighbor.
The solution shown below (as a python program) runs in time O(n lg*(n)), where lg*(n) is the nearly-constant iterated-log function often associated with union operations in disjoint-set forests.
In the first pass through all cells, the program creates a disjoint-set forest, using routines called makeset(), findset(), link(), and union(), just as explained in section 22.3 (Disjoint-set forests) of edition 1 of Cormen/Leiserson/Rivest. In later passes through the cells, it counts the number of members of each disjoint forest, checks the depth, etc. The first pass runs in time O(n lg*(n)) and later passes run in time O(n) but by simple program changes some of the passes could run in O(c) or O(b) for c caves with a total of b cells.
Note that the code shown below is not subject to the error contained in a previous answer, where the previous answer's pseudo-code contains the line
if (size==0 and depth!=0) IsolatedCaveCount++
The error in that line is that a cave with a connection to the surface might have underground rising branches, which the other answer would erroneously add to its total of isolated caves.
The code shown below produces the following output:
Deepest: 5 Largest: 22 Isolated: 3
(Note that the count of 24 shown in your diagram should be 22, from 4+9+9.)
v=[0b0000010000000000100111000, # Cave map
0b0000000100000110001100000,
0b0000000000000001100111000,
0b0000000000111001110111100,
0b0000100000111001110111101]
nx, ny, nz = 5, 5, 5
inlay, ncells = (nx+1) * ny, (nx+1) * ny * nz
masks = []
for r in range(ny):
masks += [2**j for j in range(nx*ny)][nx*r:nx*r+nx] + [0]
p = [-1 for i in range(ncells)] # parent links
r = [ 0 for i in range(ncells)] # rank
c = [ 0 for i in range(ncells)] # forest-size counts
d = [-1 for i in range(ncells)] # depths
def makeset(x): # Ref: CLR 22.3, Disjoint-set forests
p[x] = x
r[x] = 0
def findset(x):
if x != p[x]:
p[x] = findset(p[x])
return p[x]
def link(x,y):
if r[x] > r[y]:
p[y] = x
else:
p[x] = y
if r[x] == r[y]:
r[y] += 1
def union(x,y):
link(findset(x), findset(y))
fa = 0 # fa = floor above
bc = 0 # bc = floor's base cell #
for f in v: # f = current-floor map
cn = bc-1 # cn = cell#
ml = 0
for m in masks:
cn += 1
if m & f:
makeset(cn)
if ml & f:
union(cn, cn-1)
mr = m>>nx
if mr and mr & f:
union(cn, cn-nx-1)
if m & fa:
union(cn, cn-inlay)
ml = m
bc += inlay
fa = f
for i in range(inlay):
findset(i)
if p[i] > -1:
d[p[i]] = 0
for i in range(ncells):
if p[i] > -1:
c[findset(i)] += 1
if d[p[i]] > -1:
d[p[i]] = max(d[p[i]], i//inlay)
isola = len([i for i in range(ncells) if c[i] > 0 and d[p[i]] < 0])
print "Deepest:", 1+max(d), " Largest:", max(c), " Isolated:", isola
It sounds like you're solving a "connected components" problem. If your 3D array can be converted to a bit array (e.g. 0 = bedrock, 1 = cave, or vice versa) then you can apply a technique used in image processing to find the number and dimensions of either the foreground or background.
Typically this algorithm is applied in 2D images to find "connected components" or "blobs" of the same color. If possible, find a "single pass" algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The same technique can be applied to 3D data. Googling "connected components 3D" will yield links like this one:
http://www.ecse.rpi.edu/Homepages/wrf/pmwiki/pmwiki.php/Research/ConnectedComponents
Once the algorithm has finished processing your 3D array, you'll have a list of labeled, connected regions, and each region will be a list of voxels (volume elements analogous to image pixels). You can then analyze each labeled region to determine volume, closeness to the surface, height, etc.
Implementing these algorithms can be a little tricky, and you might want to try a 2D implementation first. Thought it might not be as efficient as you like, you could create a 3D connected component labeling algorithm by applying a 2D algorithm iteratively to each layer and then relabeling the connected regions from the top layer to the bottom layer:
For layer 0, find all connected regions using the 2D connected component algorithm
For layer 1, find all connected regions.
If any labeled pixel in layer 0 sits directly over a labeled pixel in layer 1, change all the labels in layer 1 to the label in layer 0.
Apply this labeling technique iteratively through the stack until you reach layer N.
One important considering in connected component labeling is how one considers regions to be connected. In a 2D image (or 2D array) of bits, we can consider either the "4-connected" region of neighbor elements
X 1 X
1 C 1
X 1 X
where "C" is the center element, "1" indicates neighbors that would be considered connected, and "X" are adjacent neighbors that we do not consider connected. Another option is to consider "8-connected neighbors":
1 1 1
1 C 1
1 1 1
That is, every element adjacent to a central pixel is considered connected. At first this may sound like the better option. In real-world 2D image data a chessboard pattern of noise or diagonal string of single noise pixels will be detected as a connected region, so we typically test for 4-connectivity.
For 3D data you can consider either 6-connectivity or 26-connectivity: 6-connectivity considers only the neighbor pixels that share a full cube face with the center voxel, and 26-connectivity considers every adjacent pixel around the center voxel. You mention that "diagonally placed" doesn't count, so 6-connectivity should suffice.
You can observe it as a graph where (non-diagonal) adjacent elements are connected if they both empty (part of a cave). Note that you don't have to convert it to a graph, you can use normal 3d array representation.
Finding caves is the same task as finding the connected components in a graph (O(N)) and the size of a cave is the number of nodes of that component.

Random walks in directed graphs/networks

I have a weighted graph with (in practice) up to 50,000 vertices. Given a vertex, I want to randomly choose an adjacent vertex based on the relative weights of all adjacent edges.
How should I store this graph in memory so that making the selection is efficient? What is the best algorithm? It could be as simple as a key value store for each vertex, but that might not lend itself to the most efficient algorithm. I'll also need to be able update the network.
Note that I'd like to take only one "step" at a time.
More Formally: Given a weighted, directed, and potentially complete graph, let W(a,b) be the weight of edge a->b and let Wa be the sum of all edges from a. Given an input vertex v, I want to choose a vertex randomly where the likelihood of choosing vertex x is W(v,x) / Wv
Example:
Say W(v,a) = 2, W(v,b) = 1, W(v,c) = 1.
Given input v, the function should return a with probability 0.5 and b or c with probability 0.25.
If you are concerned about the performance of generating the random walk you may use the alias method to build a datastructure which fits your requirements of choosing a random outgoing edge quite well. The overhead is just that you have to assign each directed edge a probability weight and a so-called alias-edge.
So for each note you have a vector of outgoing edges together with the weight and the alias edge. Then you may choose random edges in constant time (only the generation of th edata structure is linear time with respect to number of total edges or number of node edges). In the example the edge is denoted by ->[NODE] and node v corresponds to the example given above:
Node v
->a (p=1, alias= ...)
->b (p=3/4, alias= ->a)
->c (p=3/4, alias= ->a)
Node a
->c (p=1/2, alias= ->b)
->b (p=1, alias= ...)
...
If you want to choose an outgoing edge (i.e. the next node) you just have to generate a single random number r uniform from interval [0,1).
You then get no=floor(N[v] * r) and pv=frac(N[v] * r) where N[v] is the number of outgoing edges. I.e. you pick each edge with the exact same probability (namely 1/3 in the example of node v).
Then you compare the assigned probability p of this edge with the generated value pv. If pv is less you keep the edge selected before, otherwise you choose its alias edge.
If for example we have r=0.6 from our random number generator we have
no = floor(0.6*3) = 1
pv = frac(0.6*3) = 0.8
Therefore we choose the second outgoing edge (note the index starts with zero) which is
->b (p=3/4, alias= ->a)
and switch to the alias edge ->a since p=3/4 < pv.
For the example of node v we therefore
choose edge b with probability 1/3*3/4 (i.e. whenever no=1 and pv<3/4)
choose edge c with probability 1/3*3/4 (i.e. whenever no=2 and pv<3/4)
choose edge a with probability 1/3 + 1/3*1/4 + 1/3*1/4 (i.e. whenever no=0 or pv>=3/4)
In theory the absolutely most efficient thing to do is to store, for each node, the moral equivalent of a balanced binary tree (red-black, or BTree, or skip list all fit) of the connected nodes and their weights, and the total weight to each side. Then you can pick a random number from 0 to 1, multiply by the total weight of the connected nodes, then do a binary search to find it.
However traversing a binary tree like that involves a lot of choices, which have a tendency to create pipeline stalls. Which are very expensive. So in practice if you're programming in an efficient language (eg C++), if you've got less than a couple of hundred connected edges per node, a linear list of edges (with a pre-computed sum) that you walk in a loop may prove to be faster.

Resources