Maximal matching in a bipartite graph [closed]


Closed 11 years ago.
I am stuck with a maximal matching in a bipartite graph problem. The problem goes something like this:
Given a board with m circular holes and given a set of n circular discs. Holes are numbered as h1, ..., hm, and discs as d1, ..., dn.
We have a matrix A of m rows and n columns. A[i][j] = 1 if hi can fit dj (i.e., diameter of hi ≥ diameter of dj), and 0 otherwise.
Given the condition that any hole can contain at most one disc, I need to find the configuration for which the number of hole-disc fittings is maximal.
I have read that this problem can be modelled into network flow problem, but could not exactly follow how. Can someone explain how to do this? Also, is there any C code for this that I might be able to look at?

The reduction from bipartite matching to maximum flow is actually quite beautiful. When you are given a bipartite graph, you can think of the graph as two columns of nodes connected by edges from the first column to the second:
A ----- 1
B --\   2
C    \- 3
...    ...
Z       n
To reduce the problem to max-flow, you begin by directing all of the edges from the first column to the second column so that flow can only move from the left column to the right. After you do this, you introduce two new nodes s and t that act as the source and terminal nodes. You position s so that it is connected to all of the nodes on the left side and t so that each node in the right side is connected to it. For example:
    A ----- 1
  / B --\   2 \
s-  C    \- 3 - t
  \ ...    ... /
    Z       n
The idea here is that any path you can take from s to t must enter one of the nodes in the left column, then cross some edge to the right column, and from there go to t. Thus there is an easy one-to-one mapping between an edge in a matching and an s-t path: just take the path from s to the source of the edge, then follow the edge, then follow the edge from the endpoint to the node t. At this point, our goal is to maximize the number of node-disjoint paths from s to t. We can accomplish this using a maximum flow as follows. First, set the capacity of each edge leaving s to be 1. This ensures that at most one unit of flow enters each of the nodes in the first column. Similarly, set the capacity of each edge crossing the two columns to be 1, ensuring that we either pick the edge or don't, rather than possibly picking it with some multiplicity. Finally, set the capacity of the edges leaving the second column into t to be 1 as well. This ensures that each node on the right-hand side is only matched once, since we can't push more than one unit of flow past it.
Once you've constructed the flow network, compute a maximum flow using any of the standard algorithms. Ford-Fulkerson is a simple algorithm that performs well here, since the maximum flow in the graph is at most the number of nodes, which bounds the number of augmenting paths. It has a worst-case performance of O(mn), where m is the number of edges and n the number of nodes. Alternatively, the highly optimized Hopcroft-Karp algorithm can do this in O(m√n) time, which can be much better.
As for a C implementation, a quick Google search for the Ford-Fulkerson step turned up this link. You'd need to construct the flow network before passing it into this code, but the construction isn't too complex and I think that you shouldn't have much trouble with it.
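In case the link is unavailable, here is a minimal sketch of the same idea in C. It does not build the flow network explicitly; it uses the augmenting-path method for bipartite matching (often called Kuhn's algorithm), which is what Ford-Fulkerson boils down to when every capacity is 1. The names fits, disc_owner, try_augment and max_matching, and the fixed size limits, are my own assumptions for illustration and are not from the question.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOLES 64   /* assumed upper bounds, adjust as needed */
#define MAX_DISCS 64

/* fits[i][j] = 1 if hole i can hold disc j (the matrix A from the question). */
static int fits[MAX_HOLES][MAX_DISCS];
/* disc_owner[j] = hole currently matched to disc j, or -1 if disc j is unused. */
static int disc_owner[MAX_DISCS];
/* visited[j] marks discs already tried during the current augmentation. */
static int visited[MAX_DISCS];

/* Try to give `hole` a disc, re-homing other holes along the way if needed.
   This is one augmenting-path search in the unit-capacity network. */
static int try_augment(int hole, int n_discs)
{
    for (int j = 0; j < n_discs; ++j) {
        if (fits[hole][j] && !visited[j]) {
            visited[j] = 1;
            if (disc_owner[j] < 0 || try_augment(disc_owner[j], n_discs)) {
                disc_owner[j] = hole;
                return 1;
            }
        }
    }
    return 0;
}

/* Returns the size of a maximum matching; disc_owner[] holds the pairing. */
static int max_matching(int n_holes, int n_discs)
{
    assert(n_holes <= MAX_HOLES && n_discs <= MAX_DISCS);
    int matched = 0;
    memset(disc_owner, -1, sizeof disc_owner);
    for (int i = 0; i < n_holes; ++i) {
        memset(visited, 0, sizeof visited);
        if (try_augment(i, n_discs))
            ++matched;
    }
    return matched;
}
```

To use it, fill fits from the hole and disc diameters, call max_matching(m, n), and read the chosen pairs out of disc_owner.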
Hope this helps!


Looking for an algorithm to find the shortest path

Basically I have a graph with 12 nodes (representing cities) and 13 edges (representing routes).
Now let's say that (randomly) I have a plan for visiting n nodes (with n <= 12 - 1), departing from a specific one (A) and then coming back to the starting point.
From what I've seen, it seems almost like the Traveling Salesman Problem, but with the difference that my salesman doesn't necessarily need to visit all nodes.
What algorithm am I looking for?
EDIT
Apparently this is not going to be a TSP, because TSP says that the graph must be closed and we go through every city (node) only once. In my case, it can cross a city more than once, if it makes the route shorter.
A few more examples of what I am looking for:
Example one:
Depart from: A
Need to visit (B,D,E,L,G,J,K)
Come back to: A
Example two:
Depart from: A
Need to visit (B,C,D,G,H,I,J,K)
Come back to: A
Rules:
- Get shortest path
- No specific order
- Can visit one node (city) more than once
Remember, this is for a project in C, so this is just pre-coding research.

There are a lot of algorithms out there doing this. The catchword is path-finding.
The best algorithm to learn from at the beginning is the good old Dijkstra http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
Then for larger graphs (that are no maze) you might want an algorithm with some direction heuristics making evaluation faster like the A* algorithm. http://en.wikipedia.org/wiki/A*
There are others, but these are the two most common.
Update from the discussion:
From our discussion I think going through all permutations of the "must have nodes" B|L|I|D|G|K|J, starting from A and then going to A again, would be an approach to solve it:
// Prepare a two dimensional array for the permutations
Node permutation[permutationCount][7];
// Fill it with all permutations
...
int cost[permutationCount];
for (int i = 0; i < permutationCount; ++i) {
    cost[i] = dijkstraCost(nodeA, permutation[i][0])
            + dijkstraCost(permutation[i][0], permutation[i][1])
            + dijkstraCost(permutation[i][1], permutation[i][2])
            + dijkstraCost(permutation[i][2], permutation[i][3])
            + dijkstraCost(permutation[i][3], permutation[i][4])
            + dijkstraCost(permutation[i][4], permutation[i][5])
            + dijkstraCost(permutation[i][5], permutation[i][6])
            + dijkstraCost(permutation[i][6], nodeA);
}
// Now evaluate the lowest cost and you have your shortest path(s)
....
I think that should work.
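For what it's worth, here is a self-contained sketch of that permutation idea in C. It swaps one thing in: instead of calling Dijkstra for every leg, it precomputes all shortest distances once with Floyd-Warshall, which is cheap for 12 nodes. All names (dist, add_route, all_pairs, best_tour) are hypothetical, and I am assuming routes can be travelled in both directions.

```c
#include <limits.h>

#define N_NODES 12
#define INF (INT_MAX / 2)   /* "no route"; halved so INF + INF cannot overflow */

static int dist[N_NODES][N_NODES];

static void init_graph(void)
{
    for (int i = 0; i < N_NODES; ++i)
        for (int j = 0; j < N_NODES; ++j)
            dist[i][j] = (i == j) ? 0 : INF;
}

/* Assumption: routes are undirected. */
static void add_route(int a, int b, int w)
{
    dist[a][b] = w;
    dist[b][a] = w;
}

/* Floyd-Warshall: afterwards dist[i][j] is the shortest distance i -> j. */
static void all_pairs(void)
{
    for (int k = 0; k < N_NODES; ++k)
        for (int i = 0; i < N_NODES; ++i)
            for (int j = 0; j < N_NODES; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

static void swap_int(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

/* Tries every order of must_visit[k..count-1]. Returns the cheapest cost of
   visiting them starting from node `at` and then returning to `start`,
   or INF if no such route exists. */
static int best_tour(int start, int at, int *must_visit, int k, int count)
{
    if (k == count)
        return dist[at][start];
    int best = INF;
    for (int i = k; i < count; ++i) {
        swap_int(&must_visit[k], &must_visit[i]);
        int leg = dist[at][must_visit[k]];
        int rest = best_tour(start, must_visit[k], must_visit, k + 1, count);
        if (leg < INF && rest < INF && leg + rest < best)
            best = leg + rest;
        swap_int(&must_visit[k], &must_visit[i]);   /* restore the order */
    }
    return best;
}
```

With 7 required cities this checks 7! = 5040 orders, which is fine; the factorial growth only starts to hurt somewhere past 10 or so required nodes.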

You are right, it is a TSP, but what you need to do is to reduce the graph so it only contains the nodes that are to be visited.
How to reduce the graph is left as an exercise for the reader ;-)

How does the winged-edge structure for meshes work?

I'm implementing an algorithm in which I need to manipulate a mesh, adding and deleting edges quickly and iterating quickly over the edges adjacent to a vertex in CCW or CW order.
The winged-edge structure is used in the description of the algorithm I'm working from, but I can't find any concise descriptions of how to perform those operations on this data structure.

I learned about it at university, but that was a while ago.
In response to this question I searched the web for good documentation and found none, but we can go through a quick example for CCW and CW order and insertion/deletion here.
Have a look at the table and graphic on this page:
http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/model/winged-e.html
The table gives only the entry for one edge a, in a real table you have this row for every edge. You can see you get the:
left predecessor,
left successor,
right predecessor,
right successor
but here comes the critical point: it gives them relative to the direction of the edge which is X->Y in this case, and when it is right-traversed (e->a->c).
So for the CW-order of going through the graph this is very easy to read: edge a left has right-successor c and then you look into the row for edge c.
Ok, this table is easy to read for CW-order traversal; for CCW you have to think "which edge did I come from when I walked this edge backwards". Effectively you get the next edge in CCW order by taking the left-traverse predecessor, in this case b, and continuing with the row entry for edge b in the same manner.
Now insertion and deletion: it is clear that you can't just remove the edge and expect the graph to still consist of only triangles; during deletion you have to join two vertices, for example X and Y in the graphic. To do this you first have to fix every reference to the edge a.
So where can a be referred-to? only in the edges b,c,d and e (all other edges are too far away to know a) plus in the vertex->edge-table if you have that (but let's only consider the edges-table in this example).
As an example of how we have to fix edges, let's take a look at c. Like a, c has a left and right pre- and successor (so 4 edges). Which one of those is a? We cannot know that without checking, because the table entry for c can have the node Y as either its start node or its end node. So we have to check which one it is. Let's assume we find that c has Y as its start node; we then have to check whether a is c's right predecessor (which it is, and which we find out by looking at c's entry and comparing it to a) OR whether it is c's right successor. "Successor??" you might ask? Yes, because remember the two "left-traverse" columns are relative to going along the edge backward. So, now we have found that a is c's right predecessor and we can fix that reference by inserting a's right predecessor. Continue with the other 3 edges and you are done with the edges table. Fixing an additional vertex->edge table is trivial of course: just look into the entries for X and Y and delete a there.
Adding edges is basically the reverse of this fix-up of 4 other edges BUT with a little twist. Let's call the node which we want to split Z (it will be split into X and Y). You have to take care that you split it in the right direction, because you can have either d and e combined in a node or e and c (as would happen if the new edge were horizontal instead of the vertical a in the graphic)! You first have to find out between which 2 edges of the soon-to-be X and between which 2 edges of Y the new edge is added. You just choose which edges shall be on one node and which on the other node. In this example graphic: choose that you want b, c and the 2 edges to the north in between them on one node, and it follows that the other edges are on the other node, which will become X. You then find by vector subtraction that the new edge a has to be between b and c, not between, say, c and one of the 2 edges in the north. The vector subtraction is the desired position of the new X minus the desired position of Y.
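To make the table a bit more concrete, here is a rough sketch of how one row could look as a C struct, together with the "find which slot refers to a and overwrite it" step from the deletion fix-up. The field names and the helper replace_wing are my own invention for illustration; real implementations differ in naming and in whether they store indices or pointers.

```c
#include <stddef.h>

typedef struct Edge Edge;

/* One row of the winged-edge table (hypothetical field names). */
struct Edge {
    int start_vertex, end_vertex;    /* e.g. X and Y for edge a */
    int left_face, right_face;
    Edge *left_pred, *left_succ;     /* wings when traversing the left face */
    Edge *right_pred, *right_succ;   /* wings when traversing the right face */
};

/* Overwrite every wing slot of `e` that refers to `old_edge` with
   `new_edge`. Returns how many slots were changed, so the caller can
   verify that the reference was actually found. */
static int replace_wing(Edge *e, const Edge *old_edge, Edge *new_edge)
{
    Edge **slots[4] = {
        &e->left_pred, &e->left_succ, &e->right_pred, &e->right_succ
    };
    int changed = 0;
    for (int i = 0; i < 4; ++i) {
        if (*slots[i] == old_edge) {
            *slots[i] = new_edge;
            ++changed;
        }
    }
    return changed;
}
```

Deleting a would then call this once for each of b, c, d and e, passing whichever edge should take a's place in that neighbour's row.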

R Tree 50,000 foot overview?

I'm working on a school project that involves taking a lat/long point and finding the top five closest points in a known list of places. The list is to be stored in memory, with the caveat that we must choose an "appropriate data structure" -- that is, we cannot simply store all the places in an array and compare distances one-by-one in a linear fashion. The teacher suggested grouping the place data by US State to prevent calculating the distance for places that are obviously too far away. I think I can do better.
From my research online it seems like an R-Tree or one of its variants might be a neat solution. Unfortunately, that sentence is as far as I've gotten with understanding the actual technique, as the literature is simply too dense for my non-academic head.
Can somebody give me a really high overview of what the process is for populating an R-Tree with lat/long data, and then traversing the tree to find those 5 nearest neighbors of a given point?
Additionally the project is in C, and I don't have to reinvent the wheel on this, so if you've used an existing open source C implementation of an R Tree I'd be interested in your experiences.
UPDATE: This blog post describes a straightforward search algorithm for a regionally partitioned space (like a PR quadtree). Hope that helps a future reader.

Have you considered alternative data structures?
I believe that, instead of an R-tree, a Point Quadtree would be more effective for your needs. Spatial Index Demos provides demos for a list of possible data structures, including the R-tree and the Point Quadtree. Hope it gives some insight.

Quad Trees
A quad tree takes a square of space and divides it into four children with half the dimensions along the X and Y axis.
+---+---+
| | | Each square is a child
| | | of the parent; when you
+---+---+ get to leaves a node has
| | | a single point or a list
| | | of points.
+---+---+
This data structure is recursive and you search for points by checking which child holds the point until you get to the leaf. A leaf either has a single member (point with X,Y coords) or a list of members, depending on the implementation. If you fill up a node you split it into 4 and distribute the children. Essentially, the data structure is a generalisation of a binary tree, so it is not necessarily balanced.
Balancing a quad tree may not be necessary for your purposes and is left as an exercise for the reader - try searching on the web for 'balanced quad tree'
Note that this data structure cannot index items that can overlap, but if you're only storing points this won't be a problem.
Finding nearest neighbours in a quad tree
Off the top of my head, here's a quick and dirty algorithm for finding the 'n' nearest neighbours to your point. It's not necessarily optimally efficient, but it will be fairly simple to implement. If someone has a link to a better one, feel free to post it in a comment or answer.
1. Locate the quad tree node containing your point, keeping a list of its parents.
2. Push all of the points in the node into a priority queue based on their distance from your base point (i.e. by the length of the hypotenuse per Pythagoras' theorem). Depending on the implementation there may be one or more per node. For a simple implementation of a priority queue data structure, look up 'binary heap'.
3. If any of the 'n' points are further away than the edges of the bounding box, add the contents of its neighbours. That is, if your base point is close to the edge of the bounding box, it is possible that neighbouring tree nodes contain points that are closer than the points found within your bounding box. You will need to back up the tree to do this, which is why you need to keep track of your parent nodes.
4. When all of the 'n' closest points are closer than the edges of your bounding box, you know that there could not possibly be neighbours that you have missed. Therefore, the 'n' closest points within this box must be your 'n' closest neighbours.
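The "is this box still worth looking at" check in the steps above comes down to the distance from your base point to the nearest part of a box. A small sketch in C, assuming axis-aligned boxes and treating the coordinates as planar (which is only an approximation for lat/long over large areas); Box, box_dist_sq and can_skip are names I made up:

```c
typedef struct {
    double min_x, min_y, max_x, max_y;
} Box;

/* Squared distance from (px, py) to the nearest point of the box;
   0 if the point lies inside the box. */
static double box_dist_sq(const Box *b, double px, double py)
{
    double dx = 0.0, dy = 0.0;
    if (px < b->min_x)      dx = b->min_x - px;
    else if (px > b->max_x) dx = px - b->max_x;
    if (py < b->min_y)      dy = b->min_y - py;
    else if (py > b->max_y) dy = py - b->max_y;
    return dx * dx + dy * dy;
}

/* A tree node can be skipped when even its nearest edge is farther away
   than the worst of the 'n' candidates kept so far (squared distances). */
static int can_skip(const Box *b, double px, double py, double worst_kept_sq)
{
    return box_dist_sq(b, px, py) > worst_kept_sq;
}
```

Comparing squared distances avoids a square root per check and gives the same ordering.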

Suggestions of the easiest algorithms for some Graph operations

The deadline for this project is closing in very quickly and I don't have much time to deal with what it's left. So, instead of looking for the best (and probably more complicated/time consuming) algorithms, I'm looking for the easiest algorithms to implement a few operations on a Graph structure.
The operations I'll need to do is as follows:
List all users in the graph network given a distance X
List all users in the graph network given a distance X and the type of relation
Calculate the shortest path between 2 users on the graph network given a type of relation
Calculate the maximum distance between 2 users on the graph network
Calculate the most distant connected users on the graph network
A few notes about my Graph implementation:
The edge node has 2 properties, one is of type char and another int. They represent the type of relation and weight, respectively.
The Graph is implemented with linked lists, for both the vertices and edges. I mean, each vertex points to the next one and each vertex also points to the head of a different linked list, the edges for that specific vertex.
What I know about what I need to do:
I don't know if this is the easiest as I said above, but for the shortest path between 2 users, I believe the Dijkstra algorithm is what people seem to recommend pretty often so I think I'm going with that.
I've been searching and searching and I'm finding it hard to implement this algorithm, does anyone know of any tutorial or something easy to understand so I can implement this algorithm myself? If possible, with C source code examples, it would help a lot. I see many examples with math notations but that just confuses me even more.
Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.
I don't have any ideas about the other 4 operations, suggestions?

List all users in the graph network given a distance X
A distance X from what? from a starting node or a distance X between themselves? Can you give an example? This may or may not be as simple as doing a BF search or running Dijkstra.
Assuming you start at a certain node and want to list all nodes at distance X from it, just run BFS from the starting node. When you are about to insert a new node in the queue, compute the distance from the starting node to the current node plus the weight of the edge from the current node to the new node, and check whether it is <= X. If it's strictly lower, insert the new node, and if it is equal just print the new node (and only insert it if you can also have 0 as an edge weight).
List all users in the graph network given a distance X and the type of relation
See above. Just factor in the type of relation into the BFS: if the type of the parent is different than that of the node you are trying to insert into the queue, don't insert it.
Calculate the shortest path between 2 users on the graph network given a type of relation
The algorithm depends on a number of factors:
How often will you need to calculate this?
How many nodes do you have?
Since you want easy, the easiest are Roy-Floyd and Dijkstra's.
Using Roy-Floyd is cubic in the number of nodes, so inefficient. Only use this if you can afford to run it once and then answer each query in O(1). Use this if you can afford to keep an adjacency matrix in memory.
Dijkstra's is quadratic in the number of nodes if you want to keep it simple, but you'll have to run it each time you want to calculate the distance between two nodes. If you want to use Dijkstra's, use an adjacency list.
Here are C implementations: Roy-Floyd and Dijkstra_1, Dijkstra_2. You can find a lot on google with "<algorithm name> c implementation".
Edit: Roy-Floyd is out of the question for 18 000 nodes, as is an adjacency matrix. It would take way too much time to build and way too much memory. Your best bet is to run Dijkstra's algorithm for each query, preferably implemented with a heap: in the links I provided, use a heap to find the minimum. If you run the classical Dijkstra on each query, that could also take a very long time.
Another option is to use the Bellman-Ford algorithm on each query, which will give you O(Nodes*Edges) runtime per query. However, this is a big overestimate IF you don't implement it as Wikipedia tells you to. Instead, use a queue similar to the one used in BFS. Whenever a node updates its distance from the source, insert that node back into the queue. This will be very fast in practice, and will also work for negative weights. I suggest you use either this or the Dijkstra with heap, since classical Dijkstra might take a long time on 18 000 nodes.
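Here is a rough sketch of the queue-based Bellman-Ford variant described above, using array-based adjacency lists. The names (head, next_edge, add_edge, shortest_from) and the size limits are assumptions of mine, and the relation type is left out; to filter by relation you would store a char per edge and skip edges of the wrong type in the inner loop.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 128    /* assumed limits, raise them for a real graph */
#define MAX_EDGES 1024
#define INF INT_MAX

static int head[MAX_NODES];        /* first edge of each node, -1 if none */
static int next_edge[MAX_EDGES];   /* next edge leaving the same node */
static int to[MAX_EDGES];
static int weight[MAX_EDGES];
static int edge_count;

static void graph_init(int n)
{
    for (int i = 0; i < n; ++i)
        head[i] = -1;
    edge_count = 0;
}

/* Adds a directed edge u -> v; call twice for an undirected one. */
static void add_edge(int u, int v, int w)
{
    assert(edge_count < MAX_EDGES);
    to[edge_count] = v;
    weight[edge_count] = w;
    next_edge[edge_count] = head[u];
    head[u] = edge_count++;
}

/* Fills dist[0..n-1] with shortest distances from src (INF = unreachable).
   Assumes there is no negative cycle; with one, this loop never ends. */
static void shortest_from(int src, int n, int *dist)
{
    static int queue[MAX_NODES];      /* circular queue */
    static char in_queue[MAX_NODES];
    int q_head = 0, q_tail = 0, q_size = 0;

    for (int i = 0; i < n; ++i) {
        dist[i] = INF;
        in_queue[i] = 0;
    }
    dist[src] = 0;
    queue[q_tail] = src;
    q_tail = (q_tail + 1) % MAX_NODES;
    q_size = 1;
    in_queue[src] = 1;

    while (q_size > 0) {
        int u = queue[q_head];
        q_head = (q_head + 1) % MAX_NODES;
        --q_size;
        in_queue[u] = 0;
        for (int e = head[u]; e != -1; e = next_edge[e]) {
            int v = to[e];
            /* dist[u] is never INF here: only reached nodes are queued. */
            if (dist[u] + weight[e] < dist[v]) {
                dist[v] = dist[u] + weight[e];
                if (!in_queue[v]) {    /* re-queue the improved node */
                    queue[q_tail] = v;
                    q_tail = (q_tail + 1) % MAX_NODES;
                    ++q_size;
                    in_queue[v] = 1;
                }
            }
        }
    }
}
```

Because a node is queued at most once at a time, the circular queue never holds more than n entries.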
Calculate the maximum distance between 2 users on the graph network
The simplest way is to use backtracking: try all possibilities and keep the longest path found. The longest-path problem is NP-hard, so no polynomial-time algorithm is known for it.
This is really bad if you have 18 000 nodes, I don't know any algorithm (simple or otherwise) that will work reasonably fast for so many nodes. Consider approximating it using greedy algorithms. Or maybe your graph has certain properties that you could take advantage of. For example, is it a DAG (Directed Acyclic Graph)?
Calculate the most distant connected users on the graph network
Meaning you want to find the diameter of the graph. The simplest way to do this is to find the distances between each two nodes (all pairs shortest paths - either run Roy-Floyd or Dijkstra between each two nodes and pick the two with the maximum distance).
Again, this is very hard to do fast with your number of nodes and edges. I'm afraid you're out of luck on these last two questions, unless your graph has special properties that can be exploited.
Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.
No, adjacency matrix and Roy-Floyd are a very bad idea unless your application targets supercomputers.

This assumes O(E log V) is an acceptable running time, if you're doing something online, this might not be, and it would require some higher powered machinery.
List all users in the graph network given a distance X
Dijkstra's algorithm is good for this, for one-time use. You can save the result for future use, with a linear scan through all the vertices (or better yet, sort and binary search).
List all users in the graph network given a distance X and the type of relation
Might be nearly the same as above -- just use some function where the weight would be infinity if it is not of the correct relation.
Calculate the shortest path between 2 users on the graph network given a type of relation
Same as above, essentially, just determine early if you match the two users. (Alternatively, you can "meet in the middle", and terminate early if you find someone on both shortest path spanning tree)
Calculate the maximum distance between 2 users on the graph network
Longest path is an NP-complete problem.
Calculate the most distant connected users on the graph network
This is the diameter of the graph, which you can read about on Math World.
As for the adjacency list vs adjacency matrix question, it depends on how densely populated your graph is. Also, if you want to cache results, then the matrix might be the way to go.

The simplest algorithm to compute shortest path between two nodes is Floyd-Warshall. It's just triple-nested for loops; that's it.
It computes ALL-pairs shortest path in O(N^3), so it may do more work than necessary, and will take a while if N is huge.
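To show how little code that is, here is a sketch in C, plus a helper that scans the result for the most distant connected pair (the last operation in the question). MAX_N, NO_EDGE and the function names are my own choices, and this is only sensible for small graphs since the matrix alone needs N*N ints.

```c
#include <limits.h>

#define MAX_N 64
#define NO_EDGE (INT_MAX / 2)   /* halved so NO_EDGE + NO_EDGE cannot overflow */

/* On entry d[i][j] is the edge weight (0 on the diagonal, NO_EDGE if the
   nodes are not adjacent); on exit it is the shortest distance i -> j. */
static void floyd_warshall(int n, int d[MAX_N][MAX_N])
{
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

/* After floyd_warshall: finds the connected pair with the largest
   shortest distance. Returns that distance, or -1 if no two distinct
   nodes are connected. */
static int most_distant_pair(int n, int d[MAX_N][MAX_N], int *from, int *to)
{
    int best = -1;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i != j && d[i][j] < NO_EDGE && d[i][j] > best) {
                best = d[i][j];
                *from = i;
                *to = j;
            }
        }
    }
    return best;
}
```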

Update position of a point [closed]

Closed 13 years ago.
My problem is this:
I have a set of points in 3D space, and their positions are updated from time to time with a certain velocity. But I need to keep a minimal distance between them.
Can you help me with this?
EDIT: I am using C for the implementation of the algorithm.
Thanks in advance.

You can also do this using a physics simulation. This gives many more possibilities, but at a higher computational cost.
For example, others here suggest detecting collisions, but in your comment to duffymo you suggest you might like a smooth deceleration to avoid collision. In this case, you could create an inter-particle force pushing them away from each other, and then calculate your velocity at each time step using a = F/m and v = v0 + dt*a, where F is the sum of the forces exerted on a particle by all the other particles. For an example inter-particle force you could use something that looks like one of these:
[Figure: repulsive force versus distance for several values of c, calculated from the Python code below.]
But really anything could work as long as it gets large enough near your minimum distance (so the points never come that close together), and it's zero beyond some distance (so the points aren't always repelled from each other).
from pylab import *

def repulse(x, c, rmin=1., fmax=100):
    if x<=rmin:
        return fmax
    try:
        f = c/(x-rmin)-5.
        if f<0.:
            f = 0.
        if f>fmax:
            return fmax
    except:
        f = fmax
    return f

x = arange(0, 100, .01)
r = 0.*x
for c in range(0, 10):
    for i, xv in enumerate(x):
        r[i] = repulse(xv, 2.**c)
    plot(x, r)
show()

If you want to keep a minimal distance d, you can always assume the points are made up of rigid balls of radius d/2. So whenever 2 balls come in contact (i.e. the distance is ≤ d), you change the velocity assuming an elastic collision. Look up your physics textbook for how to change the velocity in case of elastic collision.
(You may need to implement a quad-tree for efficient collision detection.)

Updating position given a velocity is easy - just use a first order difference for the velocity and calculate the position at the end of the time step.
"But I need to keep a minimal distance between them" - makes no sense at all. The distance between them will be governed by the physics of the process that gives you the velocity vector. Can you describe what you're trying to do?

The first thing you need to do is to detect when the distance between 2 points becomes less than your minimal distance. The second is to move the points in a way that removes the collisions.
The first part is basically circle-to-circle collision* detection, so the approaches are the same: checking the distance between every moved point and the other points, or using continuous collision detection* (if the points move by some simple laws).
The second part is up to you, there are too many ways.
(*) - googleable

Determining whether two particles will collide. Suppose you have two particles A and B, and you know their positions at time 0 and their velocities. Initially they are farther apart than the distance r; you want to know if and when they will come within r of each other.
Let's define two 3D vectors R and V. R = the position of B relative to A at time 0, B.position - A.position, and V = the velocity of B relative to A, B.velocity - A.velocity. The square of the distance between A and B at time t will be
square_of_distance(t)
= abs(R + V*t)^2
= dot(R + V*t, R + V*t)
= dot(R, R) + 2 * dot(R, V*t) + dot(V*t, V*t)
= dot(R, R) + 2 * dot(R, V) * t + dot(V, V) * t^2
where dot(v1, v2) is the dot product of two vectors and abs(v) is the vector length.
This turns out to be a simple quadratic function of t. Using the quadratic formula, you can find the times t, if any, when square_of_distance(t) = r^2. That will tell you if and when the particles approach each other closer than r.
Determining which of a large number of particles will collide first. Of course you can take every possible pair of particles and calculate when they collide. That's O(n^2). Improving on that is harder than the simple stuff we've been doing here so far.
Since you only really need to know about the next, say, n seconds, you can calculate a bounding box for each particle's path over that period of time, extend all those boxes by r in each direction, and see which ones, if any, overlap. This can be done using a modified kd-tree. Overlapping bounding boxes do not necessarily indicate actual collisions, only potential collisions. These potential collisions still have to be checked mathematically to see if there are any real collisions; this is just a way to reduce the amount of checking from O(n^2) to something more manageable.
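The bounding-box part is simple enough to sketch in C. This only shows building one particle's box and testing two boxes against each other; the kd-tree that avoids testing every pair is not included. Aabb, swept_box and boxes_overlap are hypothetical names, and dt is the length of the look-ahead period.

```c
typedef struct {
    double lo[3], hi[3];   /* min and max corner per axis */
} Aabb;

/* Box around the straight path from pos to pos + vel*dt,
   extended by r in every direction. */
static Aabb swept_box(const double pos[3], const double vel[3],
                      double dt, double r)
{
    Aabb box;
    for (int k = 0; k < 3; ++k) {
        double a = pos[k];
        double b = pos[k] + vel[k] * dt;
        box.lo[k] = (a < b ? a : b) - r;
        box.hi[k] = (a > b ? a : b) + r;
    }
    return box;
}

/* 1 if the boxes overlap (or touch) on all three axes, else 0.
   An overlap is only a potential collision and still needs the
   exact check. */
static int boxes_overlap(const Aabb *a, const Aabb *b)
{
    for (int k = 0; k < 3; ++k) {
        if (a->hi[k] < b->lo[k] || b->hi[k] < a->lo[k])
            return 0;
    }
    return 1;
}
```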