JGraphT graph visit/travel use case

I am new to JGraphT and I have a use case: say a simple graph, A->B->C, with (timed) events representing visits to the vertices, which can arrive out of order. For example, event B (visiting B) arrives first with event time 9:02; since it is the first, I create graph instance 1 (A->B->C) with B marked as visited. Then event A arrives with event time 9:00; since A is before B (in event time) and within a range (say 10 minutes), A and B belong to the same graph instance, so I add A to instance 1. Instance 1 now has two visited vertices, A and B.
I am wondering if anyone can give some suggestions/guidance/direction on how JGraphT can help, with my own custom implementation where necessary.
What I need:
Check whether a new visit event (vertex) fits in an existing graph (instance), using my own logic to compare event times.
Give me all the vertices visited for a graph (instance).
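For what it's worth, JGraphT can store the A->B->C structure (e.g. as a SimpleDirectedGraph), but the instance-matching rule is custom logic you would write yourself. Below is a minimal sketch in plain Java of just that custom part; the names (GraphInstance, fits) and the exact matching rule are hypothetical, using the 10-minute window from the question:

```java
import java.util.*;

// Hypothetical sketch: JGraphT can hold the A->B->C structure, while an
// "instance" just records visit times per vertex. An event fits an instance
// if the vertex is not yet visited and the event time is within the window
// of every visit already recorded.
public class GraphInstance {
    static final long WINDOW_MINUTES = 10;
    private final Map<String, Long> visitTimes = new HashMap<>(); // vertex -> minutes since midnight

    // Custom rule (refine as needed, e.g. also check edge direction against
    // the underlying JGraphT graph).
    boolean fits(String vertex, long eventMinutes) {
        for (long t : visitTimes.values()) {
            if (Math.abs(t - eventMinutes) > WINDOW_MINUTES) return false;
        }
        return !visitTimes.containsKey(vertex);
    }

    void visit(String vertex, long eventMinutes) {
        visitTimes.put(vertex, eventMinutes);
    }

    Set<String> visitedVertices() {
        return visitTimes.keySet();
    }

    public static void main(String[] args) {
        GraphInstance instance1 = new GraphInstance();
        instance1.visit("B", 9 * 60 + 2);           // event B at 9:02 creates instance 1
        long aTime = 9 * 60;                        // event A at 9:00
        if (instance1.fits("A", aTime)) {           // within 10 minutes of B's visit
            instance1.visit("A", aTime);
        }
        System.out.println(instance1.visitedVertices()); // contains A and B
    }
}
```

The vertex set of the JGraphT graph would supply the valid vertex names, and visitedVertices() answers the second requirement directly.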


Flink - processing consecutive events within time constraint

I have a use case and I think I need some help on how to approach it.
Because I am new to streaming and Flink, I will try to be very descriptive about what I am trying to achieve. Sorry if I am not using formal and correct language.
My code will be in Java, but I do not mind getting code in Python, pseudocode, or just an approach.
TL;DR
Group events of the same key that are within some time limit.
Out of those events, create a result event only from the 2 closest (in the time domain) events.
This requires (I think) opening a window for each and every event that comes in.
If you look ahead at the batch solution, you will understand my problem best.
Background:
I have data coming from sensors as a stream from Kafka.
I need to use event time because the data arrives unordered. An allowed lateness of about 1 minute covers 90% of events.
I am grouping those events by some key.
What I want to do:
Depending on some of the events' fields, I would like to "join/mix" 2 events into a new event ("result event").
The first condition is that those consecutive events are WITHIN 30 seconds of each other.
The next conditions simply check some field values and then decide.
My pseudo-solution:
Open a new window for EACH event. That window should be 1 minute long.
For every event that arrives within that minute, I want to check its event time and see whether it is within 30 seconds of the initial window event. If yes, check the other conditions and emit a new result stream.
The Problem - When a new event comes it needs to:
create a new window for itself.
Join only ONE window out of the SEVERAL possible windows that are within 30 seconds of it.
The question:
Is that possible?
In other words my connection is between two "consecutive" events only.
Thank you very much.
Maybe showing the solution for the **batch case** will show best what I am trying to do:
for i in range(len(grouped_events) - 1):
    event_A = grouped_events[i]
    event_B = grouped_events[i + 1]
    if event_B.get("time") - event_A.get("time") < 30:
        if event_B.get("color") == event_A.get("color"):
            if event_B.get("size") > event_A.get("size"):
                create_result_event(event_A, event_B)
My (naive) attempts so far with Flink in Java:
**The sum function is just a placeholder for my function that creates a new result object.**
The first solution just applies a simple time window and sums by some field.
The second tries to apply a process function to the window, and maybe iterate there through all events and check my conditions?
DataStream
    .keyBy(threeEvent -> threeEvent.getUserId())
    .window(TumblingEventTimeWindows.of(Time.seconds(60)))
    .sum("size")
    .print();

DataStream
    .keyBy(threeEvent -> threeEvent.getUserId())
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .process(new processFunction());
public static class processFunction extends ProcessWindowFunction<ThreeEvent, Tuple3<Long, Long, Float>, Long, TimeWindow> {
    @Override
    public void process(Long key, Context context, Iterable<ThreeEvent> threeEvents, Collector<Tuple3<Long, Long, Float>> out) throws Exception {
        Float sumOfSize = 0F;
        for (ThreeEvent f : threeEvents) {
            sumOfSize += f.getSize();
        }
        out.collect(new Tuple3<>(context.window().getEnd(), key, sumOfSize));
    }
}
You can, of course, use windows to create mini-batches that you sort and analyze, but it will be difficult to handle the window boundaries correctly (what if the events that should be paired land in different windows?).
This looks like it would be much more easily done with a keyed stream and a stateful flatmap. Just use a RichFlatMapFunction and use one piece of keyed state (a ValueState) that remembers the previous event for each key. Then as each event is processed, compare it to the saved event, produce a result if that should happen, and update the state.
You can read about working with Flink's keyed state in the Flink training and in the Flink documentation.
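To make the pattern concrete, here is a minimal sketch in plain Java where a HashMap stands in for Flink's keyed ValueState (in real Flink code the map would be a ValueState<Event> obtained via getRuntimeContext().getState(...) inside a RichFlatMapFunction, and Flink would scope it per key automatically). The Event fields are taken from the batch pseudocode; the class names are hypothetical:

```java
import java.util.*;

// Sketch of the stateful-flatmap idea: remember the previous event per key,
// compare each new event against it, emit a result if the conditions match,
// then update the state. A HashMap stands in for Flink's keyed ValueState.
public class PairingFlatMap {
    record Event(long userId, long time, String color, float size) {}
    record Result(Event first, Event second) {}

    private final Map<Long, Event> lastEventPerKey = new HashMap<>(); // "ValueState"

    List<Result> flatMap(Event current) {
        List<Result> out = new ArrayList<>();
        Event previous = lastEventPerKey.get(current.userId());
        if (previous != null
                && current.time() - previous.time() < 30      // within 30 seconds
                && previous.color().equals(current.color())
                && current.size() > previous.size()) {
            out.add(new Result(previous, current));
        }
        lastEventPerKey.put(current.userId(), current);       // state.update(current)
        return out;
    }

    public static void main(String[] args) {
        PairingFlatMap fn = new PairingFlatMap();
        System.out.println(fn.flatMap(new Event(1, 0, "red", 2f)).size());   // 0: no previous event
        System.out.println(fn.flatMap(new Event(1, 10, "red", 5f)).size());  // 1: paired with previous
        System.out.println(fn.flatMap(new Event(1, 100, "red", 9f)).size()); // 0: 90 seconds apart
    }
}
```

Note this assumes events arrive in timestamp order per key; if they may not (see the caveat below about out-of-order arrival), the comparison against only the single previous event is not enough.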
The one thing that concerns me about your use case is whether or not your events may arrive out-of-order. Is it the case that to get correct results you would need to first sort the events by timestamp? That isn't trivial. If this is a concern, then I would suggest that you use Flink SQL with MATCH_RECOGNIZE, or the CEP library, both of which are designed for doing pattern recognition on event streams, and will take care of sorting the stream for you (you just have to provide timestamps and watermarks).
This query may not be exactly right, but hopefully it conveys the flavor of how to do something like this with MATCH_RECOGNIZE:
SELECT * FROM Events
MATCH_RECOGNIZE (
    PARTITION BY userId
    ORDER BY eventTime
    MEASURES
        A.userId AS userId,
        A.color AS color,
        A.size AS aSize,
        B.size AS bSize
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A B)
    DEFINE
        A AS true,
        B AS timestampDiff(SECOND, A.eventTime, B.eventTime) < 30
            AND A.color = B.color
            AND A.size < B.size
);
This can also be done quite naturally with CEP, where the basis for comparing consecutive events is to use an iterative condition, and you can use a within clause to handle the time constraint.

Checking every bit is on some cycle

I'm coding a model where:
Nodes are represented as bitvectors of length 10, each bit representing some molecule, and an edge can carry any molecule that is present at its source node to its target node.
For example:
S_Node : 0b0100000011 // Molecule 0 , 1 , 8 present on node
One_Edge : 0b0000000010 // Molecule 1 is going out on edge
I have to enforce the condition that each molecule going out on an edge comes back to the source node on some cycle. For a molecule to come back in a cycle means that, along the path of the cycle, it has to be present on every node and every edge it takes.
* Parallel edges are allowed.
Molecule 1 takes path S_Node -> Node_1 -> Node_2 ... -> S_Node. So Molecule 1 started from S_Node on an edge and traveled through Node_1 ... and came back to S_Node on a cycle. Hence this molecule satisfies the condition.
Similarly, I have to check each molecule on each edge.
I'm doing it in the trivial possible way: for each node, checking which edges go out, then for each edge which bits are present, and enforcing that each comes back on some cycle.
for (i = 0; i < N; i++) {          // for each node
    for (j = 0; j < E; j++) {      // for each edge going out from node i
        // (let's say we have some way of finding E)
        if (edgeWeight & (1 << j)) {   // all outgoing bits
            // enforcing that each will come back
            // on some cycle
        }
    }
}
It's easy to see that I have to iterate over all nodes, then over all edges going out, and then for each bit on those edges write code enforcing the condition. The enforcement itself has to iterate over at least the number of nodes, N.
Is there a better way to do this efficiently? Any other way to check the same thing in graph theory? Thanks.
You seem to have a directed graph per molecule (per bit). Simply do your trick of checking for non-cycles per molecule.
You can take btilly's way of checking for cycles; another option is to look at strongly connected components. You essentially want each subgraph (for a given molecule) to be a graph where every connected component is actually strongly connected. There are some good algorithms for strongly connected components referred to from the Wikipedia article linked to earlier.
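As a sketch of the SCC approach (assuming the per-molecule subgraph is given as an adjacency list; Kosaraju's two-pass algorithm is used here, but Tarjan's would do equally well): an edge u->v lies on some cycle exactly when u and v are in the same strongly connected component, so it suffices to compute SCC ids and check every edge.

```java
import java.util.*;

// Every edge u->v lies on a cycle iff u and v are in the same strongly
// connected component (SCC). Kosaraju: order nodes by DFS finish time,
// then DFS the reversed graph in reverse finish order; each reverse-DFS
// tree is one SCC.
public class CycleCheck {
    static void dfs(int u, List<List<Integer>> adj, boolean[] seen, Deque<Integer> order) {
        seen[u] = true;
        for (int v : adj.get(u)) if (!seen[v]) dfs(v, adj, seen, order);
        order.push(u); // pushed after all descendants: head of deque = last finished
    }

    static int[] sccIds(int n, List<List<Integer>> adj) {
        boolean[] seen = new boolean[n];
        Deque<Integer> order = new ArrayDeque<>();
        for (int u = 0; u < n; u++) if (!seen[u]) dfs(u, adj, seen, order);

        List<List<Integer>> rev = new ArrayList<>();          // reversed graph
        for (int u = 0; u < n; u++) rev.add(new ArrayList<>());
        for (int u = 0; u < n; u++) for (int v : adj.get(u)) rev.get(v).add(u);

        int[] scc = new int[n];
        Arrays.fill(scc, -1);
        int id = 0;
        for (int u : order) {                                 // decreasing finish time
            if (scc[u] != -1) continue;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(u);
            scc[u] = id;
            while (!stack.isEmpty()) {
                int x = stack.pop();
                for (int y : rev.get(x)) if (scc[y] == -1) { scc[y] = id; stack.push(y); }
            }
            id++;
        }
        return scc;
    }

    // true iff every edge of this molecule's subgraph lies on some cycle
    static boolean everyEdgeOnCycle(int n, List<List<Integer>> adj) {
        int[] scc = sccIds(n, adj);
        for (int u = 0; u < n; u++)
            for (int v : adj.get(u))
                if (scc[u] != scc[v]) return false;
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < 4; i++) adj.add(new ArrayList<>());
        adj.get(0).add(1); adj.get(1).add(2); adj.get(2).add(0); // cycle 0->1->2->0
        System.out.println(everyEdgeOnCycle(4, adj)); // true
        adj.get(2).add(3);                                      // 2->3 is on no cycle
        System.out.println(everyEdgeOnCycle(4, adj)); // false
    }
}
```

Run once per molecule (per bit), keeping only the edges that carry that molecule; the total cost is linear in nodes plus edges per molecule.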
The representation of nodes is irrelevant to the problem. You have a directed graph. You wish to verify that for every node and edge, there exists a cycle containing that edge. And you want to be reasonably efficient about it (rather than doing a brute force search for all possible cycles from all edges).
Here is an observation. Suppose that you find a cycle in your graph G. Consider the graph G' which is the same as your original graph EXCEPT that the cycle has been collapsed down to a single node. The answer to your question for G is the same as the answer to your question for G' because any cycle in G leads to a cycle in G' (possibly a self-intersecting one that can be turned into 2 cycles), and any cycle in G' leads to a cycle in G (if you hit the collapsed node, then follow its cycle around until you find the exit point to continue).
So now the question goes from brute force discovery of cycles to collapsing cycles until you have a small graph where the question is easily answered. So for each node, for each edge, you start a path. Your path continues until you have discovered a cycle. Any cycle. (Not necessarily back to the original node!) Collapse that cycle, and keep traveling until you either have to backtrack (in which case your condition is not met) or you manage to loop back to your original node, collapse that cycle, and move on to another edge.
If you implement this, you'll have a polynomial algorithm, but not the best you can do. The problem is that creating new graphs with a cycle collapsed is an expensive operation. But there is a trick that helps. Instead of collapsing the graph every time you find a cycle, try to be lazy about it. Create a new "fake node" for that cycle, and mark each node in that cycle as going to that fake one. Every time you see an edge that goes to a node, do a recursive search through those mappings to the most collapsed node that you've found, and mark everything you saw in that search as directly mapping there.
If you implement the lazy bit well, your overall algorithm should wind up O(E) where E is the number of edges in your graph. You actually can't do better than that given that you have to visit every edge no matter what you do.

How can I count the number of valid squares in each group?

EDIT: I have a logical error and I don't know how to fix it. This is my code. I think the problem is in the while loop at line 245: it doesn't add the next valid pixel to the queue, so the queue size becomes 0 and it exits the while loop.
I need help from a veteran here! I have something like a chessboard with equal-sized squares, numbered from bottom to top, right to left, but only some of them are valid for me (as shown in the picture I posted a link to). I deleted the non-valid ones from the table.
I want my C program to count the squares in each group. As you can see in the image, a valid group only contains directly connected squares; squares that connect only diagonally are not in the same group. I used colors to highlight the valid groups in my picture.
I know the table's width and height and I know how many squares and valid squares it has.
I stored their numbers in a vector, but I can't figure out how to count the squares in each group.
How can I do this?
Here is my picture:
I want to find a method that works for larger "chess tables", i.e. pictures of known sizes.
What you are surely missing is: which group does a valid square belong to?
You could use graph theory to solve that.
But you can also try some other approach.
For instance, you can use a tag list to keep track of which nodes have already been visited, and a vector of vectors to manage the group nodes.
1. Browse the table in a fixed direction, say from top left to bottom right, checking only non-visited nodes.
2. When a non-visited valid square is found, add this node to vector[count++] and flag it as visited.
3. Look for this node's connected squares in the bottom-right direction. If one is found, flag it as visited and add it to the same vector[count] list.
4. Repeat the same process until you cannot find any more connected squares (with recursion, for instance). If no more connected squares are found in the same group, continue from step 1.
At the end, just take the size of each vector[count] and it should give the expected results (for performance you could do that on the fly while looking for connected squares).
You should gather each set of connected squares into a single component and count the size of each component. For that you can use a nested data structure, such as a vector to store the connected squares, nested inside a map to keep the components apart, like:
map<int squareIndex, vector<squares> >
The basic idea is to have a visited array so that you can skip the already visited squares. The algorithm works just like Breadth-First Search on a graph. You can use a queue to store the adjacent connected squares of the current square. I would suggest the following.
Algo
1. Traverse the array from 1 to N.
2. If the queue is empty, make a new index in the map and push the element into your map.
3. If visited[element] is true, go to step 1; else set visited[element] = true and push the element into your map.
4. element = dequeue()
5. Enqueue the adjacent squares of the current element and repeat from step 2 for the current element.
In this way your map will finally be of size 4 (for the given case), and each map entry will contain a vector of the connected squares. Use the size of each vector to count the number of squares in each group.
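A minimal sketch of that BFS idea (in Java rather than C, returning a list of group sizes instead of the map of vectors, and assuming 4-connectivity so that diagonal contact does not join groups, as the question requires):

```java
import java.util.*;

// BFS flood fill over a grid: valid[r][c] marks the squares kept on the
// board; each BFS from an unvisited valid square discovers one group and
// counts its size. Only the 4 orthogonal neighbors connect squares.
public class GroupCounter {
    static List<Integer> groupSizes(boolean[][] valid) {
        int rows = valid.length, cols = valid[0].length;
        boolean[][] visited = new boolean[rows][cols];
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // no diagonals
        List<Integer> sizes = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!valid[r][c] || visited[r][c]) continue;
                int size = 0;
                Deque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[]{r, c});
                visited[r][c] = true;
                while (!queue.isEmpty()) {
                    int[] cur = queue.poll();
                    size++;
                    for (int[] d : dirs) {
                        int nr = cur[0] + d[0], nc = cur[1] + d[1];
                        if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                                && valid[nr][nc] && !visited[nr][nc]) {
                            visited[nr][nc] = true;
                            queue.add(new int[]{nr, nc});
                        }
                    }
                }
                sizes.add(size);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        boolean[][] valid = {
            {true,  true,  false},
            {false, false, true },
            {false, true,  true },
        };
        // (0,0)-(0,1) form one group; (1,2)-(2,2)-(2,1) form another;
        // the two touch only diagonally, so they stay separate.
        System.out.println(groupSizes(valid)); // [2, 3]
    }
}
```

The same structure translates directly to C with an explicit array-backed queue.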
You are searching for something called connected component.
This post sheds some light on the subject but there's a number of different implementations available on the Web.

hierarchical pathfinding implementation

I want to divide my map into clusters and implement HPA*. Where do I start? Each time I try this I run into problems, and I need to implement it on a random, dynamically changing map.
I am unsure how to write an algorithm that places these "nodes" that connect the sections/clusters of the map, and that keeps them updated. I guess there should be a node wherever open tiles lie between closed tiles on the edge of a cluster/section, since multiple openings into a cluster may not connect to each other inside that section.
Normally I would just have a big Tile[,] map. I guess I could leave that as it is and create a cluster/section class that holds all the paths and nodes, plus a node class/struct that holds the 2 tiles connected between sections. I have read several articles about HPA*, but I just cannot wrap my head around implementing it correctly on a random and dynamic map. I hope to get some good pointers here, although the question is not very clear.
-edit-
What I am trying to do is make a cluster class that holds 10x10 tiles/nodes, with an entry point on each side (several if there is an obstruction on the edge). The entries link to the next cluster.
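The entry-point placement described above could be sketched like this (a hypothetical helper, not from any HPA* library: the two boolean arrays describe the facing rows of tiles on either side of a shared cluster border, and each maximal run of mutually open tile pairs gets one entrance at its middle):

```java
import java.util.*;

// HPA*-style entrance placement along the shared border of two adjacent
// clusters. walkableA[i]/walkableB[i] are the tiles facing each other
// across the border; an obstruction on either side splits the border into
// several runs, and each run becomes its own entrance node.
public class Entrances {
    static List<Integer> entranceColumns(boolean[] walkableA, boolean[] walkableB) {
        List<Integer> entrances = new ArrayList<>();
        int runStart = -1;
        for (int i = 0; i <= walkableA.length; i++) {
            boolean open = i < walkableA.length && walkableA[i] && walkableB[i];
            if (open && runStart == -1) {
                runStart = i;                          // run of crossable tiles begins
            } else if (!open && runStart != -1) {
                entrances.add((runStart + i - 1) / 2); // one entrance at the run's middle
                runStart = -1;
            }
        }
        return entrances;
    }

    public static void main(String[] args) {
        boolean[] a = {true, true, true, false, true, true, false, true, true, true};
        boolean[] b = {true, true, true, true,  true, true, false, true, true, true};
        // crossable runs: [0..2], [4..5], [7..9]
        System.out.println(entranceColumns(a, b)); // [1, 4, 8]
    }
}
```

When the map changes, only the borders of the affected cluster (and its neighbors) need their entrances recomputed, which is what makes the cluster decomposition attractive for dynamic maps.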

Entering / Exiting a NavGraph - Pathfinding

I've got a manually created NavGraph in a 3D environment. I understand (and have implemented previously) an A* routine to find my way through the graph once you've 'got on the graph'.
What I'm interested in, is the most optimal way to get onto and 'off' the Graph.
Ex:
So the routine goes something like this:
Shoot a ray from the source to the destination; if there's nothing in the way, go ahead and just walk it.
If there is something in the way, we need to use the graph. To get onto the graph, we need to find the closest visible node on it. (To do this, I previously sorted the graph nodes by distance from the source, then fired rays from closest to furthest until I found one that didn't hit an obstacle.)
Then run the standard A*...
Then 'exit' the graph through the same method we used to get on it (used to calculate the endpoint for the above A*): I fire rays from the endpoint to the closest navgraph node.
So by the time this is all said and done, unless my navgraph is very dense, I've spent more time getting on/off the graph than calculating the path...
There has to be a better/faster way. (Is there some kind of spatial subdivision trick?)
You could build a quadtree of all the nodes to quickly find the closest node to a given position.
It is very common to have a spatial subdivision of the world. Something like a quadtree or octree is common in 3D worlds, although you could overlay a grid too, or track arbitrary regions, etc. Basically it's a simple data-structures problem of giving yourself some sort of access to N navgraph nodes without needing an O(N) search to find where you are, and your choices tend to come down to some sort of tree or some sort of hash table.
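As a sketch of the grid variant of that idea (a uniform hash grid rather than a quadtree; 2D for brevity, and all names are made up): nodes are bucketed by cell, and a nearest-node query expands outward ring by ring instead of scanning all N nodes.

```java
import java.util.*;

// Uniform grid over navgraph nodes: hash each node into a fixed-size cell,
// then answer nearest-node queries by searching rings of cells outward from
// the query point. A quadtree/octree serves the same role when node density
// is very uneven; this is the simplest version of the trick.
public class NodeGrid {
    final double cellSize;
    final Map<Long, List<double[]>> cells = new HashMap<>();

    NodeGrid(double cellSize) { this.cellSize = cellSize; }

    long key(int cx, int cy) { return (((long) cx) << 32) ^ (cy & 0xffffffffL); }

    void add(double x, double y) {
        int cx = (int) Math.floor(x / cellSize), cy = (int) Math.floor(y / cellSize);
        cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(new double[]{x, y});
    }

    // Expand rings until a node is found whose distance rules out all
    // unexplored (farther) rings.
    double[] nearest(double x, double y) {
        int cx = (int) Math.floor(x / cellSize), cy = (int) Math.floor(y / cellSize);
        double[] best = null;
        double bestD = Double.MAX_VALUE;
        for (int r = 0; r < 64; r++) {
            for (int i = cx - r; i <= cx + r; i++) {
                for (int j = cy - r; j <= cy + r; j++) {
                    if (Math.max(Math.abs(i - cx), Math.abs(j - cy)) != r) continue; // ring only
                    for (double[] p : cells.getOrDefault(key(i, j), List.of())) {
                        double d = (p[0] - x) * (p[0] - x) + (p[1] - y) * (p[1] - y);
                        if (d < bestD) { bestD = d; best = p; }
                    }
                }
            }
            if (best != null && Math.sqrt(bestD) <= r * cellSize) break; // nothing closer can remain
        }
        return best;
    }

    public static void main(String[] args) {
        NodeGrid grid = new NodeGrid(10.0);
        grid.add(5, 5);
        grid.add(25, 5);
        grid.add(50, 50);
        System.out.println(Arrays.toString(grid.nearest(22, 6))); // [25.0, 5.0]
    }
}
```

The closest node found this way still needs the visibility ray check from the question; on failure, keep expanding rings and test the next-closest candidates.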
