Which graph algorithms are available in Memgraph? - graph-databases

I know that there are hundreds of graph algorithms out there. Which ones are available in Memgraph? Is there some command like help list algorithms that would tell me what I can use?

As far as I know, there is no Cypher command that would list all of the available/implemented graph algorithms.
If you have "pure" Memgraph then you have:
Depth-first search (DFS)
Breadth-first search (BFS)
Weighted shortest path (WSP)
All shortest paths (ASP)
These are built-in graph algorithms.
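These are invoked through Cypher path-expansion syntax rather than a procedure call, but algorithmically the unweighted case is a textbook BFS. As an illustration only (my own toy code, not Memgraph's implementation), here is what a BFS hop count computes:

```python
from collections import deque

def bfs_shortest_hops(adj, source, target):
    """Return the minimum number of hops between source and target,
    or None if target is unreachable (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour in adj.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None

# Tiny example graph: A -> B -> C and A -> D -> C
graph = {"A": ["B", "D"], "B": ["C"], "D": ["C"], "C": []}
print(bfs_shortest_hops(graph, "A", "C"))  # 2
```

The weighted variants (WSP) replace the FIFO queue with a priority queue ordered by accumulated edge weight.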
As Iłya Bursov has written in his comment, there is also MAGE, an open-source repository that contains graph algorithms and modules written by the team behind Memgraph and its users, in the form of query modules.
At the moment MAGE includes the following algorithms:
Betweenness Centrality
Biconnected Components
Bipartite Matching
Bridge Detection
Community Detection
Cycle Detection
Graph Coloring
Katz Centrality
Maximum Flow
Node Similarity
PageRank
Union Find
Dynamic Betweenness Centrality
Dynamic Community Detection
Dynamic node2vec
Dynamic Katz Centrality
Dynamic PageRank
A complete list of algorithms and implementation details within MAGE can be found at https://memgraph.com/docs/mage/algorithms.
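Some of these, e.g. Union Find, are small enough to sketch directly. A minimal Python version with path halving (my own toy code, not MAGE's query-module API):

```python
class UnionFind:
    """Disjoint-set structure: tracks which connected component
    each node belongs to as edges are merged in."""
    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

uf = UnionFind([1, 2, 3, 4])
uf.union(1, 2)
uf.union(3, 4)
print(uf.find(1) == uf.find(2))  # True
print(uf.find(1) == uf.find(3))  # False
```

MAGE's versions operate on the stored graph and are callable from Cypher; the sketch above just shows the underlying data structure.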

Related

Can Apache-AGE handle graph algorithms such as shortest path, centrality measures, and page rank?

I'm interested in using Apache-AGE for graph data storage and analysis. Can Apache-AGE handle common graph algorithms such as shortest path, centrality measures, and page rank? I would like to know if Apache-AGE provides built-in functions or APIs for these algorithms, or if I would need to implement them myself using the provided data access interfaces. Any information on this topic would be greatly appreciated.
For now these algorithms are not implemented, but what you can do is download a driver, e.g. the Apache AGE Python driver, and use it to implement those algorithms yourself in Python. You can also do this in other languages that support Apache AGE. You can find the drivers here.
I'll go through each algorithm mentioned:
Shortest Path
Referring to this closed GitHub issue for shortest path, there is no direct implementation of it in AGE due to complexity. In this case there are two options:
Create your own query for AGE -> Video lecture, Slides
Implement the algorithm in AGE -> Paper that can be used for reference
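If you go the DIY route with the Python driver, a weighted shortest path can be computed client-side with a textbook Dijkstra. A sketch over a plain adjacency dict (fetching the edges out of AGE first is left out; node names and weights here are made up):

```python
import heapq

def dijkstra(adj, source):
    """adj maps node -> [(neighbour, weight), ...]. Returns the
    shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already improved
        for nb, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return dist

g = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 3, 'c': 1}
```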
Centrality Measures
It seems the older AgensGraph did have a centrality-measuring algorithm. You may attempt it on AGE using this reference. However, there is currently no official documentation or support for it.
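If you want to experiment before any official support lands, degree centrality is the simplest measure to hand-roll client-side. A sketch over a toy adjacency dict (not an AGE API):

```python
def degree_centrality(adj):
    """Normalised degree centrality for an undirected graph given as
    {node: [neighbours]}: degree divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

# "a" is connected to both other nodes, so it scores 1.0
g = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(degree_centrality(g))  # {'a': 1.0, 'b': 0.5, 'c': 0.5}
```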
Page Rank
Neo4j has an implementation documentation which you can follow to create queries or implement the algorithm directly.
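If you'd rather implement it directly, the classic power-iteration formulation is short. A sketch with a uniform teleport term (parameter names and defaults are my own choices, not from any AGE or Neo4j API):

```python
def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank over {node: [out-neighbours]}."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

# A simple 3-cycle: by symmetry all nodes end up with rank 1/3
g = {"a": ["b"], "b": ["c"], "c": ["a"]}
r = pagerank(g)
```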

Why are there not as many graph databases as graph processing frameworks?

For graph databases, especially those that are active and distributed, I know some but not many, like OrientDB, Titan, DEX, etc.
Regarding graph processing frameworks, there is a huge set of tools like GraphX, GraphLab, PowerGraph, X-Stream, Pregel, etc., and more come out every year.
Can anyone tell me the difference between these two categories of tools? Are they interchangeable? And why are graph databases not drawing as much attention as graph processing frameworks?
The difference is that graph databases are built to store data in the basic form of a graph, where relationships in the data are modelled with edges and the data points with nodes/vertices. Some databases, like OrientDB, extend this basic concept considerably to make the database much more versatile; others are less versatile. In general, though, the main goal is to persist the data in a graph-like form, as edges and vertices.
Graph processing frameworks, on the other hand, take a set of data and build analytical graphs out of it. The goal is mainly the analysis of graph-like patterns or structures within the data.
I'll try to put this in an analogy, as I understand it.
Say you have a punch bowl full of punch (your data).
In a graph database scenario, the punch is already a graph and you can look into the bowl and see all the stuff in your graph and analyze it too.
With a graph processing framework, you have a punch bowl full of stuff too, but it is murky and you don't see any graphs in it directly. To get a graph of some type, you first have to ladle out some of the punch, in let's say, a "graph processing ladle". This allows you to see some kind of graph coherence, depending on the algorithms you choose to try and analyze the data with. Of course, depending on your machine or system, like Spark, the graph processing ladle could be huge, even just as big as your whole punch bowl or even bigger.
Still, it takes time and processing to make a "sensible graph" out of the punch (your data). The other thing about this is, if you want to store this newly found ladle of analyzed graph punch, you'd have to have another bowl to put it in. And, if you drop the ladle on the floor, your graph data is gone. This wouldn't happen with a graph database.
I hope that makes sense.
Scott
There are connections and distinctions between graph databases and graph computing.
Connections:
A graph database will offer not only data storage but also a range of graph data processing. For example, solving the SSSP (single-source shortest path) problem needs traversal and computation over the graph, which must be supported by the graph processing side.
Distinctions:
You can't use a graph database for most graph computing tasks like PageRank or Greedy Graph Coloring, because as a basic storage and query system a graph database doesn't need the ability to do such computing jobs.
Correct me if I'm wrong; I'm also new to graph computing.

Ground Truth datasets for Evaluating Open Source NLP tools for Named Entity Recognition

I am working on building a document similarity graph for a collection. I already do all the basic things like tokenization, stemming, stop-word removal, and a bag-of-words representation of the documents, computing similarity with the Jaccard coefficient. I am now trying to extract Named Entities and evaluate whether they would help improve the quality of the document similarity graph. I have spent much of my time trying to find ground-truth datasets for my analysis, and I have been very disappointed with the Message Understanding Conference (MUC) datasets. They are cryptic to understand and require sufficient data cleaning/massaging before they can be used on a different platform (like Scala).
My questions, more specifically, are:
Are there tutorials on getting started with the MUC datasets that would make it easier to analyze the results using open-source NLP tools like OpenNLP?
Are there other datasets available?
Tools like OpenNLP and Stanford CoreNLP employ approaches that are essentially supervised. Correct?
GATE is a great tool for hand-annotating your own text corpus. Correct?
For a new test dataset (that I hand-create), how can I compute the baseline (Vocabulary Transfer), or what kind of metrics can I compute?
First of all, I have a few concerns about using the Jaccard coefficient to compute similarity. I'd expect TF-IDF and cosine similarity to give better results.
Some answers to your questions:
See the CoNLL 2003 evaluation campaign: it also provides data, evaluation tools, etc. You may also have a look at ACE.
Yes
GATE is also a pipeline that automatically annotates text, but as far as I know its NER is a rule-based component.
A baseline is most of the time a very simple algorithm (e.g. majority class), so it is not a baseline for comparing corpora, but for comparing approaches.
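To illustrate the point about similarity measures, here is a toy comparison of Jaccard overlap and TF-IDF cosine on small token lists (a sketch of the two measures, not a tuned pipeline; the example documents are made up):

```python
import math
from collections import Counter

def jaccard(a, b):
    """Set-overlap similarity of two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def tfidf_cosine(docs, i, j):
    """Cosine similarity of docs[i] and docs[j] under a simple
    TF-IDF weighting (docs is a list of token lists)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    def vec(d):
        tf = Counter(d)
        return {t: tf[t] * math.log(n / df[t]) for t in tf}
    vi, vj = vec(docs[i]), vec(docs[j])
    dot = sum(vi[t] * vj.get(t, 0.0) for t in vi)
    ni = math.sqrt(sum(x * x for x in vi.values()))
    nj = math.sqrt(sum(x * x for x in vj.values()))
    return dot / (ni * nj) if ni and nj else 0.0

docs = [["graph", "database", "query"],
        ["graph", "algorithm"],
        ["cooking", "recipes"]]
print(jaccard(docs[0], docs[1]))  # 0.25
```

Unlike raw Jaccard, the TF-IDF weighting discounts terms that appear in every document, which usually matters once stop-word removal alone isn't enough.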

Examples for Topological Sorting on Large DAGs

I am looking for real world applications where topological sorting is performed on large graph sizes.
Some fields where I imagine you could find such instances would be bioinformatics, dependency resolution, databases, hardware design, data warehousing... but I hope some of you may have encountered or heard of specific algorithms/projects/applications/datasets that require topsort.
Even if the data/project may not be publicly accessible any hints (and estimates on the order of magnitude of potential graph sizes) might be helpful.
Here are some examples I've seen so far for Topological Sorting:
While scheduling task graphs in a distributed system, it is usually necessary to sort the tasks topologically and then assign them to resources. I am aware of task graphs containing more than 100,000 tasks to be sorted in topological order. See this in this context.
Once upon a time I was working on a Document Management System. Each document in this system has some kind of precedence constraint with respect to a set of other documents, e.g. its content type or field referencing. The system should then be able to generate an ordering of the documents that preserves the topological order. As I recall, there were around 5,000,000 documents available two years ago!
In the field of social networking, there is a famous query to find the largest friendship distance in the network. This problem needs to traverse the graph with a BFS approach, equal in cost to a topological sort. Consider the members of Facebook and find your answer.
If you need more real examples, do not hesitate to ask me. I have worked on lots of projects dealing with large graphs.
P.S. for large DAG datasets, you may take a look at Stanford Large Network Dataset Collection and Graphics# Illinois page.
I'm not sure if this fits what you're looking for, but do you know the Bio4j project?
Not all of the content stored in its graph-based DB would be adequate for topological sorting (directed cycles exist in an important part of the graph); however, there are sub-graphs like Gene Ontology and Taxonomy where this ordering may make sense.
TopoR is a commercial topological PCB router that works first by routing the PCB as topological problem and then translating the topology into physical space. They support up to 32 electrical layers, so it should be capable of many thousands of connections (say 10^4).
I suspect integrated circuits may use similar methods.
The company where I work manages a (proprietary) database of software vulnerabilities and patches. Patches are typically issued by a software vendor (like Microsoft, Adobe, etc.) at regular intervals, and "new and improved" patches "supersede" older ones, in the sense that if you apply the newer patch to a host then the old patch is no longer needed.
This gives rise to a DAG where each software patch is a node with arcs pointing to a node for each "superseding" patch. There are currently close to 10K nodes in the graph, and new patches are added every week.
Topological sorting is useful in this context to verify that the graph contains no cycles - if they do arise then it means that there was either an error in the addition of a new DB record, or corruption was introduced by botched data replication between DB instances.
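The cycle-check use case above is exactly what Kahn's algorithm gives you for free: if the sort cannot consume every node, the graph contains a cycle. A sketch (the patch IDs are hypothetical; every node must appear as a key in the dict):

```python
from collections import deque

def topological_sort(adj):
    """Kahn's algorithm over {node: [successors]}. Returns a
    topological order, or raises ValueError on a cycle."""
    indeg = {n: 0 for n in adj}
    for succs in adj.values():
        for m in succs:
            indeg[m] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(adj):  # some nodes never reached in-degree 0
        raise ValueError("graph contains a cycle")
    return order

patches = {"KB1": ["KB2"], "KB2": ["KB3"], "KB3": []}
print(topological_sort(patches))  # ['KB1', 'KB2', 'KB3']
```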

Where can I get a very simple introduction to all Artificial Intelligence techniques with real-world examples?

I know that the Artificial Intelligence field is very vast and there are many books on it. But I just want to know of any resource where I can get a simple introduction to all Artificial Intelligence techniques.
I would like a 1- or 2-page introduction to each technique, with examples of how it can be applied or for what purposes it can be used. I am interested in:
Backpropagation Algorithm
Hebb's Law
Bayesian networks
Markov Chain Models
Simulated Annealing
Tabu Search
Genetic Algorithms or Evolutionary Algos
Now there are many variants and more AI techniques, and each one has many books written on it. I am unable to decide which algorithms I can use unless I know what they are capable of doing.
So where can I find a 1-2 page introduction to them with application examples?
Essentials of Metaheuristics covers several of these. I can't promise it'll cover all of them, but I know there's good stuff on simulated annealing and genetic algorithms in there, and probably at least a few of the others, though I'd have to re-download it to check. It's available for free download.
It can be a bit light on the theory, but it'll give you a straightforward description, some explanation of when you'd want to use each, and a lot of useful pseudocode.
Here's an image on local search (= tabu search without tabu) from the Drools Planner manual:
I am working on similar images for Greedy algorithms, brute force, branch and bound and simulated annealing.
As an example of Genetic Algorithms implementation I can give you this.
It's an API I developed for GA, with one implementation for each operator and one concrete example problem solved (picking a (good) soccer team from ~600 players with a budget restriction). It's all set up so you can run it with mvn exec:java and watch it evolve in the console output. But you can implement your own problem structure, or even other operator (crossover, mutation, selection) methods.
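To give a feel for the moving parts (selection, crossover, mutation) without any framework, here is a self-contained toy GA on the OneMax problem (maximise the number of 1-bits). Everything here is my own illustrative code, not the API described above:

```python
import random

def genetic_onemax(length=20, pop_size=30, generations=60, seed=0):
    """Tiny GA: tournament selection, one-point crossover,
    per-bit mutation. Returns the best bitstring found."""
    rng = random.Random(seed)
    fitness = sum  # OneMax fitness = number of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():  # tournament of 3
            return max(rng.sample(pop, 3), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = select(), select()
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(length):             # bit-flip mutation
                if rng.random() < 1.0 / length:
                    child[i] = 1 - child[i]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_onemax()
print(sum(best))  # close to 20 for this toy setting
```

Swapping in a different fitness function and representation (e.g. players and a budget constraint, as in the soccer example) is the whole point of frameworks like the one linked above.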
