I am very new to Cytoscape and have what feels like an easy-to-solve problem. I am using the drag and drop functionality of Cytoscape to build networks (I am basically translating photographs of insect trail networks into Cytoscape representations - this makes it much easier to build the networks). The problem is that I need to export the edge table from Cytoscape into R for further analysis. Cytoscape exports a .csv file that has a column that looks like this:
"Node 5 (undirected) Node 2" <- all in a single column
What I need is either a .csv file that contains either an adjacency matrix or an edge list (where edges are represented as pairs of nodes, with each node in its own column). Is there an easy way to do this? Thank you in advance!
There are several solutions:
The http://apps.cytoscape.org/apps/adjexporter allows you to export the network as adjecency matrix
You can connect directly to Cytoscape from R using the CyREST interface: http://apps.cytoscape.org/apps/cyrest
You can replace all (undirected) terms with a tab
Hope this helps,
Piet
Related
Sklearn algorithm require a feature and a label for it to learn.
I have a CSV file which contain some data. These data is actually a challenge from hackerearth website in which participant need to create a learning algorithm that learn from data on massive amount of individuals from affiliate network and their ad click performance which then predict future performance of other individuals in the affiliate network which allow the company to optimize their ad performance.
The features in these data include id,date,siteid, offerid, category, merchant, countrycode,type of browser, type of device and the number of clicks their ads have gotten.
https://www.hackerearth.com/practice/algorithms/string-algorithm/string-searching/practice-problems/machine-learning/predict-ad-clicks/
So my plan is to use the first 7 information as my feature and ad click as label. Unfortunately, countrycode,browser and device information is in text (Google Chrome, Desktop) and not integers which can be turned into array.
Q1: Is there a way for sklearn to accept not just numpy arrays but also words as features? Am I support to use vectorizer for this? If so, how would I do it? If not, can I just replace the wording data into numbers (Google Chrome replaced by 1, firefox replaced by 2) and still have it to work? (I am using Naive Bayes algorithm)
Q2: Would Naive Bayes algorithm be suitable for this task? Since this competition require participant to create a program that predict the probability of individuals in affiliate network have their ads click, I assume Naive Bayes would be best suited.
Training data : https://drive.google.com/open?id=1vWdzm0uadoro3WcpWmJ0SVEebeaSsHvr
Testing data : https://drive.google.com/open?id=1M8gR1ZSpNEyVi5W19y0d_qR6EGUeGBQl
My messy coding and horrible attempt at this challenge which I don't think will be much help:
from sklearn.naive_bayes import GaussianNB
import csv
import pandas as pd
import numpy as np
data = []
from numpy import genfromtxt
import pandas as pd
data = genfromtxt('smaller.csv', delimiter=',')
dat = pd.read_csv('smaller.csv', delimiter=',')
print(dat(siteid))
feature = []
label =[]
i = 1
j = 1
while i <17:
feature.append(data[i][2:8])
i += 1
while j <17:
label.append(data[i][9])
j += 1
clf = GaussianNB()
clf.fit(feature,label)
print(clf.predict([data[18][2:8]]))
print(data[18])
Answer for Question1: No. Sklearn only works with numerical data. So you need to convert your text to numbers.
Now to convert text to numbers you can follow multiple approaches. First is as you said just assign numbers to them. But you need to to take in account if the text data shows any order like the numbers assigned to them or not. In that case, most often one-hot encoding is used. Please see the below scikit-learn documentation for that:
- http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
Answer to Question 2: It depends on the data and task at hand.
No single algorithm is capable of handling every type of data optimally.
Most of the times we need to compare multiple algorithms and see what gives best result for our data. See this example:
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py
Even in a single algorithm we need to check for various parameter values, tune those values for maximum score. This is called grid-search. See this example:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html#sphx-glr-auto-examples-model-selection-plot-grid-search-digits-py
Hope this clears your doubts. Make sure to go through the scikit-learn documentation and examples:
http://scikit-learn.org/stable/user_guide.html
http://scikit-learn.org/stable/auto_examples/index.html
They are one of the best out there.
Let's say we have a simple graph stored in OrientDB, with an edge type called weightedEdge that has a property weight. I'd like to be able to traverse from a starting node to an arbitrary depth, but only traverse to nodes that have the maximum value of weight on their incoming edge compared to all other edges at the same depth. Is this possible using OrientDB SQL?
So in the given example above, I would only want to hop along the red arrows.
Thank you!
May be you can use a let block and then property selection via [ ]:
traverse out()[weight = $maxW] ...
LET $maxW = SELECT max(weight) FROM ...
I write CSP map coloring problem solver. A definied constraint as Constraint(A, B) what means that country A is adjacent to country B. Currently I create maps by hand but now I need some big map to test my algorithm. Do you know where I can find some easy to parse data?
Something like this would be ideal for me:
A B
B C
A D
There are some nice example graphs of various kinds available from Donald Knuths Stanford GraphBase page.
I'm developing an application that needs routing information for certain cities. First, I downloaded a openstreetmap datafile (*.osm) and then I imported it into a postgreSQL database using osm2pgrouting tool (http://workshop.pgrouting.org/chapters/installation.html).
After this, I have the following tables:
nodes: that contains simple locations points
ways: that contains ways with some nodes involved
vertices_tmp: stores nodes that may be used for pgrouting functions like Djistra, A*...etc.
Would I use nodes that isn't in "vertices_tmp" table for calculate distances between nodes? Or I would only do it with the nodes stored in "vertices_tmp"?
Into ways table there are a field named "the_geom" that encapsulates different locations points (nodes). For example:
"MULTILINESTRING((1.5897786 42.5600441,1.5898376 42.5601455,1.589992 42.5605438,1.590095 42.5606795,1.5901782 42.5608026,1.5902238 42.561018,1.5902912 42.5616808,1.5903685 42.561899,1.5904008 42.5620563,1.5903836 42.5624117,1.5904265 42.5627151,1.5904947 42.5628368,1.5905981 42.5629553,1.5906926 42.5631007,1.590802 42.5633238,1.5908604 42.5634883,1.5909501 42.5637139,1.5910869 42.5638755,1.5913053 42.5639639,1.5914994 42.5640237,1.591648 42.5640261,1.5919232 42.5640145,1.5921124 42.5640363,1.5923292 42.5640953,1.592804 42.5643306))"
Can I route with intermediate nodes or only with source/target nodes?
My goal is to be able to routing between different nodes or POIs, depending of its amenity tags, not only driving distance, walking distance too. Furthermore I need to calculate shortest path for source/targets nodes.
Any idea for do this?
You can't use the elements of the nodes table.
If you want to plan route from one POI to another, first you have to find the nearest vertex/edge based on the selected algorithm(Shooting star requires edges, the others use vertices).
After this you can make the routing, just pick an algorithm from THIS SITE
You will find there a good tutorial about different routing solutions and some help for the detailed usage (including how to determine the closest way).
i'm coding a c project for an algorithm class and i really need some help!
Here's the problem:
I have a set of names like this one N = (James,John,Robert,Mary,Patricia,Linda Barbara) wich are stored in an RB tree.
Starting from this set of names a series of couple like those ones are formed:
(James,Mary)
(James,Patricia)
(John,Linda)
(John,Barbara)
(Robert,Linda)
(Robert,Barbara)
Now i need to merge the elements in a way that i can form n subgroups with the constraint that each pairing is respected and the group has the smallest possible cardinality.
With the couples in the example they will form two groups
(James,Mary,Patricia) and (John,Robert,Barbara,Linda).
The task is to return the maximum number of groups formed and the number of males and females in the group with the maximum cardinality.
In this case it would be 2 2 2
I was thinking about building a graph where every name is represented by a vertex and two vertex are in an edge only if they are paired.
I can then use an algorithm (like Kruskal) to find the Minimum spanning tree.Is that right?
The problem is that the graph would not be completely connected.
I also need to find a way to map the names to the edges of the Graph and vice-versa.
Can the edges be indexed by a string?
Every help is really appreciated :)
Thanks in advice!
You don't need to find the minimum spanning tree. That is really for finding the "best" edges in a graph that will still keep the graph connected. In other words, you don't care how John and Robert are connected, just that they are.
You say that the problem is that the graph would not be completely connected, but I think that is actually the point. If you represent graph edges by using the couples as you suggest, then the vertices that are connected form the groups that you are looking for.
In your example, James is connected to Mary and also James is connected to Patricia. No other person connects to any of those three vertices (if they did, you would have another couple that included them), which is why they form a single group of (James, Mary, Patricia). Similarly all of John, Robert, Barbara, and Linda are connected to each other.
Your task is really to form the graph and find all of the connected subgraphs that are disjoint from each other.
While not a full algorithm, I hope that helps get you started.
I think that you can easily solve this with a dfs and connected components. Because every person(node) has a relation with an other one (edge). So you have an outer loop and run an explore function for every node which is unvisited and add the same number for every node explored by the explore function.
e.g
dfs() {
int group 0;
for(int i=0;i<num_nodes;i++) {
if(nodes[i].visited==false){
explore(nodes[i],group);
group++;
}
}
then you simple have to sort the node by the group and then you are ready. if you want to track the path you can use a pre number which indicates which node was explored first, second..etc
(sorry for my bad english)!
The sets of names and pairs of names already form a graph. A data structure with nodes and pointers to other nodes is just another representation, one that you don't necessarily need. Disjoint sets are easier to implement IMO, and their purpose in life is exactly to keep track of sameness as pairs of things are joined together.