Recursively get all the vertices emerging from a group of vertices - graph-databases

I have a vertex label "group" and a group can have multiple "categories".
For example, a group named "food" can have multiple categories like "Seafood, Chinese, Indian" which are connected by an edge labelled "label1".
Now, a category can have further categories; for example, "Seafood" can have "fish", "prawns", and so on. The depth is arbitrary, and all of these further categories are connected by edges labelled "label2".
food --label1--> seafood --label2--> fish --label2--> jellyfish --label2--> so on
                                          --label2--> starfish
                         --label2--> prawns
                         --label2--> crab
     --label1--> Indian
     --label1--> Chinese
I want to recursively traverse all the vertices and get the data.
I hope you understood the question. Please help me out.

It's as easy as:
g.V(food).out("label1").
emit().
repeat(out("label2"))
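To make the semantics of that traversal concrete, here is a small pure-Python sketch (not Gremlin) over a hypothetical edge list modelled on the diagram in the question. `out("label1")` picks the group's direct categories, `emit()` placed before `repeat()` returns every vertex reached, and `repeat(out("label2"))` keeps descending until there are no more label2 edges. The names `EDGES`, `build_adjacency` and `all_categories` are illustrative only. Unlike the Gremlin query, the sketch skips vertices it has already seen (in Gremlin, cycles would need extra handling, for example `simplePath()`), and the order in which Gremlin emits vertices may differ from this depth-first order.

```python
from collections import defaultdict

# Hypothetical edge list modelled on the diagram in the question.
EDGES = [
    ("food", "label1", "seafood"),
    ("food", "label1", "Indian"),
    ("food", "label1", "Chinese"),
    ("seafood", "label2", "fish"),
    ("seafood", "label2", "prawns"),
    ("seafood", "label2", "crab"),
    ("fish", "label2", "jellyfish"),
    ("fish", "label2", "starfish"),
]


def build_adjacency(edges):
    """Map (vertex, edge label) -> adjacent vertices, like out(label)."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[(src, label)].append(dst)
    return adj


def all_categories(adj, group):
    """Rough analogue of g.V(group).out("label1").emit().repeat(out("label2"))."""
    result = []
    seen = set()
    # out("label1"): the group's direct categories (reversed so they pop in order)
    stack = list(reversed(adj[(group, "label1")]))
    while stack:
        vertex = stack.pop()
        if vertex in seen:  # cycle guard; the Gremlin query does not do this by itself
            continue
        seen.add(vertex)
        result.append(vertex)  # emit() before repeat(): every reached vertex is returned
        # repeat(out("label2")): keep walking down until there are no label2 edges left
        stack.extend(reversed(adj[(vertex, "label2")]))
    return result


print(all_categories(build_adjacency(EDGES), "food"))
```

Running it prints every category below "food", at any depth, in depth-first order: seafood, fish, jellyfish, starfish, prawns, crab, Indian, Chinese.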

Related

How to return the column heading for multiple matches, from multiple criteria, in Excel?

I'm working with 40+ price lists for different groups of customers. By entering the item number and price, I'd like to show the names of all the price lists that match that item/price. I've set up the items and pricing lists like the table below. Some items have the same prices in multiple price lists, and some price lists do not list all items.
Item ListPrice Red Blue Green Grey
Hosaka2 $200 $180 $188 $171
TrodesH $460 $410 $380 $400 $380
TrodesL $810 $680 $680 $720
Shuri $80 $72 $72 $70
I'd like to enter TrodesH and $380 into cells and have the formula return Blue & Grey. Or enter Hosaka2 and $200 and have the formula return ListPrice.
I've used an INDEX array formula in the past to return multiple matches for a single entry: for example, to list all the account numbers whose address matches the zip code I enter. It's very handy. But here I'm looking up a combination of item and price. Any ideas how I might move forward?
Edit: For better legibility in the Excel file, I've placed the above data in a worksheet named "Matrix" and the lookup formulas in a worksheet named "Check."
Assuming that A1:F5 contains the data, H2 contains the item of interest (such as TrodesH), and I2 contains the price of interest (such as $380), try...
J2:
=COUNTIF(INDEX($B$2:$F$5,MATCH($H2,$A$2:$A$5,0),0),$I2)
K2, confirmed with CONTROL+SHIFT+ENTER, and copied across:
=IF(COLUMNS($K2:K2)<=$J2,INDEX($B$1:$F$1,SMALL(IF(INDEX($B$2:$F$5,MATCH($H2,$A$2:$A$5,0),0)=$I2,COLUMN($B$1:$F$1)-COLUMN($B$1)+1),COLUMNS($K2:K2))),"")
Hope this helps!
Assuming the data sits in A1:F5, with the item in H1 and the price in I1 (adjust the ranges to your own data layout), you can enter this formula in J1 and copy it across, and down for further lookups:
J1:
=IFERROR(INDEX($1:$1,AGGREGATE(15,6,COLUMN($B$2:$F$5)/($A$2:$A$5=$H1)/($B$2:$F$5=$I1),COLUMN(A:A))),"")
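Both formulas do the same thing: find the row for the item, compare every price in that row with the target price, and return the matching column headings one per cell. A minimal Python sketch of that logic, assuming the header order from the question's table; only the TrodesH row is used because it is the one row in the question with every price list filled in, and the names `PRICES`, `matching_lists` and `kth_match` are hypothetical:

```python
HEADERS = ["ListPrice", "Red", "Blue", "Green", "Grey"]

# Only the TrodesH row from the question; the other rows have gaps whose
# column positions are not recoverable from the flattened table.
PRICES = {
    "TrodesH": [460, 410, 380, 400, 380],
}


def matching_lists(item, price):
    """Headers whose price equals `price` in the row for `item`
    (the MATCH on the item column, then the per-column =$I2 comparison)."""
    row = PRICES.get(item)
    if row is None:
        return []  # unknown item; the IFERROR version shows "" in this case
    return [header for header, p in zip(HEADERS, row) if p == price]


def kth_match(item, price, k):
    """The k-th match (1-based), or "" once k exceeds the match count,
    like the IF(COLUMNS($K2:K2)<=$J2, ..., "") guard in K2."""
    matches = matching_lists(item, price)
    return matches[k - 1] if k <= len(matches) else ""


print(matching_lists("TrodesH", 380))
```

This prints ['Blue', 'Grey'], matching the expected result for TrodesH and $380; `len(matching_lists(...))` plays the role of the COUNTIF in J2.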

Many to Many Database Relationship Design - to enable Word Clouds

I'm relatively new to database design and struggling to introduce a many-to-many relationship in an SSAS Tabular model.
I have some 'WordGroup' performance data in one table, like so:
WordGroup | IndexedVolume
Dining | 1,000
Sports | 2,000
Movies | 1,600
... and so on
Then I have the 'Words' contained within these 'WordGroups' in another category table, like so:
WordGroup | Word
Dining | Restaurant
Dining | Food
Dining | Dinner
Sports | Football
Sports | Basketball
... and so on
I can't see performance data (IndexedVolume) at the 'Word' level, only by the 'WordGroup' that contains it. In the example above, I can't look at the IndexedVolume for 'Football' on its own; I can only choose the 'Sports' WordGroup that contains Football.
However, when analysing by 'WordGroup', I would still like users to understand which 'Words' are included (ideally in a Word Cloud visualisation). Therefore, I want to build a relationship between these two tables, so that when someone chooses one or more Word Groups, we can return the Words contained within them, as below:
User selects Dining WordGroup
<<<Word Cloud or Flat Table would show Words below>>>
Restaurant
Food
Dinner
I looked at CONCATENATE / string functions etc., but was deterred because the detail here is much more complex: each WordGroup may contain 10+ Words, with translations.
Any advice would be greatly appreciated!
If analyzing by WordGroup is a hard requirement, you should use three tables: word (idword, name), word_group (idword_group, index_volume) and a junction table word_group_has_word (word_group_id, word_id).
The relationship is many-to-many because a word may be connected to one or more groups (e.g. "tree" is connected to environment, forest, etc.), and one word group is obviously connected to many words.
To see performance data by Word, use:
select w.idword, w.name, sum(wg.index_volume)
from word w
left join word_group_has_word wgw
on w.idword=wgw.word_id
left join word_group wg
on wg.idword_group=wgw.word_group_id
group by w.idword, w.name
This shows, for each word, the sum of the index volumes of all the word groups connected to it. And if you want to see the words connected to a set of word groups, use:
select distinct w.idword , w.name
from word w
left join word_group_has_word wgw
on w.idword=wgw.word_id
where wgw.word_group_id in [listWordGroupsId]
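A minimal runnable sketch of this schema and both queries, using Python's built-in sqlite3 with an in-memory database. The sample rows come from the question; the `name` column on word_group, the numeric ids, and the helper names are assumptions for illustration. The bracketed list in the second query is replaced by bound parameters, and the left join becomes a plain join because the WHERE clause on the junction table discards unmatched words anyway.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (idword INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE word_group (idword_group INTEGER PRIMARY KEY, name TEXT, index_volume INTEGER);
        -- junction table: one row per (group, word) pair makes the relation many-to-many
        CREATE TABLE word_group_has_word (
            word_group_id INTEGER REFERENCES word_group(idword_group),
            word_id INTEGER REFERENCES word(idword),
            PRIMARY KEY (word_group_id, word_id)
        );
        INSERT INTO word_group VALUES (1, 'Dining', 1000), (2, 'Sports', 2000), (3, 'Movies', 1600);
        INSERT INTO word VALUES (1, 'Restaurant'), (2, 'Food'), (3, 'Dinner'), (4, 'Football'), (5, 'Basketball');
        INSERT INTO word_group_has_word VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
    """)
    return con


def volume_by_word(con):
    """Sum of the index volumes of every group each word belongs to."""
    return con.execute("""
        SELECT w.idword, w.name, SUM(wg.index_volume)
        FROM word w
        LEFT JOIN word_group_has_word wgw ON w.idword = wgw.word_id
        LEFT JOIN word_group wg ON wg.idword_group = wgw.word_group_id
        GROUP BY w.idword, w.name
    """).fetchall()


def words_in_groups(con, group_ids):
    """Distinct words belonging to any of the selected groups (e.g. for a word cloud)."""
    placeholders = ",".join("?" for _ in group_ids)
    return con.execute(f"""
        SELECT DISTINCT w.idword, w.name
        FROM word w
        JOIN word_group_has_word wgw ON w.idword = wgw.word_id
        WHERE wgw.word_group_id IN ({placeholders})
    """, list(group_ids)).fetchall()


con = build_db()
print(words_in_groups(con, [1]))  # the words in the Dining group
```

Selecting the Dining group returns Restaurant, Food and Dinner, which is the list the word cloud would show.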

can redis do a leaderboard by categories + leaderboard encompassing all the categories

Example:
Art ranking for categories: [ pictures, paintings, sculptures, ... ].
Queries:
Top 10 pictures?
Sculptures ranked 10 to 20 (pagination)?
Best art? (combining all categories)
To consider:
Some art items belong to multiple categories; for example, some sculptures are also paintings.
There are a lot of art categories. Pictures, paintings, and sculptures are only a subset of them.
Redis seemed like a pretty good choice at first, with a sorted set for each art category.
E.g.:
zset:pictures
zset:paintings
...
However, having a ranking for all categories combined requires another sorted set zset:art. This means that the score of each artistic item has to be updated in multiple places. This is a no-go.
I would like to be sure I'm thinking about this correctly, since I'm new to redis, before deciding to use a relational DB instead.
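To make the write fan-out described above concrete, here is a toy in-memory Python model of the design (not Redis itself): one sorted set per category plus a combined zset:art. `set_score` loosely corresponds to one ZADD per key and `rev_range` to ZREVRANGE; the class name, item names, and scores are hypothetical.

```python
from collections import defaultdict


class InMemoryLeaderboards:
    """Toy model of per-category sorted sets plus a combined zset:art (not Redis)."""

    def __init__(self):
        # key -> {member: score}, loosely mirroring Redis sorted sets
        self.zsets = defaultdict(dict)

    def set_score(self, item, categories, score):
        # Roughly one ZADD per category the item belongs to...
        for category in categories:
            self.zsets[f"zset:{category}"][item] = score
        # ...plus one more for the combined ranking: the multi-place update in question.
        self.zsets["zset:art"][item] = score

    def rev_range(self, key, start, stop):
        """Members from highest to lowest score, with non-negative inclusive indices
        like ZREVRANGE; equal scores fall back to descending member order, as in Redis."""
        ranked = sorted(self.zsets[key].items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return [member for member, _ in ranked[start:stop + 1]]


lb = InMemoryLeaderboards()
lb.set_score("statue-1", ["sculptures", "paintings"], 90)  # writes to 3 sorted sets
lb.set_score("canvas-7", ["paintings"], 75)                # writes to 2 sorted sets
print(lb.rev_range("zset:paintings", 0, 9))  # "Top 10 paintings"
print(lb.rev_range("zset:art", 0, 9))        # "Best art"
```

The second page of a category ("rank 10 to 20") would be `rev_range("zset:sculptures", 10, 19)` with 0-based indices.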

Solr Search Facets: How do I make them count products and NOT product varieties

The shop I'm working on sells clothing. Each item of clothing comes in multiple varieties. For example, Shirt A might come in Red Large, Red Medium, Blue Large, Blue Medium, White Large, and White Medium.
At first I added each variety as a Solr doc. So for the above product I added 6 Solr docs, each with the same Product ID. I got Solr to group the results by Product ID and everything worked perfectly.
However, the facet counts were all variety counts, not product counts. For example, limiting it to the one product above (say it were the only product in the system), the facet counts would show:
Red (2)
Blue (2)
White (2)
Which was correct: there were 2 documents added for each color. But what I really want to see is this:
Red (1)
Blue (1)
White (1)
As there is only 1 product for each color.
So now I'm thinking that, to do that, I need to make each Solr document a product.
In that case I would add the product, add the field "color" 3 times (one red, one blue, one white), and add the field "size" 3 times as well. But then Solr doesn't know which size goes with each color. Maybe I only have white in small.
What is the correct way to go about this to make the facet counts as they should be?
It turns out I could do this using grouping (field collapsing), documented here:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
specifically by adding these parameters to the query:
group=true
group.field=product_id
group.limit=100
group.facet=true
group.ngroups=true
group.facet is the parameter that really makes the facets work with the groups the way I wanted.
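The difference between an ordinary facet and group.facet=true can be modelled in a few lines of Python: a plain facet counts documents per value, while group.facet counts distinct groups (here, products) per value. This is a sketch of the counting semantics only, not Solr code; the field names `product_id`, `color` and `size` are assumptions standing in for the question's schema.

```python
from collections import defaultdict

# The six varieties of "Shirt A" from the question, one Solr doc each.
DOCS = [
    {"product_id": "shirt-a", "color": c, "size": s}
    for c in ("Red", "Blue", "White")
    for s in ("Large", "Medium")
]


def plain_facet(docs, field):
    """Counts documents per value, which is what an ordinary field facet reports."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[field]] += 1
    return dict(counts)


def grouped_facet(docs, field, group_field):
    """Counts distinct groups per value, the behaviour group.facet=true is meant to give."""
    groups = defaultdict(set)
    for doc in docs:
        groups[doc[field]].add(doc[group_field])
    return {value: len(ids) for value, ids in groups.items()}


print(plain_facet(DOCS, "color"))                  # Red 2, Blue 2, White 2
print(grouped_facet(DOCS, "color", "product_id"))  # Red 1, Blue 1, White 1
```

The two results reproduce the "before" and "after" facet counts from the question.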
I think that you have 2 options.
Option 1:
Once you get the list of facet values (Red, Blue & White in the given example), then fire the original query again with each facet value as a filter. For example, if the original query was q=xyz&group.field=ProductID then fire q=xyz&group.field=ProductID&group.ngroups=true&fq=color:Red. The ngroups value in the response will give you the required count for Red. Similarly, fire a separate query for Blue and White.
Option 2:
Create a separate field called Product_Color which combines the ProductID and the color. For example, if a product's ID is ABC123 and its color is Red, then Product_Color will be ABC123_Red. Now, to get the facets for color, fire a separate query which groups by Product_Color instead of ProductID, and you will get the required facets with the correct values. Remember to set group.truncate=true for this to work.
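With group.truncate=true, facet counts are computed only over the most relevant document of each group. A small Python model of why grouping by the combined Product_Color key then gives one count per product and color; it assumes the docs are already in relevance order, and `product_color` and `truncated_facet` are hypothetical helpers, not Solr APIs.

```python
from collections import defaultdict

# Same six varieties of "Shirt A" as in the question (field names are assumptions).
DOCS = [
    {"product_id": "shirt-a", "color": c, "size": s}
    for c in ("Red", "Blue", "White")
    for s in ("Large", "Medium")
]


def product_color(doc):
    """Hypothetical Product_Color value, e.g. 'shirt-a_Red'."""
    return f"{doc['product_id']}_{doc['color']}"


def truncated_facet(docs, field, group_key):
    """Facet counts over only the top doc of each group, as with group.truncate=true.
    Assumes `docs` is in relevance order, so the first doc seen per group is the top one."""
    top = {}
    for doc in docs:
        top.setdefault(group_key(doc), doc)
    counts = defaultdict(int)
    for doc in top.values():
        counts[doc[field]] += 1
    return dict(counts)


print(truncated_facet(DOCS, "color", product_color))
```

Grouping by Product_Color gives Red 1, Blue 1, White 1. Truncating on ProductID alone would keep only one variety per product, so only that variety's color would be counted (here just Red 1), which is why the combined key is needed.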
You can try looking into Facet Pivot, which would let you keep a single document per product and get tree-like facets with proper counts and filtering.

Efficiently selecting a title (the center of the cluster) for a cluster of strings

I have (imperfectly) clustered string data, where the items in one cluster might look like this:
[
Yellow ripe banana very tasty,
Yellow ripe banana with little dots,
Green apple with little dots,
Green ripe banana - from the market,
Yellow ripe banana,
Nice yellow ripe banana,
Cool yellow ripe banana - my favourite,
Yellow ripe,
Yellow ripe
],
where the optimal title would be 'Yellow ripe banana'.
Currently, I am using simple heuristics (choosing the most common name or, in case of a tie, the shortest one) with the help of SQL GROUP BY. My data contains a large number of such clusters, they change frequently, and every time a new fruit is added to or removed from a cluster, the title for the cluster has to be recalculated.
I would like to improve two things:
(1) Efficiency - e.g., compare the new fruit name to the title of the cluster only, and avoid grouping / phrase clustering of all fruit titles each time.
(2) Precision - instead of looking for the most common complete name, I would like to extract the most common phrase. The current algorithm would choose 'Yellow ripe', which repeats 2 times and is the most common complete name; however, as a phrase, 'Yellow ripe banana' is the most common in the given set.
I am thinking of using Solr + Carrot2 (I have no experience with the latter). At this point, I do not need to cluster the documents, since they are already clustered based on other parameters; I only need to choose the central phrase as the center/title of the cluster.
Any input is very appreciated, thanks!
Solr provides an analysis component called ShingleFilter that you can use to create tokens from groups of adjacent words. If you put it in your analysis chain (i.e., apply it to incoming documents when you index them), and then compute facets for the resulting field with a query restricted to the "fruit cluster", you will get a list of all distinct shingles along with their occurrence frequencies (I think you can even retrieve them sorted by frequency), which you can easily use to derive the title you want. Then, when you add a new fruit, its shingles will automatically be included in the facet computations the next time around.
A slightly more concrete version of this proposal:
Create two fields: fruit_shingle and cluster_id.
Configure fruit_shingle with the ShingleFilter and any other processing you might want (such as tokenizing at word boundaries, perhaps with StandardTokenizer, before the ShingleFilter).
Configure cluster_id as an identifier field holding whatever data you use to identify the clusters (not the schema's uniqueKey, since many fruits share one cluster).
For each new fruit, store its text in fruit_shingle and its cluster's id in cluster_id.
Then retrieve facets on fruit_shingle for a query "cluster_id:" followed by the cluster's id, and you will get a list of words, word pairs, word triplets, etc. (shingles). You can configure the ShingleFilter to have a maximum shingle size, I believe. Sort the facets by whatever combination of length and/or frequency you deem appropriate and use that as the "title" of the fruit cluster.
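A minimal Python sketch of the counting this setup produces, for experimenting with the final ranking step before wiring up Solr. `shingles` roughly imitates a tokenizer, lowercase filter and ShingleFilter chain (unigrams up to `max_size`-word shingles, punctuation dropped; in Solr the size limit is the `maxShingleSize` option of ShingleFilterFactory). The count per shingle is the number of titles containing it, like a facet count. The ranking rule (the longest shingle that appears in at least half of the titles, ties broken by frequency) is an assumed heuristic, not something Solr provides.

```python
import re
from collections import Counter


def shingles(text, max_size=4):
    """Distinct word n-grams of length 1..max_size, lowercased, punctuation dropped."""
    words = re.findall(r"\w+", text.lower())
    out = set()
    for n in range(1, max_size + 1):
        for i in range(len(words) - n + 1):
            out.add(" ".join(words[i:i + n]))
    return out


def cluster_title(titles, max_size=4):
    """Facet-style counts (titles containing each shingle), then an assumed heuristic:
    the longest shingle present in at least half the titles, ties broken by count."""
    counts = Counter()
    for title in titles:
        counts.update(shingles(title, max_size))  # a set, so each title counts once
    n = len(titles)
    candidates = [s for s, c in counts.items() if 2 * c >= n]
    return max(candidates, key=lambda s: (len(s.split()), counts[s]), default=None)


CLUSTER = [
    "Yellow ripe banana very tasty",
    "Yellow ripe banana with little dots",
    "Green apple with little dots",
    "Green ripe banana - from the market",
    "Yellow ripe banana",
    "Nice yellow ripe banana",
    "Cool yellow ripe banana - my favourite",
    "Yellow ripe",
    "Yellow ripe",
]
print(cluster_title(CLUSTER))
```

On the question's example this prints "yellow ripe banana": it appears in 5 of the 9 titles, and no 4-word shingle appears in more than one.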
