I'd like to know how I would deal with object states in a FireBase environment.
What do I mean by states? Well, let's say you have an app with which you organize order lists. Each list consists of a bunch of orders, so it can be considered a hierarchical data structure. Furthermore each list has a state which might be one of the following:
deferred
open
closed
sent
acknowledged
ware completely received
ware partially received
something else
On the visual (HTML) side the lists shall be distinguished by their state. Each state shall be presented to the client in its own, say, div-element, listing all the related orders beneath.
So the question is, how do I deal with this state in FireBase (or any other document based database)?
structure
Do I...
... (option 1) use a state-field for each orderlist and filter on the clientside by using if or something similar:
orderlist1.state = open
order1
order2
orderlist2.state = open
order1
orderlist3.state = closed
orderlist4.state = deferred
... (option 2) use the hierarchy of FireBase to classify the orderlists like so:
open
orderlist1
order1
order2
orderlist2
order1
closed
orderlist3
deferred
orderlist4
... (option 3) take a totally different approach?
So, what's the royal road here?
retrieval, processing & visual output of option 2
Since for option 1 the answer to this question is apparantly pretty straight forward (if state == ...) I continue with option 2: how do I retrieve the data in option 2? Do I use a Firebase-object for each state, like so:
var closedRef = new Firebase("https://xxx.firebaseio.com/closed");
var openRef = new Firebase("https://xxx.firebaseio.com/open");
var deferredRef = new Firebase("https://xxx.firebaseio.com/deferred");
var somethingRef = new Firebase("https://xxx.firebaseio.com/something");
Or what's considered the best approach to deal with that sort of data/structure?
There is no universal answer to this question. The "best approach" is going to depend on the particulars of your use case, which you haven't provided here. Specifically, how you will be reading and manipulating the data.
Data architecture in NoSQL is all about working hard on writes to make reads easy. It's all about how you plan to use the data. (It's also enough material for a chapter in a book.)
The advantage to "option 1" is that you can easily iterate all the entire list. Great if your list is measured in hundreds. This is a great approach if you want to fetch the list and manipulate it on the fly on the client side.
The advantage to "option 2" is that you can easily grab a subset of the list. Great if your list is measured in thousands and you will typically be fetching open issues only rather than closed ones. This is great for archiving/new/old lists like yours.
There are other options as well.
Sorted Data using Priorities
Perhaps the most universal approach is to use ordered data. This allows you to query a subset of your records using something like:
new Firebase(URL).startAt('open').endAt('open').limit(10);
This is sufficient in most cases where you have only one criteria, or when you can create a unique identifier from multiple criteria (e.g. 'open:marketing') without difficulty. Examples are scoreboards, state lists like yours, data ordered by timestamps.
Using an index
You can also create custom subsets of your data by creating an index of keys and using that to fetch the others.
This is most useful when there is no identifiable characteristic of your subsets. For example, if I pick them from a list and store my favorites.
I think my this plnkr can help you for this.
Here, click on edit/add and just check the country(order in your case) - State(state in your case) dependent dropdown may be the same as you want.just one single thing you may need to add is filter it.
They both are different tables in db.
You can also get it from git.
Related
I am dealing with a real-estate app. A Home will hvae typical properties like Price, Bed Rooms, Bath Rooms, SqFt, Lot size etc. User will search for Homes and such a query will require multiple inequality filters like: Price between x and y, rooms greater than z, bathrooms more than p... etc.
I know that multiple inequality filters are not allowed. I also do not want to perform any filtering in my code and/because I want to be able to use Cursors.
so I have come up with two solutions. I am not sure if these are right - so wonder if gurus can shed some light
Solution 1: I will discretize the values of each attribute and save them in a list-field, then use IN. For example: If there are 3 bed rooms, instead of storing beds=3, I will store beds = [1,2,3]. Now if a user searches for homes with say at least two bedrooms, then instead of writing the filter as beds>2, I will write the filter as "beds IN [2]" - and my home above [1,2,3] will qualify - so so will any home with 2 beds [1,2] or 4 beds [1,2,3,4] and so on
Solution 2: It is similar to the first one but instead of creating a list-property, I will actually add attributed (columns) to the home. So a home with 3 bed rooms will have the following attributed/columns/properties: col-bed-1:true, col-bed-2:true, col-bed-3:true. Now if a user searches for homes with say at least two bedrooms, then instead of writing the filter as beds>2, I will write the filter as "col-bed-2 = true" - and my home will qualify - so will any home with 2 beds, 3 beds, 4 beds and so on
I know both solutions will work, but I want to know:
1. Which one is better both from a performance and google pricing perspective
2. Is there a better solution to do this?
I do almost exactly your use case with a python gae app that lists posts with housing advertisements (similar to craigslist). I wrote it in python and searching with a filter is working and straightforward.
You should choose a language: Python, Java or Go, and then use the Google Search API (that has built-in filtering for equalities or inequalities) and build datastore indexes that you can query using the search API.
For instance, you can use a python class like the following to populate the datastore and then use the Search API.
class Home(db.Model):
address = db.StringProperty(verbose_name='address')
number_of_rooms = db.IntegerProperty()
size = db.FloatProperty()
added = db.DateTimeProperty(verbose_name='added', auto_now_add=True) # readonly
last_modified = db.DateTimeProperty(required=True, auto_now=True)
timestamp = db.DateTimeProperty(auto_now=True) #
image_url = db.URLProperty();
I definitely think that you should avoid storing permutations for several reasons: Permutations can explode in size and makes the code difficult to read. Instead you should do like I did and find examples where someone else has already solved an equal or similar problem.
This appengine demo might help you.
Short and simple question:
For a social network platform would you create a separate node for the friend requests and creating the edge after confirmation, or creating the edge directly and set a confirmed flag?
What are the advantages / disadvantages?
I am interested in your comments.
One advantage of using the flag option is when either of the user nodes are deleted by delete vertex the friend request edge will be deleted automatically by OrientDB to maintain graph consistency. If you use a seperate node for the request then you need to delete that node manually.
Performance wise, I guess, the question you linked is relevant to OrientDB too.
For such decisions, I'd also consider the readability of the code. One advantage of using a graph DB is your code becomes easier to understand and reason about. So you can write the queries for different options and judge yourself about which code is more readable. Let's try it for the flag option:
# create
CREATE EDGE Friend
FROM (SELECT FROM User where name = "Alice")
TO (SELECT FROM User where name = "Bob")
SET status = "requested" # or confirmed = False
# confirmed
UPDATE Friend SET status = "confirmed" # or confirmed = True
WHERE out.name = "Alice" AND in.name = "Bob"
# query
SELECT in.name FROM Friend
WHERE out.name = "Alice" AND status = "confirmed"
# output: Bob
# another method
SELECT outE(Friend)[status = "confirmed"].in.name
FROM User WHERE name = "Alice"
# output: Bob
I'll argue that if you are familiar with graphs as mathematical objects and get used to the OrientDB syntax and terminology, this option enables you to write very understandable code.
If you don't like this option, as an alternative to keeping requests in a different node (class/table), I'll also suggest storing them inside the User nodes as a LINKSET or something similar.
I believe you should also take into consideration the memory you'll have available. If you store that info in the edge, that probably means you'll have to define an index on that property to have faster queries. And this means more memory needed.
I advise you to store friend requests in a different node.
Find friends is easier:
select expand(both('Friend')) from #12:0
Find requests is easier:
select expand(in('Request')) from #12:0
And they are very likely faster than an index on some property.
Using a model such User1(V)---Friendship(E)---->User2(V) is enough to represent the friendship bind between users and by using properties you can implement all the workflow from the request to the completion. This design is pretty basic so you'll have a standard complexity when it comes to query/traverse .... that can be more difficult more you add constraints on properties..... a disadvantage is that an Edge is not a Vertex and this will affect its interaction with others vertexes, if you need such interaction then an approach where friendship is a vertex is the way to go.
After working with neo4j and now coming to the point of considering to make my own entity manager (object manager) to work with the fetched data in the application, i wonder about neo4j's output format.
When i run a query it's always returned as tabular data. Why is this??
Sure tables keep a big place in data and processing, but it seems so strange that a graph database can only output in this format.
Now when i want to create an object graph in my application i would have to hydrate all the objects and this is not really good for performance and doesn't leverage true graph performace.
Consider MATCH (A)-->(B) RETURN A, B when there is one A and three B's, it would return:
A B
1 1
1 2
1 3
That's the same A passed down 3 times over the database connection, while i only need it once and i know this before the data is fetched.
Something like this seems great http://nigelsmall.com/geoff
a load2neo is nice, a load-from-neo would also be nice! either in the geoff format or any other formats out there https://gephi.org/users/supported-graph-formats/
Each language could then implement it's own functions to create the objects directly.
To clarify:
Relations between nodes are lost in tabular data
Redundant (non-optimal) format for graphs
Edges (relations) and vertices (nodes) are usually not in the same table. (makes queries more complex?)
Another consideration (which might deserve it's own post), what's a good way to model relations in an object graph? As objects? or as data/method inside the node objects?
#Kikohs
Q: What do you mean by "Each language could then implement it's own functions to create the objects directly."?
A: With an (partial) graph provided by the database (as result of a query) a language as PHP could provide a factory method (in C preferably) to construct the object graph (this is usually an expensive operation). But only if the object graph is well defined in a standard format (because this function should be simple and universal).
Q: Do you want to export the full graph or just the result of a query?
A: The result of a query. However a query like MATCH (n) OPTIONAL MATCH (n)-[r]-() RETURN n, r should return the full graph.
Q: you want to dump to the disk the subgraph created from the result of a query ?
A: No, existing interfaces like REST are prefered to get the query result.
Q: do you want to create the subgraph which comes from a query in memory and then request it in another language ?
A: no i want the result of the query in another format then tabular (examples mentioned)
Q: You make a query which only returns the name of a node, in this case, would you like to get the full node associated or just the name ? Same for the edges.
A: Nodes don't have names. They have properties, labels and relations. I would like enough information to retrieve A) The node ID, it's labels, it's properties and B) the relation to other nodes which are in the same result.
Note that the first part of the question is not a concrete "how-to" question, rather "why is this not possible?" (or if it is, i like to be proven wrong on this one). The second is a real "how-to" question, namely "how to model relations". The two questions have in common that they both try to find the answer to "how to get graph data efficiently in PHP."
#Michael Hunger
You have a point when you say that not all result data can be expressed as an object graph. It reasonable to say that an alternative output format to a table would only be complementary to the table format and not replacing it.
I understand from your answer that the natural (rawish) output format from the database is the result format with duplicates in it ("streams the data out as it comes"). I that case i understand that it's now left to an alternative program (of the dev stack) to do the mapping. So my conclusion on neo4j implementing something like this:
Pro's - not having to do this in every implementation language (of the application)
Con's - 1) no application specific mapping is possible, 2) no performance gain if implementation language is fast
"Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results."
I don't understand this point entirely, are you saying that these formats are no able to hold deduplicated results (in certain cases)?? So infact that there is no possible textual format with which a graph can be described without duplication??
"There is also the questions on what you want to include in your output?"
I was under the assumption that the cypher language was powerful enough to specify this in the query. And so the output format would have whatever the database can provide as result.
"You could just return the paths that you get, which are unique paths through the graph in themselves".
Useful suggestion, i'll play around with this idea :)
"The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it".
Does the enriching process fetch additional data from the database or is the data already contained in the initial result?
There is more to it.
First of all as you said tabular results from queries are really commonplace and needed to integrate with other systems and databases.
Secondly oftentimes you don't actually return raw graph data from your queries, but aggregated, projected, sliced, extracted information out of your graph. So the relationships to the original graph data are already lost in most of the results of queries I see being used.
The only time that people need / use the raw graph data is when to export subgraph-data from the database as a query result.
The problem of doing that as a de-duplicated graph is that the db has to fetch all the result data data in memory first to deduplicate, extract the needed relationships etc.
Normally it just streams the data out as it comes and uses little memory with that.
Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results (which are returned as paths with potential duplicate nodes and relationships).
There is also the questions on what you want to include in your output? Just the nodes and rels returned? Or additionally all the other rels between the nodes that you return? Or all the rels of the returned nodes (but then you also have to include the end-nodes of those relationships).
You could just return the paths that you get, which are unique paths through the graph in themselves:
MATCH p = (n)-[r]-(m)
WHERE ...
RETURN p
Another way to address this problem in Neo4j is to use sensible aggregations.
E.g. what you can do is to use collect to aggregate data per node (i.e. kind of subgraphs)
MATCH (n)-[r]-(m)
WHERE ...
RETURN n, collect([r,type(r),m])
or use the new literal map syntax (Neo4j 2.0)
MATCH (n)-[r]-(m)
WHERE ...
RETURN {node: n, neighbours: collect({ rel: r, type: type(r), node: m})}
The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it and then outputting it as cypher create statement(s).
A similar approach can be used for other output formats too if you need it. But so far there hasn't been the need.
If you really need this functionality it makes sense to write a server-extension that uses cypher for query specification, but doesn't allow return statements. Instead you would always use RETURN *, aggregate the data into an in-memory structure (SubGraph in the org.neo4j.cypher packages). And then render it as a suitable format (e.g. JSON or one of those listed above).
These could be a starting points for that:
https://github.com/jexp/cypher-rs
https://github.com/jexp/cypher_websocket_endpoint
https://github.com/neo4j-contrib/rabbithole/blob/master/src/main/java/org/neo4j/community/console/SubGraph.java#L123
There are also other efforts, like GraphJSON from GraphAlchemist: https://github.com/GraphAlchemist/GraphJSON
And the d3 json format is also pretty useful. We use it in the neo4j console (console.neo4j.org) to return the graph visualization data that is then consumed by d3 directly.
I've been working with neo4j for a while now and I can tell you that if you are concerned about memory and performances you should drop cypher at all, and use indexes and the other graph-traversal methods instead (e.g. retrieve all the relationships of a certain type from or to a start node, and then iterate over the found nodes).
As the documentation says, Cypher is not intended for in-app usage, but more as a administration tool. Furthermore, in production-scale environments, it is VERY easy to crash the server by running the wrong query.
In second place, there is no mention in the docs of an API method to retrieve the output as a graph-like structure. You will have to process the output of the query and build it.
That said, in the example you give you say that there is only one A and that you know it before the data is fetched, so you don't need to do:
MATCH (A)-->(B) RETURN A, B
but just
MATCH (A)-->(B) RETURN B
(you don't need to receive A three times because you already know these are the nodes connected with A)
or better (if you need info about the relationships) something like
MATCH (A)-[r]->(B) RETURN r
I read in "hadoop design pattern" book, "HBase supports batch queries, so it would be ideal to buffer all the queries we want to execute up to some predetermined size. This constant depends on how many records you can comfortably store in memory before querying HBase."
I tried to search some examples online but couldn't find any, can someone show me the example using java map reduce?
Thanks.
Dan
Is this what you want? You can save HBase Get object in a list and submit the list at the same time. It's a little better than invoke table.get(get) multiple times.
Configuration conf = HBaseConfiguration.create();
pool = new HTablePool(conf, 5);
HTableInterface table = pool.getTable('table');
List<Get> gets = new ArrayList<Get>();
table.get(gets);
Preface: I don't have experience with rules engines, building rules, modeling rules, implementing data structures for rules, or whatnot. Therefore, I don't know what I'm doing or if what I attempted below is way off base.
I'm trying to figure out how to store and process the following hypothetical scenario. To simplify my problem, say that I have a type of game where a user purchases an object, where there could be 1000's of possible objects, and the objects must be purchased in a specified sequence and only in certain groups. For example, say I'm the user and I want to purchase object F. Before I can purchase object F, I must have previously purchased object A OR (B AND C). I cannot buy F and A at the same time, nor F and B,C. They must be in the sequence the rule specifies. A first, then F later. Or, B,C first, then F later. I'm not concerned right now with the span of time between purchases, or any other characteristics of the user, just that they are the correct sequence for now.
What is the best way to store this information for potentially thousands of objects that allows me to read in the rules for the object being purchased, and then check it against the user's previous purchase history?
I've attempted this, but I'm stuck at trying to implement the groupings such as A OR (B AND C). I would like to store the rules in a database where I have these tables:
Objects
(ID(int),Description(char))
ObjectPurchRules
(ObjectID(int),ReqirementObjectID(int),OperatorRule(char),Sequence(int))
But obviously as you process through the results, without the grouping, you get the wrong answer. I would like to avoid excessive string parsing if possible :). One object could have an unknown number of previous required purchases. SQL or psuedocode snippets for processing the rules would be appreciated. :)
It seems like your problem breaks down to testing whether a particular condition has been satisfied.
You will have compound conditions.
So given a table of items:
ID_Item Description
----------------------
1 A
2 B
3 C
4 F
and given a table of possible actions:
ID_Action VerbID ItemID ConditionID
----------------------------------------
1 BUY 4 1
We construct a table of conditions:
ID_Condition VerbA ObjectA_ID Boolean VerbB ObjectB_ID
---------------------------------------------------------------------
1 OWNS 1 OR MEETS_CONDITION 2
2 OWNS 2 AND OWNS 3
So OWNS means the id is a key to the Items table, and MEETS_CONDITION means that the id is a key to the Conditions table.
This isn't meant to restrict you. You can add other tables with quests or whatever, and add extra verbs to tell you where to look. Or, just put quests into your Items table when you complete them, and then interpret a completed quest as owning a particular badge. Then you can handle both items and quests with the same code.
This is a very complex problem that I'm not qualified to answer, but I've seen lots of references to. The fundamental problem is that for games, quests and items and "stats" for various objects can have non-relational dependencies. This thread may help you a lot.
You might want to pick up a couple books on the topic, and look into using LUA as a rules processor.
Personally I would do this in code, not in SQL. Each item should be its own class implementing an interface (i.e. IItem). IItem would have a method called OkToPurchase that would determine if it is OK to purchase that item. To do that, it would use one or more of a collection of rules (i.e. HasPreviouslyPurchased(x), CurrentlyOwns(x), etc.) that you can build.
The nice thing is that it is easy to extend this approach with new rules without breaking all the existing logic.
Here's some pseudocode:
bool OkToPurchase()
{
if( HasPreviouslyPurchased('x') && !CurrentlyOwns('y') )
return true;
else
return false;
}
bool HasPreviouslyPurchased( item )
{
return purchases.contains( item )
}
bool CurrentlyOwns( item )
{
return user.Items.contains( item )
}