AppEngine share list of objects - google-app-engine

I need to share a list of objects representing client connections (for example, an ID and some parameters) across multiple App Engine instances.
How can I do this?
I read that Memcache works with keys and values, but what if I want to iterate over the entire list?

I don't know your detailed requirements or context, but could you consider a Firestore collection? Depending on your security and latency requirements, Firestore might be a cheap and quick solution.
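On the Memcache iteration point: Memcache has no API for iterating keys, but a common workaround is to serialize the whole list under a single key and rewrite it on every change. A minimal sketch of that pattern, using a plain dict as a stand-in for the cache client (the key name and helper functions are illustrative, not from any library):

```python
import json

# Stand-in for a memcache client; a real client exposes the same
# get/set shape but stores bytes in a shared, evictable cache.
cache = {}

CONNECTIONS_KEY = "active_connections"  # hypothetical key name

def add_connection(conn_id, params):
    """Append a connection record to the shared list under one key."""
    raw = cache.get(CONNECTIONS_KEY)
    connections = json.loads(raw) if raw else []
    connections.append({"id": conn_id, "params": params})
    cache[CONNECTIONS_KEY] = json.dumps(connections)

def all_connections():
    """Fetch and deserialize the entire list with a single get."""
    raw = cache.get(CONNECTIONS_KEY)
    return json.loads(raw) if raw else []

add_connection("client-1", {"ip": "10.0.0.5"})
add_connection("client-2", {"ip": "10.0.0.9"})
for conn in all_connections():
    print(conn["id"])
```

Two caveats with the real service: the read-modify-write above needs compare-and-set to avoid lost updates when several instances write concurrently, and cached entries can be evicted at any time, which is exactly why a durable store such as a Firestore collection is the safer shared source of truth.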

Related

Database for web app with multiple clients

In a real-world web app, how does one go about storing data for multiple clients/companies/customers?
Let's assume we have the following collections for one client:
- users
- tasks
How would I extend this system to a second client? Is there a standard approach?
Note: I am using Firestore (no-sql).
We use a separate set of collections for each client. Our data structure works really well for us and looks like this...
/clients/{clientId}/reportingData
/clients/{clientId}/billingData
/clients/{clientId}/privateData
Using security rules, we allow clients to read their reportingData and billingData collections, but not the privateData collection.
However, if you need to query data across multiple clients at the same time (for internal use, for example), then Frank's option 1 would work better, with a clientId field.
We do the same thing with users...
/users/{uid}/publicProfile (anyone can read this, only the user can write)
/users/{uid}/userProfile (only the user can read and write)
/users/{uid}/privateProfile (internal data that the user can't read or write)
You have a few options for implementing such a multi-tenant solution on Cloud Firestore:
Have all clients in a single set of collections
Have a separate set of collections for each client
Have a separate project for each client
There is no approach that is inherently better or worse.
I recommend that you at least consider having a separate project for each client. Isolating the clients from each other makes maintenance and (possibly future) billing a lot easier.
Having all clients in a single set of collections is also possible. You'll just have to make sure that clients can't see each other's data. Since you're likely accessing the database directly from the client, use security rules to ensure that clients can only access their own data.
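The first two layouts can be contrasted in a few lines. This is a sketch using plain Python data in place of Firestore documents; the collection names and helpers are illustrative:

```python
# Layout 1: one shared collection; every document carries a clientId
# field so it can be filtered per client or queried across clients.
tasks = [
    {"clientId": "acme", "title": "Invoice"},
    {"clientId": "acme", "title": "Report"},
    {"clientId": "globex", "title": "Audit"},
]

def tasks_for(client_id):
    """Per-client view of the shared collection (a where() filter
    in a real Firestore query)."""
    return [t for t in tasks if t["clientId"] == client_id]

# Layout 2: a separate set of collections per client, addressed by
# document path -- the /clients/{clientId}/... scheme shown above.
def collection_path(client_id, name):
    return f"clients/{client_id}/{name}"

print(len(tasks_for("acme")))                  # 2
print(collection_path("acme", "billingData"))  # clients/acme/billingData
```

The trade-off mirrors the answers above: layout 1 makes cross-client queries trivial but puts the entire isolation burden on security rules; layout 2 isolates by path at the cost of needing one query per client for any cross-client report.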

Store data permanently to Persistent storage

Can data be permanently stored in Persistent storage, such that anytime the applications loads I can get the last inserted item?
If yes, can I get a code sample or a link to some examples.
Of course! That's how modern databases work.
My advice is to consider storing your data as JSON in MongoDB for maximum portability:
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
I highly recommend mLab, which offers MongoDB as a service; you can host up to 500 MB across one or more databases. Follow the steps in their getting-started documentation (5-10 minutes). Then download Robomongo and configure it with the connection settings from mLab. You'll be able to view all of your databases, collections (akin to tables), and documents (akin to rows), and interact with them either through basic point-and-click or programmatically via its Mongo Shell interface.
See the developer guide section on storage where the various types of storage options are discussed in depth. There are many samples in that section but you will need to clarify a more exact use case of what you are trying to accomplish.
If you just want to save one variable, you can use something like:
Preferences.set("var", val);
String var = Preferences.get("var", defaultValue);

Building personalized feed on App Engine

I have been working on a social app. I'll first explain the problems, and then summarize in the questions below.
In the network, there would be channels, and users. Users can subscribe to these channels, and to other users. This way, we have two sources from which posts can be generated.
Now, we can simply keep one Activity model where we record all the actions, their kind, and what they affect. Be it from channels, or from the users. And refer these while creating a feed for each user.
I found a solution in a talk by Brett Slatkin that basically suggests using a ListProperty to link each post with each subscriber. But Guido advises against lists with more than 1000 elements, so a channel with more than 1000 subscribers will probably run into problems. Even if this were to work --
I want to rank the posts by popularity (based on the number of votes and comments) and apply a time-decay function, much like Reddit. To do so, I will have to keep the Activity in memory, and filter and order it by rank for each user. I'll also need to do this periodically, since new activities will keep occurring and old activities will gain or lose value.
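The time-decay ranking just described can be sketched in a few lines; the constants here are illustrative, not Reddit's actual algorithm:

```python
import math

def hot_score(votes, comments, age_hours, gravity=1.8):
    """Score grows with votes and comments and decays with age.
    gravity controls how fast old posts fall; constants are illustrative."""
    points = votes + 0.5 * comments
    return points / math.pow(age_hours + 2, gravity)

posts = [
    {"id": "a", "votes": 100, "comments": 10, "age_hours": 24},
    {"id": "b", "votes": 40, "comments": 5, "age_hours": 2},
    {"id": "c", "votes": 5, "comments": 0, "age_hours": 1},
]

# Recomputing this sort periodically (rather than on every request)
# is what makes the "scores keep changing" problem tractable.
ranked = sorted(
    posts,
    key=lambda p: hot_score(p["votes"], p["comments"], p["age_hours"]),
    reverse=True,
)
print([p["id"] for p in ranked])
```

Note that a fresher post with fewer votes ("b") outranks an older post with many ("a"); that is the decay doing its work. Persisting only the computed score per post, and re-sorting by that indexed field, avoids keeping the whole Activity log in memory.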
The challenge is to keep the data in memory (for processing the feed as well as for speed). I will have to store a copy of each user's feed in persistent storage, but if the order of posts keeps changing, how do I keep track of that in the database?
Also: I have kept my options open -- I will move to AWS if I have to.
To summarize:
Is there a better solution for keeping track of subscribers without using lists? Storing something like a PostID > SubscriberID pair per entity would be very expensive and inefficient.
If there's any cost-effective and fast solution to the problem above, how do I deal with the next challenge -- which is to generate a personalized feed? (memory issues - unknown size of memcache)
If I can generate a personalized feed (which will be dynamic and keep changing), how do I keep it in the database?
I have gone through several articles, and I could probably solve the first two problems with AWS, but I am trying to stay away from manual scaling work. If there is no other way, I am willing to move to AWS. Even then, I can't think of a solution to the third problem.
Any thoughts, directions, resources would be helpful! Thanks!

Graph Database to Count Direct Relations

I'm trying to graph the linking structure of a web site so I can model how pages on a given domain link to each other. Note I'm not graphing links to sites not on the root domain.
Obviously this graph could be considerable in size. One of the main queries I want to perform is to count how many pages directly link to a given URL. I want to run this against the whole graph (shudder) so that I end up with a list of URLs and the count of incoming links for each.
I know one popular way of doing this would be via some kind of map reduce - and I may still end up going that way - however I have a requirement to be able to view this report in (near) realtime which isn't generally map reduce friendly.
I've had a quick look at Neo4j and OrientDB. While both of these could model the relationship I want, it's not clear whether I could query them to generate the report I want. At this point I'm not committed to any particular technology.
Any help would be greatly appreciated.
Thanks,
Paul
Both OrientDB and Neo4j support Blueprints as a common API for graph operations like traversal, counting, etc.
If I've understood your use case correctly, your graph seems pretty simple: you have "URL" vertices that link to each other through one type of edge, "Links".
To execute operations against the graph, take a look at Gremlin.
You might have a look at structr. It is an open-source CMS running on top of Neo4j and has exactly those kinds of inter-page links.
To get the number of links pointing to a page, you just iterate over the incoming LINKS_TO relationships of the current page node.
What is the use case for your query -- a popular-pages list containing just the top n pages? If so, you might start at random places in the graph, traverse incoming LINKS_TO relationships from your current nodes in parallel, and feed the results into a sorting structure, so that you always continue with the 20 or so page nodes that currently have the highest number of incoming links (until they're finished).
Marko Rodriguez has some similar "page-rank" examples in the Gremlin documentation. He's also got several blog posts where he talks about this.
With Neo4j you won't be able to split the graph across servers to distribute the load. You could replicate the database to distribute the computation, but then updating will be slow (as you have to replicate the updates). I would attack the problem by storing, as a property on each node, a count of its inbound links, updated as new relationships are added. Neo4j has excellent write performance. Of course, you don't strictly need to persist this count, because direct relationships are cheap to retrieve (you get an iterator rather than a collection of all related nodes).
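The incremental strategy in the last answer -- maintain a per-node inbound-link count, updated as each new relationship is written -- can be sketched independently of any particular graph database:

```python
from collections import defaultdict

# Edge list plus a per-node inbound-link counter, bumped on every
# insert -- the incremental, near-realtime approach suggested above.
edges = []
in_degree = defaultdict(int)

def add_link(src, dst):
    """Record src -> dst and increment dst's inbound count."""
    edges.append((src, dst))
    in_degree[dst] += 1

add_link("/home", "/about")
add_link("/blog", "/about")
add_link("/home", "/blog")

# The report is then a simple sort of precomputed counts, with no
# whole-graph traversal at query time.
report = sorted(in_degree.items(), key=lambda kv: kv[1], reverse=True)
print(report)  # [('/about', 2), ('/blog', 1)]
```

In a real graph store the counter would live as a property on the target node (or be recomputed from an iterator over its incoming edges); either way, query-time cost stays proportional to the number of pages reported, not the number of links.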
You should also take a look at a highly scalable graph database product, such as InfiniteGraph. If you email their technical support I think they will be able to point you at some sample code that does a large part of what you've described here.

How graph databases store data to a persistent storage?

How do graph databases store their data in persistent storage?
PKV
I would expect that every implementation of a graph database uses a different approach.
To take one example, look at Neo4j's NeoStore class, and the other kinds of store it refers to. It seems that Neo4j uses multiple files, each containing fixed-length records; one for nodes, one for keys of properties of nodes, one for values of properties of nodes, etc. Records in each contain indexes to refer to records in the others. It seems overcomplicated to me, but it evidently seemed like a good idea to the guys who wrote it!
To know more about how OrientDB stores graphs look at: http://code.google.com/p/orient/wiki/Concepts#Storage
