I am experimenting with App Engine. One of my stumbling blocks has been the support for managed relations, or the lack thereof, which is further compounded by the lack of join support.
Without getting into the details of the issues I have run into (which I will post under a different topic), I would like to ask two things.
1. Has anyone used managed relations in something substantial? If so, could you share some best practices?
2. Are there any good, comprehensive examples that you have come across and can point me to?
Thanks in advance.
I think this answer might disappoint you, but before you develop on App Engine you should read it anyway, and confirm it in the docs.
No. No one on App Engine has used managed relations for anything 'substantial', simply because Bigtable is not built for managed relations. It is a sharded and sorted array, and as such is a very different kind of data structure from what you would normally use.
Now there are attempts to set up managed relationships - the GAE/Java team is pushing JDO features that come close to this, and there's more info on this blog, but this simply isn't the natural state of things on App Engine, and you'll very quickly run into problems if you decide to spend too much time wrapping yourself in a leaky abstraction.
It's a lot easier to actually look at what Bigtable really is - there are a ton of videos on the Google I/O pages for 2010 and 2009 that do a fantastic job of explaining that - and then figure out ways to map your problem onto the capabilities of the datastore. It may sound unreasonable, but think about it... GAE is a tool that can do certain things exceedingly well, and if you can express your problem in terms of ideas like object stores, sets, merge joins, task queues, pre-computation and caching, then you can use this tool to kick ass.
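To make the "sets and merge joins" idea a bit more concrete, here is a minimal sketch in plain Python. The dicts are only a stand-in for a key/value datastore and none of the names below are App Engine APIs; they are made up for illustration. Instead of a relational join, each property gets its own sorted list of entity keys, and a query with two equality filters is answered by intersecting those lists in one pass.

```python
# Plain-Python stand-in for a key/value datastore (NOT the App Engine API).
entities = {
    "post1": {"author": "ann", "tag": "gae"},
    "post2": {"author": "bob", "tag": "gae"},
    "post3": {"author": "ann", "tag": "sql"},
    "post4": {"author": "ann", "tag": "gae"},
}


def build_index(prop):
    """Map each value of `prop` to a sorted list of entity keys."""
    index = {}
    for key in sorted(entities):
        index.setdefault(entities[key][prop], []).append(key)
    return index


def merge_join(left, right):
    """Intersect two sorted key lists in a single pass over both."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


by_author = build_index("author")
by_tag = build_index("tag")

# "Posts by ann tagged gae" without a relational join.
matches = merge_join(by_author["ann"], by_tag["gae"])
print(matches)
```

The indexes are built once at write time (pre-computation), so the read path only walks two sorted lists.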
So I'm designing this blog engine and I'm trying to just keep my blog data without considering comments or membership system or any other type of multi-user data.
The blog itself revolves around two types of data. The first is the actual blog post entry, which consists of a title, a post body and metadata (mostly dates and statistics), so it's really simple and can be represented by a simple JSON object. The second is the blog admin configuration and personal information. The comment system and anything else will be implemented using Disqus.
My main concern here is the ability of such an engine to scale with spikes in visits (I know you might argue with this, but let's take it for granted). Since I started this project I've been moving along well with the rest of my stack, except for the data layer. I've been stuck in a dilemma choosing the database. I considered MongoDB, but some reviews and benchmarking articles suggested slow reads after collections reach a certain size. Next I looked at Redis and its persistence features, RDB and AOF; while Redis is good at both fast reads and fast writes, I'm wary of using it because I'm not familiar with it. And this whole search keeps leading to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents", etc.
So is there any way I can settle this issue for good, considering that I only need to represent my data in a key/value structure, need fast reads but not fast writes, and need fault tolerance?
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of blogs you read about the downsides of a NoSQL solution are around large data sets - or people that are trying to do relational things with a database designed for de-normalized data.
My list of databases to consider:
Mongo. It has huge community support and based on recent funding - it's going to be around for a while. It runs very well on a single instance and a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL database that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and hit your points, but be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks and you won't find those until you actually try one. One might have quirks that are passable, one will have quirks that are a show stopper.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
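As a rough sketch of what "abstracting the database" could look like for a blog like this, here is a small Python example. The `PostStore` interface, the `InMemoryPostStore` class and the `publish` helper are all hypothetical names made up for illustration; a MongoDB, Couchbase or Riak backend would be another subclass with the same two methods.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class PostStore(ABC):
    """Hypothetical storage interface; the blog code only talks to this."""

    @abstractmethod
    def put(self, slug: str, post: dict) -> None:
        ...

    @abstractmethod
    def get(self, slug: str) -> Optional[dict]:
        ...


class InMemoryPostStore(PostStore):
    """Throwaway backend for tests and proofs of concept."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, slug: str, post: dict) -> None:
        # Store JSON text, as a schemaless key/value store would.
        self._data[slug] = json.dumps(post)

    def get(self, slug: str) -> Optional[dict]:
        raw = self._data.get(slug)
        return json.loads(raw) if raw is not None else None


def publish(store: PostStore, slug: str, title: str, body: str) -> None:
    """Application code depends on the interface, not on a specific database."""
    store.put(slug, {"title": title, "body": body})


store = InMemoryPostStore()
publish(store, "first", "First post", "Hello")
print(store.get("first"))
```

Swapping data stores then means writing one new subclass and changing the line that constructs `store`.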
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.
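A minimal sketch of the static-file idea, assuming posts are JSON-like dicts with `slug`, `title` and `body` fields (those field names, the `PAGE` template and the function names are assumptions for illustration, not part of any particular CMS):

```python
import html
from pathlib import Path

# Bare-bones page template; a real blog would use a proper template engine.
PAGE = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1>{body}</body></html>"
)

posts = [
    {"slug": "hello-world", "title": "Hello & welcome", "body": "<p>First post.</p>"},
]


def render_post(post):
    """Render one post to an HTML string; the body is treated as trusted HTML."""
    return PAGE.format(title=html.escape(post["title"]), body=post["body"])


def build_site(posts, out_dir):
    """Write one .html file per post and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        path = out / f"{post['slug']}.html"
        path.write_text(render_post(post), encoding="utf-8")
        written.append(path)
    return written
```

You would call something like `build_site(posts, "public")` whenever a post is saved, and let the web server (or a CDN) serve the resulting files directly.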
I'm building a website that will rely on heavy computations to make guesses and suggestions about objects (considering the user's preferences and those of users with similar profiles). Right now I'm using MongoDB for my projects, but I suppose that I'll have to go back to SQL for this one.
Unfortunately my knowledge of the subject is at high school level. I know that there are a lot of relational databases, and I was wondering which would be most appropriate for this kind of heavily dynamic cluster analysis. I would also really appreciate suggestions for reading material (free and online would be really nice, but I won't mind reading a book, just preferably not a 1,000-page one).
Thanks for your help, extremely appreciated.
Recommendations are typically a graph-like problem, so you should also consider looking into graph databases, e.g. Neo4j.
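To show why this is graph-like, here is a small sketch in plain Python (no graph database involved, and the data and function name are made up for illustration). Users and items are nodes, a "like" is an edge, and a suggestion is found by walking user -> item -> other user -> other item, scoring each candidate by how much the two users overlap.

```python
from collections import Counter

# Bipartite "likes" graph: user -> set of liked items (toy data).
likes = {
    "alice": {"jazz", "blues", "rock"},
    "bob": {"jazz", "blues", "folk"},
    "carol": {"rock", "metal"},
}


def recommend(user, likes, limit=3):
    """Suggest items liked by users who share likes with `user`."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # number of shared neighbours
        if overlap == 0:
            continue
        for item in theirs - mine:  # items two hops away from `user`
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:limit]


print(recommend("alice", likes))
```

A graph database essentially makes this kind of multi-hop traversal a first-class query instead of a series of joins.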
I have a traditional RDBMS based PHP app that I need to convert over to GAE and would like to properly learn how BigTable works prior to doing this. However, I'd kinda like to do it through sample problems or examples that show the maximal way to think about and utilize a non RDBMS platform such as BigTable...
It seems that this would be a better route than just jumping in with both feet and screwing things up in a one-to-one conversion.
Anyone able to recommend a good starting path that perhaps helped you or something of this nature that will properly initiate someone with App Engine and BigTable?
A good way is to read the source code of good projects running on GAE, like jaikuengine and rietveld.
For articles, the Google I/O 2009 and 2010 talks and the GAE articles are a good resource.
You can also read about column-oriented databases on Wikipedia and look at other projects like Cassandra...
I would recommend having a play with the App Engine Cookbook to see how things work. It has some really good examples and has helped me a lot when trying to understand the Datastore.
http://appengine-cookbook.appspot.com/cat/?id=ahJhcHBlbmdpbmUtY29va2Jvb2tyFwsSCENhdGVnb3J5IglEYXRhc3RvcmUM
I'm going to create a fairly large (from my point of view anyway) web project with a friend. We will create a site with roads and other road-related info.
We estimate that we will have around 100k items in our database. Each item will contain some information like location, name etc. (about 30 fields each). We are counting on having a few hundred thousand unique visitors per month.
The 100k items and their locations (that will be searchable) will be the main part of the page but we will also have some articles, comments, news and later on some more social functions (accounts, forums, picture uploads etc.).
We were going to use Google App Engine to develop our project since it is really scalable and free (at least for a while). But I'm actually starting to doubt that App Engine is right for us. It seems to be for web apps and not sites like ours.
Which system (language/framework etc.) would you guys recommend? It doesn't really matter whether we already know the language (we like learning new stuff), but it would be good if it's something that is future-proof.
I think that GAE can do the job. Google claims that App Engine's free quota can handle about 5 million page views per month, and you only have to start paying if you exceed that quota.
It's also pretty easy to get started. If you don't have experience administering websites and choose a regular hosting service, you will have to worry about several things that you can't even imagine now.
My only concern would be the kind of data and queries you will need, since GAE does not have a relational database. That said, there is an open source project for GAE called GeoModel that gives it the ability to do complex geospatial queries, like proximity fetches. Have a look at their tutorial and the demo app.
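For a feel of how proximity queries can work on a store that only supports key lookups, here is a rough sketch in plain Python. This is not GeoModel's API, just the general grid-cell idea: each item is filed under a coarse cell computed from its coordinates, and a proximity fetch reads the query cell plus its eight neighbours, then filters by real distance. The cell size, the toy data and all names are assumptions for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # assumed cell size in degrees (roughly 11 km north-south)


def cell_of(lat, lon):
    """Coarse grid cell containing the point; used as the lookup key."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(items):
    """Group items by grid cell, as you would by storing the cell as a property."""
    index = defaultdict(list)
    for item in items:
        index[cell_of(item["lat"], item["lon"])].append(item)
    return index


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby(index, lat, lon, max_km):
    """Names of items within max_km, nearest first.

    Only the query cell and its 8 neighbours are scanned, so max_km
    must stay smaller than one cell for the result to be complete.
    """
    cy, cx = cell_of(lat, lon)
    hits = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            for item in index.get((cy + dy, cx + dx), []):
                d = haversine_km(lat, lon, item["lat"], item["lon"])
                if d <= max_km:
                    hits.append((d, item["name"]))
    return [name for _, name in sorted(hits)]


roads = [
    {"name": "Central Road", "lat": 59.334, "lon": 18.063},
    {"name": "North Road", "lat": 59.36, "lon": 18.02},
    {"name": "Uppsala Road", "lat": 59.86, "lon": 17.64},
]
index = build_index(roads)
print(nearby(index, 59.33, 18.06, 5))
```

With 100k items, each lookup touches only a handful of cells instead of scanning everything, which is the property that matters on a datastore without range queries over two dimensions.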
As for your impression that GAE is intended only for web apps, there are a couple of CMSes that run on it.
Good luck!
If one of your concerns is scalability, and you don't want to depend on expensive or commercial tools, I would recommend that you take a look at this tech stack:
Erlang - A programming language designed for concurrency and distribution.
Nitrogen - An Erlang web framework with a lot of cool stuff, like transparent AJAX.
Scalable NoSQL databases, such as CouchDB or Riak - They save you the hassle of SQL code and are more scalable than plain MySQL. Both have native Erlang APIs.
To be honest, I don't know if this tool set is your cup of tea; these are not mainstream solutions. I just suggest them to everyone who asks about size-sensitive web applications.
All serious web frameworks will provide you with what you need. The real issues (for example scalability) might be tackled in a different way depending on what you use, but you won't be limited if you choose a well-known one. The choice of database system (SQL vs. NoSQL) might be more important for that, even if both of those will do fine too.
It's all about knowing how to use, and enjoying using, the tool(s) you've chosen.
In either case, name-dropping some suggestions:
Rails (Ruby)
Django (Python)
Nitrogen (Erlang)
ASP.NET MVC (C#)
And please note, if you really want to learn everything from the bottom up, you'd be fine with any of these (or one of the other gazillion out there). But if you want to perform at your best, choose one that supports a language you know well or uses techniques/tools you have experience with. Think twice about how you weigh "this is fun and we learn a lot" against "we want to be productive and do a really good job".
I just came across the FlockDB graph database. Details at github /flockDB. Twitter says it uses FlockDB for the following:
Twitter runs FlockDB on a large cluster of machines. We use it to store social graphs (who follows whom, who blocks whom) and secondary indices at Twitter.
At first glance, setting it up and trying it out doesn't look straightforward. Has anyone already set it up or used it? If so, please answer the following general questions.
What kind of applications is it better suited for? (Twitter claims it is simple and very rough; it remains to be seen what that means, though.)
How is FlockDB better than other graph DBs / NoSQL DBs? Have you set up FlockDB and used it for an application?
Any early advice?
Note: I am evaluating FlockDB and other graph databases mainly to learn them. Perhaps I will build an application with one of them.
FlockDB is yet to be released by Twitter, which means the current version you are seeing won't run properly. Going by the commit history, I guess that within a couple of days you will see a stable version which you can build and test.
Compared to something like Neo4j, you could say FlockDB is not even a graph database. The toughest part of a graph database is how many levels of depth it can handle. From the little documentation FlockDB has, it seems it can't handle more than one level of depth. Where FlockDB wins compared to DBs like Neo4j is its low latency, high throughput and inherently distributed nature.
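To illustrate what "one level of depth" means in practice, here is a toy sketch in plain Python. It is not FlockDB's API or storage format; the `EdgeStore` class and its method names are made up. Each edge is stored in both directions, so "who does X follow", "who follows X" and set operations over those lists are all single lookups, but anything deeper (friends of friends) has to be stitched together by the application.

```python
from collections import defaultdict


class EdgeStore:
    """Toy adjacency-set store: every edge is kept in both directions."""

    def __init__(self):
        self._forward = defaultdict(set)   # source -> destinations
        self._backward = defaultdict(set)  # destination -> sources

    def add(self, source, dest):
        self._forward[source].add(dest)
        self._backward[dest].add(source)

    def remove(self, source, dest):
        self._forward[source].discard(dest)
        self._backward[dest].discard(source)

    def following(self, user):
        """One-level query: everyone `user` follows."""
        return set(self._forward.get(user, ()))

    def followers(self, user):
        """One-level query in the other direction."""
        return set(self._backward.get(user, ()))

    def mutual(self, user):
        """Set arithmetic over two one-level results."""
        return self.following(user) & self.followers(user)


edges = EdgeStore()
edges.add("ann", "bob")
edges.add("bob", "ann")
edges.add("cat", "ann")
print(edges.followers("ann"), edges.mutual("ann"))
```

The same shape works for edges like "user1 read post1" or "user1 favorites post1": one edge store per relationship type.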
Regarding applications - I guess it will be a great fit whenever you need social networking or Twitter-like behavior. I don't think many will find such use cases though (who gets 20k friend requests per second?).
I just started looking into FlockDB. Right now I am planning to use it in my forum software. Instead of a "user1 follows user2" relationship, I am planning to use it for "user1 read post1", "user1 favorites post1", etc. Being one of the highly active online communities, we get a lot of such traffic (reads/favorites). I can't think of any other use cases right now.
Don't miss OrientDB. It's a document-graph DBMS with a special operator for traversing relationships: http://code.google.com/p/orient/wiki/GraphDatabase