In retrospect, best open source stack/tools to build facebook - google-app-engine

I have been doing high-performance scientific computing in C++ most of my life. I am trying to learn to develop AJAXy web applications. As an exercise, I would like to build something that has a subset of the functionality of Facebook (profiles, posts with comment threads, friend lists) plus the ability to search any post or comment.
I have no experience developing this kind of app, apart from a small amount of toying with Google App Engine using GWT+Java and a little bit of Python. What tools/stack would you suggest for it? I understand that this is a very vague question, but I'd like to get a few opinions and hear your thought process about how you would go about it.
How does the choice change if you want a prototype as fast as possible, versus if you are trying to build something that can scale and last a few cycles of feature requests?
To be more specific, I'm lost in questions like: should I consider Drupal? Should I consider Lucene for search? Would GWT get me what I want in the UI, or would Python+Django be faster to develop with? I should probably not overthink it and just pick something, but some perspective from others would be nice.

If you have started out with Python, that might be the easiest to get going with, especially since you have some experience with Google's App Engine already. However, if you have spent most of your time working with C++ ... did you know that C++ has at least two different full-stack web frameworks?
CppCMS
Wt (WebToolkit)
Remember, it's what you develop fastest in that makes the difference in the long run. What will slow you down most of all is dealing with what you dislike. So, if long compile times kill you, then try Python, Ruby, PHP, or some other dynamic language. If having code that is less than perfectly optimized (and slower than it could be) is what bothers you most, use C++, C#, or Java instead.

One disadvantage of Google App Engine is that there is no CMS like Drupal or Joomla for it, so you're going to have to write your own if you want some of that functionality. The advantages, however, outweigh that disadvantage: development and deployment are easier, you won't have to fiddle with phpMyAdmin or other ugly SQL interfaces, you leverage Google's huge infrastructure, and since it's cloud computing you only pay for what you use. If you want something that you as a developer will be happy with, then I recommend Google App Engine.

Related

Which web solution should I use for my project?

I'm going to create a fairly large (from my point of view anyway) web project with a friend. We will create a site with roads and other road related info.
We estimate that we will have around 100k items in our database. Each item will contain some information such as location, name, etc. (about 30 things each). We are counting on having a few hundred thousand unique visitors per month.
The 100k items and their locations (that will be searchable) will be the main part of the page but we will also have some articles, comments, news and later on some more social functions (accounts, forums, picture uploads etc.).
We were going to use Google App Engine to develop our project since it is really scalable and free (at least for a while). But I'm actually starting to doubt that App Engine is right for us. It seems to be for web apps and not sites like ours.
Which system (language/framework etc.) would you recommend? It doesn't really matter whether we already know the language (we like learning new stuff), but it would be good if it's something that is future-proof.
I think that GAE can do the job. Google claims that App Engine can serve about 5 million page views a month for free, and you only have to start paying if you exceed the free quota.
It's also pretty easy to get started. If you don't have experience administering websites and choose a regular hosting service, you will have to worry about several things that you can't even imagine now.
My only concern would be the kind of data and queries you will need, since GAE does not have a relational database. That said, there is an open source project for GAE called GeoModel that gives it the ability to do complex geospatial queries, such as proximity fetch. Have a look at their tutorial and the demo app.
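To make "proximity fetch" a bit more concrete, here is a minimal plain-Python sketch of the idea: compute the great-circle (haversine) distance from a query point to each item and keep the ones inside a radius, nearest first. This is not GeoModel's API. The function names and the sample roads are made up for illustration, and a real datastore would use some kind of spatial index instead of scanning every item.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_fetch(items, lat, lon, radius_km):
    """Return (distance_km, item) pairs within radius_km, nearest first."""
    hits = []
    for item in items:
        d = haversine_km(lat, lon, item["lat"], item["lon"])
        if d <= radius_km:
            hits.append((d, item))
    # Sort on the distance only, so the item dicts are never compared.
    hits.sort(key=lambda pair: pair[0])
    return hits


# Hypothetical sample data (names are placeholders, not real records).
ROADS = [
    {"name": "Road A", "lat": 59.3293, "lon": 18.0686},
    {"name": "Road B", "lat": 59.8586, "lon": 17.6389},
    {"name": "Road C", "lat": 57.7089, "lon": 11.9746},
]
```

Calling proximity_fetch(ROADS, 59.33, 18.07, 100) keeps only the two entries that lie within 100 km of the query point. With 100k items a linear scan like this is probably still workable for a prototype, but it is the part an index-based library would replace.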
As for your impression that GAE is intended only for small web apps, there are a couple of CMSs that run on it.
Good luck!
If one of your concerns is scalability, and you don't want to depend on expensive or commercial tools, I would recommend that you take a look at this tech stack:
Erlang - A programming language designed for concurrency and distribution.
Nitrogen - An Erlang web framework with a lot of cool stuff, like transparent AJAX.
Scalable NoSQL databases, such as CouchDB or Riak - They save you the hassle of SQL code and are more scalable than plain MySQL. Both have native Erlang APIs.
To be honest, I don't know if this tool set is your cup of tea; these are not mainstream solutions. I just suggest them to everyone who asks about size-sensitive web applications.
All serious web frameworks will provide you with what you need. The real issues (for example scalability) might be tackled in a different way depending on what you use, but you won't be limited if you choose a well-known one. The choice of database system (SQL vs NoSQL) might be more important for that, even if both of those will do fine too.
It's all about
knowing how to use
enjoying using
the tool(s) you've chosen.
In either case, name-dropping some suggestions:
Rails (Ruby)
Django (Python)
Nitrogen (Erlang)
ASP.NET MVC (C#)
And please note, if you really want to learn everything from the bottom, you'd be fine with any of these (or one of the other gazillion out there). But if you want to perform your best, choose one that supports a language you know well or uses techniques/tools you have experience with. Think twice about how you weigh "this is fun and we learn a lot" against "we want to be productive and do a really good job".

Pointers towards developing a quick and dirty business app

Some people have approached me lately about creating a business app for them (I'm a computer tech student specializing in programming, with a bit of experience in systems and driver programming) and it does sound simple, but I don't really have much of an idea how or where to start.
It should be a small-ish app with a database backend. Basically keeping track of invoices, clients, products and the attached data.
Are there any APIs that would make creating such an application much faster and easier? Platform isn't really an issue. I have a Mac, a Windows PC, and I am somewhat well-versed in linux in general, and the client will move to a platform of my choice.
I know very little MySQL, I know Objective C, C and a few others, but building a database product this way seems like a very complicated endeavour considering that a large amount of the code I'll be writing has probably been written before and by better programmers than I.
EDIT : If possible, I would definitely like not having to play around with web frameworks. This is not to say I don't want to see them, it's just that I'm not used at all to the web development model.
I would suggest that you look into Ruby on Rails for something like this. It will take care of a lot of the low-level details of database access for you, and because it is built around the Model-View-Controller paradigm, it will take some of the architectural decisions away from you and let you focus on getting the app done. Using Ruby on Rails, I've built a couple of smallish sites that sound like what you have described, in no time at all.
For quick and dirty, I suggest Ruby on Rails (if you fancy a bit of Ruby) or Grails (if you fancy a bit of Java/Groovy; it is essentially the Java platform equivalent of RoR).
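Whichever framework you end up with, the data model behind an app like this is small. As a rough sketch of what "invoices, clients, products and the attached data" might look like, here it is in Python with the standard-library sqlite3 module rather than Rails, purely for illustration. All table names, column names, and the helper functions are assumptions, not something from the question.

```python
import sqlite3

# Hypothetical schema: prices are stored as integer cents to avoid float rounding.
SCHEMA = """
CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       unit_price_cents INTEGER NOT NULL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                       client_id INTEGER NOT NULL REFERENCES clients(id),
                       issued_on TEXT NOT NULL);
CREATE TABLE invoice_lines (
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity   INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the tables (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def invoice_total_cents(conn, invoice_id):
    """Sum quantity * unit price over all lines of one invoice."""
    row = conn.execute(
        """SELECT COALESCE(SUM(l.quantity * p.unit_price_cents), 0)
           FROM invoice_lines AS l
           JOIN products AS p ON p.id = l.product_id
           WHERE l.invoice_id = ?""",
        (invoice_id,),
    ).fetchone()
    return row[0]


def demo():
    """Insert one client, two products and one invoice; return its total."""
    conn = open_db()
    conn.execute("INSERT INTO clients (id, name) VALUES (1, 'Acme')")
    conn.executemany(
        "INSERT INTO products (id, name, unit_price_cents) VALUES (?, ?, ?)",
        [(1, "Widget", 1500), (2, "Gasket", 250)],
    )
    conn.execute(
        "INSERT INTO invoices (id, client_id, issued_on) VALUES (1, 1, '2009-01-15')"
    )
    conn.executemany(
        "INSERT INTO invoice_lines (invoice_id, product_id, quantity) VALUES (?, ?, ?)",
        [(1, 1, 2), (1, 2, 1)],
    )
    return invoice_total_cents(conn, 1)
```

A framework such as Rails or Grails would generate the equivalent tables and queries from model classes, which is most of the time it saves you.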

High performance site

What technologies should I use when designing a large social website (with a lot of transactions, like Twitter)? I want to use open source solutions for the:
- database
- webserver
- os
Twitter uses Ruby on Rails and Scala
Facebook uses PHP
StackOverflow uses ASP.NET MVC
As you can see, it doesn't really matter what you choose; all of these sites have lots of traffic, but are based on very different technologies.
What matters most in a social networking site is the backend, since most of the bottlenecks will be there. You might want to consider NoSQL databases.
Facebook and Twitter use Cassandra
LinkedIn uses Voldemort
There are a few others like:
Hypertable
MongoDB, used by Sourceforge.
CouchDB
As for the programming language, as others have said, it does not matter that much. But if you really can not decide, you might want to consider a non-blocking webserver like Tornado.
It doesn't matter which scripting language you choose, as long as you make heavy use of memcached. Having the right caching hierarchy is a must.
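The usual way to use a cache like memcached is the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with an expiry time. Below is a minimal sketch of that pattern. TinyCache is a made-up in-process stand-in so the example runs without a memcached server, and load_profile_from_db is a hypothetical placeholder for a slow query; a real client library will have its own method names and arguments.

```python
import time


class TinyCache:
    """In-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


DB_CALLS = {"count": 0}  # counts how often the "database" is actually hit


def load_profile_from_db(user_id):
    """Hypothetical slow database query."""
    DB_CALLS["count"] += 1
    return {"id": user_id, "name": "user%d" % user_id}


def get_profile(cache, user_id, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = "profile:%d" % user_id
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl_seconds)
    return profile
```

Calling get_profile twice for the same user hits the database only once within the expiry window. The hard part in practice is invalidation, i.e. deleting or overwriting the cached entry when the profile changes.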
At the end of the day, this is a matter of personal preference. Twitter uses Ruby on Rails. Wikipedia runs on PHP. Reddit uses a Python library called web.py, but initially it was written in Lisp. I would say pick the technologies you are most familiar with.
A good book on optimizing for high performance websites from the Yahoo engineers is High Performance Web Sites: Essential Knowledge for Front-End Engineers. It is nice and short and basically a bulleted guide on the steps to take to make websites faster by optimizing the less well explored front-end.
As Joel says
People all over the world are constantly building web applications using .NET, using Java, and using PHP all the time. None of them are failing because of the choice of technology.
Choose whichever of the "big 3" (.NET, Java or PHP) you know best. These technologies are known to be scalable; whether or not your site will scale really comes down to how the site is structured and the quality of the code, and using the framework you are most familiar with gives you the best chance of getting those right.
Any technologies that suit your taste. In your situation I think algorithms are more important.
Technologies, techniques:
research what other sites that scaled have used and done, paying more attention to the problems they had than to the successes; there are podcasts on iTunes, and talks and interviews on YouTube
look at industry best practices and follow them to a degree
don't take people's word for it; make sure you see the actual problem or success, as opposed to the PR glitz about it
avoid the obvious things that do not scale vertically or horizontally: database connectivity, sessions, cookies and the like
look at NoSQL storage as an SQL alternative: less overhead, but less functionality
take care when choosing the language/framework. Frameworks come with lots of baggage you do not need; they speed you up initially and slow you down eventually, i.e. you spend more time hacking the framework than building the site. The same goes for languages: pick one because it does what you want rather than because it is trendy, cool to program in, etc.
If you are building something like Facebook, then your choices are a little limited. Facebook made their own PHP runtime; check out HipHop for PHP.

Web Scraping with Google App Engine

I am trying to scrape a website and republish the data as an RSS feed. How hard is this to set up with Google App Engine? What are the advantages and disadvantages of using GAE? Any recommendations and guidelines greatly appreciated!
Google AppEngine offers much more functionality (and complexity) than you will need if truly all you will want to do is republish some structured data as RSS.
Personally, I would use something like Yahoo Pipes for a task like this.
That being said... if you want/need to get your feet wet with GAE, go for it!
Working with Google App Engine is pretty straightforward. I would recommend going through the Getting Started guide. It's short and simple and touches on essential GAE topics. There are more pros and cons than I will list here.
Pros:
In general, App Engine is designed for high-traffic web applications that need to scale. Furthermore, it is designed from a programmer's perspective. Many of the scalability issues (database optimization, server administration, etc.) are dealt with by Google. Having said that, I find it to be a nice platform. It is still being actively developed by Google engineers, and scheduling of tasks (a feature that has long been requested) is on the current roadmap.
Cons:
Perhaps the biggest downside right now is, again, the lack of official scheduling support, along with the quota limits currently set for free accounts. However, you can't complain much if it's free. Currently it only supports Python as a programming interface (although a new language [Java, I predict] is coming soon). Furthermore, Python 2.6 (and 3.0 for that matter) are not yet supported. In addition, Django 1.0 is not officially supported in App Engine (although you can package Django 1.0 with your application).
Harder than it would be in most other technologies.
GAE can sort of do scheduled batch stuff like this now, but it's really not intended for that type of thing. Pick pretty much any other language and platform for this particular task, and you'll make your life a lot easier.
I think BeautifulSoup could run on GAE, so all your scraping needs are handled :D
Also, GAE has a URL fetch API (urlfetch). The only problem I think you might have is not having enough time to get the data (there is a 30-second limit).
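For a sense of how little code the scrape-and-republish step itself needs, here is a rough sketch in modern Python 3 that pulls links out of a page and emits a minimal RSS 2.0 document. To stay self-contained it uses the standard library's html.parser instead of BeautifulSoup and a hard-coded sample page instead of a real fetch; on GAE the page would come from the URL fetch service, which is not shown. The class name, function names, and sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (href, text) for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Return a list of (href, link text) pairs found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def build_rss(title, site_url, links):
    """Build a minimal RSS 2.0 feed with one <item> per scraped link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Scraped from " + site_url
    for href, text in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page content standing in for a fetched document.
SAMPLE_HTML = (
    '<ul><li><a href="http://example.com/1">First post</a></li>'
    '<li><a href="http://example.com/2">Second <b>post</b></a></li></ul>'
)
```

build_rss("Demo feed", "http://example.com/", scrape_links(SAMPLE_HTML)) returns the feed as a string. The parsing and feed generation are the easy part; the time limit on fetching and the lack of scheduling are where GAE gets in the way.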
I am working on a similar project and I've decided that it's easier to prepare the data on another server and push it to GAE.
You might also want to look into Yahoo! Query Language (YQL).

Which programming language is Google App Engine most likely to support next, and why?

Their roadmap says their next release will be in March 2009, and that they'll be adding a new 'runtime language'. I'm hoping it's either Java or PHP but I'm really not sure, and I would like to know which language is the most probable so I can plan accordingly for a project I intend to host on Google App Engine.
Any ideas?
I'd say Java, if only for the reason Android (or, at least, the SDK) is written in Java and they went to the trouble of writing their own interpreter/VM.
If not Java, then Ruby would be my guess. Not sure why, but it feels like a good fit.
I would say that you have to look at a few factors:
The language needs to:
be sandboxable
be controllable
be expandable
be different from python
appeal to people who want to write massively scalable applications
run easily on developer computers
run on Linux
Sandboxable
The language must be safe to run on Google servers. Portions of the language/VM/modules/libraries must be able to be disabled and/or replaced.
Controllable
Notice how Google uses languages that are not controlled by companies?
Python's BDFL GvR works for Google.
Dunno about JavaScript.
Java is open-sourced enough for their taste I suppose.
So the language evolution must allow Google's input at the very least.
Expandable
Google needs to be able to add stuff to the language, and that nearly implies an open-source language. I don't think they are interested in doing an internal fork of an existing language.
Different from Python
Python is mature, easy to learn, and powerful. The new language would have to differ significantly from Python; otherwise, why not just use Python? Maybe a very functional language?
Appeal to massive scalability
Execution time would not be necessarily critical, but the language must be able to support easy start and stop, easy provisioning to other servers, and appeal to the sort of people who are into writing massively scalable applications.
Developer computers
The language needs to be easy to install, maintain, and develop for on Windows, Mac, and Linux. It has to be either fully manageable with text editors or already have rock-solid tools for editing and managing on these platforms.
Linux
Google's servers would run the programs, so these must be safely transferable to Google's servers and run there, and must be controllable by the Google App Engine load balancer, so they need to be unixy.
Brainstorming
I don't think it will be Java (too heavy, hard to modify the VM), PHP (too leaky), Ruby (hard to modify the VM), or C++ (can't be sandboxed, that I know of). I don't think it would be JavaScript either, because it's hard to modularize, and it's not an easy language to learn. That rules out Lisp as well--the hard-to-learn part.
So something else.
Remember though that they want adoption of the tool, and they need a language that would be adoptable by a lot of people and a lot of businesses.
So I lean toward C# with Mono. I think that makes the most sense. I know it sounds scary, but lately the developers of the language have been looking at changing C# quite a bit, to incorporate Python-like dynamic typing, that sort of thing.
Conclusion
So that's what I think. And if they can pull that off, they will be able to leapfrog the competition. Mono is under MIT X11 license (as of April 2008), and I guess Miguel de Icaza can be hired by Google in the future, along with key team members.
So my prediction is C#.
Languages used for production code inside Google are limited to C++, Java, Python, and JavaScript.
App Engine already runs Python, so what's next?
It's most likely JavaScript. I recall Steve Yegge working on a Rails equivalent for JavaScript. See Stevey's Blog Rants: Rhino on Rails.
Java is less likely, but possible. Java servlet containers tend to be heavy-weight.
C++ is possible (Native Client and Chrome are two examples of sandboxed C++ code), but unlikely at this point.
I would say Java too, so they can support Ruby with JRuby, stay compatible with Python through Jython, and support Groovy and so on.
My guess is C# just to stick it to Microsoft.
Yup, JavaScript.
Why?
First, it fits. While there are obvious architectural differences (notably the OOP system) between Python and JavaScript, they are more alike than different, so converting the GAE Python API to a JS API should not be a dramatic leap in design or implementation. In the end, the JS API will likely have much the same flavor as the Python API.
Second, safety. The JS runtime idiom is identical to the Python idiom in that effectively you're going to have JS processes running independently from each other for each request. That is, the classic Apache forking model.
As a hosting service, this model is extremely robust and much, much easier to control than something like Java. What you lose in efficiency compared with a threaded implementation, you gain by simply being Google with a gazillion machines. At Google's scale, administrative overhead trumps performance every day of the week. Simpler and more robust is better, and that's what the process model is.
Third, technology speed. JS is moving VERY quickly right now. Look at the large number of commercial enterprises writing JS interpreters, compilers, and runtimes, as well as the advancements of the language itself. JS has rushed to the front with a vengeance.
Finally, popularity.
While not popular on the server side, JS is still likely the most deployed language in the world, and thereby the most accessible language in the world. Every hack web designer on the planet is becoming a JS programmer, whether they like it or not.
Now, I don't know how many web designers you've met, but most of the ones I have met are NOT programmers. So, adopting JS is going to be a painful, cut-and-paste experience for them, but it's pretty much a requirement for the modern web. Taking that skill and pushing it back to do some lightweight processing on the back end, in the SAME LANGUAGE, will be a boon to these people. Do not discount the power of familiarity in a normally scary environment (and despite the advances, computers are still "scary" to the vast majority of the population).
JS, it's not a toy any more, it's a sleeping giant. Really.
JRuby on Rails.
Already works with Python. There have been rumors about PHP, which is a logical choice considering its popularity.
I'm going to throw in my 2 cents on Java as well. They have a heavy number of tools already written in Java (GWT anyone? etc. etc.)
Though, JavaScript would be most intriguing.
I've heard that Google likes Python the most!

Resources