Related
I have been doing high performance scientific computing in c++ most of my life. I am trying to learn to developing AJAXy web applications. As an exercise, I would like to build something that has a subset of functionality of facebook (profiles, posts with comment threads, friend lists) + the ability to search any post/comment.
I have no experience developing these kind of apps, except minor amount of toying with Google Appengine with GWT+Java and little bit of python. What tools/stack would you suggest using for it? I understand that this a very vague question, but I'd like to get a few opinions and your thought process about how would you go about using it.
How does the choice change, if you want a fast prototype as fast as possible, vs if you are trying to build something that can scale and last a few cycles of feature requests.
To be more specific, I'm lost in questions like, should I consider Drupal, should I consider Lucene for search, Would GWT get me what I want in the UI or would python+django be faster to develop. Probably I should not over think and pick something. But some perspective from others would be nice.
If you have started out with Python, that might be the easiest to get going with, especially since you have some experience with Google's App Engine already. However, if you have spent most of your time working with C++ ... did you know that C++ has at least two different full-stack web frameworks?
CppCMS
Wt (WebToolkit)
Remember, it's what you develop fastest in that makes the difference in the long run. What will slow you down most of all is dealing with what you dislike. So, if long compile times kill you, then try Python, Ruby, PHP, or some other dynamic language. If having code that is less than perfectly optimized (and slower that it could be) is what bothers you most, use C++, C#, or Java instead.
One disadvantage with google app engine is that there is no CMS like drupal or Joomla for google app engine so you're going to have to write your own if you want some of that functionality. The advantages of google app engine however outweigh the disadvantage since you have easier development, easier deployment, won't have to fiddle with phpmyadmin or other ugly sql interfaces, with app engine you also leverage google's huge infrastructure and since it's cloud computing you only pay for what you use. If you want something you as a developer will be most happy about - then I recommend you choose Google App Engine.
I'm interested in web development and by that I mean the bigger projects like facebook or twitter. I know the basics of java, css, php and mysql. I know there is a lot more out there. I read about it. But I don't know what the purpose is and how to put in place.
Things like: Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
I want to know why they need it, how they use it and what te purpose is.
What I need is a book like technical background of facebook for dummies or so.
Are there any books or websites that explain this from scratch?
Thank you!
EDIT:
Thank you for your answers! You have been very helpful. I was in the assumption, experienced programmers know almost anything about the technology there's used today. But as I read, you can only know so much and I need to figure out which technology to use. I take on the encouragement to start building small. And will take on php and improve my skills from there.
Thanks again!
http://highscalability.com/
This is one of the best sites out there. There are several case studies describing what and why many websites use, and pointers to further references. I would also look at the Google Scalability Conference 2007 talks
http://www.google.com/search?q=Google+Scalability&hl=en&client=firefox-a&hs=YUg&rls=org.mozilla:en-US:official&prmd=v&source=univ&tbs=vid:1&tbo=u&ei=fl4OTPUkorIwueCQxQw&sa=X&oi=video_result_group&ct=title&resnum=4&ved=0CDIQqwQwAw
It's all about choosing the right tool for the job in my eyes. There is so much technology out there it's impossible to learn it all. Just choose the subset that will work for you.
The best place to start is by building small simple websites, and as you come accross problems that you need solved you research the tools needed to solve those problems.
If you attack all of the areas at once, it's going to be overwhelming and you will not get anywhere.
For a general overview on what each of the technologies does, Wikipedia gives a good overview on most technologies.
If you are interested in database content which it seems like you are, a good place to start is reading up on normalisation.
Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
Those I would search on Wikipedia for to get a quick overview. Facebook is written in PHP/MySQL. There are some books on the subject of creating social networking sites, and some books have gotten decent reviews on Amazon.com, however, I have not read any of them myself.
If I were you, I'd start with PHP/MySQL and sit down and write a simple social network. Break the project down into components and tasks and Google for each challenge you encounter such as sessions, database structure, security, friend structure, and processing POST and GET requests.
You'll learn a lot and you get the big picture. Once you see the big picture, you can take another look at different technologies that are available and then decide which component you could have developed better with other tools. I personally don't think that looking too much into the technology available is good for someone who is still in the beginning stages. Start doing, learn from it, and then your questions become much more specific and a lot of things will make more sense.
The problem you're having is you're looking at smaller, specialty products, and not at larger, more mature technologies. Wikipedia will actually give you a decent overview of most of the medium-and-large projects out there.
Cassandra, Hadoop, Mongo, and NoSQL are all lovely... but they're specialty tools. SQL is a general purpose solution that works for 99% of the sites on the net.
Unix/Linux isn't a specialty tool; you might want to try going to Ubuntu's website and installing Linux, and just using it day-to-day, the way you'd use Windows. When you need to figure out something new, like setting up a webserver, do it on the Linux box and a Windows box, and you'll eventually learn linux pretty darn well.
As far as scripting, O'Reilly makes a great line of books on Bash, Perl, and Python.
JVM is a Java Virtual Machine, which is a core of getting Java code to go. Sun's website has a great set of tutorials on learning Java.
It might be much, much easier to pick a project (or three) that you'd like to learn, and learn some of these by doing. I'd probably suggest learning some SQL before learning the newly established alternatives; that lets you learn the rest of the system, as SQL is pretty easy. Once you've got the rest of the thing solid, try swapping in a NoSQL solution at that point.
There are a lot of frameworks that do a lot of different things. You've named a lot of different things from a lot of different areas. The best way to think of these things is to group them by category. Here's an example:
Suppose you have a laptop and you want to host a website. You'll need the following at a minimum:
1) Web Server software. Two popular options are Microsoft's IIS and Apache Web Server.
That's really all you need. You can set up your www_root folder and load files into it. Assuming everything is configured properly, you can now load HTML pages into that folder and access them through your IP address. Every page you view in your web browser is in HTML format. CSS is a stylesheet language that defines how your HTML will be formatted. You can also start writing Javascript, as most modern browsers support the client-side scripting language.
Chances are you'll want the following as well:
2) Database software. Two popular options are Microsoft's SQL Server and MySQL
3) Server-side scripting. PHP is very popular, as is ASP. You'll need the runtime deployed on your server. Python, Ruby, Perl, etc all fall under this category.
4) Web Application Framework(s). This will provide you with libraries for your language of choice to help develop web applications and websites. CakePHP, Ruby on Rails, and the Google Web Toolkit are examples of web application frameworks.
Additionally, you may want to utilize:
5) Additional libraries. JQuery, for example, is quickly becoming a popular library for Javascript that handles a lot of common tasks for you. Instead of writing complex effects code and what-not yourself, just use the pre-written code in the JQuery library.
6) Data interchange technology. If you are passing a lot of information back and forth, you will likely want to encapsulate this data in a logical format. Ideally, this format would describe the data and allow your applications to easily read/process it following a standard. This is where XML and JSON come into play.
I can't recommend a good book for you to learn this stuff, but I feel that the collective replies to your question here should be more than enough to get you started.
Ultimately, what you need to do is determine what technologies you need, and then choose the right one for the job. Don't go building an application using Ruby on Rails just because it's what Twitter used, but rather choose it because it provides some advantage to you over the other options.
What technologies should I use when designing for a large social website (with a lot of transactions, like twitter)? using open source solutions
- database
- webserver
- os
Twitter uses Ruby-on-rails and Scala
Facebook uses PHP
StackOverflow uses asp.net mvc
As you can see, it doesn't really matter what you choose; all of these sites have lots of traffic, but are based on very different technologies.
What matters most in a social networking sites is the backend, since most of the bottleneck will be from there. You might want to consider No-SQL databases.
Facebook and Twitter use Cassandra
LinkedIn uses Voldemort
There are a few others like:
Hypertable
MongoDB, used by Sourceforge.
CouchDB
As for the programming language, as others have said, it does not matter that much. But if you really can not decide, you might want to consider a non-blocking webserver like Tornado.
Doesn't matter what kind of scripting language you'll choose, as long as you'll heavily utilize memcached. Having the right caching hierarchy is a must.
At the end of the day, this is a matter of personal preference. Twitter uses Ruby on Rails. Wikipedia runs on PHP. Reddit uses a Python library called web.py, but intitially, it was written in Lisp. I would say pick the technologies you are most familiar with.
A good book on optimizing for high performance websites from the Yahoo engineers is High Performance Web Sites: Essential Knowledge for Front-End Engineers. It is nice and short and basically a bulleted guide on the steps to take to make websites faster by optimizing the less well explored front-end.
As Joel says
People all over the world are constantly building web applications using .NET, using Java, and using PHP all the time. None of them are failing because of the choice of technology.
Choose whichever of the "big 3" (.Net, Java or PHP) that you know best - these technologies are known to be scalable, the real question of whether or not your site will scale is how the site is structured and the quality of the code - using whichever framework you are most familiar with gives you the best chance of achieving that.
Any technologies that suite your taste, In your situation I think algorithms is more important.
Technologies, techniques ,
research what other scaled sites have used and done and what the problems they had were less than he successes, there are podcasts on iTunes, talks and interviews on Youtube
look at industry best practices and follow them to a degree
don't take peoples word for it, make sure you see the problem or the success as opposed to the pr glitz about it
avoid obvious things that do not scale vertically or horizontally, database connectivity, sessions - cookies and the like
look at nosql storage as an sql alternative less overhead but less functionality
take care when looking at the language/framework. frameworks come with lots of baggage you do not need, they speed you up initially and slow you down eventually, i.e. you spend more time hacking the framework than building the site, same with languages does it do what you want rather than be trendy, cool to programme in etc.
If you are building something like Facebook, then your choices are a little limited, Facebook made their own PHP Runtime, check HipHop For PHP
I know this is not a programming question per se, but I wanted to get as much input from the SO community on a new project I hope to get started. The project is from being started from scratch and thus every decision for programming languages, databases, frameworks, platforms and what not are up in the air. I'm hoping to get your opinion on the matter, what you feel are the strengths and weaknesses of each option.
Database:
Currently I have the option of using MSSQL or MySQL. While I am leaning towards using MySQL because it is free and most probably has all the features I need. However, there is the possibility of having a lot of hierarchical data and the new hierarchical data type in MSSQL is quite appealing. Does it really simplify matters that much? Also MSSQL supports many more advanced SQL functions that may or may not be useful in the long run. While for development I can get access to Server 2008, multiple licenses as the development team grows and for production, are the costs justified?
Programming Languages:
The project will have a web based front end UI and a server based component that will do some heavy lifting.
For the web based UI, I was thinking of maybe doing Apache/IIS with PHP or IIS with ASP.Net in C#. I'd like to use a good framework to properly utilize good design patterns that should structure the code and development of the app. As well as make modifications in the long run easy to implement. I also want the GUI to look good and don't like the idea of buying .Net controls from component vendors. Instead I prefer the idea of using good CSS, and open sources like YUI and javascript to make the UI sleek.
For the server based component, I was thinking of using C#. I have no real development experience in C++ and I'd like good libraries and sufficient speed is good enough. However, while the web based UI and server based component is loosely coupled, there may be instances where the UI needs to communicate (call methods and what not) with the server based component and I want to pick languages/frameworks that will play nice with each other.
All suggestions on frameworks to incorporate are welcome.
Version Control:
I have had good experiences with SVN and a pretty bad experiences with TFS. I've never worked with GIT. Which do you think is better in terms of features as well as general developer familiarity. I want to pick something that other developers will know and not have trouble with.
I apologize if the questions are bit redundant or I'm not providing enough information or using bad terminology. I plan to edit and improve the question as I get feedback. Thanks!
EDIT:
Who: This would most probably be a startup formed of college students or junior developers. I want the project to utilize technologies that most people are familiar with or are easy to pick up.
What: I'd need hours and days to explain the solution. But in the end when you break it down, its a web based UI (think standard web app to just manage database data) that would be used to knowledgeable clients. The server based component would be very separate except for the fact that it should be able to communicate with the web app.
I can provide more information as required but I would appreciate an opportunity for users to answer and provide their ideas before you hastily close the question.
Obviously it depends a lot on specific requirements, but then again, even with those I probably wouldn't be able to tell for sure!
I've been working on a from-scratch project myself for a couple of months, and have generally found:
Choosing Microsoft for all the layers just goes down much easier (my subjective opinion). For example I would use C# for the UI, the back end, and use MSSQL for the database. Nothing at all wrong with non-Microsoft vendors, I'm no Microsoft fan-boy, I just struggle to get productive with unfamiliar tools. Depends where your experience lies though.
Database: In particular I've found that .NET and MSSQL go easily together. When I started the project I was using a PostgreSQL (because it's free, fully featured and has open-source warm fuzzies). However I abandoned it in favour of MSSQL simply because it was taking me too long to get database work done in an unfamiliar language with unfamiliar tools. Also, I'm not sure MSSQL is so expensive anymore, for example for a web application, MSSQL 2008 Web Edition is pretty damn cheap per-processor I think (only on SPLA licensing though). If you're concerned about database features in a free implementation though, personally I think PostgreSQL has a very full feature set, nicely standardised, and rapidly growing.
UI: I'm pretty inexperienced, but ASP.NET MVC looks far less painful to me than ASP.NET Web Forms. I like PHP too, but again I'd match the UI language with the back-end language, so would recommend .NET.
On frameworks, I'm immersed in DALs at the moment. I like Subsonic for lightweight data, NHibernate for heavy-weight.
I still have a long way to go with my project so perhaps I can only see the short-term benefits and drawbacks at the moment. But in general I would say: use the technologies that you're most comfortable using, as you'll be way more productive and the end result will probably be about the same anyway. If you want to learn new technologies though, and who doesn't? - go ahead, just expect it to take a lot longer.
Didn't want to answer 'cause it's so open ended. But a few points:
Money
First, check out BizSpark. That should take care of any money aspect for 3 years. For a service company, that means not only free VS Team Suite and Office and so on, but free Windows, SQL, etc. If your startup can't afford to spend a bit on MS tech in 3 years, it's probably a bad business. So that takes out licensing.
On a similar note, Sun has Startup Essentials. Could be interesting on the hardware side of things, but I haven't actually competitively priced them versus Dell/HP.
Software
It doesn't sound like you have hard enough requirements to say "oh, this slightly-less-popular software X is perfect for my domain Y and is gonna give me a very big boost". In fact, your project might not be like that at all. Maybe it, technically, is going to be a relatively plain application just pushing data around or whatever. You didn't specify.
For a small startup, personal productivity is probably going to trump any other argument. If your people are excellent in X, then that's one of your top arguments right there.
If you really don't have any particular system you're most comfortable with, be conservative. Stick with .NET or Java, as they'll give you the widest range of useful possibilities.
As far as things like OS and Database, I'm biased, but I think Microsoft will give you platforms that are easier to take advantage of than you'll find elsewhere. For instance, setting up load balancing, clustering, centralized authentication, managing servers (updates, events, etc.) is going to be easier to get going on Windows than it would be on another platform, assuming you're not an expert in either. Configuring SQL Server, even the advanced features, is a piece of cake. (Go time someone who knows neither: Setup a DB mirror in MSSQL and MySQL -- which is going to take more work?) Again, this is all predicated on you not having experts in a particular set of technology.
Don't mix -- whatever you do, stick with the platform. If you go .NET, MSSQL is going to work better with the data providers (or things like Linq-to-SQL). If you decide to do PHP, then use MySQL as everyone else uses it and you'll encounter less resistance. If you're not inventing stuff on the technical side, don't become an edge case.
You should pick the platform first, then the language that is best for that platform (if there is any choice).
One thing you should consider is the labor pool, and labor pool cost, for specific platforms and languages. Human Resources can often get cost metrics, if you don't have ideas already.
In my town, for example, .NET platform is much more expensive per Software Engineer than open source, because the .NET developers have a higher rate (40% roughly). C# is a little higher rate than VB.NET, but also tends to bring more well rounded candidates.
Just to throw in something totally different: How about weblocks as a web framework? It uses Hunchentoot as a server, which can run either standalone or with Apache. This is all done in Common Lisp. Weblocks can use cl-sql as a backend store, which can connect to many different RDBMs (MySQL, PostgreSQL, Oracle, ODBC, SQLite).
Their roadmap says their next release will be in March 2009, and that they'll be adding a new 'runtime language'. I'm hoping its either Java or PHP but realy not sure, and would like to know which language is the most probable so i can plan accordingly for a project I plan on hosting with google app engine.
Any ideas?
I'd say Java, if only for the reason Android (or, at least, the SDK) is written in Java and they went to the trouble of writing their own interpreter/VM.
If not Java, then Ruby would be my guess. Not sure why, but it feels like a good fit.
I would say that you have to look at a few factors:
The language needs to:
be sandboxable
be controllable
be expandable
be different from python
appeal to people who want to write massively scalable applications
can be run on developer computers easily
run on Linux
Sandboxable
The language must be safe to run on Google servers. Portions of the language/VM/modules|libraries must be able to be disabled and/or replaced.
Controllable
Notice how Google uses languages that are not controlled by companies?
Python's BDFL GvR works for Google.
Dunno about Javascript.
Java is open-sourced enough for their taste I suppose.
So the language evolution must allow Google's input at the very least.
Expandable
Google needs to be able to add stuff to the language, and that nearly implies an open-source language. I don't think they are interested in doing an internal fork of an existing language.
Different from Python
Python is mature, easy to learn, and powerful. The new language would have to have significant differences with python, otherwise, why not just use Python. Maybe a very functional language?
Appeal to massive scalability
Execution time would not be necessarily critical, but the language must be able to support easy start and stop, easy provisioning to other servers, and appeal to the sort of people who are into writing massively scalable applications.
Developer computers
The language needs to be able to be easy to install, maintain, and develop for on Windows, Mac, and Linux. It has to be either fully manageable with text editors or already have rock solid tools for editing and managing on these platforms.
Linux
Google servers would run the programs, so these must be able to be safely transferred on google servers and run there, and must be able to be controllable by the Google App Engine load-balancer, so they need to be unixy.
Brainstorming
I don't think it will be Java (too heavy, hard to modify VM), php (too leaky), ruby (hard to modify VM), C++ (can't be sandboxed(that I know of)). I don't think it would be JavaScript either, because it's hard to modularize, and it's not an easy language to learn. That rules out Lisp as well--the hard-to-learn part.
So something else.
Remember though that they want adoption of the tool, and they need a language that would be adoptable by a lot of people and a lot of businesses.
So I lean to C# with mono. I think that makes the most sense. I know it sounds scary but lately the developers of the language are looking at changing C# quite a bit, to incorporate python-like dynamic typing, that sort of thing.
Conclusion
So that's what I think. And if they can pull that off, they will be able to leapfrog the competition. Mono is under MIT X11 license (as of April 2008), and I guess Miguel de Icaza can be hired by Google in the future, along with key team members.
So my prediction is C#.
Languages used for production code inside Google are limited to C++, Java, Python, and JavaScript.
Apps Engine already runs Python, so what's next?
It's most likely JavaScript. I recall Steve Yegge working on a Rails equivalent for JavaScript. See Stevey's Blog Rants: Rhino on Rails.
Java is less likely, but possible. Java servlet containers tend to be heavy-weight.
C++ is possible (Native Client and Chrome are two examples of sandboxed C++ code), but unlikely at this point.
I would say Java too, so they can support Ruby with JRuby, compatible with Python with Jython, Groovy and so on.
My guess is C# just to stick it to Microsoft.
Yup, JavaScript.
Why?
First, it fits. While there are obvious architectural differences (notably the OOP system) between Python and JavaScript, they are closer than they are farther apart, so converting the GAE Python API to A JS API should not be a dramatic leap in design or implementation. In the end, the JS API will likely have much the same flavor of the Python API.
Second, safety. The JS runtime idiom is identical to the Python idiom in that effectively you're going to have JS processes running independently from each other for each request. That is, the classic Apache forking model.
As a hosting service, this model is extremely robust and much, much easier to control than something like Java. What you lose in efficiency via a threaded implementation, you gain by simply being Google with a gazillion machines. At Googles scale, administrative overhead trumps performance every day of the week. Simpler and more robust is better, and that's what the process model is.
Third, technology speed. JS is moving VERY quickly right now. Look at the larger number of commercial enterprises writing JS interpreter/compiler/runtimes, as well as the advancements of the language itself. JS script has rushed to the front with a vengeance.
Finally, popularity.
While not popular on the server side, JS is still likely the most deployed language in the world, and thereby the most accessible language in the world. Every hack web designer on the planet is becoming a JS programmer, whether they like it or not.
Now, I don't know how many web designers you've met, but most of the ones I have met are NOT programmers. So, adopting JS for them is going to be a cut and paste and painful experience for them, but it's pretty much a requirement for the modern web. Taking that skill to push back and do some lightweight processing on the back end, in the SAME LANGUAGE, will be a boon to these people. Do not discount the power of familiarity in a normally scary environment (and despite the advances, computers are still "scary" to the vast majority of the population).
JS, it's not a toy any more, it's a sleeping giant. Really.
JRuby on Rails.
Already works with Python. There have been rumors about PHP, which is logical choice considering it's popularity.
I'm going to throw in my 2 cents on Java as well. They have a heavy number of tools already written in Java (GWT anyone? etc. etc.)
Though, Javascript would be most intriguing.
I`ve heard once that Google likes Python the most!