I know the big picture but can't put it in place - database

I'm interested in web development and by that I mean the bigger projects like facebook or twitter. I know the basics of java, css, php and mysql. I know there is a lot more out there. I read about it. But I don't know what the purpose is and how to put in place.
Things like: Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
I want to know why they need it, how they use it and what te purpose is.
What I need is a book like technical background of facebook for dummies or so.
Are there any books or websites that explain this from scratch?
Thank you!
EDIT:
Thank you for your answers! You have been very helpful. I was in the assumption, experienced programmers know almost anything about the technology there's used today. But as I read, you can only know so much and I need to figure out which technology to use. I take on the encouragement to start building small. And will take on php and improve my skills from there.
Thanks again!

http://highscalability.com/
This is one of the best sites out there. There are several case studies describing what and why many websites use, and pointers to further references. I would also look at the Google Scalability Conference 2007 talks
http://www.google.com/search?q=Google+Scalability&hl=en&client=firefox-a&hs=YUg&rls=org.mozilla:en-US:official&prmd=v&source=univ&tbs=vid:1&tbo=u&ei=fl4OTPUkorIwueCQxQw&sa=X&oi=video_result_group&ct=title&resnum=4&ved=0CDIQqwQwAw

It's all about choosing the right tool for the job in my eyes. There is so much technology out there it's impossible to learn it all. Just choose the subset that will work for you.

The best place to start is by building small simple websites, and as you come accross problems that you need solved you research the tools needed to solve those problems.
If you attack all of the areas at once, it's going to be overwhelming and you will not get anywhere.
For a general overview on what each of the technologies does, Wikipedia gives a good overview on most technologies.
If you are interested in database content which it seems like you are, a good place to start is reading up on normalisation.

Scribe, thrift, casandra, Unix/Linux, shell/perl/python scripting, PostgreSQL, MongoDB, non-relational NoSQL datastores, JVM, nginx
Those I would search on Wikipedia for to get a quick overview. Facebook is written in PHP/MySQL. There are some books on the subject of creating social networking sites, and some books have gotten decent reviews on Amazon.com, however, I have not read any of them myself.
If I were you, I'd start with PHP/MySQL and sit down and write a simple social network. Break the project down into components and tasks and Google for each challenge you encounter such as sessions, database structure, security, friend structure, and processing POST and GET requests.
You'll learn a lot and you get the big picture. Once you see the big picture, you can take another look at different technologies that are available and then decide which component you could have developed better with other tools. I personally don't think that looking too much into the technology available is good for someone who is still in the beginning stages. Start doing, learn from it, and then your questions become much more specific and a lot of things will make more sense.

The problem you're having is you're looking at smaller, specialty products, and not at larger, more mature technologies. Wikipedia will actually give you a decent overview of most of the medium-and-large projects out there.
Cassandra, Hadoop, Mongo, and NoSQL are all lovely... but they're specialty tools. SQL is a general purpose solution that works for 99% of the sites on the net.
Unix/Linux isn't a specialty tool; you might want to try going to Ubuntu's website and installing Linux, and just using it day-to-day, the way you'd use Windows. When you need to figure out something new, like setting up a webserver, do it on the Linux box and a Windows box, and you'll eventually learn linux pretty darn well.
As far as scripting, O'Reilly makes a great line of books on Bash, Perl, and Python.
JVM is a Java Virtual Machine, which is a core of getting Java code to go. Sun's website has a great set of tutorials on learning Java.
It might be much, much easier to pick a project (or three) that you'd like to learn, and learn some of these by doing. I'd probably suggest learning some SQL before learning the newly established alternatives; that lets you learn the rest of the system, as SQL is pretty easy. Once you've got the rest of the thing solid, try swapping in a NoSQL solution at that point.

There are a lot of frameworks that do a lot of different things. You've named a lot of different things from a lot of different areas. The best way to think of these things is to group them by category. Here's an example:
Suppose you have a laptop and you want to host a website. You'll need the following at a minimum:
1) Web Server software. Two popular options are Microsoft's IIS and Apache Web Server.
That's really all you need. You can set up your www_root folder and load files into it. Assuming everything is configured properly, you can now load HTML pages into that folder and access them through your IP address. Every page you view in your web browser is in HTML format. CSS is a stylesheet language that defines how your HTML will be formatted. You can also start writing Javascript, as most modern browsers support the client-side scripting language.
Chances are you'll want the following as well:
2) Database software. Two popular options are Microsoft's SQL Server and MySQL
3) Server-side scripting. PHP is very popular, as is ASP. You'll need the runtime deployed on your server. Python, Ruby, Perl, etc all fall under this category.
4) Web Application Framework(s). This will provide you with libraries for your language of choice to help develop web applications and websites. CakePHP, Ruby on Rails, and the Google Web Toolkit are examples of web application frameworks.
Additionally, you may want to utilize:
5) Additional libraries. JQuery, for example, is quickly becoming a popular library for Javascript that handles a lot of common tasks for you. Instead of writing complex effects code and what-not yourself, just use the pre-written code in the JQuery library.
6) Data interchange technology. If you are passing a lot of information back and forth, you will likely want to encapsulate this data in a logical format. Ideally, this format would describe the data and allow your applications to easily read/process it following a standard. This is where XML and JSON come into play.
I can't recommend a good book for you to learn this stuff, but I feel that the collective replies to your question here should be more than enough to get you started.
Ultimately, what you need to do is determine what technologies you need, and then choose the right one for the job. Don't go building an application using Ruby on Rails just because it's what Twitter used, but rather choose it because it provides some advantage to you over the other options.

Related

How to develop a rapid database & web application using Sybase and Power Builder?

I am new in Sybase and Power Builder.
What are the best references and web resources to learn them in a useful and fast way ?
For now, I use http://www.sybase.com as my base reference.
Does anyone know good and practical tutorials for Power Builder V.12 .Net ?
From Where I can download a complete version of it and use it for building my application ?
I know the data window is the magical part in Power Builder and I need to know how to create and use a professional data window and how to make interaction and pass paramters between them and also how to dispaly different views like Master-Detail relationship and Tree-Sturcture or List-Structure and so on.
I would like to know the information I need to build a rapid web and database application plus customzing and editing the existing desktop application.
There is a 45 trial version of Powerbuilder which you can download from here:
http://response.sybase.com/forms/PB12Eval
Sybase's books that come with Powerbuilder are fairly comprehensive and quite a good way to get started.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.infocenter.pb.12.1/doc/html/title.html
I can't help with Powerbuilder but Rob Verschoor's Sypron.nl is the place to pick up Sybase information - there's loads in there from common "gotachas" to quizzes. His quick reference is terrific and well worth investing in.
Well, it all depends on how much experience you already have (not counting Sybase & PB).
Then there is the issue that app development and Db design are completely different disciplines: people who are good at one are rarely good at the other. Assuming you are the app developer, hire a good modeller/DBA. Product manuals are for reference only; you cannot learn how to code; put an app together; what code segments should be deployed where; best practice; etc from them.
To take even reasonable advantage of the DataWindow, you need a good Database (normalised, genuinely relational, security, etc), AND reasonable experience handling the client/server model (send SQL batch to server; receive & process result set).
You need a good PFC Library. The lib that comes with PB is fat as, and very slow. The first thing we do is strip that down, and create our own, to reduce .pbl size and increase speed.
Last but not least, a good handle on stored proc and Transaction rules. That requirement is true for any app, not just Sybase/Pb.
It sort of depends on what you want to do. PB12 comes with 2 IDEs -- PB12.NET is for creating WPF applications in .NET. PB12 Classic is for creating traditional PB applications as well as WinForm and WebForm .NET type applications.
There are some videos available on PB (some free, some paid)
Yakov Werde has a video titled "Essential PowerBuilder Series" that's about $700. You can see a free preview of it here.
Sybase has some free videos and tutorials here.
Also, there are some older PowerBuilder books like "PowerBuilder 9 Advanced Client/Server Development" that may help you. They're a little dated and don't cover any of the .NET stuff, but a lot of the basics are still the same.
The information you are looking for has always been somewhat of a challenge to find. Without all the corporate training I'm not sure how I would have learned as quickly.
If you are like me then learning by example is probably the best way. I'd go to codeplex and get yourself a working (and well designed) application to learn by example.
Also, believe it or not, I have learned a thing or two by reading the online documentation. Considering you are working with a new version of PB you've got your work cut out for you there isn't much out there. You may contact me if you have specific questions, if I have time then I would be happy to help a fellow developer.

Which web solution should I use for my project?

I'm going to create a fairly large (from my point of view anyway) web project with a friend. We will create a site with roads and other road related info.
Our calculations is that we will have around 100k items in our database. Each item will contain some information like location, name etc. (about 30 thing each). We are counting on having a few hundred thousand unique visitors per month.
The 100k items and their locations (that will be searchable) will be the main part of the page but we will also have some articles, comments, news and later on some more social functions (accounts, forums, picture uploads etc.).
We were going to use Google AppEngine to develop our project since it is really scalable and free (at least for a while). But I'm actually starting to doubt that AppEngine is right for us. It seems to be for webbapps and not sites like ours.
Which system (language/framework etc.) would you guys recommend us to use? It doesn't really mater if we know the language since before (we like learning new stuff) but it would be good if it's something that is future proof.
I think that GAE can do the job. Google claims that Google App Engine is able to handle 5 million visitors for free and you will have to start paying only if you exceed their free quota.
It's also pretty easy to get started. If you don't have experience on administrating websites and choose a regular hosting service, you will have to worry about several things that you don't even imagine now.
My only concern would be with respect of the kind of data and queries you will have to do, since it does not have a relational database. Anyway, there is an open source project for GAE, called GeoModel that gives GAE the ability to do complex geo spacial queries, like proximity fetch. Have a look at their tutorial and the demo app.
About your impression that GAE was intended only for small web apps, there are a couple of CMS that run on it.
Good luck!
If once of your concerns is scalability, and you don't want to depend on expensive or commercial tools, I would recommend that you take a look at this tech stack:
Erlang - A programming language designed for concurrency and distribution.
Nitrogen - An Erlang web framework with a lot of cool stuff, like transparent AJAX.
NoSQL scalable databases, such as CouchDB or Riak - Save the the hassle of SQL code and are more scalable than plain MySQL. Both has direct native Erlang API.
To be honest, I don't know if this tool set is your cup of tea; These are not mainstream solutions. I just suggest these to everyone who ask about size-sensitive web applications.
All serious web frameworks will provide you with what you need. The real issues (for example scalability) might be tackled in a different way depending on what you use, but you wont be limited if you choose a well-known one. The choice of database system might be more important for that (sql vs nosql), even if both of those will do fine too.
It's all about
knowing how to use
enjoying to use
the tool(s) you've chosen.
In either case, name-dropping some suggestions:
Rails (Ruby)
Django (Python)
Nitrogen (Erlang)
ASP.NET MVC (C#)
And please note, if you really want to learn everything from the bottom, you'd be fine with any of these (or one of the other gazillion out there). But if you want to perform your best, choose one that supports a language you know well or uses techniques/tools you have experience of etc. Think twice about how you value this is fun and we learn a lot against we want to be productive and do a really good job.

Pointers towards developing a quick and dirty business app

Some people have approached me lately about creating a business app for them (I'm a computer tech student specializing in programming, with a bit of experience in systems and driver programming) and it does sound simple, but I don't really have much of an idea how or where to start.
It should be a small-ish app with a database backend. Basically keeping track of invoices, clients, products and the attached data.
Are there any APIs that would make creating such an application much faster and easier? Platform isn't really an issue. I have a Mac, a Windows PC, and I am somewhat well-versed in linux in general, and the client will move to a platform of my choice.
I know very little MySQL, I know Objective C, C and a few others, but building a database product this way seems like a very complicated endeavour considering that a large amount of the code I'll be writing has probably been written before and by better programmers than I.
EDIT : If possible, I would definitely like not having to play around with web frameworks. This is not to say I don't want to see them, it's just that I'm not used at all to the web development model.
I would suggest that you look into Ruby on Rails for soemthing like this. It will take care of a lot of the low level details of database access for you and because it is built around the Model-View-Controller paradigm, it will take away some of the architectural decision from you and make you focus on getting the app done. Using Ruby on Rails, I've built a couple of sites of smallish scale that sound like what you have done in no time at all.
For quick and dirty, I suggest Ruby on Rails (if you fancy a bit of Ruby), or Grails (if you fancy a bit of Java/Groovy, and is essentially the Java platform equivalent of RoR).

High performance site

What technologies should I use when designing for a large social website (with a lot of transactions, like twitter)? using open source solutions
- database
- webserver
- os
Twitter uses Ruby-on-rails and Scala
Facebook uses PHP
StackOverflow uses asp.net mvc
As you can see, it doesn't really matter what you choose; all of these sites have lots of traffic, but are based on very different technologies.
What matters most in a social networking sites is the backend, since most of the bottleneck will be from there. You might want to consider No-SQL databases.
Facebook and Twitter use Cassandra
LinkedIn uses Voldemort
There are a few others like:
Hypertable
MongoDB, used by Sourceforge.
CouchDB
As for the programming language, as others have said, it does not matter that much. But if you really can not decide, you might want to consider a non-blocking webserver like Tornado.
Doesn't matter what kind of scripting language you'll choose, as long as you'll heavily utilize memcached. Having the right caching hierarchy is a must.
At the end of the day, this is a matter of personal preference. Twitter uses Ruby on Rails. Wikipedia runs on PHP. Reddit uses a Python library called web.py, but intitially, it was written in Lisp. I would say pick the technologies you are most familiar with.
A good book on optimizing for high performance websites from the Yahoo engineers is High Performance Web Sites: Essential Knowledge for Front-End Engineers. It is nice and short and basically a bulleted guide on the steps to take to make websites faster by optimizing the less well explored front-end.
As Joel says
People all over the world are constantly building web applications using .NET, using Java, and using PHP all the time. None of them are failing because of the choice of technology.
Choose whichever of the "big 3" (.Net, Java or PHP) that you know best - these technologies are known to be scalable, the real question of whether or not your site will scale is how the site is structured and the quality of the code - using whichever framework you are most familiar with gives you the best chance of achieving that.
Any technologies that suite your taste, In your situation I think algorithms is more important.
Technologies, techniques ,
research what other scaled sites have used and done and what the problems they had were less than he successes, there are podcasts on iTunes, talks and interviews on Youtube
look at industry best practices and follow them to a degree
don't take peoples word for it, make sure you see the problem or the success as opposed to the pr glitz about it
avoid obvious things that do not scale vertically or horizontally, database connectivity, sessions - cookies and the like
look at nosql storage as an sql alternative less overhead but less functionality
take care when looking at the language/framework. frameworks come with lots of baggage you do not need, they speed you up initially and slow you down eventually, i.e. you spend more time hacking the framework than building the site, same with languages does it do what you want rather than be trendy, cool to programme in etc.
If you are building something like Facebook, then your choices are a little limited, Facebook made their own PHP Runtime, check HipHop For PHP

Advice on platforms/frameworks/languages/etc for a new project

I know this is not a programming question per se, but I wanted to get as much input from the SO community on a new project I hope to get started. The project is from being started from scratch and thus every decision for programming languages, databases, frameworks, platforms and what not are up in the air. I'm hoping to get your opinion on the matter, what you feel are the strengths and weaknesses of each option.
Database:
Currently I have the option of using MSSQL or MySQL. While I am leaning towards using MySQL because it is free and most probably has all the features I need. However, there is the possibility of having a lot of hierarchical data and the new hierarchical data type in MSSQL is quite appealing. Does it really simplify matters that much? Also MSSQL supports many more advanced SQL functions that may or may not be useful in the long run. While for development I can get access to Server 2008, multiple licenses as the development team grows and for production, are the costs justified?
Programming Languages:
The project will have a web based front end UI and a server based component that will do some heavy lifting.
For the web based UI, I was thinking of maybe doing Apache/IIS with PHP or IIS with ASP.Net in C#. I'd like to use a good framework to properly utilize good design patterns that should structure the code and development of the app. As well as make modifications in the long run easy to implement. I also want the GUI to look good and don't like the idea of buying .Net controls from component vendors. Instead I prefer the idea of using good CSS, and open sources like YUI and javascript to make the UI sleek.
For the server based component, I was thinking of using C#. I have no real development experience in C++ and I'd like good libraries and sufficient speed is good enough. However, while the web based UI and server based component is loosely coupled, there may be instances where the UI needs to communicate (call methods and what not) with the server based component and I want to pick languages/frameworks that will play nice with each other.
All suggestions on frameworks to incorporate are welcome.
Version Control:
I have had good experiences with SVN and a pretty bad experiences with TFS. I've never worked with GIT. Which do you think is better in terms of features as well as general developer familiarity. I want to pick something that other developers will know and not have trouble with.
I apologize if the questions are bit redundant or I'm not providing enough information or using bad terminology. I plan to edit and improve the question as I get feedback. Thanks!
EDIT:
Who: This would most probably be a startup formed of college students or junior developers. I want the project to utilize technologies that most people are familiar with or are easy to pick up.
What: I'd need hours and days to explain the solution. But in the end when you break it down, its a web based UI (think standard web app to just manage database data) that would be used to knowledgeable clients. The server based component would be very separate except for the fact that it should be able to communicate with the web app.
I can provide more information as required but I would appreciate an opportunity for users to answer and provide their ideas before you hastily close the question.
Obviously it depends a lot on specific requirements, but then again, even with those I probably wouldn't be able to tell for sure!
I've been working on a from-scratch project myself for a couple of months, and have generally found:
Choosing Microsoft for all the layers just goes down much easier (my subjective opinion). For example I would use C# for the UI, the back end, and use MSSQL for the database. Nothing at all wrong with non-Microsoft vendors, I'm no Microsoft fan-boy, I just struggle to get productive with unfamiliar tools. Depends where your experience lies though.
Database: In particular I've found that .NET and MSSQL go easily together. When I started the project I was using a PostgreSQL (because it's free, fully featured and has open-source warm fuzzies). However I abandoned it in favour of MSSQL simply because it was taking me too long to get database work done in an unfamiliar language with unfamiliar tools. Also, I'm not sure MSSQL is so expensive anymore, for example for a web application, MSSQL 2008 Web Edition is pretty damn cheap per-processor I think (only on SPLA licensing though). If you're concerned about database features in a free implementation though, personally I think PostgreSQL has a very full feature set, nicely standardised, and rapidly growing.
UI: I'm pretty inexperienced, but ASP.NET MVC looks far less painful to me than ASP.NET Web Forms. I like PHP too, but again I'd match the UI language with the back-end language, so would recommend .NET.
On frameworks, I'm immersed in DALs at the moment. I like Subsonic for lightweight data, NHibernate for heavy-weight.
I still have a long way to go with my project so perhaps I can only see the short-term benefits and drawbacks at the moment. But in general I would say: use the technologies that you're most comfortable using, as you'll be way more productive and the end result will probably be about the same anyway. If you want to learn new technologies though, and who doesn't? - go ahead, just expect it to take a lot longer.
Didn't want to answer 'cause it's so open ended. But a few points:
Money
First, check out BizSpark. That should take care of any money aspect for 3 years. For a service company, that means not only free VS Team Suite and Office and so on, but free Windows, SQL, etc. If your startup can't afford to spend a bit on MS tech in 3 years, it's probably a bad business. So that takes out licensing.
On a similar note, Sun has Startup Essentials. Could be interesting on the hardware side of things, but I haven't actually competitively priced them versus Dell/HP.
Software
It doesn't sound like you have hard enough requirements to say "oh, this slightly-less-popular software X is perfect for my domain Y and is gonna give me a very big boost". In fact, your project might not be like that at all. Maybe it, technically, is going to be a relatively plain application just pushing data around or whatever. You didn't specify.
For a small startup, personal productivity is probably going to trump any other argument. If your people are excellent in X, then that's one of your top arguments right there.
If you really don't have any particular system you're most comfortable with, be conservative. Stick with .NET or Java, as they'll give you the widest range of useful possibilities.
As far as things like OS and Database, I'm biased, but I think Microsoft will give you platforms that are easier to take advantage of than you'll find elsewhere. For instance, setting up load balancing, clustering, centralized authentication, managing servers (updates, events, etc.) is going to be easier to get going on Windows than it would be on another platform, assuming you're not an expert in either. Configuring SQL Server, even the advanced features, is a piece of cake. (Go time someone who knows neither: Setup a DB mirror in MSSQL and MySQL -- which is going to take more work?) Again, this is all predicated on you not having experts in a particular set of technology.
Don't mix -- whatever you do, stick with the platform. If you go .NET, MSSQL is going to work better with the data providers (or things like Linq-to-SQL). If you decide to do PHP, then use MySQL as everyone else uses it and you'll encounter less resistance. If you're not inventing stuff on the technical side, don't become an edge case.
You should pick the platform first, then the language that is best for that platform (if there is any choice).
One thing you should consider is the labor pool, and labor pool cost, for specific platforms and languages. Human Resources can often get cost metrics, if you don't have ideas already.
In my town, for example, .NET platform is much more expensive per Software Engineer than open source, because the .NET developers have a higher rate (40% roughly). C# is a little higher rate than VB.NET, but also tends to bring more well rounded candidates.
Just to throw in something totally different: How about weblocks as a web framework? It uses Hunchentoot as a server, which can run either standalone or with Apache. This is all done in Common Lisp. Weblocks can use cl-sql as a backend store, which can connect to many different RDBMs (MySQL, PostgreSQL, Oracle, ODBC, SQLite).

Resources