Database schema for a site like SO? - database

Since I took the basic undergrad course in database design and SQL, I haven't really touched anything like this.
So my question is: what would the database schema for a site like this one usually look like? What would you generally expect to find? For instance, how are questions and answers stored?
Are there tools that let you design it, or is it just something the devs come up with?

Stack Overflow used the MediaWiki database schema as a template before they went through a database refactoring a few weeks back.

There are a few principles in database design which I'm sure you already know if you took that course. They all say basically that your data should not be duplicated across multiple tables and that every column should be integral to the table it appears in. Then there are the common concepts applied in all software design, like objects and relations. Managing them is the easiest part because it comes instinctively. Then there is optimization for scalability and performance, which shouldn't be hard if the first steps are done right. This is usually done together with the software team that is writing code against your db.

You can use tools to design a database, but they are normally just templates for creating the right shapes in a diagram.
The logical structure will be designed in the same way an architect would design a building, using their best knowledge and experience.
Also, always "work in pencil" until you are happy.
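To make the "how are questions and answers stored" part concrete, here is a minimal sketch of what a normalized schema for a Q&A site might look like. Every table and column name below is an assumption made up for illustration; this is not Stack Overflow's actual schema. It uses Python's built-in sqlite3 module so it can be run as-is.

```python
import sqlite3

# Hypothetical, simplified schema for a Q&A site. All names are
# illustrative assumptions, not any real site's schema.
SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL
);
CREATE TABLE questions (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE answers (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    author_id   INTEGER NOT NULL REFERENCES users(id),
    body        TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Many-to-many link: a question has many tags, a tag has many questions.
CREATE TABLE question_tags (
    question_id INTEGER NOT NULL REFERENCES questions(id),
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (question_id, tag_id)
);
"""


def build_demo_db():
    """Create an in-memory database with the sketch schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (id, display_name) VALUES (1, 'asker')")
    conn.execute("INSERT INTO users (id, display_name) VALUES (2, 'helper')")
    conn.execute(
        "INSERT INTO questions (id, author_id, title, body) "
        "VALUES (1, 1, 'Schema for a Q&A site?', 'How are answers stored?')"
    )
    conn.execute(
        "INSERT INTO answers (question_id, author_id, body) "
        "VALUES (1, 2, 'Normalize first.')"
    )
    conn.commit()
    return conn


def answers_for(conn, question_id):
    """Return (author name, answer body) pairs for one question."""
    return conn.execute(
        "SELECT u.display_name, a.body "
        "FROM answers a JOIN users u ON u.id = a.author_id "
        "WHERE a.question_id = ? ORDER BY a.id",
        (question_id,),
    ).fetchall()
```

Each fact lives in one table only (the user's name is stored once, and answers point to it by id), which is the "no duplication" principle mentioned above.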

Related

Choosing a column db

I am trying to evaluate and use column db for my application.
I have evaluated InfiniDB and Infobright. Can you suggest some other column DBs?
Your question might suggest that column-store databases are NoSQL databases.
Those are two different kinds of database, and two different ways of representing data, that should not be confused.
I personally use LucidDB, which works like a charm. I use it for Business Intelligence, and LucidDB is optimized for Business Intelligence. One of the most active members of the community is Nicholas Goodman, who is really influential in the world of open source BI.
It is really well thought out for this use, and I would recommend it.
There's a good list on Wikipedia.

Basic Database Question?

I am interested in knowing a little more about databases than I currently do. I know how to set up a database backend for any webapp that I happen to be creating, but that is all. For example, if I were creating three different apps I would simply create three different databases and then configure each database for the particular app. This is all simple knowledge, and I would now like to have a deeper understanding of how databases actually work.
Let's say that I developed an application that needed a lot of space and processing power. This database would then have to be spread over numerous machines. How exactly would a database be spread across numerous machines and still be able to write records and then retrieve them? Would each table get its own machine, and what software is needed to make sure that the different machines have all performed their transactions successfully?
As you can see, I am quite a database ignoramus lol.
Any help in clearing this up would be greatly appreciated.
I don't know what RDBMS you're using but I have two book suggestions.
For theory (which should come first, in my opinion): Database in Depth: Relational Theory for Practitioners
For implementation: High Performance MySQL: Optimization, Backups, Replication, and More
I own both these books and they are both pretty great, especially the first one.
That's quite a broad topic... You might want to start with Multi-master replication, High-availability clustering and Massively parallel processing.
If you want to know about how to keep databases running with ever increasing load, then it's not a basic question. Several well known web companies are struggling to find the right way to make their database scalable.
Using memcached to cache database information is one way to decrease load on your database if your application is read-intensive. If your application is write-intensive, then you may want to consider using a NoSQL datastore like MongoDB or Redis.
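As a rough illustration of the caching idea, here is a minimal sketch of the cache-aside pattern. To keep it runnable without a server, a plain dict stands in for a memcached client and an in-memory SQLite database stands in for the real one; the class, table, and key names are all hypothetical.

```python
import sqlite3


class CachedUserStore:
    """Cache-aside lookups; a dict stands in for a memcached client."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # assumption: a real setup would use memcached here
        self.db_reads = 0  # counts how often the database is actually queried

    def get_user_name(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        if name is not None:
            self.cache[key] = name  # populate the cache for later reads
        return name

    def rename_user(self, user_id, new_name):
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "UPDATE users SET name = ? WHERE id = ?", (new_name, user_id)
            )
        # Invalidate so the next read does not return the stale name.
        self.cache.pop(f"user:{user_id}", None)


def make_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.commit()
    return CachedUserStore(conn)
```

Repeated reads of the same user hit the database once; a write invalidates the cached entry, which is why this helps read-heavy workloads far more than write-heavy ones.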
Database Design for Mere Mortals
This is the best book about the subject if you don't have any experience with databases. It's got historical background and practical examples. Most books often skip the historical stuff because they assume you know what a db is, or it doesn't matter, and jump right to the practical. This book gives you the complete picture.

Best way to design database tables with interbase?

I am about to start redoing a company database in a proper fashion. Our current database is a mess and has little to no documentation. I was wondering what people recommend to use when designing an Interbase database? Is there some sort of good visual schema designer that will generate the SQL? Is it better to do it all by hand?
Basically, what are the steps people usually take when designing and documenting a database? If it matters, I intend to use Hibernate as an ORM for the database. (Specific tips with Interbase would be appreciated as well).
thanks!
Usually, I use a text editor. Occasionally, I use Database Workbench. Last I heard, Embarcadero was going to add InterBase support to some of their database modeling tools, but I don't know if that has shipped yet.
If this is a brand new database app that has recently been created and is not being relied on in the business, then I say full steam ahead with your re-write / new database. I suspect however that you are dealing with a database that has been around for a few years and is heavily used.
If I am right about the database being a few years old, I strongly recommend against starting from scratch. Almost any production database that has been around a few years will be "messy". This is usually because the real world requirements for programs usually demand the solutions to be somewhat messy. This will be true of your brand new database (should you go this route) a few years from now as well.
Here are some reasons I would not recreate a production database from scratch:
The live database contains years worth of transactions and customer data that is very valuable. It will be very difficult to transfer this data into a completely different database structure. Believe me, even if the company tells you now they will not need to access this old data, they will.
Many business rules have probably been built into the database structure, in the form of defaults, triggers, stored procedures, even the data types of the columns. Without examining these very carefully and documenting them, you are likely to leave them out of your new database and spend much time debugging and adding them back in when people start using the system and discover the rules are not being applied properly.
You are liable to make mistakes in your new database design, or realise later that the structure needs to change to accommodate a new feature. If you have been making changes to your current database, and learning from that, future changes become easier and more intuitive.
Here is the approach I recommend:
Understand and document the current database, which will give you a really good understanding of the information flows in your business.
When you see what appears to be bad or messy design, look at it carefully. You may be right, and see potential for change, or you might find a trade off was made for performance or other reasons, and you can learn from this.
Make incremental improvements to the database structure, being sure to update documentation, alter the programs that rely on those areas (or work with your programmer if that isn't you).
I know this seems like a very long way around, but take it from someone who has been maintaining and creating databases for 12 years now - your current database is probably messy because the real-world requirements are messy.

Why can application developers do database stuff but database developers try to stay clear of application stuff?

In my experience, this has been a contentious issue between "backend" (database developer) and "frontend" guys (application developer, client and server side).
There have been many heated pub discussions on this subject.
I just want to know: is it just that people have different mindsets, or are too lazy to learn more and feel comfortable in what they know, or something else?
I might re-phrase the question: why do (some) application developers think they can do "database stuff" without actually bothering to understand it properly? Whereas database developers do not (in general) assume they can write a good application without some training and experience!
It is about levels of abstraction. A database is the lowest level of abstraction in a typical business application (software-wise). It is much more likely that a developer working on an outer layer of the abstraction would have knowledge of an inner layer than a developer in an inner layer would know about the outer layer.
This is because inner layers of abstraction best perform when they are ignorant of the outer layers who depend on them.
So a designer in the presentation layer of a website may know a bit about the server-side code they depend on because they interact with it. But the developer working on the server does not need to know anything about design at all.
I would say it's on a need-to-know basis. Application developers often need to know how to connect to databases, add records, delete records, etc. This is taken further with newer technologies such as LINQ, where developers can write database queries within their actual code.
Database developers on the other hand only really need to know how to write database queries as that is their job and probably won't need to worry about the code at application level.
Because programmers very often must understand and interact with databases to do their job, but DBAs very often don't need to do any programming (outside of the DBMS) to do their jobs.
I believe it stems from the fact that programming in SQL looks easy, and to get started you only need a small amount of knowledge (for a programmer, learning SELECT * FROM Table is pretty easy). Application programming is not the same way. It becomes very complex in a small amount of time, and that discourages a lot of people. Now, I am not saying that database people are any less intelligent; it is just that what they do looks easier than building applications.
If you develop applications, then chances are that, sooner or later, you'll have to connect an app to a back-end.
The opposite is not as true.
I think it stems from necessity. If you consider the roles of each person, a programmer needs to do database-related stuff far more than database workers need to do programming tasks.
From my experience, having developed both "databases" and "applications" (following your nomenclature...), I guess there's a big difference in state management.
Properly designed databases are always in a "clean" state, and every transaction keeps this consistency. So when developing a database, you have to very clearly specify your data abstractions into tables and which updates are legal and so on.
I've found that most application developers (myself included :)) do a very sloppy job in keeping this consistent state in the application. Any non-trivial interface has many more possible states to manage than a modest database, and it's not as easy to make sure it's always in a clean state. It's also harder to analyze every possible sequence of steps that users will perform.
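To illustrate what "every transaction keeps this consistency" can look like in practice, here is a small sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are invented for the example: a CHECK constraint declares which states are legal, and a transaction makes sure a failed update leaves nothing half-done.

```python
import sqlite3


def make_bank():
    """Hypothetical accounts table whose CHECK forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("INSERT INTO accounts VALUES (2, 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this violates the CHECK, the credit above is rolled back too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The database never exposes a state in which the money was credited but not debited, which is the kind of guarantee application code rarely gets for its own in-memory state.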
From my experience, the application developers don't do all the database stuff. Consider all the administration that is related to the database: backups, replication, etc.
A typical DBA (at least on most of the projects I've been involved in) takes care of everything related to the project databases: all administration, cooperating with application developers on performance tuning, giving advice about the SQL used by the app, coding some of the stored procs, creating (or at least reviewing and consulting on) physical DB designs, etc.
So, aren't the database guys "lazy", or "fine with what they already know" just from an application developer's perspective? I'm an app developer myself and there is a whole lot of things that I just don't know about the DBs we're using on our projects.
Part of my education ensured I got a decent understanding of how Databases work. I went into the field expecting to do database work, and a lot of it. I'm a web app guy; it comes with the territory I guess.
My two jobs as a developer have been at two shops that would best be described as tiny (2 people myself included, and then just me) and tiny (3 developers, briefly having a fourth). I have not observed an immediate business need for, nor worked anywhere that had the resources to employ a dedicated DB guy. I can envision some scenarios where that would change (including a new job :P).
As to the rest, I agree that abstraction is also a factor and as developers we're way up on top/outside looking in. I can't imagine doing web app development without DB skills, and I consider Sql/DB Management to be both an important tool and an area I need to stay sharp in.
I'll add that I treat the database side as its own field. There's skills that translate between the two, but there's a lot of specialized knowledge I need to acquire to get better at it, and that being a good programmer doesn't necessarily mean I'm doing a good job on the back end either (fortunately, I'm not a good programmer ;) ). Also, I'm pretty sure that's what she said.
2 reasons:
DB vendors facilitate bad SQL, and
SQL is hidden from view while application UI is front and center.
Most naive developers think SQL is a procedural language and write it as such because vendors ensure that the tools exist so that they can do so. DBAs know that good SQL is set-oriented and has optimization principles that are totally different from those involved in application programming.
The visibility aspect makes it so the application developers can write bad SQL against a database and get it to perform in a marginal way, and no one ever sees quite how bad it is. When a DBA writes an application, there are immediate critiques on its appearance and behavior because it's directly visible to the end user.
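A small sketch of the procedural versus set-oriented difference described above, using Python's built-in sqlite3 module. The products table and both function names are made up for the example. Both functions produce the same result, but the first issues one statement per row from application code while the second lets the database do the whole job in a single statement.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY, category TEXT NOT NULL, price INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO products (category, price) VALUES (?, ?)",
        [("book", 100), ("book", 200), ("toy", 300)],
    )
    conn.commit()
    return conn


def raise_prices_row_by_row(conn, category, pct):
    """Procedural style: fetch every row, compute in the app, write back."""
    rows = conn.execute(
        "SELECT id, price FROM products WHERE category = ?", (category,)
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?",
            (price + price * pct // 100, product_id),
        )
    conn.commit()


def raise_prices_set_based(conn, category, pct):
    """Set-oriented style: one statement describes the whole change."""
    conn.execute(
        "UPDATE products SET price = price + price * ? / 100 WHERE category = ?",
        (pct, category),
    )
    conn.commit()


def prices(conn):
    return [row[0] for row in conn.execute("SELECT price FROM products ORDER BY id")]
```

On three rows nobody would notice the difference, which is the visibility point made above: the row-by-row version only shows how bad it is once the table gets large.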
Good question. Developers do database work because, where there are no dedicated database people, the developers have to do it. But a company that has database people also has development people.
:) what is your idea ?

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then update the column types in your program manually, or does it make more sense to define the tables in your application and then use the ORM framework's table generation functions to make the schema and then build the tables on the database side for you?
Bear in mind that applications and databases tend to live in an M:M relationship in any but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system, then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to handle the modeling of your domain data. The database is also the biggest source of slowdowns in applications, and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because really the database is simply a place to persist information from the application; it should never lead the design. It's also not the only place to persist information. We have users that want to work from flat files or the database, for instance. They could also use XML files. So starting with your domain objects and then generating tables (or flat files or XML schemas or whatever) from there will lead to a much better design in the end.
While this may depend on you using an object-oriented language, using an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you up to, and including generating the database creation scripts.
In reference to performance, performance should be one of the last things you look at in an application, it should never drive the design. After you get a good schema up and running based on your domain you can always make tweaks to improve its performance.
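As a toy illustration of the domain-first approach in this answer, the sketch below derives a CREATE TABLE statement from a domain class. It is deliberately hand-rolled with the standard library only and is not how SQLAlchemy, Hibernate, or any real ORM does it; the Customer class, the type mapping, and the rule that a field named id becomes the primary key are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Customer:
    """A domain object, written without thinking about storage."""
    id: int
    name: str
    credit_limit: float


def create_table_sql(model):
    """Build a CREATE TABLE statement from a dataclass's fields."""
    columns = []
    for f in fields(model):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":  # assumed convention for this sketch only
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"


def save(conn, obj):
    """Insert one domain object into the table generated for its class."""
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {type(obj).__name__.lower()} VALUES ({placeholders})",
        [getattr(obj, n) for n in names],
    )
```

Note what the generated schema does not contain: indexes, constraints beyond the key, or anything tuned to a specific database product. That is the trade-off the other answers in this thread are pointing at.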
A lot depends on your skill level with the specific database product that you're going to use. Think of it as the difference between a "manual" and an "automatic" transmission car. ORMs provide you with that "automatic" transmission: just start designing your classes, and let the ORM worry about getting them stored into the database somehow.
Sounds good. The problem with most ORMs is that in their quest to be "persistence ignorant" (PI), they often don't take advantage of specific database features that can provide elegant solutions for a given task. Notice, I didn't say ALL ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction, up towards the application space, or down towards the physical database. But remember, only YOU know if it's more advantageous to use a view instead of a table, should you normalize or de-normalize a table, what non-clustered index(es) make sense with this table, is a natural or surrogate key more appropriate for this table, etc... Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
One more thing: you really need to separate the application design from the database design. They are almost never the same. How important is that data? Could another application be designed to use that data? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way, since it's a perfect example of the DRY principle.
Having said that however, getting away with it is always going to be hard to pull off since you're practically choosing to give up most database specific optimizations. (more so, with querying, but it still applies to schemas (indexes, etc)).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 Cents
Design each based on their own requirements as much as possible. Trying to keep them in too rigid sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).

Resources