BALD-D battle Against Bad Database Design


I'm no DBA, but I respect database theory. Isn't adding columns like isDeleted and sequenceOrder bad database practice?

That depends. Being able to soft-delete a tuple (i.e., mark it as deleted rather than actually deleting it) is essential if there's any need to later access that tuple (e.g., to count deleted things, or do some type of historical analysis). It also has the possible benefit, depending on how indexes are structured, of causing less disk traffic when soft-deleting a row (by having to touch fewer indexes). The downside is that the application takes on responsibility for managing foreign keys to soft-deleted things.
If soft deleting is done for performance, a periodic (e.g., nightly, weekly) task can clean soft-deleted tuples out during a low-traffic period.
Using an explicit 'sequence order' for some tuples is useful in several cases, esp. when it's not possible or wise to depend on some other field (e.g., ids, which app developers are trained not to trust) to order things that need to be ordered in some specific way for business reasons.
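As a rough illustration of the soft-delete-then-purge pattern described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `item`, the column `is_deleted`, and the helper functions are all hypothetical; a real schema and clean-up job would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

def soft_delete(conn, item_id):
    # Mark the row instead of removing it; it stays available for history.
    conn.execute("UPDATE item SET is_deleted = 1 WHERE id = ?", (item_id,))

def live_items(conn):
    # Every application query has to remember this filter.
    rows = conn.execute("SELECT name FROM item WHERE is_deleted = 0 ORDER BY id")
    return [name for (name,) in rows]

def purge_deleted(conn):
    # Periodic clean-up (e.g., nightly); returns the number of rows removed.
    return conn.execute("DELETE FROM item WHERE is_deleted = 1").rowcount

soft_delete(conn, 2)
visible = live_items(conn)
deleted_count = conn.execute(
    "SELECT COUNT(*) FROM item WHERE is_deleted = 1"
).fetchone()[0]
purged = purge_deleted(conn)
remaining = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

Between the soft delete and the purge, the deleted row can still be counted or analysed; after the purge it is gone for good.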

IsDeleted columns have two purposes.
To hide a record from users instead of deleting it, thus retaining the record in the database for later use.
To provide a two-stage delete process, where one user marks a record for deletion, and another user confirms.
Not sure what SequenceOrder is about. Do you have a specific application in mind?

Absolutely not. Each database has different requirements, and based on those requirements, you may need columns such as those.
An example for isDeleted could be if you want to allow the user interface to delete unneeded things, but retain them in the database for auditing or reporting purposes. Or if you have incredibly large datasets, deleting is a very slow operation and may not be possible to perform in real-time. In this case, you can mark it deleted, and run a batch clean-up periodically.
An example for sequenceOrder is to enable arbitrary sorting of database rows in the UI, without relying on intrinsic database order or sequential insertion. If you insert rows in order, you can usually get them back out in that order... until people start deleting and inserting new rows.

SequenceOrder doesn't sound great (although you've given no background at all), but I've used columns like IsDeleted for soft deletions all my career.

Since you explicitly state that you're interested in the theoretical perspective, here goes :
At the level of the LOGICAL design, it is almost by necessity a bad idea to have a boolean attribute in a table (btw the theory's correct term for this is "relvar", not "table"). The reason being that having a boolean attribute makes it very awkward to define/document the meaning (relational theory names this the "Predicate") that the relvar has in your system. If you include the boolean attribute, then the predicate defining such a relvar's meaning would have to include some construct like "... and it is -BOOLEANATTRIBUTENAME here- that this tuple has been deleted.". That is awkward circumlocution.
At the logical design level, you should have two distinct tables, one for the non-deleted rows, and one for the deleted-rows-that-someone-might-still-be-interested-in.
At the PHYSICAL design level, things may be different. If you have a lot of delete-and-undelete, or even just a lot of delete activity, then physically having two distinct tables is likely to be a bad idea. One table with a boolean attribute that acts as a "distinguishing key" between the two logical tables might indeed be better. If, on the other hand, you have a lot of query activity that only needs the non-deleted ones, and the volume of deleted ones is usually large in comparison to the non-deleted ones, it might be better to keep them apart physically too (and bite the bullet on the probably worse update performance you'll get, if that turns out to be noticeable).
But you said you were interested in the theoretical perspective, and theory (well, as far as I know it) has actually very little to say about matters of physical design.
Wrt the sequenceOrder column, that really depends on the particular situation. I guess that most of the time, you wouldn't need them, because ordering of items as required by the business is most likely to be on "meaningful" data. But I could imagine sequenceOrder columns getting used to mimic insertion timestamps and the like.
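One way to reconcile the logical and physical views described in this answer is to store a single physical table with the boolean column and expose the two logical relvars as views. The sketch below assumes SQLite through Python's sqlite3 module; the names `customer_store`, `customer`, and `deleted_customer` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_store (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0 CHECK (is_deleted IN (0, 1))
);
-- Logical relvar 1: "customer <id> named <name> is current."
CREATE VIEW customer AS
    SELECT id, name FROM customer_store WHERE is_deleted = 0;
-- Logical relvar 2: "customer <id> named <name> has been deleted."
CREATE VIEW deleted_customer AS
    SELECT id, name FROM customer_store WHERE is_deleted = 1;
INSERT INTO customer_store (name) VALUES ('Ann'), ('Bob');
UPDATE customer_store SET is_deleted = 1 WHERE name = 'Bob';
""")

current = [r[0] for r in conn.execute("SELECT name FROM customer ORDER BY id")]
deleted = [r[0] for r in conn.execute("SELECT name FROM deleted_customer ORDER BY id")]
```

Each view has a simple predicate of its own, while deleting and undeleting remain a single-row UPDATE on the physical table.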

Backing up what others have said, both can have their place.
In our CRM system I have an isDeleted - like field in our customer table so that we can hide customers we are no longer servicing while leaving all the information about them in the database. We can easily restore deleted customers and we can strictly enforce referential integrity. Otherwise, what happens when you delete a customer but do not want to delete all records of the work you have done for them? Do you leave references to the customer dangling?
SequenceOrder, again, is useful to allow user-defined ordering. I don't think I use it anywhere, but suppose you had to list say your five favorite foods in order. Five tasks to complete in the order they need to be completed. Etc.

Others have adequately tackled isDeleted.
Regarding sequenceOrder, business rules frequently require lists to be in an order that may not be determined by the actual data.
Consider a table of Priority statuses. You might have rows for High, Low, and Medium. Ordering by the description will give you either High, Low, Medium or Medium, Low, High.
Obviously that order does not give information about the relationship that exists between the three records. Instead you would need a sequenceOrder field so that it makes sense. So that you end up with [1] High, [2] Medium, [3] Low; or the reverse.
Not only does this help with human readability, but system processes can now give appropriate weight to each one.
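The Priority example can be sketched as follows, using Python's sqlite3 module as a stand-in database. The table and column names (`priority`, `description`, `sequence_order`) are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE priority ("
    " description TEXT PRIMARY KEY,"
    " sequence_order INTEGER NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO priority VALUES (?, ?)",
    [("High", 1), ("Medium", 2), ("Low", 3)],
)

# Alphabetical order says nothing about how the levels relate to each other.
by_description = [r[0] for r in conn.execute(
    "SELECT description FROM priority ORDER BY description")]
# The explicit sequence column carries the business ordering.
by_sequence = [r[0] for r in conn.execute(
    "SELECT description FROM priority ORDER BY sequence_order")]

print(by_description)  # ['High', 'Low', 'Medium']
print(by_sequence)     # ['High', 'Medium', 'Low']
```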

Related

Should I put optional record properties in a separate table?

I have a table of about 1,000 records. Around half of them will utilise a set of fields containing certain characteristics. There's about 10 relevant fields. The other half of the records won't need that information filled in.
This table is central to the database and will be taking the bulk of the operations. Though at only around 1,000 records, it's not much.
The hardware that the database is stored on is old and slow (spinning hard drive not SSD... ) so I want to have a fairly optimised structure to make the most of it. Obviously the increased size of the database alone due to the blank fields isn't a major concern, but if it's slowing down queries then that's not good.
I guess I should describe the setup. Currently Access 2007 client and Access backend, though the backend will soon move to SQL server. Currently the backend is on the main server rack, but when moved to SQL Server it will get its own older server rack.
So should I make a separate table to store the aforementioned set of characteristics, or should I leave it as is?

The querying overhead of putting the optional fields into a separate table and then using a join doesn't buy you much in size or data management, especially if it's 1-to-1 like in your example. For size, optional fields that are NULL don't cost you much. And yes, 75% is a good arbitrary threshold for when you should start moving things out, but even then, you're not actually normalizing anything by moving out the optional fields (if they are 1-to-1 with the record and you will always be fetching them along with the main record).
Worth noting: With most DBs, getting large rows in single queries is better than several small queries...in case you later have the urge to get the optional data in the 2nd table in a separate query. In Access 2007 this may matter less though.
And regardless of whether or not you move those optional fields out, add indexes for those fields which you may use in a where/having/join.

My impression from what you've said is that you should use separate tables. The dependencies you want to represent and the needs of data integrity ("business rules") should determine which table(s) any attribute goes in.
In your case it sounds like you have two kinds of facts to be represented. Those fact types have distinct sets of attributes and therefore they belong in different tables. If you combine two different fact types into one table and make one set of attributes nullable then you could compromise data integrity: i.e. by permitting values for some attribute when the business rules require no such value and by allowing a value to be absent when business rules in fact require it.
For a more formal way of answering this, see Fifth Normal Form and the Principle of Orthogonal Design. If you aren't already aware of those design principles then you should familiarise yourself with them.
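To make the integrity argument concrete, here is a small sketch with Python's sqlite3 module. The tables `part` and `part_spec` and their columns are hypothetical stand-ins for the main table and the set of optional characteristics; the real schema is not given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE part (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Only parts that have the characteristics get a row here,
-- so every characteristic can be declared NOT NULL.
CREATE TABLE part_spec (
    part_id INTEGER PRIMARY KEY REFERENCES part(id),
    weight_kg REAL NOT NULL,
    colour TEXT NOT NULL
);
INSERT INTO part (id, name) VALUES (1, 'bolt'), (2, 'manual');
INSERT INTO part_spec VALUES (1, 0.05, 'grey');
""")

try:
    # A half-filled set of characteristics is rejected by the schema.
    conn.execute("INSERT INTO part_spec (part_id, weight_kg) VALUES (2, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Records without characteristics simply have no matching row.
rows = conn.execute(
    "SELECT p.name, s.colour FROM part p"
    " LEFT JOIN part_spec s ON s.part_id = p.id ORDER BY p.id"
).fetchall()
```

With a single table and ten nullable columns, the same rule ("all characteristics or none") would need extra CHECK constraints or application code.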

Vertical partitioning makes sense for a large data set to make the cache more efficient. 1000 rows doesn't qualify as "large" even on a rather old hardware.
So unless there are other reasons to redesign this table (you didn't merge lookups, did you?), you are good to go.

database row/ record pointers

I don't know the correct words for what I'm trying to find out about, and as such I'm having a hard time googling.
I want to know whether it's possible with databases (technology independent, but I would be interested to hear whether it's possible with Oracle, MySQL and Postgres) to point to specific rows instead of executing my query again.
So I might initially execute a query, find some rows of interest, and then wish to avoid searching for them again by having a list of pointers or some other metadata which indicates the location in the database that I can go to straight away the next time I want those results.
I realise there is caching on databases, but I want to keep these "pointers" elsewhere, and as such caching doesn't ultimately solve this problem. Is this just an index, and I store the index and look up by this? Most of my current tables don't have indexes and I don't want the speed decrease that sometimes comes with indexes.
So what's the magic term I've been trying to put into Google?
Cheers

In Oracle it is called ROWID. It identifies the file, the block number, and the row number in that block. I can't say that what you are describing is a good idea, but this might at least get you started looking in the right direction.
Check here for more info: http://www.orafaq.com/wiki/ROWID.
By the way, the "speed decrease that comes with indexes" that you are afraid of is only relevant if you do more inserts and updates than reads. Indexes only speed up reads, so if the read ratio is high, you might not have an issue and an index might be your best solution.

"most of my current tables don't have indexes and I don't want the speed decrease that sometimes comes with indexes."
And you also don't want the speed increase which usually comes with indexes but you want to hand-roll a bespoke pseudo-cache instead?
I'm not being snarky here, this is a serious point. Database designers have expended a great deal of skill and energy on optimizing their products. Wouldn't it be more sensible to learn how to take advantage of their efforts rather than re-implementing some core features?

In general, the best way to handle this sort of requirement is to use the primary key (or in fact any convenient, compact unique identifier) as the 'pointer', and rely on the indexed lookup to be swift - which it usually will be.
You can use ROWID in more DBMS than just Oracle, but it generally isn't recommended for a variety of reasons. If you succumb to the 'every table has an autoincrement column' school of database design, then you can record the autoincrement column values as the identifiers.
You should have at least one index on (almost) all of your tables - that index will be for the primary key. The exception might be for a table so small that it fits in memory easily and won't be updated and will be used enough not to be evicted from memory. Then an index might be a distraction; however, such tables are typically seldom updated so the index won't harm anything, and the optimizer will ignore it if the index doesn't help (and it may not).
You may also have auxiliary indexes. In a system where most of the activity is reading the data, you may want to err on the side of having more indexes rather than fewer, because access time is most critical. If your system is update-intensive, then you would go with fewer indexes because there is a cost associated with updating indexes when data is added, removed or updated. Clearly, you need to design the indexes to work well with the queries that your users actually perform (or your applications perform).
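The "primary key as pointer" idea from this answer might look like the sketch below. It uses SQLite through Python's sqlite3 module purely as an illustration; the `doc` table, its contents, and the `fetch_by_ids` helper are invented for the example, and other engines report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO doc (title, body) VALUES (?, ?)",
    [("doc %d" % i, "needle" if i % 1000 == 0 else "hay") for i in range(1, 5001)],
)

# First pass: the expensive search. Keep only the keys of the matching rows.
saved_ids = [r[0] for r in conn.execute("SELECT id FROM doc WHERE body = 'needle'")]

def fetch_by_ids(conn, ids):
    # Later passes: go straight to the rows through the primary-key index.
    marks = ",".join("?" * len(ids))
    sql = "SELECT id, title FROM doc WHERE id IN (%s) ORDER BY id" % marks
    return conn.execute(sql, ids).fetchall()

titles = [title for _, title in fetch_by_ids(conn, saved_ids)]

# Ask SQLite how it would resolve a lookup by key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM doc WHERE id = ?", (saved_ids[0],)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

The saved ids can be stored anywhere outside the database. Unlike a physical row address, they stay valid if the engine reorganises its storage, although the rows they point to can still be changed or deleted in the meantime.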

You may also be interested in cursors. (Note that the index debate is still valid with cursors.)
Wikipedia definition here.

Access database performance

For a few different reasons one of my projects is hosted on a shared hosting server and developed in ASP.NET/C# with Access databases (not a choice, so don't laugh at this limitation; it's not from me).
Most of my queries are on the last few records of the databases they are querying.
My question is in 2 parts:
1- Is the order of the records in the database only visual or is there an actual difference internally. More specifically, the reason I ask is that the way it is currently designed all records (for all databases in this project) are ordered by a row identifying key (which is an auto number field) ascending but since over 80% of my queries will be querying fields that should be towards the end of the table would it increase the query performance if I set the table to showing the most recent record at the top instead of at the end?
2- Is there any other performance tuning that can be done to help with Access tables?
"Access" and "performance" is a euphemism, but the database type wasn't a choice, and so far it hasn't proven to be a big problem, but if I can help the performance I would sure like to do whatever I can.
Thanks.
Edit:
No, I'm not currently experiencing issues with my current setup, just trying to look forward and optimize everything.
Yes, I do have indexes and have a primary key (automatically indexed) on the unique record identifier for each of my tables. I definitely should have mentioned that.
You're all saying the same thing, I'm already doing all that can be done for access performance. I'll give the question "accepted answer" to the one that was the fastest to answer.
Thanks everyone.

As far as I know...
1 - That change would just be visual. There'd be no impact.
2 - Make sure your fields are indexed. If the fields you are querying on are unique, then make sure you make the fields a unique key.

Yes there is an actual order to the records in the database. Setting the defaults on the table preference isn't going to change that.
I would ensure there are indexes on all your where clause columns. This is a rule of thumb. It would rarely be optimal, but you would have to do workload testing against different database setups to prove the most optimal solution.
I work daily with a legacy Access system that can be reasonably fast with concurrent users, but only for a smallish number of users.

You can use indexes on the fields you search for (aren't you already?).
http://www.google.com.br/search?q=microsoft+access+indexes

The order is most likely not the problem. Besides, I don't think you can really change it in Access anyway.
What is important is how you are accessing those records. Are you accessing them directly by the record ID? Whatever criteria you use to find the data you need, you should have an appropriate index defined.
By default, there will only be an index on the primary key column, so if you're using any other column (or combination of columns), you should create one or more indexes.
Don't just create an index on every column though. More indexes means Access will need to maintain them all when a new record is inserted or updated, which makes it slower.
Here's one article about indexes in Access.
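The effect of indexing a searched column can be observed directly in engines that expose their query plan. Access does not offer the command used here, so the sketch below uses SQLite (via Python's sqlite3 module) as a stand-in; the `orders` table and the index name are hypothetical.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    # Return the human-readable detail column of EXPLAIN QUERY PLAN.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
query = "SELECT id, total FROM orders WHERE customer = ?"

before = plan_for(conn, query, ("acme",))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan_for(conn, query, ("acme",))

print(before)  # a full table scan
print(after)   # a search that uses idx_orders_customer
```

The same check, run before and after adding an index, is a cheap way to confirm that an index is actually used rather than just maintained.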

Have a look at the field or fields you're using to query your data and make sure you have an index on those fields. If it's the same as SQL server you won't need to include the primary key in the index (assuming it's clustering on this) as it's included by default.
If you're running queries on a small sub-set of fields you could get your index to be a 'covering' index by including all the fields required, there's a space trade-off here, so I really only recommend it for 5 fields or less, depending on your requirements.
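A covering index can be sketched in the same spirit. This example again uses SQLite through Python's sqlite3 module as an assumed stand-in for the engine under discussion, with a made-up `orders` table; SQLite labels the covered case explicitly in its plan output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, customer TEXT, total REAL, note TEXT)"
)
# The index holds every column the first query reads,
# so that query never has to touch the table itself.
conn.execute("CREATE INDEX idx_orders_customer_total ON orders (customer, total)")

def plan_for(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

covered = plan_for("SELECT customer, total FROM orders WHERE customer = 'acme'")
# 'note' is not in the index, so the table row must be fetched as well.
not_covered = plan_for("SELECT customer, note FROM orders WHERE customer = 'acme'")
```

The trade-off mentioned above shows up as index size: every extra column added to make the index covering is stored a second time.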

Are you actually experiencing a performance problem now or is this just a general optimization question? Also from your post it sounds like you are talking about a db with 1 table, is that accurate? If you are already experiencing a problem and you are dealing with concurrent access, some answers might be:
1) indexing fields used in where clauses (mentioned already)
2) Splitting tables. For example, if 80% of your table rows are rarely accessed (as implied in your question), create an archive table for older records. Or, if the bulk of your performance hits are from reads (complicated reports) and you don't want to impinge on performance for people adding records, create a separate reporting table structure and query off of that.
3) If this is a reporting scenario, all queries are similar or the same, concurrency is somewhat high (very relative number given Access) and the data is not extremely volatile, consider persisting the data to a file that can be periodically updated, thus offloading the querying workload from the Access engine.

In regard to table order, Jet/ACE writes the actual table data in PK order. If you want a different order, change the PK.
But this oughtn't be a significant issue.
Indexes on the fields other than the PK that you sort on should make sorting pretty fast. I have apps with 100s of thousands of records that return subsets of data in non-PK sorted order more-or-less instantaneously.
I think you're engaging in "premature optimization," worrying about something before you actually have an issue.
The only circumstances in which I think you'd have a performance problem is if you had a table of 100s of thousands of records and you were trying to present the whole thing to the end user. That would be a phenomenally user-hostile thing to do, so I don't think it's something you should be worrying about.
If it really is a concern, then you should consider changing your PK from the Autonumber to a natural key (though that can be problematic, given real-world data and the prohibition on non-Null fields in compound unique indexes).

I've got a couple of things to add that I didn't notice being mentioned here, at least not explicitly:
Field length: create your fields as large as you'll need them but don't go over. For instance, if you have a number field and the value will never be over 1000 (for the sake of argument), then don't type it as a Long Integer; something smaller like Integer would be more appropriate. Likewise, use a Single instead of a Double for decimal numbers, etc. By the same token, if you have a text field that won't have more than 50 chars, don't set it up for 255, etc. Sounds obvious, but it's done, oftentimes with the idea that "I might need that space in the future", and your app suffers in the meantime.
Not to beat the indexing thing to death...but, tables that you're joining together in your queries should have relationships established, this will create indexes on the foreign keys which greatly increases the performance of table joins (NOTE: Double check any foreign keys to make sure they did indeed get indexed, I've seen cases where they haven't been - so apparently a relationship doesn't explicitly mean that the proper indexes have been created)
Apparently compacting your DB regularly can help performance as well, this reduces internal fragmentation of the file and can speed things up that way.
Access actually has a Performance Analyzer, under Tools > Analyze > Performance; it might be worth running it on your tables and queries at least to see what it comes up with. The Table Analyzer (available from the same menu) can help you split out tables with a lot of redundant data. Obviously, use it with caution, but it could be helpful.
This link has a bunch of stuff on access performance optimization on pretty much all aspects of the database, tables, queries, forms, etc - it'd be worth checking out for sure.
http://office.microsoft.com/en-us/access/hp051874531033.aspx

To understand the answers here it is useful to consider how Access works. In an un-indexed table there is unlikely to be any value in organising the data so that recently accessed records are at the end. Indeed, by virtue of the fact that Access / the JET engine is an ISAM database, it's the other way around (http://en.wikipedia.org/wiki/ISAM). That's rather moot, however, as I would never suggest putting frequently accessed values at the top of a table; it is best, as others have said, to rely on useful indexes.

Have you ever worked with a database with no "Relations", no "PKs", and no "FKs", just raw data?

Nowadays, I'm working on a database with no "Relations, PKs and FKs", just raw data.
I would say the database is just a set of papers.
When I asked about this, the answer I got was: "Hide the Business".
Also, one of my friends said this always happens in "Large systems".
In large systems, they are trying to hide their business through raw data.
Regarding development; relations, constraints, validation, are done in database using triggers and of course user interface.
What do you think regarding this?

Well, this may have a point on large databases, when you need fast response on massive DML (INSERT / UPDATE / DELETE).
The problem is that if you rely on the database's way of ensuring integrity, you can hardly optimize it.
There is also a thing called SQL/PLSQL context switching in Oracle: if you create an empty trigger on the table, it will slow down DML about 20 times, from the mere fact that the trigger exists.
In Oracle, when you write an ON UPDATE trigger and update 50,000 rows in the table, the trigger and the query in it get called 50,000 times. Foreign keys perform better, but they may also get laggy (and you can do nothing with the underlying queries).
In this case, it's better to put the results you want to update into a temporary table, issue a MERGE, check integrity before and after, and apply the business rules. A single query that processes 50,000 rows works faster than a loop of 50,000 queries each processing a single row.
Of course, it's very hard to implement and only pays for itself when you have a really large database and need to perform really massive updates on it.
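The stage-then-apply approach can be sketched as follows. This is not Oracle and does not use MERGE: it assumes SQLite through Python's sqlite3 module, where a single UPDATE with a correlated subquery plays the role of the set-based statement. The `account` and `staged` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TEMP TABLE staged (id INTEGER PRIMARY KEY, new_balance INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(i, 0) for i in range(1, 1001)])
# Stage the desired results first.
conn.executemany("INSERT INTO staged VALUES (?, ?)", [(i, i * 10) for i in range(1, 501)])

# Integrity check before applying: every staged id must exist in the target.
orphans = conn.execute(
    "SELECT COUNT(*) FROM staged s"
    " WHERE NOT EXISTS (SELECT 1 FROM account a WHERE a.id = s.id)"
).fetchone()[0]

# One set-based statement instead of a loop of 500 single-row UPDATEs.
cur = conn.execute(
    "UPDATE account"
    " SET balance = (SELECT new_balance FROM staged s WHERE s.id = account.id)"
    " WHERE id IN (SELECT id FROM staged)"
)
updated = cur.rowcount
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
```

Business-rule checks would slot in next to the orphan check, as further queries over the staged rows before the update is applied.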
In Oracle, in any case, FOREIGN KEY constraints perform better than triggers implementing the same functionality.
PRIMARY KEYS will most likely improve performance, as a primary key implies creating a UNIQUE INDEX on the constrained field, and this index may be efficiently used in the queries. A UNIQUE INDEX is also a natural and most efficient way to enforce uniqueness.
But of course, like any index, it slows down INSERTS and those UPDATES and DELETES whose WHERE condition is not selective.
I. e. if you need to UPDATE or DELETE 1 row of 2,000,000, then the index is your friend; if you need to UPDATE or DELETE 1,500,000 rows of 2,000,000, the index is your enemy. It's a matter of tradeoff.
You may also see my answer here.

I would say sacrificing data integrity for security through obscurity is a bad trade.

I think I came across at least two applications with databases lacking relations and FKs.
The idea is probably that it's more difficult to reverse engineer the brilliant database schema.
The side effect is that often the applications are not so good at checking constraints themselves, leading to a lot of rubbish data in the database, which in fact does make it more difficult to reverse engineer, as FK constraints are not enforced ;)
My view is that once it's a database, somebody else can look into it, and trying to work around this "feature" of visibility is pointless and generally Not a Good Thing, considering the drawbacks (no relations, no SPs, no triggers, etc).
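What declared foreign keys buy you can be shown in a few lines. The sketch below assumes SQLite via Python's sqlite3 module (where enforcement has to be switched on per connection); the `customer` and `invoice` tables and the `try_sql` helper are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount INTEGER NOT NULL
);
INSERT INTO customer (id, name) VALUES (1, 'Ann');
INSERT INTO invoice (customer_id, amount) VALUES (1, 100);
""")

def try_sql(sql):
    # Return True if the statement is accepted, False if a constraint rejects it.
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# An invoice for a customer that does not exist is refused.
orphan_insert_ok = try_sql("INSERT INTO invoice (customer_id, amount) VALUES (99, 5)")
# A customer that still has invoices cannot be deleted.
parent_delete_ok = try_sql("DELETE FROM customer WHERE id = 1")
```

Without the constraint, both statements would succeed silently, and the rubbish data described above is exactly what accumulates.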

Primary Keys, relations etc etc are tools for making database development easier and for making the final result faster and more efficient. I can only think of a few rare cases where not having a key/index would be a good idea.
Did your friend explain why they held this view?

This kind of thing can happen in financial systems. It's the opposite of what you'd expect, you'd think that because it's finance that best practice would be applied more rigorously. However, the converse is often true. Many of these databases may have started out in excel.
I have seen a number of databases where they don't bother with FKs or PKs. I can't say I like it, but sometimes you just have to live with it, or leave and go work somewhere else for less money but with more database integrity.
Perhaps this is why they need you.

(me=MSSQL)
Deleting a row from a large table that has lots of FKs is slow.
Our APP has never been denied a Delete for a FK constraint violation on that table.
I have considered dropping the FKs to improve performance.
Too chicken to have actually done it though :(
PS We would keep the FKs on the DEV / TEST systems

Maybe this was converted from a former system that used a file based structure to hold the data. The tables in the new database are just a reflection of the individual files.

What's the better database design: more tables or more columns?

A former coworker insisted that a database with more tables with fewer columns each is better than one with fewer tables with more columns each. For example rather than a customer table with name, address, city, state, zip, etc. columns, you would have a name table, an address table, a city table, etc.
He argued this design was more efficient and flexible. Perhaps it is more flexible, but I am not qualified to comment on its efficiency. Even if it is more efficient, I think those gains may be outweighed by the added complexity.
So, are there any significant benefits to more tables with fewer columns over fewer tables with more columns?

I have a few fairly simple rules of thumb I follow when designing databases, which I think can be used to help make decisions like this....
Favor normalization. Denormalization is a form of optimization, with all the requisite tradeoffs, and as such it should be approached with a YAGNI attitude.
Make sure that client code referencing the database is decoupled enough from the schema that reworking it doesn't necessitate a major redesign of the client(s).
Don't be afraid to denormalize when it provides a clear benefit to performance or query complexity.
Use views or downstream tables to implement denormalization rather than denormalizing the core of the schema, when data volume and usage scenarios allow for it.
The usual result of these rules is that the initial design will favor tables over columns, with a focus on eliminating redundancy. As the project progresses and denormalization points are identified, the overall structure will evolve toward a balance that compromises with limited redundancy and column proliferation in exchange for other valuable benefits.
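The rule about using views for denormalization might look like this in practice. The sketch assumes SQLite through Python's sqlite3 module, and the tables (`customer`, `city`, `address`) and the view `customer_address` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    street TEXT NOT NULL,
    city_id INTEGER NOT NULL REFERENCES city(id)
);
-- Denormalized read model; the base tables stay normalized.
CREATE VIEW customer_address AS
    SELECT c.id AS customer_id, c.name, a.street, ci.name AS city
    FROM customer c
    JOIN address a ON a.customer_id = c.id
    JOIN city ci ON ci.id = a.city_id;
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO city VALUES (1, 'Oslo');
INSERT INTO address VALUES (1, 1, '1 Main St', 1);
""")

# Client code reads the flat shape without knowing about the joins.
flat = conn.execute("SELECT name, street, city FROM customer_address").fetchall()
```

Because clients query the view, the underlying tables can later be split or merged without touching client code, which also serves the decoupling rule above.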

It doesn't sound so much like a question about tables/columns, but about normalization. In some situations having a high degree of normalization ("more tables" in this case) is good, and clean, but it typically takes a high number of JOINs to get relevant results. And with a large enough dataset, this can bog down performance.
Jeff wrote a little about it regarding the design of StackOverflow. See also the post Jeff links to by Dare Obasanjo.

I would argue in favor of more tables, but only up to a certain point. Using your example, if you separated your user's information into two tables, say USERS and ADDRESS, this gives you the flexibility to have multiple addresses per user. One obvious application of this is a user who has separate billing and shipping addresses.
The argument in favor of having a separate CITY table would be that you only have to store each city's name once, then refer to it when you need it. That does reduce duplication, but in this example I think it's overkill. It may be more space efficient, but you'll pay the price in joins when you select data from your database.
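A minimal sketch of the USERS/ADDRESS split with billing and shipping addresses, keeping the city inline rather than in its own table, could look like this. It uses Python's sqlite3 module, and the column names (`kind`, `line`, `city`) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    kind TEXT NOT NULL CHECK (kind IN ('billing', 'shipping')),
    line TEXT NOT NULL,
    city TEXT NOT NULL,          -- kept inline rather than in a CITY table
    UNIQUE (user_id, kind)
);
INSERT INTO users VALUES (1, 'Ann');
INSERT INTO address (user_id, kind, line, city) VALUES
    (1, 'billing', '1 Main St', 'Oslo'),
    (1, 'shipping', '9 Dock Rd', 'Bergen');
""")

# One join is enough to list every address a user has.
rows = conn.execute(
    "SELECT u.name, a.kind, a.city FROM users u"
    " JOIN address a ON a.user_id = u.id ORDER BY a.kind"
).fetchall()
```

Supporting a third kind of address means widening the CHECK list, not adding columns to the users table.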

A fully normalized design (i.e, "More Tables") is more flexible, easier to maintain, and avoids duplication of data, which means your data integrity is going to be a lot easier to enforce.
Those are powerful reasons to normalize. I would choose to normalize first, and then only denormalize specific tables after you saw that performance was becoming an issue.
My experience is that in the real world, you won't reach the point where denormalization is necessary, even with very large data sets.

Each table should only include columns that pertain to the entity that's uniquely identified by the primary key. If all the columns in the database are all attributes of the same entity, then you'd only need one table with all the columns.
If any of the columns may be null, though, you would need to put each nullable column into its own table with a foreign key to the main table in order to normalize it. This is a common scenario, so for a cleaner design, you're likely to be adding more tables than columns to existing tables. Also, by adding these optional attributes to their own table, they would no longer need to allow nulls and you avoid a slew of NULL-related issues.

It depends on your database flavor. MS SQL Server, for example, tends to prefer narrower tables. That's also the more 'normalized' approach. Other engines might prefer it the other way around. Mainframes tend to fall in that category.

The multi-table database is a lot more flexible if any of these one to one relationships may become one to many or many to many in the future. For example, if you need to store multiple addresses for some customers, it's a lot easier if you have a customer table and an address table. I can't really see a situation where you might need to duplicate some parts of an address but not others, so separate address, city, state, and zip tables may be a bit over the top.

Like everything else: it depends.
There is no hard and fast rule regarding column count vs table count.
If your customers need to have multiple addresses, then a separate table for that makes sense. If you have a really good reason to normalize the City column into its own table, then that can go, too, but I haven't seen that before because it's a free form field (usually).
A table heavy, normalized design is efficient in terms of space and looks "textbook-good" but can get extremely complex. It looks nice until you have to do 12 joins to get a customer's name and address. These designs are not automatically fantastic in terms of performance that matters most: queries.
Avoid complexity if possible. For example, if a customer can have only two addresses (not arbitrarily many), then it might make sense to just keep them all in a single table (CustomerID, Name, ShipToAddress, BillingAddress, ShipToCity, BillingCity, etc.).
Here's Jeff's post on the topic.

There are advantages to having tables with fewer columns, but you also need to look at your scenario above and answer these questions:
Will the customer be allowed to have more than 1 address? If not, then a separate table for address is not necessary. If so, then a separate table becomes helpful because you can easily add more addresses as needed down the road, where it becomes more difficult to add more columns to the table.

I would consider normalizing as the first step, so cities, counties, states, countries would be better as separate columns... the power of the SQL language, together with today's DBMSes, allows you to group your data later if you need to view it in some other, non-normalized view.
When the system is being developed, you might consider 'unnormalizing' some part if you see that as an improvement.

I think balance is in order in this case. If it makes sense to put a column in a table, then put it in the table; if it doesn't, then don't. Your coworker's approach would definitely help to normalize the database, but that might not be very useful if you have to join 50 tables together to get the information you need.
I guess my answer would be: use your best judgment.

There are many sides to this, but from an application efficiency perspective more tables can be more efficient at times. If you have a few tables with a bunch of columns, then every time the db has to do an operation that takes a lock, more data is made unavailable for the duration of the lock. If locks get escalated to pages and tables (well, hopefully not tables :) ) you can see how this can slow down the system.

Hmm.
I think it's a wash and depends on your particular design model. Definitely factor entities that have more than a few fields out into their own table, as well as entities whose makeup will likely change as your application's requirements change. For instance, I'd factor out address anyway, since it has so many fields, but I'd especially do it if you thought there was any chance you'd need to handle foreign addresses, which can take a different form. The same goes for phone numbers.
That said, once you've got it working, keep an eye on performance. If you've spun out an entity that requires you to do large, expensive joins, it may become a better design decision to fold that table back into the original.

When you design your database, you should stay as close as possible to the meaning of the data and NOT to your application's needs!
A good database design should stand for 20 years without a change.
A customer can have multiple addresses; that's the reality. If you decide that your application is limited to one address for the first release, that concerns the design of your application, not the data!
It's better to have multiple tables instead of multiple columns, and to use views if you want to simplify your queries.
Most of the time, when you have a performance issue with a database, it's about network performance (chained queries that each return one row, fetching columns you don't need, etc.), not about the complexity of your queries.
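To make the last two points concrete, here is a small sketch in Python with sqlite3. The schema and the view name customer_address are invented for the example. It shows a view hiding the join, and one query replacing a chain of per-customer queries. An in-memory SQLite database has no network hop, so this only shows the shape of the two access patterns, not their cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        street      TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    -- The view hides the join from callers.
    CREATE VIEW customer_address AS
        SELECT c.customer_id, a.address_id, c.name, a.street, a.city
        FROM customer AS c
        JOIN address AS a ON a.customer_id = c.customer_id;

    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO address (customer_id, street, city) VALUES
        (1, '1 Main St', 'Springfield'),
        (1, 'PO Box 9', 'Springfield'),
        (2, '5 Oak Ave', 'Shelbyville');
""")

# Chained pattern: one query for the customers, then one more per customer.
chained = []
customers = conn.execute(
    "SELECT customer_id, name FROM customer ORDER BY customer_id"
).fetchall()
for customer_id, name in customers:
    addresses = conn.execute(
        "SELECT street, city FROM address "
        "WHERE customer_id = ? ORDER BY address_id",
        (customer_id,),
    ).fetchall()
    for street, city in addresses:
        chained.append((name, street, city))

# Single round trip: the same rows from one query against the view.
single = conn.execute(
    "SELECT name, street, city FROM customer_address "
    "ORDER BY customer_id, address_id"
).fetchall()
```

Both approaches produce the same rows. Against a real server, the chained version costs one round trip per customer on top of the first query.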

There are huge benefits to queries using as few columns as possible. But the table itself can have a large number. Jeff says something on this as well.
Basically, make sure that you don't ask for more than you need when doing a query - performance of queries is directly related to the number of columns you ask for.
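A quick sketch of the difference, using Python's sqlite3 and a made-up customer table with a large photo column. How much this matters in practice depends on the DBMS, the column sizes, and the network, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        notes       TEXT,
        photo       BLOB   -- hypothetical wide column
    )
""")
conn.execute(
    "INSERT INTO customer VALUES (1, 'Ada', 'long free-form notes', ?)",
    (b"\x00" * 1024,),
)

# SELECT * drags every column along, including the 1 KB blob.
wide = conn.execute("SELECT * FROM customer WHERE customer_id = 1")
wide_columns = [d[0] for d in wide.description]
wide_row = wide.fetchone()

# Naming the columns returns only what the caller actually needs.
narrow = conn.execute("SELECT name FROM customer WHERE customer_id = 1")
narrow_columns = [d[0] for d in narrow.description]
narrow_row = narrow.fetchone()
```

The table still has four columns; only the query got narrower.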

I think you have to look at the kind of data you're storing before you make that decision. Having an address table is great, but only if the likelihood of multiple people sharing the same address is high. If every person has a different address, keeping that data in a separate table just introduces unnecessary joins.
I don't see the benefit of having a city table unless cities in and of themselves are entities you care about in your application, or you want to limit the cities available to your users.
Bottom line: decisions like this have to take the application itself into consideration before you start shooting for efficiency. IMO.

First, normalize your tables. This ensures you avoid redundant data, giving you fewer rows of data to scan, which improves your queries. Then, if you reach a point where the normalized tables you are joining cause a query to take too long to process (an expensive join clause), denormalize where appropriate.

Good to see so many inspiring and well-founded answers.
My answer would be (unfortunately): it depends.
Two cases:
* If you create a data model that is to be used for many years, and so may have to adapt to many future changes: go for more tables with fewer rows and pretty strict normalization.
* In other cases you can choose between more tables with fewer rows and fewer tables with more rows. Especially for people relatively new to the subject, the latter approach can be more intuitive and easier to comprehend.
The same holds for choosing between the object-oriented approach and other options.
