In a discussion with a friend, I heard two claims:
1. Using constraints causes a slight decrease in performance. E.g., consider a uniqueness constraint: before insertion, the DBMS has to check for uniqueness against all existing data, which costs extra computation.
2. Constraints should instead be handled in application-level logic. E.g., delete the rows from both tables yourself, instead of declaring a foreign key integrity constraint.
First one sounds a little logical to me, but the second one seems pretty wrong intuitively. I don't have enough experience in DBMS to really judge these claims though.
Q. Is claim 1 correct? If so, is claim 2 even the right way to handle such scenarios?

If your data needs to be correct, you need to enforce the constraints, and if you need to enforce the constraints, letting the database do it for you will be faster than anything else (and likely more correct too).
Attempting to enforce something like key uniqueness at the application-level can be done correctly or quickly, but not both. For example, let's say you want to insert a new row. A naive application-level algorithm could look something like this:
1. Search the table for the (key fields of the) new row.
2. If not found, insert the new row.
And that would actually work in a single-client / single-threaded environment. However, in a concurrent environment, some other client could write that same key value in between your steps 1 and 2, and presto: you have yourself a duplicate in your data without even knowing it!
To prevent such a race condition, you'd have to use some form of locking, and since you are inserting a new row, there is no row to lock yet - you'll likely end up locking the entire table, destroying scalability in the process.
OTOH, if you let the DBMS do it for you, it can do it in a special way without too much locking, which has been tested and double-tested for correctness in all the tricky concurrent edge cases, and whose performance has been optimized over the time the DBMS has been on the market.
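As a minimal sketch of letting the database do it, the snippet below declares the key as UNIQUE and simply attempts the insert, treating a constraint violation as "already exists". It uses Python's built-in sqlite3 module purely as a stand-in DBMS; the `users` table and the `insert_user` helper are made up for illustration.

```python
import sqlite3


def insert_user(conn: sqlite3.Connection, username: str) -> bool:
    """Insert a row and let the UNIQUE constraint reject duplicates.

    Returns True if the row was inserted, False if the key already existed.
    There is no separate "search first" step, so there is no gap in which
    another client could sneak in the same key.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical table: the uniqueness rule is declared once, in the schema.
conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")

first = insert_user(conn, "alice")
second = insert_user(conn, "alice")  # duplicate key, rejected by the database
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The application still decides what to do about a duplicate; it just no longer has to detect it on its own.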
Similar concerns exist for foreign keys as well.
So yeah, if your application is the only one accessing the database (e.g. when using an embedded database), you may get away with application-level enforcement, although why would you if the DBMS can do it for you?
But in a concurrent environment, leave keys and foreign keys to the database - you'll have plenty of work anyway, enforcing your custom "business logic" (that is not directly "declarable" in the DBMS) in a way that is both correct and performant...
That being said, feel free to perform any application-level "pre-checks" that benefit your user experience. But do them in addition to database-level constraints, not instead of them.

Claim 1 is correct, claim 2 is incorrect, just like you concluded.
The database's job is to handle the data and its integrity. The app's job is to ask the database about the data and then perform work with that data.
If you handle #2 through the application:
you have to handle concurrency - what happens when there's more than one connection active to the db? You need to lock tables to perform operations ensuring uniqueness or integrity. Since a connection can break at any time, you've got a huge problem on your hands: how do you unlock tables when the process that locked them has died?
you can't do a better job from the app than the database can on its own. You still need to check the rows for uniqueness, meaning that you need to retrieve the data, perform the check on the whole dataset and then write it. That can't be faster than the database doing it itself, since you also need to transfer the data from the db to your app.
databases are made with concurrency in mind. Creating optimizations using logic of your friend is what leads to unstable apps, duplicate data, unresponsive databases etc. Never do that. Let the db do its job, it's made for such purposes.
When checking for uniqueness, MySQL uses indexes, which are data structures made for fast access. No app can match the speed at which MySQL performs a uniqueness check this way. If you need unique data, you need to ensure that you have unique data - this is a workload that can't be avoided, and the people who develop databases use proven algorithms designed for speed. It is already well optimized.
As for integrity, the same applies: MySQL (or any other RDBMS) is made to handle such scenarios. If foreign key constraints were better implemented in app logic, we'd never have had FKs available to us in the first place. Like I mentioned before - the database's job is to take care of that.
ACID for relational databases isn't there for no reason. MySQL's InnoDB implements Atomicity, Consistency, Isolation and Durability; if you need them, use them. There's no app in any language that anyone can create which performs better in any way compared to MySQL's internal handling of those.
TL;DR: you are correct in your thinking.

Yes, it's true that checking a constraint is going to take time and slow down database updates.
But it's not at all clear how moving this logic to the application will result in a net performance improvement. Now you have at least two separate trips to the database: one to check the constraint and another to perform the update. Every trip to the database costs: It takes time to make a connection, it takes time for the database engine to parse the query and construct a query plan, it takes time to send results back. As the database engine doesn't know what you're doing or why, it can't optimize. In practice, one "big visit" is almost always cheaper than two "small visits" that accomplish the same thing.
I'm speaking here mostly of uniqueness constraints and relational integrity constraints. If you have a constraint that can be tested without visiting the database, like a range limit on an individual field, it would be faster to do that in the application. Maybe still not a good idea for a variety of reasons, but it would be faster.

Constraints do generally cause a slight decrease in performance. Nothing is free. There are, however, two important considerations:
The performance hit is usually so slight that it is lost in the "noise" of the natural variability of a running system so it would take tests involving thousands or millions of test queries to determine the difference.
One has to ask "Affects the performance where?" Constraints affect the performance of DML operations. But if the constraints were not there, then every query would have to perform additional testing to verify the accuracy of the data being read. I can assure you, this will be at a far greater performance hit than the constraints.
There are exceptions, of course, but most databases are queried a lot more often than modified. So if you can shift performance hits from queries to DML, you generally speed up the overall performance of the system.
Perform separate constraint checking at the app level by all means. It is a tremendous benefit to provide the user with feedback during the process of collecting data ("Delivery date cannot be in the past!") rather than waiting until the attempt to insert the data into the database fails.
But that doesn't mean remove them from the database. This redundancy is important. Can you absolutely guarantee that the only operations ever performed on the database will originate from the app? Absolutely not. There is too much normal maintenance activity going on outside the app to make that promise. Not to mention that there are generally more than one app so the guarantee must apply to each one. Too many loose ends.
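A rough sketch of that layering, assuming a hypothetical `orders` table and using Python's built-in sqlite3 module as a stand-in database: the application pre-check produces the friendly message, while the primary key in the schema stays in place as the last line of defense. The function names and messages are illustrative only.

```python
import sqlite3
from datetime import date
from typing import Optional


def precheck(delivery: date, today: date) -> Optional[str]:
    """Application-level check: cheap, and gives the user a readable message."""
    if delivery < today:
        return "Delivery date cannot be in the past!"
    return None


def save_order(conn: sqlite3.Connection, order_no: str,
               delivery: date, today: date) -> str:
    problem = precheck(delivery, today)
    if problem:
        return problem
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (order_no, delivery_date) VALUES (?, ?)",
                (order_no, delivery.isoformat()),
            )
    except sqlite3.IntegrityError:
        # The database constraint still catches what the app did not check.
        return "Order number already exists."
    return "saved"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_no TEXT PRIMARY KEY, delivery_date TEXT NOT NULL)"
)
today = date(2024, 1, 5)
r1 = save_order(conn, "A-1", date(2024, 1, 10), today)
r2 = save_order(conn, "A-1", date(2024, 1, 11), today)  # duplicate order number
r3 = save_order(conn, "A-2", date(2024, 1, 1), today)   # date in the past
```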
When designing a database, data integrity is your number one priority. Never sacrifice that for the sake of performance, especially since the performance of a well-designed database is not often an issue, and even when it is, there are many ways to improve performance that do not involve removing constraints (or denormalizing, another mistake many still make in order to improve the performance of an OLTP system).

Q. Is claim 1 correct?
Yes. In my experience, using constraints can cause a massive decrease in performance.
The performance impact depends on the number of constraints and the number of records in the tables. As the tables grow, DB performance can move from great to bad fast.
For example, in one auditing company I worked for, part of the process was to serialize an Excel matrix containing a large number of responsibilities/roles/functions into a set of tables which had many FK constraints.
Initially the performance was fine, but within 6 months to a year this serialization process took a few minutes to complete. We optimised as much as we could, with little effect. If we switched off the constraints, the process completed in a few seconds.
If so (if claim 1 is correct), is claim 2 even the right way to handle such scenarios?
Yes, but only under certain circumstances:
You have a large number of constraints.
You have a large, ever-growing number of records in your DB tables.
The DB hardware cannot be upgraded for whatever reason, and you are experiencing performance problems.
So with the performance problem we had at the auditing company, we looked at moving the constraint checks into an application dataset. So in essence the dataset was used to check and validate the constraints and the matrix DB tables used simply for storage (and processing).
NOTE: This worked for us because the matrix data never changed once inserted and each matrix was independent of all other past inserted matrices.


Is a strongly normalized relational database not efficient?

I was reading this question and it made me wonder: is a highly normalized DB inefficient?
How should the right compromise be found?
I'm not sure whether this question fits better here or on Programmers. There are some similar questions here, but if I should move it, just tell me.
Whether it will speed it up or slow it down depends strongly on the nature of the data, the size of the tables, the type of querying, the indexing. I have seen it go both ways although, more often than not in my experience, normalization to the third normal form speeds things up. Relational databases are built to be normalized and designed so that those things are expected.
One thing the denormalization advocates often forget is that speed is critical to transactions (possibly more critical due to blocking potential) and that denormalization often slows down updates. You can't measure performance just on select statements. Denormalized database tables are often wider and wider tables can often cause slowdowns too.
Denormalized databases make data integrity a major problem: a change of a company name in a normalized database might require updating one record, while in a denormalized one it might require updating 100,000,000 records. That is why denormalization is generally only preferred for databases (like data warehouses) where the data is loaded through an ETL process but the database itself is frequently queried for complex reporting scenarios. Transactional databases that have a lot of user updates, deletions and inserts are often much faster if they are normalized to at least the third normal form.
Now you can go crazy with normalization too, don't get me wrong. I shouldn't have to join to 10 tables to get a simple address, especially if I fetch addresses often. Data that is often used together often belongs together, especially if a change to it is unlikely to touch a million records.
For instance, in the address, it would require a large update if Chicago changed its name to New Chicago, but those types of massive address changes are pretty rare in my part of the world. On the other hand, company name changes are frequent and could cause massive disruption if they needed to be made to millions of denormalized records.
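To make the company-name example concrete, here is a small sketch using Python's built-in sqlite3 module. The table names (`companies`, `orders_norm`, `orders_denorm`) and the 1,000-row data set are invented for illustration; the point is only how many rows a single rename has to touch in each design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized: the company name is stored once and referenced by id.
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders_norm ("
    " id INTEGER PRIMARY KEY,"
    " company_id INTEGER NOT NULL REFERENCES companies(id))"
)
# Denormalized: the company name is copied into every order row.
conn.execute(
    "CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, company_name TEXT NOT NULL)"
)

conn.execute("INSERT INTO companies (id, name) VALUES (1, 'Acme')")
conn.executemany("INSERT INTO orders_norm (company_id) VALUES (?)", [(1,)] * 1000)
conn.executemany(
    "INSERT INTO orders_denorm (company_name) VALUES (?)", [("Acme",)] * 1000
)

# Rename the company in both designs and count the rows each UPDATE touches.
norm_rows = conn.execute(
    "UPDATE companies SET name = 'Acme Corp' WHERE id = 1"
).rowcount
denorm_rows = conn.execute(
    "UPDATE orders_denorm SET company_name = 'Acme Corp' WHERE company_name = 'Acme'"
).rowcount
```

With this data the normalized rename touches one row and the denormalized one touches all 1,000, and every one of those rows is also a chance to miss a copy.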
If you are not designing a data warehouse, then normalize your data. Never denormalize unless you are a database specialist with at least 5 years of experience in large systems. You can harm things tremendously if you don't know what you are doing. If things are slow, denormalization is one of the last performance improvements to try. Generally, the problem is fixed by writing better queries that are sargable and do not use poorly performing techniques like correlated subqueries, or by getting the correct indexing applied.
Normalization optimizes storage requirements and data consistency. As a tradeoff, it can make queries more complex and slow.
How should the right compromise be found?
Unfortunately, that cannot be answered with generality.
It all depends on your application and its requirements.
If your queries run too slow, and indexing or caching or query rewriting or database parameter tuning don't cut it, denormalization may be appropriate for you.
(OTOH, if your queries run just fine, or can be made to run just fine, there is probably no need to go there).
It depends. Every time I've worked to normalize a database, it has radically sped up. But the performance problems with the non-normalized DBs were that they needed many indices (most of which were not used for any particular query), had too many columns, forced DISTINCT onto queries that wouldn't have needed it with a normalized DB, and searched tables inefficiently.
If common queries need to perform many joins on large tables for the simplest of lookups, or hit many tables for writes to update what the user/application sees as an atomic update of a single entity, then as traffic grows, so will that burden, at a rate higher than with lower/no normalization. Typically what happens is that everything runs OK until either the database and application are put on different production servers, while they were on the same dev server, or when the data gets big enough to start hitting the disks all the time.
DBMS products couple logical layout and physical storage, so while it may be as likely to increase speed as decrease it, normalization of base tables will in some way affect performance of the system.
Usually, the right compromise is views, with an SQL DBMS. If you are using any variation of design by contract, views are likely the correct design decision even without any concerns for normalization or performance, so that the application gets a model fitting its needs. Scalability concerns, like for major websites, create problems that don't have quick and easy solutions, at this point in time.
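A minimal sketch of the view approach, using Python's built-in sqlite3 module as a stand-in for an SQL DBMS. The `cities`/`addresses` tables and the `address_view` name are hypothetical: the base tables stay normalized while the application reads one flat shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES cities(id)
    );
    -- The view hides the join, so the application sees a single "address".
    CREATE VIEW address_view AS
        SELECT a.id, a.street, c.name AS city
        FROM addresses AS a
        JOIN cities AS c ON c.id = a.city_id;
    INSERT INTO cities (id, name) VALUES (1, 'Chicago');
    INSERT INTO addresses (id, street, city_id) VALUES (1, '1 Main St', 1);
""")

row = conn.execute(
    "SELECT street, city FROM address_view WHERE id = 1"
).fetchone()
```

If the base tables are later restructured, only the view definition has to change, not the queries the application sends.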
In addition to Thilo's post:
normalizing on SAP HANA is wrong because the DB normalizes the data itself. If you do it anyway, you will slow down the database.

What are the first issues to check while optimizing an existing database?

What are the top issues and in which order of importance to look into while optimizing (performance tuning, troubleshooting) an existing (but unknown to you) database?
Which actions/measures in your previous optimizations gave the most effect (with possibly the minimum of work) ?
I'd like to partition this question into following categories (in order of interest to me):
one needs to show the performance boost (improvements) in the shortest time. i.e. most cost-effective methods/actions;
non-intrusive or least-troublesome most effective methods (without changing existing schemas, etc.)
intrusive methods
Suppose I have a copy of a database on dev machine without access to production environment to observe stats, most used queries, performance counters, etc. in real use.
This is a development-related question, not a DBA-related one.
Suppose the database was developed by others and was given to me for optimization (review) before it was delivered to production.
It is quite usual to have outsourced development detached from end-users.
Besides, there is a database design paradigm that a database, in contrast to application data storage, should have value in itself, independently of the specific applications that use it or of the context of its use.
Update3: Thanks to all answerers! You all pushed me to open subquestion
How do you stress load dev database (server) locally?
Create a performance Baseline (non-intrusive, use performance counters)
Identify the most expensive queries (non-intrusive, use SQL Profiler)
Identify the most frequently run queries (non-intrusive, use SQL Profiler)
Identify any overly complex queries, or those using slowly performing constructs or patterns. (non-intrusive to identify, use SQL Profiler and/or code inspections; possibly intrusive if changed, may require substantial re-testing)
Assess your hardware
Identify Indexes that would benefit the measured workload (non-intrusive, use SQL Profiler)
Measure and compare to your baseline.
If you have very large databases, or extreme operating conditions (such as 24/7 or ultra high query loads), look at the high end features offered by your RDBMS, such as table/index partitioning.
This may be of interest: How Can I Log and Find the Most Expensive Queries?
If the database is unknown to you, and you're under pressure, then you may not have time for Mitch's checklist which is good best practice to monitor server health.
You also need access to production to gather real info from assorted queries you can run. Without this, you're doomed. The server load pattern is important: you can't reproduce many issues yourself on a development server because you won't use the system like an end user.
Also, focus on "biggest bang for the buck". An expensive query running once daily at 3am can be ignored. A not-so-expensive one running every second is well worth optimising. However, you may not know this without knowing server load pattern.
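The "biggest bang for the buck" idea boils down to ranking queries by duration multiplied by frequency. Here is a tiny sketch in Python; the query names and numbers are invented, and in practice they would come from whatever trace or profiler output you have.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    name: str
    avg_duration_ms: float
    runs_per_day: int

    @property
    def total_ms_per_day(self) -> float:
        # Weighted cost: how much time the query consumes over a whole day.
        return self.avg_duration_ms * self.runs_per_day


# Hypothetical figures: one heavy nightly job vs. a cheap query run every second.
stats = [
    QueryStat("nightly_report", avg_duration_ms=120_000, runs_per_day=1),
    QueryStat("lookup_by_user", avg_duration_ms=40, runs_per_day=86_400),
]

ranked = sorted(stats, key=lambda s: s.total_ms_per_day, reverse=True)
```

With these made-up numbers the 40 ms query costs 3,456,000 ms per day against 120,000 ms for the two-minute report, so the cheap-looking query is the one to optimise first.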
So, basic steps..
Assuming you're firefighting:
server logs
SQL Server logs
sys.sysprocesses eg ASYNC_NETWORK_IO waits
Slow response:
profiler, with a duration filter. What runs often and is lengthy
most expensive query, weighted for how often used
open transaction with plan
weighted missing index
Things you should have:
Backups
Tested restore of aforementioned backups
Regular index and statistic maintenance
Regular DBCC and integrity checks
Edit: After your update
Static analysis is best practices only: you can't optimise for usage. This is all you can do. This is marc_s' answer.
You can guess what the most common query may be, but you can't guess how much data will be written or how badly a query scales with more data
In many shops developers provide some support, either directly or as "3rd line".
If you've been given a DB for review by another team that you hand over to another team to deploy: that's odd.
If you're not interested in the runtime behavior of the database, e.g. what are the most frequently executed queries and those that consume the most time, you can only do a "static" analysis of the database structure itself. That has a lot less value, really, since you can only check for a number of key indicators of bad design - but you cannot really tell much about the "dynamics" of the system being used.
Things I would check for in a database that I get as a .bak file - without the ability to collect live and actual runtime performance statistics - would be:
normalization - is the table structure normalized to third normal form? (at least most of the time - there might be some exceptions)
do all tables have a primary key? ("if it doesn't have a primary key, it's not a table", after all)
For SQL Server: do all the tables have a good clustering index? A unique, narrow, static, and preferably ever-increasing clustered key - ideally an INT IDENTITY, and most definitely not a large compound index of many fields, no GUIDs and no large VARCHAR fields (see Kimberly Tripp's excellent blog posts on the topic for details)
are there any check and default constraints on the database tables?
are all the foreign key fields backed up by a non-clustered index to speed up JOIN queries?
are there any other, obvious "deadly sins" in the database, e.g. overly complicated views, or really badly designed tables etc.
But again: without actual runtime statistics, you're quite limited in what you can do from a "static analysis" point of view. The real optimization can only really happen when you have a workload from a regular day of operation, to see what queries are used frequently and put the most stress on your database --> use Mitch's checklist to check those points.
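Some of these static checks can be scripted. As a rough illustration of the "do all tables have a primary key?" check, here is a sketch against SQLite via Python's built-in sqlite3 module. This is a stand-in only: on SQL Server you would query its own catalog views instead, and the helper name and demo tables below are made up.

```python
import sqlite3
from typing import List


def tables_without_primary_key(conn: sqlite3.Connection) -> List[str]:
    """Return the names of user tables that declare no primary key."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    missing = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 5 is the pk flag
        # (0 means the column is not part of the primary key).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        if not any(col[5] for col in columns):
            missing.append(table)
    return missing


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE audit_log (message TEXT)")  # no primary key
missing = tables_without_primary_key(conn)
```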
The most important thing to do is collect up-to-date statistics. Performance of a database depends on:
the schema;
the data in the database; and
the queries being executed.
Looking at any of those in isolation is far less useful than the whole.
Once you have collected the statistics, then you start identifying operations that are sub-par.
For what it's worth, the vast majority of performance problems we've fixed have been solved by adding indexes, by adding extra columns and triggers to move the cost of calculations away from the select to the insert/update, or by tactfully informing the users that their queries are, shall we say, less than optimal :-)
They're usually pleased that we can just give them an equivalent query that runs much faster.
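As a sketch of the "extra column plus trigger" technique mentioned above, the example below keeps a precomputed `item_count` on each order, so reads no longer need to count detail rows. It uses Python's built-in sqlite3 module as a stand-in database, and the schema and trigger name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        item_count INTEGER NOT NULL DEFAULT 0  -- extra, precomputed column
    );
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id)
    );
    -- The trigger pays the cost of the calculation at insert time.
    CREATE TRIGGER order_items_after_insert AFTER INSERT ON order_items
    BEGIN
        UPDATE orders SET item_count = item_count + 1 WHERE id = NEW.order_id;
    END;
    INSERT INTO orders (id) VALUES (1);
    INSERT INTO order_items (order_id) VALUES (1), (1), (1);
""")

# The select reads the stored value instead of running COUNT(*) over the items.
item_count = conn.execute(
    "SELECT item_count FROM orders WHERE id = 1"
).fetchone()[0]
```

A real version would also need triggers for deletes and for updates that move an item to another order; they are left out here to keep the sketch short.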

Database Designing: An art or headache (Managing relationships)

In my experience, most people don't define physical relationships in their tables; they try to remember them and enforce them through code only.
Here 'Physical Relationships' refer to Primary Key, Foreign Key, Check constraints, etc.
While designing a database, people try to normalize the database on paper and keep things documented. Like, if I have to create a database for a marketing company, I will try to understand its requirements.
For example, what fields are mandatory, what fields will contain only (a or b or c) etc.
When all these things are clear, why are most people afraid of constraints?
Don't they want to manage things?
Do they lack the knowledge (which I don't think is so)?
Are they not confident about the future?
Is it really a tough job managing all these entities?
What is the reason in your opinion?
I always have the DBMS enforce both primary key and foreign key constraints; I often add check constraints too. As far as I am concerned, the data is too important to run the risk of inaccurate data being stored.
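For illustration, rules like "this field is mandatory" and "this field may only contain a, b or c" can be declared directly in the schema. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the `campaigns` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"  # mandatory field
    " channel TEXT NOT NULL CHECK (channel IN ('a', 'b', 'c')))"  # only a, b or c
)
conn.execute("INSERT INTO campaigns (name, channel) VALUES ('spring', 'a')")

try:
    conn.execute("INSERT INTO campaigns (name, channel) VALUES ('summer', 'z')")
    bad_value_rejected = False
except sqlite3.IntegrityError:
    bad_value_rejected = True  # CHECK constraint refused the value 'z'

try:
    conn.execute("INSERT INTO campaigns (name, channel) VALUES (NULL, 'b')")
    missing_name_rejected = False
except sqlite3.IntegrityError:
    missing_name_rejected = True  # NOT NULL constraint refused the missing name

stored = conn.execute("SELECT COUNT(*) FROM campaigns").fetchone()[0]
```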
If you think of the database as a series of stored true logical propositions, you will see that if the database contains a false proposition - an error - then you can argue to any conclusion you want. From a false premise, any conclusion can be derived.
Why don't other people use PK and FK constraints, etc?
Some are unaware of their importance (so lack of knowledge is definitely a factor, even a major factor). Others are scared that they will cost too much in performance, forgetting that one error that has to be fixed may easily use up all the time saved by not having the DBMS do the checking for you. I take the view that if the current DBMS can't handle them well, it might be (probably is) time to change DBMS.
Many developers will check the constraints in code above the database before they actually go to perform an operation. Sometimes, this is driven by user experience considerations (we don't want to present choices / options to users that can't be saved to the database). In other cases, it may be driven by the pain associated with executing a statement, determining why it failed, and then taking corrective action. Most people would consider code more maintainable if it did the check upfront, along with other business logic that might be at play, rather than taking corrective action through an exception handler. (Not that this is necessarily an ideal line of thinking, but it is a prevalent one.) In any case, if you are doing the check in advance of issuing the statement, and not particularly conscious of the fact that the database might get touched by applications / users who are not coming in through your integrity-enforcing code, then you might conclude that database constraints are unnecessary, especially with the performance hit that could be incurred from their use. Also, if you are checking integrity in the application code above the database, one might consider it a violation of DRY (Don't Repeat Yourself) to implement logically equivalent checks in the database itself. The two manifestations of integrity rules (those in database constraints and those in application code above the database) could in principle become out-of-sync if not managed carefully.
Also, I would not discount option 2, that many developers don't know much about database constraints, too readily.
Well, I mean, everyone is entitled to their own opinion and development strategy I suppose, but in my humble opinion these people are almost certainly wrong :)
The reason, however, someone may wish to avoid constraints is efficiency. Not because constraints are slow, but because storing redundant data (i.e. caching) is a very effective way of speeding up (well, avoiding) an expensive calculation. This is an acceptable approach when implemented properly (i.e. the cache is updated at regular/appropriate intervals; generally I do this with a trigger).
As to the motivation to not use FKs without a caching motivation, I can't imagine it. Perhaps they aim to be 'flexible' in their DB structure. If so, fine, but then don't use a relational DB, because it's pointless. Non-relational DBs (OO DBs) certainly have their place, and may even arguably be better (quite arguable, but interesting to argue), but it's a mistake to use a relational DB and not use its core properties.
I would always define PK and FK constraints, especially when using an ORM. It really makes life easier for everybody to let the ORM reverse engineer the database instead of manually configuring it to use some PKs and FKs.
There are several reasons for not enforcing relationships in descending order of importance:
1. People-friendly error handling.
Your program should check constraints and send an intelligible message to the user. For some reason normal people don't like "SQL exception code -100013 goble rule violated for table gook".
2. Operational flexibility.
You don't really want your operators trying to figure out which order you must load your tables in at 3 a.m., nor do you want your testers pulling their hair out 'cause they cannot reset the database back to its starting position.
3. Checking constraints does consume IO and CPU.
4. It's a cheap way to save details for later recovery. For instance, in an online order system you could leave the detail item rows in the table when the user kills a parent order; if he later reinstates the order, the details reappear as if by a miracle -- you achieve this extra feature by deleting lines of code. (Of course you need some housekeeping process, but it is trivial!)
As things get more complex and more tables and relationships are needed in the database, how can you ensure the database developer remembers to check all of them? When you make a change to the schema that adds a new "informal" relationship, how can you ensure all the application code which might be affected gets changed?
Suddenly you could be deleting records that should stay because they have related data the developer forgot to check when writing the delete process, or because that process was in place before the last ten related tables were added to the schema.
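Here is a small sketch of how a declared foreign key guards against exactly that kind of forgotten check. It uses Python's built-in sqlite3 module as a stand-in DBMS (SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set), and the `orders`/`order_items` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_items ("
    " id INTEGER PRIMARY KEY,"
    " order_id INTEGER NOT NULL REFERENCES orders(id))"
)
conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.execute("INSERT INTO order_items (order_id) VALUES (1)")

# A delete process that "forgot" about order_items: the database refuses it.
try:
    conn.execute("DELETE FROM orders WHERE id = 1")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The delete code never had to know that `order_items` exists; the relationship declared in the schema protected the parent row.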
It is foolhardy in the extreme to not formally set up PK/FK relationships. I process data received from many different vendors and databases. You can tell which ones have data integrity problems most likely caused by a failure to explicitly define relationships by the poor quality of their data.

Why don't databases intelligently create the indexes they need?

I just heard that you should create an index on any column you're joining or querying on. If the criterion is this simple, why can't databases automatically create the indexes they need?
Well, they do; to some extent at least...
See SQL Server Database Engine Tuning Advisor, for instance.
However, creating optimal indexes is not as simple as you mentioned. An even simpler rule could be to create indexes on every column (which is far from optimal)!
Indexes are not free. You create indexes at the cost of storage and update performance among other things. They should be carefully thought about to be optimal.
Every index you add may increase the speed of your queries. It will decrease the speed of your updates, inserts and deletes and it will increase disk space usage.
I, for one, would rather keep the control to myself, using tools such as DB Visualizer and explain statements to provide the information I need to evaluate what should be done. I do not want a DBMS unilaterally deciding what's best.
It's far better, in my opinion, that a truly intelligent entity be making decisions re database tuning. The DBMS can suggest all it wants but the final decision should be left up to the DBAs.
What happens when the database usage patterns change for one week? Do you really want the DBMS creating indexes and destroying them a week later? That sounds like a management nightmare scenario right up alongside Skynet :-)
This is a good question. Databases could create the indexes they need based on data usage patterns, but this means that the database would be slow the first time certain queries were executed and then get faster as time goes on. For example, if there is a users table with a username column, then the username would be used to look up users very often. After some time the database could see that, say, 50% of queries did this, in which case it could add an index on the username.
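To see what such an index changes, the sketch below asks SQLite (through Python's built-in sqlite3 module) for its query plan before and after creating an index on the username. The `users` table, the index name and the sample data are assumptions made for illustration, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)


def plan(conn: sqlite3.Connection, sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT id FROM users WHERE username = 'user500'"
plan_before = plan(conn, query)  # full table scan: every row is examined
conn.execute("CREATE INDEX idx_users_username ON users (username)")
plan_after = plan(conn, query)   # index search on idx_users_username
```

Printing `plan_before` and `plan_after` shows the switch from a scan to an index search, which is the same kind of information an explain tool gives a DBA deciding whether an index is worth its cost.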
However, the reason that this hasn't been implemented in great detail is simply that it is not a killer feature. Adding indexes is done relatively rarely by the DBA, and automating it (which is a very big task) is probably just not worth it for the database vendors. Remember that every query would have to be analyzed to enable auto indexes, along with the query response time and result set size, so it is non-trivial to implement.
Because databases simply store and retrieve data - the database engine has no clue how you intend to retrieve that data until you actually do it, in which case it is too late to create an index. And the column you are joining on may not be suitable for an efficient index.
It's a non-trivial problem to solve, and in many cases a sub-optimal automatic solution might actually make things worse. Imagine a database whose read operations were sped up by automatic index creation but whose inserts and updates got hosed as a result of the overhead of managing the index? Whether that's good or bad depends on the nature of your database and the application it's serving.
If there were a one-size-fits-all solution, databases would certainly do this already (and there are tools to suggest exactly this sort of optimization). But tuning database performance is largely an app-specific function and is best accomplished manually, at least for now.
An RDBMS could easily self-tune and create indices as it saw fit but this would only work for simple cases with queries that do not have demanding execution plans. Most indices are created to optimize for specific purposes and these kinds of optimizations are better handled manually.

30 million records a day, SQL Server can't keep up, other kind of database system needed?

Some time ago I designed a new statistics system for our multi-million user website, to log and report user actions for our customers.
The database design is quite simple, containing one table, with a foreignId (200,000 different IDs), a datetime field, an actionId (30 different IDs), and two more fields containing some meta-information (just smallints). There are no constraints to other tables. Furthermore, we have two indexes each containing 4 fields, which cannot be dropped, as users get timeouts when we use smaller indexes. The foreignId is the most important field, as each and every query contains this field.
We chose to use SQL Server, but after implementation a relational database doesn't seem like a perfect fit: we cannot insert 30 million records a day (it's insert only, we don't do any updates) while also doing a lot of random reads on the database, because the indexes cannot be updated fast enough. Ergo: we have a massive problem :-) We have temporarily solved the problem, yet a relational database doesn't seem to be suited for this problem!
Would a database like BigTable be a better choice, and why? Or are there other, better choices when dealing with this kind of problems?
NB. At this point we use a single 8-core Xeon system with 4 GB memory and Win 2003 32-bit. RAID10 SCSI as far as I know. The index size is about 1.5x the table size.
You say that your system is capable of inserting 3000 records per second without indexes, but only about 100 with two additional non-clustered indexes. If 3k/s is the maximum throughput your I/O permits, adding two indexes should in theory reduce the throughput to about 1000-1500/sec. Instead you see a degradation 10 times worse. The proper answer is 'It Depends', and some serious troubleshooting and bottleneck identification would have to be carried out. With that in mind, if I was to venture a guess, I'd give two possible culprits:
A. The additional non-clustered indexes distribute the writes of dirty pages into more allocation areas. The solution would be to place the clustered index and each non-clustered index into its own filegroup and place the three filegroups each onto separate LUNs on the RAID.
B. The low selectivity of the non-clustered indexes creates high contention between reads and writes (key conflicts as well as %lockres% conflicts), resulting in long lock wait times for both inserts and selects. One possible solution would be using SNAPSHOTs with read committed snapshot mode, but I must warn about the danger of adding a lot of IO in the version store (i.e. in tempdb) on a system that may already be under high IO stress. A second solution is using database snapshots for reporting; they cause lower IO stress and can be better controlled (no tempdb version store involved), but the reporting is no longer on real-time data.
I tend to believe B) is the likely cause, but I must again stress the need for proper investigation and proper root cause analysis.
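The estimate at the top of this answer can be reproduced with a small calculation. This is only a sketch under a crude assumption of mine, not a measured property of SQL Server: each extra index costs about as much I/O per row as the base table, so n extra indexes divide an I/O-bound insert rate by roughly n + 1. The function name expected_insert_rate is made up for illustration.

```python
def expected_insert_rate(base_rate, extra_indexes):
    """Crude I/O-bound estimate: each extra index adds one more write per row."""
    return base_rate / (1 + extra_indexes)

base_rate = 3000      # rows/s without the non-clustered indexes (figure quoted above)
observed_rate = 100   # rows/s with the two indexes in place (figure quoted above)

expected = expected_insert_rate(base_rate, 2)
shortfall = expected / observed_rate

print(f"expected about {expected:.0f} rows/s, observed {observed_rate} rows/s")
print(f"observed rate is {shortfall:.0f}x below the estimate")
```

Under this model the two indexes account for a drop to roughly 1000 rows/s, which is the lower end of the range given above; the remaining factor of ten is what points at something other than raw write volume, such as the contention described in B.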
'RAID10' is not a very precise description.
How many spindles in the RAID 0 part? Are they short-striped?
How many LUNs?
Where is the database log located?
Where is the database located?
How many partitions?
Where is tempdb located?
As for the question of whether relational databases are appropriate for something like this: yes, absolutely. There are many more factors to consider: recoverability, availability, toolset ecosystem, know-how expertise, ease of development, ease of deployment, ease of management and so on and so forth. Relational databases can easily handle your workload, they just need the proper tuning. 30 million inserts a day, about 350 per second, is small change for a database server. But a 32-bit system with 4 GB of RAM is hardly a database server, regardless of the number of CPUs.
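The per-second figure is just the daily volume divided by the number of seconds in a day. A minimal check, assuming the inserts are spread evenly over 24 hours (real traffic is rarely uniform, so peaks would likely be higher than this average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_second(rows_per_day):
    """Average insert rate if the daily volume were spread evenly."""
    return rows_per_day / SECONDS_PER_DAY

rate = average_rate_per_second(30_000_000)
print(round(rate))  # about 347 rows per second
```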
It sounds like you may be suffering from two particular problems. The first issue that you are hitting is that your indexes have to be updated every time you perform an insert - are you really trying to run live reports off a transactional server (this is usually considered a no-no)? Secondly, you may also be hitting issues with the server having to resize the database - check to ensure that you have allocated enough space and aren't relying on the database to do this for you.
Have you considered looking into something like indexed views in SQL Server? They are a good way to remove the indexing from the main table, and move it into a materialised view.
You could try making the table a partitioned one. This way the index updates will affect smaller sets of rows. Probably daily partitioning will be sufficient. If not, try partitioning by the hour!
You aren't providing enough information; I'm not certain why you say that a relational database seems like a bad fit, other than the fact that you're experiencing performance problems now. What sort of machine is the RDBMS running on? Given that you have foreign ID's, it seems that a relational database is exactly what's called for here. SQL Server should be able to handle 30 million inserts per day, assuming that it's running on sufficient hardware.
Replicating the database for reporting seems like the best route, given heavy traffic. However, a couple of things to try first...
Go with a single index, not two indexes. A clustered index is probably going to be a better choice than non-clustered. Fewer, wider indexes will generally perform better than more, narrower, indexes. And, as you say, it's the indexing that's killing your app.
You don't say what you're using for IDs, but if you're using GUIDs, you might want to change your keys over to bigints. Because GUIDs are random, they put a heavy burden on indexes, both in building indexes and in using them. Using a bigint identity column will keep the index growing in roughly chronological order, and if you're really interested in real-time access for queries on your recent data, your access pattern is much better suited to monotonically increasing keys.
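To make the difference between random and increasing keys concrete, here is a toy sketch. It uses a sorted Python list as a stand-in for an index, which is a simplification of my own: a real B-tree index with page splits behaves differently in detail. It only counts how many new keys land at the end of the ordered structure versus somewhere in the middle. The random 128-bit integers stand in for GUIDs.

```python
import bisect
import random

def fraction_appended(keys):
    """Insert keys into a sorted list; return the share that landed at the end."""
    ordered = []
    appended = 0
    for key in keys:
        position = bisect.bisect_right(ordered, key)
        if position == len(ordered):
            appended += 1
        ordered.insert(position, key)
    return appended / len(keys)

rng = random.Random(0)  # fixed seed so the run is repeatable
sequential_keys = list(range(1, 10_001))                     # identity-style keys
random_keys = [rng.getrandbits(128) for _ in range(10_000)]  # GUID-like keys

print("sequential:", fraction_appended(sequential_keys))
print("random:    ", fraction_appended(random_keys))
```

With increasing keys every insert is an append at the tail; with random keys almost every insert lands in the middle of existing data, which is the pattern that tends to scatter index writes.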
Sybase IQ seems pretty good for the goal, as our architects/DBAs indicated (as in, they explicitly moved all our stats onto IQ, stating that capability as the reason). I cannot substantiate this myself, though - I merely nod at the people in our company who generally know what they are talking about from past experience.
However, I'm wondering whether you MUST store all 30 million records? Would it not be better to store some pre-aggregated data?
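As a rough illustration of what pre-aggregation could look like, the sketch below collapses raw events into one counter per (foreign id, action id, hour). The hourly granularity, the tuple layout and the sample events are all assumptions made up for the example; whether this loses information the reports need depends on the actual queries.

```python
from collections import Counter
from datetime import datetime

def aggregate_hourly(events):
    """Count events per (foreign_id, action_id, hour) instead of keeping every row."""
    counts = Counter()
    for foreign_id, action_id, timestamp in events:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(foreign_id, action_id, hour)] += 1
    return counts

# Hypothetical sample events: (foreign_id, action_id, timestamp)
events = [
    (17, 3, datetime(2009, 5, 1, 10, 5)),
    (17, 3, datetime(2009, 5, 1, 10, 40)),
    (17, 4, datetime(2009, 5, 1, 10, 59)),
    (42, 3, datetime(2009, 5, 1, 11, 1)),
]
hourly = aggregate_hourly(events)
print(len(events), "raw rows ->", len(hourly), "aggregated rows")
```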
Not sure about SQL Server, but in another database system I used long ago, the ideal method for this type of activity was to store the updates and then, as a batch, turn off the indexes, add the new records and reindex. We did this once per night. I'm not sure if your reporting needs would be a fit for this type of solution, or even if it can be done in MS SQL, but I'd think it could.
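The nightly pattern described above can be sketched as follows. SQLite (via Python's built-in sqlite3 module) is used here purely as a stand-in so the example runs anywhere; the table name, column names and index name are hypothetical, and SQL Server has its own syntax for disabling and rebuilding indexes.

```python
import sqlite3

def nightly_load(conn, rows):
    """Drop the reporting index, bulk-insert the staged rows, then rebuild it."""
    conn.execute("DROP INDEX IF EXISTS ix_stats_foreign")
    conn.executemany(
        "INSERT INTO stats (foreign_id, action_id, logged_at) VALUES (?, ?, ?)",
        rows,
    )
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX ix_stats_foreign ON stats (foreign_id, logged_at)")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (foreign_id INTEGER, action_id INTEGER, logged_at TEXT)"
)
# Hypothetical rows staged during the day.
staged = [(i % 200, i % 30, "2009-05-01T00:00:00") for i in range(10_000)]
nightly_load(conn, staged)
print(conn.execute("SELECT COUNT(*) FROM stats").fetchone()[0])
```

The trade-off is the one mentioned above: reports that depend on the index are slow or stale while the load runs, so this only fits if reporting can tolerate data that is up to a day old.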
You don't say how the inserts are managed. Are they batched or is each statistic written separately? Because inserting one thousand rows in a single operation would probably be way more efficient than inserting a single row in one thousand separate operations. You could still insert frequently enough to offer more-or-less real time reporting ;)
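The two approaches look like this in code. Again this is a sketch using Python's sqlite3 module as a stand-in, with a hypothetical stats table; the absolute numbers on SQL Server will differ, but the shape of the difference (one round trip and one commit per row versus one per batch) is the point.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical stats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (foreign_id INTEGER, action_id INTEGER, logged_at TEXT)"
    )
    return conn

def insert_one_by_one(conn, rows):
    """One statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO stats VALUES (?, ?, ?)", row)
        conn.commit()

def insert_batched(conn, rows):
    """All rows in one executemany call and a single commit."""
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    conn.commit()

rows = [(i % 200, i % 30, "2009-05-01T00:00:00") for i in range(1_000)]
slow, fast = make_db(), make_db()
insert_one_by_one(slow, rows)
insert_batched(fast, rows)
```

Both databases end up with the same thousand rows; the batched version simply gets there with a single commit instead of a thousand.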
