Database with "Open Schema" - Good or Bad Idea? - database

The co-founder of Reddit gave a presentation on issues they had while scaling to millions of users. A summary is available here.
What surprised me is point 3:
Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute. There’s a row for title, url, author, spam votes, etc. When they add new features they didn’t have to worry about the database anymore. They didn’t have to add new tables for new things or worry about upgrades.
This seems like a terrible idea to me, but it seems to have worked out for Reddit. Is it a good idea in general, though? Or is it a peculiarity of Reddit that happened to work out for them?

This is a data model known as EAV for entity-attribute-value. It has its uses. A prime example is patient test data which is naturally sparse since there are hundreds of thousands of tests which might be run, but typically only a handful are present for a patient. A table with hundreds of thousands of columns is silly, but a table with EAV makes good sense.

Most of the really big web sites end up using some sort of incredibly simple on the database side of things. This has the advantage that it's fast and scalable. It has the disadvantage that all the relationships that you'd get the database to enforce automatically (via triggers and such) you need to enforce yourself in your client code instead. Maintaining consistency is a pain in the neck, and there's almost always at least some chance that your data will be inconsistent, at least for short periods of time.
For a social networking site, it's a worthwhile compromise. Data that's mostly right most of the time is adequate (e.g., who really cares if the number of up-votes you receive for an item is really 20 milliseconds out of date when it's sent), and keeping costs reasonable while scaling to support a gazillion users matters a lot.

I noted that they did not mention anything about the ease or difficulty in creating reports against that data. When used in a narrow set of circumstances, EAVs can be beneficial. As a central part of most systems it will become a nightmare when you hit reporting. The problem with EAVs is that most of the benefit is at the outset of the project and most of the pain is later in the analysis and reporting especially due to the severe lack of data integrity. "Not having to worry about foreign keys" to me sounds like a nightmare of orphaned rows. Add in the use of surrogate keys for everything and you have a tangled morass which generally ends in a complete rewrite

We worked in a similar problem not long ago, i could say at first it wasn't easy and fun, but after some point that you get used to it, it has it's own benefit, it's like developing another Database withing your tables, in some area it's an overkill task, but when you pass these levels, it provide you with lots of functionality, basically after one point, we didn't create any new table, and we just created dynamic forms for everything's, even for our own programming tasks.
as for performance, system didn't get million of rows to be fair comparison, but for daily usage i never noticed any differences.
some problems i want to share.
we didn't delete any rows we just hide them and setting a flag, and a nightly (weekly) service clean the physical rows
orphan rows, we basically didn't care about cleaning childs, we just set "IsDeleted" On father, and nightly service would clean every rows that is orphan or not needed anymore.
3.you should keep your indexes up to date, but you should skip of building them whenever possible (again nightly service keep index up to date)
we prepared our reporting data ahead of time (AOT) which means we were behind of actual data here :))
we tried hard no to jump in ocean of rows to calc some values by user demand. if we prepared it you can use it, if not then you cant
at the end, there are so many unique challenges to this approach that you should find a way to solve, problems you never faced before in your routine job, but after all of these you earn more flexibility that you can spend.

Related

In what cases are bitemporal tables actually used?

I am trying to collect information about temporal databases. I know it is not a modern technology, but I saw that many people who work with databases don't ever know how temporal approach works (I asked some senior programmers and system analysts about temporal databases and they answered something like "Huh?").
I know there are valid-time state tables and transaction-time state tables, along with bitemporal tables. I think that bitemporal tables are way too complex for most usages, because nowadays space is not a problem anymore, and it is more efficient to write the same information on 2 different tables, even if data is redundant. However, I made many searches online trying to see where bitemporal tables are actually used, but I didn't find anything useful.
Are there cases when use of a bitemporal table is really convenient than valid-time and transaction-time state tables separately? Are there real-world examples?
Of course! Take for example, balance sheet data. You will find that this information will change from WD1 (Working Day) to WD x due to late arriving data, adjustments, manual errors and suchlike.
In order to enable repeatable reporting, audit trail and temporal comparisons, a record must be kept of 'old' (invalid?) results. Bitemporal is a great way to manage such updates, especially on an intraday basis. I don't think it's that complicated from a user perspective - just another filter on the where clause.
I admit that the loading process is complicated, but it's not that bad.. I literally just finished writing a generic transform (in SAS, coping with all scenarios for a unique business key) and it took a single day.
Coming back to use cases.. Having both valid (business) time and transaction (version) time on the same table enables:
Repeatable results (having separate tables and corresponding updates could mean getting different results for the same query on two different days)
Comparable results (can answer questions such as "what was the value of X, as we knew it at time Y?")
Rapid results (only dealing with a single table, updated in a transparent and easy-to-query way).
In this sense it is an appropriate structure to use on many, if not all tables in a DWh.
UPDATE 2020: A bitemporal data transform for SAS (both SAS 9 and Viya) is available with Data Controller for SAS. A demo version is available: https://docs.datacontroller.io/dcc-tables/#var_busfrom-var_busto
I think your question raises more issues but it all comes down to how much is enough.
I developed a Bi_Temporal SQL Server engine that supports object versioning and relationship by time as well as all the other beautiful parts of Temporal DB's.
This was because the project needed to be able to be rewound to a place in time and see everything as it was at that time.
I mean everything including data, relationships and User access.
It was the most complex thing I have built but in the end it was so complex no-one else could maintain it, or understand what was happening.
So there was a real world use case and a deliverable.
Is not everyones cup of tea as you have to be able to think in time dimension as well as object version changes as all db's do.
Hope this helps someone. I know the post is old but as it was the first I found when searching Temporal DB's it might be of interest to someone.

How to share data across an organization

What are some good ways for an organization to share key data across many deparments and applications?
To give an example, let's say there is one primary application and database to manage customer data. There are ten other applications and databases in the organization that read that data and relate it to their own data. Currently this data sharing is done through a mixture of database (DB) links, materialized views, triggers, staging tables, re-keying information, web services, etc.
Are there any other good approaches for sharing data? And, how do your approaches compare to the ones above with respect to concerns like:
duplicate data
error prone data synchronization processes
tight vs. loose coupling (reducing dependencies/fragility/test coordination)
architectural simplification
security
performance
well-defined interfaces
other relevant concerns?
Keep in mind that the shared customer data is used in many ways, from simple, single record queries to complex, multi-predicate, multi-sort, joins with other organization data stored in different databases.
Thanks for your suggestions and advice...
I'm sure you saw this coming, "It Depends".
It depends on everything. And the solution to sharing Customer data for department A may be completely different for sharing Customer data with department B.
My favorite concept that has risen up over the years is the concept of "Eventual Consistency". The term came from Amazon talking about distributed systems.
The premise is that while the state of data across a distributed enterprise may not be perfectly consistent now, it "eventually" will be.
For example, when a customer record gets updated on system A, system B's customer data is now stale and not matching. But, "eventually", the record from A will be sent to B through some process. So, eventually, the two instances will match.
When you work with a single system, you don't have "EC", rather you have instant updates, a single "source of truth", and, typically, a locking mechanism to handle race conditions and conflicts.
The more able your operations are able to work with "EC" data, the easier it is to separate these systems. A simple example is a Data Warehouse used by sales. They use the DW to run their daily reports, but they don't run their reports until the early morning, and they always look at "yesterdays" (or earlier) data. So there's no real time need for the DW to be perfectly consistent with the daily operations system. It's perfectly acceptable for a process to run at, say, close of business and move over the days transactions and activities en masse in a large, single update operation.
You can see how this requirement can solve a lot of issues. There's no contention for the transactional data, no worries that some reports data is going to change in the middle of accumulating the statistic because the report made two separate queries to the live database. No need to for the high detail chatter to suck up network and cpu processing, etc. during the day.
Now, that's an extreme, simplified, and very coarse example of EC.
But consider a large system like Google. As a consumer of Search, we have no idea when or how long it takes for a search result that Google harvests to how up on a search page. 1ms? 1s? 10s? 10hrs? It's easy to imaging how if you're hitting Googles West Coast servers, you may very well get a different search result than if you hit their East Coast servers. At no point are these two instances completely consistent. But by large measure, they are mostly consistent. And for their use case, their consumers aren't really affected by the lag and delay.
Consider email. A wants to send message to B, but in the process the message is routed through system C, D, and E. Each system accepts the message, assume complete responsibility for it, and then hands it off to another. The sender sees the email go on its way. The receiver doesn't really miss it because they don't necessarily know its coming. So, there is a big window of time that it can take for that message to move through the system without anyone concerned knowing or caring about how fast it is.
On the other hand, A could have been on the phone with B. "I just sent it, did you get it yet? Now? Now? Get it now?"
Thus, there is some kind of underlying, implied level of performance and response. In the end, "eventually", A's outbox matches B inbox.
These delays, the acceptance of stale data, whether its a day old or 1-5s old, are what control the ultimate coupling of your systems. The looser this requirement, the looser the coupling, and the more flexibility you have at your disposal in terms of design.
This is true down to the cores in your CPU. Modern, multi core, multi-threaded applications running on the same system, can have different views of the "same" data, only microseconds out of date. If your code can work correctly with data potentially inconsistent with each other, then happy day, it zips along. If not you need to pay special attention to ensure your data is completely consistent, using techniques like volatile memory qualifies, or locking constructs, etc. All of which, in their way, cost performance.
So, this is the base consideration. All of the other decisions start here. Answering this can tell you how to partition applications across machines, what resources are shared, and how they are shared. What protocols and techniques are available to move the data, and how much it will cost in terms of processing to perform the transfer. Replication, load balancing, data shares, etc. etc. All based on this concept.
Edit, in response to first comment.
Correct, exactly. The game here, for example, if B can't change customer data, then what is the harm with changed customer data? Can you "risk" it being out of date for a short time? Perhaps your customer data comes in slowly enough that you can replicate it from A to B immediately. Say the change is put on a queue that, because of low volume, gets picked up readily (< 1s), but even still it would be "out of transaction" with the original change, and so there's a small window where A would have data that B does not.
Now the mind really starts spinning. What happens during that 1s of "lag", whats the worst possible scenario. And can you engineer around it? If you can engineer around a 1s lag, you may be able to engineer around a 5s, 1m, or even longer lag. How much of the customer data do you actually use on B? Maybe B is a system designed to facilitate order picking from inventory. Hard to imagine anything more being necessary than simply a Customer ID and perhaps a name. Just something to grossly identify who the order is for while it's being assembled.
The picking system doesn't necessarily need to print out all of the customer information until the very end of the picking process, and by then the order may have moved on to another system that perhaps is more current with, especially, shipping information, so in the end the picking system doesn't need hardly any customer data at all. In fact, you could EMBED and denormalize the customer information within the picking order, so there's no need or expectation of synchronizing later. As long as the Customer ID is correct (which will never change anyway) and the name (which changes so rarely it's not worth discussing), that's the only real reference you need, and all of your pick slips are perfectly accurate at the time of creation.
The trick is the mindset, of breaking the systems up and focusing on the essential data that's necessary for the task. Data you don't need doesn't need to be replicated or synchronized. Folks chafe at things like denormalization and data reduction, especially when they're from the relational data modeling world. And with good reason, it should be considered with caution. But once you go distributed, you have implicitly denormalized. Heck, you're copying it wholesale now. So, you may as well be smarter about it.
All this can mitigated through solid procedures and thorough understanding of workflow. Identify the risks and work up policy and procedures to handle them.
But the hard part is breaking the chain to the central DB at the beginning, and instructing folks that they can't "have it all" like they may expect when you have a single, central, perfect store of information.
This is definitely not a comprehensive reply. Sorry, for my long post and I hope it adds to thoughts that would be presented here.
I have a few observations on some of the aspect that you mentioned.
duplicate data
It has been my experience that this is usually a side effect of departmentalization or specialization. A department pioneers collection of certain data that is seen as useful by other specialized groups. Since they don't have unique access to this data as it is intermingled with other data collection, in order to utilize it, they too start collecting / storing the data, inherently making it duplicate. This issue never goes away and just like there is a continuos effort in refactoring code and removing duplication, there is a need to continuously bring duplicate data for centralized access, storage and modification.
well-defined interfaces
Most interfaces are defined with good intention keeping other constraints in mind. However, we simply have a habit of growing out of the constraints placed by previously defined interfaces. Again a case for continuos refactoring.
tight coupling vs loose coupling
If any thing, most software is plagued by this issue. The tight coupling is usually a result of expedient solution given the constraint of time we face. Loose coupling incurs a certain degree of complexity which we dislike when we want to get things done. The web services mantra has been going rounds for a number of years and I am yet to see a good example of solution that completely alleviates the point
architectural simplification
To me this is the key to fighting all the issues you have mentioned in your question. SIP vs H.323 VoIP story comes into my mind. SIP is very simplified, easy to build while H.323 like a typical telecom standard tried to envisage every issue on the planet about VoIP and provide a solution for it. End result, SIP grew much more quickly. It is a pain to be H.323 compliant solution. In fact, H.323 compliance is a mega buck industry.
On a few architectural fads that I have grown up to.
Over years, I have started to like REST architecture for it's simplicity. It provides a simple unique access to data and easy to build applications around it. I have seen enterprise solution suffer more from duplication, isolation and access of data than any other issue like performance etc. REST to me provides a panacea to some of those ills.
To solve a number of those issues, I like the concept of central "Data Hubs". A Data Hub represents a "single source of truth" for a particular entity, but only stores IDs, no information like names etc. In fact, it only stores ID maps - for example, these map the Customer ID in system A, to the Client Number from system B, and to the Customer Number in system C. Interfaces between the systems use the hub to know how to relate information in one system to the other.
It's like a central translation; instead of having to write specific code for mapping from A->B, A->C, and B->C, with its attendance exponential increase as you add more systems, you only need to convert to/from the hub: A->Hub, B->Hub, C->Hub, D->Hub, etc.

Should we start with multiple small-grained databases for an app that may scale massively

We're developing a new eCommerce website and are using NHibernate for the first time. At present we are splitting our data into multiple SQL Server databases, divided per area of functionality. So we have one for UserInfo, one for Orders, one for ProductCatalogue and so on...
Our justification for this decision is twofold really:
the website has the potential to be HUGE (it is a new website for one of the largest online brands in the UK) and we feel that by partitioning our data along functional lines we will be able to move the databases onto their own servers which would give us an easy scaling route should we need it;
my team has always worked this way - partly as a consequence of following the MS Commerce Server pattern from previous projects.
However, reading up on this decision on the internet, we find that the normal response to this sort of model is extremely scathing. "Creating more work for the devs now in order to create more work for the devs later" is one sample comment from Stack Overflow!
In addition, NHibernate is much easier to use with only one database (just one SessionFactory needed). And knowing that Stack Overflow ran off just one box for a long time makes me think that maybe we should not try to be so clever.
So, my question is, "are we correct in thinking that using fine-grained databases might increase our ability to scale or should we sacrifice this for easier development"?
Why don't you just design your database properly and put the files on appropriate disk? Use a cluster if necessary. Creating multiple databases is not an inherently scaling solution. Also - cross database referential integrity? Good luck.
What's your definition of "HUGE"? SQL Server can handle massive databases, but one thing I've learnt is that people often have no idea what constitutes a lot of data.
I've never worked in a project like this. I'm used to databases with several hundred tables, which had never been a problem.
Therefore I can't say if your idea is a good idea, I never tried it. The "my team has always worked this way"-argument is a major driver for many decisions, and I can't even say that it is always wrong.
With NHibernate you organize your data in classes. They can be in different namespaces and assemblies. You usually don't work much with the database directly, you don't need this kind of structure there.
About the scalability argument: I'm not sure if it is really scaling well when you need to access several databases every time. I mean: you always need users and orders and probably more. Then you need to get all this data from several databases.
Agree fully with starskythehutch - keep your related tables together in the same DB. BUT, you may want to consider having separate databases for things that are not related or non-critical to your main product; but that are a part of the app.
For eg: if you decide to log every visit/hit to the site in a DB, you should probably keep that in a separate DB.
The reason you should consider:
1. huge number of transactions - say hundreds of thousands / sec. Having non-critical un-related stuff in a separate DB will ensure that tlog contentions because of this are avoided.
Restore, DBCC CHECKDB, backup times. If you stuff your non-related non-critical stuff in your main DB, you are essentially increasing the size of your DB and it will affect these operations. Having it in separate DB will help you improve performance of these operations.

How many tables/sprocs/functions in a database is too many?

I'm interested in database refactoring. I deal with several databases that don't have a large amount of data, just a few GB with at most a few hundred thousand rows. However, they have hundreds -- sometimes many hundreds -- of tables, views, sprocs and functions. In some places a divide-and-rule strategy using schemas has been implemented which has helped some problems of seeing ownership/usage of tables. However, it hasn't really helped object coupling.
We all read that integration via shared database isn't A Good Thing, but we also know that it is, at least for a while , a very productive thing as everything is in the database. We just don't apply the Single Responsibility Principle to databases like we do to objects.
Edit: I should add that I have no database performance issues. The tables are not large, the biggest has only a few hundred thousand rows. There is no real database performance issue; except when the database schema/logic/implementation is grotesquely inefficient (say requiring a cursor to do a sproc execution for each row in a result set in order to pre-process data for a report). Before you say I should change these, that is the whole point: I can't because the database is no longer in a state where the impact of changes can be assessed.
Clearly at some point you say "Enough!" and divide into multiple databases connected by messages, ETL, application tiers etc etc
The question is: how many is too many? What is the absolute upper limit of the number of sprocs/tables/functions that you can have before you go insane?
First, stop trying to think of databases in object oriented terms. Principles of object oriented programming simply do NOT apply to relational databases.
Shared databases are a very good thing from a business perspective. Multiple databases storing information that has to be transferred between them quickly becomes way more complex than your piddly many hundreds of objects. Data that is consistent between enterprise applications is priceless. Trying to reconcile if GE Corp and General Electric Corporation are really the same entity between two databases can be a nightmare.
Refactoring datbases is a nice goal, but it is very complex in reality. Don't do it unless you have a major performance issue that needs to be addressed or unless you are willing to commit to a process of identifying all the code that might be affected by a change. Even then, consider if you can know all the code that might change (this is one reason why database people hate, hate, hate dynamic code!).
Often the best way to refactor is to add your change and start changing over to using your new field, sp etc while leaving the old one in place until a set expiration date. Since you are on an annual cycle, you will need to manage those dates over a long period of time. To see if sps are being used, you can identify the ones you aren't sure of and add some code to them to insert to a table everytime they are run. If after your whole year cycle, they haven't been run, you can safely eliminate them. The cycle may be shorter depending on the sp.
If I'm writing something that will only be run annually, I would normally put the word annual in the sp name. But that may not be true where you are, however, the function of the sp should give you an idea if it is something that should only be run periodically. I wouldn't expect usp_send email proc to only run once a year but I might expect that a usp_attendance_report might not be run often. Of course as I said, I would have named it something more like usp_annual_attendance_report and you can consider doing that sort of thing moving forward.
But be aware that any refactoring you do will have to take place on a long cycle to ensure that you don't delete something you need. If your code is in a source control system (and all database tables, sp, views, UDFs, triggers, etc should be), you can probably eliminate some things knowing that if they fail you can pretty instantly put them back. Again, I'd examine the object to determine the possible risk eliminating them would have.
Of course if you have good automated tests in place, eliminating something on dev and running the tests can help you find out if something is still being referenced.
If you are looking for an easy way to refactor, I don't know of one. Refactoring databses is a time-consuming, risky activity and one which may not show enough improvement for the powers that be to be willing to pay for it.
A good book on refactoring databases is:http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321293533
I'm not sure there is a magical limit for any of the things you mentioned. I prefer to keep things in one place so I don't have to remember that some records are in place and other records are in another.
I'd be more interested to know if all this work is impacting your performance? And if it's not then why change it? Unless it's impacting performance in some horrible way your customers won't see any benefit from your work and then what's the point?
Your customers might be better served if you just bought a new machine or upgraded your database server software.

Users asking for denormalized database

I am in the early stages of developing a database-driven system and the largest part of the system revolves around an inheritance type of relationship. There is a parent entity with about 10 columns and there will be about 10 child entities inheriting from the parent. Each child entity will have about 10 columns. I thought it made sense to give the parent entity its own table and give each of the children their own tables - a table-per-subclass structure.
Today, my users requested to see the structure of the system I created. They balked at the idea of the table-per-subclass structure. They would prefer one big ~100 column table because it would be easier for them to perform their own custom queries.
Should I consider denormalizing the database for the sake of the users?
Absolutely not. You can always create a view later to show them what they want to see.
They are effectively asking for a report.
You could give them access to a view containing all the fields they require... that way you don't mess up your data model.
No. Structure the data properly and if the users need the a denormalized view of the data create it as a VIEW in the database.
Alternatively, consider that perhaps an RDBMS is not the appropriate storage tool for this project.
They are the users and not the programmers of the system for a reason. Provide a separate interface for their queries. Power users like this can both be helpful and a pain to deal with. Just explain you need the database designed a certain way so you can do your job, period. Once that is accomplished you and provide other means to make querying easier.
What do they know!? You could argue that users shouldn't even be having direct access to a database in the first place.
Doing that leaves you open to massive performance issues, just because a couple of users are running ridiculous queries.
How about if you created a VIEW in the format your users wanted while still maintaining a properly normalized table?
Aside from a lot of the technical reasons for or against your users' proposition, you need to be on same page in communicating the consequences of various scenarious and (more importantly) the costs of those consequences. If the users are your clients and they are paying you to do a job, explain that their awful "proposed" ideas may cost them more money in development time, additional hardware resources, etc.
Hopefully you can explain it in such a way that shows your expertise and why your idea is a much better value to your users in the long run.
As everyone more or less mentioned, that way lies madness, and you can always build a view.
If you just can't get them to come around on this point, consider showing them this thread and the number of pros who weighed in saying that the users are meddling with things that they don't fully understand, and the impact will be an undermined foundation.
A big part of the developer's craft is the feel for what won't work out long term, and the rules of normalization are almost canonical in that respect. There are situations where you need to denormalize (data warehouses, etc) but this doesn't sound like one of them!
It also sounds as though you may have a particularly troubling brand of user on your hand -- the amatuer developer who thinks they could do your job better themselves if only they had the time. This may or may not help, but I've found that those types respond well to presentation -- a few times now I've found that if I dress sharp and show a little bit of force in my personality, it helps them feel like I'm an expert and prevents a bunch of problems before they start.
I would strongly recommend coming up with an answer that doesn't involve someone running direct reports against your database. The moment that happens, your DB structure is set in stone and you can basically consider it legacy.
A view is a good start, but later on you'll probably want to structure this as an export, to decouple further. Of course, then you'll encounter someone who wants "real time" data. Proper business analysis usually reveals this to be unnecessary. Actual real time requirements are not best handled through reporting systems.
Just to be clear: I'd personally favour the table per subclass approach, but I don't think it's actually as big an issue as the direct reporting off transaction tables is going to be.
I would opt for a view (as others have suggested) or an inline table-valued function (the benefits of this is you require parameters - like an date range or a customer account - which can help to stop users from querying without any limits on the problem space) first. An inline TVF is really a parametrized view and is far closer to a view in terms of how the engine treats them than it is to a multi-statement table valued function or a scalar function, which can perform incredibly poorly.
However, in some cases, this can impact production performance if the view is complex or intensive. With poorly written ad hoc user queries, it can also cause locks to persist longer or be escalated further than they would on a better built query. It is also possible for users to misinterpret an E-R data model and produce multiplied numbers in cases where there are many-to-one or many-to-many relationships. The next option might be to materialize these views with indexes or make tables and keep them updated, which gets us closer to my next option...
So, given those drawbacks of the view option and already thinking of mitigating it by starting to make copies of data, the next option I would consider is to have a separate read-only (for these users) version of the data which is structured differently. Typically, I would first look at a Kimball-style star schema. You do not need to have a full-fledged time-consistent data warehouse. Of course, that's an option, but you could simply keep a reporting model up to date with data. Star-schemas are a special form of denormalization and are particularly good for numerical reporting, and a given star should not be able to be abused by users accidentally. You can keep the star up to date in a number of ways, including triggers, scheduled jobs, etc. They can be very fast for reporting needs and run on the same production installation - perhaps on a separate instance if not just a separate database.
Although such a solution may require you to effectively more than double your storage requirements, when compared with other practices it might be a really good option if you understand your data well and don't mind having two models - one for transactions and one for analysis (note that you will already start to have this logical separation anyway with the use of a the simplest first option of view).
Some architects will often double their servers and use the SAME model with some kind of replication in order to provide a reporting server which is indexed more heavily or differently. Such a second server doesn't impact production transactions with reporting requirements and can be kept up to date fairly easily. There will only be one model, but of course, this has the same usability problems with allowing users ad hoc access to the underlying model only, without the performance affects, since they get their own playground.
There are a lot of ways to skin these cats. Good luck.
The customer is always right. However, the customer is likely to back down when you convert their requirement into dollars and cents. A 100 column table will require extra dev time to write the code that does what the database would do automatically with the proper implementation. Further, their support costs will be higher since more code means more problems and lower ease of debugging.
I'm going to play devil's advocate here and say that both solutions sound like poor approximations of the actual data. There's a reason that object-oriented programming languages don't tend to be implemented with either of these data models, and it's not because Codd's 1970 ideas about relations were the ideal system for storing and querying object-oriented data structures. :-)
Remember that SQL was originally designed as a user interface language (that's why it looks vaguely like English and not at all like other languages of that era: Algol, C, APL, Prolog). The only reasons I've heard for not exposing a SQL database to users today are security (they could take down the server!) and usability (who wants to write SQL when you can clicky clicky?), but if it's their server and they want to, then why not let them?
Given that "the largest part of the system revolves around an inheritance type of relationship", then I'd seriously consider a database that lets me represent that natively, either Postgres (if SQL is important) or a native object database (which are awesome to work with, if you don't need SQL compatibility).
Finally, remember that every engineering decision is a tradeoff. By "sticking to your guns" (as somebody else proposed), you're implicitly saying the value of your users' desires are zero. Don't ask SO for a correct answer to this, because we don't know what your users want to do with your data (or even what your data is, or who your users are). Go tell them why you want a many-tables solution, and then work out a solution with them that's acceptable to both of you.
You've implemented Class Table Inheritance and they're asking for Single Table Inheritance. Both designs are valid in certain situations.
You might want to get a copy of Martin Fowler's Patterns of Enterprise Application Architecture to read more about the advantages and disadvantages of each design. That book is a classic reference to have on your bookshelf, in any case.

Resources