Loading presentation models directly from database - winforms

I'm working on a 2-tier application where a WinForms client makes direct calls to the database. In one of the scenarios I need to display a list of Customer entities to the user. The problem is that the Customer entity contains a lot of properties (some quite heavy) and I need only two of them - first and last name. So, to improve performance and make the presentation logic clearer, I want to create some kind of CustomerSummaryViewModel class with only the required properties and use NHibernate's projections feature to load it. My concern is that this couples my data access logic to the presentation layer, which seems conceptually wrong to me.
Do you think this is OK, or are there better solutions?

I think you can consider the CustomerSummaryViewModel as a report (CustomerSummaryReport). It is fine to query your entities for scenarios like this and treat the results as reports. Most reports are more complex, using multiple entities and aggregate queries; this one is very simple, but you can still treat it like a report.
You also mention that performance matters here. That is another reason to use a separate reporting query and DTO. The Customer entity sounds like one of the "main" entities you use. If it takes a significant amount of time to retrieve customers from the database even with lazy-loaded properties left uninitialized, that can be a warning to optimize the Customer entity itself rather than relying on reporting queries to retrieve information about it. Just a warning, because I have seen cases where this was needed.
By the way, you can consider LINQ instead of projections for the easier syntax, for example:
var reports = session.Linq<Customer>()
    .Where(condition)
    .Select(customer => new Report
    {
        FirstName = customer.FirstName,
        LastName = customer.LastName
    })
    .ToList();
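If you prefer to stick with the projections (criteria) API the question mentions, roughly the same query can be done with a result transformer. A minimal sketch, assuming a CustomerSummaryReport DTO whose property names match the aliases (Projections is in NHibernate.Criterion, Transformers in NHibernate.Transform):

var reports = session.CreateCriteria<Customer>()
    // add whatever Restrictions the scenario needs here
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("FirstName"), "FirstName")
        .Add(Projections.Property("LastName"), "LastName"))
    .SetResultTransformer(Transformers.AliasToBean<CustomerSummaryReport>())
    .List<CustomerSummaryReport>();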

Related

Is this a "correct" database design?

I'm working with the new version of a third-party application. In this version, the database structure has been changed, they say, "to improve performance".
The old version of the DB had a general structure like this:
TABLE ENTITY
(
    ENTITY_ID,
    STANDARD_PROPERTY_1,
    STANDARD_PROPERTY_2,
    STANDARD_PROPERTY_3,
    ...
)

TABLE ENTITY_PROPERTIES
(
    ENTITY_ID,
    PROPERTY_KEY,
    PROPERTY_VALUE
)
So we had a main table with fields for the basic properties and a separate table to manage custom properties added by the user.
The new version of the DB instead has a structure like this:
TABLE ENTITY
(
    ENTITY_ID,
    STANDARD_PROPERTY_1,
    STANDARD_PROPERTY_2,
    STANDARD_PROPERTY_3,
    ...
)

TABLE ENTITY_PROPERTIES_n
(
    ENTITY_ID_n,
    CUSTOM_PROPERTY_1,
    CUSTOM_PROPERTY_2,
    CUSTOM_PROPERTY_3,
    ...
)
So now, when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table; once the maximum number of columns (managed by the application) is reached, a new table is created.
So, my question is: is this a correct way to design a DB structure? Is it the only way to "increase performance"? The old structure required many joins or sub-selects, but this structure doesn't seem very smart (or even correct) to me...
I have seen this done before, on the assumed (often unproven) "expense" of joining - it basically turns a row-heavy data table into a column-heavy one. They ran into their own limitation, as you imply, by creating new tables when they run out of columns.
I completely disagree with it.
Personally, I would stick with the old structure and re-evaluate the performance issues. That isn't to say the old way is the correct way, it is just marginally better than the "improvement" in my opinion, and removes the need to do large scale re-engineering of database tables and DAL code.
These tables strike me as largely static... caching would be an even better performance improvement without mutilating the database and one I would look at doing first. Do the "expensive" fetch once and stick it in memory somewhere, then forget about your troubles (note, I am making light of the need to manage the Cache, but static data is one of the easiest to manage).
Or, wait for the day you run into the maximum number of tables per database :-)
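Coming back to the caching idea, a rough sketch of "fetch it once and keep it in memory" can be as small as a lazily-initialized holder. This is only an illustration; StaticCache and the loader delegate are placeholders, not part of any existing DAL:

using System;
using System.Collections.Generic;

// Placeholder sketch: wraps one expensive load behind Lazy<T> so it runs only once.
public class StaticCache<TEntity>
{
    private readonly Lazy<IList<TEntity>> _items;

    public StaticCache(Func<IList<TEntity>> expensiveLoad)
    {
        _items = new Lazy<IList<TEntity>>(expensiveLoad, isThreadSafe: true);
    }

    // First access triggers the fetch; later accesses reuse the cached list.
    public IList<TEntity> Items
    {
        get { return _items.Value; }
    }
}

// Usage (hypothetical DAL call): var props = new StaticCache<EntityProperty>(dal.LoadAllEntityProperties);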
Others have suggested completely different stores. This is a perfectly viable possibility and if I didn't have an existing database structure I would be considering it too. That said, I see no reason why this structure can't fit into an RDBMS. I have seen it done on almost all large scale apps I have worked on. Interestingly enough, they all went down a similar route and all were mostly "successful" implementations.
No, it's not. It's terrible.
until the max number of column (handled by application) is reached,
then a new table is created.
This sentence says it all. Under no circumstances should an application dynamically create tables. The "old" approach isn't ideal either, but since you have the requirement to let users add custom properties, it has to be like this.
Consider this:
You lose all type safety, as you have to store all values in the "PROPERTY_VALUE" column.
Depending on your users, you could have them change the schema beforehand and then let them run some kind of database update batch job, so at least all the properties would be declared in the right datatype. Also, you could lose the entity_id/key thing.
Check out the inner-platform effect: http://en.wikipedia.org/wiki/Inner-platform_effect. This design certainly reeks of it.
Maybe an RDBMS isn't the right thing for your app. Consider using a key/value based store like MongoDB or another NoSQL database (http://nosql-database.org/).
From what I know of databases (and I'm certainly not the most experienced), it seems quite a bad idea to do that. If you already know the maximum number of custom properties a user might have, I'd say you'd be better off setting the table's number of columns to that value.
Then again, I'm not an expert, but making new columns on the fly isn't the kind of operation databases like. It's going to bring you more trouble than anything else.
If I were you, I'd either fix the number of custom properties, or stick with the old system.
I believe creating a new table for each entity to store properties is a bad design, as you could end up bloating the database with tables. The only pro of the second method would be that you are not traversing all of the redundant rows that do not apply to the selected entity. However, using indexes on the original ENTITY_PROPERTIES table could help greatly with performance.
I would personally stick with your initial design, apply indexes and let the database engine determine the best methods for selecting the data rather than separating each entity property into a new table.
There is no "correct" way to design a database - I'm not aware of a universally recognized set of standards other than the famous "normal form" theory; many database designs ignore this standard for performance reasons.
There are ways of evaluating database designs though - performance, maintainability, intelligibility, etc. Quite often, you have to trade these against each other; that's what your change seems to be doing - trading maintainability and intelligibility against performance.
So, the best way to find out whether that was a good trade-off is to see if the performance gains have materialized. The best way to find that out is to create the proposed schema, load it with a representative dataset, and run the queries you will need in production.
I'm guessing that the new design will not be perceivably faster for queries like "find STANDARD_PROPERTY_1 from ENTITY where STANDARD_PROPERTY_1 = 'banana'".
I'm guessing it will not be perceivably faster when retrieving all properties for a given entity; in fact it might be slightly slower, because instead of a single join to ENTITY_PROPERTIES, the new design requires joins to several tables. You will be returning "sparse" results - presumably, not all entities will have values in the property_n columns in all ENTITY_PROPERTIES_n tables.
Where the new design may be significantly faster is when you need a compound where clause on custom properties. For instance, finding an entity where custom property 1 is true, custom property 2 is banana, and custom property 3 is not in ('kylie', 'pussycat dolls', 'giraffe') is (probably) faster when you can specify columns in the ENTITY_PROPERTIES_n tables instead of rows in the ENTITY_PROPERTIES table. Probably.
As for maintainability - yuck. Your database access code now needs to be far smarter, knowing which table holds which property and how many columns are too many. The likelihood of introducing bugs is high - there are more moving parts, and I can't think of any obvious unit tests to make sure the database access logic is working.
Intelligibility is another concern - this solution is not in most developers' toolbox, it's not an industry-standard pattern. The old solution is pretty widely known - commonly referred to as "entity-attribute-value". This becomes a major issue on long-lived projects where you can't guarantee that the original development team will hang around.

Modelling a product catalog in Grails

I'm investigating a personal Grails project and want to put together a domain model to represent a product catalog. I really can't decide the best way to go about it. I will have a number of different product categories although many categories will just have a base set of properties that are shared across all categories (e.g. product name, product description, price etc). However, some products will have additional properties specific to their category.
I've looked into the Entity Attribute Value (EAV) Model technique that provides a very extensible solution. And, I've considered the route of using an explicit OO inheritance model where I have sub-classes of a base Product class to represent any product that has additional properties.
Obviously, the second approach is less extensible - to add a new product category would require a new entity and likely a custom view/editor for the front-end. However, as a developer, I think the programming model is significantly clearer and much more logical to code against.
The EAV approach would allow dynamic extensibility but would lead to a more cryptic programming model and would have a performance overhead in the DB (complex table joins). Views/editors on the front end could be dynamically generated to include any number of the custom attributes for a product category - though I'm sure situations would arise where such dynamic generation wouldn't suffice from a usability perspective.
When I consider a framework like Grails, it would seem to make sense to go down the route of creating an explicit inheritance model. I'm not convinced a framework like Grails would fit the EAV approach so well - a lot of the benefits of Grails would be lost in the complexity. However, I'm not sure this approach would scale practically as the number of product categories increases.
I'd be really interested to hear of others' experience with this type of modelling challenge!
I’ve had a situation similar to this and went with the inheritance solution. Going into this I knew I’d never have more than about 10 classes so I wasn’t worried about exponential growth of complexity. Although you will need views and controllers for each class there are some things you can do to reduce code duplication. The first thing to do is to put all common view code in templates. For example if all your classes will have a price, name, and description the view code that will allow the displaying and editing of this should be put into templates. Instead of having duplicate lines of code in each view you can simply do a
<g:render template="/baseView" />
For more info on templates see http://www.grails.org/Tag+-+render
The second thing I found useful was to move all shared controller code into a class and define closures that I could call from my actual controllers. This got quite ugly, since my save method would not only ensure the fields of the base class were dealt with properly but would also have code for corner cases of the inherited classes. Looking back, a better option may have been to define custom behavior as methods of the domain class that required it, or to use a service. With that said, putting code into closures that could be called from the controller was still helpful, since it allowed me to have one-line controller bodies instead of 30 or 40 lines. If I had to modify code dealing with the base class, I could edit it where the closures were defined and that change would be reflected across all my controllers with no change to the controllers' source files. This came in quite handy and let me edit code in one place instead of editing duplicate code across 10 controllers.
Inheritance works fine with Hibernate and GORM. Consider using the table-per-subclass mapping as you cannot define NOT NULL constraints with the (default) table-per-hierarchy inheritance mapping.
You can also use composition for "not so" common, but shared, attributes.
"The" criteria for EAV is, do you need to introduce new attributes without changing the data model?
In practice, applications like yours use a combination of inheritance and EAV.
You're concerned about performance when querying joined tables. That's normally not an issue if you index the columns that appear in the SQL WHERE clause.
(GORM/Hibernate will automatically create foreign keys, which are important as well.) Given that the necessary indexes are in place and the DBMS provides a decent query optimizer (e.g., PostgreSQL or SQL Server - maybe not MySQL), you can select from millions of records using 10 joins in 50 milliseconds or less.
Finally, there has been an excellent recent discussion on your issue.

How to divide an Entity with hundreds of fields?

I'd like suggestions for the design of a CRUD business app using Silverlight 4, the Business Application Template, WCF RIA Services and the Entity Framework 4. The app tracks lab test results performed on material samples. It replaces a (difficult to maintain) existing web application. Lab tests results are stored in two "SampleData" tables made up of hundreds of fields. The tables have a one to one relationship. I combined the two tables into one using Entity Framework's Table Per Type Inheritance which I'm very happy with. Note: I've decided not to change the database design to avoid destroying the existing application, but it was considered.
My dilemma is how to break up this huge table. Each record represents a material sample that is tested. The logical grouping of fields is by lab test. I envision my UI having multiple tabs or separate pages - one for each test. The problem at this point is that I'm sucking in ALL the fields yet only displaying a few in a paged DataGrid and there is a noticeable delay. Instead of one giant entity it might be nice to have several "Lab Test" entities (each representing a type of test) that are sub-sets of my one giant TPT Inheritance table. How would I do this? The base SampleData table/entity contains header fields plus several child test results fields. The second derived table/entity contains more test result fields linked to the base by SampleID. If split up I'd need to maintain the header info with each Lab Test entity.
I'm willing to stick with one giant table/entity (despite a slight performance penalty). Still, I'm wondering the best way to create my UI with this one entity. Can a DataForm be tabbed? If I make a dashboard with links to lab tests how do I keep header info in sync with each test page?
I know this is a broad question. I'm hoping to get suggestions on a good design path that will allow me to grow the app as new lab tests are added (making an even bigger entity). I'd hope to find a path that simplifies maintenance and takes advantage of the RAD experience Microsoft is promoting.
Thanks in advance!
I scanned the post discussing the database design, and based on what you said and the fact that you've already got users asking for more tests (repeating values), I wish you'd reconsider the db redesign. You can create a flat view to simulate the existing flat sample-data table and use that to minimize breakage in the existing application.
But you've already made that decision, so how about reversing the situation? Instead of fixing the database, add code to the domain service that transforms the data from its flat layout, leaving out all the null values.
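A rough sketch of that service-side transform follows. SampleData's real columns are not shown here; MoistureContent and TensileStrength are made-up stand-ins for two of the hundreds of nullable test fields:

using System.Collections.Generic;

// Illustrative shapes only - the real entity has hundreds of test columns.
public class SampleData
{
    public int SampleId { get; set; }
    public decimal? MoistureContent { get; set; }
    public decimal? TensileStrength { get; set; }
}

public class TestResult
{
    public int SampleId { get; set; }
    public string TestName { get; set; }
    public decimal Value { get; set; }
}

public static class SampleDataTransforms
{
    // Turns one wide, sparse row into a list of (test, value) pairs, skipping nulls.
    public static IList<TestResult> ToTestResults(SampleData s)
    {
        var results = new List<TestResult>();
        if (s.MoistureContent.HasValue)
            results.Add(new TestResult { SampleId = s.SampleId, TestName = "Moisture", Value = s.MoistureContent.Value });
        if (s.TensileStrength.HasValue)
            results.Add(new TestResult { SampleId = s.SampleId, TestName = "Tensile strength", Value = s.TensileStrength.Value });
        // ...repeat for the remaining test columns, or drive it from a metadata map
        return results;
    }
}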
One idea is to write a view that un-flattens the data, leaving out the null no-test rows. The query will raise eyebrows (I'll probably get flamed for this) because it looks nasty, but in reality the DBMS does a fine job of optimizing and performing it (in Oracle, anyway). I've had great results making a view something like:
create view programmer_exp_unflat as (
    select programmer_id, 'C#', csharp_yrs from programmer_exp_flat where csharp_yrs is not null
    union
    select programmer_id, 'Java', java_yrs from programmer_exp_flat where java_yrs is not null
    union
    select programmer_id, 'Cobol', cobol_yrs from programmer_exp_flat where cobol_yrs is not null
    -- ...repeat for each of the remaining test columns
)
It's backwards and ugly no matter how you look at it, but it reduces your result set to a bare minimum and there is no need to break things into categories. New tests require modifying the view, but depending on UI flexibility and business rules they might not require any other changes.
It makes coding the UI more difficult (as it would have been with the right database design in the first place), but your query result is reduced to only the tests that have been completed. If your users are flexible, the UI could be designed to show the test results as a list, making display a piece of cake. Your current design pretty much forces you to modify the UI and the database with each and every new test.
These are the type challenges that make being a developer so much fun -- and why all the marketing gimmick sample CRUD applications that can be built in five minutes are worthless in the real world.
I'm answering (and accepting) my own question to increase my Stack Overflow accept rate, but my "answer" is that I have found no answer yet. Because I've had to move on with the project, I continue to use one giant entity. I've also moved away from Silverlight and turned the project into a WPF app due to various struggles with Silverlight, such as its inherently asynchronous data access.

Is using table inheritance a valid way to avoid using a join table?

I've been following a mostly DDD methodology for this project, so, like any DDD'er, I created my domain model classes first. My intention is to use these POCOs as my LINQ-to-SQL entities (yes, they're not pure POCOs, but I'm OK with that). I've started creating the database schema and external mapping XML file, but I'm running into some issues modeling the entities' relationships and associations.
An artifact represents a document. Artifacts can be associated with either a Task or a Case. The Case entity looks like this:
public class Case
{
    private EntitySet<Artifact> _Artifacts;

    public IList<Artifact> Artifacts
    {
        get
        {
            return _Artifacts;
        }
        set
        {
            _Artifacts.Assign(value);
        }
    }

    // ...
}
Since an Artifact can be associated with either a Case or a Task, I have the option of using inheritance on the Artifact class to create CaseArtifact and TaskArtifact derived classes. The only difference between the two classes, however, would be the presence of a Case field or a Task field. In the database, of course, I would have a single table, Artifact, with a type discriminator field and the CaseId and TaskId fields.
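For illustration only, the single-table/discriminator idea expressed with LINQ to SQL attribute mapping might look like the sketch below (my project uses an external XML mapping file, and the discriminator codes and column names here are made up):

using System.Data.Linq.Mapping;

// Sketch of single-table inheritance with a discriminator column.
[Table(Name = "Artifact")]
[InheritanceMapping(Code = "CASE", Type = typeof(CaseArtifact), IsDefault = true)]
[InheritanceMapping(Code = "TASK", Type = typeof(TaskArtifact))]
public class Artifact
{
    [Column(IsPrimaryKey = true)]
    public int ArtifactId { get; set; }

    [Column(IsDiscriminator = true)]
    public string ArtifactType { get; set; }
}

public class CaseArtifact : Artifact
{
    [Column(CanBeNull = true)]
    public int? CaseId { get; set; }
}

public class TaskArtifact : Artifact
{
    [Column(CanBeNull = true)]
    public int? TaskId { get; set; }
}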
My question: is this a valid approach to solving this problem, or would creating a join table for each association (2 new tables, total) be a better approach?
I would probably go with two tables - it makes the referential integrity (PKs/FKs) a little simpler to handle in the database, since you won't need a complex constraint based on the selector column.
(To reply to your comment - I ran out of space, so I'm posting here as an edit.) My overall philosophy is that the database should be modelled with database best practices (protect your perimeter and ensure database consistency, using as much RI and as many constraints as possible; provide all access through SPs; log activity as necessary; control all modes of access; use triggers where necessary) and the object model should be modelled with OOP best practices to provide a powerful and consistent API. It's the job of your SPs/data-access layer to handle the impedance mismatch.
If you just persist a well-designed object model to a database, your database won't have much intrinsic value (it is difficult to data mine, report on, or warehouse, and the metadata is vague) when viewed without going through the lens of the object model - this is fine for some applications, but typically not for mine.
If you just mimic a well-designed database structure in your application, without providing a rich OO API, your application will be difficult to maintain and the internal structures will be awkward to deal with - typically very procedural, rigid, and with a lot of code duplication.
I would consider finding the commonalities between Case and Task - for lack of a better word, let's call it "CaseTask" - and then sub-typing (inheriting) from that. After that you attach the document to the super-type.
UPDATE (after comment):
I would then consider something like this. Each document can be attached to several cases or tasks.

Database design help with varying schemas

I work for a billing service that uses some complicated mainframe-based billing software for its core services. We have all kinds of codes we set up that are used for tracking things: payment codes, provider codes, write-off codes, etc. Each type of code has a completely different set of data items that control what the code does and how it behaves.
I am tasked with building a new system for tracking changes made to these codes. We want to know who requested what code, who/when it was reviewed, approved, and implemented, and what the exact setup looked like for that code. The current process only tracks two of the different types of code. This project will add immediate support for a third, with the goal of also making it easy to add additional code types into the same process at a later date. My design conundrum is that each code type has a different set of data that needs to be configured with it, of varying complexity. So I have a few choices available:
I could give each code type its own table(s) and build them independently. Considering we only have three code types I'm concerned about at the moment, this would be simplest. However, this concept has already failed, or I wouldn't be building a new system in the first place. It's also weak in that writing generic presentation-level code to display request data for any code type (even those not yet implemented) is not trivial.
Build a DB schema capable of storing the data points associated with each code type: not only the values, but what type they are and how they should be displayed (a drop-down list from an enum of some kind). I have a decent DB schema for this started, but it just feels wrong: overly complicated to query and maintain, and it ultimately requires a custom query to view the full data in a nice tabular form for each code type anyway.
Store the data points for each code request as XML. This greatly simplifies the database design and will hopefully make it easier to build the interface: just set up a schema for each code type, then have code that validates requests against their schema, transforms a schema into display widgets, and maps an actual request item onto the display. What this option lacks is a way to handle changes to the schema.
My questions are: how would you do it? Am I missing any big design options? Any other pros/cons to those choices?
My current inclination is to go with the XML option. Given that schema updates are expected but extremely infrequent (probably less than one per code type per 18 months), should I just build it to assume the schema never changes, but in a way that lets me easily add support for a changing schema later? What would that look like in SQL Server 2000 (we're moving to SQL Server 2005, but that won't be ready until after this project is supposed to be completed)?
[Update]:
One reason I'm thinking XML is that some of the data will be complex: nested/conditional data, enumerated drop-down lists, etc. But I really don't need to query any of it, so I was thinking it would be easier to define this data in XML schemas.
However, le dorfier's point about introducing a whole new technology hit very close to home. We currently use very little xml anywhere. That's slowly changing, but at the moment this would look a little out of place.
I'm also not entirely sure how to build an input form from a schema, and then merge a record that matches that schema into the form in an elegant way. It will be very common to only store a partially-completed record and so I don't want to build the form from the record itself. That's a topic for a different question, though.
Based on all the comments so far, XML is still the leading candidate. Separate tables may be as good or better, but I have the feeling that my manager would see that as not different or generic enough compared to what we're currently doing.
There is no simple, generic solution to a complex, meticulous problem. You can't have both simple storage and simple app logic at the same time. Either the database structure must be complex, or else your app must be complex as it interprets the data.
I outline five solutions to this general problem in "product table, many kind of product, each product have many parameters."
For your situation, I would lean toward Concrete Table Inheritance or Serialized LOB (the XML solution).
The reason that XML might be a good solution is that:
You don't need to use SQL to pick out individual fields; you're always going to display the whole form.
Your XML can annotate fields for data type, user interface control, etc.
But of course you need to add code to parse and validate the XML. You should use an XML schema to help with this. In which case you're just replacing one technology for enforcing data organization (RDBMS) with another (XML schema).
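For what it's worth, a minimal sketch of that validation step in .NET is below (the schema file name is hypothetical, and errors are only counted here; in practice you would log or surface them):

using System.Xml;
using System.Xml.Schema;

// Sketch: validate one code-change request document against its per-code-type XSD.
public static class RequestValidator
{
    public static bool IsValid(string requestXmlPath, string schemaPath)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, schemaPath); // e.g. "WriteOffCodeRequest.xsd" (hypothetical name)

        bool valid = true;
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (var reader = XmlReader.Create(requestXmlPath, settings))
        {
            while (reader.Read()) { /* validation runs as the document is read */ }
        }
        return valid;
    }
}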
You could also use an RDF solution instead of an RDBMS. In RDF, metadata is queriable and extensible, and you can model entities with "facts" about them. For example:
Payment code XYZ contains attribute TradeCredit (Net-30, Net-60, etc.)
Attribute TradeCredit is of type CalendarInterval
Type CalendarInterval is displayed as a drop-down
.. and so on
Re your comments: Yeah, I am wary of any solution that uses XML. To paraphrase Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.
Another solution would be to invent a little Domain-Specific Language to describe your forms. Use that to generate the user-interface. Then use the database only to store the values for form data instances.
Why do you say "this concept has already failed or I wouldn't be building a new system in the first place"? Is it because you suspect there must be a scheme for handling them in common?
Otherwise, I'd say continue with the existing philosophy and establish additional tables. At least it would share an existing pattern and maintain some consistency in that respect.
Do a web search on "generalized specialized relational modeling". You'll find articles on how to set up tables that store the attributes of each kind of code, and the attributes common to all codes.
If you’re interested in object modeling, just search on “generalized specialized object modeling”.
