I really like CodeIgniter's Active Record and how easily it lets me write all the database queries I need.
But I've also been reading about ORMs like Doctrine. When I read Doctrine's documentation, it does not seem as clear to use as Active Record, and I can't see what makes it better (if it is).
What does Doctrine allow that is not possible with Active Record? Does Doctrine make the same job faster, easier, better? Or does it do things Active Record cannot do?
It would be best if people could post examples of tasks that show the difference.
Thanks,
Matthew
First of all, which Doctrine are you talking about, 1 or 2?
There is a huge difference between the two. The only thing they have in common is that both are full-fledged ORMs; otherwise there really isn't any connection between them.
Doctrine 1 is based on the Active Record pattern; Doctrine 2 is based on the Data Mapper pattern.
Both can do the same things, but there are some significant differences between them.
Generally speaking, Data Mapper is less "developer-friendly" but should have better performance. Why? It's actually pretty simple: with Active Record, each entity knows everything "around" itself (its relations to other entities, and so on), while with Data Mapper the entities are dumb and lightweight, and a central component (the EntityManager/UnitOfWork in Doctrine 2) handles all the relation mapping. So in terms of memory usage and performance, Data Mapper should be faster.
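To make the contrast concrete, here is a minimal hypothetical sketch of the two styles in plain PHP with PDO. This is illustrative only, not actual Doctrine code:

// Active Record style: the entity itself knows how to persist itself.
class ArticleActiveRecord
{
    public $id;
    public $title;

    public function __construct(private PDO $pdo) {}

    public function save(): void
    {
        // The entity talks to the database directly.
        $stmt = $this->pdo->prepare('INSERT INTO articles (title) VALUES (?)');
        $stmt->execute([$this->title]);
        $this->id = (int) $this->pdo->lastInsertId();
    }
}

// Data Mapper style: the entity is a dumb, lightweight object...
class Article
{
    public $id;
    public $title;
}

// ...and a central mapper handles all persistence for it.
class ArticleMapper
{
    public function __construct(private PDO $pdo) {}

    public function save(Article $article): void
    {
        $stmt = $this->pdo->prepare('INSERT INTO articles (title) VALUES (?)');
        $stmt->execute([$article->title]);
        $article->id = (int) $this->pdo->lastInsertId();
    }
}

Note how the Data Mapper entity carries no database handle at all; only the mapper does.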
The Doctrine guys say that Doctrine 2 is at least 50% faster than Doctrine 1 (there are other differences too, not just the design pattern).
If you feel up for it, you can even implement Active Record on top of Doctrine 2's data mapper; look at this blog post. I'm using this approach just for the development phase, to keep the amount of code as small as possible. Once it goes into production I will kill the additional Active Record layer and roll back to the default data mapper of Doctrine 2.
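Such a layer can be quite thin. As a rough, hypothetical sketch (not the code from that blog post; it assumes the EntityManager is injected statically somehow):

use Doctrine\ORM\EntityManager;

// Hypothetical ActiveRecord-style convenience layer over Doctrine 2.
abstract class ActiveEntity
{
    /** @var EntityManager */
    protected static $em;

    public static function setEntityManager(EntityManager $em)
    {
        static::$em = $em;
    }

    // Both methods just delegate to the real EntityManager,
    // so removing the layer later costs nothing architecturally.
    public function save()
    {
        static::$em->persist($this);
        static::$em->flush();
    }

    public function delete()
    {
        static::$em->remove($this);
        static::$em->flush();
    }
}

persist(), flush() and remove() are the actual Doctrine 2 EntityManager calls doing the work underneath.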
So the conclusion is that you can do everything with both, but in the same way you could say that you can do everything with raw SQL. If you are a beginner in the ORM world, I would suggest going with ActiveRecords, because it is simple and (usually) requires less code. On the other hand, if you are building a large, complex model, I think data mapper is the better option.
Maybe I got something wrong, but this is how I understood it.
As for the comparison between CodeIgniter's Active Record and Doctrine (1 or 2), I can't really say, because I have never used CodeIgniter. One thing I am sure of: Doctrine has a lot more features than CodeIgniter's default ORM. For example: result hydration, inheritance (single table, class table), prefetching, lazy loading, extra lazy loading, extensions, behaviours, optimization, proxies, datetime handling... It is a massive, full-fledged ORM with a lot of features, while my experience with any "default framework ORM" is that its main goal is to be as simple as possible, so a newbie can get the hang of it very easily. Doctrine is a mighty beast, and can surely do a lot of things more efficiently and/or in a logically more correct way than the built-in CodeIgniter ORM. The downside is that it takes more time to learn and code, and it is a huge library with thousands of files, so just getting everything running adds some overhead compared to a lighter alternative.
Doctrine is a full-fledged ORM that implements the active record pattern. CodeIgniter's active record class is a query builder/database wrapper that is based on a "modified" version of the pattern.
Disclaimer: I have never used Doctrine. I will try my best to illustrate the differences between CodeIgniter's active record implementation and Doctrine, based on my understanding.
Using CodeIgniter's active record class, you might implement a model like this:
class User_model extends CI_Model
{
    public function get_user_by_username($username)
    {
        // Build query using active record methods
        $this->db->where('username', $username);
        $this->db->where('active', 1);

        // Execute query
        $query = $this->db->get('users');

        // Return results
        return $query->result();
    }

    // ...
}
You are basically building the query using the active record methods. It's easy to see how each method (where(), get(), etc.) maps to raw SQL. The advantage of using the active record methods over plain $this->db->query() calls is that CodeIgniter compiles each query for the database driver you are using. Other than that, CodeIgniter's active record implementation doesn't really do much: any query you need, you'll have to build yourself. I hope this illustrates how the active record methods amount to a query builder.
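For comparison, here is roughly the same lookup as a single raw call using CodeIgniter's query bindings (the exact SQL that active record compiles may differ slightly per driver):

// Roughly equivalent to the active record calls above,
// written with CodeIgniter's query bindings:
$query = $this->db->query(
    'SELECT * FROM users WHERE username = ? AND active = ?',
    array($username, 1)
);
return $query->result();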
Note that the following sample code may be incorrect. Using Doctrine, you might have a model like this:
/** @Entity */
class User
{
    /** @Id @Column(type="integer") */
    private $id;

    /** @Column(length=50) */
    private $username;

    // ...
}
Then to use the model and the associated active record functionality, you would do something like this:
// Instantiate object
$user = new User();
// Set properties
$user->username = 'some_username';
// Save object
$user->save();
// Access properties
echo $user->id;
This is just scratching the surface in terms of what Doctrine can do. You can set default values for properties or specify relationships between tables. Notice how I didn't write any SQL or build the query. I just set the properties of the object and then saved it. Doctrine takes care of the rest.
Note that Doctrine includes its own query builder, so in a way it does what CodeIgniter's active record does, and more.
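As an illustration, the earlier get_user_by_username() lookup might look something like this with Doctrine 2's QueryBuilder (a sketch only; it assumes a User entity that also maps an active column, and an EntityManager instance $em):

// Build the "active user by username" query with Doctrine's QueryBuilder.
$users = $em->createQueryBuilder()
    ->select('u')
    ->from('User', 'u')
    ->where('u.username = :username')
    ->andWhere('u.active = 1')
    ->setParameter('username', $username)
    ->getQuery()
    ->getResult();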
Using Doctrine is similar to CakePHP's or Ruby on Rails' implementation of the active record pattern. You could take a look there for additional insight into the pattern. CakePHP's examples might be particularly easy to digest if you're coming from a CodeIgniter background.
To answer some of your other questions, I don't think there's anything that makes Doctrine better than the CodeIgniter active record methods. It may be more advanced, but like any other library, you want to pick the best tool for the job. If you are happy with CodeIgniter's active record methods and you see no need for an advanced ORM, then skip it.
Related
TL;DR
I have an architecture issue which boils down to filtering entities by a predefined set of common filters. The input is a set of products; each product has details. I need to design a filtering engine so that I can (easily and quickly) resolve this task:
"Filter out collection of products with specified details"
Requirements
The user may specify any filter combination, with support for precedence and nested filters. A bare example: (weight=X AND (color='red' OR color='green')) OR price<1000. The requests will go via HTTP / REST, but that's insignificant (it only adds the issue of translating filters from the URI into some internal model). All comparison operators should be supported (equality, inequality, less than, etc.).
Specifics
Model
There is no fixed model definition - in fact, I am free to choose one. To keep it simple I am using plain key => value pairs for the details. So at the very minimum it comes down to:
class Value extends Entity implements Arrayable
{
    protected $key;
    protected $value;

    // getters/setters for key/value here
}
for a single product-detail value, and something like
class Product extends Entity implements Arrayable
{
    protected $id;

    /**
     * @var Value[]
     */
    protected $details;

    // getters/setters, more properties that are omitted
}
for the product. Now, regarding the data model, there is the first question: how to design the filtering model? My simple idea is to implement it as, let's say, a recursive iterator - a regular tree structure built from the incoming user request (a sketch of one possible shape follows after this list). The difficulties I certainly need to solve here are:
Quickly building the model structure from the user request
Keeping the structure easy to modify
Easily translating the chosen filter data model to the chosen storage (see below)
The last point in the list above is probably the most important, as the storage routines will be the most time-consuming part, so the filter data model has to fit the storage structure. That means the storage always has higher priority: if the data model cannot fit a storage design that resolves the issue, then the data model should be changed.
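For reference, here is a minimal composite-pattern sketch of such a filter tree (all names are illustrative, not a prescription):

// Leaf node: one comparison, e.g. color = 'red' or price < 1000.
class Condition
{
    public function __construct(
        public string $key,      // detail key, e.g. 'color'
        public string $operator, // '=', '!=', '<', '>', ...
        public mixed $value
    ) {}
}

// Composite node: groups child filters with AND/OR, which provides
// nesting and precedence for free.
class FilterGroup
{
    /** @var array<Condition|FilterGroup> */
    public array $children = [];

    public function __construct(public string $boolean = 'AND') {}

    public function add(Condition|FilterGroup $filter): static
    {
        $this->children[] = $filter;
        return $this;
    }
}

// (weight=10 AND (color='red' OR color='green')) OR price<1000
$filter = (new FilterGroup('OR'))
    ->add((new FilterGroup('AND'))
        ->add(new Condition('weight', '=', 10))
        ->add((new FilterGroup('OR'))
            ->add(new Condition('color', '=', 'red'))
            ->add(new Condition('color', '=', 'green'))))
    ->add(new Condition('price', '<', 1000));

A tree like this is easy to build from a parsed URI, easy to walk recursively, and each storage backend can implement its own visitor that translates the tree into SQL, a Solr query, and so on.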
Storage
As storage I want to use a NoSQL+RDBMS hybrid such as PostgreSQL 9.4, which allows storing the details as JSON. I do not want to use EAV in any case, which is why a pure relational design isn't an option (see here why). There is one important thing: products may contain stocks, which leaves me with basically two ways to go:
If I design products as a single entity together with their stocks (which seems logical), then I cannot take the "storage" + "indexer" approach, because it produces stale state while the indexer (such as Solr) updates and reindexes the data.
Design with separate entities. That means separating whatever can be cached from whatever cannot. The cacheable part can then go to the indexer (the details can probably go there too, so we can filter by them) and the non-cacheable part goes somewhere else.
And the question for the storage part would be, of course: which one to choose?
The good thing about the first approach is that the internal API is simple and the internal structures are simple and scalable, because they can easily be abstracted from the storage layer. The bad thing is that I then need a "magic solution" that makes "just storage" work instead of "storage + indexer". "Magic" here means designing indexes or some additional data structures in the storage (I was thinking about hashing, but it isn't helpful against range queries) that can resolve the filtering requests.
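For what it's worth, PostgreSQL 9.4's jsonb goes some way toward that "just storage" route on its own. A rough sketch via PDO (the table, column, and index names are assumed):

// Assumed schema:
//   CREATE TABLE products (id serial PRIMARY KEY, details jsonb);
//   CREATE INDEX products_details_gin ON products USING GIN (details);
$pdo = new PDO('pgsql:host=localhost;dbname=catalog');

// Equality filters map to the jsonb containment operator @>,
// which the GIN index can serve directly.
$stmt = $pdo->prepare(
    'SELECT id, details FROM products
     WHERE details @> CAST(:filter AS jsonb)'
);
$stmt->execute([':filter' => json_encode(['color' => 'red'])]);

// Range filters (e.g. price < 1000) need an expression instead, plus an
// expression index if they are hot paths:
//   CREATE INDEX products_price ON products (((details->>'price')::numeric));
$stmt = $pdo->prepare(
    "SELECT id FROM products WHERE (details->>'price')::numeric < :price"
);
$stmt->execute([':price' => 1000]);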
On the other hand, the second solution would let a search engine resolve the filtering task by itself, at the cost of a window during which the data in it is outdated. And of course the data layer then needs to know which part of the model goes to which storage (stocks to one storage, details to another, and so on).
Summary
What would be a proper data model for the filtering?
Which approach should be used to resolve the issue at the storage level: storage + indexer with a separate products model, or storage only with a monolithic products model? Or maybe something else?
If I go the storage-only approach, is it possible to design the storage so that products can easily be filtered by any set of details?
If I go with the indexer, which one fits this issue better? (There is a good comparison between Solr and Sphinx here, but it was made in '09 and it's '15 now, so it is surely outdated.)
Any links, related blog posts, or articles are very welcome.
As a P.S.: I searched across SO but have found only barely relevant suggestions/topics so far (for example this). I am not expecting a silver bullet here, as it always boils down to some trade-off, but the question looks quite standard, so there should be good insights out there already. Please guide me - I tried to "ask Google" with some luck, but it was not enough yet.
P.P.S. Feel free to edit the tags or redirect the question to a more appropriate SE site if SO is not a good fit for this kind of question. And I am not asking for a language-specific solution, so if you are not using PHP it does not matter - the design has nothing to do with the language.
My preferred solution would be to split the entities - your second approach. The stable data would be held in Cassandra (or Solr or Elastic etc), while the volatile stock data would be held in (ideally) an in-memory database like Redis or Memcache that supports compare-and-swap / transactions (or Dynamo or Voldemort etc if the stock data won't fit in memory). You won't need to worry too much about the consistency of the stable data since presumably it changes rarely if ever, so you can choose a scalable but not entirely consistent database like Cassandra; meanwhile you can choose a less scalable but more consistent database for the volatile stock data.
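For the volatile side, the compare-and-swap flow for a stock count could look roughly like this with phpredis (a sketch; the key scheme and retry policy are assumptions):

// Optimistic check-and-decrement of a stock counter in Redis.
// If another client touches the key between WATCH and EXEC,
// exec() returns false and we retry.
function reserveStock(Redis $redis, string $productId, int $qty): bool
{
    $key = "stock:{$productId}"; // illustrative key scheme

    for ($attempt = 0; $attempt < 3; $attempt++) {
        $redis->watch($key);
        $available = (int) $redis->get($key);

        if ($available < $qty) {
            $redis->unwatch();
            return false; // not enough stock
        }

        $redis->multi();
        $redis->decrBy($key, $qty);
        if ($redis->exec() !== false) {
            return true; // transaction committed
        }
        // Someone else modified the key; loop and retry.
    }
    return false;
}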
I'm investigating a personal Grails project and want to put together a domain model to represent a product catalog. I really can't decide the best way to go about it. I will have a number of different product categories although many categories will just have a base set of properties that are shared across all categories (e.g. product name, product description, price etc). However, some products will have additional properties specific to their category.
I've looked into the Entity Attribute Value (EAV) Model technique that provides a very extensible solution. And, I've considered the route of using an explicit OO inheritance model where I have sub-classes of a base Product class to represent any product that has additional properties.
Obviously, the second approach is less extensible - to add a new product category would require a new entity and likely a custom view/editor for the front-end. However, as a developer, I think the programming model is significantly clearer and much more logical to code against.
The EAV approach would allow dynamic extensibility but would lead to a more cryptic programming model and would have a performance overhead in the DB (complex table joins). Views/editors on the front end could be dynamically generated to include any number of the custom attributes for a product category - though I'm sure situations would arise where such dynamic generation wouldn't suffice from a usability perspective.
When I consider a framework like Grails, it would seem to make sense to go down the route of creating an explicit inheritance model. I'm not convinced a framework like Grails would fit the EAV approach so well - a lot of the benefits of Grails would be lost in the complexity. However, I'm not sure this approach would scale practically as the number of product categories increases.
I'd be really interested to hear of others' experience with this type of modelling challenge!
I've had a situation similar to this and went with the inheritance solution. Going into it I knew I'd never have more than about 10 classes, so I wasn't worried about exponential growth in complexity. Although you will need views and controllers for each class, there are some things you can do to reduce code duplication. The first is to put all common view code in templates. For example, if all your classes have a price, name, and description, the view code that displays and edits these should go into a template. Instead of duplicating lines of code in each view you can simply do a
<g:render template="/baseView" />
For more info on templates see http://www.grails.org/Tag+-+render
The second thing I found useful was to move all shared controller code into a class and define closures that I could call from my actual controllers. This got quite ugly, since my save method would not only ensure the fields of the base class were handled properly but also contained code for corner cases of the inherited classes. Looking back, a better option may have been to define custom behavior as methods on the domain class that required it, or to use a service. That said, putting code into closures that could be called from the controllers was still helpful, since it allowed me to have controller bodies one line long instead of 30 or 40. If I had to modify code dealing with the base class, I could edit it where the closures were defined, and that change would be reflected across all my controllers with no change to the controllers' source files. This came in quite useful and let me edit code in one place instead of editing duplicate code across 10 controllers.
Inheritance works fine with Hibernate and GORM. Consider using the table-per-subclass mapping as you cannot define NOT NULL constraints with the (default) table-per-hierarchy inheritance mapping.
You can also use composition for "not so" common, but shared, attributes.
"The" criteria for EAV is, do you need to introduce new attributes without changing the data model?
In practice, applications like yours use a combination of inheritance and EAV.
You're concerned about performance when querying joined tables. That's normally not an issue if you index the columns that appear in the SQL WHERE clause.
(GORM/Hibernate will automatically create the foreign keys, which are important as well.) Given that the necessary indexes are in place and the DBMS provides a decent query optimizer (e.g., PostgreSQL or SQL Server - maybe not MySQL), you can select from millions of records across 10 joins in 50 milliseconds or less.
Finally, there has been an excellent recent discussion of your issue.
I'm working on a 2-tier application where WinForms client makes direct calls to the database. In one of the scenarios I need to display a list of Customer entities to a user. The problem is that Customer entity contains a lot of properties (some quite heavy) and I need only two of them - first and last names. So to improve performance and make presentation logic clearer I want to create some kind of a CustomerSummaryViewModel class with required properties only and use NHibernate's projections feature to load it. My concern here is that in this case my data access logic becomes coupled with presentation and to me it seems conceptually wrong.
Do you think this is ok or there are better solutions?
I think you can consider the CustomerSummaryViewModel as a report (CustomerSummaryReport). It is fine to query your entities for scenarios like this and treat the results as reports. Most reports are more complex, involving multiple entities and aggregate queries; this one is very simple, but you can still treat it like a report.
You also mention that performance is significant. That is another reason to use a separate reporting query and DTO. The Customer entity sounds like one of your "main" entities: if retrieving customers takes a significant amount of time even with lazy-loaded properties left uninitialized, that can be a sign the Customer entity itself needs optimizing, rather than being worked around with reporting queries. Just a warning, because I have seen cases where this was needed.
By the way, you can consider LINQ instead of projections, for the easier syntax:
var reports = session.Linq<Customer>()
    .Where(condition)
    .Select(customer => new Report
    {
        FirstName = customer.FirstName,
        LastName = customer.LastName
    })
    .ToList();
I've been following a mostly DDD methodology for this project, so, like any DDD'er, I created my domain model classes first. My intention is to use these POCO's as my LINQ-to-SQL entities (yes, they're not pure POCO's, but I'm ok with that). I've started creating the database schema and external mapping XML file, but I'm running into some issues with modeling the entities' relationships and associations.
An artifact represents a document. Artifacts can be associated with either a Task or a Case. The Case entity looks like this:
public class Case
{
    // Initialized inline so Assign() below never hits a null reference.
    private EntitySet<Artifact> _Artifacts = new EntitySet<Artifact>();

    public IList<Artifact> Artifacts
    {
        get
        {
            return _Artifacts;
        }
        set
        {
            _Artifacts.Assign(value);
        }
    }

    // ...
}
Since an Artifact can be associated with either a Case or a Task, I have the option of using inheritance on the Artifact class to create CaseArtifact and TaskArtifact derived classes. The only difference between the two classes, however, would be the presence of a Case field or a Task field. In the database, of course, I would have a single Artifact table with a type discriminator field and the CaseId and TaskId fields.
My question: is this a valid approach to solving this problem, or would creating a join table for each association (2 new tables, total) be a better approach?
I would probably go with two tables - it makes the referential integrity (the PKs/FKs) a little simpler to handle in the database, since you won't need a complex constraint based on the selector column.
(To reply to your comment - I ran out of space, so I'm posting here as an edit.) My overall philosophy is that the database should be modelled with database best practices (protect your perimeter and ensure database consistency, using as much RI and as many constraints as possible; provide all access through SPs; log activity as necessary; control all modes of access; use triggers where necessary) and the object model should be modelled with OOP best practices to provide a powerful and consistent API. It's the job of your SPs/data-access layer to handle the impedance mismatch.
If you just persist a well-designed object model to a database, your database won't have much intrinsic value (difficult to data-mine, report on, or warehouse; vague metadata; etc.) when viewed without going through the lens of the object model. This is fine for some applications, but typically not for mine.
If you just mimic a well-designed database structure in your application without providing a rich OO API, your application will be difficult to maintain, and the internal structures will be awkward to deal with - typically very procedural, rigid, and with a lot of code duplication.
I would consider finding the commonalities between Case and Task - for lack of a better word, let's call it "CaseTask" - and then sub-typing (inheriting) from that. After that, you attach the document to the super-type.
UPDATE (after comment):
I would then consider something like this. Each document can be attached to several cases or tasks.
I work for a billing service that uses some complicated mainframe-based billing software for its core services. We have all kinds of codes we set up that are used for tracking things: payment codes, provider codes, write-off codes, etc. Each type of code has a completely different set of data items that control what the code does and how it behaves.
I am tasked with building a new system for tracking changes made to these codes. We want to know who requested each code; who reviewed, approved, and implemented it, and when; and what the exact setup for that code looked like. The current process only tracks two of the code types. This project will add immediate support for a third, with the goal of making it easy to add further code types to the same process later. My design conundrum is that each code type has a different set of data, of varying complexity, that needs to be configured with it. So I have a few choices:
I could give each code type its own table(s) and build them independently. Since there are only three code types I'm concerned with at the moment, this would be simplest. However, this concept has already failed, or I wouldn't be building a new system in the first place. It's also weak in that writing generic presentation-level code to display request data for any code type (even ones not yet implemented) is not trivial.
Build a DB schema capable of storing the data points associated with each code type: not only the values, but what type they are and how they should be displayed (e.g. a dropdown list driven by an enum of some kind). I have a decent DB schema for this started, but it just feels wrong: overly complicated to query and maintain, and it ultimately requires a custom query to view the full data in a nice tabular form for each code type anyway.
Store the data points for each code request as XML. This greatly simplifies the database design and will hopefully make it easier to build the interface: just set up a schema for each code type, then have code that validates requests against their schema, transforms a schema into display widgets, and maps an actual request item onto the display. What this option lacks is a way to handle changes to the schema.
My questions are: how would you do it? Am I missing any big design options? Any other pros/cons to those choices?
My current inclination is to go with the XML option. Given that schema updates are expected but extremely infrequent (probably less than one per code type per 18 months), should I just build it assuming the schema never changes, but in a way that lets me easily add support for a changing schema later? What would that look like in SQL Server 2000 (we're moving to SQL Server 2005, but that won't be ready until after this project is supposed to be completed)?
[Update]:
One reason I'm thinking XML is that some of the data will be complex: nested/conditional data, enumerated drop-down lists, etc. But I really don't need to query any of it, so I was thinking it would be easier to define this data in XML schemas.
However, le dorfier's point about introducing a whole new technology hit very close to home. We currently use very little xml anywhere. That's slowly changing, but at the moment this would look a little out of place.
I'm also not entirely sure how to build an input form from a schema, and then merge a record that matches that schema into the form in an elegant way. It will be very common to only store a partially-completed record and so I don't want to build the form from the record itself. That's a topic for a different question, though.
Based on all the comments so far, XML is still the leading candidate. Separate tables may be as good or better, but I have the feeling my manager would see that as not different or generic enough compared to what we're currently doing.
There is no simple, generic solution to a complex, highly specific problem. You can't have both simple storage and simple app logic at the same time: either the database structure must be complex, or your app must be complex as it interprets the data.
I outline five solutions to this general problem in "product table, many kind of product, each product have many parameters."
For your situation, I would lean toward Concrete Table Inheritance or Serialized LOB (the XML solution).
The reasons XML might be a good solution here:
You don't need to use SQL to pick out individual fields; you're always going to display the whole form.
Your XML can annotate fields for data type, user interface control, etc.
But of course you need to add code to parse and validate the XML. You should use an XML schema to help with this - in which case you're just replacing one technology for enforcing data organization (the RDBMS) with another (XML Schema).
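The validation half of that is usually cheap on any platform. Purely as an illustration of the shape, a sketch with PHP's DOM extension (the file names are made up):

// Validate a stored code-change request against the XSD for its code type.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->load('request-12345.xml');                 // assumed request document

if (!$doc->schemaValidate('payment-code.xsd')) { // assumed per-type schema
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";        // report what failed
    }
    libxml_clear_errors();
}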
You could also use an RDF solution instead of an RDBMS. In RDF, metadata is queriable and extensible, and you can model entities with "facts" about them. For example:
Payment code XYZ contains attribute TradeCredit (Net-30, Net-60, etc.)
Attribute TradeCredit is of type CalendarInterval
Type CalendarInterval is displayed as a drop-down
.. and so on
Re your comments: Yeah, I am wary of any solution that uses XML. To paraphrase Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.
Another solution would be to invent a little Domain-Specific Language to describe your forms. Use that to generate the user-interface. Then use the database only to store the values for form data instances.
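As a toy illustration of that idea (the field definitions and the renderer are entirely made up):

// A tiny form "DSL": each code type's form is just a data structure.
$paymentCodeForm = [
    'code'        => ['type' => 'text', 'label' => 'Code'],
    'tradeCredit' => ['type' => 'select', 'label' => 'Trade credit',
                      'options' => ['Net-30', 'Net-60']],
    'writeOff'    => ['type' => 'checkbox', 'label' => 'Allow write-off'],
];

// Generic renderer: works for any code type described this way.
function renderForm(array $fields, array $values = []): string
{
    $html = '';
    foreach ($fields as $name => $f) {
        $value = htmlspecialchars((string) ($values[$name] ?? ''));
        $html .= "<label>{$f['label']}</label>";
        $html .= match ($f['type']) {
            'select'   => "<select name=\"$name\">"
                        . implode('', array_map(
                            fn ($o) => "<option>$o</option>",
                            $f['options']))
                        . '</select>',
            'checkbox' => "<input type=\"checkbox\" name=\"$name\""
                        . ($value ? ' checked' : '') . '>',
            default    => "<input type=\"text\" name=\"$name\" value=\"$value\">",
        };
    }
    return $html;
}

// Partially completed records just become a sparse $values array.
echo renderForm($paymentCodeForm, ['code' => 'XYZ']);

The database then only stores the captured values per request, while the form definitions carry the per-type structure.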
Why do you say "this concept has already failed or I wouldn't be building a new system in the first place"? Is it because you suspect there must be a scheme for handling them in common?
Otherwise, I'd say continue the existing philosophy and establish additional tables. At least you'd be sharing an existing pattern and maintaining some consistency in that respect.
Do a web search on "generalized specialized relational modeling". You'll find articles on how to set up tables that store the attributes of each kind of code, and the attributes common to all codes.
If you’re interested in object modeling, just search on “generalized specialized object modeling”.