Maybe this is a silly question, but I am new to large-scale development in Django. Basic Django examples always show simple queries where you get a queryset and return the result set to the template for use (e.g. a list of users). I was wondering: is this really a best practice for a large site? Coming from the object-oriented world, we have kept separate structures to loosely couple the database from business logic and views. One set of models would map directly to the tables in the database and use an ORM such as SQLAlchemy to manage them automatically, but separate business models would be returned to the views for template usage based on business-logic needs. They may have extra or fewer fields than the database table object, depending on what information is needed for that template, and they come from a reusable code layer that contains the business logic. For example, sometimes you want to flatten a few columns from multiple database models into one class that is easily consumable.
Also, for situations such as transactions and connection handling, the reusable layer allows us to keep all knowledge of the database out of the views. The middle layer explicitly creates a session, uses it on different objects as needed, and closes it when done, though I rarely see evidence of this in Django. While it is more development work, it tends to give better performance results. Is this separation of models and business logic considered overkill in the Django world? Thanks.
From what you explain, I get the impression that you would like to build a middle layer that caches queryset results in a reusable format.
Django takes the opposite approach: querysets are lazy. They do not touch the database until their result is needed in a view. When combining queries, it is only the combined query that hits the database.
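As an illustration, here is a minimal sketch of that laziness in a Django view; the model, field and template names are hypothetical:

```python
# models.py -- a hypothetical model used only for illustration
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    published = models.BooleanField(default=False)
    views = models.IntegerField(default=0)

# views.py
from django.shortcuts import render

def popular_articles(request):
    qs = Article.objects.filter(published=True)   # no SQL executed yet
    qs = qs.filter(views__gte=1000)               # still no SQL; the filters are combined
    qs = qs.order_by("-views")[:10]               # still lazy; the slice becomes LIMIT 10
    # The single combined query runs only when the template iterates the queryset.
    return render(request, "articles/popular.html", {"articles": qs})
```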
Apart from that, caching and querysets give good performance for most situations if you construct your models sensibly.
For people who are splitting monolithic applications into microservices, how are you handling the conundrum of breaking apart the database? Typical applications that I've worked on do a lot of database integration for performance and simplicity reasons.
If you have two tables that are logically distinct (bounded contexts, if you will), but you often do aggregate processing on large volumes of that data, then in the monolith you're more than likely eschewing object orientation and instead using your database's standard JOIN feature to process the data on the database before returning the aggregated view to your app tier.
How do you justify splitting up such data into microservices, where presumably you will be required to 'join' the data through an API rather than at the database?
I've read Sam Newman's Microservices book and in the chapter on splitting the Monolith he gives an example of "Breaking Foreign Key Relationships" where he acknowledges that doing a join across an API is going to be slower - but he goes on to say if your application is fast enough anyway, does it matter that it is slower than before?
This seems a bit glib? What are people's experiences? What techniques did you use to make the API joins perform acceptably?
When performance or latency doesn't matter too much (yes, we don't always need them) it's perfectly fine to just use simple RESTful APIs for querying the additional data you need. If you need to make multiple calls to different microservices and return one result, you can use the API Gateway pattern.
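A minimal sketch of the API Gateway idea, assuming two hypothetical service endpoints and the `requests` library:

```python
# A hypothetical gateway endpoint that "joins" data from two services over HTTP.
import requests

ORDERS_URL = "http://orders-service/api/orders"            # assumed endpoint
CUSTOMERS_URL = "http://customers-service/api/customers"   # assumed endpoint

def get_order_view(order_id):
    """Aggregate an order with its customer into one response payload."""
    order = requests.get(f"{ORDERS_URL}/{order_id}", timeout=2).json()
    customer = requests.get(f"{CUSTOMERS_URL}/{order['customer_id']}", timeout=2).json()
    # The "join" happens in application code instead of SQL.
    return {
        "order_id": order["id"],
        "total": order["total"],
        "customer_name": customer["name"],
        "customer_email": customer["email"],
    }
```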
It's perfectly fine to have redundancy in polyglot persistence environments. For example, you can use a message queue for your microservices and send "update" events every time you change something. Other microservices will listen for the events they need and save the data locally. So instead of querying, you keep all the required data in the appropriate storage for each specific microservice.
Also, don't forget about caching :) You can use tools like Redis or Memcached to avoid querying other databases too often.
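For example, a cache-aside sketch with Redis; the client setup, key scheme and fetch function are assumptions:

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 60  # keep cross-service lookups short-lived

def get_customer(customer_id, fetch_from_service):
    """Return customer data, hitting the owning service only on a cache miss."""
    key = f"customer:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    data = fetch_from_service(customer_id)           # e.g. an HTTP call to the other service
    cache.setex(key, TTL_SECONDS, json.dumps(data))  # cache the result with a TTL
    return data
```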
It's OK for services to have read-only replicated copies of certain reference data from other services.
Given that, when trying to refactor a monolithic database into microservices (as opposed to rewriting it), I would:
create a db schema for the service
create versioned* views** in that schema to expose its data to other services (a sketch of such a view follows after the footnotes)
do joins against these read-only views
This will let you independently modify table data/structure without breaking other applications.
Rather than use views, I might also consider using triggers to replicate data from one schema to another.
This would be incremental progress in the right direction, establishing the seams of your components, and a move to REST can be done later.
*the views can be extended. If a breaking change is required, create a v2 of the same view and remove the old version when it is no longer required.
**or Table-Valued-Functions, or Sprocs.
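To illustrate the versioned-view idea, here is a hypothetical migration sketch; psycopg2 and every schema, table and view name are assumptions, not taken from the answer above:

```python
# Hypothetical migration: expose a versioned, read-only view from this
# service's schema for other services to join against.
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS billing;

CREATE TABLE IF NOT EXISTS billing.customers (
    customer_id  integer PRIMARY KEY,
    display_name text NOT NULL
);

CREATE TABLE IF NOT EXISTS billing.invoices (
    invoice_id   integer PRIMARY KEY,
    customer_id  integer REFERENCES billing.customers,
    amount       numeric NOT NULL,
    paid         boolean NOT NULL DEFAULT false
);

-- v1 of the view is the published contract; a breaking change would ship as
-- customer_balance_v2 while v1 stays until consumers have migrated.
CREATE OR REPLACE VIEW billing.customer_balance_v1 AS
SELECT c.customer_id,
       c.display_name,
       COALESCE(SUM(i.amount), 0) AS outstanding_balance
FROM billing.customers c
LEFT JOIN billing.invoices i
       ON i.customer_id = c.customer_id AND i.paid = false
GROUP BY c.customer_id, c.display_name;
"""

with psycopg2.connect("dbname=app") as conn:  # PostgreSQL DDL is transactional
    with conn.cursor() as cur:
        cur.execute(DDL)
```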
CQRS (Command Query Responsibility Segregation) is the answer to this, as per Chris Richardson.
Let each microservice update its own data model and generate events, which in turn update a materialized view holding the pre-joined data from the other microservices. This MV could be any NoSQL DB, Redis, or Elasticsearch store that is query-optimized. This technique leads to eventual consistency, which is definitely not bad, and it avoids real-time application-side joins.
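A minimal sketch of that flow, with an assumed event shape and Redis standing in for the query-optimized read store:

```python
import json
import redis

read_store = redis.Redis(host="localhost", port=6379)

def on_order_event(event):
    """Consume an 'order updated' event and refresh the denormalized read model."""
    key = f"order_view:{event['order_id']}"
    view = json.loads(read_store.get(key) or "{}")
    view.update({
        "order_id": event["order_id"],
        "status": event.get("status", view.get("status")),
        "customer_name": event.get("customer_name", view.get("customer_name")),
    })
    # Eventual consistency: the view lags the source of truth slightly.
    read_store.set(key, json.dumps(view))
```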
Hope this answers.
I would separate the solutions by area of use, say operational versus reporting.
For the microservices that provide data for single forms and need data from other microservices (this is the operational case), I think using API joins is the way to go. You won't be dealing with big amounts of data, and you can do the data integration in the service.
The other case is when you need to run big queries on large amounts of data to do aggregations etc. (the reporting case). For this need I would think about maintaining a shared database, similar to your original schema, and updating it with events from your microservice databases. On this shared database you could continue to use your stored procedures, which would save you effort and preserve your database optimizations.
In microservices you create different read models. For example, if you have two different bounded contexts and somebody wants to search across both sets of data, then something needs to listen to events from both bounded contexts and create a view specific to the application.
In this case more space will be needed, but no joins will be required.
For a project I created a single class that contains all the database access code.
Is this good practice, under the assumption that this class doesn't contain any logic, or should I use several classes? If so, how should I partition my code? I use C# .NET.
Actually, under the MVC concept it is good practice to create one class for database access, a separate class for logic, and a separate class for your views.
You are doing well if you write a separate class for database access, under the assumption that it does not contain any logic.
In agile development there is a term for this: database encapsulation layers.
A database encapsulation layer hides the implementation details of your database, including its physical schema, from your business code. In effect this layer provides your business objects with persistence services: the ability to read data from, write data to, and delete data from data sources. Ideally your business objects should know nothing about how they are persisted; it just happens. Database encapsulation layers aren't magic and they aren't academic theory; they are a commonly used practice in both large and small applications, simple and complex alike. A database encapsulation layer is an important technique that every agile software developer should be aware of and be prepared to use.
An effective database encapsulation layer will provide several benefits:
-> It reduces the coupling between your object schema and your data schema, increasing your ability to evolve either one.
-> It implements all database-related code in one place.
-> It simplifies the job of application programmers.
-> It allows application programmers to focus on the business problem and Agile DBA(s) can focus on the database.
-> It gives you a common place to implement data-oriented business rules and logic.
-> It takes advantage of specific database features, increasing application performance.
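For illustration, a minimal sketch of such a layer in Python (the same shape applies in C#); the class, table and column names are made up, and the customers table is assumed to already exist:

```python
import sqlite3

class Customer:
    """A business object that knows nothing about how it is persisted."""
    def __init__(self, customer_id, name, email):
        self.customer_id = customer_id
        self.name = name
        self.email = email

class CustomerStore:
    """The encapsulation layer: all customer-related SQL lives here."""
    def __init__(self, connection):
        self._conn = connection

    def find(self, customer_id):
        row = self._conn.execute(
            "SELECT id, name, email FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def save(self, customer):
        self._conn.execute(
            "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
            (customer.customer_id, customer.name, customer.email),
        )
        self._conn.commit()
```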
Hope this helps.
If your database is quite small, say only a couple of tables, you could write all your queries in one class. Otherwise I would suggest one class per entity/table. For example, StudentDao will only contain the queries against the STUDENT table, and TeacherDao will only contain the queries against the TEACHER table. If you are going to implement complex business logic, you may want a service class that weaves StudentDao and TeacherDao together.
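A rough Python sketch of that layout, with hypothetical tables and queries (the idea carries over directly to C#):

```python
import sqlite3

class StudentDao:
    """Only queries against the STUDENT table live here."""
    def __init__(self, conn):
        self._conn = conn

    def find_by_teacher(self, teacher_id):
        return self._conn.execute(
            "SELECT id, name FROM student WHERE teacher_id = ?", (teacher_id,)
        ).fetchall()

class TeacherDao:
    """Only queries against the TEACHER table live here."""
    def __init__(self, conn):
        self._conn = conn

    def find(self, teacher_id):
        return self._conn.execute(
            "SELECT id, name FROM teacher WHERE id = ?", (teacher_id,)
        ).fetchone()

class ClassRosterService:
    """A service class weaving the two DAOs together for one use case."""
    def __init__(self, student_dao, teacher_dao):
        self._students = student_dao
        self._teachers = teacher_dao

    def roster(self, teacher_id):
        teacher = self._teachers.find(teacher_id)
        students = self._students.find_by_teacher(teacher_id)
        return {"teacher": teacher, "students": students}
```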
Unless your data access is very simple, probably not.
You probably shouldn't need to write this code yourself. Take a look at some object-relational mapping (ORM) tools; NHibernate is a popular .NET solution: http://en.wikipedia.org/wiki/NHibernate
If you really do want to write it yourself, look up design patterns in this area, like the Data Transfer Object pattern: http://martinfowler.com/eaaCatalog/dataTransferObject.html
Here are some suggestions for accessing a database.
1.) Always keep your database access parameters in a properties file, and use a handler class to read them. That way, when you change your database you don't need to change your code; changing the properties file is enough.
2.) Never create a single class (a god class) that performs all the actions. Disperse your behaviour into different classes depending on the intent, for example keeping all read behaviour in one class, write behaviour in another class, and so on.
3.) You can create a class that deals with connection creation and pooling (a sketch of points 1 and 3 follows below).
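A Python sketch of points 1 and 3, assuming an INI-style properties file and SQLite purely for illustration:

```python
import configparser
import sqlite3

# db.properties might contain:
# [database]
# path = app.db

class ConnectionHandler:
    """Reads connection parameters from a properties file and hands out connections."""
    def __init__(self, path="db.properties"):
        parser = configparser.ConfigParser()
        parser.read(path)
        self._settings = parser["database"]

    def get_connection(self):
        # Changing databases means editing db.properties, not the callers of this class.
        return sqlite3.connect(self._settings["path"])
```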
Hope this helps.
We have to redesign a legacy POI database from MySQL to PostgreSQL. Currently all entities have 80-120+ attributes that represent individual properties.
We have been asked to consider flexibility as well as a good design approach for the new database. The new design should allow:
any number of attributes/properties for an entity, i.e. the number of attributes for an entity is not fixed and may change on a regular basis;
content admins to add new properties to existing entities on the fly through admin interfaces, rather than making changes to the db schema all the time.
There are quite a few discussions about the performance issues of EAV, but if we don't go with a hybrid EAV we end up:
having a lot of empty columns (we still have to add new columns even if 99% of the data does not have those properties);
spending more time maintaining the database, especially when attributes keep changing;
having no way of allowing content admins to add new properties to existing entities.
Anyway here's what we are thinking about the new design (basic ERD included):
Have separate tables for each entity containing some basic info that is exclusive to it, e.g. id, name, address, contact, created, etc.
Have two tables, attribute_type and attribute, to store property information.
Link each entity to an attribute using a many-to-many relation.
Store addresses in a different table and link them to entities using a foreign key.
We think this will allow us to be more flexible when adding, removing, or updating properties; a rough sketch of this schema follows below.
This design, however, will result in an increased number of joins when fetching data. For example, to display all "attributes" for a given stadium we might have a query with 20+ joins to fetch all related attributes in a single row.
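To make the proposal concrete, here is a rough sketch of that schema using SQLAlchemy (1.4+ style); all names are placeholders:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Join table linking an entity to its attribute values (the EAV part).
stadium_attributes = Table(
    "stadium_attribute", Base.metadata,
    Column("stadium_id", ForeignKey("stadium.id"), primary_key=True),
    Column("attribute_id", ForeignKey("attribute.id"), primary_key=True),
)

class AttributeType(Base):
    __tablename__ = "attribute_type"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)          # e.g. "capacity", "roof_type"

class Attribute(Base):
    __tablename__ = "attribute"
    id = Column(Integer, primary_key=True)
    type_id = Column(ForeignKey("attribute_type.id"), nullable=False)
    value = Column(String, nullable=False)
    type = relationship(AttributeType)

class Stadium(Base):
    __tablename__ = "stadium"
    id = Column(Integer, primary_key=True)
    # Fixed, exclusive columns stay as ordinary columns.
    name = Column(String, nullable=False)
    created = Column(String)
    # The flexible properties hang off the many-to-many link.
    attributes = relationship(Attribute, secondary=stadium_attributes)
```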
What are your thoughts on this design, and what would be your advice to improve it?
Thank you for reading.
I'm maintaining a 10 year old system that has a central EAV model with 10M+ entities, 500M+ values and hundreds of attributes. Some design considerations from my experience:
If you have any business logic that applies to a specific attribute it's worth having that attribute as an explicit column. The EAV attributes should really be stuff that is generic, the application shouldn't distinguish attribute A from attribute B. If you find a literal reference to an EAV attribute in the code, odds are that it should be an explicit column.
Having a significant number of empty columns isn't a big technical issue. It does, however, need good coding and documentation practices to compartmentalize the different concerns that end up in one table:
Have conventions and rules that let you know which part of your application reads and modifies which part of the data.
Use views to ease poking around the database with debugging tools.
Create and maintain test data generators so you can easily create schema conforming dummy data for the parts of the model that you are not currently interested in.
Use rigorous database versioning. The only way to make schema changes should be via a tool that keeps track of and applies change scripts. PostgreSQL has transactional DDL, which is a killer feature for automating schema changes.
PostgreSQL doesn't really like skinny tables. Each attribute value results in 32 bytes of data storage overhead, in addition to the extra work of traversing all the rows to pull the data together. If you mostly read and write the attributes as a batch, consider serializing the data into the row in some way. attr_ids int[], attr_values text[] is one option, hstore is another, or something client side, like JSON or protobuf, if you don't need to touch anything specific on the database side.
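A small sketch of the "serialize into the row" option using a plain text column and Python's json module (hstore or a native JSON column would work the same way); table and attribute names are made up:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, attrs TEXT)")

def save_attrs(entity_id, attrs):
    """Write the whole attribute bag into one row instead of one row per attribute."""
    conn.execute(
        "INSERT OR REPLACE INTO entity (id, attrs) VALUES (?, ?)",
        (entity_id, json.dumps(attrs)),
    )

def load_attrs(entity_id):
    row = conn.execute("SELECT attrs FROM entity WHERE id = ?", (entity_id,)).fetchone()
    return json.loads(row[0]) if row else {}

save_attrs(1, {"capacity": 80000, "roof_type": "retractable"})
print(load_attrs(1))   # {'capacity': 80000, 'roof_type': 'retractable'}
```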
Don't go out of your way to put everything into one single entity table. If the entities don't share any attributes in a sensible way, use multiple instantiations of the specific EAV pattern you use. But do try to use the same pattern and share any accessor code between the different instantiations. You can always parametrise the code on the entity name.
Always keep in mind that code is data and data is code. You need to find the correct balance between pushing decisions into the meta-model and expressing them as code. If you make the meta-model do too much, modifying it will need the same kind of ability to understand the system, versioning tools, QA procedures, staging as your code, but it will have none of the tools. In essence you will be doing programming in a very awkward non-standard language. On the other hand, if you leave too much in the code, every trivial change will need a new version of your software. People tend to err on the side of making the meta-model too complex. Building developer tools for meta-models is hard and tedious work and has limited benefit. On the other hand, making the release process cheaper by automating everything that happens from commit to deploy has many side benefits.
EAV can be useful for some scenarios. But it is a little like "the dark side". Powerful, flexible and very seducing it is. But it's something of an easy way out. An easy way out of doing proper analysis and design.
I think "entity" is a bit over the top too general. You seem to have some idea of what should be connected to that entity, like address and contact. What if you decide to have "Books" in the model. Would they also have adresses and contacts? I think you should try to find the right generalizations and keep the EAV parts of the model to a minium. Whenever you find yourself wanting to show a certain subset of the attributes, or test for existance of the value, or determining behaviour based on the value you should really have it modelled as a columns.
You will not get a better opportunity to design this system than now. The requirements are known since the previous version, and also what worked and what didn't. (Just don't fall victim to the Second System Effect)
One good implementation of EAV can be found in Magento, a CMS for e-commerce. There is a lot of bad talk about EAV these days, but I challenge anyone to come up with a solution other than EAV for dealing with an unbounded set of product attributes.
Sure, you could go about enumerating all the columns you would need for every product in the world, but that would take a lot of time and you would inevitably forget product attributes along the way.
So the bottom line is: use EAV for the unbounded stuff, but don't rely on EAV for all of the database's tables. Hence a hybrid of EAV and a relational design, when done right, is a powerful tool that fixed columns alone cannot match.
Basically EAV is trying to implement a database within a database, and it leads to madness. The queries to pull data become overly complex, and your data has no stable, specific model to keep it in some kind of order.
I've written EAV systems for limited applications, but as a generic solution it's usually a bad idea.
I have several objects, like products, orders, etc. When I get information from my database I take a row and create one of these types of objects, then work with the object I created. I read that this is called a factory.
Is there some advantage to doing this? Especially in a loosely typed language like PHP?
Thanks
edit: is this where I get database agnosticity? Is this what an ORM essentially does?
By creating your objects from the database queries, you are defining the mapping between your objects and the relational database. This is exactly what ORM software does.
By doing so and ensuring that your objects never directly access the database, but instead use your database-access functions/objects, you are protecting your code from changes in two ways:
Changes to your database schema will not ripple through your code. Instead, the code changes will be located only in your database access objects.
You can switch to a different DBMS by implementing a new database layer that follows the same interface as the original. Your other objects will not require changes.
I guess in that sense, you gain some database-agnosticity, but you'll probably be better off using a database library that provides that agnosticity out of the box.
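For illustration, a minimal row-to-object factory in Python (the question is about PHP, but the shape is identical); the class and column names are invented:

```python
class Product:
    def __init__(self, product_id, name, price):
        self.product_id, self.name, self.price = product_id, name, price

class Order:
    def __init__(self, order_id, product_id, quantity):
        self.order_id, self.product_id, self.quantity = order_id, product_id, quantity

def make_entity(table, row):
    """Factory-style helper: turn a raw row (a dict) into the matching object."""
    if table == "products":
        return Product(row["id"], row["name"], row["price"])
    if table == "orders":
        return Order(row["id"], row["product_id"], row["quantity"])
    raise ValueError(f"No mapping defined for table {table!r}")

# Usage: the rest of the code works with objects, never with raw rows.
product = make_entity("products", {"id": 1, "name": "Lamp", "price": 19.99})
```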
In my opinion, the advantage is you are working with objects and gain all of the advantages that an object-oriented language offers. You can then read the domain logic at a higher level (in terms of the objects you have defined) without sifting through database queries. Writing the ORM yourself can be tough, but there are tools out there that help with that.
This is the route I normally take, but I don't do any PHP development, so I can't say how well it applies to that language.
What you're describing is an implementation of a Data Access Layer - it doesn't sound like an example of the Factory Method pattern, nor the Abstract Factory pattern.
Yes, ORMs bridge the gap from objects to relational databases, and can serve as your Data Access Layer. Bear in mind, any ORM you use has certain pros/cons/limitations. Depending on your experience and requirements, writing your own data access layer is sometimes a good idea; don't feel like you HAVE to use a 3rd-party ORM.
Yes, a good data access layer makes it easy to swap out your storage mechanism (different database, XML, flat files, whatever) without changing your business logic, UI, or other code.
Regardless of loose-typed or strong-typed languages, if you're working in an OO language, it will be much easier to write code using data objects (provided by an ORM or homegrown data access layer). I'm sure it's possible to write a system with no data access layer, where your business layer works directly with the database. But it will likely be more challenging to implement and maintain.
What is the best practice of introducing custom (typically volatile) data into entity model classes? This may sound like a bad practice first, but it seems to be quite a common scenario. In our recent web application we have developed a proper model and in most cases we are fine with loading model entities. But there are cases where we cannot afford loading an entire hierarchy of entities; we need to load, say, results of a couple of SQL COUNT’s or possibly some additional information alongside (or embedded inside) the model entities. So basically, the requirements and conditions are:
It’s a web application where 99.9999999999% of all operations are read operations.
They don’t need to process or do any complicated business logic. We just need to get data quickly to HTML.
In several performance critical cases, we need to load results of SQL aggregates which don’t fit any model properties.
We need an extensible way to introduce any new custom data if needed.
How do you usually solve this issue without working around your ORM too much (for instance, by fetching raw data from the db)? I'm sure this has been discussed many times, but I cannot figure out a good Google query to find anything useful.
Edit: Since I later realized the question was not very well formed, I decided to reformulate it and start a new one.
If you're just getting relational data to and from a browser, with little or no behavior in between, it sounds like you're trying to solve a relational problem with an OO paradigm.
I might be inclined to dispense with the Object Oriented approach altogether.
My team recently rewrote an application by asking "What is the simplest thing that could possibly work?" and "What is the language closest to the problem?". Our new app, replacing an OO one, ended up being ten times smaller, faster, and cheaper.
We used SQL, stored procedures, XML libraries on the DB server, XSLT (to get the HTML), and JavaScript.
An OOP purist like myself would go for the Decorator pattern.
http://en.wikipedia.org/wiki/Decorator_pattern
But the thing is, some people may not need the flexibility it offers. Plus, creating a new class for each distinct operation may seem like overkill, but it provides good compile-time type checking.
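A small sketch of the Decorator idea applied to this situation: wrapping an entity so the view sees extra, volatile fields without touching the underlying model. All names are hypothetical:

```python
class Article:
    """A plain model entity."""
    def __init__(self, article_id, title):
        self.article_id = article_id
        self.title = title

class ArticleWithCounts:
    """Decorator: exposes the wrapped entity plus extra read-only data."""
    def __init__(self, article, comment_count, view_count):
        self._article = article
        self.comment_count = comment_count
        self.view_count = view_count

    def __getattr__(self, name):
        # Delegate everything else to the wrapped entity.
        return getattr(self._article, name)

decorated = ArticleWithCounts(Article(1, "Hello"), comment_count=12, view_count=340)
print(decorated.title, decorated.comment_count)   # Hello 12
```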
The best practice in my view is that your application consumes data using the Domain Model pattern. The Domain Model can offer business-logic methods for doing the type of queries that make sense and are relevant to your application needs.
These can fetch "live" results that map directly to database rows and can therefore be edited and "saved."
But additionally, the Domain Model can provide methods that fetch read-only results that are too complex to be easily saved back to the database. This includes your example of grouped aggregate query results, and also includes joined query result sets, expressions as columns, etc.
The Domain Model pattern offers a way to decouple the OO design of an application from the design of the physical database.
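As an illustration, a sketch of what such Domain Model methods could look like in Python; the table, columns and method names are assumptions:

```python
import sqlite3

class OrderRepository:
    """Domain-level query methods; callers never see SQL or raw rows' structure."""
    def __init__(self, conn):
        self._conn = conn

    def find(self, order_id):
        # "Live" result: maps to a single row and can be edited and saved back.
        return self._conn.execute(
            "SELECT id, customer_id, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()

    def revenue_by_customer(self):
        # Read-only aggregate: too complex to save back, but fine to display.
        return self._conn.execute(
            "SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS revenue "
            "FROM orders GROUP BY customer_id"
        ).fetchall()
```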