I'm designing entities that are related to one another, while keeping application and database code separate and keeping performance in mind.
I've read about many architectural concepts (SOLID, separation of concerns, etc.).
ORM frameworks solve this internally and lazy load the related data when it is accessed.
But is there a practical way without tightly coupling my objects to the ORM and keeping database logic out of them?
Separating the data for a relational database is simple.
For example:
Orders-Table, Customers-Table, Addresses-Table, Countries-Table
Order: id, date, customerId, ...
Customer: id, company, email, defaultAddressId, ...
Address: id, street, countryId, ...
Country: id, name, code, ...
I want to keep the database-related functions separate, so I would create separate repositories, which fetch the data from the database.
For example:
$orderRepository->getById(123);
$customerRepository->getById(234);
$addressRepository->getById(345);
$countryRepository->getById(456);
Sometimes I only need the order data. Sometimes I need the related customer, and sometimes I need to know in which country the customer of the current order lives.
If I'm only reading a single order, that is no problem, as I can fetch the needed data into separate variables:
$customer = $customerRepository->getById($order->getCustomerId());
$defaultAddress = $addressRepository->getById($customer->getDefaultAddressId());
$country = $countryRepository->getById($defaultAddress->getCountryId());
But if I want to list many orders on one page (or in any other use case with many related objects in one view) and display for each order the name of the country it comes from, this gets complicated.
Ideally I would write in the view:
foreach ($orders as $order) {
...
$order->getCustomer()->getDefaultAddress()->getCountry()->getCode();
...
}
From my current knowledge there are three possible solutions:
Lazy loading
The call of $order->getCustomer() will call (maybe a singleton of) the customer repository to fetch the customer object. Then the address repository will be called to fetch the address, then the country repository.
Disadvantage:
many single database calls, and each object must know about the repositories it needs
Fetching all related data, when the orders are fetched
So maybe the repositories call the other repositories to fetch all the data, which their objects need:
OrderRepository:
function getCurrentOrders() {
...code to fetch order data from database...
$relatedCustomers = $this->customerRepository->getByMultipleIds($relatedCustomerIds);
...assigning fetched customer objects to order objects
}
The call to the customer repository will lead to a call to the address repository, which will lead to a call to the country repository. This would reduce the number of database calls at first.
Disadvantage:
Data is loaded which is not needed most of the time. It is fine for the list view, but when I only need the order's own data or only a single order, there are still 3 other calls to the database (in larger object trees maybe many more).
Tailor-made objects for each required view
Either a customized database query which builds a new object with all needed data or some wrapper objects which keep the related objects inside.
Disadvantage:
Could be really complicated when also business logic is needed, as I have to implement the same logic at several points.
How do you keep business logic separate from database code and design entities?
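One possible middle ground between the options above, sketched in Python rather than PHP: the entities stay plain data objects, and a separate function loads the customer, address and country for a whole list of orders with one batched call per repository. Everything here is hypothetical and only for illustration: `InMemoryRepository` stands in for a real repository, `get_by_ids` corresponds to `getByMultipleIds`, and `attach_countries` is an invented name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Country:
    id: int
    code: str


@dataclass
class Address:
    id: int
    country_id: int
    country: Optional[Country] = None


@dataclass
class Customer:
    id: int
    default_address_id: int
    default_address: Optional[Address] = None


@dataclass
class Order:
    id: int
    customer_id: int
    customer: Optional[Customer] = None


class InMemoryRepository:
    """Hypothetical stand-in for a real repository; counts its calls."""

    def __init__(self, rows: Iterable) -> None:
        self._rows = {row.id: row for row in rows}
        self.calls = 0

    def get_by_ids(self, ids: Iterable[int]) -> Dict[int, object]:
        self.calls += 1  # one call models one "WHERE id IN (...)" query
        return {i: self._rows[i] for i in set(ids)}


def attach_countries(orders: List[Order], customers, addresses, countries) -> None:
    """Load customer -> address -> country for all orders in three batched calls."""
    by_customer = customers.get_by_ids(o.customer_id for o in orders)
    by_address = addresses.get_by_ids(
        c.default_address_id for c in by_customer.values()
    )
    by_country = countries.get_by_ids(a.country_id for a in by_address.values())

    # Wire the object graph together in memory; the entities know no repository.
    for address in by_address.values():
        address.country = by_country[address.country_id]
    for customer in by_customer.values():
        customer.default_address = by_address[customer.default_address_id]
    for order in orders:
        order.customer = by_customer[order.customer_id]
```

The list view would call `attach_countries` once before rendering, while a use case that only needs the order data would simply not call it, so the decision about what to load sits with the use case rather than with the entities or the order repository.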
Related
There are a lot of questions on syncing state between devices or from external storage to/from the UI. This question is about state within the UI.
The UI may have multiple state objects that can point to one entity.
Eg. Multiple User Models that have the same ID and are essentially the same User in the Database.
The second option is to have a pattern that prevents multiple copies and enforces that a single entity is never duplicated.
Eg. Retrieving a User Model with ID=1 will always return the same Model.
So the options I currently face:
Have multiple Models point to the same DB entity
Enforce a single Instance of a Model reflects a DB entity
Both of these have their tradeoffs:
Have multiple Models point to the same DB entity
This requires syncing the Models with the same ID when a copy is updated.
This becomes non-trivial in implementation.
The current implementation we have is an EntityManager that keeps copies of each model and will propagate writes to all copies.
However, syncing is complex: writes to the remote copies are async, reads arrive from other devices and from remote fetches, and reactions (mobx) within models need to react to a consistent state of the model.
Enforce a single Instance of a Model reflects a DB entity
This requires no work to sync. However we have the complexity of ensuring we don't have any copies of a Model pointing to the same DB entity.
This becomes subject to coding conventions.
Eg.
model.fromJSON({ title: 'foo' })
model.fetch()
Becomes
model = model.fromJSON({ title: 'foo' })
model = model.fetch()
This is hard to understand for new developers and can be missed over time, creating hard-to-debug errors.
The question is: how do you generally solve this scenario in a consistent way that is the least complex and least bug-prone?
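For the second option, a common name for the pattern is an identity map. Below is a minimal sketch in Python (not the mobx code in question): `IdentityMap`, `User` and `update_from_json` are made-up names, and the weak references are one possible choice so that models nobody holds any more can be garbage collected.

```python
import weakref
from typing import Any, Dict


class User:
    def __init__(self, user_id: int) -> None:
        self.id = user_id
        self.title = ""

    def update_from_json(self, data: Dict[str, Any]) -> None:
        # Mutate in place so every holder of this instance sees the change,
        # instead of returning a new copy that callers must remember to keep.
        for key, value in data.items():
            setattr(self, key, value)


class IdentityMap:
    """Hands out exactly one live User instance per id."""

    def __init__(self) -> None:
        # Weak values: an entry disappears once no one references the model.
        self._instances = weakref.WeakValueDictionary()

    def get(self, user_id: int) -> User:
        user = self._instances.get(user_id)
        if user is None:
            user = User(user_id)
            self._instances[user_id] = user
        return user
```

If models can only be obtained through `IdentityMap.get` and updates mutate in place, the `model = model.fetch()` convention is not needed, because there is no second copy to reassign.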
I have been studying microservice architecture for a while. But I have a few questions on my mind.
To give an example, the services are:
order-service
customer-service
Product-service
Suppose there are 3 microservices above. They are using relational databases.
I list orders in the order-service. But I also have to pull customer information here.
If this were a monolithic structure, I could handle it with a join. But how can I do that in a microservice architecture?
Note: I’m not doing any projects. My goal is only to understand the microservice architecture.
Options:
Limit the dependency between the order service and the customer service: normally the order is a self-contained object that has all the customer's data (from the time of ordering) in it.
If it is still needed, the order should store the customer's id, and any UI or logic that wants to access recent customer data then needs to use the "public api" of the customer service. The "public api" can in general be anything; it can even be a defined shared storage (like a database). However, most teams decide not to allow direct access to the technical storage, to avoid tight coupling. That's why services most of the time talk REST (or gRPC) for synchronous use cases, or use some form of messaging for async interactions.
However, decide why you want to split it up: are you expecting a growing developer base and high complexity? If not, a monolith might be cheaper to build for your case.
But how can I do that in a microservice architecture?
Just by calling another microservice and asking for required additional information.
Normally microservices do not share database, as you noticed.
So if you have a class Order like that
class Order
{
OrderId;
ItemName;
UserName;
}
And a method GetOrder(orderId) that returns an order, like this:
GetOrder(orderId)
{
    // Remote calls to the other microservices (lookup arguments omitted for brevity)
    item = ItemMicroservice.GetItem();
    user = UserMicroservice.GetUser();
    result = new Order()
    {
        OrderId = orderId,
        ItemName = item.Name,
        UserName = user.Name
    };
    return result;
}
You can notice that there are two calls to other microservices that will return data to construct Order object.
Though you can see that this can be less than optimal in terms of performance. So sometimes microservices store duplicate information to be able to construct objects faster (eliminating calls to other microservices). For example, if the Users microservice updates its data, it sends an event to the Orders microservice so that it can update its cached copy of that data.
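The duplicate-data idea can be sketched roughly as follows (in Python, with an in-process method call standing in for a real message broker). The event shape `{"user_id": ..., "name": ...}` and all class and method names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OrderView:
    order_id: int
    user_id: int
    user_name: str


class OrderService:
    def __init__(self) -> None:
        self._orders: Dict[int, int] = {}      # order_id -> user_id
        self._user_names: Dict[int, str] = {}  # local copy of data owned by the user service

    def place_order(self, order_id: int, user_id: int) -> None:
        self._orders[order_id] = user_id

    def on_user_updated(self, event: Dict[str, object]) -> None:
        # Called whenever the user service publishes a (hypothetical) "user updated" event.
        self._user_names[int(event["user_id"])] = str(event["name"])

    def get_order(self, order_id: int) -> OrderView:
        # No remote call here: the name comes from the local copy.
        user_id = self._orders[order_id]
        name = self._user_names.get(user_id, "(unknown)")
        return OrderView(order_id, user_id, name)
```

The trade-off is that the local copy is only eventually consistent: between the update in the user service and the arrival of the event, `get_order` returns the old name.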
I'm creating an application that stores user's data in multiple database tables - info, payments and booking (this is a booking system).
In 'info' table I store the user info such as email, name, phone, etc...,
In 'payments' table I store his payments details and in 'booking' I store his booking history.
My question is: what is the best way of representing this data in a Flux architecture? Do I need 3 different stores (one for each table) or a single store (let's say 'UserStore') that holds all of the user's data?
Basically, I have a dashboard component that should show all user's data.
In case I should go with the 3 different stores solution, is it possible to know when all of them finished loading their data (since each store loads the data asynchronously from the DB)?...
Thanks!
Generally, fewer stores is better (easier to maintain).
Rule of thumb I use: if a store gets bigger than 300 lines of code, then you should consider splitting up in different stores.
In both solutions (one store or 3 stores) you will need code to manage dependencies between the tables. The code just lives in different files/stores.
About the user dashboard you mention: avoid storing redundant data in your store(s). So if you have e.g. total number of bookings for a user, make a getter function that does live calculation.
Better not let stores load data from the DB or server-side directly, but create a WebAPIUtil that fires an update-action when new data comes in. See the 'official' flux diagram.
The stores can have a waitFor function that waits until another store is updated.
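To illustrate the single-store variant with a getter instead of redundant state, here is a rough sketch in Python rather than JavaScript. The action types `RECEIVE_INFO` and `RECEIVE_BOOKINGS` and the `handle` method are invented for this example and are not part of any Flux library.

```python
from typing import Any, Dict, List


class UserStore:
    """One store holding everything the dashboard needs about the user."""

    def __init__(self) -> None:
        self._info: Dict[str, Any] = {}
        self._bookings: List[Dict[str, Any]] = []

    def handle(self, action: Dict[str, Any]) -> None:
        # Actions are fired by an API utility when data arrives;
        # the store itself never talks to the server.
        if action["type"] == "RECEIVE_INFO":
            self._info = dict(action["info"])
        elif action["type"] == "RECEIVE_BOOKINGS":
            self._bookings = list(action["bookings"])

    def info(self) -> Dict[str, Any]:
        return dict(self._info)

    def total_bookings(self) -> int:
        # Derived on demand instead of being stored next to the bookings.
        return len(self._bookings)
```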
Two part question:
Number 1: What is the best approach in creating a model for an object that references another object, when some of the properties/attributes of the referenced object are not always necessary?
Imagine if you have two objects: PERSON and BUSINESS
Person
+ PersonID
+ Name
+ Age
+ Sex
+ Skill
+ Business *
Business
+ BusinessID
+ Name
+ Address
+ CorporateVision (this is large)
In the example above: A PERSON has a reference to a BUSINESS as their current employer.
In the database, I would have two tables for each object. While in code, using the MVC architecture pattern, I would have two classes for each object. The database would have a foreign-key relationship between BUSINESS-->PERSON, while in code the PERSON object would have a member variable that holds a reference to a BUSINESS object.
Now let's say I want to enumerate on a collection of PERSONS and find out the total number of those that work for a specific company (based on BUSINESS . Name).
Without using MVC, I could just create a function that would query the database and get a count. Simple and efficient.
WITH MVC, I need to instantiate every PERSON object, which in turn instantiates a BUSINESS object for the reference (if one was not already created for it... the BusinessFactory would check a collection first). Furthermore, it MUST pull in BUSINESS . CorporateVision from the database for every object. And because most of these businesses are Media Marketing Companies, most of their corporate visions are large text blobs. So it is very unnecessary to read CorporateVision from the database when all we need is the name of the business.
I could solve this problem by changing the PERSON object in code to:
Person
+ PersonID
+ Name
+ Age
+ Sex
+ Skill
+ BusinessID
+ BusinessName
So now when I create my PERSON object, I do a JOIN with BUSINESS and cache the name. Now I can get the BusinessName quickly and efficiently... and I still can get the full BUSINESS object as needed by doing a lookup on the ID. But I just denormalized the model... and I just introduced a new problem... and a new question.
Number 2: How does MVC handle concurrency with a multi-user database?
Let's say that while my client application is enumerating (using the enumeration that I mentioned above that finds all people that work for a particular business), another user merges two of the BUSINESS objects.
Now my in-memory collection is wrong because all of the BusinessName caching is stale. The same could be true if I had just left the PERSON . Business as a BUSINESS object reference: The BUSINESS object would be stale.
In summary: I feel that with MVC I lose data retrieval efficiency as well as the ACIDness of my application. Or am I using MVC wrong?
You seem to mix UI and data access, while you should minimize their dependencies on one another. MVC is actually a pretty broad pattern which describes how an application interacts with the user. Both your questions are related to data access.
1) MVC is the way you organize the UI. So, the model is a piece of information you want the user to interact with. Note, business objects are not the priority here. If there is a case where a user loads a Person class along with several properties from Business, so be it: your second Person rendition is a perfect model for this case. And so on: each use case requires a different model, and you should create different models for different scenarios.
If you think it's easier for you to call a function to calculate the number - fine. Remember, you are not bound to business objects here.
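Such a counting function could look roughly like this in Python with the standard `sqlite3` module. The table and column names (`person`, `business`, `business_id`, `name`) are assumptions derived from the PERSON / BUSINESS example, not a known schema.

```python
import sqlite3


def count_employees(conn: sqlite3.Connection, business_name: str) -> int:
    """Count people working for a business without loading any objects.

    Only the join columns and the name are touched, so the large
    CorporateVision text is never read into the application.
    """
    row = conn.execute(
        "SELECT COUNT(*) "
        "FROM person p JOIN business b ON b.business_id = p.business_id "
        "WHERE b.name = ?",
        (business_name,),
    ).fetchone()
    return row[0]
```

Because the count is computed by the database in a single statement, it also reflects one consistent snapshot of the data at the time of the query.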
With more 'object'-oriented approach we usually solve this reference problem in two ways:
first is lazy loading, which is an out-of-the-box feature of modern O/RMs. So you load a person, and on the first call to Person.Business the latter is loaded automatically.
second is that you create a special kind of UI model, which is aware of your data access specifics and either has only the fields you use, or requests additional data in an async manner from the client.
2) Again, MVC doesn't handle concurrency, and it shouldn't; it's a concern of the data access layer. There are several ways to deal with concurrency, the major ones being optimistic and pessimistic locking. (With the first you allow different users to make conflicting changes and try to resolve conflicts when they occur. The second prevents conflicts by locking the data against concurrent updates.) Again, O/RMs usually deal with it; or you can use your own implementation, but it should still be part of data access, not of MVC.
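As a rough illustration of optimistic locking in the data access layer, here is a minimal sketch in Python with `sqlite3`. It assumes a hypothetical `version` column on the `business` table; `rename_business` and `StaleDataError` are names invented for this example.

```python
import sqlite3


class StaleDataError(Exception):
    """Raised when the row was changed by someone else since it was read."""


def rename_business(
    conn: sqlite3.Connection, business_id: int, new_name: str, expected_version: int
) -> None:
    # The update only matches if the row still has the version we read earlier.
    cursor = conn.execute(
        "UPDATE business SET name = ?, version = version + 1 "
        "WHERE business_id = ? AND version = ?",
        (new_name, business_id, expected_version),
    )
    if cursor.rowcount != 1:
        # Nothing was updated: the row is gone or another user changed it first.
        raise StaleDataError(f"business {business_id} was modified concurrently")
```

The caller reads the row together with its version, lets the user edit, and passes the version back on save; on `StaleDataError` it reloads the data and decides how to resolve the conflict.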
Let's say that I have two tables in a DB: Expenses and Account. Expenses is the data that I'm interested in and that table has a foreign key to Account. This DB is remote, accessed via Restful-esque commands, and I want to mirror just the data I need for my app in a Core Data data store on the iPhone. The actual DB I'm working with is much bigger than this example. ~30 tables and the Expenses table has ~7 FKs. I'm working closely with the person doing the API design, so I can modify the way I make my requests or the data returned, if necessary.
What is the best strategy for loading this data into Core Data?
My first thought was to have the request for the expense bring back the ids for the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account_id>123</account_id>
</expense>
This works fine if I already have an account with id '123' in my data store. If I don't, then I've got to make additional web requests every time I encounter an id I don't have… which is going to be incredibly slow. I can get around this by making requests in a specific order, i.e. request all new accounts before requesting expenses, so that way I know all the FK rows exist. I feel this would become much too cumbersome once the DB starts reaching moderate complexity.
My second thought was to have the data returned from the request follow FKs and return data from the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account>
<id>123</id>
<name>Bob's Big Boy</name>
<address>1234 Main Street</address>
</account>
</expense>
This looks better and guarantees that I'll have all the data I need when I need it. If I don't already have an account '123' I can create a new account object from that XML. My concern with this method, though, is that as the database grows in complexity, these XML files could become excessively large. The Expenses table has ~7 foreign keys, each of those tables has multiple FKs. It feels like a simple request for just a single Expense could end up returning a huge chunk of data.
How have other people solved this issue?
I am assuming that at any given time you only want to cache part of the server DB in the local app and that the data cached may change over time.
You probably want to use "stub" entities to represent related objects that you haven't actually downloaded yet. You would set up the entities like this:
Expense{
date:Date
cost:Number
account<<-->AccountStub.expenses
}
AccountStub{
id:Number
expenses<-->>Expenses.account
}
Account:AccountStub{
name:String
address:String
}
The AccountStub entity has the bare minimum info needed to identify the Account in the server DB based on info provided from the Expense table. It serves as a placeholder in the object graph for the full fledged Account object (you can think of it as a type of fault if you like.)
Since Expenses has the relationship with AccountStub and Account inherits from AccountStub you can swap out an Account for an AccountStub (and vice versa) as needed.
You will need to provide a custom subclass for AccountStub and Account such that AccountStub can trigger the downloading of account data and the creation of an Account object when that data is actually required. Then the new Account object should be swapped out for AccountStub in all its relationships (that may take rather a lot of code.)
To use, you would first obtain the data for an Expense object and create that object. You would attempt to fetch an AccountStub with the ID provided from the Expense table data. Set the fetch to include subentities. If an AccountStub or Account object exists with that ID, you add the Expense object to the relationship. If not, then you create an AccountStub object with that ID and add it to the relationship. Now you have a basic object graph showing the relationship of an Expense object to an AccountStub object. To access the account data of an Expense, you would first check if the related account is a stub or a full account. If it is a stub, then you need to load the full account data before proceeding.
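The flow just described can be sketched outside of Core Data, here in plain Python, to show the stub-to-account swap. This is not Core Data API: `AccountCache`, `stub_or_account`, `full_account` and the `fetch_account` callback (standing in for the web request) are hypothetical names for this illustration.

```python
from typing import Callable, Dict, List


class AccountStub:
    def __init__(self, account_id: int) -> None:
        self.id = account_id
        self.expenses: List["Expense"] = []


class Account(AccountStub):
    def __init__(self, account_id: int, name: str, address: str) -> None:
        super().__init__(account_id)
        self.name = name
        self.address = address


class Expense:
    def __init__(self, cost: float, account: AccountStub) -> None:
        self.cost = cost
        self.account = account
        account.expenses.append(self)


class AccountCache:
    def __init__(self, fetch_account: Callable[[int], Dict[str, str]]) -> None:
        self._fetch_account = fetch_account  # assumed: account id -> {"name", "address"}
        self._by_id: Dict[int, AccountStub] = {}

    def stub_or_account(self, account_id: int) -> AccountStub:
        # Reuse an existing stub or full account, else create a placeholder.
        if account_id not in self._by_id:
            self._by_id[account_id] = AccountStub(account_id)
        return self._by_id[account_id]

    def full_account(self, expense: Expense) -> Account:
        current = expense.account
        if isinstance(current, Account):
            return current  # already downloaded
        data = self._fetch_account(current.id)
        account = Account(current.id, data["name"], data["address"])
        # Swap the stub for the full account in all of its relationships.
        account.expenses = current.expenses
        for related in account.expenses:
            related.account = account
        self._by_id[account.id] = account
        return account
```

Creating expenses only needs `stub_or_account`, so no request is made until some code actually asks for `full_account`.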
The advantage of this system is that you can maintain a fairly complex object graph without having to actually have all the data locally. E.g. you can maintain several relationships and walk those relationships. E.g. you could expand your model like this:
AccountStub{
id:Number
expenses<-->>Expenses.account
owner<<--AccountOwnerStub.accounts
}
AccountOwnerStub{
id:Number
accounts<-->>AccountStub.owner
}
AccountOwner{
name:String
address:String
bill:Number
}
If you wanted to find the name of an Expense object's account owner, you would just walk the relationship across the stubs with account.owner.name; the Account object itself would remain just a stub.
If you need to conserve room locally, you can revert an object back to a stub without compromising the graph.
This would take some work and you would have to keep an eye on the stubs but it would let you mirror a complex external DB without having to keep all the data on hand.