Microservices architecture database

I have been studying microservice architecture for a while, but I still have a few questions.
As an example, take these services:
order-service
customer-service
product-service
Suppose there are the 3 microservices above. They use relational databases.
I list orders in the order-service, but I also have to pull customer information there.
If this were a monolithic structure, I could handle it with a join. But how can I do that in a microservice architecture?
Note: I'm not working on a project. My goal is only to understand the microservice architecture.

Options:
Limit the dependency between order-service and customer-service: normally the order is a self-contained object that holds all the customer data (as of the time of ordering).
If more is still needed, the order should store the id of the customer, and any UI or logic that wants to access current customer data should use the "public API" of the customer service. The "public API" can in general be anything; it can even be a defined shared storage (like a database). However, most teams decide not to allow direct access to the technical storage, to avoid tight coupling. That's why services usually talk REST (or gRPC) for synchronous use cases and use some form of messaging for asynchronous interactions.
However, first decide why you want to split the system up: are you expecting a growing developer base and high complexity? If not, a monolith might be cheaper to build for your case.

But how can I do that in a microservice architecture?
Just by calling another microservice and asking for required additional information.
Normally microservices do not share a database, as you noticed.
So if you have a class Order like this
class Order
{
    int OrderId;
    string ItemName;
    string UserName;
}
And a method GetOrder(id) that returns an order, like this
Order GetOrder(int orderId)
{
    // Each of these calls goes to another microservice.
    var item = ItemMicroservice.GetItem();
    var user = UserMicroservice.GetUser();
    var result = new Order()
    {
        OrderId = orderId,
        ItemName = item.Name,
        UserName = user.Name
    };
    return result;
}
You can see that there are two calls to other microservices, which return the data needed to construct the Order object.
Though you can also see that this can be somewhat suboptimal in terms of performance. So sometimes microservices store duplicate information to be able to construct objects faster (eliminating calls to other microservices). For example, if the Users microservice updates its data, it sends an event to the Orders microservice so that it can update its cached copy of that data.
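The event-based duplication described above could be sketched roughly as follows. This is only an illustration in Python: the UserUpdated event, the OrderService class and its method names are made up for the example, and the messaging layer that would deliver the event is left out.

```python
from dataclasses import dataclass


@dataclass
class UserUpdated:
    """Hypothetical event published by the user service when a user changes."""
    user_id: int
    name: str


class OrderService:
    """Keeps a local copy of the user names it needs to build orders."""

    def __init__(self):
        self.orders = {}       # order_id -> user_id
        self.user_names = {}   # user_id -> name (duplicated from the user service)

    def on_user_updated(self, event: UserUpdated) -> None:
        # Would be invoked by the messaging layer; refreshes the local copy.
        self.user_names[event.user_id] = event.name

    def get_order(self, order_id: int) -> dict:
        user_id = self.orders[order_id]
        # No remote call here: the name comes from the local copy.
        return {"order_id": order_id, "user_name": self.user_names.get(user_id)}


service = OrderService()
service.orders[1] = 42
service.on_user_updated(UserUpdated(user_id=42, name="Alice"))
order_before = service.get_order(1)
service.on_user_updated(UserUpdated(user_id=42, name="Alice Smith"))
order_after = service.get_order(1)
```

The trade-off is that the copy is only eventually consistent: between the change in the user service and the arrival of the event, the order service still returns the old name.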

Related

How to model related objects without tight database coupling

I'm designing entities where one is related to another, while keeping application and database separate and with performance in mind.
I've read about many architectural concepts (SOLID, separation of concerns, etc.).
ORM frameworks solve this internally and lazy load the related data when it is accessed.
But is there a practical way without tightly coupling my objects to the ORM and keeping database logic out of them?
Separating the data for a relational database is simple.
For example:
Orders-Table, Customers-Table, Addresses-Table, Countries-Table
Order: id, date, customerId, ...
Customer: id, company, email, defaultAddressId, ...
Address: id, street, countryId, ...
Country: id, name, code, ...
I want to keep the database-related functions separate, so I would create separate repositories, which fetch the data from the database.
For example:
$orderRepository->getById(123);
$customerRepository->getById(234);
$addressRepository->getById(345);
$countryRepository->getById(456);
Sometimes I only need the order data. Sometimes I need the related Customer, and sometimes I need to know in which country the customer of the current order lives.
If I'm only reading a single order, that is no problem, as I can fetch the needed data into separate variables:
$customer = $customerRepository->getById($order->getCustomerId());
$defaultAddress = $addressRepository->getById($customer->getDefaultAddressId());
$country = $countryRepository->getById($defaultAddress->getCountryId());
But if I want to list many orders on one page (or any other use case with many related objects in one view) and display for each one the name of the country it comes from, this gets complicated.
Ideally I would write in the view:
foreach ($orders as $order) {
...
$order->getCustomer()->getDefaultAddress()->getCountry()->getCode();
...
}
From my current knowledge there are three possible solutions:
1) Lazy loading
The call to $order->getCustomer() calls (maybe a singleton of) the customer repository to fetch the customer object. Then the address repository is called to fetch the address, then the country repository.
Disadvantage:
Many single database calls, and each object must know about the repositories it needs.
2) Fetching all related data when the orders are fetched
The repositories call the other repositories to fetch all the data their objects need:
OrderRepository:
function getCurrentOrders() {
    ...code to fetch order data from database...
    $relatedCustomers = $this->customerRepository->getByMultipleIds($relatedCustomerIds);
    ...assigning fetched customer objects to order objects
}
The call to the customer repository leads to a call to the address repository, which leads to a call to the country repository. This reduces the number of database calls at first.
Disadvantage:
Data is loaded that is not needed most of the time. It is fine for the list view, but when I only need the order's own data or only a single order, there are still 3 other calls to the database (in larger object trees maybe many more).
3) Tailor-made objects for each required view
Either a customized database query that builds a new object with all the needed data, or some wrapper objects that hold the related objects.
Disadvantage:
This could get really complicated when business logic is also needed, as I have to implement the same logic in several places.
How do you keep business logic separate from database code and design entities?
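One possible middle ground between the options listed above is to keep the entities free of repository knowledge and let the caller request related data in one batch, only when a view needs it. The following is a minimal sketch in Python rather than PHP; CustomerRepository, get_by_ids and attach_customers are hypothetical names, and the repository is an in-memory stand-in for a real database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: int
    company: str


@dataclass
class Order:
    id: int
    customer_id: int
    customer: Optional[Customer] = None   # filled in only when requested


class CustomerRepository:
    """In-memory stand-in for a database-backed repository."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def get_by_ids(self, ids):
        # Stands for one query covering all ids, e.g. WHERE id IN (...)
        self.query_count += 1
        return {i: self._rows[i] for i in ids if i in self._rows}


def attach_customers(orders, customer_repo):
    """Load the customers of all given orders in a single batch."""
    ids = {o.customer_id for o in orders}
    customers = customer_repo.get_by_ids(ids)
    for o in orders:
        o.customer = customers.get(o.customer_id)
    return orders


repo = CustomerRepository({1: Customer(1, "Acme"), 2: Customer(2, "Globex")})
orders = attach_customers([Order(10, 1), Order(11, 2), Order(12, 1)], repo)
```

The list view would call attach_customers (and similar helpers for addresses and countries), while code that needs only the order data skips the call, so the cost is one query per relation level instead of one per object.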

DDD, Databases and Lists of Data

I'm at the beginning of my first "real" software project, and I'd like to start off right. The concept of DDD seems like a very clean approach which separates the various software parts, however I'm having trouble implementing this in reality.
My Software is a measurement tracker and essentially stores lists of measurement data, each consisting of a timestamp and the data value.
My Domain Models
class MeasurementDM{
    string Name{get;set;}
    List<MeasurementPointDM> MeasurementPoints{get;set;}
}
class MeasurementPointDM{
    DateTime Time{get;set;}
    double Value{get;set;}
}
My Persistence Models:
class MeasurementPM{
    string Id{get;set;} //Primary key
    string Name{get;set;} //Data from DomainModel to store
}
class MeasurementPointPM{
    string Id{get;set;} //Primary Key
    string MeasurementId{get;set;} //Key of Parent measurement
}
I now have the following issues:
1) Because I want to keep my Domain Models pure, I don't want or need the Database Keys inside those classes. This is no problem when building my Domain models from the Database, but I don't understand how to store them, as the Domain Model no longer knows the Database Id. Should I be including this in the Domain model anyway? Should I create a Dictionary mapping Domain objects to Database ids when I retrieve them from the Database?
2) The measurement points essentially have the same Id problem as the measurements themselves. Additionally, I'm not sure what the right way is to store the MeasurementPoints themselves. Above, each MeasurementPointPM knows which MeasurementPM it belongs to. When I query, I simply select MeasurementPoints based on their Measurement key. Is this a valid way to store such data? It seems like this will explode as more and more measurements are added. Would I be better off serializing my list of MeasurementPoints to a string and storing the whole list as an nvarchar? This would make adding and removing data points more difficult, as I'd always need to deserialize and reserialize the whole list.
I'm having difficulty finding a good example of DDD that handles these problems, and hopefully someone out there can help me out.
My Software is a measurement tracker and essentially stores lists of measurement data, each consisting of a timestamp and the data value.
You may want to have a careful think about whether you are describing a service or a database. If your primary use case is storing information that comes from somewhere else, then introducing a domain model into the mix may not make your life any better.
Domain models tend to be interesting when new information interacts with old information. So if all you have are data structures, it's going to be hard to discover a good model (because the critical element -- how the model entities change over time -- is missing).
That said....
I don't understand how to store them, as the Domain Model no longer knows the Database Id.
This isn't your fault. The literature sucks.
The most common answer is that people are allowing their models to be polluted with O/RM concerns. For instance, if you look at the Cargo entity from the Citerus sample application, you'll find these lines hidden at the bottom:
Cargo() {
// Needed by Hibernate
}
// Auto-generated surrogate key
private Long id;
This is an indirect consequence of the fact that the "repository" pattern provides the illusion of an in-memory collection of objects that maintain their own state, when the reality under the covers is that you are copying values between memory and durable storage.
Which is to say, if you want a clean domain model, then you are going to need a separate in-memory representation for your stored data, and functions to translate back and forth between the two.
Put another way, what you are running into is a violation of the Single Responsibility Principle -- if you are using the same types to model your domain that you use to manage your persistence, the result is going to be a mix of the two concerns.
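As a rough illustration of keeping the two representations apart, here is a minimal sketch in Python rather than C#. The class names are shortened from the question, the time and value columns on the point row are an assumption (the question's MeasurementPointPM does not show them), and to_rows / to_domain are hypothetical translation functions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List
import uuid


@dataclass
class MeasurementPoint:          # domain model: no database keys
    time: datetime
    value: float


@dataclass
class Measurement:               # domain model
    name: str
    points: List[MeasurementPoint] = field(default_factory=list)


@dataclass
class MeasurementRow:            # persistence model
    id: str
    name: str


@dataclass
class MeasurementPointRow:       # persistence model
    id: str
    measurement_id: str          # key of the parent measurement
    time: str                    # ISO 8601 text (assumed column)
    value: float                 # assumed column


def to_rows(measurement_id: str, m: Measurement):
    """Translate a domain object into rows; ids exist only on this side."""
    row = MeasurementRow(id=measurement_id, name=m.name)
    point_rows = [
        MeasurementPointRow(str(uuid.uuid4()), measurement_id, p.time.isoformat(), p.value)
        for p in m.points
    ]
    return row, point_rows


def to_domain(row: MeasurementRow, point_rows) -> Measurement:
    """Translate rows back into a domain object, dropping the keys."""
    points = [MeasurementPoint(datetime.fromisoformat(r.time), r.value) for r in point_rows]
    return Measurement(name=row.name, points=points)


original = Measurement("temperature", [MeasurementPoint(datetime(2020, 1, 1, 12, 0), 21.5)])
row, point_rows = to_rows("m-1", original)
restored = to_domain(row, point_rows)
```

In this sketch the repository, not the domain object, has to remember which id belongs to which loaded object (here it is simply passed in as measurement_id); that bookkeeping is the price of keeping the keys out of the domain model.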
So essentially you would say that some minimal pollution of the domain model, for example an Id, is standard practice.
I would put it less strongly: it is a common practice. Fundamentally, a lot of people, particularly in the early stages of a project, don't value having a boundary between their domain model and their persistence plumbing.
Could it make sense to have every Domain Model inherit from a base class or implement an interface that forces the creation of Unique Id?
It could. There are a lot of examples on the web where domain entities extend some generic Entity or Aggregate pattern.
The really interesting questions are
What are the immediate costs and benefits of doing that?
What are the deferred costs and benefits of doing that?
In particular, does that make things easier or harder to change?

API design for non-CRUD queries

We are considering creating an API app that our internal apps and customers can use. This seems simple for CRUD API queries, but we are having problems with more complex queries.
For example, suppose an internal app wants to know "for each company, count the number of users and the number of dashboards, and return the results". How would that be exposed as part of the API?
I would imagine performance becomes an issue if we issue one query for the list of all companies, then fire two queries per company to count users and dashboards.
Also, how do we deal with cases where we are currently hardcoding SQL for optimization?
Any recommended readings are also appreciated.
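To make the performance concern concrete, the per-company counts can be computed in a single query and exposed as one dedicated read-only operation instead of being assembled from CRUD calls. The sketch below uses Python's sqlite3 module with an in-memory database; the table layout, the sample rows and the company_stats function are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE dashboards (id INTEGER PRIMARY KEY, company_id INTEGER);
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO users VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO dashboards VALUES (1, 1);
""")


def company_stats(conn):
    """Return one entry per company with both counts, using a single query."""
    rows = conn.execute("""
        SELECT c.id, c.name,
               (SELECT COUNT(*) FROM users u WHERE u.company_id = c.id),
               (SELECT COUNT(*) FROM dashboards d WHERE d.company_id = c.id)
        FROM companies c
        ORDER BY c.id
    """).fetchall()
    return [
        {"company_id": r[0], "name": r[1], "user_count": r[2], "dashboard_count": r[3]}
        for r in rows
    ]


stats = company_stats(conn)
```

An API handler for a report-style resource could return this list directly, which keeps the optimized SQL on the server behind a stable response shape.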
I generate CRUD stored procedures for all database tables, including selects by all foreign keys, and then generate domain objects for all database tables. To me, object mapping feels more scalable to work with.
If there is a particular view of the data that will cause problems or does not fit the model, I would extend an existing domain object with the extended data. For example:
public class Customer
{
    public int CustomerID{get;set;}
}
public class CustomerListingView:Customer
{
    public int NumberOfOrders{get;set;}
    public int NumberOfSomethingElse{get;set;}
}
This will at least allow you to cast your custom view data to controller functions requiring the super type.

Normalized Tables in MVC with Concurrency Checking (ACID)

Two part question:
Number 1: What is the best approach in creating a model for an object that references another object, when some of the properties/attributes of the referenced object are not always necessary?
Imagine if you have two objects: PERSON and BUSINESS
Person
+ PersonID
+ Name
+ Age
+ Sex
+ Skill
+ Business *
Business
+ BusinessID
+ Name
+ Address
+ CorporateVision (this is large)
In the example above: A PERSON has a reference to a BUSINESS as their current employer.
In the database, I would have two tables, one for each object. In code, using the MVC architecture pattern, I would have two classes, one for each object. The database would have a foreign-key relationship between BUSINESS-->PERSON, while in code the PERSON object would have a member variable that holds a reference to a BUSINESS object.
Now let's say I want to enumerate a collection of PERSONS and find out the total number of those that work for a specific company (based on BUSINESS.Name).
Without using MVC, I could just create a function that would query the database and get a count. Simple and efficient.
WITH MVC, I need to instantiate every PERSON object, which in turn instantiates a BUSINESS object for the reference (if one was not already created for it... the BusinessFactory would check a collection first). Furthermore, it MUST pull in BUSINESS.CorporateVision from the database for every object. And because most of these businesses are Media Marketing Companies, most of their corporate visions are large text blobs. So it is very unnecessary to read CorporateVision from the database when all we need is the name of the business.
I could solve this problem by changing the PERSON object in code to:
Person
+ PersonID
+ Name
+ Age
+ Sex
+ Skill
+ BusinessID
+ BusinessName
So now when I create my PERSON object, I do a JOIN with BUSINESS and cache the name. Now I can get the BusinessName quickly and efficiently... and I still can get the full BUSINESS object as needed by doing a lookup on the ID. But I just denormalized the model... and I just introduced a new problem... and a new question.
Number 2: How does MVC handle concurrency with a multi-user database?
Let's say while my client application is enumerating (using the enumeration mentioned above that finds all people who work for a particular business), another user merges two of the BUSINESS objects.
Now my in-memory collection is wrong because all of the cached BusinessName values are stale. The same would be true if I had just left PERSON.Business as a BUSINESS object reference: the BUSINESS object would be stale.
In summary: I feel that with MVC I lose data retrieval efficiency as well as the ACIDness of my application. Or am I using MVC wrong?
You seem to mix UI and data access, while you should minimize their dependencies on one another. MVC is actually a pretty broad pattern that describes how an application interacts with the user. Both your questions are related to data access.
1) MVC is the way you organize the UI. So the model is the piece of information you want the user to interact with. Note that business objects are not the priority here. If there is a case where the user loads a Person along with several properties from Business, so be it: your second Person rendition is a perfect model for this case. And so on: each use case requires a different model, and you should create different models for different scenarios.
If you think it's easier for you to call a function to calculate the number, fine. Remember, you are not bound to business objects here.
With a more object-oriented approach we usually solve this reference problem in one of two ways:
The first is lazy loading, which is an out-of-the-box feature of modern O/RMs. You load a person, and on the first access to Person.Business the latter is loaded automatically.
The second is to create a special kind of UI that is aware of your data access specifics and either has only the fields you use or requests additional data asynchronously from the client.
2) Again, MVC doesn't handle concurrency, and it shouldn't. It's a concern of the data access layer. There are several ways to deal with concurrency; the main ones are optimistic and pessimistic locking. (With the first you allow different users to make conflicting changes and try to resolve conflicts when they occur. The second prevents conflicts by locking updates completely.) Again, O/RMs usually deal with this; or you can use your own implementation, but it should still live in data access, not in the MVC part.
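Optimistic locking is often implemented with a version column that every update checks and increments. The following is a minimal sketch using Python's sqlite3 module; the business table, its version column and the rename_business function are assumptions for the example, not part of any particular O/RM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO business VALUES (1, 'Acme', 1)")


def rename_business(conn, business_id, new_name, expected_version):
    """Update the row only if nobody changed it since it was read."""
    cur = conn.execute(
        "UPDATE business SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, business_id, expected_version),
    )
    return cur.rowcount == 1   # False means a concurrent change got there first


# Two users both read version 1, then both try to write.
first = rename_business(conn, 1, "Acme Media", expected_version=1)
second = rename_business(conn, 1, "Acme Corp", expected_version=1)
final_name = conn.execute("SELECT name FROM business WHERE id = 1").fetchone()[0]
```

The second writer is rejected because the version it read is no longer current; the data access layer can then reload the row and let the user retry or merge, without the UI layer knowing how the check works.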

What is the best strategy for mirroring a remote DB in Core Data?

Let's say that I have two tables in a DB: Expenses and Account. Expenses is the data that I'm interested in, and that table has a foreign key to Account. This DB is remote, accessed via RESTful-esque commands, and I want to mirror just the data I need for my app in a Core Data data store on the iPhone. The actual DB I'm working with is much bigger than this example: ~30 tables, and the Expenses table has ~7 FKs. I'm working closely with the person doing the API design, so I can modify the way I make my requests or the data returned, if necessary.
What is the best strategy for loading this data into Core Data?
My first thought was to have the request for the expense bring back the ids for the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account_id>123</account_id>
</expense>
This works fine if I already have an account with id '123' in my data store. If I don't, then I've got to make additional web requests every time I encounter an id I don't have… which is going to be incredibly slow. I can get around this by making requests in a specific order, i.e. request all new accounts before requesting expenses, so that I know all the FK rows exist. I feel this would become much too cumbersome once the DB starts reaching moderate complexity.
My second thought was to have the data returned from the request follow FKs and return data from the FK.
<expense>
<date>1/1/2011</date>
<cost>1.50</cost>
<account>
<id>123</id>
<name>Bob's Big Boy</name>
<address>1234 Main Street</address>
</account>
</expense>
This looks better and guarantees that I'll have all the data I need when I need it. If I don't already have an account '123' I can create a new account object from that XML. My concern with this method, though, is that as the database grows in complexity, these XML files could become excessively large. The Expenses table has ~7 foreign keys, each of those tables has multiple FKs. It feels like a simple request for just a single Expense could end up returning a huge chunk of data.
How have other people solved this issue?
I am assuming that at any given time you only want to cache part of the server DB in the local app and that the cached data may change over time.
You probably want to use "stub" entities to represent related objects that you haven't actually downloaded yet. You would set up the entities like this:
Expense{
    date:Date
    cost:Number
    account<<-->AccountStub.expenses
}
AccountStub{
    id:Number
    expenses<-->>Expense.account
}
Account:AccountStub{
    name:String
    address:String
}
The AccountStub entity has the bare minimum info needed to identify the Account in the server DB based on info provided from the Expense table. It serves as a placeholder in the object graph for the full fledged Account object (you can think of it as a type of fault if you like.)
Since Expense has the relationship with AccountStub, and Account inherits from AccountStub, you can swap out an Account for an AccountStub (and vice versa) as needed.
You will need to provide a custom subclass for AccountStub and Account such that AccountStub can trigger the download of the account data and the creation of an Account object when that data is actually required. Then the new Account object should be swapped in for the AccountStub in all its relationships (that may take rather a lot of code).
To use it, you would first obtain the data for an Expense object and create that object. You would then attempt to fetch an AccountStub with the ID provided in the Expense table data. Set the fetch to include subentities. If an AccountStub or Account object exists with that ID, you add the Expense object to the relationship. If not, then you create an AccountStub object with that ID and add it to the relationship. Now you have a basic object graph showing the relationship of an Expense object to an AccountStub object. To access the account data of an Expense, you would first check whether the related account is a stub or a full account. If it is a stub, then you need to load the full account data before proceeding.
The advantage of this system is that you can maintain a fairly complex object graph without actually having all the data locally. E.g. you can maintain several relationships and walk those relationships. You could expand your model like this:
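The find-or-create and stub-resolution steps above can be sketched independently of Core Data. The following is plain Python, not Core Data code: the Store class, its dictionary keyed by id and the fetch_account callback (standing in for the web request) are all assumptions for illustration.

```python
class AccountStub:
    """Placeholder that knows only the server-side id."""

    def __init__(self, account_id):
        self.id = account_id
        self.is_stub = True


class Account(AccountStub):
    """Fully loaded account."""

    def __init__(self, account_id, name, address):
        super().__init__(account_id)
        self.name = name
        self.address = address
        self.is_stub = False


class Store:
    def __init__(self, fetch_account):
        self._fetch_account = fetch_account   # hypothetical remote call
        self.accounts = {}                    # id -> AccountStub or Account

    def account_for_expense(self, account_id):
        # Reuse whatever exists for this id (stub or full); otherwise create a stub.
        return self.accounts.setdefault(account_id, AccountStub(account_id))

    def resolve(self, account_id):
        # Swap the stub for a full Account the first time its data is needed.
        current = self.accounts[account_id]
        if current.is_stub:
            data = self._fetch_account(account_id)
            current = Account(account_id, data["name"], data["address"])
            self.accounts[account_id] = current
        return current


calls = []


def fake_fetch(account_id):
    calls.append(account_id)
    return {"name": "Bob's Big Boy", "address": "1234 Main Street"}


store = Store(fake_fetch)
stub = store.account_for_expense(123)
was_stub = stub.is_stub
account = store.resolve(123)
account_again = store.resolve(123)
```

Here expenses would refer to accounts by id through the store, so swapping a stub for a full object is a single dictionary update; in an object graph with direct references, each relationship pointing at the stub has to be re-pointed instead.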
AccountStub{
    id:Number
    expenses<-->>Expense.account
    owner<<-->AccountOwnerStub.accounts
}
AccountOwnerStub{
    id:Number
    accounts<-->>AccountStub.owner
}
AccountOwner{
    name:String
    address:String
    bill:Number
}
If you wanted to find the name of an Expense object's account owner, you would just walk the relationship across the stubs with account.owner.name; the Account object itself would remain just a stub.
If you need to conserve room locally, you can revert an object back to a stub without compromising the graph.
This would take some work and you would have to keep an eye on the stubs but it would let you mirror a complex external DB without having to keep all the data on hand.
