How to handle multiple entity updates in the same transaction in Spring Data REST - spring-data-mongodb

Does anyone have an idea of how to handle multiple entity updates within the same transaction in Spring Data REST? The same thing can be handled within Spring controller methods using the @Transactional annotation. If I am correct, Spring Data REST executes every event within a separate transaction, so multiple entity updates cannot be handled in a proper way.
I am having issues updating two entities (ABC and PQR) within the same transaction and rolling back the ABC entity when the PQR update fails.
// ABC repository
@RepositoryRestResource
public interface ABCEntityRepository extends MongoRepository<ABC, String> {
}

// PQR repository
@RepositoryRestResource
public interface PQREntityRepository extends MongoRepository<PQR, String> {
}

// ABC repository event handler (must be registered as a Spring bean)
@Component
@RepositoryEventHandler
public class ABCEventHandler {

    private static final Logger log = LoggerFactory.getLogger(ABCEventHandler.class);

    @Autowired
    private PQREntityRepository pqrEntityRepository;

    @HandleBeforeSave
    public void handleABCBeforeSave(ABC abc) {
        log.debug("before saving ABC...");
    }

    @HandleAfterSave
    public void handleABCAfterSave(ABC abc) {
        // findById returns an Optional in Spring Data 2.x, so use findAllById to keep the list-based flow
        List<PQR> pqrList = pqrEntityRepository.findAllById(List.of(abc.getPqrId()));
        if (!pqrList.isEmpty()) {
            pqrList.forEach(pqr -> {
                // update PQR objects
            });
        }
        // expect to fail this transaction
        pqrEntityRepository.saveAll(pqrList);
    }
}
Since the @HandleAfterSave method is executed in a separate transaction, by the time it is called the ABC entity update has already been committed and therefore cannot be rolled back. Any suggestions on how to handle this?

Spring Data REST does not think in entities, it thinks in aggregates. Aggregate is a term from Domain-Driven Design that describes a group of entities to which certain business rules apply. Take, for example, an order alongside its line items and a business rule that defines a minimum order value that needs to be reached.
The responsibility to govern constraints aligns with another aspect of aggregates in DDD: strong consistency should/can only be assumed for changes on an aggregate itself. Changes to multiple (different) aggregates should be expected to be eventually consistent. If you transfer that into technology, it's advisable to apply the means of strong consistency – read: transactions – to single aggregates only.
So there is no short answer to your question. The repository structure you show here virtually turns both ABCEntity and PQREntity into aggregates (as repositories only exist for aggregate roots). That means, OOTB Spring Data REST does not support updating them in a single transactional HTTP call.
That said, Spring Data REST allows the declaration of custom resources that can take responsibility for doing that. Similarly to what is shown here, you can simply add resources on additional routes and implement exactly what you have in mind yourself.
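For example, here is a minimal sketch of such a custom resource; the route, class and method names are made up for illustration, and on MongoDB the @Transactional annotation only takes effect if a MongoTransactionManager is configured and the server runs as a replica set (see the PS below):
import org.springframework.data.rest.webmvc.RepositoryRestController;
import org.springframework.http.ResponseEntity;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;

@RepositoryRestController
public class AbcPqrController {

    private final ABCEntityRepository abcRepository;
    private final PQREntityRepository pqrRepository;

    public AbcPqrController(ABCEntityRepository abcRepository, PQREntityRepository pqrRepository) {
        this.abcRepository = abcRepository;
        this.pqrRepository = pqrRepository;
    }

    @Transactional
    @PutMapping("/abcs/update-with-pqr") // custom route, not one generated by Spring Data REST
    public ResponseEntity<Void> updateAbcAndPqr(@RequestBody ABC abc) {
        abcRepository.save(abc);
        PQR pqr = pqrRepository.findById(abc.getPqrId())
                .orElseThrow(() -> new IllegalStateException("PQR not found"));
        // ... apply the PQR changes here; if the save below throws, the ABC save is rolled back ...
        pqrRepository.save(pqr);
        return ResponseEntity.noContent().build();
    }
}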
Spring Data REST is not designed to produce a full HTTP API out of the box. It's designed to implement certain REST API patterns that are commonly found in HTTP APIs and that will very likely be part of your API. It's built to spare you from spending time on the straightforward cases, so that you only have to plug in custom code for scenarios like the one you described, assuming what you plan to do here is a good idea in the first place. Very often requests like these lead to the conclusion that the aggregate design needs a bit of rework.
PS: I saw you tagged that question with spring-data-mongodb. By default, Spring Data REST does not support MongoDB transactions because it doesn't need them. MongoDB document boundaries usually align with aggregate boundaries and updates to a single document are atomic within MongoDB anyway.
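If you do end up needing multi-document MongoDB transactions for such a custom resource, the transaction manager has to be registered explicitly (and MongoDB has to run as a replica set); a minimal configuration sketch:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;

@Configuration
public class MongoTransactionConfig {

    // Registering this bean is what makes @Transactional work against Spring Data MongoDB repositories.
    @Bean
    MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
        return new MongoTransactionManager(dbFactory);
    }
}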

I'm not sure I understood your question correctly, but I'll give it a try.
I'd suggest having a service with both repositories autowired in and a method annotated with @Transactional that updates everything you want.
This way, if the transaction fails anywhere inside the method, everything will be rolled back.
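A minimal sketch of that idea, with made-up class and method names; since the question is tagged spring-data-mongodb, note that @Transactional only takes effect here if a MongoTransactionManager bean is registered and the database runs as a replica set:
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AbcPqrService {

    private final ABCEntityRepository abcRepository;
    private final PQREntityRepository pqrRepository;

    public AbcPqrService(ABCEntityRepository abcRepository, PQREntityRepository pqrRepository) {
        this.abcRepository = abcRepository;
        this.pqrRepository = pqrRepository;
    }

    @Transactional
    public void updateBoth(ABC abc, PQR pqr) {
        abcRepository.save(abc);
        // If this save (or anything else in this method) throws, the ABC save is rolled back as well.
        pqrRepository.save(pqr);
    }
}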
If this does not answer your question, please clarify and I'll try to help.

Related

.NET Core Difference between TransactionScope and DistributedLock

Good morning everyone,
We are implementing a method that stores a document into SQL Server with EF Core.
This method, called Save, is called by multiple endpoints of the controller; it is used both from a createDocument endpoint and from a putDocument endpoint.
An extra requirement is that this method contains two calls that store the document and its properties into 2 different repositories, so we want to ensure that if one of the two repository calls fails, the changes are rolled back.
Moreover, the whole code is hosted on multiple machines, which is why we also implemented macallon's distributed-lock library (GitHub) to avoid the same document being accessed/modified by two different machines at the same time.
Is it correct to mix these solutions? Being a noob, I thought a TransactionScope was already a lock, in the sense that during the transaction other machines cannot even access the same row in the DB, but please help me understand. Thanks
public void Save(Document document)
{
    using var scope = new TransactionScope(TransactionScopeOption.RequiresNew,
        TransactionScopeAsyncFlowOption.Enabled);
    try
    {
        using (var lockHandler = documentLock.CreateLockForGuid(document.Guid))
        {
            repositoryOne.Save(document);
            repositoryTwo.Save(document);
            scope.Complete();
        }
    }
    catch
    {
        // exception handling omitted from the snippet; rethrow so the scope is not completed
        throw;
    }
}
lockHandler is just a wrapper which calls DistributedLock's SqlDistributedReaderWriterLock with document.Guid as its name.

Domain driven design database validation in model layer

I'm creating a design for a Twitter application to practice DDD. My domain model looks like this:
The user and tweet are marked blue to indicate that they are aggregate roots. Between the user and the tweet I want a bounded context; each will run in its respective microservice (auth and tweet).
To reference which user has created a tweet, but not run into a self-referencing loop, I have created the UserInfo object. The UserInfo object is created via events when a new user is created. It stores only the information the Tweet microservice will need of the user.
When I create a tweet I only provide the userid and relevant fields to the tweet, with that user id I want to be able to retrieve the UserInfo object, via id reference, to use it in the various child objects, such as Mentions and Poster.
The issue I run into is persistence. At first glance I thought "Just provide the UserInfo object in the tweet constructor and it's done, all the child aggregates have access to it". But it's a bit harder for the Mention class, since a Mention will contain a dynamic username like "@anyuser". To validate whether anyuser exists as a UserInfo object I need to query the database. However, I don't know who is mentioned before the tweet's content has been parsed, and that logic resides in the domain model itself and is called as a result of using the tweet's constructor. Without this logic, no mentions are extracted, so nothing can "yet" be validated.
If I cannot validate it before creating the tweet, because I need the extraction logic, and I cannot use the database repository inside the domain model layer, how can I validate the mentions properly?
Whenever an AR needs to reach out of its own boundary to gather data, there are two main solutions:
You pass in a service to the AR's method which allows it to perform the resolution. The service interface is defined in the domain, but most likely implemented in the infrastructure layer.
e.g. someAr.someMethod(args, someServiceImpl)
Note that if the data is required at construction time you may want to introduce a factory that takes a dependency on the service interface, performs the validation and returns an instance of the AR (a rough sketch follows at the end of this answer).
e.g.
tweetFactory = new TweetFactory(new SqlUserInfoLookupService(...));
tweet = tweetFactory.create(...);
You resolve the dependencies in the application layer first, then pass the required data. Note that the application layer could take a dependency on a domain service in order to perform some reverse resolutions first.
e.g.
If the application layer would like to resolve the UserInfo for all mentions, but can't because it doesn't know how to parse mentions within the text, it could always rely on a domain service or value object to perform that task first, then resolve the UserInfo dependencies and provide them to the Tweet AR. Be cautious here not to leak too much logic into the application layer, though. If the orchestration logic becomes intertwined with business logic, you may want to extract such use case processing logic into a domain service.
Finally, note that any data validated outside the boundary of an AR is always considered stale. The @xyz user could exist right now, but not exist anymore (e.g. deactivated) 1 ms after the tweet was sent.
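As a rough Java sketch of the first option: the factory validates the mentions through a domain-defined lookup service before constructing the aggregate. The Tweet aggregate and the Mentions parsing logic are assumed to exist in the domain model as described in the question; all names here are illustrative, not a prescribed API.
// Defined in the domain layer, implemented in the infrastructure layer (e.g. a SQL-backed lookup).
public interface UserInfoLookupService {
    boolean exists(String username);
}

public class TweetFactory {

    private final UserInfoLookupService userInfoLookup;

    public TweetFactory(UserInfoLookupService userInfoLookup) {
        this.userInfoLookup = userInfoLookup;
    }

    public Tweet create(String posterId, String content) {
        // Mention extraction stays in the domain model (e.g. a Mentions value object).
        for (String username : Mentions.extractFrom(content)) {
            if (!userInfoLookup.exists(username)) {
                throw new IllegalArgumentException("Mentioned user does not exist: @" + username);
            }
        }
        return new Tweet(posterId, content);
    }
}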

Domain Driven Design (DDD) and database generated reports

I'm still investigating DDD, but I'm curious to know about one potential pitfall.
According to DDD an aggregate-root shouldn't know about persistence, but doesn't that mean the entire aggregate-root ends up being instantiated in memory?
How could the aggregate-root, for instance, ask the database to group and sum a lot of data if it's not supposed to know about persistence?
According to DDD an aggregate-root shouldn't know about persistence, but doesn't that mean the entire aggregate-root ends up being instantiated in memory?
Oh no, it's worse than that; the entire aggregate (the root and all of its subordinate entities) gets loaded and instantiated in memory. Essentially by definition, you need all of that state loaded in order to validate any change.
How could the aggregate-root, for instance, ask the database to group and sum a lot of data if it's not supposed to know about persistence?
You don't need the aggregate-root to do that.
The primary role of the domain model is to ensure the integrity of the book of record by ensuring that all writes respect your business invariant. A read, like a database report, isn't going to change the book of record, so you don't need to load the domain model.
If the domain model itself needs the report, it typically defines a service provider interface that specifies the report that it needs, and your persistence component is responsible for figuring out how to implement that interface.
According to DDD an aggregate-root shouldn't know about persistence, but doesn't that mean the entire aggregate-root ends up being instantiated in memory?
Aggregate roots are consistency boundaries, so yes you would typically load the whole aggregate into memory in order to enforce invariants. If this sounds like a problem it is probably a hint that your aggregate is too big and possibly in need of refactoring.
How could the aggregate-root, for instance, ask the database to group and sum a lot of data if it's not supposed to know about persistence?
The aggregate wouldn't ask the database to group and sum data - typically you would load the aggregate in an application service / command handler. For example:
public class SomeUseCaseHandler : IHandle<SomeCommand>
{
    private readonly ISomeRepository _someRepository;

    public SomeUseCaseHandler(ISomeRepository someRepository)
    {
        _someRepository = someRepository;
    }

    public void When(SomeCommand command)
    {
        var someAggregate = _someRepository.Load(command.AggregateId);
        someAggregate.DoSomething();
        _someRepository.Save(someAggregate);
    }
}
So your aggregate remains ignorant of how it is persisted. However, your implementation of ISomeRepository is not ignorant, so it can do whatever is necessary to fully load the aggregate. You could therefore have your persistence implementation group/sum when loading the aggregate, but more often you would probably query a read model:
public class SomeUseCaseHandler : IHandle<SomeCommand>
{
    private readonly ISomeRepository _someRepository;
    private readonly ISomeReadModel _someReadModel;

    public SomeUseCaseHandler(ISomeRepository someRepository, ISomeReadModel readModel)
    {
        _someRepository = someRepository;
        _someReadModel = readModel;
    }

    public void When(SomeCommand command)
    {
        var someAggregate = _someRepository.Load(command.AggregateId);
        someAggregate.DoSomethingThatRequiresTheReadModel(_someReadModel);
        _someRepository.Save(someAggregate);
    }
}
You haven't actually said what your use case is though. :)
[Update]
Just noticed the title refers to database-generated reports - these will not go through your domain model at all; they would be served by a completely separate read model. CQRS applies here.

Saving an entity which has a cyclic reference to another entity in Spring Data Mongo fails

I have two entities with both JPA annotations and Spring Data Mongo annotations, and they reference each other, like Parent and Child:
@Entity
@Document
class Parent {

    private Set<Child> children;

    @OneToMany
    public Set<Child> getChildren() {
        return children;
    }
}

@Entity
class Child {

    private Parent parent;

    @ManyToOne
    public Parent getParent() {
        return parent;
    }
}
So apparently, these two entities reference each other. With JPA, they are OK. And with Spring Data Mongo 1.8.4, querying is also OK; there is just an INFO-level message saying that a cyclic reference has been detected.
But when I try to save data, Spring Data Mongo fails. The console prints the same exceptions over and over until they finally end in a StackOverflowError.
So is this an issue that needs to be fixed? When querying, Spring Data Mongo can protect against the cyclic references, but the save action cannot.
The INFO level message concerning the cycle provides a hint that there might be cyclic references that cannot be handled based on type information. Since it depends on the actual data used, this just points out a potential problem that might occur during mapping.
Please refer to the documentation on using references for information on splitting your data into multiple collections and referencing them, or register a custom converter for your types that knows how to deal with them.
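A minimal sketch of the referencing approach with Spring Data MongoDB's @DBRef; note that @DBRef'ed objects are not saved cascadingly, so Parent and Child have to be saved through their own repositories, which is exactly what avoids the endless recursion on save:
import java.util.Set;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.DBRef;
import org.springframework.data.mongodb.core.mapping.Document;

@Document
class Parent {

    @Id
    private String id;

    @DBRef // children live in their own collection and are stored as references, not embedded
    private Set<Child> children;
}

@Document
class Child {

    @Id
    private String id;

    @DBRef(lazy = true) // back-reference is resolved lazily when accessed
    private Parent parent;
}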

Self Tracking Entities Traffic Optimization

I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to do some tests and see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object - let's say a Category - I send to the server the whole object graph, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML file on a really small database (2 main categories, about 20 categories in total and about 60 items). I can't imagine what will happen if I have a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object with more service calls:
return context.Categories.ToList();//only the categories
...
return context.Items.ToList();//only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items will be separated, and when making changes or deleting some objects, less data will be sent over the wire.
Has any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. The first thing, as you already mentioned, is to keep the entities as small as possible (as dictated by the desired client functionality). And second, when sending entities back over the wire to be persisted: strip all navigation properties (nested objects) when they haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and...) and remove them when their ChangeTracking state is "Unchanged". But be careful with this, because in some cases you still need these entities, namely when they have been removed from or added to trackable collections of their parent entity (so then you shouldn't remove them).
This, what we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist kind of approach, the use of STEs saves a lot of code for queries against the database. We are not in need of optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object along with its object graph; the latter returns the estimated size of the optimized entity object graph with only objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
