For example, if I have a microservice with this API:
service User {
  rpc GetUser(GetUserRequest) returns (GetUserResponse) {}
}

message GetUserRequest {
  int32 user_id = 1;
}

message GetUserResponse {
  int32 user_id = 1;
  string first_name = 2;
  string last_name = 3;
}
I figured that for other services that require users, I'm going to have to store this user_id in all rows that have data associated with that user ID. For example, if I have a separate Posts service, I would store the user_id information for every post author. And then whenever I want that user information to return data in an endpoint, I would need to make a network call to the User service.
Would I always want to do that? Or are there certain times that I want to just copy over information from the User service into my current service (excluding saving into in-memory databases like Redis)?
Copying complete data is generally never required. Most of the time, for reasons of scale or to make microservices more independent, people copy only the information that is more or less static in nature.
For example, in a Posts service I might copy basic author information, such as the name, because when somebody asks the Posts service for a list of posts based on some filter, I do not want to call out for the author's name for each post.
Also, a side effect of copying data is having to maintain its consistency. So make sure your business really demands it.
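The "copy only the static bits" approach above could be sketched like this (class and event names are illustrative, not from the question): the Posts service keeps a local copy of each author's display name and refreshes it when the User service publishes a name-changed event.

```java
import java.util.HashMap;
import java.util.Map;

// Local, denormalized copy of author data kept inside the Posts service.
final class AuthorCache {
    private final Map<Integer, String> namesByUserId = new HashMap<>();

    // Called when a UserCreated/UserRenamed event arrives from the User service.
    void onUserNameChanged(int userId, String newName) {
        namesByUserId.put(userId, newName);
    }

    // Used when listing posts, so no call to the User service is needed per post.
    String authorName(int userId) {
        return namesByUserId.getOrDefault(userId, "(unknown)");
    }
}
```

Keeping this copy fresh is exactly the consistency cost mentioned above, so it only pays off for data that changes rarely.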
You'll definitely want to avoid sharing database schema/tables. See this blog for an explanation. Use a purpose built interface for dependency between the services.
Any decision to "copy" data into your other service should be made by the service's team, but they had better have a really good reason for it to make sense. Most designs won't require duplicated data, because the service boundary should be domain-specific and non-overlapping. In the case of user ids, they can often be treated as contextual references without any attached logic about users.
One pattern I have observed: if you have auth-protected endpoints, you will need to make a call to your auth service anyway - for security - and that same call should allow you to acquire whatever user id information is necessary.
All the regular best practices for API dependencies apply, e.g. regarding stability, versioning, deprecating etc.
I follow clean arch / SOLID principles on my entire stack. I'm coming across a situation where I want to embed a UUID in some of my entity id fields in the domain logic, for example:
Create OrganizationEntity id=abc123
Create an ItemEntity and embed the id of the OrganizationEntity that owns it in the id field when it's created, i.e.: Item.id = itm-abc123-sdfnj344
I'm thinking of going this route so that I can reduce the number of DB lookups needed to see if someone has access to an ItemEntity - if the client request belongs to OrganizationEntity, then I can pattern-match abc123 on both the client request session id and the requested ItemEntity record... this would greatly improve performance.
Is this a known pattern/implementation? Are there any concerns or gotchas?
Try to keep your domain model as close to the language of the domain experts as you can. So if an Item belongs to an organization, it is OK to have a reference id in the Item. But if an item belongs to another domain object, and that object belongs to an organization, you should not reference the organization in the item domain object for performance (persistence) reasons.
You said that you want to check if someone has access to the ItemEntity. This means that there is a kind of context in which ItemEntity objects are accessible.
I see 3 options to implement such a context:
a repository API that takes an organization id argument
public interface ItemRepository {
    public List<ItemEntity> findItems(...., UUID organizationId);
}
When you pass the organization id on every repository call, the repository is stateless. But it also means that you must pass the organization id from the controller to the use case and then to the repository.
a repository that is bound to an organization
public class ItemRepository {
    private UUID organizationId; // constructor omitted here
    public List<ItemEntity> findItems(...) {}
}
When you create a repository that is bound to an organization, you must create it when you need it (and also the use case), because it is stateful. But you can be sure that no one can get items that they are not allowed to see.
organization id in a call context
When the controller is invoked, it takes the organization id from the session, puts it in the call context and calls the use case. In Java you would use a ThreadLocal. You can also implement this as an aspect and apply it to every controller (AOP). The repository implementation can then access the call context, get the organization id and use it in its queries, or filter the items before returning them.
This option will allow you to access the organization id in every layer that is in the flow of control, e.g. in all use cases, entities, repositories or when you call an external service.
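A minimal sketch of such a call context (the class and method names are illustrative, not from the question):

```java
// Holds the organization id for the current request thread.
final class CallContext {
    private static final ThreadLocal<String> ORGANIZATION_ID = new ThreadLocal<>();

    static void setOrganizationId(String id) { ORGANIZATION_ID.set(id); }
    static String organizationId() { return ORGANIZATION_ID.get(); }
    static void clear() { ORGANIZATION_ID.remove(); } // avoid leaks on pooled threads
}

// The controller (or an aspect around it) would populate the context before
// invoking the use case:
//   CallContext.setOrganizationId(session.getOrganizationId());
// ...and the repository implementation reads it when building its query:
//   String orgId = CallContext.organizationId();
```

Remember to clear the context at the end of each request, since servlet containers reuse threads.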
In all three cases you can avoid putting the organization id in the item just for database access reasons.
I'm creating a design for a Twitter application to practice DDD. My domain model looks like this:
The user and tweet are marked blue to indicate that they are aggregate roots. Between the user and the tweet I want a bounded context; each will run in its respective microservice (auth and tweet).
To reference which user has created a tweet, but not run into a self-referencing loop, I have created the UserInfo object. The UserInfo object is created via events when a new user is created. It stores only the information the Tweet microservice will need of the user.
When I create a tweet I only provide the userid and relevant fields to the tweet, with that user id I want to be able to retrieve the UserInfo object, via id reference, to use it in the various child objects, such as Mentions and Poster.
The issue I run into is persistence. At first glance I thought "Just provide the UserInfo object in the tweet constructor and it's done; all the child aggregates have access to it". But it's a bit harder with the Mention class, since a Mention contains a dynamic username like "@anyuser". To validate whether anyuser exists as a UserInfo object, I need to query the database. However, I don't know who is mentioned before the tweet's content has been parsed, and that logic resides in the domain model itself and is called as a result of using the tweet's constructor. Without this logic, no mentions are extracted, so nothing can "yet" be validated.
If I cannot validate it before creating the tweet, because I need the extraction logic, and I cannot use the database repository inside the domain model layer, how can I validate the mentions properly?
Whenever an AR needs to reach outside of its own boundary to gather data, there are two main solutions:
You pass in a service to the AR's method which allows it to perform the resolution. The service interface is defined in the domain, but most likely implemented in the infrastructure layer.
e.g. someAr.someMethod(args, someServiceImpl)
Note that if the data is required at construction time you may want to introduce a factory that takes a dependency on the service interface, performs the validation and returns an instance of the AR.
e.g.
tweetFactory = new TweetFactory(new SqlUserInfoLookupService(...));
tweet = tweetFactory.create(...);
You resolve the dependencies in the application layer first, then pass the required data. Note that the application layer could take a dependency onto a domain service in order to perform some reverse resolutions first.
e.g.
If the application layer would like to resolve the UserInfo for all mentions, but can't because it doesn't know how to parse mentions within the text it could always rely on a domain service or value object to perform that task first, then resolve the UserInfo dependencies and provide them to the Tweet AR. Be cautious here not to leak too much logic in the application layer though. If the orchestration logic becomes intertwined with business logic you may want to extract such use case processing logic in a domain service.
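Such a mention-parsing domain service could be sketched like this (the class name and regex are illustrative assumptions, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Domain service that extracts mentioned usernames from tweet text, so the
// application layer can resolve the UserInfo objects before constructing
// the Tweet AR.
final class MentionParser {
    private static final Pattern MENTION = Pattern.compile("@(\\w+)");

    List<String> extractMentions(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

The application layer would call extractMentions, look up the corresponding UserInfo records, and then pass them into the Tweet constructor.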
Finally, note that any data validated outside the boundary of an AR is always considered stale. The @xyz user could exist now, but no longer exist (e.g. be deactivated) 1 ms after the tweet was sent.
Long story short: I want to minimize my database look-ups for things like user_id of an already logged in user but I don't know what a good way of doing this would look like.
I am using Spring Security in order to check if a logged in user is authenticated. However, after authentication, as some actual requests come in, I would like to minimize the number of database calls as much as possible.
Hence my question: From the SecurityContextHolder I can get my hands on the User object using getPrincipal() of the Authentication object:
@PreAuthorize("isAuthenticated()")
public List<StoreDTO> getAvailableStores() {
    Authentication auth = SecurityContextHolder.getContext().getAuthentication();
    User user = (User) auth.getPrincipal();
    String username = user.getUsername();
    List<Store> storeList = this.storeAdminRepository.getStores(username);
    return Convert.toStoreDtoList(storeList);
}
Would it be "dirty", instead of setting the simple User object as principal, to use a custom object that e.g. also stores the user id from inside the database?
The way I am doing it now would require me to look up the user id first according to his name and then get the stores where user.id = store.user_id or something like that.
Assuming that a lot of requests are coming in - is this a way to minimize the overhead? Or, and this could also be true, are my concerns unfounded because the overhead is not nearly as large as I am assuming?
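The custom-principal idea could look roughly like this (a plain-Java sketch with illustrative names; in Spring Security the class would typically extend org.springframework.security.core.userdetails.User or implement UserDetails):

```java
// A principal object that carries the database id alongside the username,
// cached once at authentication time so request handlers don't need an
// extra lookup per request.
final class AppUserPrincipal {
    private final String username;
    private final long databaseId; // resolved during authentication

    AppUserPrincipal(String username, long databaseId) {
        this.username = username;
        this.databaseId = databaseId;
    }

    String getUsername() { return username; }
    long getDatabaseId() { return databaseId; }
}
```

Your UserDetailsService would build this principal once during authentication; afterwards, casting the result of getPrincipal() exposes the id without touching the database.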
I want to be able to allow my writers to see how much traffic their articles are getting. I can do this in Google Analytics, but I can't figure out how to share this data with them without giving them access to all the data, so I was thinking of adding another analytics service that would insert a unique code for each author on their articles. I already have the GA code and the Quantcast code, so I don't want to bog down my site much more. Should I use a pixel tracker or a JavaScript tracker?
UPDATE: Here is the code I use in analytics to track my authors.
try {
    var pageTracker = _gat._getTracker("UA-xxxxxxx-x");
    pageTracker._trackPageview();
    <?php if ( is_singular()) { ?>
    pageTracker._trackEvent('Authors','viewed','<?php the_author_meta('ID'); ?>');
    <?php } ?>
} catch(err) {}
You could use a custom field to track the writers by a unique id that they probably already have. Then you could use GA's API to pull data where the custom field value equals that unique id, and display it in their profile or wherever you want them to see it.
One option would be to use a server-local Redis instance and use the PHP Redis library to increment a local counter using the author ID and article IDs.
For example, if in Redis you use a sorted set with the author ID as the Redis key, and the article ID (or however you identify an article) as a member that you increment with ZINCRBY on each page load, you'll have the data readily available and under your control. You could then have a PHP page that pulls the author's data from Redis and displays it in whatever format you need. For example, you could build a table showing them the traffic for each of their articles, or make pretty graphs to display it. You could extend the above to per-day traffic, for example, by using a key structure of "AUTHORID:YYYY-MM-DD" instead of just the author ID.
The hit penalty for tracking this is much lower than reaching out to an external site - it should be on the order of single-digit milliseconds. Even if your Redis instance were elsewhere, the response times should still be lower than with external tracking. I know you are using GA, but this is a simple-to-implement method you could consider.
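The key/member/score scheme described above could be sketched like this (an in-memory map stands in for Redis here so the sketch is self-contained; with a real Redis client the increment would be a single ZINCRBY call):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Redis sorted sets described above.
// Key: "authorId" or "authorId:YYYY-MM-DD"; member: article id; score: views.
final class ViewCounter {
    private final Map<String, Map<String, Double>> sortedSets = new HashMap<>();

    // Mirrors Redis "ZINCRBY key increment member"; returns the new score.
    double zincrby(String key, double increment, String member) {
        Map<String, Double> set = sortedSets.computeIfAbsent(key, k -> new HashMap<>());
        return set.merge(member, increment, Double::sum);
    }
}
```

Per-day tracking falls out of the key structure: incrementing "42:2024-01-15" and "42" on the same page view gives both a daily and an all-time counter.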
This depends slightly on how many authors you have and on your level of involvement. The main approaches I would use are:
Create a separate view per author and filter in his/her traffic
Use a Google Docs plugin to pull down the author's data and share it
Use the API to pull down the relevant information
Happy to give more specifics if you can describe in more detail what you want.
When submitting data to the data layer, where userID is not a field in the object being passed but is still needed to cross-reference tables, should I call the membership class to get the UserID in the data layer, or should I pass UserID from level to level as a parameter (i.e. from the business layer to the data layer)?
(Or does it not matter either way?)
Controller or Business Layer:
MembershipUser user = Membership.GetUser();
Guid userID = (Guid)user.ProviderUserKey;
DL.SaveObject (Object, userID);
OR
Do it in DataLayer:
SaveObject(Object)
{
MembershipUser user = Membership.GetUser();
Guid userID = (Guid)user.ProviderUserKey;
...
...
}
Whilst this is indeed a cross-cutting concern, my personal preference is that code such as:
MembershipUser user = Membership.GetUser();
Guid userID = (Guid)user.ProviderUserKey;
should NOT be in the data layer.
I like to see this kind of code in a layer higher than the data layer, usually the business layer, simply because I want my data layer to be entirely agnostic as where the data it will read/write/process has come from. I want my data gathering to be done primarily in the UI layer (in the case of user supplied data/input) and perhaps a little more data gathering within the business layer (such as gathering a UserID or retrieving a user's roles/authorisation).
Putting code such as this in the business layer can potentially lead to duplication of this code across many different domain objects; however, this can be alleviated by abstracting this code away into its own object, and using object composition to allow other domain objects to access it.
Passing input to the DataAccessLayer should be done by the controller or the BL. I prefer not to include anything other than data reads/writes in the DAL. (In the second option, the DAL is given the task of determining the currently logged-in user.)
In general I'd prefer to see GetUser() in SaveObject(), since that would allow the business layer to be abstracted away from it and should reduce the amount of code that is calling GetUser(). But it depends on the requirements. If you needed to apply business rules based on who the user is, then (also) putting it in the business layer might make more sense.
Authorization/authentication is one of those cross-cutting concerns that AOP (aspect oriented programming) is best suited to handle.
Update:
CraigTP makes a valid point wrt having the data layer be agnostic about where its data comes from. In general I would agree. In this scenario there is a requirement where user identity is needed by the data persistence mechanism, probably for security and/or auditing purposes. So in this case I'd prefer to put the user identity access call under the control of the layer that needs it. I'd abstract away the details of the GetUser() implementation behind another call, so the data layer doesn't have a dependency on System.Web.Security.
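That abstraction could be sketched like this (a plain-Java sketch with illustrative names; in the original C# scenario the infrastructure implementation would wrap Membership.GetUser() so the data layer never references System.Web.Security directly):

```java
// The data layer depends only on this small interface, not on the
// membership/security framework itself.
interface CurrentUserProvider {
    String currentUserId();
}

// Infrastructure implementation; a fixed value stands in here for the call
// to the real membership API.
final class FixedUserProvider implements CurrentUserProvider {
    private final String userId;
    FixedUserProvider(String userId) { this.userId = userId; }
    public String currentUserId() { return userId; }
}

// The data layer receives the provider and records which user performed
// the write, e.g. for auditing.
final class DataLayer {
    private final CurrentUserProvider users;
    DataLayer(CurrentUserProvider users) { this.users = users; }

    String saveObject(Object o) {
        return "saved by " + users.currentUserId();
    }
}
```

With this shape, tests can inject a fake provider, and neither layer duplicates the GetUser() call.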