API design versioning with multiple databases

Should one set up versioning for API databases the same way as for API URLs?
An API starts at api.domain.com/v1/ and eventually evolves to api.domain.com/v2/.
Is it good practice to set up a separate database for each API version?
api.domain.com/v1/ would use database api_v1
api.domain.com/v2/ would use database api_v2

Both API versions presumably provide access to the same data, so maintaining two separate databases is practically impossible: you would have to keep them in sync continuously.
An API should always abstract the underlying business logic and database, so that changes in the API only reflect changes in how clients interact with the business logic, not in how data is stored.
Versioning APIs is a good thing, but it requires a good estimate of what might, or surely will, change in the future in order to avoid bigger problems later.
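As a sketch of that principle (all names here are hypothetical), both API versions can sit on top of one unversioned database, with version-specific presenters translating the same canonical record into each version's response shape:

```python
# One database record, two API representations. The database schema
# stays unversioned; only the presentation layer differs per version.

def present_v1(user_row):
    # v1 exposed a single "name" field
    return {"name": user_row["first"] + " " + user_row["last"]}

def present_v2(user_row):
    # v2 split the name into two fields
    return {"first_name": user_row["first"], "last_name": user_row["last"]}

row = {"first": "Ada", "last": "Lovelace"}  # the same underlying data
v1_response = present_v1(row)  # served under /v1/
v2_response = present_v2(row)  # served under /v2/
```

When v2 needs data that v1's schema never stored, the schema gains a column; old presenters simply ignore it, so no second database is needed.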

Related

Database: Repositories with NoSQL/Document Database (DDD)

Looking for any advice from anyone who has migrated their repositories from relational DB to a NoSQL?
We are currently building an App using a Postgres database & ORM (SQLAlchemy). However, there is a possibility that at a later date we may need to migrate the App to an environment that currently only supports a couple of NoSQL solutions.
With that in mind, we're following the Persistence-Oriented approach to repositories covered in Vaughn Vernon's Implementing Domain-Driven Design. This results in the following API:
save(aggregate)
save_all(aggregates)
remove(aggregate)
get_by_...
Without going into detail, the ORM-specific code has been hidden away in the repository itself. The Session is only used for the short span of time when data is retrieved or updated, and is then immediately committed and closed (in the repository methods). This means lots of merging on save, and not the most efficient use of the Session.
def save(aggregate):
    session = session_factory()
    try:
        session.merge(aggregate)
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

def get(aggregate_id):
    session = session_factory()
    try:
        aggregate = session.query(Aggregate).get(aggregate_id)
        session.expunge(aggregate)
        session.commit()
        return aggregate
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
etc etc
The advantages:
We are limiting ourselves to updating a single Aggregate per Use Case, so the cost of not fully utilising the UoW Transaction Control in the Application Service is minimal (outside of performance). Transaction Control is enabled in the repositories while the aggregate is written, to ensure the full aggregate is persisted.
No ORM-specific code leaks outside of the Repositories, and it would need to be re-coded in the event of switching to a NoSQL DB anyway.
So if we do have to switch to a NoSQL DB, we should have a minimal amount of work to do.
However, almost everything I have read encourages Transactional Behaviour to live in the Application Service Layer. Although I believe there is a distinction here between Business Transactional and DB Transactional.
Likewise, we're taking a performance hit, in that we are asking the session factory for a session on every call to the repository. Most services contain about 3 or so calls to a repository.
So, some questions for anyone who has migrated from a relational to a NoSQL DB:
Does the concept of a Unit of Work / Session mean anything in a NoSQL world?
Should we fully embrace the ORM in the meantime, and move the UOW/Session outside of the Repository into the Application Service?
If we do that, what was the level of effort to re-engineer the Application Service, if we need to migrate to a NoSQL solution in the end. (The repositories will need to be re-written in any instance).
Finally, has anyone had much experience writing an implementation-agnostic repository?
PS. Understood that we could drop the ORM entirely and go pure SQL in the meantime, but we have been asked to ensure we are using an ORM.
EDIT: In this answer I focus on document DBs, based on the question's title. Of course, other NoSQL stores exist with vastly different characteristics (for example graph DBs, stores using event sourcing, and others).
It should not be a problem really.
In document DBs your entire aggregate should be a single document. That way you have exactly the same guarantees that you need for transactional consistency: regardless of how many entities change within the aggregate, you're still storing a single document. You will need to enforce some form of optimistic concurrency (through an etag, a version field, or similar) rather than a Unit of Work pattern, but after that your transactional requirements are covered.
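A minimal sketch of that version-based optimistic concurrency, using a hypothetical in-memory document store rather than any specific database's API:

```python
class ConcurrencyError(Exception):
    pass

# Hypothetical in-memory document store: a write succeeds only if the
# caller read the latest version of the document, so two concurrent
# writers cannot silently overwrite each other's aggregate.
class DocumentStore:
    def __init__(self):
        self._docs = {}  # id -> (version, document)

    def load(self, doc_id):
        return self._docs[doc_id]  # (version, document)

    def save(self, doc_id, document, expected_version):
        current_version, _ = self._docs.get(doc_id, (0, None))
        if current_version != expected_version:
            raise ConcurrencyError("document changed since it was read")
        self._docs[doc_id] = (current_version + 1, document)

store = DocumentStore()
store.save("order-1", {"lines": []}, expected_version=0)
version, doc = store.load("order-1")
doc["lines"].append({"sku": "A", "qty": 2})
store.save("order-1", doc, expected_version=version)  # succeeds: versions match
```

A second writer that loaded version 1 and tried to save after this would get a ConcurrencyError and would have to reload and retry, which is the whole of the "transaction management" many document stores need per aggregate.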
I can't really comment on whether you should fully embrace a UoW pattern now versus relying on the ORM's implementation; that depends a lot on your current situation and implementation details. What I can say is that you quite probably won't need to migrate from normal form (SQL) to documents all in one go. Start with a simple aggregate so that you can see what works for you and what doesn't.
I don't know if implementation-agnostic repositories exist, but the idea doesn't make a lot of sense to me. The whole point of a repository is encapsulating persistence, so there is no other responsibility allocated to it that you could abstract away from the storage. Also, whether and how the repository needs to compose different stored models into the aggregate model is specific to the platform, so it's not agnostic.
One final comment: I see in your question that you wrote save_all(aggregates). I'm not sure what you're referring to, but at a minimum each aggregate's save should be wrapped in its own transaction; otherwise this operation violates the transactional-boundary characteristic of an Aggregate.
Does the concept of a Unit of Work / Session mean anything in a NoSQL world?
Yes, it can still be an interesting concept to have. Just because you're using a NoSQL storage doesn't mean that the need for some sort of business transaction management disappears. Many NoSQL databases have drivers or third party libraries that manage change tracking. See RavenDB for instance.
Sure, if you're only ever loading one aggregate per transaction and if your NoSQL unit of storage matches an aggregate perfectly, most of a Unit of Work's features will be less important, but you'll still be facing exceptions to that rule. Besides, the part of a UoW that's relevant in any case is Commit and possibly Abort.
Should we fully embrace the ORM in the meantime, and move the UOW/Session outside of the Repository into the Application Service?
What I recommend instead is materializing the concept of a Unit of Work in a full-fledged class:
class UnitOfWork {
    void Commit()
    {
        // Call your ORM persistence here
    }

    void Abort()
    {
        // Discard the tracked changes here
    }
}
Application Services are just the place where the Unit of Work is called, not where it is implemented.
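In Python terms, a hedged sketch of that split (all names invented, with a toy in-memory session standing in for a real ORM session): the UnitOfWork owns commit/rollback, and the application service merely drives it:

```python
# A minimal in-memory "session" so the sketch is runnable; in real
# code this would be an ORM session (e.g. a SQLAlchemy Session).
class InMemorySession:
    def __init__(self, store):
        self._store = store          # committed state
        self._pending = dict(store)  # working copy
    def get(self, key):
        return self._pending[key]
    def set(self, key, value):
        self._pending[key] = value
    def commit(self):
        self._store.clear()
        self._store.update(self._pending)
    def rollback(self):
        self._pending = dict(self._store)
    def close(self):
        pass

class UnitOfWork:
    """Owns commit/rollback; application services only drive it."""
    def __init__(self, session_factory):
        self._session_factory = session_factory
    def __enter__(self):
        self.session = self._session_factory()
        return self
    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.session.commit()    # persist on success
        else:
            self.session.rollback()  # abort on failure
        self.session.close()
        return False

# The application service calls the Unit of Work, nothing more:
def rename_customer(session_factory, customer_id, new_name):
    with UnitOfWork(session_factory) as uow:
        customer = dict(uow.session.get(customer_id), name=new_name)
        uow.session.set(customer_id, customer)
```

Swapping the session factory for one backed by a document store would leave the application service untouched, which is the point of keeping the UoW as its own class.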
If we do that, what was the level of effort to re-engineer the Application Service, if we need to migrate to a NoSQL solution in the end. (The repositories will need to be re-written in any instance).
It depends on a lot of other parameters such as Unit of Work support by your NoSQL API or third party libraries, and similarity in shape between Aggregates and the NoSQL storage. It can range from practically no work to writing a full UoW/change tracking implementation yourself. If the latter, extracting UoW logic from the Repository to a separate class won't be the hardest part of the job.
Finally, anyone had much experience writing an implementation-agnostic repository?
I concur with SKleanthous here: implementation-agnostic repos don't make much sense IMO. You've got your repository abstractions (interfaces), which are of course agnostic, but when it comes to implementations, you have to address a particular persistent storage.

Stateless Micro services and database

We have a requirement of building stateless micro services which rely on a database cluster to persist data.
What is the recommended approach for redundant stateless micro services (for high availability and scalability) using the database cluster? For example: running multiple copies of version 1.0 of a Payment service.
Should all the redundant micro services use a common shared DB schema, or should each have its own schema? With independent DB schemas, inconsistency among the redundant services may arise.
Also, how should schema upgrades be handled in the case of a common DB schema?
This is a super broad topic, and rather hard to answer in general terms.
However...
A key requirement for a micro service architecture is that each service should be independent from the others. You should be able to deploy, modify, improve, scale your micro service independently from the others.
This means you do not want to share anything other than API definitions. You certainly don't want to share a schema; each service should be able to define its own schema, release new versions, change data types etc. without having to check with the other services. That's almost impossible with a shared schema.
You may not want to share a physical server. Sharing a server means you cannot make independent promises on scalability and up-time; a big part of the micro service approach is that the team that builds a service is also responsible for running it. You really want to avoid the "well, it worked in dev, so if it doesn't scale on production, it's the operations team's problem" attitude. Databases - especially clustered, redundant databases - can be expensive, so you might compromise on this if you really need to.
As most microservice solutions use containerization and cloud hosting, it's quite unlikely that you'd have the "one database server to rule them all" sitting around. You may find it much better to have each micro service run its own persistence service, rather than sharing.
The common approach to dealing with inconsistencies is to accept them - but to use CQRS to distribute data between microservices, and make sure the micro services deal with their internal consistency requirements.
This also deals with the "should I upgrade my database when I release a new version?" question. If your observers understand the version for each message, they can make decisions on how to store them. For instance, if version 1.0 uses a different set of attributes to version 1.1, the listener can do the mapping.
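A sketch of that version-aware mapping in a listener (the message fields are invented for illustration):

```python
# A listener that normalizes messages from two schema versions
# into the shape its own store expects.
def normalize(message):
    if message["version"] == "1.0":
        # v1.0 carried a single "name" attribute
        first, _, last = message["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    if message["version"] == "1.1":
        # v1.1 already uses the split attributes
        return {"first_name": message["first_name"],
                "last_name": message["last_name"]}
    raise ValueError("unknown message version: %s" % message["version"])

normalize({"version": "1.0", "name": "Ada Lovelace"})
normalize({"version": "1.1", "first_name": "Ada", "last_name": "Lovelace"})
```

Because the mapping lives in the consumer, each service can upgrade its own schema and start emitting 1.1 messages without coordinating a simultaneous database upgrade across every listener.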
In the comments, you ask about consistency. This is a super complex topic - especially in micro service architectures.
If you have, for instance, a "customers" service and an "orders" service, you must make sure that all orders have a valid customer. In a monolithic application, with a single database, and exclusively synchronous interactions, that's easy to enforce at the database level.
In a micro service architecture, where you might have lots of data stores, with no dependencies on each other, and a combination of synchronous and asynchronous calls, it's really hard. This is an inevitable side effect of reducing dependencies between micro services.
The most common approach is "eventual consistency". This typically requires a slightly different application design. For instance, on the "orders" screen, you would first invoke the client microservice (to get client data), and then the "orders" service (to get order details), rather than having a single (large) service call retrieve everything.
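For illustration (hypothetical service stubs standing in for real HTTP calls), the screen composes its result from two independent calls rather than one joined query:

```python
# Hypothetical service clients; in reality these would be HTTP calls
# to two independently-deployed microservices.
def get_client(client_id):
    return {"id": client_id, "name": "Acme Corp"}

def get_orders(client_id):
    return [{"order_id": 1, "client_id": client_id, "total": 99.0}]

# The "orders" screen composes its view from two independent calls,
# accepting that the two answers may be momentarily inconsistent.
def orders_screen(client_id):
    client = get_client(client_id)
    orders = get_orders(client_id)
    return {"client": client["name"],
            "orders": [o["order_id"] for o in orders]}
```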

Should we use webservices or do direct database access

Should we use web services or direct database access? Of course, direct DB access is relatively faster, and web services are a good fit if we have to build for multiple platforms.
Is the time significantly higher when accessing data through a web service as opposed to a direct DB call, or only marginally higher?
I would have to disagree with TruthOf42's claim that web services are best practice for data access. There is certainly a big shift towards that approach these days, but I don't think common use is the same as best practice. Just because something is common/popular doesn't mean it's the best fit for all situations.
Why use web services?
If you plan on having more than one application use a generic data access layer.
If you plan on exposing your data to external clients.
If you want to draw some hard physical lines between your application and the database.
I would argue making web service calls will always be slower than just writing queries against the database. But you can mitigate the issues with wise network planning and caching.
I agree with Aphelion in that if it's a simple application, then keep it simple.
A good idea would be to make an interface in your code that gets data and you start with a database implementation. If you find you'd like to introduce web services later, then you can keep the same interface and just implement a version that makes web service calls instead of directly dialing the database.
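A sketch of that interface-first approach (all names hypothetical): the application codes against the interface, so a web-service implementation can be dropped in later without touching callers:

```python
from abc import ABC, abstractmethod

class CustomerGateway(ABC):
    """The interface the application codes against."""
    @abstractmethod
    def get_customer(self, customer_id): ...

class DatabaseCustomerGateway(CustomerGateway):
    """Initial implementation: reads straight from a local store/DB."""
    def __init__(self, table):
        self._table = table
    def get_customer(self, customer_id):
        return self._table[customer_id]

class WebServiceCustomerGateway(CustomerGateway):
    """Later implementation: same interface, calls a web service instead."""
    def __init__(self, http_get):
        self._http_get = http_get
    def get_customer(self, customer_id):
        return self._http_get("/customers/%d" % customer_id)

# Application code depends only on CustomerGateway, so swapping
# implementations requires no change here:
def customer_name(gateway, customer_id):
    return gateway.get_customer(customer_id)["name"]
```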
Don't try to optimize something you haven't even written yet. It's best practice to use web services. Doing direct calls to the database just opens you up to more security issues.
The most important thing when writing software is writing it well; speed is usually the last concern.

Pluggable database interface

I am working on a project where we are scoping out the specs for an interface to the backend systems of multiple wholesalers. Here is what we are working with:
Each wholesaler has multiple products, upwards of 10,000. And each wholesaler has customized prices for their products.
The list of wholesalers being accessed will keep growing in the future, so potentially 1000s of wholesalers could be accessed by the system.
Wholesalers are geographically dispersed.
The interface to this system will allow the user to select the wholesaler they wish and browse their products.
Product price updates should be reflected on the site in real time. So, if the wholesaler updates the price it should immediately be available on the site.
System should be database agnostic.
The system should be easy to setup on the wholesalers end, and be minimally intrusive in their daily activities.
Initially, I thought about creating a database for each wholesaler on our end, but with potentially 1000s of wholesalers in the future, is this the best option as far as performance and storage go?
Would it be better to query the wholesalers database directly instead of storing their data locally? Can we do this and still remain database agnostic?
What would be best technology stack for such an implementation? I need some kind of ORM tool.
Java based frameworks and technologies preferred.
Thanks.
If you want to create software that can switch databases, I would suggest using Hibernate (or NHibernate if you use .Net).
Hibernate is an ORM which is not dependent on a specific database, and this allows you to switch the DB very easily. It is already proven in large applications and well integrated in the Spring framework (but can be used without the Spring framework, too). (Spring.NET is the equivalent if using .Net.)
Spring is a good technology stack for building large scalable applications (it contains an IoC container, a database access layer, transaction management, supports AOP, and much more).
Wikipedia gives a short overview:
http://en.wikipedia.org/wiki/Hibernate_(Java)
http://en.wikipedia.org/wiki/Spring_Framework
Would it be better to query the wholesalers database directly instead of storing their data locally?
This depends on the availability of, and latency for accessing, the remote data. Databases themselves have several mechanisms for keeping multiple server instances in sync. Ask yourself what should/would happen if a wholesaler's database goes (partly) offline. Maybe not all data needs to be duplicated.
Can we do this and still remain database agnostic?
Yes, see my answer related to the ORM (N)Hibernate.
What would be best technology stack for such an implementation?
"Best" depends on your requirements. I like Spring. If you go with .Net, the built-in ADO.NET Entity Framework might be a fit, too.

Data Migration from Legacy Data Structure to New Data Structure

OK, so here is the problem we are facing.
Currently:
We have a ton of Legacy Applications that have direct database access
The data structure in the database is not normalized
The current process / structure is used by almost all applications
What we are trying to implement:
Move all functionality to a RESTful service so no application has direct database access
Implement a normalized data structure
The problem we are having is how to implement this migration not only with the Applications but with the Database as well.
Our current solution is to:
Identify all the CRUD functionality and implement this in the new Web Service
Create the new Applications to replace the Legacy Apps
Point the New Applications to the new Web Service ( Still Pointing to the Old Data Structure )
Migrate the data in the databases to the new Structure
Point the New Applications to the new Web Service ( Point to new Data Structure )
But as we discuss this process, it looks like we would have to write the new Web Service twice: once for the old data structure and once for the new one, since we currently cannot map the old data structure onto the new data structure behind the new Web Service.
I wanted to know if anyone has faced any challenges like this and how did you overcome these types of issues/implementation and such.
EDIT: More explanation of synchronization using bi-directional triggers; updates for syntax, language and clarity.
Preamble
I have faced similar problems during a data model upgrade on a large web application I worked on for 7 years, so I feel your pain. From this experience, I would propose something a bit different, and hopefully something that will be a lot easier to implement. But first, an observation:
The value to the organisation is the data: data will long outlive all your current applications. The business will constantly invent new ways of getting value out of the data it has captured, which will engender new reports, applications and ways of doing business.
So getting the new data structure right should be your most important goal. Don't trade getting the structure right against other short-term development goals, especially:
Operational goals such as rolling out a new service
Report performance (use materialized views, triggers or batch jobs instead)
This structure will change over time so your architecture must allow for frequent additions and infrequent normalizations to it. This means that your data structure and any shared APIs to it (including RESTful services) must be properly versioned.
Why RESTful web services?
You mention that you will "Move all functionality to a RESTful service so no application has direct database access". I need to ask a very important question with respect to the legacy apps: why is this important, and what value does it bring?
I ask because:
You lose ACID transactions (each call is a single transaction unless you implement some horrifically complicated WS-* standards)
Performance degrades: Direct database connections will be faster (no web server work and translations to do) and have less latency (typically 1ms rather than 50-100ms) which will visibly reduce responsiveness in applications written for direct DB connections
The database structure is not abstracted from the RESTful service, because you acknowledge that with the database normalization you have to rewrite the web services and rewrite the applications calling them.
And the other cross-cutting concerns are unchanged:
Manageability: direct database connections can be monitored and managed with many generic tools
Security: direct connections are more secure than the web services your developers will write
Authorization: the database permission model is very advanced and as fine-grained as you could want
Scalability: the web service is just another (the only?) directly-connected database application, and so scales only as well as the database
You can migrate the database and keep the legacy applications running by maintaining a legacy RESTful API. But what if we could keep the legacy apps without introducing a 'legacy' RESTful service at all?
Database versioning
Presumably the majority of the 'legacy' applications use SQL to directly access data tables; you may have a number of database views as well.
One approach to the data migration is that the new database (with the new normalized structure in a new schema) presents the old structure as views to the legacy applications, typically from a different schema.
This is actually quite easy to implement, but it solves only reporting and read-only functionality. What about legacy application DML? DML can be handled using:
Updatable views for simple transformations
Introducing stored procedures where updatable views are not possible (eg "CALL insert_emp(?, ?, ?)" rather than "INSERT INTO EMP (col1, col2, col3) VALUES (?, ?, ?)")
Have a 'legacy' table that synchronizes with the new database with triggers and DB links.
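For illustration, with invented table and column names (exact syntax varies by database), the first two options might look like this:

```sql
-- Legacy apps still see a table called EMP; the data now lives in
-- the normalized tables PERSON and EMPLOYMENT (names hypothetical).
CREATE VIEW EMP AS
SELECT p.person_id AS empno,
       p.full_name AS ename,
       e.dept_id   AS deptno
FROM   PERSON p
JOIN   EMPLOYMENT e ON e.person_id = p.person_id;

-- Where the view cannot be made updatable, wrap the write in a
-- stored procedure that the legacy app calls instead of INSERT:
CREATE PROCEDURE insert_emp(p_empno INT, p_ename VARCHAR(100), p_deptno INT)
BEGIN
  INSERT INTO PERSON (person_id, full_name) VALUES (p_empno, p_ename);
  INSERT INTO EMPLOYMENT (person_id, dept_id) VALUES (p_empno, p_deptno);
END;
```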
Having a legacy-format table with bi-directional synchronization to the new format table(s) using triggers is a brute-force solution and relatively ugly.
You end up with identical data in two different schemas (or databases) and the possibility of data going out of sync if the synchronization code has bugs - and then you have the classic issues of the "two-master" problem. As such, treat this as a last resort, for example when:
The fundamental structure has changed (for example, the cardinality of a relation has changed), or
The translation to the legacy format is a complex function (eg if the legacy column is the square of the new-format column value and is set to "4", an updatable view cannot determine if the correct value is +2 or -2).
When such changes are required in your data, there will be some significant change in code and logic somewhere. You could implement in a compatibility layer (advantage: no change to legacy code) or change the legacy app (advantage: data layer is clean). This is a technical decision by the engineering team.
Creating a compatibility database of the legacy structure using the approaches outlined above minimizes changes to legacy applications (in some cases, a legacy application continues without any code change at all). This greatly reduces development and testing costs (for which there is no net functional gain to the business), and greatly reduces rollout risk.
It also allows you to concentrate on the real value to the organisation:
The new database structure
New RESTful web services
New applications (potentially built using the RESTful web services)
Positive aspect of web services
Please don't read the above as a diatribe against web services, especially RESTful web services. When used for the right reason, such as for enabling web applications or integration between disparate systems, this is a good architectural solution. However, it might not be the best solution for managing your legacy apps during the data migration.
What it seems like you ought to do is define a new data model ("normalized") and build a mapping from the normalized model back to the legacy model. Then you can replace legacy direct calls with calls on the normalized one at your leisure. This breaks no code.
In parallel, you need to define what amounts to a (centralized) legacy db API, and map it to your normalized model. Now, at your leisure, replace the original legacy db calls with calls on the legacy db API. This breaks no code.
Once the original calls are completely replaced, you can switch the data model over to the real normalized one. This should break no code, since everything is now going against the legacy db API or the normalized db API.
Finally, you can replace the legacy db API calls and related code, with revised code that uses the normalized data API. This requires careful recoding.
To speed all this up, you want an automated code transformation tool to implement the code replacements.
This document seems to have a good overview: http://se-pubs.dbs.uni-leipzig.de/files/Cleve2006CotransformationsinDatabaseApplicationsEvolution.pdf
This seems like a very messy situation, and I don't think there's a "clean" solution. I've been through similar situations a couple of times; they weren't much fun.
Firstly, the effort of changing your client apps is going to be significant - if the underlying domain changes (by introducing the concept of an address that is separate from a person, for instance), the client apps also change - it's not just a change in the way you access the data. The best way to avoid this pain is to write your API layer to reflect the business domain model of the future, and glue your old database schema into that; if there are new concepts you cannot reflect using the old data (e.g. "get /app/addresses/addressID"), throw a NotImplemented error. Where you can reflect the new model with the old data, wire it together as best you can, and then re-factor under the covers.
Secondly, that means you need to build versioning into your API as a first-class concern - so you can tell clients that in version 1, features x, y and z throw "NotImplemented" exceptions. Each version should be backwards compatible, but add new features. That way, you can refactor features in version 1 as long as you don't break the service, and implement feature x in version 1.1, feature y in version 1.2 etc. Ideally, have a roadmap for your versions, and notify the client app owners if you're going to stop supporting a version, or release a breaking change.
Thirdly, a set of automated integration tests for your API is the best investment you can make - they confirm that you've not broken features as you refactor.
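A sketch of such a test (routes and behaviour invented), asserting what the API promises per version rather than how it is implemented; in practice `api` would be an HTTP call against a deployed test environment:

```python
# Stand-in for an HTTP client calling a versioned API in a test
# environment; the routes and responses here are hypothetical.
def api(version, path):
    if version == 1 and path.startswith("/people/"):
        return {"status": 200, "body": {"name": "Ada Lovelace"}}
    if version == 1 and path.startswith("/addresses/"):
        # feature not yet implemented in v1, per the roadmap
        return {"status": 501, "body": {"error": "NotImplemented"}}
    raise ValueError("unknown route")

def test_v1_person_is_still_served():
    # refactoring under the covers must not break this
    assert api(1, "/people/42")["status"] == 200

def test_v1_addresses_not_yet_implemented():
    # the contract says v1 answers 501 here until v1.1 ships
    assert api(1, "/addresses/42")["status"] == 501

test_v1_person_is_still_served()
test_v1_addresses_not_yet_implemented()
```

Run against every supported version on each deploy, this suite is what lets you refactor the data layer underneath the API with confidence.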
Hope this is of some use - I don't think there's a single, straightforward answer to your question.
