How to get reports from web services in an efficient manner - database

We have a distributed system with 3 sites. Each site has its own services that encapsulate both logic and data. All services use a MySQL database for persistence and are exposed as SOAP services. But we have a problem with database reports, since maintaining service encapsulation prevents us from accessing the database directly. So how do we get reports from the web services without breaking the encapsulation they provide, while at the same time maintaining efficiency?

Share a common data structure known by both the services and the clients.
I'd implement a very simple serializable data structure and have these entities be the interchange format known to both the client and the server(s). And of course all services would output the same data structures.
If you already have a persistence layer (if not, build one) with DAO/DAL entities, make them responsible for querying the data and transforming the original data into these new common data structures. A helper class could do that automatically.
This data structure could be an entity based on a set of rows and columns (an array of object instances), plus an array of column identifiers known by both the client and the server, so that your model knows which columns the client is requesting.
In this way one client could request 3 columns of a report, while a different client requests many other columns of the same report.
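For illustration, a minimal sketch of such a structure in TypeScript (the report and column names are hypothetical):

```typescript
// Hypothetical column identifiers shared between the clients and the services.
type ReportColumn = "orderId" | "customerName" | "total" | "createdAt";

// A generic, serializable report payload: the requested columns plus the rows,
// where each row is an array of values aligned with `columns`.
interface ReportData {
  columns: ReportColumn[];
  rows: Array<Array<string | number | null>>;
}

// A client asking for only three columns of the report.
const request: { report: string; columns: ReportColumn[] } = {
  report: "orders",
  columns: ["orderId", "total", "createdAt"],
};
```

The same shape works for every report, so the services only need one serializer and the clients only need one renderer.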
Additionally, I would of course not include any HTML in the data, just the raw data, and make your clients responsible for how that data is presented.
The above is a little abstract, but I hope it helps you anyway.

Related

UI - Consuming different microservices API and consolidate the list into grid view and allow users to sort the data

Currently in our front-end project (AngularJS), we need to consume different endpoints that are built in a microservices architecture and show the data in a list view. Then we need to allow users to sort the data based on the columns they select. For example, we list 10 columns, of which 6 are rendered from Service A and the other 4 are pulled from Service B. The two services have no direct relation mapping; instead, Service B returns its data based on the object id.
We have now consolidated the list, shown the columns, and allowed users to choose the columns of their choice. As a next step, we need to allow users to sort any column seamlessly. Is there any best practice in the microservices paradigm for retrieving the data from both services, sorting it, and showing the result?
We have a few options:
List all the data at once from both services and sort it in the front end. The problem with this approach is that with larger data sets the user may notice slowness, and at times the browser can hang. We are using AngularJS in our project and already see slowness when the data set grows.
Introduce an intermediate API service (a lightweight Node.js server) that coordinates the request, internally fetches data from the different services, and sends the result back.
Create an intermediate API service that caches the data, orchestrates the requests, and responds with the data from multiple services.
Can anyone share any other practices that can be followed for the above use case? In current microservices trends, all APIs are exposed as separate services, which makes the front-end world a bit complex: it has to coordinate calls across different APIs and show the data to users in the UI.
Any suggestions, approaches or hints would be helpful.
Thanks in advance.
Srini
Like you said, there are a few ways to handle the scenario you have. In my opinion the best approach would be option two. It is similar to the Gateway Aggregation pattern, where you introduce a gateway layer to handle the aggregation of your service APIs. The added benefit is that you may be able to park some common functionality in this gateway layer if required.
Of course, the obvious drawback is that you now have another layer that needs to be highly available and managed. So do consider the pros and cons carefully before deciding on your approach. For example, if this is the only aggregation you will ever need, then option 3 may be a better choice.
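A rough sketch of option two as a small Node/TypeScript gateway - the downstream URLs, field names and join key are assumptions for illustration:

```typescript
import express from "express";

const app = express();

// Hypothetical downstream endpoints for Service A and Service B.
const SERVICE_A = "http://service-a/api/items";
const SERVICE_B = "http://service-b/api/details";

// The gateway fetches both result sets, joins them on the object id,
// sorts on the requested column and returns one consolidated list.
app.get("/api/grid", async (req, res) => {
  const sortBy = String(req.query.sortBy ?? "name");

  const [aRows, bRows] = await Promise.all([
    fetch(SERVICE_A).then(r => r.json()),
    fetch(SERVICE_B).then(r => r.json()),
  ]);

  const bById = new Map(bRows.map((b: any) => [b.objectId, b]));
  const merged = aRows.map((a: any) => ({ ...a, ...(bById.get(a.id) ?? {}) }));

  merged.sort((x: any, y: any) => String(x[sortBy]).localeCompare(String(y[sortBy])));
  res.json(merged);
});

app.listen(3000);
```

Because the join and sort happen server-side, the AngularJS client only renders the page it asked for, which avoids the browser-hang problem with large data sets.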

Project architecture: service for working only with database

I am just wondering how good this approach to project architecture is:
1) You have N services that do X stuff. But there is one constraint: they don't have their own database and they cannot access any database directly.
2) For that I have a DB service, which can access the DB and perform any action against it.
So the workflow is like this: if any service needs something from the database, it asks the database service for the records.
How good is this kind of architecture? Am I running into serious bottlenecks?
Rather than put your entire database behind a single service and single interface, think about providing separate services for different parts of your dataset according to interfaces driven by your high-level business rules and data model (e.g. user account data service, orders data service, audit log data service). That way you can mock/scale/deploy these independent parts differently according to need and more easily change the backend storage if required later (e.g. archived order retrieval from different db). Also because the data managed by a service is of a particular type, certain decisions can be made independently for each service (e.g. caching policy - config-type data could be cached, active orders data probably not).
Initially you can implement all of these interfaces in a single service and then separate later, but the key to this approach is getting the interfaces abstracted and segregated cleanly.
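As a sketch of what such segregated interfaces could look like (the types and method names are invented for illustration):

```typescript
// Each interface covers one part of the data set and can be mocked,
// scaled or re-implemented against different storage independently.
interface UserAccountDataService {
  getAccount(userId: string): Promise<UserAccount>;
}

interface OrdersDataService {
  getActiveOrders(userId: string): Promise<Order[]>;
  getArchivedOrders(userId: string): Promise<Order[]>; // could hit a different DB later
}

interface AuditLogDataService {
  append(entry: AuditEntry): Promise<void>;
}

// Placeholder entity types for the sketch.
interface UserAccount { id: string; email: string }
interface Order { id: string; userId: string; total: number }
interface AuditEntry { at: Date; message: string }
```

Initially one process can implement all three interfaces; splitting them into separate services later does not change any calling code.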
This is a pretty typical architecture - It's a good idea to write your service's data access code against an abstraction so that you can unit test with a mocked version of your data.
At the least, it's a good idea to consolidate your data access code in one place so that you can make changes to it easily.

Data retrieval and search across multiple services

I'm building a system that comprises multiple heterogeneous services that talk to each other over a network, although in the standard deployment model they all run on the same machine. The UI client for managing the entities within this complex system should be able to display aggregated data from all of the services while enabling search across that aggregated data.
I'm wondering how to design the data retrieval within this system so that it scales, given that the amount of data to be searched is already high and keeps increasing.
I'm thinking about two approaches:
The client queries data from all services on demand and aggregates the results in its layer. In many cases it will have to do joins between data coming from multiple services, so I'm concerned about performance here.
Denormalize the services' data so that it is convenient for client queries, and even store aggregations across the multiple services' data so that the client doesn't have to do joins on demand. It would probably be better to store each service's denormalized data in its own database or cache, as it would then be easier to keep all the denormalized data up to date. However, I'll need to put the aggregated views across multiple services' data in some other place, and I'm concerned about the overhead of keeping this remote cache up to date.
Any examples or references to existing architectures that solve similar problems would be highly appreciated. Thanks!
Having an aggregated cache will surely give better performance, but think carefully about the cost: synchronization. You will end up with your client (or some remote service doing this job for the clients) having its own database that synchronizes with the service data (something like implementing your own asynchronous pull replication). Check how the data retrieved from the services can change. The best case for you is when data is never deleted or modified and can only be added. It also becomes easier if the data does not have to be strictly consistent. Choosing an appropriate synchronization mechanism depends on the existing architecture and requirements.
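For illustration, a minimal sketch of the append-only case in TypeScript - the event shape, lookup helper and in-memory store are all hypothetical:

```typescript
// A denormalized row combining data from two services, shaped for client queries.
interface AggregatedRow {
  orderId: string;
  customerName: string;
  total: number;
}

// The simplest case described above: data is append-only, so the cache only
// ever inserts new rows and never has to reconcile updates or deletes.
const aggregatedCache = new Map<string, AggregatedRow>();

function onOrderCreated(
  event: { orderId: string; customerId: string; total: number },
  lookupCustomerName: (customerId: string) => string,
) {
  aggregatedCache.set(event.orderId, {
    orderId: event.orderId,
    customerName: lookupCustomerName(event.customerId),
    total: event.total,
  });
}
```

As soon as data can be updated or deleted, this handler grows into a real synchronization mechanism, which is exactly the cost being weighed above.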

Is OData suitable for multi-tenant LOB application?

I'm working on a cloud-based line-of-business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and together there are several million docs stored. I use SQL Server.
Today I have a somewhat-RESTful API which allows users to pass in a DocumentSearchQuery entity where they supply keywords together with the requested sort order and paging info. They get a DocumentSearchResult back, which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to other entity types than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
There's no built-in limit on which fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future, which means it's likely that I would need to do my own parsing of incoming queries again.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on which fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
True. However, you can check the filter for allowed fields and either allow or deny the operation accordingly.
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future, which means it's likely that I would need to do my own parsing of incoming queries again.
Yes, there is a provider for EF. That means that if you use something else in the future, you will need to write your own provider. And if you do end up moving away from EF, you probably took the decision too early; I don't recommend WCF Data Services in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
There isn't any out-of-the-box support for that in WCF Data Services, right. However, that is part of the authorization mechanism you will need to implement anyway. The good news is that doing it is pretty easy with QueryInterceptors: simply intercept the query and restrict it based on the user's privileges. This is something you would have to implement regardless of the technology you use.
My answer: considering the above, WCF Data Services is a suitable choice in a multi-tenant environment where customers write their own clients to access the entities, at least as long as you stay with EF. And keep in mind the huge amount of effort it saves you.
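Leaving the WCF-specific pieces aside, the underlying idea - validate the incoming query against an allow-list of indexed fields and force tenant scoping - can be sketched like this (this is not the WCF Data Services or OData library API, just an illustration with made-up names):

```typescript
// Fields that are indexed and therefore allowed in $filter / $orderby.
const ALLOWED_FIELDS = new Set(["title", "createdAt", "documentType"]);

interface ParsedQuery {
  filterFields: string[];   // fields referenced in $filter
  orderByFields: string[];  // fields referenced in $orderby
}

// Reject queries that touch non-indexed fields, then return the tenant
// restriction that must be ANDed onto whatever filter was supplied, so a
// customer can never read another tenant's documents.
function validateAndScope(query: ParsedQuery, tenantId: string): string {
  for (const field of [...query.filterFields, ...query.orderByFields]) {
    if (!ALLOWED_FIELDS.has(field)) {
      throw new Error(`Field '${field}' is not queryable`);
    }
  }
  return `(tenantId eq '${tenantId}')`;
}
```

In WCF Data Services this kind of restriction would live in a QueryInterceptor; with another stack it becomes middleware in front of your query pipeline.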

Data Migration from Legacy Data Structure to New Data Structure

OK, so here is the problem we are facing.
Currently:
We have a ton of Legacy Applications that have direct database access
The data structure in the database is not normalized
The current process / structure is used by almost all applications
What we are trying to implement:
Move all functionality to a RESTful service so no application has direct database access
Implement a normalized data structure
The problem we are having is how to implement this migration not only with the Applications but with the Database as well.
Our current solution is to:
Identify all the CRUD functionality and implement this in the new Web Service
Create the new Applications to replace the Legacy Apps
Point the New Applications to the new Web Service ( Still Pointing to the Old Data Structure )
Migrate the data in the databases to the new Structure
Point the New Applications to the new Web Service ( Point to new Data Structure )
But as we discuss this process, we are looking at having to write the new Web Service twice: once for the old Data Structure and once for the new Data Structure, as currently we cannot make the old Data Structure fit the new Data Structure for the new Web Service.
I wanted to know if anyone has faced any challenges like this and how did you overcome these types of issues/implementation and such.
EDIT: More explanation of synchronization using bi-directional triggers; updates for syntax, language and clarity.
Preamble
I have faced similar problems during a data model upgrade on a large web application I worked on for 7 years, so I feel your pain. From this experience, I would propose something a bit different - but hopefully something that will be a lot easier to implement. But first, an observation:
The value to the organisation is the data - the data will long outlive all your current applications. The business will constantly invent new ways of getting value out of the data it has captured, which will engender new reports, applications and ways of doing business.
So getting the new data structure right should be your most important goal. Don't trade getting the structure right against other short-term development goals, especially:
Operational goals such as rolling out a new service
Report performance (use materialized views, triggers or batch jobs instead)
This structure will change over time so your architecture must allow for frequent additions and infrequent normalizations to it. This means that your data structure and any shared APIs to it (including RESTful services) must be properly versioned.
Why RESTful web services?
You mention that you will "Move all functionality to a RESTful service so no application has direct database access". I need to ask a very important question with respect to the legacy apps: why is this important, and what value does it bring?
I ask because:
You lose ACID transactions (each call is a single transaction unless you implement some horrifically complicated WS-* standards)
Performance degrades: Direct database connections will be faster (no web server work and translations to do) and have less latency (typically 1ms rather than 50-100ms) which will visibly reduce responsiveness in applications written for direct DB connections
The database structure is not really abstracted by the RESTful service - you acknowledge that with the database normalization you have to rewrite the web services and rewrite the applications calling them.
And the other cross-cutting concerns are unchanged:
Manageability: Direct database connections can be monitored and managed with many generic tools here
Security: direct connections are more secure than web services that your developers will write.
Authorization: The database permission model is very advanced and as fine-grained as you could want
Scalability: the web service is (only?) a directly connected database application and so scales only as far as the database does.
You can migrate the database and keep the legacy applications running by maintaining a legacy RESTful API. But what if we could keep the legacy apps running without introducing a 'legacy' RESTful service at all?
Database versioning
Presumably the majority of the 'legacy' applications use SQL to directly access data tables; you may have a number of database views as well.
One approach to the data migration is that the new database (with the new normalized structure in a new schema) presents the old structure as views to the legacy applications, typically from a different schema.
This is actually quite easy to implement, but it solves only reporting and read-only functionality. What about legacy application DML? DML can be handled using:
Updatable views for simple transformations
Stored procedures where updatable views are not possible (e.g. "CALL insert_emp(?, ?, ?)" rather than "INSERT INTO EMP (col1, col2, col3) VALUES (?, ?, ?)")
A 'legacy' table that synchronizes with the new database using triggers and DB links
Having a legacy-format table with bi-directional synchronization to the new format table(s) using triggers is a brute-force solution and relatively ugly.
You end up with identical data in two different schemas (or databases) and the possibility of data going out-of-sync if the synchronization code has bugs - and then you have the classic issues of the "two master" problem. As such, treat this as a last resort, for example when:
The fundamental structure has changed (for example, changing the cardinality of a relation), or
The translation to the legacy format is a complex function (eg if the legacy column is the square of the new-format column value and is set to "4", an updatable view cannot determine if the correct value is +2 or -2).
When such changes are required in your data, there will be some significant change in code and logic somewhere. You could implement it in a compatibility layer (advantage: no change to legacy code) or change the legacy app (advantage: the data layer stays clean). This is a technical decision for the engineering team.
Creating a compatibility database of the legacy structure using the approaches outlined above minimizes changes to legacy applications (in some cases, a legacy application continues without any code change at all). This greatly reduces development and testing costs (for which there is no net functional gain to the business), and greatly reduces rollout risk.
It also allows you to concentrate on the real value to the organisation:
The new database structure
New RESTful web services
New applications (potentially built using the RESTful web services)
Positive aspects of web services
Please don't read the above as a diatribe against web services, especially RESTful web services. When used for the right reason, such as for enabling web applications or integration between disparate systems, this is a good architectural solution. However, it might not be the best solution for managing your legacy apps during the data migration.
What it seems like you ought to do is define a new data model ("normalized") and build a mapping from the normalized model back to the legacy model. Then you can replace legacy direct calls with calls on the normalized one at your leisure. This breaks no code.
In parallel, you need to define what amounts to a (centralized) legacy db API, and map it to your normalized model. Now, at your leisure, replace the original legacy db calls with calls on the legacy db API. This breaks no code.
Once the original calls are completely replaced, you can switch the data model over to the real normalized one. This should break no code, since everything is now going against the legacy db API or the normalized db API.
Finally, you can replace the legacy db API calls and related code, with revised code that uses the normalized data API. This requires careful recoding.
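As a rough illustration of that seam (the interface, method names and stub implementations are made up), the legacy db API might look like this:

```typescript
// The seam: legacy code is rewritten to call this API instead of issuing SQL directly.
interface LegacyDbApi {
  getEmployee(empId: number): Promise<{ name: string; dept: string }>;
  insertEmployee(name: string, dept: string): Promise<number>;
}

// Step 1: implementation backed by the old, denormalized tables.
class OldSchemaDbApi implements LegacyDbApi {
  async getEmployee(empId: number) {
    // would SELECT from the legacy EMP table here
    return { name: "", dept: "" };
  }
  async insertEmployee(name: string, dept: string) {
    // would INSERT into the legacy EMP table here
    return 0;
  }
}

// Step 2: drop-in replacement backed by the new normalized model; callers of
// LegacyDbApi do not change when this implementation is swapped in.
class NormalizedDbApi implements LegacyDbApi {
  async getEmployee(empId: number) {
    // would join the normalized person/assignment tables here
    return { name: "", dept: "" };
  }
  async insertEmployee(name: string, dept: string) {
    // would insert into the normalized tables here
    return 0;
  }
}
```

Because all callers are written against LegacyDbApi, switching from OldSchemaDbApi to NormalizedDbApi is a wiring change rather than a recode.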
To speed all this up, you want an automated code transformation tool to implement the code replacements.
This document seems to have a good overview: http://se-pubs.dbs.uni-leipzig.de/files/Cleve2006CotransformationsinDatabaseApplicationsEvolution.pdf
To start with, this seems like a very messy situation, and I don't think there's a "clean" solution. I've been through similar situations a couple of times - they weren't much fun.
Firstly, the effort of changing your client apps is going to be significant - if the underlying domain changes (by introducing the concept of an address that is separate from a person, for instance), the client apps also change - it's not just a change in the way you access the data. The best way to avoid this pain is to write your API layer to reflect the business domain model of the future, and glue your old database schema into that; if there are new concepts you cannot reflect using the old data (e.g. "get /app/addresses/addressID"), throw a NotImplemented error. Where you can reflect the new model with the old data, wire it together as best you can, and then refactor under the covers.
Secondly, that means you need to build versioning into your API as a first-class concern - so you can tell clients that in version 1, features x, y and z throw "NotImplemented" exceptions. Each version should be backwards compatible, but add new features. That way, you can refactor features in version 1 as long as you don't break the service, and implement feature x in version 1.1, feature y in version 1.2 etc. Ideally, have a roadmap for your versions, and notify the client app owners if you're going to stop supporting a version, or release a breaking change.
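A minimal sketch of that versioning idea (the routes, version numbers and the address feature are invented for illustration):

```typescript
import express from "express";

const app = express();

// Version 1: addresses are not yet separated from people, so the new concept
// is exposed in the API shape but explicitly not implemented.
app.get("/v1/addresses/:addressId", (_req, res) => {
  res.status(501).json({ error: "NotImplemented in v1; available from v1.1" });
});

// Version 1.1: the feature is implemented against the refactored model,
// while the /v1 routes keep working unchanged for existing clients.
app.get("/v1.1/addresses/:addressId", (req, res) => {
  res.json({ id: req.params.addressId /* ...fields from the new model... */ });
});

app.listen(3000);
```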
Thirdly, a set of automated integration tests for your API is the best investment you can make - they confirm that you've not broken features as you refactor.
Hope this is of some use - I don't think there's a single, straightforward answer to your question.
