Database for web app with multiple clients

In a real-world web app, how does one go about storing data for multiple clients/companies/customers?
Let's assume we have the following collections for one client:
- users
- tasks
How would I extend this system to a second client? Is there a standard approach?
Note: I am using Firestore (NoSQL).

We use a separate set of collections for each client. Our data structure works really well for us and looks like this...
/clients/{clientId}/reportingData
/clients/{clientId}/billingData
/clients/{clientId}/privateData
Using security rules, we allow clients to read their reportingData and billingData collections, but not the privateData collection.
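As a minimal sketch of what a client-side read looks like under this layout (assuming the Firebase v9 modular JS SDK; the collection names are the ones above, the config is hypothetical):

```typescript
import { initializeApp } from "firebase/app";
import { collection, getDocs, getFirestore } from "firebase/firestore";

// Hypothetical project config; substitute your own.
const app = initializeApp({ projectId: "my-project" });
const db = getFirestore(app);

// Read one client's reportingData subcollection. The path alone is not
// a security boundary: the security rules must still verify that the
// caller belongs to this client.
async function fetchReportingData(clientId: string) {
  const snap = await getDocs(collection(db, "clients", clientId, "reportingData"));
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}
```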
However, if you need to query data across multiple clients at the same time (for internal use, for example), then Frank's option 1 would work better, with a clientId field.
We do the same thing with users...
/users/{uid}/publicProfile (anyone can read this, only the user can write)
/users/{uid}/userProfile (only the user can read and write)
/users/{uid}/privateProfile (internal data that the user can't read or write)

You have a few options for implementing such a multi-tenant solution on Cloud Firestore:
1. Have all clients in a single set of collections
2. Have a separate set of collections for each client
3. Have a separate project for each client
There is no approach that is inherently better or worse.
I recommend that you at least consider having a separate project for each client. Isolating the clients from each other makes maintenance and (possibly future) billing a lot easier.
Having all clients in a single set of collections is also possible. You'll just have to make sure that clients can't see each other's data. Since you're likely accessing the database directly from the client, use security rules to ensure that clients can only access their own data.
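For option 1, a hedged sketch of what that scoping looks like in practice (v9 SDK assumed; `tasks` and `clientId` come from the question, the rest is illustrative):

```typescript
import { Firestore, collection, getDocs, query, where } from "firebase/firestore";

// With all tenants in one `tasks` collection, every read must be scoped
// by clientId, and your security rules should reject any query that is
// not filtered this way.
async function fetchTasksForClient(db: Firestore, clientId: string) {
  const q = query(collection(db, "tasks"), where("clientId", "==", clientId));
  const snap = await getDocs(q);
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}
```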

Related

How to cache batches of IDs "locally" in a serverless environment?

Traditionally, in a non-serverless environment, I would have the following system. Say I have a custom ID generation protocol for all my models, and say I also have 20 servers scattered around. I give each server a slice of IDs to work with off the whole stack of IDs. When a server is done, or goes down, it returns its IDs to the system so they don't get wasted. The reason for sending each server a batch of IDs is so that every time a new record is created, you don't need to fetch from a central ID server to get the next ID; instead, each server has a local set it can work with freely.
How would you do this sort of thing in a serverless system? I am deploying to Vercel and wondering what the appropriate architecture might be for such an ID batching system. There are other use cases for needing a persistent copy of data on a local server, so if you don't like the ID example, just imagine another sort of system. How do you solve this optimization problem in a serverless environment?
Serverless is an approach. Like all such things (solutions), it should be matched to the problem, not the other way around. Is this simply a case where serverless is a good solution choice for dealing with 80% of your problem, and all you need to do is choose something appropriate to deal with the other 20%?
Assuming you have the freedom to do this, can't you just have the serverless parts of the solution consume non-serverless services - e.g. an ID Service?
Separately from this, caching comes to mind: just the general idea of having some data close by which might be mastered somewhere else. Caching patterns like Write-Behind would allow you to work with local copies (i.e. immediate consumption) whilst farming out the cache-to-master communication.
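To make the "consume a non-serverless ID service" idea concrete, here is a sketch under assumptions the question doesn't mandate (a Redis counter as the central ID master, via the node-redis client): each instance claims a block of IDs with one atomic increment and serves the rest from module scope, which survives across warm invocations.

```typescript
import { createClient } from "redis";

const BLOCK_SIZE = 1000;

// Module-scope state lives as long as the serverless instance stays
// warm, so most calls never touch the network.
let next = 0;
let max = 0; // exclusive upper bound of the currently claimed block

const redis = createClient({ url: process.env.REDIS_URL });

export async function nextId(): Promise<number> {
  if (next >= max) {
    if (!redis.isOpen) await redis.connect();
    // INCRBY is atomic, so concurrent instances get disjoint blocks.
    max = await redis.incrBy("id:counter", BLOCK_SIZE);
    next = max - BLOCK_SIZE;
  }
  return next++;
}
```

One difference from the question's scheme: a dying instance can't hand its unused IDs back, so part of each block may be wasted; with a 64-bit counter that is usually an acceptable trade for the simplicity.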

Restricting data in PouchDB

I have an offline-ready application that I am currently building in Electron.
The core requirements are that all data is restricted (you have to be a user to read or write) and that within that data, some data is further restricted to a specific user (account information, messages, etc.).
Now, I do not want to replicate any data offline that a user should not have access to (all the data can be seen using the devtools regardless of restrictions), so essentially I only want to sync data to PouchDB's offline store if that user has access to it, plus all the data that every user has access to.
Now I have read the following posts/guides but I am still a little confused.
https://pouchdb.com/2015/04/05/filtered-replication.html
https://www.joshmorony.com/creating-a-multiple-user-app-with-pouchdb-couchdb/
Restricting Access to local PouchDB
From my understanding, filtering is a bad choice performance-wise, even though it could do what I want.
Setting up a proxy would work, but then it essentially becomes a REST API and the data synchronization falls apart.
And the final option which I think is what I want is to have a database for every user that would contain their private information and then additional databases to hold the information that is available to every user.
The only real question I have with this approach is how is data handled that is private but shared between two users (messages, etc...)
I am more after an overarching view of how the data should be stored as opposed to code examples, just really struggling with the conceptual architecture of the application.
There are many solutions to your problem. One solution looks very promising: IBM Cloudant has started work on Cloudant Envoy, a proxy simulating the CouchDB interface instead of a simple REST API. You can read more about it on the site for Envoy over at ibm.com. A custom replicator for PouchDB is also available on GitHub.
There's also a blog post on Medium.com on this.
The idea is the same as the much older Couchbase Sync Gateway. Although Couchbase has common roots with CouchDB, I have not tracked if they still support replication with CouchDB.
The easiest way to start would be to create a single database per user on the server, and a common database that you just pull the shared data from. Let me know if you need more info on this solution.
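A minimal sketch of that shape with the PouchDB API (the URLs, database names, and auth scheme are assumptions):

```typescript
import PouchDB from "pouchdb";

// Per-user database: two-way live sync; the remote name would be
// derived from the authenticated user on the server side.
const localUser = new PouchDB("user");
const remoteUser = new PouchDB("https://couch.example.com/userdb-alice", {
  auth: { username: "alice", password: "secret" }, // assumed basic auth
});
localUser.sync(remoteUser, { live: true, retry: true });

// Shared database: readable by every user, so pull-only replication.
const localShared = new PouchDB("shared");
localShared.replicate.from("https://couch.example.com/shared", {
  live: true,
  retry: true,
});
```

For data that is private but shared between two users (messages, etc.), one common pattern is a third database per conversation that both participants sync, exactly like the per-user databases above.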

How to get reports from web services in efficient manner

We have a distributed system with 3 sites. Each site has its own services that encapsulate both logic and data. All services use a MySQL database as the persistence system and expose SOAP services. But we run into trouble with database reports, since maintaining service encapsulation prevents accessing the database directly. So how do we get reports from the web services without breaking the encapsulation they provide, while at the same time maintaining efficiency?
Share a common data-structure known by the services and the clients.
I'd implement a very simple serializable data structure and have these entities, known to both the client and the server(s), be the interchange format. And of course, all services would output the same data structures.
If you already have a persistence layer (if not, build one) with DAO/DAL entities, have them be responsible for querying the data and transforming the original data into these new common data structures. A helper class could do that automatically.
This data structure could be an entity based on a set of rows and columns (an array of object instances), plus an array of column identifiers known by both the client and the server, so that your model knows which columns the client is requesting.
This way, one client could request 3 columns of a report, while a different client requests many other columns of the same report.
Additionally, I would of course not include any HTML in the data, just the raw data, leaving your clients responsible for how to present it.
The above is a little abstract, but I hope it helps anyway.
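To make it a bit less abstract, a sketch of such a structure in TypeScript (all names are illustrative):

```typescript
// Column identifiers known to both client and server.
type ColumnId = "orderId" | "customer" | "total" | "createdAt";

// What the client sends: which report, and which columns it wants.
interface ReportRequest {
  reportName: string;
  columns: ColumnId[];
}

// What every service returns: raw values only, no HTML; presentation
// is entirely the client's job.
interface ReportResult {
  columns: ColumnId[]; // the columns actually returned, in order
  rows: unknown[][];   // one array of cell values per row
}

// One client asks for three columns of a report that another client
// might request in full.
const request: ReportRequest = {
  reportName: "monthlySales",
  columns: ["orderId", "total", "createdAt"],
};
```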

Is OData suitable for multi-tenant LOB application?

I'm working on a cloud-based line-of-business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and together there are several million docs stored. I use SQL Server.
Today I have a somewhat-RESTful API which allows users to pass in a DocumentSearchQuery entity where they supply a keyword together with the requested sort order and paging info. They get a DocumentSearchResult back, which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to other entity types than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
There's no built-in limit on what fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on what fields users can query, which means that either the performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
True. However, you can check for the allowed fields in the filter and either allow or deny the operation.
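A sketch of that check (the allowlist and the deliberately crude $filter tokenizer are illustrative; a real implementation should use a proper OData expression parser):

```typescript
// Fields that are indexed and therefore allowed to appear in $filter.
const ALLOWED_FILTER_FIELDS = new Set(["title", "createdAt", "ownerId"]);

const ODATA_KEYWORDS = new Set([
  "eq", "ne", "gt", "ge", "lt", "le", "and", "or", "not", "true", "false", "null",
]);

// Crude extraction of field names from a $filter expression.
function filterFields(filter: string): string[] {
  const withoutLiterals = filter.replace(/'[^']*'/g, ""); // drop string literals
  const identifiers = withoutLiterals.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? [];
  return identifiers.filter((id) => !ODATA_KEYWORDS.has(id));
}

function isFilterAllowed(filter: string): boolean {
  return filterFields(filter).every((f) => ALLOWED_FILTER_FIELDS.has(f));
}

// isFilterAllowed("title eq 'foo' and createdAt gt 2015")  -> true
// isFilterAllowed("ssn eq '123-45-6789'")                  -> false
```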
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
Yes, there is a provider for EF. That means if you use something else in the future, you will need to write your own provider. If you do end up changing EF, you probably made the decision too early; I don't recommend WCF DS in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
There isn't any out-of-the-box support for that in WCF Data Services, right. However, that is part of the authorization mechanism you will need to implement anyway. The good news: doing it is pretty easy with QueryInterceptors, which simply intercept the query and restrict it based on the user's privileges. This is something you would have to implement regardless of the technology you use.
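QueryInterceptors themselves are a C#/WCF Data Services feature; as a language-neutral sketch of the same idea in TypeScript, the server composes every incoming predicate with a tenant check the client cannot remove:

```typescript
interface Doc {
  id: string;
  tenantId: string;
  title: string;
}

// Whatever predicate the client's query translates to, the server
// wraps it so only the caller's own tenant data can ever match.
function withTenantScope(
  clientPredicate: (d: Doc) => boolean,
  tenantId: string
): (d: Doc) => boolean {
  return (d) => d.tenantId === tenantId && clientPredicate(d);
}
```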
My answer: considering the above, WCF Data Services is a suitable choice in a multi-tenant environment where customers write their own clients to access the entities, at least as long as you stay with EF. And you should keep in mind the huge effort it saves you.

Which of these two APIs is the best to use, REST or SOAP (for this specific architecture)?

Architecture:
- A database on a central server, which contains a complex hierarchical database structure.
- The clients should be able to insert data into tables through the API; the data would be inserted into multiple tables in the database at the same time, not only into one table.
- The clients should be able to retrieve data by using a complex search query.
- The clients can upload/download files to the server, which could be multiple GBs in size.
Would SOAP be better for this job than REST? Can you please explain why?
Almost all the things you mention are equally achievable using either SOAP or REST, though perhaps a little easier with SOAP. Certainly it's easier to create client APIs for SOAP interfaces; client tooling support is significantly more advanced in the majority of languages.
However, you say that you're wanting to deal with multi-gigabyte upload and download. That's a crucial point as REST is able to handle that sort of thing far more easily. SOAP is almost always tooled in terms of DOM processing, and that means building full messages in memory; you don't ever want to do that with a multi-GB payload.
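For illustration, a REST endpoint can stream a multi-gigabyte upload straight to disk without ever materializing it in memory; a sketch using Node and Express (neither mandated by the question):

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import express from "express";

const app = express();

// The request body is consumed as a stream, so memory use stays flat
// regardless of payload size; a DOM-based SOAP stack would build the
// whole message in memory first.
app.put("/files/:name", async (req, res) => {
  // Sketch only: a real handler must sanitize the file name.
  await pipeline(req, createWriteStream(`/tmp/uploads-${req.params.name}`));
  res.status(201).end();
});

app.listen(8080);
```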
So go with REST. That's definitely your best option for achieving all your listed objectives.
