Restricting data in PouchDB

I have an offline-ready application that I am currently building in Electron.
The core requirements are that all data is restricted (you have to be a user to read or write) and that, within that data, some data is further restricted to a single user (account information, messages, etc.).
I do not want to replicate any data offline that a user should not have access to (all local data can be inspected through the devtools regardless of restriction), so essentially I only want to sync data to PouchDB's offline store if that user has access to it, along with the data that all users have access to.
I have read the following posts/guides, but I am still a little confused.
https://pouchdb.com/2015/04/05/filtered-replication.html
https://www.joshmorony.com/creating-a-multiple-user-app-with-pouchdb-couchdb/
Restricting Access to local PouchDB
From my understanding, filtered replication is a bad choice performance-wise, even though it could do what I want.
Setting up a proxy would work, but then it essentially becomes a REST API and the data synchronization falls apart.
The final option, which I think is what I want, is to have a database for every user that would contain their private information, plus additional databases holding the information that is available to every user.
The only real question I have with this approach is how data that is private but shared between two users (messages, etc.) is handled.
I am after an overarching view of how the data should be stored rather than code examples; I am really struggling with the conceptual architecture of the application.

There are many solutions to your problem. One looks very promising: IBM Cloudant has started work on Cloudant Envoy, a proxy that simulates the CouchDB interface rather than a simple REST API. You can read more about it on the site for Envoy over at ibm.com. A custom replicator for PouchDB is also available on GitHub.
There's also a blog post about it on Medium.com.
The idea is the same as the much older Couchbase Sync Gateway. Although Couchbase shares common roots with CouchDB, I have not tracked whether it still supports replication with CouchDB.
The easiest way to start would be to create a single database per user on the server, plus a common database that you only pull the shared data from. Let me know if you need more info on this solution.
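For illustration, a minimal sketch of that layout with the PouchDB API, assuming a CouchDB host at a made-up URL and a "userdb-{name}" naming convention; the server address, database names, and auth details are assumptions you would adapt to your setup:

```typescript
import PouchDB from 'pouchdb';

const SERVER = 'https://couch.example.com'; // hypothetical CouchDB host

function startSync(user: string) {
  // Private data: one remote database per user, e.g. "userdb-alice".
  // Only this user's data ever lands in the local store.
  const privateLocal = new PouchDB('private');
  const privateRemote = new PouchDB(`${SERVER}/userdb-${user}`);
  privateLocal.sync(privateRemote, { live: true, retry: true });

  // Shared data: a single database every authenticated user pulls from.
  const sharedLocal = new PouchDB('shared');
  const sharedRemote = new PouchDB(`${SERVER}/shared`);
  sharedLocal.replicate.from(sharedRemote, { live: true, retry: true });
}
```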

Related

Offline database capability while using Firebase

I'm new to Firebase, and I was researching it to see whether it fits our needs. It has everything we need except an offline database. Well, I know that it can cache changes while the user is offline and then sync them when the user comes back online, but this is not what I'm talking about.
As Firebase is costly, we want our free users to be able to use the app offline only, with the data never syncing to the cloud whether the user is online or not, and to enable sync only for subscribed users.
One solution, which we have not yet put much thought into, is to use an offline DB like SQLite and:
a) when the user subscribes, move the data to Firebase
b) if the user cancels the subscription, move the data to SQLite
but this solution requires coding the same thing in two completely different ways, plus extra code for migrating from SQLite to Firebase and from Firebase to SQLite. Is there a better solution that uses the Firestore database and also provides complete offline database functionality?
In my opinion, your solution can work, but there are some situations you should take into consideration.
a) when the user subscribes, move the data to Firebase.
As I understand it, it's one or the other. In that case, when a user subscribes, you should always consider locking the local SQLite database for writes until all the data has been written to the Firebase servers. Only when that operation completes should you allow the user to write data in the cloud.
Why is this needed? To always have consistent data.
b) if the user cancels the subscription, move the data to SQLite.
If the user cancels the subscription, you might consider using almost the same mechanism as above. But first you have to clear the local database, as it will contain outdated data, and then copy all the data from Firestore into the SQLite database.
Edit:
You can also consider copying the data from the local cache rather than fetching it from the Firebase servers. This will incur no costs.
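As a hedged sketch of that cache-only read, using the Firebase modular JS SDK (the "items" collection name is an assumption):

```typescript
import { getFirestore, collection, getDocsFromCache } from 'firebase/firestore';

// Reads documents from Firestore's local cache only: no server
// round-trips are made, so no billed reads are involved.
async function copyCacheToSqlite(): Promise<void> {
  const db = getFirestore();
  const snap = await getDocsFromCache(collection(db, 'items')); // "items" is hypothetical
  snap.forEach((doc) => {
    // insert doc.id and doc.data() into the local SQLite store here
  });
}
```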
but this solution requires coding the same thing in two completely different ways.
That's correct, but since both the offline and online databases share the same fields, the operation might not be as complicated as you think. Simply attach a real-time listener to a property within the user object, most likely called "subscribed", which can hold a value of true/false, and take action accordingly when it changes.
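A sketch of that listener with the modular SDK, assuming a "users/{uid}" document with a boolean "subscribed" field (both names are assumptions):

```typescript
import { getFirestore, doc, onSnapshot } from 'firebase/firestore';

function watchSubscription(uid: string) {
  const db = getFirestore();
  // Fires immediately with the current value and again on every change.
  return onSnapshot(doc(db, 'users', uid), (snap) => {
    if (snap.get('subscribed') === true) {
      // user subscribed: lock local writes, then migrate SQLite -> Firestore
    } else {
      // subscription ended: clear SQLite, then copy Firestore data back down
    }
  });
}
```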

How to structure/coordinate multiple databases?

Imagine a large corporation with dozens of companies, each with their own website, and each website with its own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but the other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies making different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some sites are marketing websites, some are for CRM, and some are online stores
there is a list of regulatory requirements that affects all products
each company should manage the compliance status of its own products against each requirement
when a new requirement surfaces, details regarding that requirement should only be entered once
How would the multiple databases be coordinated?
Edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared; it is siloed within each site
shared data lives only in the one enterprise-wide database; it will mostly be "types of [thing]"
no conclusive list of instances where they'll be used, but currently it would be to populate CMS dropdowns for individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume of shared data.
All DBs will be new; the decision on which DB to use is pending the current investigation.
Sub-systems will expose REST APIs
Here are some ways I have seen this handled; you need to think about the implications of each structure based on the details of your particular business domain. All of them can work, but all have to be set up carefully if they are going to work.
One database for shared information and one for each client for client-specific information. Set up the overall application so that the first thing chosen on login is the client, and the app connects to the correct client database. You may also need a way to switch clients if some users will handle multiple clients.
Separate servers for each client if they need to be completely siloed. Database changes are made by script (kept in source control) and applied to each server as needed, so changes to the central database might have a job that runs to push any data changes to the other servers.
All the data in one database, while making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views per client, so that users can only see the clients they are supposed to see. This only works if the data for each client is of substantially the same form.
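As an illustration of this third option, a sketch of creating a per-client view, executed here with node-postgres; the table, view, and role names are all assumptions:

```typescript
import { Client } from 'pg';

// Every table carries a client_id column; each client gets a view over
// only its own rows, and users are granted access to the view rather
// than the underlying table.
async function createClientView(pg: Client, clientId: number): Promise<void> {
  // clientId is a number, so interpolating it into DDL is safe here;
  // DDL statements cannot take bind parameters.
  await pg.query(
    `CREATE VIEW orders_client_${clientId} AS
       SELECT * FROM orders WHERE client_id = ${clientId}`
  );
  await pg.query(
    `GRANT SELECT ON orders_client_${clientId} TO client_${clientId}_role`
  );
}
```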
And since you are in a regulatory environment, I strongly urge you to create an audit database for each database, updated by database triggers (never audit from the application; you will lose changes to the data).
I agree with Chris that, even after both sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have two separate dbs per application (one with shared data and one without)? That would influence the kind of replication.
Or you could have a purely code-based solution, where clicking Publish in a GUI that updates the central db also calls a set of APIs that update the other dbs. Or micro-services: updating the central db also puts a message on a shared queue, which is picked up by services that each look after a different db and apply the update in whatever form makes sense for that db.
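A sketch of that queue variant using amqplib against a RabbitMQ fanout exchange; the broker URL, exchange name, and message shape are assumptions:

```typescript
import amqp from 'amqplib';

// After committing a change to the central db, publish it to a fanout
// exchange; one consumer service per downstream db picks it up and
// applies it in whatever form makes sense for that db.
async function publishSharedDataChange(change: object): Promise<void> {
  const conn = await amqp.connect('amqp://localhost'); // hypothetical broker
  const ch = await conn.createChannel();
  await ch.assertExchange('shared-data', 'fanout', { durable: true });
  ch.publish('shared-data', '', Buffer.from(JSON.stringify(change)));
  await ch.close();
  await conn.close();
}
```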
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer; however, there are a few possibilities.
In many cases where you have shared data, you want a single point of ownership for that information. It could be in a database, in an Excel file (which can then be turned into CSV and periodically loaded into all the dbs), or in some other form. The specifics depend on exactly what is shared.
In this case, it sounds like you are going to have some sort of legal department in charge of the shared information; they will manage that data, which will then be shared with the other sites. This might be done with an application they manage that aggregates information from the other companies, or the data might be pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.

Persisting and keeping mobile app data in sync with an online backend

I am building a mobile app using AngularJS and PhoneGap. The app allows the user to access a large number of data items, which ship with the app as a set of .json files.
One use case is that a user can favorite any of those data items.
Currently, I store the ids of the items that have been favorited in localStorage. It works, it's great, and it's very simple.
But now I would like to create an online backend for the app. By this I mean that the ids of the favorited items should also be stored on a server somewhere, in some form of database.
Now my question is:
How do I best do this?
How do I keep the localStorage data and the online-backend data in sync?
In particular, the user might not have an internet connection at the time he favorites a data item. Additionally, if the user favorites x data items in a row, I would need to make x update calls to the server db, which clearly isn't great.
So, how do people do it?
Does Angular have anything built-in for this?
Is there any plugin?
Any other framework?
This seems very much like a common problem that must have a well-known solution.
I think you've almost got the entire solution. All you need to do is periodically send the JSON out to a service that puts it in a database (I generally prefer PHP, but Python, Java, Ruby, Perl, whatever floats your boat). For example: on app start, load the data from the service if it is available, otherwise use the current local storage; then, with a timer and on app close, push the data if connected. If you're concerned with merging synchronization changes, you'll need timestamps on the data in local storage and in the database to make the right call on what should be inserted versus what should be updated.
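A sketch of that flow, with timestamps for the merge decisions and one batched call instead of x calls for x favorites; the endpoint URL is hypothetical:

```typescript
// A favorite is keyed by item id; the timestamp lets the server decide
// whether a row should be inserted or updated during a merge.
function addFavorite(itemId: string): void {
  const favs: Record<string, number> = JSON.parse(localStorage.getItem('favorites') ?? '{}');
  favs[itemId] = Date.now();
  localStorage.setItem('favorites', JSON.stringify(favs));
}

// One batched request instead of one call per favorited item.
async function syncFavorites(): Promise<void> {
  if (!navigator.onLine) return; // offline: try again on the next tick
  await fetch('https://api.example.com/favorites', { // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: localStorage.getItem('favorites') ?? '{}',
  });
}

// On app start and then periodically, as suggested above:
syncFavorites();
setInterval(syncFavorites, 60_000);
```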
I don't think there's a one-size-fits-all solution to this problem. Though I imagine someone may have crafted a library that handles the different potential scenarios, the configuration may end up as complicated as just writing the logic yourself.

Is Redis right for storing and retrieving messages against a user, a la Twitter?

I'm building a web app, primarily in PHP, but we need to pull down messages from Twitter and various other services (email, SMS). I'm writing a small service in Node.js to handle the Twitter connection etc., but am just trying to work out what is best to do with the content that is pulled down.
Right now I'm leaning towards a combination of MySQL for all our standard info in the main PHP app, and Redis with the Node.js service to store each message against a key that will probably be the username plus some sort of unique identifier.
I've used Redis before, but this data needs to persist rather than expire the way sessions do. Redis's in-memory nature makes me a bit nervous about this: over time, with this being our main message store, won't the dataset quickly become unruly in RAM?
This blog post gives a good and concise overview of NoSQL-type databases. Perhaps you can find confirmation for, or an alternative to, Redis there. Since you have not given any numbers on how much data you need to pull down and how often, it's hard for me to answer that part.
Also, Redis supports two methods of persistence: timed snapshots and an append-only journal file to which changes to the db are written. The second one is the safer alternative.
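To make this concrete, a sketch of the per-user key scheme together with a durability note, using ioredis; the key names are assumptions:

```typescript
import Redis from 'ioredis';

const redis = new Redis(); // connects to localhost:6379 by default

// To make this a durable message store rather than a cache, enable the
// append-only journal in redis.conf ("appendonly yes"); Redis replays
// it on restart, so messages survive a crash.
async function storeMessage(user: string, id: string, body: string): Promise<void> {
  // One hash per user: field = message id, value = message body.
  await redis.hset(`messages:${user}`, id, body);
}

async function getMessages(user: string): Promise<Record<string, string>> {
  return redis.hgetall(`messages:${user}`);
}
```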

Any recommendations for an effective way to sync data from one database to other apps' databases?

Here's my problem. I built a web app and naturally kept the data in a database that describes that app's domain. Afterwards, I built another web app for the same organization and used a separate database to describe that app's domain and store its data... and naturally a couple more projects came up, and for each app I've isolated its data in a single database. Development-wise, I think this is OK, as I can maintain changes to the data structure and data in each app's database.
Considering these apps belong to the same organization, plenty of data tends to be replicated between them, like department names, job titles, shop names, etc. Most of these tables hold the same data but are not exactly the same in each database, and are not always used by all of the apps. Changes to this data, though, need to be made in all the apps (sometimes in different ways), creating a growing management hassle.
So I've been thinking of a way to get some synchronization between the data. I want easier management (update in one app, or a central app, and update all the databases as needed by each app) and also a better way to share data between apps (like mashing up data from different apps in a new app to allow specific analysis). Most of the data I'm referring to is used as constraints more than as core domain concepts, describing the organization rather than a particular domain.
I'm looking for opinions on some ways to get this done.
My first idea was to take the common data structures, like the department names table I mentioned, and stick them in a core database. Any updates to the data would be made in this database through a dedicated web app, and I'd apply some sort of Observer or Publisher/Subscriber pattern to these changes: on a change, the core app would notify the observing apps (through their dedicated web services) that the change occurred, allowing each app to grab the new data and use it as it needs. GUIDs could be used as references to identify the same data throughout the apps. I could also build web services for read and search operations that don't need to be in a specific app's database but could be useful to it.
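A minimal sketch of the notify step under this first idea; the observer endpoints and payload shape are assumptions:

```typescript
// The core app keeps a registry of the observing apps' web service
// endpoints and pings each one when master data changes; the apps then
// pull the new data and use it as they need.
const observers = [
  'https://app-one.example.com/hooks/master-data', // hypothetical endpoints
  'https://app-two.example.com/hooks/master-data',
];

async function notifyObservers(guid: string, table: string): Promise<void> {
  await Promise.all(observers.map((url) =>
    fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The GUID identifies the same record across all the apps.
      body: JSON.stringify({ guid, table, changedAt: Date.now() }),
    })
  ));
}
```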
A second idea is for each app to manage its own data and for the apps to observe one another: a change in one would notify the others that share the same data structure. I could still use GUIDs and even build services on any of the apps. I think this would also involve less duplication of data, but it might be harder to manage, as each app would eventually be coupled to other apps, and I would somehow have to distribute responsibility for which app controls what information.
I'm really curious whether something of this kind of data distribution and syncing would work and would even be recommended. Opinions and other ideas are more than welcome!
What you describe here is a typical use case for a "Master Data Management" (MDM) system. EAI vendors (Oracle, TIBCO, IBM) offer such products. They resemble your first solution: a centralised database with synchronization processes that detect changes in external data sources, grab the changes, and synchronize the data out to the other external databases. They also provide a user interface for changing master data directly.
MDM software is expensive, but you can implement a custom solution which will be, at least initially, cheaper than purchasing one. Both of your solutions make technical sense, but there is a difference in their manageability.
The first one is better if you can dedicate a responsible person/organization to take care of it, and if the business owners of your services can agree to make changes via this new centralised system.
The second solution shares the responsibility between the service owners. The hard task here is to identify the owner of each type of information (business object).
I cannot recommend a solution without deeper knowledge of your systems and organization, but I hope this gives you some ideas.
