How to make App Engine Datastore private

I'm developing an App Engine app that lets users keep a diary.
I noticed that I can view all the data in the Datastore through the Developers Console.
For a diary app, that's a privacy problem.
So I want to know how to make the Datastore private, so that even I cannot read users' data.
Please help me.

This is a little bit tricky: the code can read the data in the datastore, so by definition anyone who can update the running code can also read the data. However, there are ways to at least make it more difficult to inadvertently examine the data (though accessing it will still be technically possible for you or any of the owners). The simplest is to encrypt the data before storing it in the datastore model objects, and decrypt it when you read it back. Be aware that encrypted fields can no longer be indexed, so you will need to decide whether that content really needs to be indexable, or whether it is worthwhile to add manual indexing.
If you want the data to not be readable by you at all, then you will need to encrypt/decrypt it with a key that is only available to your application while the user is interacting with it (e.g. by encrypting the data in the client that communicates with your server); however, be aware that this makes any sort of server-side indexing or background processing of the data impossible.
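To make the first approach concrete, here is a minimal sketch in TypeScript using Node's built-in crypto module and the @google-cloud/datastore client. The DiaryEntry kind, the property names, and the key handling (an environment variable) are assumptions for illustration; in a real app the key should come from a proper secret manager.

```typescript
import * as crypto from "crypto";
import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();
// Assumption: a 32-byte AES key delivered out of band (never checked into source).
const key = Buffer.from(process.env.DIARY_KEY_BASE64 ?? "", "base64");

// Encrypt with AES-256-GCM; prepend the IV and auth tag so the blob is self-contained.
function encrypt(plaintext: string): Buffer {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decrypt(blob: Buffer): string {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]).toString("utf8");
}

// Store the diary text encrypted; exclude it from indexes, since an encrypted
// blob cannot be usefully indexed anyway.
async function saveEntry(userId: string, text: string): Promise<void> {
  await datastore.save({
    key: datastore.key(["DiaryEntry"]),
    excludeFromIndexes: ["body"],
    data: { userId, body: encrypt(text), createdAt: new Date() },
  });
}

async function readEntry(id: number): Promise<string> {
  const [entity] = await datastore.get(datastore.key(["DiaryEntry", id]));
  return decrypt(entity.body);
}
```

AES-256-GCM is used here because it authenticates as well as encrypts; any authenticated cipher would do.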

The only way to prevent you from viewing data in the datastore is to remove you from the developers of the app. A developer can always extract data if he wants to, either by looking at it directly in the Datastore viewer or by writing code that reads and forwards the data.

Related

Storing Application Configuration in AD

I am trying to write a small application that will run on all the domain controllers at my company.
Since all the DCs need to have the same, fairly static config, I thought it might be sane to store the configuration in AD itself. I imagine writing a GUI config editor that manipulates the AD based config.
At first glance, Application Partitions would seem like the right tool for the job.
The first question is: is this just generally a terrible idea? Would pro sysadmins get angry at doing this? Or will this require some high-inertia operation like schema changes?
The second question is: is there a specific object type that would be well suited for storing either JSON blobs or key-value pairs?
And the last question is: Are there better alternatives?
I found a post from a decade ago which touches on this, but things can change rather a lot after 3 major OS releases.
This might only make sense if you were planning on storing per-user configuration on the user's AD object. But even then, anyone in your organization who has access to update AD will be able to change those values in ways your application may not expect.
is there a specific object type that would be well suited for storing either JSON blobs or key-value pairs?
No AD attribute is designed for that. At best, you might be able to find a string attribute that has a big enough max length that you could store some JSON value. But performance would be terrible if you want to search for a JSON value using that attribute.
Are there better alternatives?
Yes. The best solution is to use a dedicated database for your application. You can structure it the way you need, and restrict access to only your application.
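For illustration, here is a minimal sketch of that dedicated-database approach in TypeScript using the better-sqlite3 package; the table layout and the get/set helpers are assumptions, not part of any AD or Windows tooling.

```typescript
import Database from "better-sqlite3";

// A tiny key-value configuration store; one row per setting.
const db = new Database("app-config.db");
db.exec(`CREATE TABLE IF NOT EXISTS config (
  key   TEXT PRIMARY KEY,
  value TEXT NOT NULL
)`);

export function setConfig(key: string, value: unknown): void {
  // Store values as JSON so structured settings round-trip cleanly.
  db.prepare(
    "INSERT INTO config (key, value) VALUES (?, ?) " +
      "ON CONFLICT(key) DO UPDATE SET value = excluded.value"
  ).run(key, JSON.stringify(value));
}

export function getConfig<T>(key: string): T | undefined {
  const row = db.prepare("SELECT value FROM config WHERE key = ?").get(key) as
    | { value: string }
    | undefined;
  return row ? (JSON.parse(row.value) as T) : undefined;
}
```

better-sqlite3 stands in here for whatever central database all your DCs can reach; the point is the key-value shape and restricted access, not the particular engine.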

Message storage duplication for messaging systems

In many sub-system designs for messaging applications (Twitter, Facebook, etc.) I notice duplication in where user message history is stored. On the one hand, they use a tokenizing indexer like Elasticsearch or Solr, which is good for search. On the other hand, they still use some sort of DB for history. Why duplicate? Why can't the same instance of ES/Solr/EarlyBird be used for history? It is, in fact, capable of it.
The usual problem is the following: you want to search, and ideally you also want to be able to index the data in a different manner later (e.g. wipe the index and try a new analyzer that you forgot to include initially). Separating the data source from the index makes the system less coupled, and you don't have to worry about losing data in Elasticsearch/Solr.
I am usually strongly against calling Elasticsearch/Solr a database, since in fact it isn't one. For example, neither has support for transactions, which makes your life harder if you want to update multiple documents following standard relational logic.
Last, but not least: one of the hardest operations in Elasticsearch/Solr is retrieving stored values, since neither is particularly optimised for it, especially if you want to return 10k documents at once. Here a separate datasource also helps, since you can return only the matched document ids from Elasticsearch/Solr, then retrieve the needed content from the datasource and return it to the user.
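A minimal sketch of that ids-only pattern in TypeScript with the @elastic/elasticsearch client; the messages index, the query, and the fetchFromDatabase stub are assumptions for illustration.

```typescript
import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: "http://localhost:9200" });

// Stand-in for the primary datastore lookup, e.g. SELECT ... WHERE id IN (...).
async function fetchFromDatabase(ids: string[]): Promise<object[]> {
  return ids.map((id) => ({ id })); // hypothetical; replace with a real query
}

async function searchMessages(text: string): Promise<object[]> {
  // Ask Elasticsearch for matching ids only; skip the stored _source entirely.
  const result = await es.search({
    index: "messages",
    _source: false,
    size: 100,
    query: { match: { body: text } },
  });
  const ids = result.hits.hits.map((hit) => String(hit._id));
  // Fetch the authoritative content from the primary datastore.
  return fetchFromDatabase(ids);
}
```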
The summary is simple: Elasticsearch/Solr should be thought of as search engines, not as data storage.
It's true that ES is NOT a database per se, and it never will be. But no one says you cannot use it as one, and many people actually do. It really depends on your specific use case(s), and in the end it's all a question of the trade-offs you are ready to make to support your specific needs. As with pretty much any technology, there is no one-size-fits-all approach, and with ES (and the like) it's no different.
A primary source of truth is not necessarily a relational DBMS, and it is not necessarily "duplicating" the data in the sense you meant; it can be anything that holds a copy of your data and allows you to rebuild your ES indexes in case something goes wrong. I've seen many, many different "sources of truth". It could simply be:
your raw flat files containing your historical logs or business data
Kafka topics that you can replay anytime easily
a snapshot that you take from ES on a regular basis (see the sketch after this list)
a relational DB
you name it...
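For the snapshot option above, here is a minimal sketch of driving Elasticsearch's snapshot REST API from TypeScript; the repository name, filesystem location, and cluster address are assumptions (and the location must be whitelisted via path.repo in elasticsearch.yml):

```typescript
const ES = "http://localhost:9200"; // hypothetical cluster address

async function takeNightlySnapshot(): Promise<void> {
  // Register a filesystem snapshot repository (idempotent, so safe to repeat).
  await fetch(`${ES}/_snapshot/nightly_repo`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ type: "fs", settings: { location: "/backups/es" } }),
  });

  // Take a named snapshot, e.g. from a nightly cron job.
  const stamp = new Date().toISOString().slice(0, 10);
  await fetch(`${ES}/_snapshot/nightly_repo/snap-${stamp}?wait_for_completion=true`, {
    method: "PUT",
  });
}
```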
The point is that if something goes wrong for any reason (and that happens), you want to be able to recreate your ES indexes, be it from a real DB, from backups or from raw data. You should see that as a safety net. Even if all you have is a MySQL DB, you usually have a backup of it, so you're already "duplicating" the data in some way.
One thing you need to think about when architecting your system, though, is that you might not need the entirety of your data in ES. Since ES is a search and analytics engine, you should only store in it what is necessary to support your search and analytics needs, and you should be able to recreate that information at any time. In the end, ES is just one subsystem of your whole architecture, just like your DB, your messaging queue or your web server.
Also worth reading: Using ElasticSearch as primary source for part of my DB

Restricting data in PouchDB

I have an offline-ready application that I am currently building in Electron.
The core requirements are that all data is restricted (you have to be a user to read or write) and that within that data, some data is further restricted to a particular user (account information, messages, etc.).
Now, I do not want to replicate offline any data that a user should not have access to (because all the data can be seen using the devtools regardless of restriction), so essentially I only want to sync to PouchDB's offline store the data that the user has access to, plus the data that all users have access to.
Now I have read the following posts/guides but I am still a little confused.
https://pouchdb.com/2015/04/05/filtered-replication.html
https://www.joshmorony.com/creating-a-multiple-user-app-with-pouchdb-couchdb/
Restricting Access to local PouchDB
From my understanding, filtered replication is a bad choice performance-wise, even though it could do what I want.
Setting up a proxy would work, but then it essentially becomes a REST API and the data synchronization falls apart.
And the final option which I think is what I want is to have a database for every user that would contain their private information and then additional databases to hold the information that is available to every user.
The only real question I have with this approach is how to handle data that is private but shared between two users (messages, etc.).
I am more after an overarching view of how the data should be stored as opposed to code examples, just really struggling with the conceptual architecture of the application.
There are many solutions to your problem. One solution looks very promising: IBM Cloudant has started work on Cloudant Envoy, a proxy simulating the CouchDB interface instead of a simple REST API. You can read more about it on the site for Envoy over at ibm.com. A custom replicator for PouchDB is also available on Github.
There's also a blog post on Medium.com on this.
The idea is the same as the much older Couchbase Sync Gateway. Although Couchbase has common roots with CouchDB, I have not tracked if they still support replication with CouchDB.
The easiest way to start would be to create a single database per user on the server, and a common database that you just pull the shared data from. Let me know if you need more info on this solution.
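As a rough sketch of that layout in TypeScript with the pouchdb package (the database names, the remote CouchDB URL, and the per-conversation pattern for data shared between two users are assumptions for illustration):

```typescript
import PouchDB from "pouchdb";

const SERVER = "https://couch.example.com"; // hypothetical CouchDB host

function openUserDatabases(userId: string) {
  // Private per-user database: two-way, live sync.
  const privateLocal = new PouchDB(`private-${userId}`);
  const privateRemote = new PouchDB(`${SERVER}/userdb-${userId}`);
  privateLocal.sync(privateRemote, { live: true, retry: true });

  // Shared database: every user may read it, so replicate it one way,
  // from the server down to the local store.
  const sharedLocal = new PouchDB("shared");
  PouchDB.replicate(`${SERVER}/shared`, sharedLocal, { live: true, retry: true });

  return { privateLocal, sharedLocal };
}

// For data that is private but shared between two users (e.g. a conversation),
// one common CouchDB pattern is a dedicated database per pair or group,
// synced by each participant:
function openConversation(convId: string) {
  const local = new PouchDB(`conv-${convId}`);
  local.sync(`${SERVER}/conv-${convId}`, { live: true, retry: true });
  return local;
}
```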

OrderCloud.io API XP Size Limit

I'm part of a team developing an ecommerce site through the ordercloud.io API. I'm trying to figure out how much information I can store in a JSON object's xp key. The project I'm working on requires some pretty custom configuration and data storage, and I want to make sure the process I come up with will scale well.
Thanks!
The XP value in OrderCloud is limited to a maximum string size of 8000 characters. However, you really shouldn't try to reach that limit, because it can drag down the performance of your API requests (that's a lot of data to send over the wire, especially if you are retrieving a list of objects).
Make sure you review existing features to see if there is a way to accomplish what you are trying to store or the behavior you are trying to create. Or consider storing the data in a separate location (your own DB, or an integration that helps fulfill the feature).
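For example, here is a small guard you could run before saving xp, assuming the 8000-character limit applies to the serialized JSON (worth confirming against the current OrderCloud docs):

```typescript
const XP_MAX_CHARS = 8000; // limit described above; confirm against current docs

// Throws if the serialized xp payload would exceed the limit.
function assertXpFits(xp: unknown): void {
  const size = JSON.stringify(xp).length;
  if (size > XP_MAX_CHARS) {
    throw new Error(`xp is ${size} chars; the limit is ${XP_MAX_CHARS}`);
  }
}

// Example: validate before a hypothetical product update call.
assertXpFits({ customConfig: { theme: "dark", badges: ["sale", "new"] } });
```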

Persisting and keeping mobile app data in sync with an online backend

I am building a mobile app using AngularJS and PhoneGap. The app allows the user to access a large number of data items, which ship with the app as a set of .json files.
One use case is that a user can favorite any of those data items.
Currently, I store the ids of the favorited items in localStorage. It works and it's great and very simple.
But now I would like to create an online backend for the app. By this I mean that the ids of the favorited items should also be stored on a server somewhere, in some form of database.
Now my question is:
How do I best do this?
How do I keep the localStorage data and the online-backend data in sync?
In particular, the user might not have an internet connection at the time when he favorites a data item. Additionally, if the user favorites x data items in a row, I would need to make x update calls to the server DB, which clearly isn't great.
So, how do people do it?
Does Angular have anything built-in for this?
Is there any plugin?
Any other framework?
This very much seems like a common problem that must have a well-known solution?
I think you've almost got the entire solution. All you need to do is periodically send the JSON out to a service that puts it in a database: on app start, load the data from the service if it is available, otherwise use the current localStorage; then update the server copy on a timer and on app close, whenever you are connected. (I generally prefer PHP for the service, but Python, Java, Ruby, Perl, whatever floats your boat.) If you're concerned with merging synchronization changes, you'll need timestamps on the data in localStorage and in the database to make the right call on what should be inserted vs. what should be updated.
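A minimal sketch of that outbox-style approach in TypeScript (the /api/favorites endpoint and the payload shape are assumptions); it batches favorites locally until a connection is available, so x favorites in a row become one request:

```typescript
interface FavoriteChange {
  itemId: string;
  favorited: boolean;
  updatedAt: number; // timestamp used to resolve conflicts on the server
}

const QUEUE_KEY = "favorites-outbox";

// Record a change locally and queue it for upload.
function toggleFavorite(itemId: string, favorited: boolean): void {
  const queue: FavoriteChange[] = JSON.parse(localStorage.getItem(QUEUE_KEY) ?? "[]");
  queue.push({ itemId, favorited, updatedAt: Date.now() });
  localStorage.setItem(QUEUE_KEY, JSON.stringify(queue));
  void flushQueue(); // try immediately; harmless if offline
}

// Send all pending changes in one batched request when online.
async function flushQueue(): Promise<void> {
  const queue: FavoriteChange[] = JSON.parse(localStorage.getItem(QUEUE_KEY) ?? "[]");
  if (queue.length === 0 || !navigator.onLine) return;
  try {
    await fetch("/api/favorites", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(queue),
    });
    localStorage.removeItem(QUEUE_KEY); // server accepted the batch
  } catch {
    // Still offline or server error: keep the queue and retry later.
  }
}

// Retry whenever connectivity comes back, and on app start.
window.addEventListener("online", () => void flushQueue());
void flushQueue();
```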
I don't think there's a one-size-fits-all solution to the problem. Though I imagine someone may have crafted a library that handles the different potential scenarios, the configuration may be as complicated as just writing the logic yourself.
