REST status code and eventual consistency? (Google App Engine)

I have a RESTful web service that runs on the Google App Engine, and uses JPA to store entities in the GAE Data Store.
New entities are created using a POST request (as the server will generate the entity ID).
However, I am uncertain as to the best status code to return, as the GAE DS is eventually consistent. I have considered the following:
200 OK: The RFC states that the response body should contain “an entity describing or containing the result of the action”. This is achievable, as the entity is updated with its generated ID when it is persisted to the DS, so it is possible to serialize and return the updated entity straight away. However, subsequent GET requests for that entity by ID may fail, as all nodes may not yet have reached consistency (this has been observed as a real-world problem for my client application).
201 Created: As above, returning a URI for the new entity may cause the client problems if consistency has not yet been reached.
202 Accepted: Would eliminate the problems discussed above, but would not be able to inform the client of the ID of the new entity.
What would be considered best practice in this scenario?
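For reference, here is roughly what the 201 variant would look like on my side. This is only a minimal JAX-RS sketch; Widget, its getId() accessor, and the EntityManager wiring are placeholders for my actual entity and persistence setup.

```java
import javax.persistence.EntityManager;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

@Path("/widgets")
public class WidgetResource {

    private final EntityManager em;  // placeholder: however you obtain your EntityManager

    public WidgetResource(EntityManager em) {
        this.em = em;
    }

    @POST
    public Response create(Widget widget, @Context UriInfo uriInfo) {
        em.getTransaction().begin();
        em.persist(widget);               // the datastore assigns the ID during persist/commit
        em.getTransaction().commit();

        // 201 Created: Location points at the new resource and the body carries
        // the persisted entity, so the client has the generated ID immediately.
        return Response
                .created(uriInfo.getAbsolutePathBuilder()
                                .path(String.valueOf(widget.getId()))
                                .build())
                .entity(widget)
                .build();
    }
}
```

The idea is that the Location header and the response body together give the client the generated ID right away, even if an immediate follow-up query might not see the entity yet.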

A get by key will always be consistent, so a 200 response would be OK based on your criteria, unless there is a problem in Google land. Are you certain the problems you observed came from gets rather than queries? There is a difference between a query selecting a KEY and a GET by key.
For a query to be consistent it must be an ancestor query; a GET, by contrast, is always consistent. Anything else may see inconsistent data, as the indexes may not yet have been updated.
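To make the distinction concrete, here is a minimal sketch using the low-level App Engine Datastore API (not JPA); the "Child" kind and parentKey are just placeholders:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;

public class ConsistencyExamples {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Strongly consistent: a get by key always sees the latest committed write.
    Entity getByKey(Key key) throws EntityNotFoundException {
        return datastore.get(key);
    }

    // Strongly consistent: an ancestor query is scoped to a single entity group.
    Iterable<Entity> childrenOf(Key parentKey) {
        Query q = new Query("Child").setAncestor(parentKey);
        return datastore.prepare(q).asIterable();
    }

    // Eventually consistent: a global (non-ancestor) query relies on indexes
    // that may not have caught up with a recent write yet.
    Iterable<Entity> allChildren() {
        Query q = new Query("Child");
        return datastore.prepare(q).asIterable();
    }
}
```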
This is all assuming there isn't an actual problem in Google land. We have seen problems in the past, where datacenters were late replicating and eventual consistency lagged badly, sometimes by hours.
But you have no way of knowing that, so you either have to assume all is OK, or take an extremely pessimistic approach.

It depends on which JSON REST protocol you are using. Just always returning a plain JSON object is not very RESTful.
You should look at some of these:
jsonapi.org/format/
http://stateless.co/hal_specification.html
http://amundsen.com/media-types/collection/
To answer your question:
I would prefer a format where the resource itself is aware of its URL, so I would use 201 but also return the whole resource.
The easiest way would be to use JSON API with a convenient URL scheme, so you are able to find a resource by URL because you know the ID.
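As a rough sketch of what that could look like when building the response (Java here only for illustration; Widget and the JSON:API-style document shape are assumptions, not a fixed wire format):

```java
import java.net.URI;
import java.util.Map;
import javax.ws.rs.core.Response;

public class WidgetResponses {

    // Hypothetical wrapper: a JSON:API-style payload where the resource
    // carries its own canonical URL in links.self.
    public static Response created(Widget widget, URI baseUri) {
        URI self = baseUri.resolve("widgets/" + widget.getId());
        Map<String, Object> body = Map.of(
                "data", Map.of(
                        "type", "widgets",
                        "id", String.valueOf(widget.getId()),
                        "attributes", widget),
                "links", Map.of("self", self.toString()));
        // 201 Created, Location header plus the full resource in the body.
        return Response.created(self).entity(body).build();
    }
}
```

With a predictable URL scheme like this, the client can derive the resource URL from the ID (or read it straight from links.self) without an extra round trip.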

Related

Datomic ids in datascript

I'm using Datomic on the server side, with multiple Reagent atoms on the client, and am now looking at trying DataScript on the client.
Currently, I'm passing across a nested structure via an initial API load, which contains the result of a Datomic pull query. It's pretty concise, and works fine.
However, I'm now looking to explore the potential benefits of DataScript. The selling point is that it seems to let you retain normalisation right down to the attribute level. I've come across an initial hurdle, though. DataScript isn't, as I'd imagined (perhaps hoped...), a way to just take a subset of your Datomic db and replicate it on the client. The problem is that Datomic's entity ids cannot be shared with DataScript; specifically, when you transact! entities into DataScript, a new eid (DataScript's) is issued for each entity.
I haven't worked through all of the consequences yet, but it appears it would be necessary to store :datomic-id in DataScript in addition to DataScript's own newly issued :db/id, and ref types are going to use DataScript's id, not Datomic's. This potentially complicates synchronisation back to Datomic, feels like it could create a lot of potential gotchas, and isn't as isomorphic as I'd hoped. But I'm still working on it. Can anyone share experience here? Maybe there's a solution...
Update:
I wonder if a solution is to ban use of Datomic's :db/id on the client, enforcing this by filtering them out of the initial load and not passing them to the client at all. Then any client -> server communication would have to use the (server-generated) slugs instead, which are passed in the initial load.
So, all entities would have different ids on the client, but since we ban passing the server id to the client, a client id accidentally passed to the server should simply result in an "eid not found" error. There are likely more issues with this; I haven't worked it right through yet.
You also have to think in entities, not datoms, when passing data to and inserting it into the client, so as to create the correct refs there (or perhaps you could insert a tree, if you can wrangle that).
So I've discovered that the Datomic/DataScript partnership certainly isn't just a case of 'serialise a piece of your database'. That might work if you're using DataScript on the server, which is not the use case here at all (db persistence being required).
If I remember correctly, Datomic uses all 64 bits for entity ids, but in JavaScript (and by extension in DataScript) integers are only exact up to 53 bits. So some sort of translation layer is necessary either way; there's no way around it.
P.S. You can totally set :db/id to whatever you want in DataScript and it'll use that instead of generating its own. Just make sure it fits in 53 bits.

AngularJS + Breeze + Security

We are trying to use AngularJS with Breeze, with a .NET backend. We have gotten things hooked up and working together. However, we are having trouble figuring out how to lock things down based on the user's role and the user's own data.
Can anyone point us in the general direction? We couldn't find anything explicit in Breeze's documentation.
There is no reason why Breeze should be insecure. Security is orthogonal. My question remains: what are your concerns?
Update 2 March 2015
Thanks for the clarifying comment ... which reflects concerns that are widely shared. I really am going to have to write about this at length in our documentation.
Fortunately, I believe I can ease your mind about the specific issues you raised.
BreezeJS, the client library, can only reach the data that your server allows the current user to access. It's the server's job to grant or refuse such requests.
This is fundamentally the same story for a client written with any technology talking to a server written with any technology. If the server has a "Customers" endpoint, then a client can request all of your customers and will receive them unless you guard that endpoint with logic on the server. This is true with or without Breeze.
You may be thinking that the metadata describes your entire database schema and therefore exposes the entire database to Breeze client requests. That statement is not true on a couple of counts.
First, even if the client knows about your entire database schema, it can't do anything with that knowledge unless you go to the trouble of exposing every table in your web api with unguarded endpoints. This is entirely within your control and it's not something you can do by accident.
Second, there is no good reason to send metadata that describes your entire database. If you let the server generate the metadata based on the Entity Framework model, you can easily limit the size and shape of that model to the subset of the database that you want to expose in your client-facing api.
After you've narrowed the model and the web api to the size and shape appropriate for your application, you must take the next step ... the step you'd take for any web api imaginable ... which is to guard the endpoints.
At a minimum that means ensuring that the user is authenticated and authorized to make requests of each endpoint. But it also means preventing unwanted responses even to authorized user requests. For example, you might want to limit on the server the number of Customers that can be returned for any given customer query. You might want to throttle the number of requests that you'll process in a single interval of time. You might want to filter the customers down to just those few that the user is allowed to see.
The techniques for doing these things are all part of the ASP.NET Web API itself, having nothing to do with Breeze whatsoever. You'll want to investigate the options that Web API has to offer.
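The sketch below is not Breeze or Web API code; it is a deliberately generic illustration (in JAX-RS/Java terms, with made-up Customer and CustomerRepository types) of the guard logic just described: authorize the caller, cap the result size, and filter the rows to what that user is allowed to see. The same shape applies to an ASP.NET Web API controller.

```java
import java.util.List;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.ForbiddenException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.SecurityContext;

@Path("/customers")
public class CustomerEndpoint {

    private static final int MAX_PAGE_SIZE = 50;

    private final CustomerRepository repository;  // hypothetical data access

    public CustomerEndpoint(CustomerRepository repository) {
        this.repository = repository;
    }

    @GET
    public List<Customer> list(@Context SecurityContext security,
                               @QueryParam("top") @DefaultValue("20") int top) {
        // Authentication/authorization: refuse callers without the right role.
        if (!security.isUserInRole("customer-reader")) {
            throw new ForbiddenException();
        }
        // Cap the page size regardless of what the client asked for.
        int pageSize = Math.min(top, MAX_PAGE_SIZE);
        // Filter to the rows this user is allowed to see.
        String userId = security.getUserPrincipal().getName();
        return repository.findVisibleTo(userId, pageSize);
    }
}
```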
The update side of things is actually much easier to manage with Breeze components for ASP.NET Web API. The conventional Breeze update mechanism is a batch post to a single SaveChanges endpoint. In other words, the surface area of the attack can be limited to a single endpoint. The Breeze SaveChanges method for .NET offers two interception points for security and integrity checks:
BeforeSaveEntity where you can inspect and confirm every entity individually before it gets saved to the database.
BeforeSaveEntities where you can inspect the entire batch as a whole ... to ensure that the save request is cohesive and legitimate. This is something you can't do in a simple REST-ish api where PUT/POST/DELETE requests arrive as separate, autonomous events
The Breeze query language is highly expressive so it is possible that the client may query the server for something that you were not expecting. The expand clause is probably the most "dangerous" in this regard. Someone can query the Customer endpoint and get their related Orders, their related OrderDetails, the related Products, etc ... all at the same time.
That is a great power, and with it comes responsibility. You may choose to withhold that power by refusing to allow expand queries. You can also refuse select queries that "expand" by selecting related entities. The ASP.NET Web API makes it easy to impose these restrictions.
Alternatively, you can allow expand in some cases and not others. Or you can inspect the query request within the GET endpoint's implementation and refuse it if it fails your security checks.
You could decide that you don't want certain entity types to be "queryable" at all. You can create just the specialized GET endpoints you need to support safe access to those highly sensitive types. If the supporting methods don't return IQueryable, neither Breeze nor Web API will attempt to turn the OData query parameters into LINQ queries. These endpoints look and behave exactly like the traditional REST-ish apis that are familiar to you now. And the Breeze client will be happy to play along. When you compose a Breeze query, the client doesn't know whether the server will honor that request. With Breeze you can compose any request you want and send it to any HTTP endpoint you desire. You're not limited to OData-style queries.
You don't need ONE approach for querying. You decide what entity types are exposed, how, and under what conditions. You can and should write the guard logic to ensure that the proper data are returned by a query or updated by a save ... just as you would for any web api. Both Breeze and Web API give you a rich set of tools for writing such guard logic. Your options are unbounded.
Finally, I observe that Breeze-oriented apis tend to be much smaller than the typical RESTy api ... that is, they offer fewer endpoints and (in this sense) a smaller surface area. As a practical matter, that means you can concentrate your server-side security budget on fewer methods and potentially improve both the quality of that code and your capacity to scrutinize your api's security risks.

Sending out Database Document Ids (Security)

I have a web app that stores objects in a database and then sends emails based on changes to those objects. For debugging and tracking, I am thinking of including the Document Id in the email metadata. Is there a security risk here? I could encrypt it (AES-256).
In general, I realize that security through obscurity isn't good practice, but I am wondering if I should still be careful with Document Ids.
For clarity, I am using CouchDB, but I think this can apply to databases in general.
By default, CouchDB uses UUIDs with a UTC time prefix. The worst you can leak there is the time the document was created, and an observer will be able to correlate roughly 1k worth of IDs as likely having been produced on the same machine.
You can change this in the CouchDB configuration to use purely random 128-bit UUIDs by setting the algorithm option within the uuids section to random. For more information, see the CouchDB docs. Nothing should be gainable from those IDs.
Edit: If you choose your own document IDs, of course, you leak whatever you put in there :)
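If you do pick your own IDs and want them to carry no information, one option is to generate a random (version 4) UUID in application code; a trivial Java sketch (CouchDB can also do this for you via the random algorithm setting mentioned above):

```java
import java.util.UUID;

public class DocumentIds {
    // A version 4 (random) UUID carries no timestamp or host information,
    // so using it as a document ID leaks nothing beyond its existence.
    public static String newDocumentId() {
        return UUID.randomUUID().toString().replace("-", "");
    }
}
```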
Compare Convenience and Security:
Convenience:
how useful is it for you to have the document id in the mail?
can you quickly get useful information / the document, given the ID?
does encrypting/hashing it make it harder to get at the actual database document? (the answer here is yes, unless you have a nice lookup form/something which takes the hash directly; avoid manual steps)
Security:
having a document ID, what could I possibly do that's bad?
let's say you have a web application to look at documents... you have the same ID in a URL, so it can't be considered 'secret'
if I have the ID, can I access the document or some other information I shouldn't be able to access? Hint: you should always properly check rights (see the sketch after this list); if that's done, then you have no problem.
as long as an ID isn't considered 'secret', meaning there aren't any security checks based purely on the ID, you should have no problems.
do you care if someone finds out the time a document was created? (from Jan Lehnardt's answer)
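Here's the sketch referenced above: a hypothetical document endpoint (in JAX-RS/Java terms, with made-up DocumentStore and AccessPolicy types) where knowing the ID is never enough on its own; the rights check always runs before anything is returned.

```java
import javax.ws.rs.ForbiddenException;
import javax.ws.rs.GET;
import javax.ws.rs.NotFoundException;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.SecurityContext;

@Path("/documents")
public class DocumentEndpoint {

    private final DocumentStore store;   // hypothetical persistence layer
    private final AccessPolicy policy;   // hypothetical authorization rules

    public DocumentEndpoint(DocumentStore store, AccessPolicy policy) {
        this.store = store;
        this.policy = policy;
    }

    @GET
    @Path("{id}")
    public Document get(@PathParam("id") String id, @Context SecurityContext security) {
        Document doc = store.find(id);
        if (doc == null) {
            throw new NotFoundException();
        }
        // The ID alone is never enough: check the caller's rights explicitly.
        if (!policy.canRead(security.getUserPrincipal(), doc)) {
            throw new ForbiddenException();
        }
        return doc;
    }
}
```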

Is OData suitable for a multi-tenant LOB application?

I'm working on a cloud-based line-of-business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and together there are several million docs stored. I use SQL Server.
Today I have a somewhat-RESTful API which allows users to pass in a DocumentSearchQuery entity in which they supply keywords together with the requested sort order and paging info. They get a DocumentSearchResult back, which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to entity types other than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
There's no built-in limit on what fields users can query, which means that either the perf will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I will need to do my own parsing of incoming queries again.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on what fields users can query, which means that either the perf will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
True. However, you can check the filter against a list of allowed fields and permit or deny the operation accordingly.
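How you do that depends on the stack; as a technology-neutral illustration (not WCF DS code), a crude guard over the raw $filter string against an allowlist of indexed fields might look like the sketch below. The field names are hypothetical, and a real implementation should walk the parsed filter expression rather than scanning tokens.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterGuard {

    // Hypothetical allowlist: only these indexed properties may appear in $filter.
    private static final Set<String> ALLOWED_FIELDS = Set.of("Title", "CreatedOn", "OwnerId");

    // OData operators and functions that are not field names.
    private static final Set<String> KEYWORDS = Set.of(
            "eq", "ne", "gt", "ge", "lt", "le", "and", "or", "not",
            "true", "false", "null", "startswith", "endswith", "substringof", "contains");

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    public static boolean isAllowed(String rawFilter) {
        // Drop quoted string literals so their contents are not mistaken for field names.
        String withoutLiterals = rawFilter.replaceAll("'[^']*'", " ");
        Matcher m = IDENTIFIER.matcher(withoutLiterals);
        while (m.find()) {
            String token = m.group();
            if (KEYWORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue;
            }
            if (!ALLOWED_FIELDS.contains(token)) {
                return false;   // unknown or unindexed field: deny the request
            }
        }
        return true;
    }
}
```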
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I will need to do my own parsing of incoming queries again.
Yes, there is a provider for EF. That means if you use something else in the future you will need to write your own provider. If you end up replacing EF, you probably made the decision too early. I don't recommend WCF DS in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
There isn't any out-of-the-box support for that in WCF Data Services, right. However, that is part of the authorization mechanism you will need to implement anyway. But I have good news for you: it is pretty easy to do with QueryInterceptors, which simply intercept the query and restrict it based on the user's privileges. This is something you would have to implement regardless of the technology you use.
My answer: Considering the above, WCF Data Services is a suitable choice in a multi-tenant environment where customers write their own clients to access the entities, at least as long as you stay with EF. And you should keep in mind the huge amount of effort it saves you.

The best practice for communication with a database - POST or GET?

For example, I want to search/insert/get/delete data in a database, and I'm working with a WCF RESTful service.
I have one method for getting data from the table, one method for searching in the table, one method for inserting data in the table and one method for deleting data from the table.
I know that each of these methods can be either POST or GET.
But, what is smartest? What is the best practice?
My opinion is that the search and get methods should be GET. The insert and delete methods should be POST.
Am I right?
You are right. The thing about GET is that it should be idempotent, as the client (browser) is free to send repeat GETs any time it wants. A POST, however, can only be sent once (according to the rules).
So anything that changes anything should be a POST. Strictly speaking the delete could be a GET as well, since resending the GET would not hurt the delete, but generally it's better if you respect the spirit of the HTTP protocol. See HTTP RFC 2616 for more details.
Wikipedia has a good overview of the HTTP verbs and their use.
If I were you, I'd use:
GET for search and get operations (since they will not modify data; it's safe to call these operations multiple times)
POST for the insert operation
DELETE for the delete operation
(IIS has no problem with the DELETE verb.)
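The question is about a WCF REST service, so the following is only an analogous sketch of the same verb mapping in JAX-RS/Java terms; Item and ItemRepository are made up.

```java
import java.util.List;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/items")
public class ItemResource {

    private final ItemRepository repository;  // hypothetical data access

    public ItemResource(ItemRepository repository) {
        this.repository = repository;
    }

    // Safe, repeatable reads: GET.
    @GET
    @Path("{id}")
    public Item get(@PathParam("id") long id) {
        return repository.find(id);
    }

    @GET
    public List<Item> search(@QueryParam("q") String query) {
        return repository.search(query);
    }

    // Creates a new row: POST.
    @POST
    public Response insert(Item item) {
        Item saved = repository.save(item);
        return Response.status(Response.Status.CREATED).entity(saved).build();
    }

    // Removes a row: DELETE (not GET, even though a repeat would be harmless).
    @DELETE
    @Path("{id}")
    public void delete(@PathParam("id") long id) {
        repository.delete(id);
    }
}
```

The point is simply that the verb declares intent: repeatable reads on GET, state changes on POST and DELETE.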
Yes, that's the convention.
Use POST for operations that change data or system state. Use GET for queries that don't change anything.
Rails, for example, enhances this by also using PUT and DELETE, but these are not supported by most web servers (so there's a workaround for this).
References:
Nginx does not include support for PUT and DELETE by default: sorry, only the Russian doc is available.
Same for Apache.
These two have 70% of the market.
