Google App Engine Servlet Design

I have built a server on GAE that handles 6 different types of requests over HTTP POST, all of which involve creating, updating, or deleting objects in the datastore. What is the best design for this? I will describe my current design and then a couple of alternatives.
My current design sends all requests to the same servlet and uses an "action" parameter in the POST to distinguish and handle the different requests. The code the server should run for each action is inlined in the servlet.
e.g.
public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String action = request.getParameter("action");
    if ("action_1".equals(action)) { /* ..code.. */ }
    else if ("action_2".equals(action)) { /* ..code.. */ }
    // ...
    else if ("action_n".equals(action)) { /* ..code.. */ }
}
2. Similar to the above, but instead of running the code here, this servlet acts as a centralized dispatcher and forwards each action to its own dedicated servlet.
3. Have a dedicated servlet for each action.
What are the pros and cons of the above designs, and what is the preferred way to set up a server on GAE? Does accessing the datastore have an impact on my design?

I am in a similar situation. I started out with your option 1, which works fine. The only problem is that it requires a lot of argument parsing (converting strings to integers and whatnot) as well as a manual mapping of command names to methods. Options 2 and 3 are equally laborious, but even worse because you have to create a bunch of auxiliary methods. If I had to do it all over again I would use a library that does all that work for me, like this one (I am in fact considering converting to it): http://code.google.com/p/json-rpc/. Voila, no argument parsing or manual creation of helper classes! This one happens to implement a JSON-RPC client-server interface, which is good if you are doing an ajax "thick client." If you are generating most of your HTML on the server side you might want another solution.
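To make the trade-off concrete, here is a minimal sketch of the table-driven dispatch that such a library automates; the Action interface and the registered handlers are hypothetical names, not part of any library:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ApiServlet extends HttpServlet {

    // One handler per action, registered once, instead of a growing if/else chain.
    interface Action {
        void execute(HttpServletRequest req, HttpServletResponse resp) throws IOException;
    }

    private final Map<String, Action> actions = new HashMap<String, Action>();

    @Override
    public void init() {
        actions.put("action_1", new Action() {
            public void execute(HttpServletRequest req, HttpServletResponse resp) throws IOException {
                resp.getWriter().println("did action_1"); // placeholder for real datastore work
            }
        });
        // ...register the remaining actions the same way...
    }

    @Override
    public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Action action = actions.get(req.getParameter("action"));
        if (action == null) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "unknown action");
        } else {
            action.execute(req, resp);
        }
    }
}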

I have built a server on GAE that handles 6 different types of requests over HTTP POST, all of which involve creating, updating, or deleting objects in the datastore. What is the best design for this?
It sounds like a job for a web service. My favorite style is REST (though with REST, actions are usually mapped to URLs and HTTP methods, not parameters). Take a look at the Resteasy docs.

Since they all do separate things, use separate servlets. There's no point combining them into a single servlet: It makes both your code and the URL mapping messier.
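For reference, a sketch of the corresponding web.xml entries; the servlet names, classes, and URL patterns are made up:

<servlet>
    <servlet-name>createUser</servlet-name>
    <servlet-class>com.example.CreateUserServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>createUser</servlet-name>
    <url-pattern>/users/create</url-pattern>
</servlet-mapping>
<!-- ...and a servlet plus mapping like this for each of the other actions -->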

Too many servlets can lead to slow class-loading times (cold startups) in the GAE environment, but too few can lead to request contention and poor performance due to high latency, so there is a trade-off.
A workaround worth considering is to enable the "always on" and "warm-up request" features, and to make your servlet thread-safe.
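As a sketch, on the older Java runtimes both were configured in appengine-web.xml; the exact flags have changed across runtime generations, so verify against the current docs:

<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <!-- Allow concurrent requests to a single instance, reducing contention. -->
    <threadsafe>true</threadsafe>
    <!-- Enable warm-up requests so new instances initialize before serving traffic. -->
    <inbound-services>
        <service>warmup</service>
    </inbound-services>
</appengine-web-app>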

Related

Ad Blocker w/ Segment.io

I'm considering using segment.io for several of my client-side 3rd party API needs, but I'm a little concerned about ad-blockers.
My app has no ads, but I do a lot of event-tracking for product analytics, as well as error tracking.
Segment.io offers a nice all-in-one solution, but if it's blocked, and all my eggs are in that basket, then, well, I won't have any eggs left, or however that idiom ends.
So my question is: is there a way to integrate multi-purpose event tracking (segment.io, keen.io, etc.) that isn't as susceptible to ad-blocking?
My app is mostly serverless, using a Firebase+AWS Lambda setup, so I've tried to think of some kind of back-end solution, but no big ideas so far.
ETA: I'm not looking to track adblocking users or violate anyone's trust. My question is about event-tracking unrelated to a user's identity, and whether that's possible in an all-in-one event-tracking library that might be ad-blocked.
First, I'd typically consider such blocking to be "privacy" blocking rather than ad blocking, so instead of Adblock it's more likely to come from Ghostery or uBlock Origin.
Although most website uses of analytics are benign (improving conversion rates, catching browser exceptions, etc.), the concern many people have is that it allows third-party analytics sites (including Segment, etc.) to track users across multiple websites. Most of these analytics sites are not actually interested in that, but better safe than sorry?
The ethics of wanting analytics about all use of your webapp are far more nuanced than "privacy good, tracking bad", and I don't think this is the forum for that, so I'll give you a technical answer. Just note that your disclaimer about not wanting to "track adblocking users" is not really valid: if your aim is to gather analytics about them, that's still essentially tracking. Otherwise, just use a hosted solution and realise that maybe 10-20% of users won't provide you with analytics.
The bad news: basically every "hosted" analytics solution is, or will be, in the block lists. Not only are their API hosts blocked directly, but there are also blocks in place based on the names of the JS files you may try to include.
The good news: you can work around it by relaying events through your own API, and AWS API Gateway, which you may already be using, is perfect for this.
There are multiple steps to this.
Step 1: The analytics provider needs to offer a fully bundled/built JS file. If they require you to pull the script dynamically from their own servers, it will be blocked there before it even downloads.
Step 2: Rename the bundled script so that it doesn't trigger any filename-based blocks, e.g. rename from mixpanel.umd.js to mp.js, and add it to your server.
Step 3: Create an API gateway to relay events back to the "correct" API (e.g. to api.analyticshost.com). You can actually do this with AWS API gateway only (no lambda required) if you pass through the right headers and URL params.
Step 4: Initialise the library to use your API host rather than the default one.
The result is that (a) the browser no longer pulls the analytics script from the analytics provider's CDN, but gets it from your server instead, and (b) the browser sends events to your API, which relays them through to the analytics provider's.
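If you are not on AWS, the relay itself is simple enough to hand-roll; here is a minimal sketch as a Java servlet, where the provider host and paths are placeholders rather than any real provider's API:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// First-party relay: the browser posts events to your own domain,
// and this servlet forwards the raw body to the analytics provider.
public class AnalyticsRelayServlet extends HttpServlet {
    private static final String PROVIDER = "https://api.example-analytics.com"; // placeholder

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String path = req.getPathInfo() == null ? "/" : req.getPathInfo();
        HttpURLConnection conn = (HttpURLConnection) new URL(PROVIDER + path).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        if (req.getContentType() != null) {
            conn.setRequestProperty("Content-Type", req.getContentType());
        }
        try (InputStream in = req.getInputStream(); OutputStream out = conn.getOutputStream()) {
            in.transferTo(out); // pass the event payload through unchanged (Java 9+)
        }
        resp.setStatus(conn.getResponseCode()); // report the provider's status to the browser
    }
}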
When gathering analytics, note that Segment also provides server-side tracking libraries. These can be quite useful when you want to gather metrics for certain types of events that might be blocked on the client. At its simplest, Segment has an HTTP Source, but a number of popular languages are supported as well.
https://segment.com/docs/connections/sources/catalog/#server
The classic example is the order-complete event, which typically happens on your server once the transaction has been committed to the database. Regardless of browser configuration, you can send this event from the server and track it reliably.
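As a sketch with Segment's analytics-java library (the write key and property names are placeholders; double-check the builder API against the version you use):

import com.segment.analytics.Analytics;
import com.segment.analytics.messages.TrackMessage;
import java.util.Collections;

public class OrderTracking {
    // Placeholder write key; load it from configuration in practice.
    private static final Analytics ANALYTICS = Analytics.builder("YOUR_WRITE_KEY").build();

    // Call this after the order transaction has committed on the server.
    public static void trackOrderCompleted(String userId, double revenue) {
        ANALYTICS.enqueue(TrackMessage.builder("Order Completed")
                .userId(userId)
                .properties(Collections.<String, Object>singletonMap("revenue", revenue)));
    }
}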
Be sure you respect the user's consent-management settings here, though.
A lot of valid points are already mentioned in the accepted answer; I would add a few technical considerations to minimize the impact of ad blockers on tracking tools (Segment, Google Tag Manager, etc.):
Develop for server-side tracking. Whatever runs on the server cannot be blocked by ad blockers. However, this is usually tricky and very custom; Segment gives some examples of it as well.
Use managed client-side proxy solutions like DataUnlocker. This is great and does not require many code changes.
Use self-hosted open-source solutions for proxying Google Analytics and Google Tag Manager like this or this. I believe these solutions can be extended to support Segment as well.

Guidelines to understand when to use Apache Camel

I am slowly getting familiar with Camel; however, I am struggling to understand the level of granularity at which it should be considered. Should Camel be used only when passing messages from one application to another, or is it also appropriate for passing messages between components and/or layers within a single application?
For example, I have a requirement to expose a web service that accepts bookings, validates them, and writes them to a queue. Would you recommend using Camel in this scenario, or does it really depend on the level of flexibility I want my solution to allow?
Put another way: if I were required to save the bookings to a database, I would never have considered Camel and would instead have built a traditional app that calls a DAL to save the booking. Of course I could use camel-ibatis to insert the data, but in this context using Camel seems overkill.
Thank you for any pointers on this.
As you obviously suspect, it's somewhat of a grey area. The more flexibility you need, the more benefit you'll get from using Camel.
Just this past week, I built a prototype of an app that needed to accept an HTTP post, put the data on a queue, and then pull messages from the queue and use them to update a Mongo database.
Initially, I used Camel, and it worked well. Then, the requirement for the HTTP POST was removed (it became just consuming messages from a queue and updating the database), and the database update became more complex than was easily supported via a simple string-based camel mongo endpoint spec, so I wound up doing away with camel, and rewrote it with just a jms connector and the Mongo api.
So, as usual, it depends. I would say that if you're just moving data between two endpoints, with no content-based decisions or routing, then you probably won't benefit from using Camel. Once you actually want to use one or more of the Enterprise Integration Patterns, Camel will be a benefit.
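For a feel of where that line sits, here is a minimal sketch of the booking flow from the question as a Camel route; the endpoint URIs and the validator bean are made-up names:

import org.apache.camel.builder.RouteBuilder;

// Accept bookings over HTTP, validate them, then queue them.
// The choice() block is a content-based router, one of the EIPs.
public class BookingRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jetty:http://0.0.0.0:8080/bookings")
            .to("bean:bookingValidator")
            .choice()
                .when(header("type").isEqualTo("priority"))
                    .to("jms:queue:priorityBookings")
                .otherwise()
                    .to("jms:queue:bookings");
    }
}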
As Camel has lots of components, it looks like it can do everything. But sometimes it is more straightforward to use the third-party library directly, especially when you don't have much business logic that can leverage the Enterprise Integration Patterns.
The real benefit of Camel is that you can focus on how to route your messages to meet your business-logic needs without caring about the implementation details of the components.
as suggested, there are no hard rules... here is my take:
use Camel to simplify technology challenges:
- complex message routing algorithms (EIPs)
- technology integrations (components)
use Camel for these types of requirements:
- highly event-based processes (EIPs)
- exposing multiple interfaces to biz logic (http, file, jms, etc.)
- complex runtime management needs (lifecycle, policies)
that said...
- don't use it for just one simple use case
- don't add unnecessary complexity
- have a quorum of use cases/reasons/justifications to use it
along these lines, I presented the following at ApacheCon focused on why/how to use Camel:
http://www.consulting-notes.com/2014/06/apachecon-2014-presentation-apache.html

Why not use only one HttpServlet for all requests to a REST API

Let's say I have a REST API which can be accessed via e.g.: mypage.com/v1/users/1234
And I am using Java EE and HttpServlets for this REST API.
Is it a good idea to send all v1 requests to a single servlet and then pass them to my own dispatch structure, to be more independent and maybe later switch more easily from HttpServlets to something else? Or is it better to create and register a servlet for each type of resource, so one for mypage.com/v1/users and one for mypage.com/v1/cars and so on?
Is it much slower or less efficient to use only one servlet, or just inconvenient?
Maintenance is going to become very difficult very quickly. Take a look at Jersey or Resteasy. The learning curve is small, and JAX-RS takes care of a whole lot of the boilerplate code you would have to write with vanilla servlets.
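As a rough sketch of what that buys you, a JAX-RS resource for the users URL might look like this; the User class and its lookup are placeholders:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// JAX-RS maps the URL template and parses path parameters for you,
// so there is no manual request parsing or dispatch code.
@Path("/v1/users")
public class UserResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public User getUser(@PathParam("id") long id) {
        return new User(id); // placeholder for the real datastore lookup
    }

    public static class User {
        private final long id;
        public User(long id) { this.id = id; }
        public long getId() { return id; }
    }
}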

Efficient web services using AppEngine

I'm trying to use AppEngine as sort of a RESTful web service. The service is supposed to do simple finds and puts against the Datastore, so Objectify seems good for covering that part. It also does a few lookups to other services when data is not available in the Datastore; I'm using Redstone XMLRPC for that part.
Now, I have a couple of questions about how to design the serving part in view of AppEngine's quotas (I guess one should think about efficiency in most cases, but AppEngine's billing makes more people actually think about it).
First, let's consider that I use simple servlets. In this case, I see two options: either I create a number of servlets, each providing a different service with JSON passed to it, or I use a single servlet (or a smaller number of them) and determine the action to perform based on a parameter passed with the JSON. Will either design have any significant effect on the instance hours, etc. clocked by AppEngine?
What is the cost difference if I use a RESTful framework such as Restlet or RestEasy, as opposed to the barebones approach?
This question is something of a follow up to : Creating Java Web Service using Google AppEngine
It's not so important, because most of the cost goes to the datastore, so frontend micro-optimisation doesn't matter much.
You might save a few cents by choosing the 'simple servlet' approach, but... is that your goal? It's much more important to design good data structures, prepare all required data in the background, have a good caching strategy, etc.
I agree with @Igor.
However, there is an additional thing to consider: HTTP sessions.
GAE supports HTTP sessions. Since GAE is a distributed system, sessions are stored in the Datastore (and cached in Memcache for efficient reading). The session is updated on every request (to support expiration), so the Datastore is accessed on every request.
Sessions are not required for REST and should be turned off.
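On the Java runtime that is a one-line setting in appengine-web.xml (newer runtimes disable sessions by default, so check the docs for your version):

<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <!-- Skip the per-request Datastore/Memcache session writes entirely. -->
    <sessions-enabled>false</sessions-enabled>
</appengine-web-app>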

Google Web Toolkit (GWT) + Google App Engine (GAE) + Detached Data Persistence

I would like to develop a web-app requiring data persistence using GWT and GAE. As I understand it, my only (or at least by far the most convenient) option for data persistence is GAE's Datastore, using JDO or JPA annotated objects. I would also like to be able to send my objects back and forth client-server using GWT Remote Procedure Calls (RPC), therefore my objects must be able to "detach". However, GWT RPC serialization cannot handle detached JDO/JPA objects and it doesn't appear as though it will in the near future.
My question: what is the simplest and most direct solution to this? Being able to share the same objects client/server with server-side persistence would be extremely convenient.
EDIT
I should clarify that I still wish to use GWT RPC with GAE's Datastore. I am just looking for the best solution that would allow all these technologies to work together.
Try using http://gilead.sourceforge.net/
I've recently found Objectify, which is designed to be a replacement for JDO. I don't have much experience with it yet, but it's simpler to use than JDO, seems more lightweight, and claims to get around the need for DTOs with GWT, though I haven't tried that particular feature yet.
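For a flavor of it, a minimal sketch; the entity and field names are made up, and the fluent API shown matches later Objectify versions, so check the one you use:

import com.googlecode.objectify.ObjectifyService;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import static com.googlecode.objectify.ObjectifyService.ofy;

// A plain annotated class; no JDO enhancement step or detachment dance.
@Entity
class Booking {
    @Id Long id;
    String customer;
}

class BookingRepo {
    static { ObjectifyService.register(Booking.class); } // one-time registration

    static Booking save(Booking b) {
        ofy().save().entity(b).now(); // synchronous put; id is populated on return
        return b;
    }

    static Booking load(long id) {
        return ofy().load().type(Booking.class).id(id).now(); // null if absent
    }
}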
Ray Cromwell has a temporary hack up. I've tried it, and it works.
It forces you to use Transient instead of Detachable entities, because GWT can't serialize a hidden Object[] used by DataNucleus. This means the objects you send to the client can't be inserted straight back into the datastore: you must retrieve the actual datastore object and copy all the persistent fields back into it. Ray's method uses reflection to iterate over the methods, retrieve the getBean() and setBean() methods, and apply the entity's setBean() with your transient GWT object's getBean().
You should strive to use JDO; the JPA support isn't much more than a wrapper for now. To use this hack, you must have both getter and setter methods for every persistent field, using proper getBean/setBean naming for every "bean" field. Well, almost proper, as it assumes all getters start with "get", when the default for boolean fields is "is".
I've fixed this issue and posted a comment on Ray's blog, but it's awaiting approval and I'm not sure if he'll post it. Basically, I implemented a @GetterPrefix(prefix=MethodPrefix.IS) annotation in the org.datanucleus package to augment his work.
In case it doesn't get posted, and this is an issue, email x_AT_aiyx_DOT_info Re: @GetterPrefix for JDO and I'll send you the fix.
A while ago I wrote a post, Using an ORM or plain SQL?
This came up last year in a GWT application I was writing. Lots of translation from EclipseLink to presentation objects in the service implementation. If we were using ibatis it would've been far simpler to create the appropriate objects with ibatis and then pass them all the way up and down the stack. Some purists might argue this is Bad™. Maybe so (in theory) but I tell you what: it would've led to simpler code, a simpler stack and more productivity.
which basically matches your observation.
But of course that isn't an option with Google App Engine so you're pretty much stuck having a translation layer between client-side objects and your JPA entities.
JPA entities are quite rigid, so they're not really appropriate for sending back and forth between client and server anyway. Typically you want little bits from several entities when doing this (thus ending up with some sort of presentation-layer value object). That is your path forward.
Try this. It is a module for serializing GAE core types and sending them to the GWT client.
You can consider using JSON. GWT has the necessary API to parse and generate JSON strings on the client side, and there are plenty of JSON APIs for the server side. I tried google-gson, which works well: it converts a JSON string to your POJO model and vice versa. Hope this helps you arrive at a decent solution for your requirement.
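A minimal Gson round trip, assuming a hypothetical UserDto POJO:

import com.google.gson.Gson;

public class JsonExample {
    // Hypothetical POJO shared between your service layer and the wire format.
    static class UserDto {
        String name;
        int age;
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        UserDto user = new UserDto();
        user.name = "Alice";
        user.age = 30;

        String json = gson.toJson(user);                   // {"name":"Alice","age":30}
        UserDto back = gson.fromJson(json, UserDto.class); // and back to a POJO
        System.out.println(json + " -> " + back.name);
    }
}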
Currently, I use the DTO (DataTransferObject) pattern. It is not necessarily as clean, and there is plenty more boilerplate, but GAE still requires a fair amount of boilerplate at present. ;)
I have a Domain Object mapped (usually) one-to-one with a DTO. When a client needs Domain info, a DAO (DataAccessObject) coughs up a DTO representation of the Domain object and sends that across the wire. When a DTO comes back, I hand the DAO the DTO, which then updates all the appropriate Domain Objects.
Not as clean as being able to pass Domain Objects directly across the wire, obviously, but the limitations of GAE's JDO implementation and GWT's serialization process mean this is the cleanest way for me to handle it currently.
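A bare-bones sketch of that arrangement, with all class names hypothetical:

// JDO-managed domain object; it never leaves the server.
class Customer {
    Long id;
    String name;
}

// Plain serializable DTO that GWT RPC can handle.
class CustomerDto implements java.io.Serializable {
    Long id;
    String name;
}

// The DAO converts in both directions at the persistence boundary.
class CustomerDao {
    CustomerDto toDto(Customer c) {
        CustomerDto dto = new CustomerDto();
        dto.id = c.id;
        dto.name = c.name;
        return dto;
    }

    void applyDto(CustomerDto dto, Customer c) {
        // Copy the editable fields back onto the attached domain object.
        c.name = dto.name;
    }
}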
I believe Google's official answer for this is GWT 2.1 RequestFactory.
Given that you are using GWT and GAE, I'd suggest you stick to the official Google framework... I have a similar GWT / GAE based app and that's what I am doing.
By the way, setting up RequestFactory is a bit of a pain in the ass. The current Eclipse plug-in doesn't include all the jars, but I was able to find the help I needed on Stack Overflow.
I've been using Objectify as well, and I really like it. You still have to do some dancing around with pre/postLoad methods to translate e.g. Text to String and back.
Since GWT ultimately compiles to JavaScript, for detached persistence it would need one of a few services available. The best known are HTML5 storage and Gears (both use SQLite!). Of course, neither is widely deployed, so you'd have to convince your users to either use a modern browser or install a little-known plugin. Be sure to degrade to a usable subset if the user doesn't comply.
What about directly using the Datastore API to load/store POJO domain objects?
It should be comparable to the DTO approach, meaning e.g. that you have to handle all fields manually (unless you use tricks like reflection-based automation), while it gives you more flexibility and full access to all Datastore features.
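A small sketch with the low-level API; the kind and property names are made up:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;

public class UserStore {
    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // Each POJO field is mapped onto the entity by hand.
    public Key save(String name, int age) {
        Entity e = new Entity("User"); // "User" is a hypothetical kind
        e.setProperty("name", name);
        e.setProperty("age", age);
        return ds.put(e);
    }

    public String loadName(Key key) throws EntityNotFoundException {
        return (String) ds.get(key).getProperty("name");
    }
}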
