Block a URL path on Google Appengine - google-app-engine

I would like to block a specific path (e.g. https://myapp.appspot.com/foo/bar) from being accessed on the server such that the caller gets a 404 or something to that extent. Please note that I have regex based handlers installed (e.g. /foo/.* - will trigger Handler) so by default the /app/foo/bar is being directed to this Handler. I would like to add a specific handler for '/foo/bar' at a higher level before the lower /app//).
One way to do this is to add url handler and direct it to a not_found app handler such as:
- url: /foo/bar.*
script: not_found.app
If there is a better way to do this, please care to share and will be highly appreciated.
Essentially, I have a rogue client who is using a bot to hit my server continuously and is consuming undesired resources. The specific URL being called by this bot is one that I could completely disable. If there are any tips on how one could use such URL's and direct them to a lower priority instance then that would be also very helpful.
Btw, I have already added a range of IP's being used by this bot to dos.yaml. But that has not helped since it keeps changing its IP-Address.
I am sure this is a pretty typical scenario which the web-masters have expert advice on (any help/recommendation is highly welcomed - pardon my pedestrian question).

You can force-route requests to any module of your choosing with dispatch.yaml:
dispatch:
- url: "*/foo.bar"
module: cheapmodule
and then in cheapmodule.yaml you make sure you have at most a single instance of the cheapest kind, say basic scaling with instance_class B1 and max_instances 1 (not sure what happens if cheapmodule is specified to have zero instances, e.g manual scaling with instances 0, or instances 1 to start but then on its _ah/start handler it calls google.appengine.api.modules.modules.set_num_instances_async(instances, module='cheapmodule') -- perhaps worth experimenting with).

Related

Any examples of using a Wandsearcher in vespa ? (After a weighted set query)

Currently i am using the REST interface to query vespa, which seems to work great but something tells me that i should be using searchers in the application to make the client(server side code) a bit lighter (bundle the jar file in the application package) to make it a bit smoother. I have managed to do some simple searcher/processor applications. But this is a bit overwhelming.
So are there any readily available examples ?
Basicially i want to:
Send to /search?query=someId
Do a ordinary search for the weighted set on this documentID (I guess this one can be handy: https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html)
Take those items in the response and add it to a wand item(s) and query for a wand with wandsearcher on a given field. Similar to the yql:
"select * from sources * where wand(interest, some weightedsets));","ranking":"combined_score" and return the matches.
Just curious also, apart from the trouble of string building with the http request i am doing at the moment are there any performance gains of using a searcher or go the java route vs rest?
thanks for any insight or code help i can start with.
There is an example of using the WandItem (YQL wand)here https://docs.vespa.ai/documentation/advanced-ranking.html and see also https://docs.vespa.ai/documentation/using-wand-with-vespa.html as there are two wand implementations available in Vespa, it sounds from the description that the wand() is what you want to use for this use case. For the first call you probably want to have a dedicated document summary to reduce the amount of data fetched for your first query and also the option of serving it out of memory only (See https://docs.vespa.ai/documentation/document-summaries.html)
Also see https://docs.vespa.ai/documentation/searcher-development.html as a general resource on writing searchers.
For your use case it makes a lot of sense to write a searcher to perform these two queries as your second query depends on the first and you avoid the cost of rendering/http/yql parsing which might matter if your client is remote with high network latency.

Is there a way to use global property in apache camel

I have a scheduler that puts some value(N or Y) into a topic for every 10 mins(usually 'N', unless something abnormal happens with topic). When the topic goes down, the scheduler will populate a property(kind of inter-scheduler communication), so that it can be used during scheduler's next cycle, as way of telling the scheduler that something bad happened during last cycle, so that, it'll place a different value('Y') in topic in this cycle. But the problem here is normal exchange property isn't helping. The property is always null during every scheduler cycle.
When i went through the http://camel.apache.org/schema/blueprint/camel-blueprint.xsd, looking out for something similar to global properties, i got this one "tns:properties"
which can be set at context level.
Can this be used as a global property?
is there a way to read/write it in my scheduler route?
I'm also thinking about having a bean with an instance variable to hold this inter-scheduler-communication property.
Can anyone suggest the right option?
What you've described sounds to me like a means for maintaining state between processes, and using the properties for that will be problematic for a number of reasons.
I suggest breaking the app into a couple different pieces, and use a shared OSGi service to maintain the state.
public interface MyScheduleState() {
public setSomeValue(String x)
public String getSomeValue()
}
Route 1: Timer starts the task .. check the service for values.. send event. if error occurs, sends error message to some queue://MY.ERRORS
Route 2: Listen for errors on MY.ERRORS and update the OSGi service with new values
This gives you control over the behavior and you can change how the "stateful service" stores its data.. either in memory, on disk as a file or in a cache" and your routes will never know the specifics.
Take a look to http://camel.apache.org/properties.html
It seems to be exactly you are looking for - context properties. You can set a property value on each cycle and it will be available in the next cycle too.

What's the difference between "direct:" and to() in Apache Camel?

The DirectComponent documentation gives the following example:
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Is there any difference between that and the following?
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
I've tried to find documentation on what the behaviour of the to() method is on the Java DSL, but beyond the RouteDefinition javadoc (which gives the very curt "Sends the exchange to the given endpoint") I've come up blank :(
In the very case above, you will not notice much difference. The "direct" component is much like a method call.
Once you start build a bit more complex routes, you will want to segment them in several different parts for multiple reasons.
You can, for instance, create "sub routes" that could be reused among multiple routes in your Camel context. Much like you segment out methods in regular programming to allow reusability and make code more clear. The same goes for sub routes using, for instance the direct component.
The same approach can be extended. Say you want multiple protocols to be used as endpoints to your route. You can use the direct endpoint to create the main route, something like this:
// Three endpoints to one "main" route.
from("activemq:queue:order.in")
.to("direct:processOrder");
from("file:some/file/path")
.to("direct:processOrder");
from("jetty:http://0.0.0.0/order/in")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Another thing is that one route is created for each "from()" clause in DSL. A route is an artifact in Camel, and you could do certain administrative tasks towards it with the Camel API, such as start, stop, add, remove routes dynamically. The "to" clause is just an endpoint call.
Once starting to do some real cases with somewhat complexity in Camel, you will note that you cannot get too many "direct" routes.
Direct Component is used to name the logical segment of the route. This is similar process to naming procedures in structural programming.
In your example there is no difference in message flow. In the terms of structural programming, we could say that you make a kind of inline expansion to your route.
Another difference is Direct component doesn't has any thread pool, the direct consumer process method is invoked by the calling thread of direct producer.
Mainly its used for break the complex route configuration like in java we used to have method for reusability. And also by configuring threads at direct route we can reduce the work for calling thread .
from(A).to(B).to(OUT)
is chaining
A --- B --- OUT
But
from(A ).to( X)
from(B ).to( X)
from( X).to( OUT )
where X is a direct:?
is basically like a join
A
\____ OUT
/
B
obviously these are different behaviours, and with the second you could implement anylogic you wanted, not just a serial chain

App Engine Memcache Key Prefix Across Versions

Greetings!
I've got a Google App Engine Setup where memcached keys are prefixed with os.environ['CURRENT_VERSION_ID'] in order to produce a new cache on deploy, without having to flush the cache manually.
This was working just fine until it became necessary for development to run two versions of the application at the same time. This, of course, is yielding inconsistencies in caching.
I'm looking for suggestions as to how to prefix the keys now. Essentially, there needs to be a variable that changes across versions when any version is deployed. (Well, this isn't quite ideal, as the cache gets totally blown out.)
I was thinking of the following possibilities:
Make a RuntimeEnvironment entity that stores the latest cache prefix. Drawbacks: even if cached, slows down every request. Cannot be cached in memory, only in memcached, as deployment of other version may change it.
Use a per-entity version number. This yields very nice granularity in that the cache can stay warm for non-modified entities. The downside is we'd need to push to all versions when models are changed, which I want to avoid in order to test model changes out before deploying to production.
Forget key prefix. Global namespace for keys. Write a script to flush the cache on every deploy. This actually seems just as good as, if not better than, the first idea: the cache is totally blown in both scenarios, and this one avoids the overhead of the runtime entity.
Any thoughts, different ideas greatly appreciated!
The os.environ['CURRENT_VERSION_ID'] value will be different to your two versions, so you will have separate caches for each one (the live one, and the dev/testing one).
So, I assume your problem is that when you "deploy" a version, you do not want the cache from development/testing to be used? (otherwise, like Nick and systempuntoout, I'm confused).
One way of achieving this would be to use the domain/host header in the cache - since this is different for your dev/live versions. You can extract the host by doing something like this:
scheme, netloc, path, query, fragment = urlparse.urlsplit(self.request.url)
# Discard any port number from the hostname
domain = netloc.split(':', 1)[0]
This won't give particularly nice keys, but it'll probably do what you want (assuming I understood correctly).
There was a bit of confusion with how i worded the question.
I ended up going for a per-class hash of attributes. Take this class for example:
class CachedModel(db.Model):
#classmethod
def cacheVersion(cls):
if not hasattr(cls, '__cacheVersion'):
props = cls.properties()
prop_keys = sorted(props.keys())
fn = lambda p: '%s:%s' % (p, str(props[p].model_class))
string = ','.join(map(fn, prop_keys))
cls.__cacheVersion = hashlib.md5(string).hexdigest()[0:10]
return cls.__cacheVersion
#classmethod
def cacheKey(cls, key):
return '%s-%s' % (cls.cacheVersion(), str(key))
That way, when entities are saved to memcached using their cacheKey(...), they will share the cache only if the actual class is the same.
This also has the added benefit that pushing an update that does not modify a model, leaves all cache entries for that model intact. In other words, pushing an update no longer acts as flushing the cache.
This has the disadvantage of hashing the class once per instance of the webapp.
UPDATE on 2011-3-9: I changed to a more involved but more accurate way of getting the version. Turns out using __dict__ yielded incorrect results as its str representation includes pointer addresses. This new approach just considers the datastore properties.
UPDATE on 2011-3-14: So python's hash(...) is apparently not guaranteed to be equal between runs of the interpreter. Was getting weird cases where a different app engine instance was seeing different hashes. using md5 (which is faster than sha1 faster than sha256) for now. no real need for it to be crypto-secure. just need an ok hashfn. Will probably switch to use something faster, but for now i rather be bugfree. Also ensured keys were getting sorted, not the property objects.

How to abandon a long-running search in System.DirectoryServices.Protocols

I've been trying to work out how to cancel a long-running AD search in System.DirectoryServices.Protocols. Can anyone help?
I've looked at the supportControl/supportedCapabilities attributes on RootDSE and they don't contain the 1.3.6.1.1.8 OID so I think that means it doesn't support the LDAP CANCEL extended operation as defined here: https://www.rfc-editor.org/rfc/rfc3909
That leaves the original LDAP ABANDON command (see here for list). But there doesn't seem to be a matching DirectoryRequest Class.
Anyone have any ideas?
I think I've found my answer: whilst I was reading around your suggestion, Martin, I came across the Abort method on the LdapConnection class. I didn't expect to find it there: starting out from the LDAP documentation I'd expected to find it as just another LDAPMessage but the MS guys seem to have treated it as a special case. If anyone is familiar with a non-MS implementation of LDAP and can comment on whether the MS approach is typical, I'd appreciate it to improve my understanding.
I think, but I'm not positive, there is no asynch query with a cancel. It has an asynch property but it's to allow a collection to be filled, nothing to do with cancelling. The best I can offer is to put your query in a background worker thread and put an asynch callback that will deal with the answer when it comes back. If the user decides to cancel, you can just cancel the background worker thread. You'll free your app up, even if you haven't freed the ldap server up until it finishes it's query. You can find info on background worker threads at http://www.c-sharpcorner.com/UploadFile/LivMic/BGWorker07032007000515AM/BGWorker.aspx
Don't forget to call .Dispose() when cleaning up your active directory objects to prevent memory leaks.
If the query will produce many data also, you can abandon them through paging. Specify a PageResultRequestControl option in the query, giving a fairly low page size (IIUC, 1000 is the default page size). IIUC, you'll send new requests every time you got a page (passing cookies from one response into the next request). When you choose to cancel the query, send another request with zero expected results.

Resources