Google App Engine optional slash redirect

I have a Java app running on Google App Engine. I'd like to make the trailing slash optional for directories, so that navigating to www.domain.com/test and www.domain.com/test/ would yield the same thing.
How do I achieve that?
I know about the app.yaml configuration file, but I am running a Java app, not Python.

See this post. It works for me, though it looks like a hack. I think it is worth filing an issue with Google, as the servlet specification requires adding trailing slashes when attempting to find a proper welcome file.

The easiest way to do this would be to create a filter that intercepts requests and appends the forward slash if necessary. Generally it's better to send a redirect than to serve the same content at both URLs, so you don't end up with two canonical URLs for everything and all your content indexed twice.
What constitutes a 'directory' depends on your application, of course, and there's no hard and fast rule for figuring that out.
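The decision such a filter has to make can be sketched independently of the servlet API. The following is only an illustration, written in Python for brevity; the function name and the `directories` set are hypothetical, and a Java app would port the same logic into a servlet Filter that sends the redirect.

```python
from typing import Optional, Set
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url: str, directories: Set[str]) -> Optional[str]:
    """Return the URL to redirect to, or None if no redirect is needed.

    `directories` is an application-specific set of paths (without the
    trailing slash) that should be treated as directories. How that set
    is built is up to the application; it is an assumption of this sketch.
    """
    parts = urlsplit(url)
    if parts.path.endswith("/"):
        return None  # already in canonical form
    if parts.path not in directories:
        return None  # not a directory, leave the request alone
    # Append the slash to the path only, keeping query and fragment intact.
    return urlunsplit(parts._replace(path=parts.path + "/"))


if __name__ == "__main__":
    print(trailing_slash_redirect("http://www.domain.com/test", {"/test"}))
```

Keeping the "is this a directory?" test as an explicit set makes the application-specific part visible, since there is no general rule for it.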

Related

Semantics of dispatch.yaml

I'm looking at various pages about dispatch.yaml, most of which contain similar information and examples:
https://cloud.google.com/appengine/docs/flexible/nodejs/how-requests-are-routed#routing_with_a_dispatch_file
https://cloud.google.com/appengine/docs/python/config/dispatchref
https://cloud.google.com/appengine/docs/go/config/dispatchref
etc.
I happen to be using node.js on GAE Flexible Environment, but I think it would be the same for every language and environment.
The problem is that these pages don't really specify how dispatch.yaml works. In particular:
1. Are rules applied in the order given? I'm assuming that the first matching rule is the one used, but nothing seems to say so.
2. Do leading glob (wildcard) characters match only the domain name, or could they match the first part of the URL's path? If the rule is */hello, would that match myapp.appspot.com/path/hello? I'm guessing not, based on some vague hints in the docs, but it isn't very clear.
3. If no rule in dispatch.yaml matches the URL, will it be routed to the default service? I would think it would have to, but again, these pages don't say.
4. Do URLs get rewritten based on the rules before they're sent to the service? If the rule is */path/* and the URL is https://myapp.appspot.com/path/hello, will the service see it as /path/hello or as /hello? I'm guessing the former.
I'm doing some trial and error now, so I may be able to answer my own question soon. I'm also submitting this to Google through their documentation feedback system.
Things I know so far:
1. Yes, rules are tried in order. So, for example, if you want one URL to go to a specific service and all other URLs to go to another service, you should specify the specific one first:
dispatch:
  - url: "*/specific"
    module: specific
  - url: "*/*"
    module: general
If you put those rules in the opposite order, module specific will never be used, because the URL /specific will be caught by the wildcard rule.
2. Unknown.
3. Yes. You can test this by making a request that matches no dispatch.yaml rule and watching the default service's logs.
4. No rewriting. If the rule is */path/* and the actual URL is https://myapp.appspot.com/path/hello, your service should still handle /path/hello, not /hello.
Just to fill in the blank (feel free to paste this into the accepted answer):
On the second question (leading wildcards): No. It only matches the start of the path.
I created two apps with the following resources:
default -> /abc/def/test.html -> <h1>default</h1>
other -> /abc/def/test.html -> <h1>other</h1>
And 1 route:
<dispatch>
<url>*/def/*</url>
<module>other</module>
</dispatch>
When I hit {app engine}/abc/def/test.html I got "default"
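The behaviour reported in this thread can be summarised as a small toy model. This is not App Engine's actual matcher; it is a sketch that assumes the part of a rule before the first "/" is a host glob, that the rest is a path pattern anchored at the start of the path, and that a trailing "*" means "any suffix". The names `rule_matches` and `route` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # (url pattern, service name)


def rule_matches(pattern: str, host: str, path: str) -> bool:
    """Toy matcher: host glob before the first '/', path pattern after it."""
    host_pat, _, rest = pattern.partition("/")
    path_pat = "/" + rest
    if not fnmatchcase(host, host_pat):
        return False
    if path_pat.endswith("*"):
        # Trailing glob: the path must start with the literal prefix.
        return path.startswith(path_pat[:-1])
    return path == path_pat


def route(rules: List[Rule], host: str, path: str, default: str = "default") -> str:
    """First matching rule wins; unmatched requests go to the default service."""
    for pattern, service in rules:
        if rule_matches(pattern, host, path):
            return service
    return default


if __name__ == "__main__":
    rules = [("*/specific", "specific"), ("*/*", "general")]
    print(route(rules, "myapp.appspot.com", "/specific"))
    print(route(list(reversed(rules)), "myapp.appspot.com", "/specific"))
    print(route([("*/def/*", "other")], "myapp.appspot.com", "/abc/def/test.html"))
```

Under these assumptions the model reproduces the three observations above: the specific rule must come first, reversing the order sends everything to the wildcard rule, and */def/* does not match /abc/def/test.html. The path is never modified, which corresponds to the "no rewriting" answer.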

Why can't I see any text in "http://crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article"?

OK, I found this link, https://code.google.com/p/gwt-platform/wiki/CrawlerSupport#Using_gwtp-crawler-service, which explains how to make a GWTP app crawlable.
I have some GWTP experience, but I know nothing about App Engine.
Google said its "crawlservice.appspot.com" can parse any Ajax page. I have a page "http://mydomain.com#!article" that shows an article pulled from the database. Say that page contains the text "this is my article". When I open this link:
crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article, I can see all the JavaScript, but I can't find the text "this is my article".
Why?
Now let's check with a real-life example.
Open this link, https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k, and you will see the text "If i open that url in IE".
Now open http://crawlservice.appspot.com/?key=123456&url=https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k and you can see all the JavaScript, but the text "If i open that url in IE" is not there.
Why is that?
So if I use http://crawlservice.appspot.com/?key=123456&url=mydomain#!article, will the Google crawler be able to see the text in mydomain#!article?
Also, why key=123456? Does it mean everyone can use this service? Do we get our own key? Does Google limit the number of calls to its service?
Could you explain all these things?
Extra Info:
Christopher suggested that I use this example:
https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service
However, I ran into another problem. My app is pure GWTP; it doesn't have appengine-web.xml in WEB-INF. I have no idea what App Engine (GAE) or Maven is.
Do I need to register for App Engine?
My app may get a lot of traffic, and I am using a GoDaddy VPS. I don't want to register for App Engine, since I would have to pay Google for extra traffic.
Everything in my GWTP app works right now except the crawler function.
So if I don't use Google App Engine, how can I build a crawler function for GWTP?
I tried to use HtmlUnit for my app, but HtmlUnit doesn't work for GWTP (see details in "Why HTMLUnit always shows the HostPage no matter what url I type in (Crawlable GWT APP)?").
I believe you are not allowed to crawl Google Groups. Probably they are actively trying to prevent this, so you do not see the expected content.
There are a couple of points I wish to elaborate on:
The Google Code documentation is no longer maintained. You should look on Github instead: https://github.com/ArcBees/GWTP/wiki/Crawler-Support
You shouldn't use http://crawlservice.appspot.com. This isn't a Google service, it's out of date and we may decide to delete it down the road. This only serves as a public example. You should create your own application on App Engine (https://appengine.google.com/)
There is a sample here (https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service) using GWTP's Crawler Service. You can basically copy-paste it. Just make sure you update the <application> tag in appengine-web.xml to the name of your application and use your own service key in CrawlerModule.
Finally, if your client uses GWTP and you followed the documentation, it will work. If you want to try it manually, you must encode the Query Parameters.
For example http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service will not work because the hash (everything including and after #) is not sent to the server.
On the other hand http://crawlservice.appspot.com/?key=123456&url=http%3A%2F%2Fwww.arcbees.com%2F%23!service will work.
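The difference between the two URLs can be checked with the Python standard library. This is a small sketch; the helper name `crawl_service_url` is hypothetical, and keeping "!" unescaped is chosen only so the output matches the working URL above.

```python
from urllib.parse import quote, urlsplit


def crawl_service_url(service_base: str, key: str, page_url: str) -> str:
    """Build a crawl-service request with the target URL percent-encoded."""
    # safe="!" leaves the hashbang's "!" as-is; ":", "/" and "#" are encoded.
    return f"{service_base}?key={key}&url={quote(page_url, safe='!')}"


if __name__ == "__main__":
    # Unencoded: everything from "#" on is a fragment of the outer URL,
    # so it is not part of the query string that reaches the server.
    raw = "http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service"
    print(urlsplit(raw).query)
    print(urlsplit(raw).fragment)

    # Encoded: the "#" travels inside the url parameter as %23.
    print(crawl_service_url("http://crawlservice.appspot.com/", "123456",
                            "http://www.arcbees.com/#!service"))
```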

CakePHP Application: some pages with SSL, some without

I have an application written with the CakePHP framework and it is currently located in httpdocs. I want a few pages to be redirected to https://
Detecting whether the user is already on https:// shouldn't be a problem. My issue is a different one: as I understand it, I would need to make a copy of the whole project and store it in httpsdocs, right? This sounds silly, but how should it work without duplicating the code? I think I'm missing something, but I don't get it ...
I have never had to copy the code for SSL. You should specify the path in the vhost for the site.
On Apache there is a vhost for each, SSL and non-SSL. Both can have the same webroot path.
If your webhoster requires you to put the https part of your website in httpsdocs, then you will need to put something there. But not the whole project: maybe only the /web part (the part that is actually served up by the webhoster).
Something like
/cake/app/ --> your app code
/httpdocs/.. --> index.php and possibly css stuff, images etc
/httpsdocs/.. --> copy of index.php and the rest as well
Of course, you could also use some internal redirect in .htaccess
One suggestion: now that Google indexes https URLs, you could also choose to make the whole site available through https.

URL redirection with webapp2

I'm developing an application with webapp2 to be deployed on Google App Engine. URLs will always be preceded by a language identifier, such as:
http://www.mydomain.com/en/foo
http://www.mydomain.com/en/bar
I would like to automatically redirect any request that doesn't start with a language identifier to the corresponding English version. For example, the following URLs should redirect to the URLs above:
http://www.mydomain.com/foo
http://www.mydomain.com/bar
Currently, I'm using webapp2_extras to set up one redirect for every possible URL, which is creating a lot of code duplication. The problem is that, to my understanding, URL redirection in webapp2 needs to be defined on a per-handler basis.
How can I go about redirecting all requests that don't match a regular expression (language identifier in my case) to the corresponding modified URL (adding en/ in my case)?
What you are looking for is middleware. Here is an example.
Old question, but it seems like setting up routes and catching exceptions would be a good way to go for this: http://webapp-improved.appspot.com/guide/exceptions.html#exceptions-in-the-wsgi-app
Define routes for the http://www.mydomain.com/en/foo cases; any http://www.mydomain.com/foo request will then raise a 404 exception, which you can handle by redirecting to the appropriate "en" page.
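The middleware approach can be sketched with plain WSGI, which works in front of a webapp2 application because webapp2.WSGIApplication is itself a WSGI app. This is a minimal sketch; the class name, the `languages` tuple, and the choice of a 302 status are assumptions, not something taken from the linked example.

```python
from typing import Callable, Iterable, Tuple


class LanguageRedirectMiddleware:
    """Redirect requests without a language prefix to the default language."""

    def __init__(self, app: Callable, languages: Tuple[str, ...] = ("en",),
                 default_lang: str = "en") -> None:
        self.app = app
        self.languages = languages        # hypothetical list of supported codes
        self.default_lang = default_lang

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO") or "/"
        first_segment = path.lstrip("/").split("/", 1)[0]
        if first_segment in self.languages:
            # Already prefixed: hand the request to the wrapped application.
            return self.app(environ, start_response)
        # Otherwise prepend the default language and keep the query string.
        location = "/" + self.default_lang + path
        query = environ.get("QUERY_STRING")
        if query:
            location += "?" + query
        start_response("302 Found", [("Location", location),
                                     ("Content-Length", "0")])
        return [b""]
```

Usage would look roughly like `app = LanguageRedirectMiddleware(webapp2.WSGIApplication(routes))`, so only the /en/... routes need to be declared and no per-handler redirects are required.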

How can I create a persistent vanity URL in DotNetNuke?

I'm not aware of a solution for implementing custom persistent vanity URLs (my term, not sure if that's what they're really called) in DotNetNuke. Does anyone know of a solution? It can be configuring the core, using a third-party module, or a suggestion of how to write it from scratch.
Here is what I'm thinking:
I want to point people to: http://mywebsite.com/awesome
I want the underlying URL to be http://mywebsite.com/genericpage.aspx?key=awesome&etc=etc
I don't want the URL to redirect. I want the user to see http://mywebsite.com/awesome only.
Essentially I'd envision an administrator being able to create these vanity URLs and specify what the vanity URL is and what the underlying URL is.
The closest thing, out of the box, is to define your friendly urls in SiteUrls.config found in the DotNetNuke root.
This way:
you point people to:
http://mywebsite.com/awesome.aspx
you have an underlying URL
http://mywebsite.com/Default.aspx?tabid=ID&etc=etc
users see:
http://mywebsite.com/awesome.aspx
Main restriction is that you will have an .aspx extension.
SiteUrls.config rules look like this:
<RewriterRule>
  <LookFor>.*/awesome.aspx</LookFor>
  <SendTo>~/default.aspx?tabid=ID&amp;etc=etc</SendTo>
</RewriterRule>
The rewriter rule matches the incoming URL against the regular expression in the LookFor section and sends it to the underlying URL in the SendTo section. Be careful with the '&' character in querystring parameters: inside the XML file it has to be escaped as &amp;.
3rd party extensions like URL Master provide much more fine grained control, and you can have a global friendly url scheme based on page names, with or without .aspx extensions. Nevertheless, a simple "one url at a time" approach can be safer if you have custom modules with URL dependencies.
ActiveSocial supports these, and I thought I saw something about support for this in version 2.x of iFinity's URL Master, but I can't find anything on it now.
