Subdomain is preventing my search results from rising as it should in page rank - google-app-engine

My problem is that I have a site which has requires a dedicated page for every city I choose to support. Early on, I decided to use subdomains rather than a directly after my domain (ie i used la.truxmap.com rather than truxmap.com/la). I realize now that this was a major mistake because Google seems to treat la.truxmap.com as a completely different site as ny.truxmap.com. So for instance, if i search "la food truck map" my site will be near the top, however, if i search "nyc food truck map" im no where in sight because ny.truxmap.com wouldnt be very high in the page rank by itself, and it doesnt have the boost that it ought to be getting from the better known la.truxmap.com
So a mistake I made a year ago is now haunting my page rank. I'd like to know what the most painless way of resolving my dilemma might be. I have received so much press at la.truxmap.com that I can't just kill the site, but could I re-direct all requests at la.truxmap.com to truxmap.com/la and do the same for all cities supported without trashing my current, satisfactory page rank results I'm getting from la.truxmap.com ??
EDIT
I left out some critical information. I am using Google Apps to manage my domain (that is, to add the subdomains) and Google App Engine to host my site. Thus, Google Apps provides a simple mechanism to mask truxmap.appspot.com (the app engine domain) as la.truxmap.com, but I don't see how I can mask it as truxmap.com/la. If I can get this done, then I can just 301 redirect la.truxmap.com to truxmap.com/la as suggested below.
Thanks so much!

You could send a "301 Moved Permanently" redirect to cause the Google crawler to update its references to your site, no?
See this article on 301 redirects and SEO.

You'll need to modify your app as follows:
Add www.truxmap.com as an alias for the app (you can't serve naked domains in App Engine, so just truxmap.com won't work)
Add support to your app for handling URLs of the form www.truxmap.com/something/, routing to the same handlers as the subdomain. You'll need to make sure you've debugged any relative path issues well before continuing.
Modify your app to serve 302 redirects for every url under something.truxmap.com/whatever to www.truxmap.com/something/whatever.

Related

CDN serving private images / videos

I would like to know how do CDNs serve private data - images / videos. I came across this stackoverflow answer but this seems to be Amazon CloudFront specific answer.
As a popular example case lets say the problem in question is serving contents inside of facebook. So there is access controlled stuff at an individual user level and also at a group of users level. Besides, there is some publicly accessible data.
All logic of what can be served to whom resides on the server!
The first request to CDN will go to application server and gets validated for access rights. But there is a catch - keep this in mind:
Assume that first request is successful and after that, anyone will be able to access the image with that CDN URL. I tested this with Facebook user uploaded restricted image and it was accessible with the CDN URL by others too even after me logging out. So, the image will be accessible till the CDN cache expiry time.
I believe this should work - all requests first come to the main application server. After determining whether access is allowed or not, a redirect to the CDN server or access-denied error can be shown.
Each CDN working differently, so unless you specify which CDN you are looking for its hard to tell.

why i couldn't see any text in "http://crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article"?

Ok, i found this link https://code.google.com/p/gwt-platform/wiki/CrawlerSupport#Using_gwtp-crawler-service that explain how you can make your GWTP app crawlable.
I got some GWTP experience, but i know nothing about AppEngine.
Google said its "crawlservice.appspot.com" can parse any Ajax page. Now I have a page "http://mydomain.com#!article" that has an artice that was pulled from Database. Say that page has the text "this is my article". Now I open this link:
crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article, then i can see all javascript but I couldn't find the text "this is my article".
Why?
Now let check with a real life example
open this link https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k & you will see the text "If i open that url in IE"
Now you open http://crawlservice.appspot.com/?key=123456&url=https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k you can see all javascript but there is no text "If i open that url in IE",
Why is it?
SO if i use http://crawlservice.appspot.com/?key=123456&url=mydomain#!article then Can google crawler be able to see the text in mydomain#!article?
also why the key=123456, it means everyone can use this service? do we have our own key? does google limit the number of calls to their service?
Could you explain all these things?
Extra Info:
Christopher suggested me to use this example
https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service
However, I ran into other problem. My app is a pure GWTP, it doesn't have appengine-web.xml in WEB-INF. I have no idea what is appengine or GAE mean or what is Maven.
DO i need to register AppEngine?
My Appp may have a lot of traffic. Also I am using Godaddy VPS. I don't want to register App Engine since I have to pay for Google for extra traffic.
Everything in my GWTP App is ok right now except Crawler Function.
So if I don't use Google App Engine, then how can i build Crawler Function for GWTP?
I tried to use HTMLUnit for my app, but HTMLUnit doesn't work for GWTP (See details in here Why HTMLUnit always shows the HostPage no matter what url I type in (Crawlable GWT APP)? )
I believe you are not allowed to crawl Google Groups. Probably they are actively trying to prevent this, so you do not see the expected content.
There's a couple points I wish to elaborate on:
The Google Code documentation is no longer maintained. You should look on Github instead: https://github.com/ArcBees/GWTP/wiki/Crawler-Support
You shouldn't use http://crawlservice.appspot.com. This isn't a Google service, it's out of date and we may decide to delete it down the road. This only serves as a public example. You should create your own application on App Engine (https://appengine.google.com/)
There is a sample here (https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service) using GWTP's Crawler Service. You can basically copy-paste it. Just make sure you update the <application> tag in appengine-web.xml to the name of your application and use your own service key in CrawlerModule.
Finally, if your client uses GWTP and you followed the documentation, it will work. If you want to try it manually, you must encode the Query Parameters.
For example http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service will not work because the hash (everything including and after #) is not sent to the server.
On the other hand http://crawlservice.appspot.com/?key=123456&url=http%3A%2F%2Fwww.arcbees.com%2F%23!service will work.

Should a www. to m. redirect for mobile devices accessing a PC site use a 301 or a 302?

I read in a few places that people are recommending a 301 redirect to redirect mobile devices from a PC site the mobile optimised site.
Making Websites Mobile Friendly (Google Webmaster blog) - http://googlewebmastercentral.blogspot.com/2011/02/making-websites-mobile-friendly.html
Untangling Your Mobile Metrics With Better Redirects - http://searchengineland.com/untangling-your-mobile-metrics-with-better-redirects-113015?utm_source=twitterfeed&utm_medium=twitter&utm_campaign=feed-main
The arguments seem to be 1) Reduce duplication in Google search results and 2) Preserve referrer.
However the side affect of doing a permanent redirect is that you couldn't give the option of the user back to the PC version if they wanted (since the 301 permanent redirect would have been cached by the client) - See http://mobiforge.com/designing/story/a-very-modern-mobile-switching-algorithm-part-ii for why giving user choice is important.
What is the recommendation to optimise for search (e.g. following Google's SEO guidelines) at the same time as giving the user choice as to whether to visit the mobile or PC site?
If you craft your regex carefully, you could probably make redirection happen only for URLs that don't have the "force" parameter. After all, you probably want fine-grained control over that redirection anyway, rather than have blanket rules, so that you can control one-to-one page mapping.
Google now says it doesn't matter, as long as you do your canonical tags properly:
HTTP redirection is a commonly used to redirect clients to
device-specific URLs. Usually, the redirection is done based on the
user-agent in the HTTP request headers. It is important to keep the
redirection consistent with the alternate URL specified in the page's
link rel="alternate" tag or in the Sitemap.
For this purpose, it does not matter if the server redirects with an
HTTP 301 or a 302 status code.

Getting a forwarded URL from thousands of different domains in Google App Engine

I actually asked this question before, but I cannot get my account details back, so I'm asking again:
I have a series of different domain names that I would like to all point (via URL forwarding from my domain host) to a google app engine application that reads what the forwarding URL is. So if the domain typed in was original XYZ.com, then when I am forwarded to my application, I can return what that original domain name was. I'm using the python variant. How best can I do this without coding for each and every variant?
So for example I might have aaa.com and bbb.com and ccc.com that all should point to the same appspotdomain, and I wish to somehow determine what the referring URL was. I have thousands of domains and I have URL forwarding set-up. So unless I put something in the header is there a smart way to pull out the referring URL. I have tried the os.environ["SERVER_NAME"] route but this just gives the app-engine domain.
Try
os.environ['HTTP_REFERER']
or
self.request.headers['Referer']
Be careful though, it might not always be available.

How do i get foo.somedomain.com get handled by myapp.appspot.com/foo on appengine

Here is what I'd like to achieve
http://foo.somedomain.com gets handled by
http://myapp.appspot.com/foo (google appengine app myapp)
and the underlying url is masked.
Note the following:
somedomain.com is a third party domain that would like to add foo.somedomain.com
mydomain.com would be CNAME'd to myapp.appspot.com
mydomain.com/foo would point to myapp.appspot.com/foo
other scenarios
can foo.mydomain.com be made to point to myapp.appsot.com/foo
can foo.somedomain.com point directly to myapp.appspot.com/foo
Added: myapp.appspot.com is developed using django w/ app-engine-patch
You can't do this in the way described. In order to do this, you need to:
CNAME foo.somedomain.com to ghs.google.com (not to myapp.appspot.com)
Set up Google Apps for your Domain on somedomain.com, if it's not already
Add the app 'myapp' to foo.somedomain.com through the Apps control panel
Once that's done, your app can check self.request.host to determine which hostname was sent, and route requests appropriately.
You can parse the sub-domain from the Host header, then call the webapp.RequestHandler appropriate for the path /[sub-domain], assuming *.yourdomain.com is directed to the Google App Engine application.
Have a look at webapp.WSGIApplication and see if there's a way to get the mapped webapp.RequestHandler for a path. Alternatively, you might be able to modify the request object to change the requested path (this I'm not sure about, however.)
This question was asked in one of the 2009 Google I/O app engine talks. Unfortunately the answer given was along the lines of not supported at this time but the possibilities of some workarounds may exist. 2009 Google I/O videos

Resources