Prerender caching risks Google cloaking penalty? - angularjs

Following some trouble getting the Google Crawler to parse our AngularJS site, we're using Prerender to serve a crawler-friendly version of our pages.
This has worked well - except Webmaster Tools indicates that our site speed has worsened considerably, due to Prerender's latency. We're concerned this will impact ranking.
So two questions:
1. Does Google use the Prerender pages in measuring site speed, or the (true) JavaScript-enabled version of our site? We suspect it's the former.
2. One possible solution is to cache the Prerendered pages. However, these cached pages may not perfectly match what the user sees, because of the delay between a page being cached and it being returned to the crawler - e.g. we may add products to a page, and the title/meta tags reflect the number of products available at any one time. Are these small differences in title, meta description and page content enough to risk a cloaking penalty? If so, what is the alternative to caching?
Many thanks for any help.

When it comes to crawl speed, Google uses the Prerender page's response time. This is why it's important to cache your pages, so that the Prerender server doesn't have to load the page in a browser on every request. Returning cached pages will let Googlebot crawl your site very quickly.
As long as you serve the snapshots via the ?_escaped_fragment_= protocol and don't match on the Googlebot user agent, you won't be penalized for cloaking even if the pages differ in the ways you mentioned. Just avoid user-agent matching and don't pad your Prerender pages with keywords, and you'll be fine.
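To make that concrete, here is a minimal Express-style sketch of the idea: serve a cached snapshot whenever a request carries the ?_escaped_fragment_= parameter, and never branch on the Googlebot user agent. The in-memory cache, the one-hour TTL and the way the Prerender server is called are my own assumptions for illustration, not Prerender's official middleware.

    // Minimal sketch, assuming Express and Node 18+ (global fetch).
    // The cache, TTL and Prerender URL format are illustrative assumptions.
    const express = require('express');
    const app = express();

    const cache = new Map();            // rendered HTML keyed by pretty URL
    const TTL = 60 * 60 * 1000;         // refresh snapshots roughly hourly

    app.use(async (req, res, next) => {
      // Only intercept requests made via the escaped_fragment protocol;
      // never match on the Googlebot user agent.
      const fragment = req.query._escaped_fragment_;
      if (fragment === undefined) return next();

      const prettyUrl = 'https://www.example.com' + req.path + '#!' + fragment;
      const hit = cache.get(prettyUrl);
      if (hit && Date.now() - hit.ts < TTL) {
        return res.send(hit.html);      // fast path: cached snapshot
      }

      // Assumed: a self-hosted Prerender server on :3000 that renders
      // whatever URL is appended to its own address.
      const rendered = await fetch('http://localhost:3000/' + prettyUrl);
      const html = await rendered.text();
      cache.set(prettyUrl, { html, ts: Date.now() });
      res.send(html);
    });

    app.use(express.static('public'));  // the normal AngularJS app for users
    app.listen(8080);

With a scheme like this, the snapshot Googlebot sees is at most one TTL old, so the drift between cached and live content is bounded by whatever refresh interval you choose.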

Related

Angular URLs in sitemap.xml with hashbang or not?

How should I put Angular URLs in the sitemap.xml?
Now I've added this:
https://www.domain.com/user-area#!/logon
but I think Google doesn't like it.
I've read about _escaped_fragment_, but I don't understand what it means.
The _escaped_fragment_ scheme was recently deprecated by Google. See their blog post here:
http://googlewebmastercentral.blogspot.com/2015/10/deprecating-our-ajax-crawling-scheme.html
Anyway, from what you wrote, it seems like your server structure isn't prepared for the _escaped_fragment_. I won't go into too much detail about it here, since it was deprecated after all.
Google's bots weren't always able to process websites with AJAX content (content rendered via JavaScript). To create a workaround, Google proposed adding the hashbang #! to all AJAX sites. Bots would detect the hashbang and know that the website had content rendered through AJAX. They would then request a pre-rendered version of the AJAX pages by replacing the hashbang with _escaped_fragment_. However, this required the server hosting the AJAX pages to know about the _escaped_fragment_ and be able to serve up a pre-rendered page, which was a difficult process to set up and execute.
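To make the mapping concrete, here is a tiny sketch (mine, not Google's code) of the URL rewrite the crawlers performed under that scheme; the real specification additionally percent-encodes certain special characters inside the fragment, which is omitted here.

    // Sketch of the AJAX-crawling URL rewrite: a '#!' pretty URL becomes the
    // '?_escaped_fragment_=' ugly URL that the bot actually requested.
    function toEscapedFragment(prettyUrl) {
      const [base, fragment = ''] = prettyUrl.split('#!');
      const separator = base.includes('?') ? '&' : '?';
      return base + separator + '_escaped_fragment_=' + fragment;
    }

    console.log(toEscapedFragment('https://www.domain.com/user-area#!/logon'));
    // -> https://www.domain.com/user-area?_escaped_fragment_=/logon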
Now, according to the deprecation blog post, the URL you entered in your sitemap.xml should be fine, since Google's bots should be able to "crawl, render, and index the #! URLs". If you really want to know whether Google can understand your website, I'd recommend using their webmaster console, located at https://www.google.com/intl/en/webmasters/ . Using that tool, you can register your site with Google, observe how Google indexes your site, and be notified if any problems arise with your site.
In general though, getting Google to index an AJAX site is a pain. I'd strongly recommend using the webmaster console and referring to it frequently. It does help.

Fetch as Google in Webmaster Tools

I have an AngularJS SPA site which I wanted to test using Google's "Fetch as Google" feature in Webmaster Tools. I am a little confused about the results. The screenshot from Googlebot looks correct, but the response doesn't include any of the content inside the "ui-view" (ui-router)... can someone explain what is happening here? Is Google indexing the site properly, since the screenshot is correct? Or is Google not able to execute the JS properly for indexing?
This is a mixed bag. From some tests I've seen, Googlebot is able to index some AJAX-fetched content in some cases. A safe bet to make all the search engines happy, though, is to use prerender.io or download their open-source stack (which uses PhantomJS) so that your site is easily indexable. Basically, it saves the version of your site that exists after async operations have completed for a given URL, and you then set up a rule on your server that points search-engine bots to the pre-rendered page. It sounds pretty complicated, but following the instructions on the site it's not too hard to set up, and if you don't want to pay for prerender.io to serve cached copies of your pages to search engines, you can run the server component yourself too.
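For a sense of what that server-side hook looks like, here is a sketch based on prerender.io's Express middleware as its README documents it; the token, port and option names are placeholders you should verify against the current prerender-node documentation.

    // Sketch: wiring an Express app to a Prerender service (hosted or self-run).
    const express = require('express');
    const prerender = require('prerender-node');

    const app = express();

    // Hosted prerender.io: authenticate with your account token (placeholder).
    prerender.set('prerenderToken', 'YOUR_TOKEN');

    // Or point at a self-hosted (PhantomJS-based) Prerender server instead:
    // prerender.set('prerenderServiceUrl', 'http://localhost:3000/');

    // The middleware recognises crawler requests (e.g. _escaped_fragment_) and
    // proxies them to the Prerender service; normal visitors get the SPA as usual.
    app.use(prerender);

    app.use(express.static('public'));   // the AngularJS single-page app
    app.listen(8080);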

Drupal 7 Boost is creating but not serving cached pages

I have a Drupal 7 site and installed Boost, which I have set up successfully on other sites. On this specific site, Boost is creating the cached pages but not serving them. I'm able to navigate to the cached pages using www.mysite.com/cache/normal/mysiteURL/cachedpage.html, and the cached page displays correctly. I thought the problem might be that the pages were being re-cached with every request, but I checked and the cached pages remain for the expiration period I've set, so they are not being re-cached.
Can anyone suggest why Boost is able to create the cached pages but the system is not serving them?
thanks,
You need to switch off Drupal's built-in 'Cache pages for anonymous users' setting, and then clear the cache. Be aware that Boost only works for anonymous users.

Angular SEO using HTML Unit and prerender.io

I have an AJAX-heavy website developed using Angular. I tried using HtmlUnit to take care of SEO and render my pages for the bots, but HtmlUnit is really slow and takes almost 12 seconds to render a page.
I then shifted to the PhantomJS-based prerender.io. The performance was better, around 3 seconds, but the Phantom server crashes when loaded with 10 users for just 2 minutes.
Should I really worry about the response time my SEO server gives to bots, and what would the bot load be on a website with ~100 different pages? How many parallel requests should my SEO server handle?
Is page rank dependent on the response time served to the bots?
Google does care about response speed.
If you are serving snapshots of your pages then you should be following Google's Crawlable Ajax Specification: https://developers.google.com/webmasters/ajax-crawling/ (Bing and Yandex support this too)
Two good reasons to follow this spec are:
1) Google will understand that you are doing the rendering on the server, and they are probably smart enough to realise that this means the response time will not be the normal one for your site.
2) Google won't misinterpret as nefarious the fact that you are serving general visitors one version of your page and Google another.
More info about these issues on our site: https://ajaxsnapshots.com (we provide a SaaS solution for crawlable JavaScript)

CloudFlare with Google App Engine overquota

I've been looking at CloudFlare as a CDN service for my Google App Engine hosting, and as a student, cost is always an issue (aka free services only). I read on the CF blog that when the origin server is down, CF will serve a cached version of the website from its own servers to users.
So if we hit the GAE quota limit, is the server considered "down"? Will CF display the cached website? I don't plan to have a lot of dynamic content, so serving an entirely cached website is not too much of an issue for me.
If the answer to my first question is no, is it possible to get CF to serve its cached website content automatically once GAE hits any quota limit? I know it's probably unlikely, but I just wanted to throw the question out.
According to CloudFlare's wiki, the Always Online feature will return a cached page only if the backend server is unavailable or returns a response code of 502 or 504. When you hit quota limits, App Engine itself will generally still be available, so whether the cache works depends on the response code in your case.
If your app exceeds its bandwidth or instance hour quota, App Engine will return a 403 Forbidden response code. It is possible to customize the content of the error response, but not the code. It seems then that CloudFlare will not serve a cached page in this case.
However, if your app hits an API usage quota, your code will receive an exception and you can choose to return one of those 50x codes and trigger the cache.
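As a rough illustration of that pattern (Express-style JavaScript purely for readability; OverQuotaError and fetchProducts() are hypothetical stand-ins, not an App Engine API), the idea is to catch the quota exception yourself and answer with a status CloudFlare treats as "down":

    // Illustrative sketch: map a quota exception to a 502 so that Always
    // Online treats the origin as down and serves its cached copy.
    const express = require('express');
    const app = express();

    class OverQuotaError extends Error {}        // hypothetical error type

    async function fetchProducts() {             // hypothetical backend call
      if (Math.random() < 0.1) {
        throw new OverQuotaError('datastore reads exhausted');
      }
      return [{ id: 1, name: 'Widget' }];
    }

    app.get('/products', async (req, res) => {
      try {
        res.json(await fetchProducts());
      } catch (err) {
        if (err instanceof OverQuotaError) {
          // 502 (or 504) rather than the default error response.
          return res.status(502).send('Temporarily over quota');
        }
        res.status(500).send('Unexpected error');
      }
    });

    app.listen(8080);

Note that this only helps for quota failures your own code sees; the 403 that App Engine itself returns for bandwidth or instance-hour overruns can't be intercepted this way.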
I'm not sure if this particular case will work for CloudFlare because of the error code that App Engine returns (we are working on some enhancements for Always Online, but it really won't tackle 403 errors).
It does appear that App Engine allows you some customization of the error pages:
Tip: You can configure your application to serve a custom error page when your application exceeds a quota. For details, see Custom Error Responses documentation for Python and Java.
