I have an Ajax-heavy website developed using Angular. I tried using HtmlUnit to render my pages for bots and take care of SEO, but HtmlUnit is really slow: it takes almost 12 seconds to render my page.
I then switched to the PhantomJS-based prerender.io. The performance was better, around 3 seconds, but the Phantom server crashes when loaded with just 10 users for 2 minutes.
Should I really worry about the response time of my SEO server when responding to bots, and how much load would bots put on a website with ~100 different pages? How many parallel requests should my SEO server handle?
Is page rank dependent on the response time served to bots?
Google does care about response speed.
If you are serving snapshots of your pages, then you should be following Google's Crawlable AJAX Specification: https://developers.google.com/webmasters/ajax-crawling/ (Bing and Yandex support this too)
Two good reasons to follow this spec are:
1) Google will understand that you are doing the rendering on the server, and they are probably smart enough to realize that this means the response time will not be the normal one for your site.
2) Google won't misinterpret as nefarious the fact that you are serving general visitors one version of your page and Google another.
More info about these issues on our site: https://ajaxsnapshots.com (we provide a SaaS solution for crawlable JavaScript)
I read an article about social sharing issues in AngularJS and how to address them by using Apache as a proxy.
That solution is workable for small websites, but if a web app has 20+ different pages, I have to URL-rewrite and create static files for all of them. Moreover, using PHP and Apache adds another stack to the app.
Can we use Node.js as the proxy and rewrite the URLs, and what would the approach be?
Is there a way to minimize the creation of static files?
Is there a way to do away with the proxy, URL rewriting, and static files altogether? For example, inside our Node.js app we could check the user agent; if it is the Facebook bot, the Twitter bot, or the like, we could use the request module to download our page and return the raw HTML to them. Is that a plausible solution?
Normally, when someone shares a URL on a social network, that social network requests the page to generate a preview/thumbnail (aka "scrapes" it).
Most likely those scrapers won't run JavaScript, so they need a static HTML version of that page.
The same applies to search engines (even though Google and others are starting to support JavaScript sites).
Here's a good approach for an SPA to still support scrapers:
Use history.pushState in Angular to get clean URLs when navigating through your app (i.e. URLs without a #).
Server-side (Node.js or anything else), detect whether a request comes from a user or a bot (e.g. check the User-Agent using this lib: https://www.npmjs.com/package/is-bot ).
If the request URL has a file extension, it's probably a static resource request (images, .css, .js); proxy it to get the static file.
If the request URL is a page and the request comes from a real user, always serve your index.html, which loads your Angular app (pro tip: keep this file cached in memory).
If the request URL is a page and the request comes from a bot, serve a pre-rendered version of that URL (bots won't run JavaScript). This is the hard part (side note: ReactJS makes this problem much simpler). You can use a service like https://prerender.io/ ; it takes care of loading your Angular app and saving each page as HTML (if you're curious, it uses a headless browser in memory called PhantomJS to do that, simulating what a real user would get by clicking "Save As..." after the page loaded). You can then proxy those pre-rendered pages to bot requests (such as social network scrapers). If you want, it's also possible to run a prerender instance on your own servers.
All this server-side process I described is implemented in this express.js middleware by prerender:
https://github.com/prerender/prerender-node/blob/master/index.js
(even if you don't like prerender, you can use that code as an implementation guide)
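To make that flow more concrete, here is a minimal Node.js/Express sketch of the same idea. Treat it as an illustration under assumptions, not a drop-in implementation: the prerender URL and port, the file paths, and the simplified bot regex (a library such as the is-bot package linked above is more thorough) are all placeholders, and the global fetch call assumes Node 18+.

const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const PRERENDER_URL = 'http://localhost:3000'; // assumed self-hosted prerender instance
const BOT_RE = /bot|crawler|spider|facebookexternalhit|twitterbot/i; // simplified bot check
const indexHtml = fs.readFileSync(path.join(__dirname, 'index.html'), 'utf8'); // kept cached in memory

// Requests with a file extension are static resources (.js, .css, images, ...).
app.use(express.static(path.join(__dirname, 'public')));

app.get('*', async (req, res) => {
  const fullUrl = req.protocol + '://' + req.get('host') + req.originalUrl;
  if (BOT_RE.test(req.headers['user-agent'] || '')) {
    try {
      // Prerender-style servers typically accept the target page URL appended to their own.
      const snapshot = await fetch(PRERENDER_URL + '/' + fullUrl); // Node 18+ global fetch
      res.send(await snapshot.text()); // bots get the pre-rendered HTML
    } catch (err) {
      res.send(indexHtml); // fall back to the SPA shell if prerendering fails
    }
  } else {
    res.send(indexHtml); // real users get the shell; Angular handles routing client-side
  }
});

app.listen(8080);

The prerender-node middleware linked above implements this same idea in a more robust way.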
Alternatively, here's an implementation example using only nginx:
https://gist.github.com/thoop/8165802
How should I put Angular URLs in the sitemap.xml?
Now I've added this:
https://www.domain.com/user-area#!/logon
but I think Google doesn't like it.
I've read about the _escaped_fragment_ but I don't understand what that means.
The _escaped_fragment_ scheme was recently deprecated by Google. See their blog post here:
http://googlewebmastercentral.blogspot.com/2015/10/deprecating-our-ajax-crawling-scheme.html
Anyway, from what you wrote, it seems like your server isn't set up to handle _escaped_fragment_. I won't go into too much detail about it here, since it was deprecated after all.
For background: Google's bots weren't always able to process websites with AJAX content (content rendered via JavaScript). As a workaround, Google proposed adding the hashbang (#!) to AJAX sites' URLs. Bots would detect the hashbang and know that the site rendered its content through AJAX. The bots would then request a pre-rendered version of each AJAX page by replacing the hashbang with _escaped_fragment_. However, this required the server hosting the AJAX pages to know about _escaped_fragment_ and be able to serve a pre-rendered page, which was a difficult process to set up and execute.
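To make that mapping concrete, here is a small illustrative sketch using the URL from your sitemap; the helper function is hypothetical, not part of any library, and it omits the exact character-escaping rules the spec defines for the fragment.

// How a crawler following the old scheme rewrote a hashbang URL:
//   'https://www.domain.com/user-area#!/logon'
//   -> 'https://www.domain.com/user-area?_escaped_fragment_=/logon'
function toEscapedFragmentUrl(hashbangUrl) {
  const parts = hashbangUrl.split('#!');
  const base = parts[0];
  const fragment = parts[1] || '';
  const separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + fragment;
}

Your server was then expected to recognize requests of that second form and answer them with pre-rendered HTML.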
Now, according to the deprecation blog post, the URL you entered in your sitemap.xml should be fine, since Google's bots should be able to "crawl, render, and index the #! URLs". If you really want to know whether Google can understand your website, I'd recommend using their webmaster console, located at https://www.google.com/intl/en/webmasters/ . Using that tool, you can register your site with Google, observe how Google indexes it, and be notified if any problems arise with your site.
In general though, getting Google to index an AJAX site is a pain. I'd strongly recommend using the webmaster console and referring to it frequently. It does help.
Following some trouble getting the Google Crawler to parse our AngularJS site, we're using Prerender to serve a crawler-friendly version of our pages.
This has worked well, except that Webmaster Tools indicates our site speed has worsened considerably due to Prerender's latency. We're concerned this will impact our ranking.
So two questions:
Does Google use the Prerender pages when measuring site speed, or the (true) JavaScript-enabled version of our site? We suspect it's the former.
One possible solution is to cache the pre-rendered pages. However, these cached pages may not perfectly match what the user sees, due to the delay between a page being put into the cache and being returned to the crawler; e.g. we may add products to the page, and the title/meta tags reflect the number of products available at any one time. Are these small differences in title, meta description, and page content enough to risk a cloaking penalty? If so, what is the alternative to caching?
Many thanks for any help.
When it comes to crawl speed, Google uses the Prerender page's response time. This is why it's important to cache your pages, so that the Prerender server doesn't have to load the page in a browser each time. Returning cached pages will make Googlebot crawl your site very fast.
As long as you are responding to the ?_escaped_fragment_= protocol and not matching on the Googlebot user agent, you won't be penalized for cloaking even if the pages differ in the ways you mentioned. Just don't match on the Googlebot user agent and don't pad your Prerender pages with keywords, and you'll be fine.
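For what it's worth, here is a minimal Express sketch of responding to the protocol rather than the user agent; the in-memory cache is a placeholder, and how the snapshots are produced and refreshed (e.g. from your Prerender server) is left out.

const express = require('express');
const app = express();

// Hypothetical cache of pre-rendered HTML, keyed by the original hashbang URL.
const snapshotCache = new Map();

app.use((req, res, next) => {
  // Only act when the crawler sends the ?_escaped_fragment_= parameter;
  // normal visitors fall through and get the regular JavaScript app.
  if (req.query._escaped_fragment_ === undefined) return next();

  const pageUrl = req.path + '#!' + req.query._escaped_fragment_;
  const cached = snapshotCache.get(pageUrl);
  if (cached) return res.send(cached); // fast, cached response for the crawler

  next(); // cache miss: fall through, or fetch a fresh snapshot here
});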
I have an AngularJS SPA site which I wanted to test using Google's "Fetch as Google" feature in Webmaster Tools. I am a little confused about the results: the screenshot from Googlebot looks correct, but the response doesn't include any of the contents inside the "ui-view" (ui-router). Can someone explain what is happening here? Is Google indexing the site properly since the screenshot is correct, or is Google not able to execute the JS properly for indexing?
This is a mixed bag. From some tests I've seen, Googlebot is able to index some AJAX-fetched content in some cases. A safe bet to make all the search engines happy, though, is to use prerender.io or download their open-source software (which uses PhantomJS) to make your site easily indexable. Basically, this saves a version of your site after async operations have completed for a given URL, and you then set up a redirect on your server that points any of the potential search engine bots to the pre-processed page. It sounds pretty complicated, but following the instructions on the site it's not too hard to set up, and if you don't want to pay for prerender.io to serve cached copies of your pages to search engines, you can run the server component yourself.
I am working with AngularJS and Drupal, and I am facing two issues: SEO and page previews when we paste a URL into Google, Facebook, Twitter, and many other social sites.
I have handled the _escaped_fragment_ case in the URL with the help of this guide: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
But the challenge comes when I paste my URL, like
http://example.com/#!/test/a/1235
then no preview is generated.
How can I show previews on social sites?
Any help is appreciated.
Thanks
Welcome to JavaScript applications! :)
_escaped_fragment_ is not a standard, and social platforms do not support it
Google developed the _escaped_fragment_ system, but it is not a standard.
There are many bots on the web, and most of them understand neither the _escaped_fragment_ solution nor JavaScript applications such as AngularJS ones.
As far as I know (I have worked on many JS application websites), social platforms do not use the _escaped_fragment_ system.
Moreover, some Google services do not support it yet.
Classic URLs are standard and supported by every bot
If you want every robot to be able to crawl your website, the only way for now is to use classic URLs.
For now, you need to make sure your content is delivered on classic URLs. It is the only way to be sure that it will be interpreted by every bot on the web.
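For an AngularJS app, moving to classic URLs usually means enabling HTML5 (history) mode. Here is a minimal sketch, assuming a placeholder module name of 'myApp':

// Switch from hashbang URLs to classic URLs (HTML5 history mode).
angular.module('myApp', [])
  .config(['$locationProvider', function ($locationProvider) {
    $locationProvider.html5Mode(true);  // real URLs like /test/a/1235 instead of /#!/test/a/1235
    $locationProvider.hashPrefix('!');  // fallback prefix for browsers without the History API
  }]);

Note that this also requires a <base href="/"> tag in index.html, and your server (Drupal or whatever sits in front of it) must return the application shell for those deep URLs so that a refresh or a bot request does not 404.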
This can be a bit challenging and requires a two-fold answer.
Headless Browser
The majority of crawlers and bots cannot parse the JavaScript in your Single Page App (SPA). Therefore, you will need some sort of headless browser to generate what those bots see. I have used PhantomJS and it works well for me. Once your headless browser is up and running, you can create rewrite conditions for _escaped_fragment_. For example, in Apache (the /snapshots/ target below is just illustrative; point it at wherever your pre-rendered pages live):
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [L]
I would suggest you create rewrite conditions based on user agents as well. This is particularly useful for detecting "Facebot", Facebook's preview crawler, and others.
Social Applications
The other part of the solution is to read the developer docs on exactly how to control the generation of these previews. Here are a couple of resources for this (sorry, I can't find Twitter's):
Facebook's Sharing Best Practices
Google's Article Rendering
When checking your page for Facebook, they have a neat little tool that will help you troubleshoot your site/page for preview rendering: https://developers.facebook.com/tools/debug