How to write htaccess for prerender with Angular and wordpress - angularjs

I am using Angular in combination with wordpress. I am trying to make prerender work but I don't know how to implement it. Whenever I attach ?_escaped_fragment_= to the end of the url it just shows the normal angular site and not the prerendered html.
I have added the fragment meta tag in the header and html5 to true in angular config.
.htaccess
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "MY TOKEN"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://www.example.com$2 [P,L]
</IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Also the apache mod_rewrite and mod_proxy modules are installed
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

Related

AngularJS w/Prerender 404 error on home page

All of my angularjs site works with prerender except for the home page. When crawled, it sends back a 404 page. I have reason to believe it is this line of code in my .htaccess file, RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L] but I am not sure.
<IfModule mod_rewrite.c>
RewriteEngine On
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "notprovidingthiscode"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L]
# If non existent
# Accept everything on index.html
RewriteRule ^ /index.html
The issue turned out to be that the .htaccess file was serving example.com/index.html rather than just example.com when accessing the root of the angularjs app. That in turn didn't play well with ui-router because the $stateProvider doesn't serve filenames at the end of urls without being explicit. Accessing example.com/index.html did indeed cause my page to throw a 404 error $urlRouterProvider.otherwise('404');
Adding the following code fixed my issue.
$urlRouterProvider.when('/index.html', '/');
This redirects example.com/index.html to example.com which points to the correct rendering in prerender.io.

prerender.io .htaccess variable - Reactjs CRA

I set up prerender.io for CRA and it works well, but when bot hits URL without parameters it puts in the end of URL - string ".var"
I tried variations of (.*) but it seems not working. Any ideas?
Here is .htaccess file
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "TOKEN"
RequestHeader set X-Prerender-Version "prerender-apache#2.0.0"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff|\.svg))
RewriteRule ^(index\.html|index\.php)?(.*) https://service.prerender.io/%{REQUEST_SCHEME}://%{HTTP_HOST}/$2 [P,END]
</IfModule>
</IfModule>
Lately #MrWhite gave us another, better and simple solution - just add DirectoryIndex index.html to .htaccess file will do the same.
From the beginning I wrote that DirectoryIndex is working but NO!
It seems it's working when you try prerender.io, but in reality it was showing website like this:
and I had to remove it.
So it was not issue with .htaccess file, it was coming from the server.
What I did was I went into WHM->Apache Configurations->DirectoryIndex Priority and I saw this list
and yes that was it!
To fix I just moved index.html to the very top second comes index.html.var and after rest of them.
I don't know what index.html.var is for, but I did not risk just to remove it. Hope it helps someone who struggled as me.

Angular with Prerender (.htaccess settings)

I am trying to set up Angular 1.5 app for server side rendering for the crawlers by using Prerender service.
And everything works fine for the inner pages but there is a problem with the main page's rendering - the crawler sees the 404 page instead of the main page.
I suppose there is a problem with some other rules in my .htaccess - except the rules for the Prerender, I use two other rules for all the pages:
rewriting urls without trailing slashes onto the urls with trailing slashes
rewriting urls with www on the urls without www
Will be appreciate for any tips!
Here is my .htaccess file for Apache serveer
RequestHeader set X-Prerender-Token "MyToken"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
# If the requested resource doesn't exist, use index.html
RewriteRule ^ /index.html
You have this section:
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
Which will try to serve files from your /snapshots/ directory if _escaped_fragment_ is in the URL. That doesn't have anything to do with Prerender.io so you'll probably want to remove that section, as it could be the cause of the 404.
You're also checking Googlebot and Bingbot by their user agents which is a bad idea because they could penalize you for cloaking.

Prerender with Angular UI Router not rendering root page dynamic content

I have prerender working and running for all pages except the root(home) page.
For example:
http://www.example.com/?_escaped_fragment_=
doesn't render the dynamic data. All other pages works like
http://www.example.com/contact?_escaped_fragment_=
It's like angular UI router doesn't know what state the root it's in so it leaves the view blank. However root page http://www.example.com without escaped_fragment does render it correctly.
I have added the fragment meta tag in the header and html5 to true in angular config.
index.html:
<meta name="fragment" content="!">
app.js:
$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');
.htaccess:
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "MY TOKEN"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://www.example.com$2 [P,L]
</IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Your .htaccess might have the URL set to /index.php for the homepage. If so, you could change your rewrite rule to this:
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(index\.php)?(.*) http://service.prerender.io/http://www.example.com$3 [P,L]
I added (index.php)? to capture that out if found and changed the $2 to $3. Don't forget to change the http://www.example.com to your domain.

htaccess not rerouting index page to prerender.io cache (but reroutes subpages)

This works:
example.com/24
This does not:
example.com
I'm not sure if the problem in htaccess or in apache conf.
.htaccess code
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxx"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
I'm testing with this:
https://developers.facebook.com/tools/debug/og/object/
Problem webpage:
http://vivule.ee/
Ok here's how I solved it. It turns out the problem lies with Apache 2.4 setup, for some reason it bypasses prerender servers and serves raw html from original server. I got it solved by adding "DirectoryIndex" into the htaccess file. No parameters added, this sets the page index to http://example.com/.
Here is my final code:
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "XXXXXXXXXXXX"
</IfModule>
<IfModule mod_rewrite.c>
DirectoryIndex
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Baiduspider|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
This could be matching your homepage and serving the index.html from before the prerender config is run
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Try moving the prerender config higher up in your file.

Resources