I have prerender working and running for all pages except the root(home) page.
For example:
http://www.example.com/?_escaped_fragment_=
doesn't render the dynamic data. All other pages works like
http://www.example.com/contact?_escaped_fragment_=
It's like angular UI router doesn't know what state the root it's in so it leaves the view blank. However root page http://www.example.com without escaped_fragment does render it correctly.
I have added the fragment meta tag in the header and html5 to true in angular config.
index.html:
<meta name="fragment" content="!">
app.js:
$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');
.htaccess:
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "MY TOKEN"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://www.example.com$2 [P,L]
</IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Your .htaccess might have the URL set to /index.php for the homepage. If so, you could change your rewrite rule to this:
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(index\.php)?(.*) http://service.prerender.io/http://www.example.com$3 [P,L]
I added (index.php)? to capture that out if found and changed the $2 to $3. Don't forget to change the http://www.example.com to your domain.
Related
All of my angularjs site works with prerender except for the home page. When crawled, it sends back a 404 page. I have reason to believe it is this line of code in my .htaccess file, RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L] but I am not sure.
<IfModule mod_rewrite.c>
RewriteEngine On
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "notprovidingthiscode"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L]
# If non existent
# Accept everything on index.html
RewriteRule ^ /index.html
The issue turned out to be that the .htaccess file was serving example.com/index.html rather than just example.com when accessing the root of the angularjs app. That in turn didn't play well with ui-router because the $stateProvider doesn't serve filenames at the end of urls without being explicit. Accessing example.com/index.html did indeed cause my page to throw a 404 error $urlRouterProvider.otherwise('404');
Adding the following code fixed my issue.
$urlRouterProvider.when('/index.html', '/');
This redirects example.com/index.html to example.com which points to the correct rendering in prerender.io.
I am trying to set up Angular 1.5 app for server side rendering for the crawlers by using Prerender service.
And everything works fine for the inner pages but there is a problem with the main page's rendering - the crawler sees the 404 page instead of the main page.
I suppose there is a problem with some other rules in my .htaccess - except the rules for the Prerender, I use two other rules for all the pages:
rewriting urls without trailing slashes onto the urls with trailing slashes
rewriting urls with www on the urls without www
Will be appreciate for any tips!
Here is my .htaccess file for Apache serveer
RequestHeader set X-Prerender-Token "MyToken"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
# If the requested resource doesn't exist, use index.html
RewriteRule ^ /index.html
You have this section:
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
Which will try to serve files from your /snapshots/ directory if _escaped_fragment_ is in the URL. That doesn't have anything to do with Prerender.io so you'll probably want to remove that section, as it could be the cause of the 404.
You're also checking Googlebot and Bingbot by their user agents which is a bad idea because they could penalize you for cloaking.
I am using Angular in combination with wordpress. I am trying to make prerender work but I don't know how to implement it. Whenever I attach ?_escaped_fragment_= to the end of the url it just shows the normal angular site and not the prerendered html.
I have added the fragment meta tag in the header and html5 to true in angular config.
.htaccess
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "MY TOKEN"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://www.example.com$2 [P,L]
</IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Also the apache mod_rewrite and mod_proxy modules are installed
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
This works:
example.com/24
This does not:
example.com
I'm not sure if the problem in htaccess or in apache conf.
.htaccess code
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxx"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
I'm testing with this:
https://developers.facebook.com/tools/debug/og/object/
Problem webpage:
http://vivule.ee/
Ok here's how I solved it. It turns out the problem lies with Apache 2.4 setup, for some reason it bypasses prerender servers and serves raw html from original server. I got it solved by adding "DirectoryIndex" into the htaccess file. No parameters added, this sets the page index to http://example.com/.
Here is my final code:
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "XXXXXXXXXXXX"
</IfModule>
<IfModule mod_rewrite.c>
DirectoryIndex
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Baiduspider|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
This could be matching your homepage and serving the index.html from before the prerender config is run
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Try moving the prerender config higher up in your file.
I've got an AngularJS app running on top of CakePHP 2.6. I'd like to enable html5mode in Angular so I don't have to include a # in URLs, but doing so requires configuring some .htaccess files so that all requests hit index.html.
Not all of my web app is Angular yet, so I can't just redirect everything to index.html- instead, I'd like to set things up so that all requests to a certain subdirectory (and its subdirectories) redirect to that subdirectory's index.html.
CakePHP has a folder structure like this:
/ // Server web root directory: index.php, etc
----/app // The directory containing the actual Cake app
----/webroot // Web assets used by the Cake app
----/myapp // Where I'd like to store the index.html for my Angular app
I'd like to get this set up so that any requests to /myapp or its subdirectories get automatically redirected to /app/webroot/index.html. Does anyone have any ideas about how to go about this?
My current .htaccess files:
/:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteRule ^$ app/webroot/ [L]
RewriteRule (.*) app/webroot/$1 [L]
</IfModule>
/app:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^$ webroot/ [L]
RewriteRule (.*) webroot/$1 [L]
</IfModule>
/app/webroot:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/(app/webroot/)?(img|css|js|eot)/(.*)$
RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>
I would just create a controller in CakePHP to handle all requests for myapp and render the static HTML as a view with a custom layout (if required).
Performance will not be an issue because the route for CakePHP will only be executed once. After that the AngularJS app is running and will take over all HTML5 routing locally in the browser.
While you can achieve what you want in .htaccess it limits you to running the CakePHP application on Apache. If you want to scale later to IIS or Nginx you'll have to recreate these rewrite rules.