htaccess not rerouting index page to prerender.io cache (but reroutes subpages) - angularjs

This works:
example.com/24
This does not:
example.com
I'm not sure if the problem in htaccess or in apache conf.
.htaccess code
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxx"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
I'm testing with this:
https://developers.facebook.com/tools/debug/og/object/
Problem webpage:
http://vivule.ee/

Ok here's how I solved it. It turns out the problem lies with Apache 2.4 setup, for some reason it bypasses prerender servers and serves raw html from original server. I got it solved by adding "DirectoryIndex" into the htaccess file. No parameters added, this sets the page index to http://example.com/.
Here is my final code:
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "XXXXXXXXXXXX"
</IfModule>
<IfModule mod_rewrite.c>
DirectoryIndex
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Baiduspider|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>

This could be matching your homepage and serving the index.html from before the prerender config is run
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Try moving the prerender config higher up in your file.

Related

Site home page not redirecting to prerender.io

Have a site built with Angular 1.5 that is using prerender.io to server pages to bots. For the most part it is working well, however, The home page of the site does not appear to be redirecting to prerender which is causing me some issues. The path to my index file for angular is /templates/index.html.
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxxxxx"
</IfModule>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.torrent|\.ttf|\.woff|\.svg|\.eot|\.woff2))(.*) http://service.prerender.io/https://www.example.com/$2 [P,L]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^(.+)\.xml$ http://content.example.com/$1.xml [P]
</IfModule>
RewriteRule ^(.*) templates/index.html [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]
Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods "GET,POST,PUT,OPTIONS"
Header set Access-Control-Allow-Credentials "true"
Not great with htaccess or httpd.conf. Took over this from another developer. Any insite would be a great help.

How to remove index.php from url in cakephp 2

I have tried commenting this line present in congif/core.php,
// Configure::write('App.baseUrl', env('SCRIPT_NAME'));
But when I hit my url it gives me 404 error.
Make sure you are loading mod_rewrite correctly. You should see something like
#uncomment it by removing leading #
LoadModule rewrite_module libexec/apache2/mod_rewrite.so
Verify that your .htaccess files are actually in the right directories. Some operating systems treat files that start with ‘.’ as hidden and therefore won’t copy them.
#CAKEPHP root .htaccess
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^$ app/webroot/ [L]
RewriteRule (.*) app/webroot/$1 [L]
</IfModule>
#Cakephp App directory .htacess
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^$ webroot/ [L]
RewriteRule (.*) webroot/$1 [L]
</IfModule>
#Cakephp webroot directory .htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>
URL_REWRITING
You may fix it with your web-sersver (.htaccess rules for apache, rules in configuration file for nginx).

Angular with Prerender (.htaccess settings)

I am trying to set up Angular 1.5 app for server side rendering for the crawlers by using Prerender service.
And everything works fine for the inner pages but there is a problem with the main page's rendering - the crawler sees the 404 page instead of the main page.
I suppose there is a problem with some other rules in my .htaccess - except the rules for the Prerender, I use two other rules for all the pages:
rewriting urls without trailing slashes onto the urls with trailing slashes
rewriting urls with www on the urls without www
Will be appreciate for any tips!
Here is my .htaccess file for Apache serveer
RequestHeader set X-Prerender-Token "MyToken"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
# If the requested resource doesn't exist, use index.html
RewriteRule ^ /index.html
You have this section:
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
Which will try to serve files from your /snapshots/ directory if _escaped_fragment_ is in the URL. That doesn't have anything to do with Prerender.io so you'll probably want to remove that section, as it could be the cause of the 404.
You're also checking Googlebot and Bingbot by their user agents which is a bad idea because they could penalize you for cloaking.

Rewrite rule (if index.html do nothing) for .htaccess

Im trying the find a rule that says: "if index.html part of url then do nothing"
Routes
localhost/rest/token <== Symfony REST backend
localhost/index.html/dashboard <== AngularjS frontend
.htaccess
DirectoryIndex app.php
<IfModule mod_php5.c>
php_value memory_limit 1024M
php_value max_execution_time 600
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} !^admin
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule .* - [e=HTTP_AUTHORIZATION:%1]
RewriteCond %{REQUEST_URI}::$1 ^(/.+)/(.*)::\2$
RewriteRule ^(.*) - [E=BASE:%1]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^app\.php(/(.*)|$) %{ENV:BASE}/$2 [R=301,L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule .? - [L]
RewriteRule .? %{ENV:BASE}/app.php [L]
</IfModule>
<IfModule !mod_rewrite.c>
<IfModule mod_alias.c>
RedirectMatch 302 ^/$ /app.php/
</IfModule>
</IfModule>
At the top of your htaccess, put the following
RewriteEngine on
RewriteRule index\.html - [L]
This will exclude the index.html from your all rewriteRules.

How to do the SEO of angular website

I have a website in angularjs. I am using prerender.io for the SEO. But my website is still not crawlable by Google. What to do ?
My .htaccess file looks something like this.
# Handle Prerender.io
RequestHeader set X-Prerender-Token "pPSyD3N1tOziIdRgQIwT"
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slac$
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpe$
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
#If does not matches take to index.html
RewriteRule ^ index.html

Resources