I am trying to set up Angular 1.5 app for server side rendering for the crawlers by using Prerender service.
And everything works fine for the inner pages but there is a problem with the main page's rendering - the crawler sees the 404 page instead of the main page.
I suppose there is a problem with some other rules in my .htaccess - except the rules for the Prerender, I use two other rules for all the pages:
rewriting urls without trailing slashes onto the urls with trailing slashes
rewriting urls with www on the urls without www
Will be appreciate for any tips!
Here is my .htaccess file for Apache serveer
RequestHeader set X-Prerender-Token "MyToken"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
# If the requested resource doesn't exist, use index.html
RewriteRule ^ /index.html
You have this section:
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
Which will try to serve files from your /snapshots/ directory if _escaped_fragment_ is in the URL. That doesn't have anything to do with Prerender.io so you'll probably want to remove that section, as it could be the cause of the 404.
You're also checking Googlebot and Bingbot by their user agents which is a bad idea because they could penalize you for cloaking.
Related
I have two react apps in the same subdomain - in different directories.
app.domain.com
app.domain.com/login
For react router to work I have existing .htaccess rules to redirect all traffic to the app (index.html)
RewriteEngine On
#Redirect to https
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
RewriteBase /
#Not sure what this does
RewriteCond %{QUERY_STRING} ^open&guid=
RewriteRule ^ ? [R=301,L,NE]
#Not sure what this does
RewriteRule ^index\.html$ - [L]
#Redirect all to index.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
I want to allow to load the react app in directory /login and redirect all /login/* requests to /login/index.html.
How can I apply these rules using .htaccess?
Kind regards /K
Your existing .htaccess is fine. Just create another .htaccess under /login/ directory as this:
RewriteEngine On
#Redirect all to index.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.html [L]
This will route every non-file and non-directory /login/* request to /login/index.html
With your shown samples, could you please try following. Please make sure you clear your browser cache before testing URLs.
RewriteRule ^login/?$ login/index.html [NC,L]
OR uri has starting login with having other things in uri.
RewriteRule ^login(?!=index\.html) login/index.html [NC,L]
Have a site built with Angular 1.5 that is using prerender.io to server pages to bots. For the most part it is working well, however, The home page of the site does not appear to be redirecting to prerender which is causing me some issues. The path to my index file for angular is /templates/index.html.
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxxxxx"
</IfModule>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.torrent|\.ttf|\.woff|\.svg|\.eot|\.woff2))(.*) http://service.prerender.io/https://www.example.com/$2 [P,L]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^(.+)\.xml$ http://content.example.com/$1.xml [P]
</IfModule>
RewriteRule ^(.*) templates/index.html [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]
Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods "GET,POST,PUT,OPTIONS"
Header set Access-Control-Allow-Credentials "true"
Not great with htaccess or httpd.conf. Took over this from another developer. Any insite would be a great help.
This works:
example.com/24
This does not:
example.com
I'm not sure if the problem in htaccess or in apache conf.
.htaccess code
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "xxxxxxxxxxxxx"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
I'm testing with this:
https://developers.facebook.com/tools/debug/og/object/
Problem webpage:
http://vivule.ee/
Ok here's how I solved it. It turns out the problem lies with Apache 2.4 setup, for some reason it bypasses prerender servers and serves raw html from original server. I got it solved by adding "DirectoryIndex" into the htaccess file. No parameters added, this sets the page index to http://example.com/.
Here is my final code:
<IfModule mod_headers.c>
RequestHeader set X-Prerender-Token "XXXXXXXXXXXX"
</IfModule>
<IfModule mod_rewrite.c>
DirectoryIndex
RewriteEngine on
# Redirect www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Options +FollowSymLinks
#RewriteRule ^api/(.*)$ http://vivule.ee/api/$1 [P,L]
# Prerender.io stuff
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Baiduspider|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://vivule.ee/$2 [P,L]
</IfModule>
# Don't rewrite files or directories, but exclude adminer directory
RewriteRule ^(adminer)($|/) - [L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^adminer - [L,NC]
RewriteRule ^ index.html [L]
</IfModule>
This could be matching your homepage and serving the index.html from before the prerender config is run
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Try moving the prerender config higher up in your file.
I have a website in angularjs. I am using prerender.io for the SEO. But my website is still not crawlable by Google. What to do ?
My .htaccess file looks something like this.
# Handle Prerender.io
RequestHeader set X-Prerender-Token "pPSyD3N1tOziIdRgQIwT"
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slac$
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpe$
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
#If does not matches take to index.html
RewriteRule ^ index.html
I've got an Angular app using html5mode, but it's currently sitting within a sub-folder of a site: http://somesite.com/za/
So I've managed to rewrite the url to direct to my assets folder correctly, but I'm still getting 404s when I hit refresh or navigate directly to a page.
So this works: http://somesite.com/za/assets/js/bundle.js
This doesn't work, 404: http://somesite.com/za/home
.htaccess
RewriteEngine On
#This apparently creates a redirect loop for angular
RewriteRule ^(.*)/$ /$1 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(za/.*) /za/index.html [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(?:[^/]*/)*([^/.]+\.(?:jpe?g|gif|bmp|png||css|js))$ za/assets/$1 [R=301,L,NC]
AddType image/svg+xml svg svgz
AddEncoding gzip svgz
So where is the rewrite failing to redirect to index.html when I hit or refresh a page?
Solved it with a more elegant rewrite rule:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^(.*) /za/index.html [NC,L]
However, this does give read access to other files and directories, so it can have security issues.