I've coded a brand new website as a side project and I'd like it to get indexed by Google and co.
The website has been made with AngularJS and optimised for SEO (at least, I tried to).
So far, Google has indexed each page in a terrible way:
www.mywebsite.com/#!/post/title-of-this-post/
While the page has been declared in a sitemap.xml:
<url>
<loc>http://mywebsite.com/post/title-of-this-post</loc>
<lastmod>2015-08-04T00:00:00+00:00</lastmod>
<changefreq>daily</changefreq>
</url>
The Website is using HTML5 routes to remove the #! symbols.
When I try to reach the indexed page, it redirects to the home page. I need to remove the trailing slash.
So far, I've been able to create the following .htaccess file:
DirectoryIndex index.html
RewriteEngine On
RewriteBase /
# BEGIN Seo crawler
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^$ /crawler.php$1 [QSA,L]
# END Seo crawler
# BEGIN sitemap and rss
RewriteRule ^sitemap.xml$ sitemap.php [L]
RewriteRule ^rss.xml$ rssfeed.php [L]
# END sitemap and rss
# BEGIN Remove trailing slash from URLs
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} (.*)$
RewriteRule (.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
# END Remove trailing slash from URLs
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index
RewriteRule (.*) index.html [L]
But this doesn't work.
Typing http://mywebsite.com/post/title-of-this-post/ will correctly redirect to http://mywebsite.com/post/title-of-this-post (Trailing slash is removed).
Typing www.mywebsite.com/#!/post/title-of-this-post/ will sadly redirect to www.mywebsite.com/home.
Typing www.mywebsite.com/#!/post/title-of-this-post (Without the trailing slash) will correctly redirect to http://mywebsite.com/post/title-of-this-post.
Is there a way to achieve that?
I'm using UI-Router in my AngularJS project.
Finally, I've managed to fix it on my own:
Just add this piece of code to your AngularJS source:
app.config(function ($urlMatcherFactoryProvider) {
$urlMatcherFactoryProvider.caseInsensitive(true);
$urlMatcherFactoryProvider.strictMode(false);
});
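To illustrate what turning strict mode off changes, here is a minimal sketch of trailing-slash-tolerant matching. It is not UI-Router's implementation; `matchesRoute` and its options are hypothetical names used only for illustration.

```javascript
// Hypothetical helper (not part of UI-Router): compares a URL path to a
// route pattern, optionally ignoring one trailing slash and letter case.
function matchesRoute(pattern, path, { strict = true, caseInsensitive = false } = {}) {
  let a = pattern;
  let b = path;
  if (!strict) {
    // Non-strict mode: "/post/x" and "/post/x/" are treated as the same path.
    a = a.length > 1 ? a.replace(/\/$/, '') : a;
    b = b.length > 1 ? b.replace(/\/$/, '') : b;
  }
  if (caseInsensitive) {
    a = a.toLowerCase();
    b = b.toLowerCase();
  }
  return a === b;
}
```

With the defaults, `/post/title-of-this-post/` does not match a route declared as `/post/title-of-this-post`, which is consistent with the fallback to the home page described in the question. With `strict: false` the two are treated as equal.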
I found a problem with my Next.js site. During development, I navigated the site using buttons and by manually changing the URL in the browser address bar. Occasionally I accidentally added a slash to the end, but my localhost server removed it and everything worked fine.
But everything changed when I uploaded my static application to the hosting. It automatically began to add these slashes when reloading the page. Because of this, my pictures on the site break.
As far as I understand, you need to correctly configure the .htaccess file.
Here is what it looks like now:
RewriteEngine On
RewriteRule ^([^/]+)/$ $1.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/$ $1.html
RewriteRule ^([^/]+)/([^/]+)/$ /$1/$2.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ /$1/ [R=301,L]
Your existing rules are all expecting (or forcing) a trailing slash on all your URLs. So, if the canonical URL (and the URL you are linking to) does not include a trailing slash then all these rules essentially need to be reversed. However, there are other issues here (the first rule, for instance, is unconditionally rewriting the request to append the .html extension, which is repeated in the next rule with a condition.)
Try the following instead:
RewriteEngine On
# (OPTIONAL) Remove trailing slash if it happens to be on the request
# Exclude physical directories (which must end in a slash)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.+)/$ /$1 [R=301,L]
# Rewrite request to corresponding ".html" file if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([^.]+)$ $1.html [L]
Your original directives only handled URLs with a path depth of one or two (e.g. /foo/ or /foo/bar/). The second rule above handles any path depth (if so required), e.g. /foo, /foo/bar, /foo/bar/baz etc. (no trailing slash).
As an optimisation I've assumed your URLs that require rewriting do not contain dots (that are otherwise used to delimit the file extension).
Note that the RewriteRule pattern (first argument) matches against the URL-path only (not the query string). If there is any query string on the initial request then this is simply passed through by default. (With regards to the rewrite and client-side JS, the query string is available on the initial request and should be parsed as before.)
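The decision logic of the two rules above can be sketched outside of Apache as follows. This is only an illustrative model, not how mod_rewrite is implemented; `resolveRequest`, `existingFiles` and `existingDirs` are hypothetical names, and the sets stand in for the real filesystem.

```javascript
// Sketch of the two rules' decision logic.
// `existingFiles` holds file paths such as "/foo.html";
// `existingDirs` holds directory paths with a trailing slash, such as "/images/".
function resolveRequest(urlPath, existingFiles, existingDirs = new Set()) {
  // Rule 1: strip a trailing slash unless the path is a physical directory.
  if (urlPath.length > 1 && urlPath.endsWith('/') && !existingDirs.has(urlPath)) {
    return { action: 'redirect', status: 301, target: urlPath.slice(0, -1) };
  }
  // Rule 2: paths without dots map to "<path>.html" if that file exists.
  const candidate = urlPath + '.html';
  if (!urlPath.includes('.') && existingFiles.has(candidate)) {
    return { action: 'rewrite', target: candidate };
  }
  // Anything else (static assets, directories, unknown paths) is left alone.
  return { action: 'pass', target: urlPath };
}
```

For example, with `/foo/bar.html` present on disk, a request for `/foo/bar/` is first redirected to `/foo/bar`, and the follow-up request is then rewritten to `/foo/bar.html`.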
You wrote: "Because of this, my pictures on the site break."
This will happen if you are using relative URLs to your images. You should really be using root-relative (starting with a slash) or absolute URLs to resolve this issue. See also:
404 not found - broken links to CSS, images
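A quick way to see why relative image URLs break is to resolve them the way a browser does, using the standard `URL` constructor. The host `example.com` and the path `images/pic.png` are placeholders, not values from the question.

```javascript
// Resolve an image reference against a page URL the way a browser would.
function resolveAsset(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

// Relative reference: the result depends on whether the page URL ends in a slash.
const withoutSlash = resolveAsset('images/pic.png', 'https://example.com/about');
const withSlash = resolveAsset('images/pic.png', 'https://example.com/about/');

// Root-relative reference: the result is the same for both page URLs.
const rootRelative = resolveAsset('/images/pic.png', 'https://example.com/about/');

console.log(withoutSlash); // https://example.com/images/pic.png
console.log(withSlash);    // https://example.com/about/images/pic.png
console.log(rootRelative); // https://example.com/images/pic.png
```

When the host appends a trailing slash, the relative reference resolves one directory deeper than the file actually lives, while the root-relative one is unaffected.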
All of my angularjs site works with prerender except for the home page. When crawled, it sends back a 404 page. I have reason to believe it is this line of code in my .htaccess file, RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L] but I am not sure.
<IfModule mod_rewrite.c>
RewriteEngine On
# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]
# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]
# Handle Prerender.io
RequestHeader set X-Prerender-Token "notprovidingthiscode"
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/https://%{HTTP_HOST}/$1 [P,L]
# If non existent
# Accept everything on index.html
RewriteRule ^ /index.html
The issue turned out to be that the .htaccess file was serving example.com/index.html rather than just example.com when the root of the AngularJS app was accessed. That did not play well with ui-router, because $stateProvider does not match a filename at the end of a URL unless a route for it is declared explicitly. Accessing example.com/index.html therefore fell through to my fallback rule, $urlRouterProvider.otherwise('404'), and rendered the 404 page.
Adding the following code fixed my issue.
$urlRouterProvider.when('/index.html', '/');
This redirects example.com/index.html to example.com which points to the correct rendering in prerender.io.
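The order in which the redirect and the fallback apply can be modelled with a small sketch. It is a simplified, hypothetical model and not ui-router code; `resolveState`, the `config` object and the example routes are assumptions made for illustration.

```javascript
// Hypothetical, simplified model of "when" redirects plus an "otherwise" fallback.
// It only mirrors the order in which the rules apply.
function resolveState(path, { redirects, states, fallback }) {
  const hasRedirect = Object.prototype.hasOwnProperty.call(redirects, path);
  const target = hasRedirect ? redirects[path] : path;
  return states.includes(target) ? target : fallback;
}

const config = {
  redirects: { '/index.html': '/' }, // like $urlRouterProvider.when('/index.html', '/')
  states: ['/', '/about'],           // assumed example routes
  fallback: '404',                   // like $urlRouterProvider.otherwise('404')
};
```

In this model, `resolveState('/index.html', config)` yields `'/'`; without the redirect entry the same path would end up at the `'404'` fallback, which mirrors the behaviour described above.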
I have deployed a React app with React Router to my Bluehost server, and need to configure the htaccess file to redirect all of my routed URLs (/portfolio, /about, etc) to index.html instead of trying to fetch a new file from the server and throwing a 404.
I have read about countless similar problems in which the solution seems to be to add this into your htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule . /index.html [L]
</IfModule>
I tried this, but I am still getting 404s when I try to directly visit any page of my site other than the homepage. I'm wondering whether something else in my existing .htaccess file is preventing the above code from working.
There was some code already in there from Bluehost, and I see another IfModule statement, so I'm wondering if that one is overwriting the first one. However I am afraid to edit it and break something, as it clearly says "do not edit." Here is my full htaccess code:
Header always set Content-Security-Policy: upgrade-insecure-requests
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule . /index.html [L]
</IfModule>
# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php74” package as the default “PHP” programming language.
<IfModule mime_module>
AddHandler application/x-httpd-ea-php74 .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit
# BEGIN WordPress
# The directives (lines) between "BEGIN WordPress" and "END WordPress" are
# dynamically generated, and should only be modified via WordPress filters.
# Any changes to the directives between these markers will be overwritten.
# END WordPress
Any ideas? I've double-checked that my BrowserRouter is set up correctly and also tried a few other htaccess configurations. I want to avoid using HashRouter or Node if possible but am getting frustrated. I can provide my React code as well if needed, but I'm pretty sure the error is not with the React setup.
You can create a virtual host file in the /etc/apache/sites-available folder and add this:
<VirtualHost *:8080>
ServerName example.com
DocumentRoot /var/www/httpd/example.com
<Directory "/var/www/httpd/example.com">
...
RewriteEngine on
# Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^ index.html [L]
</Directory>
</VirtualHost>
This worked for me.
I'm currently running php on an apache server locally, with a React frontend.
This is how my current .htaccess is laid out:
Options -MultiViews
RewriteEngine On
RewriteRule ^api/(.*)$ api/$1\.php [L]
RewriteCond %{REQUEST_URI} !^/api.*?
RewriteRule ^ index.html [QSA,L]
The bottom condition is so that routing works in my React app. I'm then taking the production build and copying it into my htdocs.
The routing works, however, I want to be able to call the .php files inside my /api directory without using the file extension. So I want anything that comes after /api/ to be redirected to whatever is entered, followed by .php.
E.g. /api/authentication would go to /api/authentication.php, and /api/register would go to /api/register.php, and so on.
With this current setup, I'm getting a 500 internal server error when making requests to /api/authentication etc.
Is there something wrong with my .htaccess file?
Your first rule is looping as you're matching .*. You may use:
Options -MultiViews
RewriteEngine On
RewriteRule ^index\.html$ - [L,NC]
RewriteCond %{REQUEST_URI} !\.php$ [NC]
RewriteRule ^api/(.+)$ api/$1.php [L,NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !^api index.html [L,NC]
RewriteCond %{REQUEST_URI} !\.php$ [NC] will skip rewriting when a URI ends with .php.
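To see why the original rule loops, it can help to replay the rewrite by hand. The sketch below roughly imitates how Apache re-runs .htaccess rules after each internal rewrite; `applyUntilStable` and `maxPasses` are hypothetical names, and the cap of 10 is an illustrative value, not Apache's actual limit.

```javascript
// Sketch: re-apply a rewrite until the path stops changing.
function applyUntilStable(path, rewrite, maxPasses = 10) {
  let current = path;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = rewrite(current);
    if (next === current) return { path: current, looped: false };
    current = next;
  }
  // The path was still changing after maxPasses rewrites.
  return { path: current, looped: true };
}

// Original rule: ^api/(.*)$ -> api/$1.php (also matches already rewritten paths).
const originalRule = (p) => p.replace(/^api\/(.*)$/, 'api/$1.php');

// Fixed rule: same rewrite, but skipped when the path already ends in .php.
const fixedRule = (p) =>
  /\.php$/i.test(p) ? p : p.replace(/^api\/(.+)$/i, 'api/$1.php');
```

Running `applyUntilStable('api/authentication', originalRule)` never settles, because each pass appends another `.php`. With `fixedRule` the path becomes `api/authentication.php` once and then stays unchanged.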
I am trying to set up an Angular 1.5 app for server-side rendering for crawlers using the Prerender service.
Everything works fine for the inner pages, but there is a problem with rendering the main page: the crawler sees the 404 page instead of the main page.
I suspect the problem lies in some other rules in my .htaccess. Besides the rules for Prerender, I use two other rules for all pages:
redirecting URLs without trailing slashes to URLs with trailing slashes
redirecting URLs with www to URLs without www
I would appreciate any tips!
Here is my .htaccess file for the Apache server:
RequestHeader set X-Prerender-Token "MyToken"
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
# If an existing asset or directory is requested go to it as it is
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
# If the requested resource doesn't exist, use index.html
RewriteRule ^ /index.html
You have this section:
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]
Which will try to serve files from your /snapshots/ directory if _escaped_fragment_ is in the URL. That doesn't have anything to do with Prerender.io so you'll probably want to remove that section, as it could be the cause of the 404.
You're also checking Googlebot and Bingbot by their user agents which is a bad idea because they could penalize you for cloaking.
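As a rough sketch of the decision this advice implies, the check below proxies a request only for HTML pages, and only when `_escaped_fragment_` is present or the user agent belongs to a social-media crawler. `shouldPrerender` is a hypothetical helper, and both lists are shortened examples taken from the rules quoted in this thread rather than a recommended set.

```javascript
// Hypothetical decision helper, not part of Prerender.io.
const SOCIAL_CRAWLERS = /facebookexternalhit|twitterbot|linkedinbot|pinterest/i;
const STATIC_ASSET = /\.(js|css|xml|png|jpg|jpeg|gif|pdf|ico|ttf|woff)$/i;

function shouldPrerender({ path, query, userAgent }) {
  // Only HTML pages are worth proxying; static assets are served directly.
  if (STATIC_ASSET.test(path)) return false;
  // A crawler that sends _escaped_fragment_ is asking for a snapshot explicitly.
  if (query.includes('_escaped_fragment_')) return true;
  // Otherwise match social crawlers only; Googlebot and Bingbot are deliberately absent.
  return SOCIAL_CRAWLERS.test(userAgent);
}
```

Under these assumptions, a Googlebot request without `_escaped_fragment_` is served the normal app, so the response does not vary by search-engine user agent.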