How to redirect crawler requests to pre-rendered pages when using Amazon S3? - angularjs

Problem
I have a static SPA built with Angular and hosted on Amazon S3. I'm trying to make my pre-rendered pages accessible to crawlers, but I can't redirect crawler requests because Amazon S3 does not offer URL rewriting and its redirect rules are limited.
What I have
I've added the following meta-tag to the <head> of my index.html page:
<meta name="fragment" content="!">
Also, my SPA is using pretty URLs (without the hash # sign) with HTML5 push state.
With this setup, when a crawler finds my http://mywebsite.com/about link, it makes a GET request to http://mywebsite.com/about?_escaped_fragment_=. This is a pattern defined by Google and followed by other crawlers.
What I need is to answer this request with a pre-rendered version of the about.html file. I've already done the pre-rendering with PhantomJS, but I can't serve the correct file to crawlers because Amazon S3 does not have rewrite rules.
On an nginx server, the solution would be to add a rewrite rule like:
location / {
    if ($args ~ "_escaped_fragment_=") {
        rewrite ^/(.*)$ /snapshots/$1.html break;
    }
}
But in Amazon S3, I'm limited by their redirect rules based on KeyPrefixes and HttpErrorCodes. The ?_escaped_fragment_= is not a KeyPrefix, since it appears at the end of the URL, and it gives no HTTP error since Angular will ignore it.
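To make the desired mapping concrete, here is a minimal sketch of what the nginx rule above does, written as a plain JavaScript function. The name snapshotPathFor is hypothetical and is not part of S3, nginx, or Angular; it only illustrates which file a crawler request should end up at.

```javascript
// Hypothetical helper that roughly mirrors the nginx rewrite rule above.
function snapshotPathFor(requestUrl) {
  // Parse relative URLs against a placeholder origin.
  const url = new URL(requestUrl, 'http://mywebsite.com');
  // Only crawler requests carry the _escaped_fragment_ parameter.
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null;
  }
  // Strip the leading slash and point at the pre-rendered snapshot.
  const page = url.pathname.replace(/^\/+/, '');
  return '/snapshots/' + page + '.html';
}

console.log(snapshotPathFor('/about?_escaped_fragment_=')); // /snapshots/about.html
console.log(snapshotPathFor('/about'));                      // null
```

S3 routing rules cannot express this, because they only see the key prefix and the HTTP error code, not the query string at the end of the URL.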
What I've tried
I started by trying dynamic templates with ngRoute, but later realized that no Angular-side solution can work, since I'm targeting crawlers that can't execute JavaScript.
With Amazon S3, I have to stick with their redirect rules.
I've managed to get it working with an ugly workaround. If I create a new rule for each page, I'm done:
<RoutingRules>
  <!-- each page needs its own rule -->
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>about?_escaped_fragment_=</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <HostName>mywebsite.com</HostName>
      <ReplaceKeyPrefixWith>snapshots/about.html</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
As you can see, each page needs its own rule. Since Amazon S3 allows only 50 routing rules, this is not a viable solution.
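As an illustration of why this workaround does not scale, the sketch below generates one rule per page in the same shape as the XML above and refuses to go past the 50-rule limit mentioned in the question. The function name buildRoutingRules and the constant name are hypothetical.

```javascript
// Limit on routing rules cited in the question.
const MAX_S3_ROUTING_RULES = 50;

// Hypothetical generator: one RoutingRule per page, as in the workaround above.
function buildRoutingRules(pages, hostName) {
  if (pages.length > MAX_S3_ROUTING_RULES) {
    throw new Error(pages.length + ' rules needed, but the limit is ' + MAX_S3_ROUTING_RULES);
  }
  const rules = pages.map(function (page) {
    return [
      '  <RoutingRule>',
      '    <Condition>',
      '      <KeyPrefixEquals>' + page + '?_escaped_fragment_=</KeyPrefixEquals>',
      '    </Condition>',
      '    <Redirect>',
      '      <HostName>' + hostName + '</HostName>',
      '      <ReplaceKeyPrefixWith>snapshots/' + page + '.html</ReplaceKeyPrefixWith>',
      '    </Redirect>',
      '  </RoutingRule>'
    ].join('\n');
  });
  return '<RoutingRules>\n' + rules.join('\n') + '\n</RoutingRules>';
}

console.log(buildRoutingRules(['about', 'contact'], 'mywebsite.com'));
```

A site with more than 50 pages makes the function throw, which is exactly the point where the per-page approach stops working.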
Another solution would be to forget about pretty URLs and use hashbangs. With this, my link would be http://mywebsite.com/#!about and crawlers would request this with http://mywebsite.com/?_escaped_fragment_=about. Since the URL will start with ?_escaped_fragment_=, it can be captured with the KeyPrefix and just one redirect rule would be enough. However, I don't want to use ugly URLs.
So, how can I have a static SPA in Amazon S3 and be SEO-friendly?

Short Answer
Amazon S3 (and Amazon CloudFront) do not offer rewrite rules and have only limited redirect options. However, you don't need to redirect or rewrite your URL requests. Just pre-render all HTML files and upload them following your website paths.
Since a user browsing the page has JavaScript enabled, Angular will start up and take control of the page, which results in a re-rendering of the template. With this, all Angular functionality is available to this user.
Regarding the crawler, the pre-rendered page will be enough.
Example
Suppose you have a website named www.myblog.com with a link to another page at www.myblog.com/posts/my-first-post. Your Angular app probably has the following structure: an index.html file in the root directory that is responsible for everything, and the page my-first-post as a partial HTML file located at /partials/my-first-post.html.
The solution in this case is to use a pre-rendering tool at deploy time. You can use PhantomJS for this, but you can't use a middleware tool like Prerender because you have a static site hosted in Amazon S3.
You need to use this pre-render tool to create two files: index.html and my-first-post. Note that my-first-post will be an HTML file without the .html extension, but you will need to set its Content-Type to text/html when you upload to Amazon S3.
You will place the index.html file in your root directory and my-first-post inside a folder named posts to match your URL path /posts/my-first-post.
With this approach, the crawler will be able to retrieve your HTML file and the user will be happy to use all Angular functionalities.
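The mapping from URL path to uploaded object can be sketched as follows. The helper name s3ObjectForRoute is hypothetical; Key and ContentType are used here because they match the parameter names of an S3 upload, but the upload call itself is left out.

```javascript
// Hypothetical helper: which S3 object a pre-rendered route should be uploaded as.
function s3ObjectForRoute(route) {
  // Object keys have no leading or trailing slash.
  const key = route.replace(/^\/+/, '').replace(/\/+$/, '');
  return {
    // The root route is served from index.html.
    Key: key === '' ? 'index.html' : key,
    // Extensionless keys need an explicit type so browsers render them as HTML.
    ContentType: 'text/html'
  };
}

console.log(s3ObjectForRoute('/posts/my-first-post'));
console.log(s3ObjectForRoute('/'));
```

The first call yields the key posts/my-first-post with no .html extension, which is why the Content-Type has to be set by hand.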
Note: this solution requires that all files be referenced using the root path. Relative paths will not work if you visit the link www.myblog.com/posts/my-first-post.
By root path, I mean:
<script src="/js/myfile.js"></script>
The wrong way, using relative paths, would be:
<script src="js/myfile.js"></script>
EDIT:
Below is a small script that I've used to prerender pages with PhantomJS. After installing PhantomJS and testing the script with a single page, add a step to your build process that prerenders all pages before deploying your site.
var fs = require('fs');
var webPage = require('webpage');
var page = webPage.create();
// Since this tool runs before your production deploy,
// the target URL is your dev/staging environment (localhost, in this example).
var path = 'pages/my-page';
var url = 'http://localhost/' + path;
page.open(url, function (status) {
  if (status !== 'success') {
    // Exit explicitly: an uncaught throw in this callback can leave PhantomJS hanging.
    console.log('Error trying to prerender ' + url);
    phantom.exit(1);
    return;
  }
  // page.content holds the HTML of the page as currently rendered.
  var content = page.content;
  fs.write(path, content, 'w');
  console.log('The file was saved.');
  phantom.exit();
});
Note: it looks like Node.js, but it isn't. It must be run with the PhantomJS executable, not with Node.
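For the build step that prerenders all pages, one possible sketch is a small Node.js helper that prepares one PhantomJS invocation per page. This assumes the PhantomJS script above is saved as prerender.js and has been adapted to read the page path from its first command-line argument (for example through PhantomJS's system module) instead of the hard-coded path variable. The file name, the page list, and the function name prerenderCommands are all hypothetical.

```javascript
// Hypothetical build helper (runs under Node.js, not PhantomJS).
// Assumes prerender.js takes the page path as its first argument.
function prerenderCommands(pages, scriptPath) {
  return pages.map(function (page) {
    return { command: 'phantomjs', args: [scriptPath, page] };
  });
}

const commands = prerenderCommands(['pages/my-page', 'posts/my-first-post'], 'prerender.js');
commands.forEach(function (c) {
  // Print the command line that the build step would run.
  console.log(c.command + ' ' + c.args.join(' '));
});
```

In a real build script, each entry could be handed to Node's child_process.execFileSync(command, args) so that a failing page stops the deploy.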

Related

Removing the need for pathing in cloudFront distribution of S3 bucket requiring .html at the end of the page name, in Next.js project

I have a Next.js/React/TypeScript project that is hosted on an S3 bucket as a static site and distributed via CloudFront.
The problem I'm running into is that to go to a different page I have to append .html to the page name.
So mysite.com/profile returns a <Code>NoSuchKey</Code> error, while mysite.com/profile.html routes me correctly.
Is there some way to remove this requirement?
If this is a Next.js issue, I'm using:
npx next build
npx next export
To build and export the /out directory which I then upload to my S3 bucket
My next.config.js:
module.exports = {
  target: "serverless"
}
I had it like this because I was originally using serverless with Next.js. I have since moved away from it, as I'm largely using client-side rendering and don't need the features it provided, and I'm still in the process of cleaning up the project.
S3 routes by exact match on the object key. You can upload the files without the .html extension so that the keys match your routes, and set the Content-Type metadata to text/html so that browsers render them properly.
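A minimal sketch of the renaming step, assuming the exported files are listed by their paths relative to the /out directory. The function name s3KeyFor is hypothetical, and keeping index.html unchanged is an assumption made so it can still act as the index document.

```javascript
// Hypothetical helper: object key to use for a file from the export directory.
function s3KeyFor(file) {
  const isIndex = /(^|\/)index\.html$/.test(file);
  // Leave index documents and non-HTML assets untouched.
  if (isIndex || !file.endsWith('.html')) {
    return file;
  }
  // profile.html is uploaded as profile (with Content-Type text/html).
  return file.slice(0, -'.html'.length);
}

console.log(s3KeyFor('profile.html'));        // profile
console.log(s3KeyFor('index.html'));          // index.html
console.log(s3KeyFor('_next/static/app.js')); // _next/static/app.js
```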

Cannot access pages with direct url after building project for deployment Spring and React

When running my Spring app from my IDE and running the React app from within VSCode, everything worked perfectly. I used the build script to build my React project, and then put the output into the /static folder of my Spring app. Then I used mvn clean install to build the .jar file. After running the entire app from the .jar file, I can access my homepage at localhost:5000. I can also use my navbar links to access different parts of the website, like the Home page and the About page. But if I try to manually enter the URL localhost:5000/about, I get a 404 Not Found error. What am I doing wrong?
My guess is that your Spring (webmvc?) application is not configured to listen to different URLs other than /. And while it may seem as if the navbar successfully redirects to http://localhost:5000/about, in reality the single page application uses JavaScript client-side routing to change the URL in the browser, unload the currently rendered page, and load another page.
If you are indeed using Spring MVC, you could (among other options) modify your Spring static resource configuration, modify your @RequestMapping to listen to multiple endpoints, or use a ViewControllerRegistry.

Managing routes in reactjs app in production

How is routing handled in a built react app?
Specifically, in a development environment, we can simply hit <host>:<port>/<some-path> and the corresponding component is loaded, but once the app is built, we get a bunch of static files and single index.html file, which are then served by some server.
Now, upon hitting the url <server-host>:<server-port>, the app works as intended, but upon entering the path, say <server-host>:<server-port>/<component-path>, a 404 error is returned.
If there is, say, a button that navigates to the same /<component-path> when clicked, the app works, but refreshing that page again results in a 404 error.
How can this be solved? What is the correct way to serve such apps having many components at different routes?
Approach 1 (recommended):
In the server config, point all URLs (http://ipaddress:port/<any url pattern>) to the index.html of the React app. This is known as a fallback mechanism.
When any request comes in, the React app's index.html takes care of it, because it is a single-page application.
Approach 2:
Use HashRouter in the React app, so you will not have to configure anything on the server.
Depending on which server you are deploying to, you should redirect all errors to index.html. Look for the relevant configuration, for example .htaccess, or, for an AWS S3 bucket, set the error page to the same index.html file that is served. You then handle actual errors in your code using a routing library such as react-router-dom. The failure happens because a static host normally assumes that a URL like <server-port>/<component-path> refers to a folder named component-path in your root directory containing an index file to load and display. In a React single-page application, however, everything is managed by index.html, so every request has to go through index.html.
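The fallback mechanism described above can be sketched as a small decision function: serve the requested file if it exists, otherwise serve index.html and let the client-side router pick the component. The function name resolveFile and the example file list are hypothetical, and the sketch leaves out the actual HTTP server.

```javascript
// Hypothetical fallback logic for serving a built single-page application.
function resolveFile(requestPath, existingFiles) {
  const file = requestPath.replace(/^\/+/, '');
  // Real static assets are served as-is.
  if (file !== '' && existingFiles.has(file)) {
    return file;
  }
  // Everything else falls back to the app shell; the router handles the path.
  return 'index.html';
}

const builtFiles = new Set(['index.html', 'static/js/main.js']);
console.log(resolveFile('/static/js/main.js', builtFiles)); // static/js/main.js
console.log(resolveFile('/component-path', builtFiles));    // index.html
console.log(resolveFile('/', builtFiles));                  // index.html
```

Because unknown paths return index.html instead of a 404, refreshing the browser on a client-side route loads the app again, and the routing library then renders the right component.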

Redirect all AWS S3 http requests to index.html for AngularJS HTML5Mode

How do I redirect all requests to my static AWS S3 website to index.html so I can use AngularJS' HTML5 Mode?
I recently learned (to my unending delight) that it is possible to use AngularJS without the # in the URL by using HTML5 Mode. However, I know from this answer that this requires some setup on the server, since all requests have to be redirected to the right html file (in this case, index.html) for this to work.
I use AWS S3's static website hosting for my site. I tried adding this to my redirection rules:
<RoutingRules>
  <RoutingRule>
    <Redirect>
      <ReplaceKeyWith>/</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
and
<RoutingRules>
  <RoutingRule>
    <Redirect>
      <ReplaceKeyWith>index.html</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
but I get issues with too many redirects.
Is there a way to do the kind of redirection necessary in AWS S3 with the static website hosting?
You can use AWS CloudFront for your use case. Set up the S3 bucket behind CloudFront and add index.html as the default route.
Still, if the page is refreshed on an Angular route (e.g. /home), AWS CloudFront will look for a /home.html file in S3 and return a 404: Not Found response. There is a workaround for this: set up a custom error response for the 404: Not Found HTTP error code that points to the /index.html response page path.
For more details refer the blog post Using AWS CloudFront to serve an SPA hosted on S3.

Configure Amazon S3 static site with Angular JS ui.router html5Mode(true) on page refresh

How can I configure an Amazon S3 static webpage to properly route Angular ui.router html5Mode routes? On page refresh, it will make a request for a file that doesn't exist, and angular can't handle it. In the docs, they recommend changing your URL rewrites on the server.
https://github.com/angular-ui/ui-router/wiki/Frequently-Asked-Questions#how-to-configure-your-server-to-work-with-html5mode
However, S3 is storage and doesn't offer the same redirection options.
I have been trying to use the built-in redirection rules, such as:
<RoutingRules>
  <RoutingRule>
    <Condition>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>[[ your application's domain name ]]</HostName>
      <ReplaceKeyPrefixWith>#/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
However, this just leads to a redirect loop.
Any suggestions?
In the Frequently Asked Questions, they rewrite almost everything to serve the index.html page. For HTML5 fallback mode you need to use #!/ (hashbang).
You could change this:
<ReplaceKeyPrefixWith>#/</ReplaceKeyPrefixWith>
with
<ReplaceKeyPrefixWith>#!/</ReplaceKeyPrefixWith>
More details on this answer: https://stackoverflow.com/a/16877231/1733117
You may also need to configure your app for using that prefix:
angular.module(...)
  ...
  .config(function($locationProvider) {
    $locationProvider.html5Mode(true).hashPrefix('!');
  })
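As a rough illustration of what the #!/ rule produces, the sketch below computes the redirect target for a missing key. It assumes the rule has no key prefix condition, so the whole key is kept and #!/ is put in front of it. The function name hashbangRedirectFor and the host are hypothetical.

```javascript
// Hypothetical illustration of the redirect produced by the 404 rule above.
function hashbangRedirectFor(host, requestPath) {
  // S3 keys have no leading slash; the rule prepends "#!/" to the key.
  const key = requestPath.replace(/^\/+/, '');
  return 'http://' + host + '/#!/' + key;
}

console.log(hashbangRedirectFor('mywebsite.com', '/about')); // http://mywebsite.com/#!/about
```

The browser then loads the index page with the route in the hash, and with html5Mode(true) and hashPrefix('!') configured as above, Angular can turn it back into the pretty URL.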
1. Make sure the index document is configured for your website. Usually it is index.html.
2. Remove the routing rules from the S3 configuration.
3. Put CloudFront in front of your S3 bucket.
4. Configure error page rules for your CloudFront distribution. In the error rules, specify:
   Http error code: 404 (and 403 or other errors as needed)
   Error Caching Minimum TTL (seconds): 0
   Customize response: Yes
   Response Page Path: /index.html
   HTTP Response Code: 200
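The error rules above can also be written down as data. The sketch below builds an object in the shape of the CustomErrorResponses section of a CloudFront distribution configuration; the function name spaFallbackErrorResponses is hypothetical, and the surrounding distribution config and the API call that applies it are left out.

```javascript
// Hypothetical helper: error rules that send 403/404 responses to the SPA shell.
function spaFallbackErrorResponses(errorCodes) {
  const items = errorCodes.map(function (code) {
    return {
      ErrorCode: code,                  // Http error code
      ErrorCachingMinTTL: 0,            // Error Caching Minimum TTL (seconds)
      ResponsePagePath: '/index.html',  // Response Page Path
      ResponseCode: '200'               // HTTP Response Code
    };
  });
  return { Quantity: items.length, Items: items };
}

console.log(JSON.stringify(spaFallbackErrorResponses([403, 404]), null, 2));
```

Returning 200 with /index.html means a refresh on any Angular route loads the app, and the router then resolves the path on the client.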
Basically there are three options. You can use an EC2 instance to perform the actual server rewrites to the configured HTML5 routes or, like dnozay suggested, use the fallback mode and rewrite requests to use the #! hashbang. Finally, you could just use the standard Angular routes, which is the option I went with: less hassle, and when Angular 2.0 rolls around, you can update to that.
https://stackoverflow.com/a/16877231/1733117
Doesn't really address the routing issue here.
Here is another option using nginx proxy_pass. It also allows you to have multiple projects in subfolders and use subdomains:
S3 Static Website Hosting Route All Paths to Index.html

Resources