How to crawl component-based web applications built with Vue and React?

I want to crawl my SPA, which is built with the Vue framework (broadly similar to React). However, the content is not rendered when I crawl it. The result is:
<!doctype html>
<HTML>
<body>
<div id=app>
</div>
<script type=text/javascript src=/static/js/manifest.2ae2e69a05c33dfc65f8.js></script>
<script type=text/javascript src=/static/js/vendor.60c471696de493d48a1c.js></script>
<script type=text/javascript src=/static/js/app.335a9e9866cb7dc6a517.js></script>
</body>
</html>
Are component-based JavaScript frameworks inherently hostile to crawling? How can I get the crawler to render the components?
I'm using the Abot framework for crawling.

All Abot does is send a request to the target website, parse the response, and pass it back to you. Frameworks like React or Vue render in the browser by default, so the HTML the server returns is just an empty shell, and no content appears unless the JavaScript runs. The solution is to load the page in a headless browser or another DOM engine and scrape the rendered result.
Options include Selenium (a browser automation framework with bindings for Python and several other languages), Puppeteer (a Node.js library that drives headless Chrome or Chromium), or a DOM implementation such as jsdom.
The moral of the story: if you want to see the result rendered by JavaScript, you must execute that JavaScript against a DOM.
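As a rough illustration of this point, here is a minimal Node.js sketch. `isEmptyShell` is a hypothetical helper (not part of Abot or Puppeteer) that checks whether raw HTML still contains an empty mount element, like the `<div id=app>` in the question. `fetchRenderedHtml` assumes the `puppeteer` package is installed and uses it to load the page in headless Chromium and return the DOM after the scripts have run. The `networkidle0` wait condition is one possible choice and may need tuning for a real site.

```javascript
// Hypothetical helper: returns true when a <div> with the given id exists in
// the raw HTML but contains only whitespace, i.e. the SPA has not rendered.
// Assumes mountId is a plain identifier with no regex metacharacters.
function isEmptyShell(html, mountId = 'app') {
  // Accepts id=app, id="app" or id='app', optionally followed by other attributes.
  const pattern = new RegExp(
    `<div[^>]*\\bid=["']?${mountId}["']?(?:\\s[^>]*)?>\\s*</div>`,
    'i'
  );
  return pattern.test(html);
}

// Assumption: the `puppeteer` package is installed (npm install puppeteer).
// Loads the URL in headless Chromium and returns the HTML after scripts ran.
async function fetchRenderedHtml(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network has been idle, so client-side rendering has
    // most likely finished. Other wait strategies may suit other sites.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.content();
  } finally {
    await browser.close();
  }
}

// Usage (the URL is a placeholder):
// fetchRenderedHtml('https://example.com').then((html) => {
//   console.log(isEmptyShell(html) ? 'still an empty shell' : 'rendered');
// });
```

A crawler could run the cheap `isEmptyShell` check on the plain HTTP response first and fall back to the headless browser only for pages that come back empty.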

Related

Why is og:image not rendered with React?

I'm trying to get a thumbnail to appear when a link to my website is sent in a Facebook message. I started using s-yadav/react-meta-tags and followed the tutorial, but the image is not present after the link is sent.
Link: https://ticket-44-portal-staging.herokuapp.com/buyTicket?status=1959165690885328
I applied the following code in the React component:
return (
<div className="container">
<MetaTags>
<title>{searchedEventTime.name}</title>
<meta name="description" content={searchedEventTime.description} />
<meta property="og:title" content={searchedEventTime.name} />
<meta
property="og:image"
content={`https://ticket-t01.s3.eu-central-1.amazonaws.com/${eventId}.cover.jpg`}
/>
</MetaTags>
I can see the rendered meta tags in the HTML, so why doesn't it work?
It is because the website is a single-page app. Before the JavaScript is loaded, nothing rendered by React is there yet (including those meta tags). You can verify this by right-clicking the page and selecting "View source": inside body there is only a <div id="root"></div>. The problem is that many search engines and crawlers don't actually run JavaScript when they crawl. Instead, they look at what's in the initial HTML file, and that's why Facebook cannot find the "og:image" tag. There are two ways to solve this problem.
TL;DR Host your app on Netlify if you can. They offer a prerendering service.
First, you may look at prerendering: a service that renders your JavaScript in a browser, saves the static HTML, and returns that HTML when it detects that a request is coming from a crawler. If you can host your React app on Netlify, you can use their free prerendering service (which caches prerendered pages for between 24 and 48 hours). Or you can check out prerender.io. This is the solution if you don't want to move to another framework and just want to get SEO working.
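The crawler-detection step that a prerendering service performs can be sketched roughly as follows. This is a simplified illustration, not how Netlify or prerender.io actually implement it: the `CRAWLER_PATTERNS` list and the `isCrawler` and `chooseResponse` names are hypothetical, and real services maintain much longer lists of user agents.

```javascript
// Hypothetical, deliberately short list of crawler User-Agent substrings.
const CRAWLER_PATTERNS = [
  /facebookexternalhit/i, // Facebook link preview scraper
  /twitterbot/i,
  /googlebot/i,
  /bingbot/i,
  /linkedinbot/i,
];

// Returns true when the User-Agent header looks like a known crawler.
function isCrawler(userAgent) {
  if (!userAgent) return false;
  return CRAWLER_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Picks which HTML to send: the saved static snapshot for crawlers,
// the normal JavaScript shell for everyone else.
function chooseResponse(userAgent, prerenderedHtml, shellHtml) {
  return isCrawler(userAgent) ? prerenderedHtml : shellHtml;
}
```

With this split, regular visitors still get the single-page app, while the Facebook scraper receives HTML that already contains the og:image tag.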
Another, more common way to deal with this problem is static site generation (SSG) or server-side rendering (SSR). Both mean that the HTML content is generated ahead of time or on the server using the React DOM server-side APIs. When that HTML reaches the client, React calls the hydrate() method to turn it back into a functioning React app. The two most popular frameworks are Gatsby.js and Next.js. With these frameworks, you write React and JSX code as you already do, but they offer more to power your app, including SSG, SSR, API routes, plugins, etc. I'd recommend checking out their "Get started" tutorials. Moving from a create-react-app project to one of these frameworks can take less than a day, depending on the size of your project.
Next.js Tutorials
Gatsby.js Tutorials
After implementing one of these solutions, visit the Facebook Sharing Debugger to ask Facebook to scrape your page again.

Where and why do we use ReactDOMServer.renderToString()? And do we use this method in today's version of React?

I learned about this method recently, but I can't work out where I would use it or whether it is necessary.
You don't need to use it if you don't care about server-side rendering (SSR).
There are two types of rendering: client-side rendering (CSR) and SSR.
You can tell that a site uses SSR when you request a page (for example http://youtube.com) with a tool such as Postman and the server returns the HTML markup already populated with dynamic data from the backend. This matters for SEO because it lets bots crawl your content. It can also improve metrics such as First Contentful Paint (FCP); the effect on Time to Interactive (TTI) depends on how much JavaScript still has to load and hydrate.
The renderToString method is used on the server to render a React element to an HTML string.
With CSR, by contrast, the server returns only this markup:
<html>
<head>
<!-- More tags here -->
</head>
<body>
<div id="root"></div>
<script src='/bundle.js'></script>
</body>
</html>
Most React apps, such as those created with create-react-app, are client-side rendered: the markup with dynamic data is produced in the browser.
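For contrast with the CSR markup above, here is a minimal sketch of what the server side could look like. It assumes the `react` and `react-dom` packages are installed; `wrapInDocument` and `renderGreetingPage` are hypothetical names used only for illustration, and the `/bundle.js` path is copied from the markup above. `renderToString` is still exported by `react-dom/server` in current React versions, although newer streaming APIs exist alongside it.

```javascript
// Hypothetical helper: places server-rendered markup inside the same page
// skeleton that the CSR example returns with an empty root.
// Note: `title` is inserted as-is; a real app should escape it.
function wrapInDocument(markup, title = 'My app') {
  return [
    '<!doctype html>',
    '<html>',
    `<head><title>${title}</title></head>`,
    '<body>',
    `<div id="root">${markup}</div>`,
    "<script src='/bundle.js'></script>",
    '</body>',
    '</html>',
  ].join('\n');
}

// Assumption: `react` and `react-dom` are installed.
// Renders a small element to an HTML string on the server.
function renderGreetingPage(name) {
  const React = require('react');
  const { renderToString } = require('react-dom/server');
  const element = React.createElement('h1', null, `Hello, ${name}`);
  return wrapInDocument(renderToString(element));
}

// Usage inside any Node HTTP handler:
// res.end(renderGreetingPage('world'));
```

The only difference from the CSR response is that the root div arrives already filled in, which is what crawlers that skip JavaScript get to see.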

Integrating Accelerated Mobile Pages (AMP) into an existing Backbone application

I have a Backbone application that initialises from index.html. I tried adding a new AMP HTML file called index.amp.html and followed the instructions in Create Your AMP HTML Page.
My index.html contains only a hook for RequireJS to start loading the Backbone app. All the HTML is generated dynamically.
Is there a way I can apply AMP practices to dynamically generated HTML? All I have is one index.html, and the entire content is generated dynamically on the client side through Handlebars.
I didn't find any good articles on making SPAs support AMP. Are there any best practices to follow? Please help me out.
At this time, an AMP document cannot run your own JavaScript; only the AMP runtime and AMP component scripts are allowed, for example:
<script async src="https://cdn.ampproject.org/v0.js"></script>
<script async custom-element="amp-analytics" src="https://cdn.ampproject.org/v0/amp-analytics-0.1.js"></script>
You can use a mustache template by including the custom-template script as follows:
<script async custom-template="amp-mustache" src="https://cdn.ampproject.org/v0/amp-mustache-0.1.js"></script>
The templates are described here:
https://github.com/ampproject/amphtml/blob/master/spec/amp-html-templates.md
Without access to your code, I can't say how easy or difficult it may be to modify your Handlebars templates to fit the model above.

Having two versions of (Twitter) Bootstrap running simultaneously on a web application

I have recently started trying my hand at client-side development with Bootstrap and AngularJS. I've been given the task of building a more or less isolated feature of our website (an AngularJS application) and have been working on it, but I noticed that the Bootstrap features I learned were not working.
Upon inspection I found that our app is using Bootstrap 2.3.x, and I want to use features of Bootstrap 3.0.
Because bootstrap has made quite a huge change in its new version, the main web app coders do not want to switch over so that is not an option. (at least not yet).
My question: is there a way I could have my isolated view use bootstrap 3 while the rest of the app uses bootstrap 2? I really don't want to take the time to learn deprecated technology so any advice would be greatly appreciated.
If you are creating an isolated feature on your site, will it be embedded in one of the existing pages, or is it a section in its own right? Your app pages can use Bootstrap 3.x without causing problems on other pages, as long as the stylesheet and script links are only in the header of your app pages and not added to other pages on the site. Bootstrap 3.x will not leak into existing pages that do not include those tags. If that is the case, you can go ahead and use Bootstrap 3.x and AngularJS and should have no issues.
I would stick your app in a separate folder on the website and design away with the more up-to-date tools.
I'll use some buzzwords here:
Shadow DOM
Web Components
Polymer
Scoped styling is one of the many features of Shadow DOM. Styles defined inside the shadow tree don’t leak out and page styles don’t bleed in.
https://www.polymer-project.org/articles/styling-elements.html
http://plnkr.co/edit/hypZyjc4yFxIubfOn31N?p=preview
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/0.5.2/webcomponents.min.js"></script>
<link rel="import" href="your-component.html">
</head>
<body>
<h1>Bootstrap 3.3.1</h1>
<your-component></your-component>
</body>
</html>
your-component.html
<link rel="import" href="http://www.polymer-project.org/components/polymer/polymer.html">
<polymer-element name="your-component">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/2.3.2/css/bootstrap.min.css">
<template>
<h1>Bootstrap 2.3.2</h1>
</template>
<script>
Polymer({});
</script>
</polymer-element>

How to host an AngularJS app on a domain

I'm currently learning to create a simple AngularJS app.
This is my first MVC app, so my questions may be very basic.
I understand that we require Node.js to run the AngularJS app, so will the hosting provider have Node.js installed on their server?
If so, which does the hosting provider support: AngularJS, Ember.js, or Knockout.js? Currently my client has HostGator and Netfirms.
While reading some posts, I came across a few terms used with AngularJS, like Yo, Grunt, and Bower. What are these used for?
Can anybody tell me how exactly you host an AngularJS app?
If you're deploying a client-side only Angular application (ie. you don't have custom server code), then you might want to check out Firebase Hosting. It's specifically designed for hosting client-side applications that use Angular, Ember, React, etc.
Firebase's Hosting will let you deploy from command line without managing your own servers (no NodeJS), and it handles SSL, CDN, and other best practices for you.
(Disclaimer: I work for Firebase)
AngularJS doesn't require Node.js to work; you can even load it from the Google CDN without hosting the script on your own server. Node.js is used mostly for tooling such as testing, for example if you want to run e2e tests with Karma. It's nice to have, but you can host an app on a simple server without Node.js and run the tests locally.
In fact, any server on the internet (a good old Apache, for example) will be able to host an Angular app, as all the work is done on the client side.
Node.js is, however, required for the tools in your third question. Yeoman (the yo command), Grunt, and Bower are part of a workflow sometimes used to build an Angular app, but they are not required either. They let you quickly create a skeleton for a new app, test it, and deploy it. It's explained on their website, http://yeoman.io/
Those tools require Node.js to work, but they're not a requirement for an Angular app to work. They can be useful to have if you plan on building many Angular apps in the future, but you will be able to host those apps even without any of these tools.
How do you host your Angular app? Like any other HTML page. You can even copy this code and save it to your hard drive:
<html ng-app>
<head>
<script src='https://ajax.googleapis.com/ajax/libs/angularjs/1.0.7/angular.min.js'></script>
</head>
<body>
{{"hello"+" world"}}
</body>
</html>
Then open it in a browser and it will work!
Node.js isn't a requirement for AngularJS. Angular is a client-side library.
Angular's team uses Node.js for tooling that helps you do things like test your JavaScript, pull down files, and set up a little local web server.
So all you really need for an "AngularJS app" is 2 files: angular.js and some HTML file.
<html ng-app>
<head>
<script src="angular.js"></script>
</head>
<body ng-init="what = 'gas'">
Now you're cooking with {{what}}!
</body>
</html>
Beyond that, any web server that will host static files will do for the most part.
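To make "any web server that will host static files" concrete, here is a minimal static file server sketch. Node.js is used here only to keep the example in JavaScript; Apache, nginx, or a shared host would serve the same two files equally well. The `resolveStaticPath` and `createStaticServer` names, the `./public` folder, and port 8080 are all assumptions made for illustration.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Small content-type table; anything else is sent as a generic binary type.
const MIME_TYPES = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.css': 'text/css',
};

// Maps a request URL to a file path inside `root`.
// Returns null for malformed URLs or paths that would escape `root`.
function resolveStaticPath(root, urlPath) {
  let pathname;
  try {
    pathname = decodeURIComponent(urlPath.split('?')[0]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
  // Directory requests such as "/" are served index.html.
  const relative = pathname.endsWith('/') ? pathname + 'index.html' : pathname;
  const rootDir = path.resolve(root);
  const resolved = path.resolve(rootDir, '.' + relative);
  if (resolved !== rootDir && !resolved.startsWith(rootDir + path.sep)) {
    return null;
  }
  return resolved;
}

// Creates (but does not start) an HTTP server that serves files from `root`.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const filePath = resolveStaticPath(root, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = MIME_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}

// Usage: put index.html and angular.js in ./public, then
// createStaticServer('./public').listen(8080);
```

Nothing in this server knows about Angular: it only hands files to the browser, which is where the framework actually runs.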
I understand that we require nodejs to run the Angularjs App, so will
the hosting provider have the node.js installed with their server?
No, AngularJS does not require Node.js to run. It is just that the tutorial you are using to learn AngularJS might be using Node.js for its examples.
AngularJS is a client side technology, so it does not need any specific server to run. And since you are using ASP.Net MVC as your development platform, you'll need an ASP.Net hosting for deploying your app.
You can take a look at my recent ASP.Net MVC / AngularJS development here:
http://bipolarapp.bitsstech.com
and this app is hosted at Windows Azure.
