What's the best way to prevent React app being scraped? - reactjs

I'm still very new to React so forgive me if the question is too naive. To my understand, React usually requires an API to do XHR requests. So someone with very basic tech background can easily figure out what the api looks like by looking at the network tab in web browser's debug console.
For example, people might find a page that calls to api https://www.example.com/product/1, then they can just do brute force scraping on product id 1 - 10000 to get data for all products.
https://www.example.com/api/v1/product/1
https://www.example.com/api/v1/product/2
https://www.example.com/api/v1/product/3
https://www.example.com/api/v1/product/4
https://www.example.com/api/v1/product/5
https://www.example.com/api/v1/product/6
...
Even with user auth, one can just use same cookie or token when they login to make the call and get the data.
So what is the best way to prevent scraping on React app? or maybe the api shouldn't be designed as such, hence I'm just asking the wrong question?

Here are some suggestions to address the issue you're facing:
This is a common problem. You need to solve it by using id's that are GUID's and not sequentially generated integers.
Restricting to the same-origin won't work because someone can make a request through Postman or Insomnia or Curl.
You can also introduce rate-limiting
In addition, you can invalidate your token after a certain number of requests or require it to be renewed after every 10 requests

I think no matter what you do to the JavaScript code, reading your API endpoint is the easiest thing in the world(Wireshark is an easy, bad example), once it is called upon from the browser. Expect it to be public, with that said, protecting it it is easier than you might anticipate.
Access-Control-Allow-Origin is your friend
Only allow requests to come from given urls. This may or may not allow GET requests but it will always allow direct access on GET routes. Keep that in mind.
PHP Example
$origin = $_SERVER['HTTP_ORIGIN'];
$allowed_domains = [
'http://mysite1.com',
'https://www.mysite2.com',
'http://www.mysite2.com',
];
if (in_array($origin, $allowed_domains)) {
header('Access-Control-Allow-Origin: ' . $origin);
}
Use some form of token that can be validated
This is another conventional approach, and you can find more about this here: https://www.owasp.org/index.php/REST_Security_Cheat_Sheet
Cheers!

Related

How to deal with authorization/authentication in that case?

Here at work we have a portal with like 6 systems lying under, each one has its own backend API, built with .NET Core, the frontend is React by the way.
The situation :
Today we have a single API that is used for both Authentication and Authorization (i will call it LoginAndProfileAPI) and its is used globally for all of the systems, here's my attempt to describe how it works :
As Is
we have this global,
The Issue :
Now we have a necessity to have local user management inside each system so that the local admins for that system can manage the access level/permission/profiles only of the profiles related to that application, we also need to turn our Auth process a little bit more standard, separating Authorization and Authentication a little bit more.
What would you guys do in this situation, what i came across until now was :
The idea i had, but stil needs some shape
but it kinda looks like a mess tho.
Does anyone have a clue of how can i be more standard with this
Edit :
Would that be best scenario ?
Looks pretty much like a more minimalistic/lightweight approach !!!
#WandererAboveTheSea suggestion about it
(did i get it right ?)
Here is a cleaner image :
JWT Auth Example
What looked kinda odd to me was that it seems like JWT is beeing used for both Authentication and Authorization
JWT is meant to be stateless and lite weight. Your using JWT for authorization is the correct approach.
You can improve on the following part.
After getting the username you are making database calls to get the user profile which you can avoid by adding those details to JWT itself. So that part can be simplified.
Information Exchange: JSON Web Tokens are a good way of securely transmitting information between parties.
Apart from this rest of the design looks good.

Anyone can fetch my blog posts using my GET end-points and use them on his own site? Is there a way to protect this?

I have a blogging app built on top of the MERN Stack. I am fetching my blog posts on the react front-end, however, I feel anyone can use my blog posts on his own site by hitting the same end-point. I want to protect this behaviour. Is there a way?
If, for some reason, it isn't enabled already, make sure your endpoints have standard Access-Control-Allow-Origin restrictions - that is, that they only permit direct connections from your domain, not from other sites. This will make it slightly more difficult for other sites to scrape yours, because they won't be able to make requests directly from the frontend.
You could also change your application structure so that the blog data gets sent with the initial HTML response. For a small example, you could have
<script type="application/json" class="blog-data">
[{"title":"some post title", "content":"some content"}]
</script>
const blogData = JSON.parse(document.querySelector('.blog-data').textContent);
This will also make it a bit harder for a scraper to work - they won't have an endpoint ready to serve the plain blog data, they'll have to parse through your HTML response first.
You could also frequently change up the DOM structure of the data in the HTML response to make it harder.
But web scraping is fundamentally nearly impossible to stop, for someone who's determined enough.
Basically, you can use CORS on your backend to protected fetching your endpoints from any browsers origins except allowed ones.
Anyway it will not help you to protect from calling API from such things like mobile apps, Postman etc.
If you worry about loading to the server you can add something like rate limiting.
But keep in mind if your API is public it will be public for all, you can't restrict to use it from your site only.
Here are a few ideas:
Maybe add some authentication to protect your endpoints.
If you are using CORS, only accept requests from a certain URL.
In your package.json, add a proxy.

How to protect RESTful service?

Contemplating building an Angular 2 front-end to my website. My question is not necessarily related to Angular but I want to provide full context.
Application logic that displays content to user would shift to the client. So on the server side, I would need to expose data via a RESTful JSON feed. What worries me, is that someone can completely bypass my front-end and execute requests to the service with various parameters, effectively scraping my database. I realize some of this is possible by scraping HTML but exposing a service with nicely formatted data is just a no-brainer.
Is there a way to protect the RESTful service from this? In other words, is there a way to ensure such service would only respond to my Angular 2 application call? Authentication certainly isn't a solution here - I don't want to force visitors to authenticate and the scraper could very well authenticate and get access, anyway.
I would recommend JWT Authorization. One such implementation is OAuth. Basically you get a json web token ( JWT ) that has been signed by an authority you trust that tells about the user and what resources they can access on your api.
If the request doesn't include an Authorization token - your API rejects it.
If the token has been tampered with by someone trying to grant themselves privledges after the token is signed by the authorization authority - your API rejects it.
It is a pretty cool piece of kit.
This site has information about OAuth implementations in different languages, hopefully your favorite is listed.
Some light bed time reading.
There is no obvious way to do it that I know of, but a lot of people seem to be looking at Amazon S3 as a model. If you put credentials in your client code, then anyone getting the client code can see them. I might suggest that you could write the server to pass a time limited token back to the browser with the client code. The client code would be required to pass it back to the server for access. This would prevent anyone from writing their own client code, as only client code sent by the server would work, though only for some period of time. The user might occasionally get timeouts, but that depends on how strict you want to make the token timeouts. Of course, even this kind of thing could be hacked by someone making a client request to get a copy of the token to use with their own client API, but at that point you should be proud that someone is trying so hard to use your API! I have not tried to write such a thing, so I don't have any practical experience with the issue. I myself have wondered about it, but also don't have enough experience with this architecture to see what, if anything, others have been doing. What do angularJS forums suggest?
Additional References: Best Practices for securing a REST API / web service
I believe the answer is "No".
You could do some security by obscurity type stuff. Your rest API could expose garbled data and you could have some function that was "hidden" in your code un-garble it. Though obviously this isn't fool proof, but if you expose data on a public site it's out there regardless of server or client rendering.

Securing a single page application built on react/react-router

I've been putting together a single-page application using React and React-Router and I can't seem to understand how these applications can be secured.
I found a nice clear blog post which shows one approach, but it doesn't look very secure to me. Basically, the approach presented in that post is to restrict rendering of components which the user is not authorized to access. The author wrote a couple more posts which are variations on the idea, extending it to React-Router routes and other components, but at their hearts all these approaches seem to rely on the same flawed idea: the client-side code decides what to do based on data in the store at the time the components are composed. And that seems like a problem to me - what's to stop an enterprising hacker from messing around with the code to get access to stuff?
I've thought of three different approaches, none of which I'm very happy with:
I could certainly write my authorization code in such a way that the client-side code is constantly checking with the server for authorization, but that seems wasteful.
I could set the application up so that modules are pushed to the client from the server only after the server has verified that the client has authority to access that code. But that seems to involve breaking my code up into a million little modules instead of a nice, monolithic bundle (I'm using browserify).
Some system of server-side rendering might work, which would ensure that the user could only see pages for which the server has decided they have authority to see. But that seems complicated and also seems like a step backward (I could just write a traditional web app if I wanted the server to do everything).
So, what is the best approach? How have other people solved this problem?
If you’re trying to protect the code itself, it seems that any approach that either sends that code to the client, or sends the code able to load that code, would be a problem. Therefore even traditional simple approaches with code splitting might be problematic here, as they reveal the filename for the bundle. You could protect it by requiring a cookie on the server, but this seems like a lot of fuss.
If hiding the internal code from unauthorized users is a requirement for your application, I would recommend splitting it into two separate apps with separate bundles. Going from one to another would require a separate request but this seems to be consistent with what you want to accomplish.
Great question. I'm not aware of any absolute best practices floating around out there that seem to outstrip others, so I'll just provide a few tips/thoughts here:
a remote API should handle the actual auth, of course.
sessions need to be shared, so a store like redis is usually a good idea, esp. for fast reads.
if you're doing server-side rendering that involves hydration, you'll need a way to share the session state between server and client. See the link below for one way to do universal react
on the client, you could send down a session cookie or JWT token, read it into memory (maybe using redux and keep it in your state tree?) and maybe use middleware (a la redux?) to set it as a header on requests.
on the client, you could also rely on localStorage to save the cookie/JWT
maybe split the code into two bundles, one for auth, one for the actual app logic?
See also:
https://github.com/erikras/react-redux-universal-hot-example for hydration example
https://github.com/erikras/react-redux-universal-hot-example/issues/608
As long as the store does not contain data that the user is not authorized to have, there shouldn't be too much of a problem even if a hacker checks the source and sees modules/links that he shouldn't have access to.
The state inside the store as well as critical logic would come from services and those need to be secured, whether it's an SPA or not; but especially on an SPA.
Also: server-side rendering with Redux isn't too complex. You can read about it here:
http://redux.js.org/docs/recipes/ServerRendering.html
It's basically only used to serve a root html with a predefined state. This further increases security and loading speeds but does not defy the idea behind SPAs.

Backbone restrict routes from user in permission based app

This is more of a request for pattern and discussion rather than a simple one-off question. I have a backbone app where user can be part of different roles. The routes are defined as usual:
routes:
"": "showHomePage"
"import": "showImportPage"
I would like the import page to be accessible only to certain user roles. I imagine I can do something like this:
showImportPage: ->
if not MyApp.CurrentUser.can_import
return
Which indeed works. Of course, as you can imagine, this is easily exploited by just using Chrome console, and even if I don't show the link anywhere it's quite simple to just go in the address bar and type it.
Even though the above should be enough to stop a normal user, my question is: how could I secure that route from being accessed?
The opinion I have until now is that the only way is to refer back to the server before serving that route, either by checking a special URL or by simply re-fetching the User model before accessing... I have this hitch, though, that this will basically defeat the purpose of the whole idea behind a "single-page-app", if every url must be authenticated by the server and I need to show the usual ajax spinner before allowing the user to navigate... I know the amount of data going back and forward is minimal (only the json user info or even less), but still...
What are your opinion or solutions if you ever had to face this problem?
I think your question is a great one.
I made a PhoneGap app using BackboneJS and Jquery mobile so I faced the same problems you are facing now.
I think authorization can't live solely on the client side since it is inherently wrong. What lives at the client, is fully controlled by the client, and that's something no one can change.
Sending a request to the server does not break the single-app-page paradigm as long as the request gets the minimal data needed and all logic/view components are located on the client.
Keep in mind that if you have sensitive data in that page that you don't want regular users to see, it also must be sent from the server after verifying the authorization of the request, so it is not only a JSON of the user info that must be sent, it is the data itself as well.
I wish someone else would prove me wrong here, but as far as it goes for me that's the deal.

Resources