Scrape website for basic data using AngularJS (Facebook-like link sharing module) - angularjs

I am trying to build a Facebook-like "link sharing" module, i.e. when anyone writes a link while composing a new post, it automatically shows some basic data from the website, like Facebook does.
I tried simple scraping using $http.get and it only works if I install a CORS extension in Google Chrome, so the main issue I am facing with this approach is making it work without any plugin.
I also tried adding headers in the config file, but still no luck.
// inside app.config(function ($httpProvider) { ... })
$httpProvider.defaults.headers.common = {};
$httpProvider.defaults.headers.post = {};
$httpProvider.defaults.headers.put = {};
$httpProvider.defaults.headers.patch = {};
$httpProvider.defaults.useXDomain = true;
delete $httpProvider.defaults.headers.common['X-Requested-With'];
Please share the best approach to implement this feature, or let me know if there is any way I can solve the CORS issue.
Thanks,
Zeshan

This is not possible. CORS exists for a reason: to STOP you from accessing HTTP resources from other domains without those other domains explicitly allowing you to.
Again: this is not possible due to security restrictions imposed by browsers.
The only way you can accomplish this, and the way Facebook does it, is to move those cross-domain requests to a server, where there are no cross-domain restrictions.
So $http.post('/some-script-on-my-server') where that script does the actual HTTP request for the remote page, scrapes the necessary information and returns it back to the browser.
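A minimal sketch of that approach, assuming a Node/Express backend and hypothetical route and field names (the request and cheerio packages are just one possible choice for fetching and parsing the remote page):

// server.js -- same-origin endpoint that fetches and scrapes the remote page
var express = require('express');
var request = require('request');  // any server-side HTTP client works here
var cheerio = require('cheerio');  // server-side HTML parsing

var app = express();

app.get('/api/link-preview', function (req, res) {
  // note: real code should validate/whitelist req.query.url to avoid becoming an open proxy
  request(req.query.url, function (err, remoteRes, body) {
    if (err) { return res.status(500).send(err.message); }
    var $ = cheerio.load(body);
    // pull out whatever "basic data" you want to show, e.g. Open Graph tags
    res.json({
      title: $('meta[property="og:title"]').attr('content') || $('title').text(),
      description: $('meta[property="og:description"]').attr('content'),
      image: $('meta[property="og:image"]').attr('content')
    });
  });
});

app.listen(3000);

The Angular side then only ever talks to your own origin, so CORS never comes into play:

// in a controller or service
$http.get('/api/link-preview', { params: { url: userEnteredUrl } })
  .then(function (response) {
    $scope.preview = response.data; // { title, description, image }
  });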

There is a workaround for this if you want a browser-only JavaScript solution without configuring anything on the server (maybe useful in some particular situations) while "avoiding" CORS.
You could use YQL (Yahoo Query Language). This way you only have to play in their console a little bit with the URL you need to scrape, and then use the query they provide as the URL for your request.
For example (extracted from YQL website):
select * from html where url='http://finance.yahoo.com/q?s=yhoo' and xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'
This gets the headlines from Yahoo Finance, and you also get the query URL that you can use in your request:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D'http%3A%2F%2Ffinance.yahoo.com%2Fq%3Fs%3Dyhoo'%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22yfi_headlines%22%5D%2Fdiv%5B2%5D%2Ful%2Fli%2Fa'&format=json&diagnostics=true&callback=
They have other examples and how to integrate it in their documentation.
You don't need to configure anything on the server side, but of course every request has to go through Yahoo's servers, which isn't at all optimal; performance is directly affected.
That said, in some particular situations (dev, tests, etc.) this can be useful, and it is always worth a try.
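For instance, firing that generated query from Angular is just a regular $http call (a rough sketch; the query string is the one the YQL console produced above, with format=json):

var yql = 'https://query.yahooapis.com/v1/public/yql' +
  '?q=' + encodeURIComponent(
    "select * from html where url='http://finance.yahoo.com/q?s=yhoo'" +
    " and xpath='//div[@id=\"yfi_headlines\"]/div[2]/ul/li/a'") +
  '&format=json';

$http.get(yql).then(function (response) {
  // response.data.query.results holds the scraped nodes
  console.log(response.data.query.results);
});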

Related

Anyone can fetch my blog posts using my GET endpoints and use them on their own site. Is there a way to protect against this?

I have a blogging app built on top of the MERN stack. I am fetching my blog posts on the React front end; however, I feel anyone could use my blog posts on their own site by hitting the same endpoint. I want to prevent this. Is there a way?
If, for some reason, it isn't enabled already, make sure your endpoints have standard Access-Control-Allow-Origin restrictions - that is, that they only permit direct connections from your domain, not from other sites. This will make it slightly more difficult for other sites to scrape yours, because they won't be able to make requests directly from the frontend.
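For example, with an Express backend that might look roughly like this (a sketch using the cors middleware; the origin value is whatever your own domain actually is):

const express = require('express');
const cors = require('cors');

const app = express();
// only answer cross-origin requests that come from your own front end
app.use(cors({ origin: 'https://www.my-blog.example' }));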
You could also change your application structure so that the blog data gets sent with the initial HTML response. For a small example, you could have
<script type="application/json" class="blog-data">
[{"title":"some post title", "content":"some content"}]
</script>
const blogData = JSON.parse(document.querySelector('.blog-data').textContent);
This will also make it a bit harder for a scraper to work - they won't have an endpoint ready to serve the plain blog data, they'll have to parse through your HTML response first.
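On the server side (sticking with an Express setup and a hypothetical Post model), embedding that data before the response is sent could look roughly like this:

app.get('/', async (req, res) => {
  const posts = await Post.find(); // hypothetical Mongoose model
  // note: real code should escape "</script>" sequences in the JSON
  res.send(`
    <html>
      <body>
        <div id="root"></div>
        <script type="application/json" class="blog-data">${JSON.stringify(posts)}</script>
        <script src="/bundle.js"></script>
      </body>
    </html>
  `);
});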
You could also frequently change up the DOM structure of the data in the HTML response to make it harder.
But web scraping is fundamentally nearly impossible to stop, for someone who's determined enough.
Basically, you can use CORS on your backend to protect your endpoints from being fetched from any browser origin except the allowed ones.
However, it will not protect you from the API being called from things like mobile apps, Postman, etc.
If you are worried about load on the server, you can add something like rate limiting.
But keep in mind that if your API is public, it is public for everyone; you can't restrict its use to your site only.
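For the rate-limiting idea, a minimal sketch with the express-rate-limit package (the route and numbers are arbitrary, and app is the same Express app as in the earlier sketch):

const rateLimit = require('express-rate-limit');

// at most 100 requests per IP per 15 minutes on the blog endpoints
app.use('/api/posts', rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));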
Here are a few ideas:
* Maybe add some authentication to protect your endpoints.
* If you are using CORS, only accept requests from a certain URL.
* In your package.json, add a proxy.
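The package.json proxy suggestion presumably refers to the Create React App development proxy, which hides the API origin from the front-end code during development (assuming the API runs locally on port 5000):

{
  "proxy": "http://localhost:5000"
}

Note that this only affects the dev server; it isn't a protection mechanism in production.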

What's the best way to prevent React app being scraped?

I'm still very new to React so forgive me if the question is too naive. To my understanding, React usually requires an API to do XHR requests. So someone with a very basic tech background can easily figure out what the API looks like by looking at the network tab in the web browser's debug console.
For example, people might find a page that calls the API https://www.example.com/product/1, and then they can just brute-force scrape product IDs 1-10000 to get the data for all products.
https://www.example.com/api/v1/product/1
https://www.example.com/api/v1/product/2
https://www.example.com/api/v1/product/3
https://www.example.com/api/v1/product/4
https://www.example.com/api/v1/product/5
https://www.example.com/api/v1/product/6
...
Even with user auth, one can just reuse the same cookie or token from when they logged in to make the call and get the data.
So what is the best way to prevent scraping of a React app? Or maybe the API shouldn't be designed this way, and I'm just asking the wrong question?
Here are some suggestions to address the issue you're facing:
* This is a common problem. You can address it by using IDs that are GUIDs rather than sequentially generated integers.
* Restricting to the same origin won't work, because someone can still make a request through Postman, Insomnia, or curl.
* You can also introduce rate limiting.
* In addition, you can invalidate your token after a certain number of requests, or require it to be renewed after every 10 requests.
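As a sketch of the GUID point on a Node backend (crypto.randomUUID() needs Node 14.17+; the uuid package is an older alternative):

const crypto = require('crypto');

// instead of an auto-incremented integer, give each product an unguessable id
const product = {
  id: crypto.randomUUID(), // e.g. "3b241101-e2bb-4255-8caf-4136c566a962"
  name: 'Example product'
};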
I think no matter what you do to the JavaScript code, reading your API endpoint is the easiest thing in the world (Wireshark is one easy, crude example) once it is called from the browser. Expect it to be public. With that said, protecting it is easier than you might anticipate.
Access-Control-Allow-Origin is your friend
Only allow requests to come from given URLs. This may or may not block cross-origin GET requests, but it will always allow direct access to GET routes (e.g. typing the URL into the browser). Keep that in mind.
PHP Example
// origin of the incoming request, as sent by the browser
$origin = $_SERVER['HTTP_ORIGIN'];
$allowed_domains = [
    'http://mysite1.com',
    'https://www.mysite2.com',
    'http://www.mysite2.com',
];
// only echo the CORS header back for whitelisted origins
if (in_array($origin, $allowed_domains)) {
    header('Access-Control-Allow-Origin: ' . $origin);
}
Use some form of token that can be validated
This is another conventional approach, and you can find more about this here: https://www.owasp.org/index.php/REST_Security_Cheat_Sheet
Cheers!

AngularJS ngFacebook batch request

Can anyone who knows how to use the AngularJS ngFacebook module help me to perform a Facebook batch request? Is it possible to do it with this module?
What I need exactly is to get the user's events from Facebook; for that I have to do 4 different requests:
$facebook.api('/me/events/attending').then(function(response) {//code here});
$facebook.api('/me/events/created').then(function(response) {//code here});
$facebook.api('/me/events/maybe').then(function(response) {//code here});
I think I could batch these requests, I just don't know if it's possible with this module.
Also, the trickiest part is that for each event returned I would need to get the owner, and then use the owner.id to get their picture. Right now what I do is:
$facebook.api('/me/events/attending?fields=owner').then(function(response) {
  // and here I loop over the events and request each owner's picture
});
Of course this doesn't seem like the best way to do it, but I have searched a lot for a solution and couldn't make anything work.
I think you should be able to request all user events, including the owner info:
GET /me/events?fields=id,name,owner{id,picture},rsvp_status
You can determine the "status" of the event to the user by the rsvp_status (attending, maybe, declined, no_reply) field.
See
https://developers.facebook.com/docs/graph-api/reference/v2.3/event#read
https://developers.facebook.com/docs/graph-api/reference/user/events/
https://developers.facebook.com/docs/graph-api/using-graph-api/v2.0#fieldexpansion
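With the ngFacebook module the asker is already using, that single field-expanded request should look something like this (a sketch; the exact response shape depends on the Graph API version):

$facebook.api('/me/events?fields=id,name,rsvp_status,owner{id,picture}')
  .then(function (response) {
    // each event already carries its owner's id and picture,
    // and rsvp_status says whether the user is attending / maybe / declined
    $scope.events = response.data;
  });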
I'm not sure of the batch request protocol that Facebook uses, but you could try this module:
https://github.com/jonsamwell/angular-http-batcher
If it doesn't support it, add an issue and I'll look into implementing it. Disclosure: I created this Angular HTTP batching library.

Authentication/HTTP headers support in forge.file trigger.io module?

In the official trigger.io docs there seems to be no provision for custom HTTP headers when it comes to the forge.file module. I need this so I can download files behind an HTTP authentication scheme. This seems like an easy thing to add, if support is not already there.
Any workarounds? Any chance of a quick fix in the next update? I know I could use forge.request instead, but I'd like to keep a local copy (saveURL).
Thanks
Unfortunately the file module just uses simple "download url" methods rather than a full HTTP request library, which makes it a fairly big task to add support for custom headers.
I've added a task to our backlog for this, but I don't have a timeframe for it being added.
Currently on iOS you can do basic auth by using URLs in the form http://user:password@url.com, in case that helps.
Maybe to avoid this you can configure your server differently, or put a proxy server in front that lets you pass authentication details as GET parameters?

Silverlight and SSL Client Certificates

Can anyone point me in the right direction of how I can use SSL client-side certificates with Silverlight to access a RESTful web service?
I can't seem to find anything on how to handle them, or even whether they are supported.
Cheers.
Slipjig mentioned this:
"The browser stack does, and pretty much automatically, if you're willing to live with its other limitations (lack of support for all HTTP verbs, coercion of response status codes, etc.)."
If that is acceptable to you, look at how Microsoft themselves deal with this in some of their APIs using the custom X-HTTP-Method header, like how they do it for WCF and OData:
http://www.odata.org/developers/protocols/operations
In MSDN, Microsoft also mentions this about using REST in conjunction with SharePoint 2010's WCF based REST API:
msdn.microsoft.com/en-us/library/ff798339.aspx
"In practice, many firewalls and other network intermediaries block HTTP verbs other than GET and POST. To work around this issue, WCF Data Services (and the OData standard) support a technique known as "verb tunneling." In this technique, PUT, DELETE, and MERGE requests are submitted as a POST request, and an X-HTTP-Method header specifies the actual verb that the recipient should apply to the request. For more information, see X-HTTP-Method on MSDN and OData: Operations (the Method Tunneling through POST section) on the OData Web site."
Don Box also had some words about this, but regarding GData specifically:
www.pluralsight-training.net/community/blogs/dbox/archive/2007/01/16/45725.aspx
"If I were building a GData client, I honestly wonder why I'd bother using DELETE and PUT methods at all given that X-HTTP-Method-Override is going to work in more cases/deployments."
There's an article about Silverlight and Java interop which also addresses this limitation of Silverlight by giving the same advice:
www.infoq.com/articles/silverlight-java-interop
"Silverlight supports only the GET and POST HTTP methods. Some firewalls restrict the use of PUT and DELETE HTTP methods.
It is important to point out that true RESTful service can be created (conforming to all the REST principles listed above) only using the GET and POST HTTP methods, in other words the REST architecture does not require a specific mapping to HTTP. Google’s GData X-Http-Method-Override header is an example of this approach.
The following HTTP methods overrides may be set in the header to accomplish the PUT and DELETE actions if the web services interpret the X-HTTP-Method-Override header on a POST:
* X-HTTP-Method-Override: PUT
* X-HTTP-Method-Override: DELETE"
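In raw HTTP terms, verb tunneling is just a POST whose header names the verb you actually intend; a hypothetical DELETE against an OData-style resource would look roughly like:

POST /odata/Products(5) HTTP/1.1
Host: www.example.com
X-HTTP-Method-Override: DELETE
Content-Length: 0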
Hope this helps
-Josh
It depends on whether you're using the browser HTTP stack or the client HTTP stack. The client stack does not support client certificates, period. The browser stack does, and pretty much automatically, if you're willing to live with its other limitations (lack of support for all HTTP verbs, coercion of response status codes, etc.).
I have however been running into a problem using the browser stack with client certificates in an OOB scenario. Prism module loading fails under these conditions - the request gets to IIS, but causes a 500 server error for no apparent reason. If I set IIS to ignore client certs, or if I run the app in-browser, it works fine :-/
Take a look at this:
http://support.microsoft.com/kb/307267
Just change your URLs to https.
Hope this helps.
Dim url As Uri = New Uri(Application.Current.Host.Source, "../WebService.asmx")
Dim binding As New System.ServiceModel.BasicHttpBinding
If url.Scheme = "https" Then
    binding.Security.Mode = ServiceModel.BasicHttpSecurityMode.Transport
End If
binding.MaxBufferSize = 2147483647 'this value set to override a bug
binding.MaxReceivedMessageSize = 2147483647 'this value set to override a bug
Dim proxy As New ServiceReference1.WebServiceSoapClient(binding, New ServiceModel.EndpointAddress(url))
proxy.InnerChannel.OperationTimeout = New TimeSpan(0, 10, 0)
