Scrape a website made with AngularJS

I'm trying to scrape a website made with AngularJS, using Java.
I use Selenium and ChromeDriver to navigate the website, and to figure out the next step I use Chrome DevTools.
On a 'traditional' website I can easily guess an item's id or where a button goes by looking in the Elements or Sources tab, but how can I do that on a website made with Angular?
I mean, where can I find the id, the href property, or where a button goes on an Angular website? Can I find it using Chrome DevTools, or do I need to install something?
Thanks

First of all, what you're doing is probably illegal.
But to give you the benefit of the doubt, and assuming that you are doing this to a website that belongs to you but that you didn't write or don't have the code for, or that you have permission from the website owner, you have two options.
Extension.
You can use this Chrome extension, which lets you inspect your AngularJS app in the Chrome developer tools.
Then you can check which function the ng-click calls, and look for that function on the scope.
Console.
Select the element you are interested in, and since $0 returns the currently selected element in the DOM, you can type in the console:
angular.element($0).scope()
which will return an object with all the data on the current scope.
Note that you might have to go up the $parent chain to find the function you are looking for.
P.S. If you are looking for how to do the same thing in Angular (2+), you can use the following extension (thanks @user1767316), or in the console: ng.probe($0).componentInstance (ng.probe is not available in Ivy-based Angular versions, where ng.getComponent($0) plays the same role).
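If all you need on the Java side is to see what each element is wired to, rather than to evaluate it, the directive attributes themselves are often enough, because AngularJS leaves attributes such as ng-click and ng-href in the rendered DOM. Below is a minimal, stdlib-only sketch, assuming you already have the page's HTML as a string (for example from Selenium's driver.getPageSource()). The class name NgAttributeScanner and the sample markup are hypothetical, and the regex only handles double-quoted values, so a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NgAttributeScanner {
    // Matches ng-click="..." or ng-href="..." with double-quoted values only.
    // The leading \b also lets it match the data-ng-click / data-ng-href spelling.
    private static final Pattern NG_ATTR =
            Pattern.compile("\\b(ng-click|ng-href)\\s*=\\s*\"([^\"]*)\"");

    /** Returns "attribute: value" strings in the order they appear in the HTML. */
    public static List<String> scan(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = NG_ATTR.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + ": " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for driver.getPageSource().
        String html = "<a ng-href=\"#/items/{{item.id}}\">Open</a>"
                + "<button ng-click=\"loadNext(page + 1)\">Next</button>";
        for (String s : scan(html)) {
            System.out.println(s);
        }
    }
}
```

Each ng-click value names the scope function you can then look up with angular.element($0).scope() in the console, as described above.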

Related

How to scrape a hidden React component key using Selenium

I am trying to scrape a React-based website with Selenium and Python, and I've reached a point where, although I can retrieve everything that is 'seen' by the Google DevTools inspector, I can't find the link to the next page I need to scrape. I guess I could click every single button to get to the next page, but I'm curious why Selenium has a problem seeing this particular key and how to work around it, since I have to build a database and every 'extra' request will add up quickly.
[Screenshot: Google DevTools inspect view]
As you can see, there is no 'key' or href attribute anywhere in the tab, but if I look in React DevTools, it is there:
[Screenshot: React DevTools inspect view]
So my question is: is there any way I can retrieve those 'keys'?
Are there any better tools to do this job?
Thank you in advance!

How to migrate an online Angular script to a local server?

Hi, I'm trying to make an offline version of this page:
https://u-he.com/tools/microtuning/ The script is written with AngularJS. How do I do that?
I saved the page with Ctrl+S and copied the files to the local server I'm running.
Then I browsed to the local IP. The page opened, but I get repeated notes: ng-repeat shows up as multiple boxes instead of one box that edits the same note in different octaves.
How do I solve this problem, please?
You can inspect the front-end code in your browser's developer tools. In Firefox it's the panel called "Debugger"; in Chrome it's called "Sources". If you use Safari, you need to enable the Develop menu first.
Once you have the appropriate view, just click on u-he.com -> tools/microtuning/ -> index
Hopefully it goes without saying that you shouldn't use large swaths of another person's code without at least giving appropriate credit, or better yet getting the developer's permission, unless there is an explicit open-source license.

Security with "web_accessible_resources"

MDN docs state:
To enable a web page to contain an <img> element whose src attribute points to this image, you could specify "web_accessible_resources" like this:
"web_accessible_resources": ["images/my-image.png"]
The file will then be available using a URL like:
moz-extension://<extension-UUID>/images/my-image.png
<extension-UUID> is not your extension's ID. It is randomly generated for every browser instance. This prevents websites from fingerprinting a browser by examining the extensions it has installed.
So, I would think that these resources cannot be read by any web page outside the extension, since they would need to know the random UUID.
However, the same MDN docs also state:
Note that if you make a page web-accessible, then any website may then link or redirect to that page. The page should then treat any input (POST data, for example) as if it came from an untrusted source, just as a normal web page should.
I don't understand how "any website may then link or redirect to that page". Wouldn't it need to know the random UUID? How else could a webpage access this resource?
The point of Web Accessible Resources is to be able to include them in a web context.
While you can communicate the random UUID to the webpage so that it can use the file, it doesn't have to be included by the website code itself. Here's a hypothetical scenario:
You're writing an extension that adds a button to evil.com's UI. That button is supposed to have an image on it.
You bundle the image with your extension, but to add it as a src or CSS property on the webpage, you need to be able to reference it from a web context.
So, you make it web-accessible, and then inject your UI element with a content script.
Perfectly plausible scenario.
Note that a random third-party site, villains-united.com, can't simply probe a known URL to find out whether your extension is installed, since the URL is unique per browser. This is the intent behind the WebExtensions random-UUID scheme, as opposed to Chrome's fixed extension-ID model.
However, let's continue our hypothetical scenario, from a security perspective.
The operators of evil.com are unhappy with your extra UI. They add a script to their code that looks for added buttons.
That script can see the DOM properties of the button, including the address of the image. Now evil.com's code can see your UUID.
Since you're the good guy, your extension's source code is available somewhere, including the page that launches nuclear missiles if called (why you would have that and why it would be web-accessible is another matter, perhaps to provide the functionality to good-guys-last-resort.org).
evil.com's script now can reconstruct the URL of this trigger page and XHR it, plunging the planet into nuclear apocalypse. Oops. You probably should've checked the origin of that request.
Basically, if a web-accessible resource is used in a page, the UUID likely leaks to that page's context via DOM. That may not be a page you control.
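The leak in this scenario needs nothing more than string handling: the UUID is simply the host part of any moz-extension:// URL the page can read. Below is a minimal sketch of the steps evil.com's script would take, written in Java only for illustration (in the real attack it would be JavaScript in the page reading the injected img element's src). The UUID value, the class name UuidLeakDemo, and the trigger.html path are all hypothetical.

```java
import java.net.URI;

public class UuidLeakDemo {
    /**
     * Given a web-accessible resource URL observed in the page's DOM
     * (e.g. an injected <img src>), returns the per-browser UUID,
     * which is the authority part of a moz-extension:// URL.
     */
    public static String extractUuid(String resourceUrl) {
        URI uri = URI.create(resourceUrl);
        if (!"moz-extension".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a moz-extension URL: " + resourceUrl);
        }
        return uri.getAuthority();
    }

    /** Rebuilds the URL of another resource bundled with the same extension. */
    public static String siblingUrl(String uuid, String path) {
        return "moz-extension://" + uuid + "/" + path;
    }

    public static void main(String[] args) {
        // Hypothetical URL that a page script could read from the injected button's image.
        String seenInDom =
                "moz-extension://0f1e2d3c-4b5a-6978-8a9b-0c1d2e3f4a5b/images/my-image.png";
        String uuid = extractUuid(seenInDom);
        // Hypothetical path of the "trigger" page, known from the public source code.
        System.out.println(siblingUrl(uuid, "trigger.html"));
    }
}
```

Nothing here requires privileges, which is why the defence has to live on the extension side: the trigger page should check where a request comes from rather than rely on the UUID staying secret.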

How to use Selenium WebDriver to find Firefox add-ons warning dialog

I am testing downloading and installing an add-on that our company makes. I can add the domain to the Firefox profile whitelist to eliminate the first dialog, but then FF displays a second one that says "Install add-ons only from authors whom you trust". I can't find a way for Selenium to find it. It's the one that looks like this:
I have tried driver.switchTo().alert().accept() - this is not an alert.
I have tried driver.findElement(By.linkText("Install")) - nothing found.
I have tried using SikuliWebDriver to find an element by location (picking some arbitrary coordinates to work from) and then sending keys like Keys.TAB and Keys.ENTER, but when I step through in debug mode, driver.findElementByLocation(20,40) never returns.
I have tried driver.getKeyboard().sendKeys(Keys.TAB) (sending two tabs and an enter). Also never returns.
I think this dialog is generated by Javascript but I am unable to find out what JS generates it. Ideally I could find a name or an id for the button in the dialog and then use JavascriptExecutor to run the command. But without any kind of handle I'm stuck.
Any ideas?
Selenium can only see the DOM (Document Object Model). It can't test desktop applications. The dialog shown is part of the Firefox application itself, not part of the page's DOM, so Selenium can't see, access, or interact with it. Sad but true. Try Ranorex?

Batarang extension giving no results whatsoever

I have tried using it on a local site, on hosted sites, even on the Angular sites, and all I get is a listing of the HTML with Angular. No scopes, no models, nothing useful whatsoever. I'm assuming it's supposed to help with Angular development in Chrome, but I get nothing.
Has anyone found a reason for this?
Using
Chrome Version 39.0.2171.71 m
AngularJS v1.2.26
AngularJS Batarang 0.7.4
I found this old version and it works.
https://chrome.google.com/webstore/detail/angularjs-batarang-stable/niopocochgahfkiccpjmmpchncjoapek
I had the same issue, so I rolled back to the previous version. If you follow this link, it will give you the instructions; I will add them here as well in case the link moves.
Installing Previous Versions
Download and extract one of the files from the Batarang releases page on GitHub
Navigate to chrome://chrome/extensions/ in Chrome
If you've installed Batarang from the web store, disable or remove that version
On the top right, check the checkbox for "Developer mode"
Click "Load unpacked extension..."
Select the directory where you extracted the extension
Close and re-open any inspected tabs
The above was taken from the same link as I posted.
