handle window.open() while headless scraping - screen-scraping

I have to hit the URL http://judis.nic.in/supremecourt/chejudis.asp and submit a form there to get the page that is supposed to be scraped. I am using PhantomJS and PJScrape. The issue is that, after the form is submitted, the results show up in a new window. I have skimmed through the PhantomJS issue list and found that window.open cannot be handled. Is there any other way or framework I could use here?
I have to stick to headless scraping and I can't use Selenium etc.

You can inject JavaScript into the page to monkey-patch window.open like this:
var log = {};
window.my_open = window.open;
window.open = function (str1, str2, str3) {
    console.log("** window.open ** " + str1 + " / " + str2 + " / " + str3);
    log.open = {"url": str1, "name": str2, "features": str3};
    var new_win = this.my_open(str1, str2, str3);
    return new_win;
};
Then you can read the URL (log.open.url) from within PJS and keep scraping from there.
Note that new_win will be "undefined" because PJS doesn't implement it.
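To make the idea easier to try outside a browser, here is a minimal sketch of the same patch written as a function that takes the window object as a parameter. The names patchWindowOpen and fakeWindow are hypothetical and only serve the demonstration; in a real page you would pass the actual window.

```javascript
// Hypothetical helper: wraps win.open so that every call is recorded.
// Returns the array that the recorded calls are pushed into.
function patchWindowOpen(win) {
  var calls = [];
  var originalOpen = win.open;
  win.open = function (url, name, features) {
    calls.push({ url: url, name: name, features: features });
    // The result may be undefined where the host does not implement window.open.
    return typeof originalOpen === "function"
      ? originalOpen.call(win, url, name, features)
      : undefined;
  };
  return calls;
}

// Stand-in for a page's window object, used only for this demonstration.
var fakeWindow = { open: function () { return undefined; } };
var openCalls = patchWindowOpen(fakeWindow);
fakeWindow.open("result.asp?id=1", "_blank", "width=400");
console.log(openCalls[0].url);
```

In PhantomJS, the recorded URL could presumably be returned from page.evaluate and then loaded with page.open, since page.evaluate only passes back plain serializable values.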

Related

Check Response code in Selenium WebDriver in Jmeter

I am logging into a web page using Selenium WebDriver in JMeter and want to check that all the links work. To do that, I want to check the response code returned when each link is clicked.
var links = WDS.browser.findElements(pkg.By.cssSelector("a"));
var href;
links.forEach(myFunction);
function myFunction(item) {
    WDS.log.info("link value" + item);
    href = item.getAttribute("href");
    statusCode = new HttpResponseCode().httpResponseCodeViaGet(href);
    if(200 != statusCode) {
        System.out.println(href + " gave a response code of " + statusCode);
    }
}
But the above code doesn't seem to work. I would be glad if anyone could help me with this.
Also, is there an alternative way to check that all the links work in the JMeter Selenium WebDriver sampler using JavaScript?
We're not able to help you unless you show us the code of the HttpResponseCode().httpResponseCodeViaGet beast and the relevant error message from the jmeter.log file.
If the above function is something you copied and pasted from Stack Overflow, I strongly doubt that it will ever work: the language of the WebDriver Sampler is not the JavaScript that your browser executes. It is a limited subset of the browser version of JavaScript (for example, there is no XMLHttpRequest).
Instead, you have full access to the underlying Java SDK and JMeter API, so I would recommend amending your function as follows:
var links = WDS.browser.findElements(org.openqa.selenium.By.cssSelector("a"));
var href;
links.forEach(myFunction);
function myFunction(item) {
    WDS.log.info("link value" + item);
    href = item.getAttribute("href");
    var client = org.apache.http.impl.client.HttpClientBuilder.create().build();
    var request = new org.apache.http.client.methods.HttpGet(href);
    var response = client.execute(request);
    var statusCode = response.getStatusLine().getStatusCode();
    if(200 != statusCode) {
        WDS.log.error(href + " gave a response code of " + statusCode);
    }
}
More information:
The WebDriver Sampler: Your Top 10 Questions Answered
HttpClient Tutorial
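One practical wrinkle with the loop above: anchors without an href, or with mailto: or javascript: targets, would likely make the HTTP GET fail before any status code is returned. A minimal sketch of a filter that could be applied first is shown below. The helper name isCheckableHref is hypothetical, and the sample list only stands in for values returned by getAttribute("href").

```javascript
// Hypothetical helper: true only for absolute http(s) URLs.
function isCheckableHref(href) {
  if (href === null || href === undefined) return false;
  // String(...) also copes with non-JavaScript string objects.
  return /^https?:\/\//i.test(String(href));
}

// Sample values standing in for what getAttribute("href") might return.
var hrefs = [
  "https://example.com/a",
  "mailto:me@example.com",
  null,
  "javascript:void(0)",
  "HTTP://example.com/b"
];
var checkable = hrefs.filter(isCheckableHref);
console.log(checkable);
```

Inside myFunction, the check would go right after reading href, returning early when isCheckableHref(href) is false.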

Protractor - Unable to access element due to fixed Top navigation bar

I'm facing the following issue in Protractor with Jasmine.
Click and mouse hover do not work because of the fixed top navigation bar in my application. I need to click or hover over an element on a web page.
Unfortunately, that element is displayed behind the fixed navigation bar, so neither scrolling until the element is present nor clicking by x and y coordinates works.
My dependencies are :
protractor version 5.2.2
node 8.9.3
selenium standalone 3.13
chrome driver-2.40
chromebrowser v67
OS- Windows 10
Thanks in advance
Try using Protractor's executeScript.
First, try clicking that element from the browser console using its id, name or XPath.
For example:
var el = element(by.module('header'));
var tag = browser.executeScript('return arguments[0].click()', el).then(function() {
    expect(something).toMatch(something);
});
Another way, along the same lines as what Bharath Kumar S suggested, and bearing in mind JeffC's caveat that this approach is cheating: I had a similar issue where the app header kept getting in the way of my clicks, and I knew I would never need it (I was willing, for instance, to find other ways to navigate or log out, and not to check anything that was on it). I therefore did the following, which solved the problem. Note that if you refresh the screen, you have to call it again. Also note that I am using a number of functions from https://github.com/hetznercloud/protractor-test-helper, which do what you would expect from their names.
var removeAppHeaderIfAny = async function() {
    //this function hides the app header
    //it is useful to avoid having covers there when Protractor worries that something else will get the click
    let found = false;
    try {
        found = await waitToBeDisplayed(by.className("app-header"), 2000);
    } catch (e) {
        let s: string = "" + e;
        if (s.search("TimeoutError") != 0) flowLog("presumably fine, cover already removed: " + e);
        found = false;
    }
    if (!found) return;
    if (found) {
        let coverElement = await element(by.className("app-header"));
        browser.executeScript(
            "arguments[0].style.visibility='hidden';",
            coverElement
        );
        await waitToBeNotDisplayed(by.className("app-header"), 10000);
    }
    return;
    //note after this is called you will not see the item, so you cannot click it
};
As I look at the code, it strikes me one can probably remove the if (found) and associated brackets at the end. But I pasted in something I know has been working, so I am not messing with that.
As indicated up front, I knew I was willing to forego use of the app-header, and it is a bit crude.
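If hiding the header is too drastic, a gentler option is to scroll so that the target ends up just below the fixed bar. The arithmetic is sketched below under assumptions: the function name, the header height and the margin are all hypothetical, and none of this has been tried against the Protractor version mentioned in the question.

```javascript
// Hypothetical helper: vertical scroll position that leaves an element just
// below a fixed header.
//   rectTop      - element.getBoundingClientRect().top (relative to the viewport)
//   pageYOffset  - current vertical scroll position of the page
//   headerHeight - height of the fixed bar in pixels
//   margin       - extra breathing room in pixels
function scrollYBelowHeader(rectTop, pageYOffset, headerHeight, margin) {
  var target = rectTop + pageYOffset - headerHeight - margin;
  return Math.max(0, Math.round(target)); // never scroll above the top of the page
}

console.log(scrollYBelowHeader(500, 1000, 60, 10));
```

In the page, the result would be passed to window.scrollTo(0, y) from a script run through browser.executeScript, before the click is attempted.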

Prevent facebook callback from appending '#_=_' to the redirected URL of my website using react-router

How can I prevent the Facebook callback from appending '#_=_' to the redirected URL of my website?
NOTE: I am using ReactJS
Simplest thing is to just reset it. Put this as close to your script's starting point as possible:
if (location.hash == "#_=_") location.hash = "";
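Resetting location.hash typically leaves a bare "#" at the end of the address. If that matters, the fragment can be stripped from the full URL instead. The sketch below shows only the string handling; stripFacebookHash is a hypothetical name and the example URL is made up.

```javascript
// Hypothetical helper: returns the URL without a trailing "#_=_" fragment.
// Any other fragment is left untouched.
function stripFacebookHash(url) {
  var suffix = "#_=_";
  var hasSuffix = url.slice(-suffix.length) === suffix;
  return hasSuffix ? url.slice(0, -suffix.length) : url;
}

console.log(stripFacebookHash("https://example.com/profile?tab=1#_=_"));
```

In a browser with the History API, the cleaned address could then be applied without a reload via history.replaceState(null, "", stripFacebookHash(location.href)). Whether this runs early enough for react-router to see the clean URL depends on where it is placed in the app's startup code.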

using angularjs to build iframe url

I am loading an iframe image from a URL that I want to be dynamic. For example, I want to pass in a different URL for each application number that I am previewing.
I've tried using ng-src to generate my URL, but it seems to be failing.
html:
<iframe ng-src="http://localhost:3000/v4/{{applicationNumberText}}/{{documentIdentifier}}"></iframe>
controller:
$scope.applicationNumberText = '09123456';
$scope.documentIdentifier = 'E1DUJW9JPP1GUI3';
getting this error:
angular.js:11706 Error: [$interpolate:noconcat] Error while interpolating
any ideas?
Well, for one, you cannot concatenate several interpolations into a URL like that. Second, you need to use $sce to tell your app that it is a trusted resource URL, and then bind the result with ng-src="{{iFrameUrl}}". See the fiddle: https://jsfiddle.net/ojzdxpt1/4/
app.controller('TestController', function($scope, $sce) {
    $scope.applicationNumberText = '09123456';
    $scope.documentIdentifier = 'E1DUJW9JPP1GUI3';
    $scope.iFrameUrl = $sce.trustAsResourceUrl("http://localhost:3000/v4/" + $scope.applicationNumberText + "/" + $scope.documentIdentifier);
});
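Since the URL is now assembled in the controller, it may be worth escaping each dynamic segment before trusting the result. A minimal sketch, assuming the two values are plain path segments; buildIframeUrl is a hypothetical helper and not part of AngularJS.

```javascript
// Hypothetical helper: joins a base URL and path segments, escaping each segment.
function buildIframeUrl(base, segments) {
  var path = segments
    .map(function (segment) { return encodeURIComponent(String(segment)); })
    .join("/");
  // Drop trailing slashes from the base so exactly one separator is used.
  return base.replace(/\/+$/, "") + "/" + path;
}

var url = buildIframeUrl("http://localhost:3000/v4/", ["09123456", "E1DUJW9JPP1GUI3"]);
console.log(url);
```

In the controller, the result would be wrapped as $sce.trustAsResourceUrl(buildIframeUrl(...)) before being assigned to $scope.iFrameUrl.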

How to stop/abort/cancel a page load in PhantomJS?

I use PhantomJS to check a list of links for specific content. Once this content is found, I would like to cancel the page load to avoid further resource requests and continue with the next page to improve speed.
I filter requests in page.onResourceRequested and call request.abort() on everything that doesn't match, but that doesn't prevent PhantomJS from making requests until the site has finished loading.
I tried page.stop(), but it crashes PhantomJS.
The documentation seems lacking and I wonder if there is a command I can use to do that.
I can't use page.close(); because I need the page object after the content is found.
page.onResourceRequested = function(requestData, request) {
    var matchUrlNeeded = ((/someregexforurl\/js/g).test(requestData.url));
    if (matchUrlNeeded) {
        doStuffWithTheUrl;
        response.close();
        request.abort();
        page.cancel();
    }
    else {
        //console.log("NO MATCH : " + requestData.url);
        request.abort();
    }
}
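One way to approximate "stop after the match", sketched here as an assumption rather than a known PhantomJS feature, is to remember that the content was found and abort every request that arrives afterwards. This does not cancel the page load itself; it only keeps later requests off the network. The factory name makeAbortAfterMatch and the sample pattern are hypothetical; request.abort() is the same call used in the question.

```javascript
// Hypothetical factory: builds a handler with the (requestData, request)
// shape used by page.onResourceRequested.
function makeAbortAfterMatch(pattern, onMatch) {
  var found = false;
  return function (requestData, request) {
    if (found) {
      request.abort(); // content already found: drop everything that follows
      return;
    }
    if (pattern.test(requestData.url)) {
      found = true;
      onMatch(requestData.url);
      request.abort(); // the URL is all that was needed, as in the question
    }
    // Requests seen before the match are allowed through.
  };
}

// Demonstration with stand-in request objects that count abort() calls.
function fakeRequest() {
  return { aborted: 0, abort: function () { this.aborted += 1; } };
}
var matched = [];
var handler = makeAbortAfterMatch(/needed\/js/, function (u) { matched.push(u); });
var r1 = fakeRequest(), r2 = fakeRequest(), r3 = fakeRequest();
handler({ url: "http://example.com/style.css" }, r1);
handler({ url: "http://example.com/needed/js/app.js" }, r2);
handler({ url: "http://example.com/late.png" }, r3);
console.log(matched.length, r1.aborted, r2.aborted, r3.aborted);
```

The pattern deliberately has no g flag, so test() keeps no state between calls. In a PhantomJS script, the handler would be assigned as page.onResourceRequested = makeAbortAfterMatch(...), with a fresh handler created for each page.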

Resources