How to stop/abort/cancel a page load in PhantomJS? - request

I use PhantomJS to check a list of links for specific content. Once these content is found I would like to cancel the page load to avoid further ressource requests and continue with the next page to improve speed.
I filter requests in page.onResourceRequested and request.abort(); everything that doesn't match but that doesn't prevent PhantomJS from requesting till the site finished.
Tried page.stop(); -> crashes PhantomJS
The documentation seems lacking and I wonder if there is a command I can use to do that.
I can't use page.close(); because I need the page object after the content is found.

page.onResourceRequested = function(requestData, request) {
var matchUrlNeeded = ((/someregexforurl\/js/g).test(requestData.url));
if (matchUrlNeeded) {
doStuffWithTheUrl;
response.close();
request.abort();
page.cancel(); }
}
else {
//console.log("NO MATCH : " + requestData.url); request.abort(); } { }

Related

Lighthouse: increase my website performance

I did the lighthouse test on my website the report shows some issues but I don't know where to start to fix these issues
I see one of the issues fix to enabling text comparison
I am using Angular.JS with gulp
my backend Node.js
I tried using https://www.npmjs.com/package/compression but nothing changed
function shouldCompress(req, res) {
if (req.headers['x-no-compression']) {
// don't compress responses with this request header
return false;
}
// fallback to standard filter function
return compression.filter(req, res);
}
app.use(compression({ filter: shouldCompress }));

Protractor - Unable to access element due to fixed Top navigation bar

I'm facing the following issue in protractor with jasmine
Click/mouse hover not working because of fixed top navigation bar in my application. I need to click/perform mouse hover on a web page.
Unfortunately that element is displaying behind that fixed navigation bar. So scroll till element present & click by x & y coordinates are not working.
My dependencies are :
protractor version 5.2.2
node 8.9.3
selenium standalone 3.13
chrome driver-2.40
chromebrowser v67
OS- Windows 10
Thanks in advance
Try using prototype executeScript
Just try clicking that element from the browser console using id,name or xpath.
For example :
var el = element(by.module('header'));
var tag = browser.executeScript('return arguments[0].click()', el).then(function() {
expect(something).toMatch(something);
});
Another way, along the same lines as what Bharath Kumar S and knowing JeffC's caveat that this approach is cheating, I had a similar issue where the App-Header kept getting in my way of clicking, and I knew I was willing to never need it (so, for instance, to find other ways to navigate or log out and not check for stuff that was on it). I, therefore, did the following, which solved the problem. Note if you refresh the screen, you have to call it again. Also note I am using a number of functions from https://github.com/hetznercloud/protractor-test-helper, which do what you would expect from their names.
var removeAppHeaderIfAny = async function() {
//this function hides the app header
//it is useful to avoid having covers there when Protractor worries that something else will get the click
let found = false;
try {
found = await waitToBeDisplayed(by.className("app-header"), 2000);
} catch (e) {
let s: string = "" + e;
if (s.search("TimeoutError") != 0) flowLog("presumably fine, cover already removed: " + e);
found = false;
}
if (!found) return;
if (found) {
let coverElement = await element(by.className("app-header"));
browser.executeScript(
"arguments[0].style.visibility='hidden';",
coverElement
);
await waitToBeNotDisplayed(by.className("app-header"), 10000);
}
return;
//note after this is called you will not see the item, so you cannot click it
};
As I look at the code, it strikes me one can probably remove the if (found) and associated brackets at the end. But I pasted in something I know has been working, so I am not messing with that.
As indicated up front, I knew I was willing to forego use of the app-header, and it is a bit crude.

Implementing google custom search in angularjs

I am trying to implement google custom search in an angular js website.
When I click on the search button it does not display me anything, but the url is updated to the url.
I have followed the steps mentioned in the documentation by google.
I am not sure what I am doing wrong?
My search bar is located on the home page as -
<gcse:searchbox-only enableAutoComplete="true" resultsUrl="#/searchresult" lr="lang_en" queryParameterName="search"></gcse:searchbox-only>
my search result has -
<gcse:searchresults-only lr="lang_en"></gcse:searchresults-only>
Any input is much appreciated.
Thanks,
You may have more than one problem happening at the same time...
1. Query Parameter mismatch
Your searchresults-only does not match the queryParameterName specified on gcse:searchbox-only.
Index.html
<gcse:searchresults-only queryParameterName="search"></gcse:searchresults-only>
Search.html
<gcse:searchresults-only queryParameterName="search"></gcse:searchresults-only>
2. Angular.js is blocking the flow of Google CSE
Under normal circumstances, Google Search Element will trigger an HTTP GET with the search parameter. However, since you are dealing with a one-page application, you may not see the query parameter. If that suspicion is true when you target resultsUrl="#/searchresult", then you have two options:
Force a HTTP GET on resultsUrl="http://YOURWEBSITE/searchresult". You may have to match routes, or something along those lines in order to catch the REST request (Ember.js is really easy to do so, but I haven't done in Angular.js yet.)
Use JQuery alongside Angular.js to get the input from the user on Index.html and manually trigger a search on search.html. How would you do it? For the index.html you would do something like below and for the results you would implement something like I answered in another post.
Index.html
<div>GSC SEARCH BUTTON HOOK: <strong><div id="search_button_hook">NOT ACTIVATED.</div></strong></div>
<div>GSC SEARCH TEXT: <strong><div id="search_text_hook"></div></strong></div>
<gcse:search ></gcse:search>
Index.js
//Hook a callback into the rendered Google Search. From my understanding, this is possible because the outermost rendered div has id of "___gcse_0".
window.__gcse = {
callback: googleCSELoaded
};
//When it renders, their initial customized function cseLoaded() is triggered which adds more hooks. I added comments to what each one does:
function googleCSELoaded() {
$(".gsc-search-button").click(function() {
$("#search_button_hook").text('HOOK ACTIVATED');
});
$("#gsc-i-id1").keydown(function(e) {
if (e.which == 13) {
$("#enter_keyboard_hook").text('HOOK ACTIVATED');
}
else{
$("#search_text_hook").text($("#gsc-i-id1").val());
}
});
}
(function() {
var cx = '001386805071419863133:cb1vfab8b4y';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
I have a live version of the index.html code, but I don't make promises that will be permanently live since it is hosted in my NDSU FTP.

Dynamic content Single Page Application SEO

I am new to SEO and just want to get the idea about how it works for Single Page Application with dynamic content.
In my case, I have a single page application (powered by AngularJS, using router to show different state) that provides some location-based search functionalities, similar to Zillow, Redfin, or Yelp. On mt site, user can type in a location name, and the site will return some results based on the location.
I am trying to figure out a way to make it work well with Google. For example, if I type in "Apartment San Francisco" in Google, the results will be:
And when user click on these links, the sites will display the correct result. I am thinking about having similar SEO like these for my site.
The question is, the page content is purely depending on user's query. User can search by city name, state name, zip code, etc, to show different results, and it's not possible to put them all into sitemap. How google can crawl the content for these kind of dynamic page results?
I don't have experience with SEO and not sure how to do it for my site. Please share some experience or pointers to help me get started. Thanks a lot!
===========
Follow up question:
I saw Googlebot can now run Javascript. I want to understand a bit more of this. When a specific url of my SPA app is opened, it will do some network query (XHR request) for a few seconds and then the page content will be displayed. In this case, will GoogleBot wait for the http response?
I saw some tutorial says we need to prepare static html specifically for Search Engines. If I only want to deal with Google, does it mean I don't have to serve static html anymore because Google can run Javascript?
Thanks again.
If a search engine should come across your JavaScript application then we have the permission to redirect the search engine to another URL that serves the fully rendered version of the page.
For this job
You can either use this tool by Thomas Davis available on github
SEOSERVER
Or
you can use the code below which does the same job as above this code is also available here
Implementation using Phantom.js
We can setup a node.js server that given a URL, it will fully render the page content. Then we will redirect bots to this server to retrieve the correct content.
We will need to install node.js and phantom.js onto a box. Then start up this server below. There are two files, one which is the web server and the other is a phantomjs script that renders the page.
// web.js
// Express is our web server that can handle request
var express = require('express');
var app = express();
var getContent = function(url, callback) {
var content = '';
// Here we spawn a phantom.js process, the first element of the
// array is our phantomjs script and the second element is our url
var phantom = require('child_process').spawn('phantomjs',['phantom-server.js', url]);
phantom.stdout.setEncoding('utf8');
// Our phantom.js script is simply logging the output and
// we access it here through stdout
phantom.stdout.on('data', function(data) {
content += data.toString();
});
phantom.on('exit', function(code) {
if (code !== 0) {
console.log('We have an error');
} else {
// once our phantom.js script exits, let's call out call back
// which outputs the contents to the page
callback(content);
}
});
};
var respond = function (req, res) {
// Because we use [P] in htaccess we have access to this header
url = 'http://' + req.headers['x-forwarded-host'] + req.params[0];
getContent(url, function (content) {
res.send(content);
});
}
app.get(/(.*)/, respond);
app.listen(3000);
The script below is phantom-server.js and will be in charge of fully rendering the content. We don't return the content until the page is fully rendered. We hook into the resources listener to do this.
var page = require('webpage').create();
var system = require('system');
var lastReceived = new Date().getTime();
var requestCount = 0;
var responseCount = 0;
var requestIds = [];
var startTime = new Date().getTime();
page.onResourceReceived = function (response) {
if(requestIds.indexOf(response.id) !== -1) {
lastReceived = new Date().getTime();
responseCount++;
requestIds[requestIds.indexOf(response.id)] = null;
}
};
page.onResourceRequested = function (request) {
if(requestIds.indexOf(request.id) === -1) {
requestIds.push(request.id);
requestCount++;
}
};
// Open the page
page.open(system.args[1], function () {});
var checkComplete = function () {
// We don't allow it to take longer than 5 seconds but
// don't return until all requests are finished
if((new Date().getTime() - lastReceived > 300 && requestCount === responseCount) || new Date().getTime() - startTime > 5000) {
clearInterval(checkCompleteInterval);
console.log(page.content);
phantom.exit();
}
}
// Let us check to see if the page is finished rendering
var checkCompleteInterval = setInterval(checkComplete, 1);
Once we have this server up and running we just redirect bots to the server in our client's web server configuration.
Redirecting bots
If you are using apache we can edit out .htaccess such that Google requests are proxied to our middle man phantom.js server.
RewriteEngine on
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule (.*) http://webserver:3000/%1? [P]
We could also include other RewriteCond, such as user agent to redirect other search engines we wish to be indexed on.
Though Google won't use _escaped_fragment_ unless we tell it to by either including a meta tag; <meta name="fragment" content="!">or using #! URLs in our links.
You will most likely have to use both.
This has been tested with Google Webmasters fetch tool. Make sure you include #! on your URLs when using the fetch tool.

Loop over all resources of a web page with CasperJS

How can I loop over all resources of a web page without Casper Default Time limit of 5 seconds ?
If i loop over Web Page' resources with
casper.waitForResource(function check(resource){
....
});
after 5000ms casper raise a test fail.
casper.waitForResource is not for looping over resources. Instead, it is for waiting for a specific resource that has some property. If you use a check function, you will have access to resources that are seen up to the point when the matching resource is found.
What you want is resource.received or similar event listeners. The question is what you want to do with the resource information. Keep in mind that CasperJS and the underlying PhantomJS do not expose the resource content. You will need to download it separately with __utils__.sendAJAX inside of the page context.
If you want the resource list right in the test flow for resources after a specific action, you could do something like this:
var resources = [],
collectResources = false;
casper.on('resource.received', function(resource) {
if (!collectResources) { return; }
resources.push(resource);
});
// later...
casper.then(function(){
collectResources = true;
this.click("#someAction");
}).wait(5000).then(function(){
collectResources = false;
resources.forEach(function(resource){
// do something with it
});
});

Resources