What's the difference between a page's source code and the data Firebug can see? (screen-scraping)

I'm trying to scrape data from a web page. Firebug shows the data I want to extract, but it doesn't appear in the source code when I right-click and choose "show source code".
Is this because Firebug shows dynamic content that gets loaded by JavaScript, etc.?
Are PhantomJS and CasperJS the best way to extract the contents of this page, including all the div elements? I need to extract the data shown by Firebug.
Does CasperJS have a casper.GrabHTML method, like mechanize and BeautifulSoup, that will get all of the DOM elements, such as classes, hrefs, links, buttons, text, etc.?

This is the order in which things happen:
1. PHP generates the HTML.
2. The browser loads the HTML.
3. JavaScript manipulates the loaded HTML.
Why is this?
The browser's View Source feature normally shows the plain HTML as received by the browser. More advanced tools like Firebug can display the current HTML after JavaScript has changed it. (Firefox itself has this feature as well: just right-click some generated HTML and choose "View Selection Source".)
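To see the difference concretely, here is a minimal Python sketch (Python because the question mentions mechanize and BeautifulSoup) that parses a hypothetical server response with the standard-library html.parser. The page markup, the "prices" id and the 42.00 value are made up for illustration. A parser that reads only the raw source, which is what View Source shows, finds the div empty, because nothing ever executes the script that fills it.

```python
from html.parser import HTMLParser

# Hypothetical server response: the #prices div arrives empty, and an inline
# script fills it in only when a browser executes the script.
RAW_SOURCE = """<html><body>
  <div id="prices"></div>
  <script>document.getElementById("prices").textContent = "42.00";</script>
</body></html>"""

# Hypothetical snapshot of the same element after the script has run,
# i.e. roughly what Firebug's HTML tab would show.
RENDERED_DOM = '<div id="prices">42.00</div>'

# Elements that never have a closing tag, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collects the text inside the element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        # Script bodies also arrive here, but only text inside the target counts.
        if self.depth:
            self.chunks.append(data)


def text_of(html, element_id):
    """Return the stripped text of the element with the given id, or ''."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(text_of(RAW_SOURCE, "prices")))    # '': nothing executed the script
print(repr(text_of(RENDERED_DOM, "prices")))  # '42.00'
```

The value 42.00 is present in the raw source, but only inside the script, which is why a source-only scraper (mechanize, BeautifulSoup) cannot see it in the div.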
How can I access the full (Firebug) HTML?
I'm not sure about the HTML tab, but the Network tab always displays documents as received from the server.
Can I do it in php/javascript?
PHP is no longer running when the original HTML reaches the browser.
JavaScript can read the current HTML through the .innerHTML property of any DOM node.
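If you want Firebug's view from a script, one option (swapped in here for the CasperJS approach the question asks about) is to drive a real browser and ask it for that serialization. The sketch below assumes a Selenium WebDriver, whose execute_script method runs JavaScript in the page and returns the result; it reads document.documentElement.outerHTML, the whole-document counterpart of .innerHTML, and then uses html.parser to pull out hrefs and class names as the question asks. The names rendered_html, LinkAndClassCollector and links_and_classes are hypothetical.

```python
from html.parser import HTMLParser


def rendered_html(driver):
    """Return the page's current DOM serialized as HTML.

    Assumes `driver` is a Selenium WebDriver, or any object whose
    execute_script(script) runs JavaScript in the page and returns the result.
    """
    # outerHTML of <html> includes everything scripts have inserted so far,
    # unlike the raw response that View Source shows.
    return driver.execute_script("return document.documentElement.outerHTML;")


class LinkAndClassCollector(HTMLParser):
    """Gathers link targets and CSS class names from an HTML string."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        # Valueless attributes come through as None, so check before splitting.
        if attrs.get("class"):
            self.classes.update(attrs["class"].split())


def links_and_classes(html):
    """Return (list of hrefs in document order, set of class names)."""
    collector = LinkAndClassCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs, collector.classes
```

Usage, assuming Selenium and a Firefox driver are installed: create driver = webdriver.Firefox(), call driver.get(url), pass rendered_html(driver) to links_and_classes, and call driver.quit() when done. Content that loads asynchronously may need an explicit wait (for example Selenium's WebDriverWait) before the snapshot is taken, otherwise outerHTML reflects the page at that moment only.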

Related

Could you inject a script into a React page (with a Draft.js editor) to append HTML to the existing contents of the editor?

I'm looking to combine a Chrome extension's ability to inject scripts with Draft.js's ability to add content to the editor.
Would it be possible to inject a script like this into a React/Draft.js page, pass it a string (from the Chrome Extension), and have the script update the page?
Many thanks for any thoughts/guidance...

Render HTML (AngularJS) on the server and save it as PDF

Say I have a dynamic page URL, e.g. http://www.example.com/transaction=123. The HTML differs depending on the transaction, and the output is an HTML page with the transaction info. I need to render this on the server and then save the output as a PDF.
Is there an example or reference for doing this with Node.js?
Can I use PhantomJS in Visual Studio (for example, to save the rendered page as a PDF)?

How do I make my AngularJS site display correctly for Googlebot and Optimizely?

I have a website called VoteCircle (www.votecircle.com), built with AngularJS, but I noticed that it doesn't display well for Googlebot or Optimizely (used for A/B tests). Those bots and previews show only the content that ISN'T in ng-view; none of the content in ng-view is displayed.
What's the best way to fix that?
Please see the attached screenshot.
Thanks.
There is a pretty easy fix for this. In your URL bar, click the small key icon and enable mixed content. By default, the browser blocks mixed content (HTTPS and HTTP resources combined) in the editor; enabling it lets the rest of the page load in the editor.

How do I debug JavaScript in an AngularJS partial?

I am using the Routing and Multiple Views feature of AngularJS, but I don't see the HTML partial file (or its embedded JavaScript) in the "Sources" tab of Chrome's Developer Tools.
My index.html file includes all the tags for AngularJS, jQuery and Bootstrap, along with my custom app/controller JavaScript file. These files all appear in the Sources tab.
My application works correctly. As I click between the links on the page, the partial HTML files are loaded and displayed, and the files are listed in the Network tab.
The problem is that the partial HTML files do not appear in the Sources tab. How can I debug the JavaScript in those partial files?
It looks like the closest thing to a debugger is Zone.js from the Angular team, especially since it will be built right into AngularJS in the future.
This means adding some more code, but the benefits seem to outweigh the cost.

mailto: links gobbled in a route

I'm using the jQuery linkify plugin on a relatively simple Backbone view. Links to web pages outside this app work properly, and using the browser's View Source I can see that the mailto links are generated correctly. But clicking a mailto link appends /mailto:q#example.com to the current URL (e.g., http://example-acme.staging.myservername.com/mailto:q#example.com).
If I copy the generated HTML using the Inspector and paste it into the source of arbitrary pages (outside this app), the mailto links work as expected, opening a new message window in my mail client. The problem occurs in both Chrome and Firefox.
Have you seen and fixed this issue?
Just for closure: I added an event handler that calls window.location.assign(mailto), and that gets the job done. Not necessarily 'correct', but practical. @Brad, thanks for chiming in!
