HTML scraping with JS support - screen-scraping

I am trying to scrape a company web page for automation purposes, but the embedded scripts in the page prevent me from fully replicating the request. The biggest pain is the script-generated cookies.
I thought of automating IE with WatiN, but I am not comfortable running that solution inside a service application.
What would you advise in this situation?
Thanks in advance.

screen-scraper is another tool (Java-based) that aims to be easy to use.
The basic idea is as Byron said: you will have to figure out what cookies are getting set (web proxy tools like Fiddler or Charles, or browser tools like Firebug and Chrome's dev tools, will come in handy).
So, you don't necessarily have to read or even execute the javascript on the page to imitate the same requests. Just use a proxy tool to see what cookies your browser is sending to the server, and once you know what cookies the site expects to receive, set them manually in whatever script or tool you use to do your scraping and you'll be golden.
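As a minimal sketch of the approach above, assuming the proxy tool showed the browser sending two cookies. The cookie names `session_id` and `js_token`, their values, and the URL are placeholders, not taken from any real site. It uses only Python's standard library.

```python
import urllib.request


def build_request(url, cookies):
    """Return a Request that sends the given cookies in a Cookie header."""
    # The Cookie header is a "; "-separated list of name=value pairs.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


# Placeholder values: copy the real names and values from Fiddler or dev tools.
observed_cookies = {"session_id": "abc123", "js_token": "xyz789"}
request = build_request("https://example.com/report", observed_cookies)

# Sending is left commented out because example.com is only a stand-in:
# with urllib.request.urlopen(request) as response:
#     html = response.read().decode("utf-8")
```

If the cookie values change between sessions, the observed values will eventually go stale and have to be captured again or regenerated.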

You have several options.
The easiest is to generate the cookies in your script. You will have to read the JavaScript code yourself, figure out what it is doing, and duplicate it. Fiddler is always your friend when scraping.
HtmlUnit is a Java web browser library with JavaScript support. It has no GUI and is made for testing web applications.
Selenium will drive a browser much the same way Watir does, but it has rich API support for most major languages.
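To illustrate the first option, here is a rough sketch of duplicating a script's cookie logic. Everything about the page is an assumption made up for the example: it pretends the page embeds a `challenge` variable and that its script sets a `js_token` cookie to the hex SHA-256 digest of that value. A real site will do something different, which you would find by reading its JavaScript.

```python
import hashlib
import re


def extract_challenge(html):
    """Pull the challenge string out of the page source (hypothetical markup)."""
    match = re.search(r'var challenge = "([^"]+)";', html)
    if match is None:
        raise ValueError("challenge not found in page")
    return match.group(1)


def compute_token(challenge):
    """Mirror the hypothetical script: hex SHA-256 digest of the challenge."""
    return hashlib.sha256(challenge.encode("utf-8")).hexdigest()


# Stand-in for the HTML returned by the first request to the site.
sample_html = '<script>var challenge = "f3a9";</script>'
cookie = {"js_token": compute_token(extract_challenge(sample_html))}
```

The resulting cookie would then be sent with the follow-up request, in the same way as any cookie observed through a proxy tool.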

Related

NextJS advice for SSG/SSR and Google Analytics/cookies

I have a NextJS site which is statically generated at build (SSG).
There are two things I need to implement next:
Google Analytics
GDPR compliant opt-in cookie options
The first one is easy enough to do; however, I'm struggling with making this GDPR compliant.
The issue is I don't have access to cookies at server side when my site is statically generated. This means that without knowing whether the user has consented to cookies at the server, I can't serve (or not serve) the analytics script along with the rest of the page.
Possible solutions:
Handle everything at client side - ask for consent, then dynamically add the GA tag to the <head>. However, I'm worried this will negatively affect the analytics, or break it altogether. Does anyone know?
Change my site to be server-side rendered (SSR). I'd love to avoid this if possible. I'm really happy with how fast the site is running with SSG. It's essentially just a basic blog so would be a shame to have to convert for the sake of analytics.
Any other ideas?...
If anyone has experience with this, whether they used Next or Nuxt, etc, your input would be greatly appreciated!
Thanks in advance
Use Google Tag Manager to manage everything: both your GA integration and your cookie consent integration, using something like CookieHub (see "How to set up Google Analytics through Google Tag Manager for Next-Js?").
GTM will allow you to trigger the GA script only if the user has accepted analytics cookies.
Alternatively, you could use Vercel's built-in analytics, since your website uses Next.js, which is Vercel's framework.

How to do performance testing for multiple users using Chrome Dev Tools for an AngularJS web site

I have developed an AngularJS web console. The web console basically creates, retrieves and deletes users.
I want to performance test it using Chrome Dev Tools or JMeter.
If I use JMeter, how can I monitor the behavior of the web console itself? From JMeter I can only check the response time of the API.
If I use Chrome Dev Tools, how can I test it for multiple users against POST and GET operations?
For example, I have a scenario where 10 users are registering or signing in at the same time. How can I test this behaviour?
OR
50 people are creating, deleting or retrieving a user through a form at the same time.
OR
What will the behavior of the web console be if 50 users are using it at the same time?
NOTE: The web console is deployed on a server. I want to test it locally and on the server as well.
Need help. Thanks in advance!
Server side performance and client-side performance are different beasts so you can break down your performance testing requirements into 2 major parts:
Generate the required load against your web console using JMeter HTTP Request samplers. Make sure you configure JMeter properly to handle cookies, cache, headers, and embedded resources (scripts, styles, images). See the article "How To Make JMeter Behave More Like A Real Browser" for a comprehensive explanation of how to configure JMeter properly. If you need the requests to be fired at exactly the same moment, also consider the Synchronizing Timer.
As JMeter neither renders pages nor executes client-side JavaScript, you can check client-side performance using one of the approaches below (or any combination):
Using YSlow software
Using aforementioned Chrome Dev Tools
Using the WebDriver Sampler (which provides Selenium and JMeter integration), so you will be able to measure page rendering time. If necessary, you can add custom scripting logic using the Navigation Timing API to get extended information on page loading events in an automated manner.
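The Navigation Timing idea from the last item can be sketched roughly as follows. The field names (`navigationStart`, `responseStart`, `domContentLoadedEventEnd`, `loadEventEnd`) are those of the browser's `window.performance.timing` object; the sample numbers and the `summarize_timing` helper are made up for illustration, and how the raw timestamps are pulled out of the browser is left open.

```python
def summarize_timing(timing):
    """Turn raw Navigation Timing timestamps (ms) into durations (ms)."""
    start = timing["navigationStart"]
    return {
        # Time until the first byte of the response arrived.
        "time_to_first_byte": timing["responseStart"] - start,
        # Time until the DOMContentLoaded handlers finished.
        "dom_content_loaded": timing["domContentLoadedEventEnd"] - start,
        # Time until the load event finished.
        "page_load": timing["loadEventEnd"] - start,
    }


# Made-up sample in the shape of window.performance.timing.
sample = {
    "navigationStart": 1000,
    "responseStart": 1180,
    "domContentLoadedEventEnd": 1950,
    "loadEventEnd": 2400,
}
summary = summarize_timing(sample)
```

Collecting such a summary on every iteration of a browser-driven test gives client-side numbers to set against the server-side response times measured by the HTTP samplers.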

What is the actual advantage of running an AngularJS web app over Node.js instead of a server like Xampp?

As stated in the title,
I don't really understand how Node.js works and, above all, why it's actually used to run an AngularJS application (e.g. in the WebStorm IDE this is the default option when you create an AngularJS project).
I have this doubt because I could run a simple AngularJS app on an Apache web server (within XAMPP) without any involvement of Node.js.
Thank you in advance
Node.js is an application platform. It's good for running your applications on.
Apache HTTPD is a web server. It's good at serving web pages.
They're two very different things, not directly related, and not mutually exclusive.
You are correct that many apps can run anywhere, but some benefits we've seen are:
Simplicity, especially for a web developer who is also developing the server-side code, configuration, and deployment.
Real-time web - easier to add in things like WebSockets and Server Sent Events if you need them.

Can you import Selenium WebDriver into Google App Engine?

I am fairly new to app development, so this may sound like a stupid question. The company I am working with is trying to get rid of most of their IT infrastructure, so that they don't need any more servers. I have been asked to develop a program that takes information from a Google spreadsheet and enters it into a web browser. I am planning on using Python and Selenium WebDriver. Will I be able to install Selenium if I host the application on Google App Engine?
The reason I want/need to use Selenium WebDriver is that I need to put the information from Google into a legacy system. The only way to put information into that system is to mimic a user entering it manually in a web browser.
Thank you,
Kai
I don't understand what you think Selenium will be doing here. It seems a very strange way to want to get information from one Google property into another.
Google spreadsheets have a perfectly good API that allows you to read the data from your app and display it to users.
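For reference, reading cell values over the Sheets REST API (v4) looks roughly like this. It is only a sketch: `SHEET_ID`, `API_KEY`, the range, and the sample response are placeholders, and it assumes the sheet can be read with an API key. A private sheet would need OAuth credentials instead.

```python
import json
from urllib.parse import quote, urlencode


def values_url(spreadsheet_id, cell_range, api_key):
    """Build the URL for a spreadsheets.values.get call."""
    base = "https://sheets.googleapis.com/v4/spreadsheets"
    path = f"{quote(spreadsheet_id, safe='')}/values/{quote(cell_range, safe='')}"
    return f"{base}/{path}?{urlencode({'key': api_key})}"


def rows_from_response(body):
    """Extract the rows from the JSON body; an empty range has no 'values'."""
    return json.loads(body).get("values", [])


url = values_url("SHEET_ID", "Sheet1!A1:B2", "API_KEY")

# Made-up response body in the shape the API returns for a value range.
sample_body = (
    '{"range": "Sheet1!A1:B2", "majorDimension": "ROWS", '
    '"values": [["name", "email"], ["Kai", "kai@example.com"]]}'
)
rows = rows_from_response(sample_body)
```

Fetching `url` with any HTTP client and passing the body to `rows_from_response` gives the rows as lists of strings.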
Edit after question update: Well, now I don't see what you need GAE for. It is for hosting and running websites, and you only seem to want to enter data into an existing website. That's not what it does at all.

Quick and easy client for testing WSRP?

I am writing a JSR-168 Portlet to be exposed as a service via WSRP on the WebSphere Portal Server... is there a good tool I could use to test the WSRP service on my desktop? I'm looking for something that would be considerably less hassle than installing SharePoint and getting its WSRP module to work.
Apache Pluto or Sun's reference portal would be the most lightweight containers to test things out locally. Here is an introduction:
http://developers.sun.com/portalserver/reference/techart/openportal_wsrp.html
You can also download Liferay, a full-featured open source portal which has easy WSRP configuration and is less of a hassle to get running than SharePoint. For more technical testing, I would use SoapUI and test the individual service calls. Something I probably need to write a blog post about one day ;)

Resources